HuggingFace
Install huggingface-cli
pip install -U "huggingface_hub[cli]"
Log in with your token
huggingface-cli login --token <your_token>
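To confirm the token was stored, a small sketch using the `huggingface_hub` library (installed alongside the CLI above); `whoami()` raises unless a valid token is available:

```python
# Sanity check for the login above; the import is lazy so this file
# still parses when huggingface_hub is not installed.
def check_hf_login() -> str:
    from huggingface_hub import whoami
    return whoami()["name"]  # raises if no valid token is stored

# print(check_hf_login())  # prints your Hugging Face username when logged in
```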
vLLM
Install
pip install vllm
There may be CUDA/PyTorch compatibility issues, so you can install PyTorch first (e.g., for CUDA 12.2):
# https://pytorch.org/get-started/previous-versions/
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
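Before building vLLM against this install, it is worth verifying which CUDA build of PyTorch you actually got. A minimal check (the expected version matches the pin above; the CUDA build just needs to be compatible with your driver/toolkit):

```python
# Verify the preinstalled PyTorch build; torch is imported lazily so this
# file parses even where torch is not installed.
def check_torch() -> None:
    import torch
    print("torch version:", torch.__version__)       # expect 2.4.1
    print("built for CUDA:", torch.version.cuda)     # e.g. a 12.x build for CUDA 12.2
    print("GPU visible:", torch.cuda.is_available())

# check_torch()
```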
Install vLLM against the existing PyTorch
git clone https://github.com/vllm-project/vllm.git
cd vllm
python use_existing_torch.py
pip install -r requirements-build.txt
pip install -e . --no-build-isolation
Serving
Download model
huggingface-cli download --resume-download Qwen/Qwen2.5-1.5B --local-dir /path/to/your/directory/Qwen/Qwen2.5-1.5B
Serve the downloaded model
vllm serve /path/to/your/directory/Qwen/Qwen2.5-1.5B --served-model-name Qwen2.5-1.5B
# test
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen2.5-1.5B",
"prompt": "San Francisco is a",
"max_tokens": 7,
"temperature": 0
}'
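The same request can be sent from Python using only the standard library; a sketch of a small client, assuming the `vllm serve` command above is running locally and the model name matches `--served-model-name`:

```python
import json
import urllib.request

def complete(prompt: str, model: str = "Qwen2.5-1.5B",
             base_url: str = "http://localhost:8000/v1") -> str:
    # Same payload as the curl test above, against the OpenAI-compatible
    # /v1/completions endpoint that vLLM exposes.
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 7,
        "temperature": 0,
    }
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# print(complete("San Francisco is a"))  # requires the server to be running
```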