Install the required libraries
Open the terminal in PyCharm (the terminal, not a Python file/console) and install the libraries below:
```bash
pip install torch transformers datasets peft accelerate sentencepiece modelscope
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126
```
The second command installs a torch build for CUDA 12.6. If your GPU's CUDA version does not match this build, check the PyTorch website to see which CUDA version your GPU/driver supports and which PyTorch build goes with it. If you have no dedicated GPU, you probably don't need to worry about this.
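To confirm that the installed build actually sees your GPU, you can run a quick sanity check (this is not part of the original steps, just a convenience):

```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True means the CUDA build matches your driver
```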
Download the model
Create a Python script, copy the code below into it, and run it to download the model:
```python
from modelscope.hub.snapshot_download import snapshot_download

# Custom download path (any directory you have read/write permission for)
model_dir = snapshot_download(
    'Qwen/Qwen3-0.6B',
    revision='master',
    cache_dir='./models'  # ← custom path!
)
print("模型保存路径:", model_dir)  # remember this path; it is needed later
```
Make a note of the path where the model was downloaded.
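If you want to double-check that the download finished, a small sketch (assuming `model_dir` from the script above) is to list the files in that directory; you should see config.json, the tokenizer files, and the model weights:

```python
import os

print(model_dir)              # the path to note down
print(os.listdir(model_dir))  # e.g. config.json, tokenizer files, *.safetensors
```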
Run the model
Create another .py file and copy in the code below. Remember to change model_path to the actual path of the model on your machine.
```python
from modelscope import AutoModelForCausalLM, AutoTokenizer
import torch

# model_name = "Qwen/Qwen3-0.6B"
model_path = "./models/qwen/Qwen3-0___6B"  # ← change this to the actual path on your machine!

# 1. Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",      # automatically place layers on GPU/CPU
    dtype=torch.bfloat16,   # reduce VRAM usage
    trust_remote_code=True
)

# prepare the model input
prompt = "你好,请介绍一下你自己"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
If the script prints a response, the local deployment works.
![[Pasted image 20250923210828.png]]
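As a follow-up, if you want to query the model repeatedly without re-pasting the generation steps, here is a minimal sketch that wraps them in a helper; the `chat` function name is my own, and it assumes the `tokenizer` and `model` loaded in the script above:

```python
def chat(user_prompt, enable_thinking=False, max_new_tokens=1024):
    """Wrap the generation steps above; assumes `tokenizer` and `model` are already loaded."""
    messages = [{"role": "user", "content": user_prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # keep only the newly generated tokens (drop the prompt)
    output_ids = generated[0][len(inputs.input_ids[0]):].tolist()
    # note: with enable_thinking=True the returned text still contains the
    # <think>...</think> block unless you split it as in the script above
    return tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")

print(chat("你好,请介绍一下你自己"))
```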