Deploying a Large Language Model Locally or on a Server

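This post uses the 1B-Instruct version of Llama 3.2 as an example and deploys it on a remote server; the steps for a local machine are similar. The server runs Ubuntu on x86.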

First, install the required packages, such as torch and transformers.
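Before moving on, it can help to confirm that the packages are visible to the Python interpreter you will run the script with. The following is a minimal sketch, not part of the original setup: check_packages is a hypothetical helper, and including accelerate in the list is an assumption (transformers generally needs it when device_map is used).

```python
import importlib.util
from importlib import metadata

# Packages the deployment script relies on.
# "accelerate" is an assumption: it is typically required for device_map="auto".
REQUIRED = ["torch", "transformers", "accelerate"]


def check_packages(names):
    """Return {name: version string, or None if the package cannot be found}."""
    report = {}
    for name in names:
        # find_spec locates the package without importing it
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata (e.g. a stdlib module)
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name}: {version or 'MISSING'}")
```

Any package reported as MISSING needs to be installed in the same environment before the script below will run.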

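The Llama 3 model weights and tokenizer are gated, so first request access on Meta's official HuggingFace page; alternatively, download a third-party copy of the model directly.
Once access is granted, download the model weights (.safetensors) and the configuration files (.json) from the "Files and versions" tab to your local machine. You can also fetch them with git, or from code that pulls from the remote repository, but that requires the remote server to be able to reach HuggingFace (for example, through a proxy).
After the download finishes, package the files locally and upload them to /home/user/.cache/huggingface/hub/ on the server.
Then write a Python script: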

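import torch
from transformers import pipeline

# Local directory that holds the uploaded weights, config, and tokenizer files
model_path = "/home/user/.cache/huggingface/hub/Llama3/"

# Build a text-generation pipeline; device_map="auto" picks the available device(s)
pipe = pipeline(
    "text-generation",
    model=model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)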

while True:
    prompt = input("Please input:")
    # The system message sets the task; the user message carries the typed sentence
    messages = [
        {"role": "system", "content": "You are a linguist who is good at English, now help me to fix the words in the sentence."},
        {"role": "user", "content": prompt},
    ]

    outputs = pipe(
        messages,
        max_new_tokens=256,
    )
    # generated_text holds the whole conversation; the last entry is the model's reply
    print(outputs[0]["generated_text"][-1]["content"])

Run the script to call the model in a loop and chat with it from the terminal.
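Note that the loop builds a fresh messages list on every iteration, so the model never sees earlier turns. If you want a multi-turn conversation, one option is to carry the history forward. The sketch below assumes the pipeline returns the full message list in generated_text, as the indexing in the script suggests. chat_turn and fake_pipe are hypothetical names; fake_pipe only imitates the output shape so the example runs without loading the model.

```python
def chat_turn(pipe, history, user_text, max_new_tokens=256):
    """Send one user message along with prior history; return (reply, new_history)."""
    messages = history + [{"role": "user", "content": user_text}]
    outputs = pipe(messages, max_new_tokens=max_new_tokens)
    # Assumed shape: generated_text is the whole conversation, reply last
    new_history = outputs[0]["generated_text"]
    return new_history[-1]["content"], new_history


def fake_pipe(messages, max_new_tokens=256):
    """Hypothetical stand-in that mimics the chat output shape of the real pipeline."""
    reply = {"role": "assistant", "content": f"echo: {messages[-1]['content']}"}
    return [{"generated_text": messages + [reply]}]


# Start with the system message, then reuse the returned history on each turn
history = [{"role": "system", "content": "You are a helpful assistant."}]
reply, history = chat_turn(fake_pipe, history, "hello")
reply2, history = chat_turn(fake_pipe, history, "again")
print(reply)
print(reply2)
```

To use it with the real model, pass the pipe object from the script in place of fake_pipe. The history grows with every turn, so for long sessions you may need to trim older messages to stay within the model's context window.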