
llama-on-prem.md update on the HF access token

Jeff Tang 1 year ago
parent
commit
dd84ee36ae
1 changed file with 8 additions and 6 deletions

+ 8 - 6
demo_apps/llama-on-prem.md

@@ -6,6 +6,10 @@ We'll use the Amazon EC2 instance running Ubuntu with an A10G 24GB GPU as an exa
 
 The Colab notebook that uses LangChain to connect with Llama 2 hosted as vLLM and TGI API services is [here](https://colab.research.google.com/drive/1rYWLdgTGIU1yCHmRpAOB2D-84fPzmOJg?usp=sharing); it is also shown in the sections below.
 
+This tutorial assumes that you have been granted access to Meta Llama 2 on Hugging Face - you can open the Hugging Face Meta model page [here](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) to confirm that you see the message "Gated model You have been granted access to this model"; if you don't see the "granted access" message, simply follow the instructions under "Access Llama 2 on Hugging Face" on the page.
+
+You'll also need your Hugging Face access token, which you can get on your Settings page [here](https://huggingface.co/settings/tokens).
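
If you want the token available to later commands in the same shell session, you can export it as an environment variable (a minimal sketch; replace the placeholder with your actual token - `HUGGING_FACE_HUB_TOKEN` is the same variable name the TGI docker command below passes into the container):

```
# Replace the placeholder with the token from https://huggingface.co/settings/tokens
export HUGGING_FACE_HUB_TOKEN=<your Hugging Face access token>
```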
+
 ## Setting up vLLM with Llama 2
 
 On a terminal, run the following commands:
@@ -18,6 +22,8 @@ git clone https://github.com/vllm-project/vllm
 cd vllm/vllm/entrypoints/
 ```
 
+Then run `huggingface-cli login` and paste your Hugging Face access token when prompted to complete the login.
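
If you'd rather not paste the token interactively, newer versions of the Hugging Face CLI accept it as a flag (a sketch, assuming a `huggingface_hub` release that supports `--token` and that you exported `HUGGING_FACE_HUB_TOKEN` as above):

```
# Non-interactive login; assumes HUGGING_FACE_HUB_TOKEN holds your token
huggingface-cli login --token $HUGGING_FACE_HUB_TOKEN
```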
+
 There are two ways to deploy Llama 2 via vLLM: as a general API server or as an OpenAI-compatible server.
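
As a quick preview of the two options (a sketch of the launch commands, assuming vLLM's standard entrypoints - the sections below walk through each in detail):

```
# General API server, run from vllm/vllm/entrypoints/
python api_server.py --host 0.0.0.0 --port 5000 --model meta-llama/Llama-2-13b-chat-hf

# OpenAI-compatible server
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 5000 --model meta-llama/Llama-2-13b-chat-hf
```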
 
 ### Deploying Llama 2 as an API Server
@@ -111,9 +117,7 @@ You can now use the Llama 2 instance `llm` created this way in any of the [Llama
 
 ## Setting Up TGI with Llama 2
 
-The easiest way to deploy Llama 2 with TGI is using TGI's official docker image. First, make sure you have been granted access to the Meta Llama 2 on Hugging Face by opening the Hugging Face Meta model page [here](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) and confirming you see "Gated model You have been granted access to this model". If you don't see the "granted access" message, simply follow the instructions under "Access Llama 2 on Hugging Face" in the page.
-
-Then copy your Hugging Face access token, which you can create for free at your [tokens page](https://huggingface.co/settings/tokens) and set it as the value of one of the three required shell variables:
+The easiest way to deploy Llama 2 with TGI is to use TGI's official docker image. First, set the three required shell variables below, replacing `<your Hugging Face access token>` with your actual token (you may also replace the `model` value with another Llama 2 model):
 
 ```
 model=meta-llama/Llama-2-13b-chat-hf
@@ -121,9 +125,7 @@ volume=$PWD/data
 token=<your Hugging Face access token>
 ```
 
-You may replace the `model` value above with another Llama 2 model.
-
-Finally, run the command below to deploy a quantized version of the Llama 2 13b-chat model with TGI:
+Then run the command below to deploy a quantized version of the Llama 2 13b-chat model with TGI:
 
 ```
 docker run --gpus all --shm-size 1g -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.2 --model-id $model --quantize bitsandbytes-nf4
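
# Once the server is up, you can sanity-check it via TGI's /generate endpoint
# (a sketch - the prompt and generation parameters below are just illustrative):
curl 127.0.0.1:8080/generate -X POST -H 'Content-Type: application/json' -d '{"inputs": "What is good about Napa Valley?", "parameters": {"max_new_tokens": 100}}'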