
Updated to llama 3 and removed output cells

Ubuntu 5 months ago
parent
commit
c9051f2894

+ 75 - 33
recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_HF_transformers.ipynb

@@ -4,8 +4,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Running Llama2 on Google Colab using Hugging Face transformers library\n",
-    "This notebook goes over how you can set up and run Llama2 using Hugging Face transformers library\n",
+    "## Running Meta Llama 3 on Google Colab using Hugging Face transformers library\n",
+    "This notebook goes over how you can set up and run Llama 3 using Hugging Face transformers library\n",
     "<a href=\"https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_HF_transformers.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
    ]
   },
@@ -14,11 +14,11 @@
    "metadata": {},
    "source": [
     "### Steps at a glance:\n",
-    "This demo showcases how to run the example with already converted Llama 2 weights on [Hugging Face](https://huggingface.co/meta-llama). Please Note: To use the downloads on Hugging Face, you must first request a download as shown in the steps below making sure that you are using the same email address as your Hugging Face account.\n",
+    "This demo showcases how to run the example with already converted Llama 3 weights on [Hugging Face](https://huggingface.co/meta-llama). Please Note: To use the downloads on Hugging Face, you must first request a download as shown in the steps below making sure that you are using the same email address as your Hugging Face account.\n",
     "\n",
     "To use already converted weights, start here:\n",
     "1. Request download of model weights from the Llama website\n",
-    "2. Prepare the script\n",
+    "2. Login to Hugging Face from your terminal using the same email address as (1). Follow the instructions [here](https://huggingface.co/docs/huggingface_hub/en/quick-start). \n",
     "3. Run the example\n",
     "\n",
     "\n",
@@ -45,7 +45,7 @@
     "Request download of model weights from the Llama website\n",
     "Before you can run the model locally, you will need to get the model weights. To get the model weights, visit the [Llama website](https://llama.meta.com/) and click on “download models”. \n",
     "\n",
-    "Fill  the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a URL in your email in a short time."
+    "Fill  the required information, select the models “Meta Llama 3” and accept the terms & conditions. You will receive a URL in your email in a short time."
    ]
   },
   {
@@ -79,7 +79,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -92,7 +92,31 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Then, we will set the model variable to a specific model we’d like to use. In this demo, we will use the 7b chat model `meta-llama/Llama-2-7b-chat-hf`."
+    "Then, we will set the model variable to a specific model we’d like to use. In this demo, we will use the 8b chat model `meta-llama/Meta-Llama-3-8B-Instruct`. Using Meta models from Hugging Face requires you to\n",
+    "\n",
+    "1. Accept Terms of Service for Meta Llama 3 on Meta [website](https://llama.meta.com/llama-downloads).\n",
+    "2. Use the same email address from Step (1) to login into Hugging Face.\n",
+    "\n",
+    "Follow the instructions on this Hugging Face page to login from your [terminal](https://huggingface.co/docs/huggingface_hub/en/quick-start). "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pip install --upgrade huggingface_hub"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import login\n",
+    "login()"
    ]
   },
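+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Optionally, you can confirm that the login worked by asking the Hub which account the cached token belongs to. This check is a small addition on top of the original steps and uses `whoami` from the `huggingface_hub` package installed above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sanity check: print the Hugging Face account associated with the saved token\n",
+    "from huggingface_hub import whoami\n",
+    "whoami()[\"name\"]"
+   ]
+  },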
   {
@@ -101,7 +125,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "model = \"meta-llama/Llama-2-7b-chat-hf\"\n",
+    "model = \"meta-llama/Meta-Llama-3-8B-Instruct\"\n",
     "tokenizer = AutoTokenizer.from_pretrained(model)"
    ]
   },
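+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an optional quick check (an addition to the original walkthrough), the cell below applies the Llama 3 chat template to a toy message so you can see the prompt format the tokenizer produces."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional: inspect the prompt the Llama 3 chat template builds for a simple user message\n",
+    "messages = [{\"role\": \"user\", \"content\": \"Hello, what can you do?\"}]\n",
+    "print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))"
+   ]
+  },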
@@ -174,7 +198,7 @@
     "Request download of model weights from the Llama website\n",
     "Before you can run the model locally, you will need to get the model weights. To get the model weights, visit the [Llama website](https://llama.meta.com/) and click on “download models”. \n",
     "\n",
-    "Fill  the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a URL in your email in a short time.\n"
+    "Fill  the required information, select the models \"Meta Llama 3\" and accept the terms & conditions. You will receive a URL in your email in a short time."
    ]
   },
   {
@@ -182,25 +206,24 @@
    "metadata": {},
    "source": [
     "#### 2. Clone the llama repo and get the weights\n",
-    "Git clone the [Llama repo](https://github.com/facebookresearch/llama.git). Enter the URL and get 7B-chat weights. This will download the tokenizer.model, and a directory llama-2-7b-chat with the weights in it.\n",
+    "Git clone the [Meta Llama 3 repo](https://github.com/meta-llama/llama3). Run the `download.sh` script and follow the instructions. This will download the model checkpoints and tokenizer.\n",
     "\n",
-    "This example demonstrates a llama2 model with 7B-chat parameters, but the steps we follow would be similar for other llama models, as well as for other parameter models.\n",
-    "\n"
+    "This example demonstrates a Meta Llama 3 model with 8B-instruct parameters, but the steps we follow would be similar for other llama models, as well as for other parameter models."
    ]
   },
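+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a rough sketch, the terminal commands for this step look like the following; `download.sh` will prompt you for the presigned URL from your email and for the model variant to fetch:\n",
+    "\n",
+    "* `git clone https://github.com/meta-llama/llama3`\n",
+    "* `cd llama3`\n",
+    "* `./download.sh`"
+   ]
+  },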
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### 3. Convert the model weights\n",
-    "\n",
-    "* Create a link to the tokenizer:\n",
-    "Run `ln -h ./tokenizer.model ./llama-2-7b-chat/tokenizer.model`  \n",
+    "#### 3. Convert the model weights using Hugging Face transformer from source\n",
     "\n",
-    "\n",
-    "* Convert the model weights to run with Hugging Face:``TRANSFORM=`python -c \"import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')\"``\n",
-    "\n",
-    "* Then run: `pip install protobuf && python $TRANSFORM --input_dir ./llama-2-7b-chat --model_size 7B --output_dir ./llama-2-7b-chat-hf`\n"
+    "* `python3 -m venv hf-convertor`\n",
+    "* `source hf-convertor/bin/activate`\n",
+    "* `git clone https://github.com/huggingface/transformers.git`\n",
+    "* `cd transformers`\n",
+    "* `pip install -e .`\n",
+    "* `pip install torch tiktoken blobfile accelerate`\n",
+    "* `python3 src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir ${path_to_meta_downloaded_model} --output_dir ${path_to_save_converted_hf_model} --model_size 8B --llama_version 3`"
    ]
   },
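+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an optional extra step (not in the original instructions), you can confirm that the conversion produced a loadable Hugging Face checkpoint by reading back its config; replace the placeholder path with the `--output_dir` you passed to the conversion script."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional: verify the converted checkpoint by loading only its config (cheap, no weights loaded)\n",
+    "from transformers import AutoConfig\n",
+    "config = AutoConfig.from_pretrained(\"./Meta-Llama-3-8B-Instruct-HF\")  # placeholder path\n",
+    "print(config.model_type, config.num_hidden_layers)"
+   ]
+  },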
   {
@@ -210,10 +233,9 @@
     "\n",
     "#### 4. Prepare the script\n",
     "Import the following necessary modules in your script: \n",
-    "* `LlamaForCausalLM` is the Llama 2 model class\n",
-    "* `LlamaTokenizer` prepares your prompt for the model to process\n",
-    "* `pipeline` is an abstraction to generate model outputs\n",
-    "* `torch` allows us to use PyTorch and specify the datatype we’d like to use."
+    "* `AutoModel` is the Llama 2 model class\n",
+    "* `AutoTokenizer` prepares your prompt for the model to process\n",
+    "* `pipeline` is an abstraction to generate model outputs"
    ]
   },
   {
@@ -224,13 +246,16 @@
    "source": [
     "import torch\n",
     "import transformers\n",
-    "from transformers import LlamaForCausalLM, LlamaTokenizer\n",
-    "\n",
+    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
     "\n",
-    "model_dir = \"./llama-2-7b-chat-hf\"\n",
-    "model = LlamaForCausalLM.from_pretrained(model_dir)\n",
     "\n",
-    "tokenizer = LlamaTokenizer.from_pretrained(model_dir)\n"
+    "model_dir = \"/home/ubuntu/release/Meta-Llama-3-8B-Instruct-HF\"\n",
+    "model = AutoModelForCausalLM.from_pretrained(\n",
+    "        model_dir,\n",
+    "        device_map=\"auto\",\n",
+    "    )\n",
+    "# model = LlamaForCausalLM.from_pretrained(model_dir)\n",
+    "tokenizer = AutoTokenizer.from_pretrained(model_dir)\n"
    ]
   },
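+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The cell above relies on `device_map=\"auto\"` (backed by the `accelerate` package) to place the weights across the available GPU(s) and CPU. If you run into GPU memory limits, one common variant, sketched below as an optional alternative to the cell above, is to load the weights in bfloat16, which reduces the memory footprint on GPUs that support it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional alternative to the cell above: load the weights in bfloat16 to reduce GPU memory use\n",
+    "model = AutoModelForCausalLM.from_pretrained(\n",
+    "    model_dir,\n",
+    "    torch_dtype=torch.bfloat16,\n",
+    "    device_map=\"auto\",\n",
+    ")"
+   ]
+  },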
   {
@@ -242,7 +267,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -272,9 +297,18 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.\n",
+      "Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.\n"
+     ]
+    }
+   ],
    "source": [
     "sequences = pipeline(\n",
     "    'I have tomatoes, basil and cheese at home. What can I cook for dinner?\\n',\n",
@@ -296,8 +330,16 @@
    "name": "python3"
   },
   "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
    "name": "python",
-   "version": "3.8.3"
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
   }
  },
  "nbformat": 4,