@@ -4,8 +4,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Running Llama2 on Google Colab using Hugging Face transformers library\n",
- "This notebook goes over how you can set up and run Llama2 using Hugging Face transformers library\n",
+ "## Running Meta Llama 3 on Google Colab using the Hugging Face transformers library\n",
+ "This notebook goes over how you can set up and run Llama 3 using the Hugging Face transformers library.\n",
"<a href=\"https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_HF_transformers.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
@@ -14,11 +14,11 @@
"metadata": {},
"source": [
"### Steps at a glance:\n",
- "This demo showcases how to run the example with already converted Llama 2 weights on [Hugging Face](https://huggingface.co/meta-llama). Please Note: To use the downloads on Hugging Face, you must first request a download as shown in the steps below making sure that you are using the same email address as your Hugging Face account.\n",
+ "This demo showcases how to run the example with already converted Llama 3 weights on [Hugging Face](https://huggingface.co/meta-llama). Please note: to use the downloads on Hugging Face, you must first request a download as shown in the steps below, making sure that you are using the same email address as your Hugging Face account.\n",
"\n",
"To use already converted weights, start here:\n",
"1. Request download of model weights from the Llama website\n",
- "2. Prepare the script\n",
+ "2. Log in to Hugging Face from your terminal, using the same email address as in (1). Follow the instructions [here](https://huggingface.co/docs/huggingface_hub/en/quick-start).\n",
"3. Run the example\n",
"\n",
"\n",
@@ -45,7 +45,7 @@
"Request download of model weights from the Llama website\n",
"Before you can run the model locally, you will need to get the model weights. To get the model weights, visit the [Llama website](https://llama.meta.com/) and click on “download models”. \n",
"\n",
- "Fill the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a URL in your email in a short time."
+ "Fill in the required information, select the model “Meta Llama 3” and accept the terms & conditions. You will receive a URL in your email in a short time."
]
},
{
@@ -79,7 +79,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -92,7 +92,31 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Then, we will set the model variable to a specific model we’d like to use. In this demo, we will use the 7b chat model `meta-llama/Llama-2-7b-chat-hf`."
+ "Then, we will set the model variable to a specific model we’d like to use. In this demo, we will use the 8B instruct model `meta-llama/Meta-Llama-3-8B-Instruct`. Using Meta models from Hugging Face requires you to:\n",
+ "\n",
+ "1. Accept the Terms of Service for Meta Llama 3 on the Meta [website](https://llama.meta.com/llama-downloads).\n",
+ "2. Use the same email address from step (1) to log in to Hugging Face.\n",
+ "\n",
+ "Follow the instructions on this Hugging Face [page](https://huggingface.co/docs/huggingface_hub/en/quick-start) to log in from your terminal."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install --upgrade huggingface_hub"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from huggingface_hub import login\n",
+ "login()"
]
},
{
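The `login()` call above opens an interactive prompt. For non-interactive environments such as CI, the token can be passed directly instead — a minimal sketch, assuming a valid access token is available in an `HF_TOKEN` environment variable (a name chosen here for illustration):

```python
# Minimal sketch: non-interactive Hugging Face login.
# Assumes a valid access token is stored in the HF_TOKEN environment variable.
import os

from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])  # skips the interactive prompt
```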
@@ -101,7 +125,7 @@
"metadata": {},
"outputs": [],
"source": [
- "model = \"meta-llama/Llama-2-7b-chat-hf\"\n",
+ "model = \"meta-llama/Meta-Llama-3-8B-Instruct\"\n",
"tokenizer = AutoTokenizer.from_pretrained(model)"
]
},
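Because the Instruct variant is chat-tuned, prompts are normally rendered through the tokenizer's chat template rather than passed as raw strings. A short sketch using the `tokenizer` loaded above (the message content is illustrative):

```python
# Sketch: render a chat prompt in the format Llama 3 Instruct expects.
messages = [
    {"role": "user", "content": "What can I cook with tomatoes, basil and cheese?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the formatted string, not token ids
    add_generation_prompt=True,  # append the assistant header for generation
)
print(prompt)  # shows the special tokens wrapping the user turn
```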
@@ -174,7 +198,7 @@
"Request download of model weights from the Llama website\n",
"Before you can run the model locally, you will need to get the model weights. To get the model weights, visit the [Llama website](https://llama.meta.com/) and click on “download models”. \n",
"\n",
- "Fill the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a URL in your email in a short time.\n"
+ "Fill in the required information, select the model \"Meta Llama 3\" and accept the terms & conditions. You will receive a URL in your email in a short time."
]
},
{
@@ -182,25 +206,24 @@
"metadata": {},
"source": [
"#### 2. Clone the llama repo and get the weights\n",
- "Git clone the [Llama repo](https://github.com/facebookresearch/llama.git). Enter the URL and get 7B-chat weights. This will download the tokenizer.model, and a directory llama-2-7b-chat with the weights in it.\n",
+ "Git clone the [Meta Llama 3 repo](https://github.com/meta-llama/llama3). Run the `download.sh` script and follow the instructions. This will download the model checkpoints and tokenizer.\n",
"\n",
- "This example demonstrates a llama2 model with 7B-chat parameters, but the steps we follow would be similar for other llama models, as well as for other parameter models.\n",
- "\n"
+ "This example demonstrates the Meta Llama 3 8B-Instruct model, but the steps would be similar for other Llama models and other model sizes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "#### 3. Convert the model weights\n",
- "\n",
- "* Create a link to the tokenizer:\n",
- "Run `ln -h ./tokenizer.model ./llama-2-7b-chat/tokenizer.model` \n",
+ "#### 3. Convert the model weights using the Hugging Face transformers library from source\n",
"\n",
- "\n",
- "* Convert the model weights to run with Hugging Face:``TRANSFORM=`python -c \"import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')\"``\n",
- "\n",
- "* Then run: `pip install protobuf && python $TRANSFORM --input_dir ./llama-2-7b-chat --model_size 7B --output_dir ./llama-2-7b-chat-hf`\n"
+ "* `python3 -m venv hf-convertor`\n",
+ "* `source hf-convertor/bin/activate`\n",
+ "* `git clone https://github.com/huggingface/transformers.git`\n",
+ "* `cd transformers`\n",
+ "* `pip install -e .`\n",
+ "* `pip install torch tiktoken blobfile accelerate`\n",
+ "* `python3 src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir ${path_to_meta_downloaded_model} --output_dir ${path_to_save_converted_hf_model} --model_size 8B --llama_version 3`"
]
},
{
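A quick way to confirm the conversion worked is to load the converted checkpoint back with transformers — a sketch, reusing the placeholder output path from the command above:

```python
# Sanity-check sketch: reload the converted checkpoint from disk.
# Replace the placeholder with the --output_dir used in the conversion step.
from transformers import AutoModelForCausalLM, AutoTokenizer

converted_dir = "${path_to_save_converted_hf_model}"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(converted_dir)
model = AutoModelForCausalLM.from_pretrained(converted_dir)
print(model.config.model_type)  # a successful conversion reports "llama"
```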
@@ -210,10 +233,9 @@
"\n",
"#### 4. Prepare the script\n",
"Import the following necessary modules in your script: \n",
- "* `LlamaForCausalLM` is the Llama 2 model class\n",
- "* `LlamaTokenizer` prepares your prompt for the model to process\n",
- "* `pipeline` is an abstraction to generate model outputs\n",
- "* `torch` allows us to use PyTorch and specify the datatype we’d like to use."
+ "* `AutoModelForCausalLM` is the class used to load the Llama 3 model\n",
+ "* `AutoTokenizer` prepares your prompt for the model to process\n",
+ "* `pipeline` is an abstraction to generate model outputs"
]
},
{
@@ -224,13 +246,16 @@
"source": [
"import torch\n",
"import transformers\n",
- "from transformers import LlamaForCausalLM, LlamaTokenizer\n",
- "\n",
+ "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"\n",
- "model_dir = \"./llama-2-7b-chat-hf\"\n",
- "model = LlamaForCausalLM.from_pretrained(model_dir)\n",
"\n",
- "tokenizer = LlamaTokenizer.from_pretrained(model_dir)\n"
+ "model_dir = \"/home/ubuntu/release/Meta-Llama-3-8B-Instruct-HF\"\n",
+ "model = AutoModelForCausalLM.from_pretrained(\n",
+ "    model_dir,\n",
+ "    device_map=\"auto\",\n",
+ ")\n",
+ "# model = AutoModelForCausalLM.from_pretrained(model_dir)  # without device_map (loads on CPU)\n",
+ "tokenizer = AutoTokenizer.from_pretrained(model_dir)\n"
]
},
{
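On a typical Colab GPU the 8B model may not fit in the default 32-bit precision. A sketch of loading it in 16-bit instead (bfloat16 support is an assumption about the GPU; fall back to `torch.float16` where it is unavailable):

```python
# Sketch: load the model in 16-bit precision to roughly halve memory use.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",           # spread layers across available devices
    torch_dtype=torch.bfloat16,  # assumes the GPU supports bfloat16
)
```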
@@ -242,7 +267,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -272,9 +297,18 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.\n",
+ "Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.\n"
+ ]
+ }
+ ],
"source": [
"sequences = pipeline(\n",
"    'I have tomatoes, basil and cheese at home. What can I cook for dinner?\\n',\n",
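The `pipeline` call continues beyond the context shown here. A minimal sketch of a complete generation call built from the pieces above (sampling parameters are illustrative assumptions, not values confirmed by this hunk):

```python
# Sketch: a complete text-generation call assembled from the cells above.
# Sampling parameter values below are illustrative assumptions.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
sequences = pipeline(
    'I have tomatoes, basil and cheese at home. What can I cook for dinner?\n',
    do_sample=True,                       # sample rather than decode greedily
    top_k=10,                             # limit sampling to the 10 likeliest tokens
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,  # stop at end of sequence
    max_length=400,                       # would trigger the truncation warning above
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```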
@@ -296,8 +330,16 @@
"name": "python3"
},
"language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
"name": "python",
- "version": "3.8.3"
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.10"
}
},
"nbformat": 4,