Jeff Tang 5 months ago
Parent
Commit
05bde7876d

recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_HF_transformers.ipynb → recipes/quickstart/Running_Llama3_Anywhere/Running_Llama_on_HF_transformers.ipynb


+ 2 - 2
recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb

@@ -40,9 +40,9 @@
     "\n",
     "Run `ollama pull llama3:70b` to download the Llama 3 70b chat model, also in the 4-bit quantized format with size 39GB.\n",
     "\n",
-    "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token (vs over 10 tokens per second with Llama 3 7b chat).\n",
+    "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token using Llama 3 70b chat (vs over 10 tokens per second with Llama 3 8b chat).\n",
     "\n",
-    "You can also run the following command to test Llama 3 (7b chat):\n",
+    "You can also run the following command to test Llama 3 8b chat:\n",
     "```\n",
     " curl http://localhost:11434/api/chat -d '{\n",
     "  \"model\": \"llama3\",\n",