1 year ago · c68410cbad
--- a/recipes/quickstart/Running_Llama3_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb
+++ b/recipes/quickstart/Running_Llama3_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb
@@ -40,9 +40,9 @@
 
																     "\n",
															
 
																     "Run `ollama pull llama3:70b` to download the Llama 3 70b chat model, also in the 4-bit quantized format with size 39GB.\n",
															
 
																     "\n",
															
 
																-    "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token (vs over 10 tokens per second with Llama 3 7b chat).\n",
															
 
																+    "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token using Llama 3 70b chat (vs over 10 tokens per second with Llama 3 8b chat).\n",
															
 
																     "\n",
															
 
																-    "You can also run the following command to test Llama 3 (7b chat):\n",
															
 
																+    "You can also run the following command to test Llama 3 8b chat:\n",
															
 
																     "```\n",
															
 
																     " curl http://localhost:11434/api/chat -d '{\n",
															
 
																     "  \"model\": \"llama3\",\n",