@@ -40,9 +40,9 @@
  "\n",
  "Run `ollama pull llama3:70b` to download the Llama 3 70b chat model, also in the 4-bit quantized format with size 39GB.\n",
  "\n",
- "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token (vs over 10 tokens per second with Llama 3 7b chat).\n",
+ "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token using Llama 3 70b chat (vs over 10 tokens per second with Llama 3 8b chat).\n",
  "\n",
- "You can also run the following command to test Llama 3 (7b chat):\n",
+ "You can also run the following command to test Llama 3 8b chat:\n",
  "```\n",
  " curl http://localhost:11434/api/chat -d '{\n",
  " \"model\": \"llama3\",\n",