Browse source code

updated Running_Llama_on_Mac for Llama 3 and extended it to cover Mac, Windows and Linux

Jeff Tang 6 months ago
parent
commit
ae4578cba6

+ 0 - 219
recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_Mac.ipynb

@@ -1,219 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Running Llama2 on Mac\n",
-    "This notebook goes over how you can set up and run Llama2 locally on a Mac using llama-cpp-python and the llama-cpp's quantized Llama2 model. It also goes over how to use LangChain to ask Llama general questions"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Steps at a glance:\n",
-    "1. Use CMAKE and install required packages\n",
-    "2. Request download of model weights from the Llama website\n",
-    "3. Clone the llama repo and get the weights\n",
-    "4. Clone the llamacpp repo and quantize the model\n",
-    "5. Prepare the script\n",
-    "6. Run the example\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<br>\n",
-    "\n",
-    "#### 1. Use CMAKE and install required packages\n",
-    "\n",
-    "Type the following command:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "#CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1: sets the appropriate build configuration options for the llama-cpp-python package \n",
-    "#and enables the use of Metal in Mac and forces the use of CMake as the build system.\n",
-    "!CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 pip install llama-cpp-python\n",
-    "\n",
-    "#pip install llama-cpp-python: installs the llama-cpp-python package and its dependencies:\n",
-    "!pip install pypdf sentence-transformers chromadb langchain"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "If running without a Jupyter notebook, use the command without the `!`"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "A brief look at the installed libraries:\n",
-    "- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) a simple Python bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) library\n",
-    "- pypdf gives us the ability to work with pdfs\n",
-    "- sentence-transformers for text embeddings\n",
-    "- chromadb gives us database capabilities \n",
-    "- langchain provides necessary RAG tools for this demo"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<br>\n",
-    "\n",
-    "#### 2. Request download of model weights from the Llama website\n",
-    "Before you can run the model locally, you will need to get the model weights. To get the model weights, visit the [Llama website](https://llama.meta.com/) and click on “download models”. \n",
-    "Fill  the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a URL in your email in a short time.\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<br>\n",
-    "\n",
-    "#### 3. Clone the llama repo and get the weights\n",
-    "Git clone the [Llama repo](https://github.com/facebookresearch/llama.git). Enter the URL and get 13B weights. This example demonstrates a llama2 model with 13B parameters, but the steps we follow would be similar for other llama models, as well as for other parameter models.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<br>\n",
-    "\n",
-    "#### 4. Clone the llamacpp repo and quantize the model\n",
-    "* Git clone the [Llamacpp repo](https://github.com/ggerganov/llama.cpp). \n",
-    "* Enter the repo:\n",
-    "`cd llama.cpp`\n",
-    "* Install requirements:\n",
-    "`python3 -m pip install -r requirements.txt`\n",
-    "* Convert the weights:\n",
-    "`python convert.py <path_to_your_downloaded_llama-2-13b_model>`\n",
-    "* Run make to generate the 'quantize' method that we will use in the next step\n",
-    "`make`\n",
-    "* Quantize the weights:\n",
-    "`./quantize <path_to_your_downloaded_llama-2-13b_model>/ggml-model-f16.gguf <path_to_your_downloaded_llama-2-13b_model>/ggml-model-q4_0.gguf q4_0`"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "\n",
-    "#### 5. Prepare the script\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# mentions the instance of the Llama model that we will use\n",
-    "from langchain.llms import LlamaCpp\n",
-    "\n",
-    "# defines a chain of operations that can be performed on text input to generate the output using the LLM\n",
-    "from langchain.chains import LLMChain\n",
-    "\n",
-    "# manages callbacks that are triggered at various stages during the execution of an LLMChain\n",
-    "from langchain.callbacks.manager import CallbackManager\n",
-    "\n",
-    "# defines a callback that streams the output of the LLMChain to the console in real-time as it gets generated\n",
-    "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
-    "\n",
-    "# allows to define prompt templates that can be used to generate custom inputs for the LLM\n",
-    "from langchain.prompts import PromptTemplate\n",
-    "\n",
-    "\n",
-    "# Initialize the langchain CallBackManager. This handles callbacks from Langchain and for this example we will use \n",
-    "# for token-wise streaming so you'll see the answer gets generated token by token when Llama is answering your question\n",
-    "callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])\n",
-    "\n",
-    "# Set up the model\n",
-    "llm = LlamaCpp(\n",
-    "    model_path=\"<path-to-llama-gguf-file>\",\n",
-    "    temperature=0.0,\n",
-    "    top_p=1,\n",
-    "    n_ctx=6000,\n",
-    "    callback_manager=callback_manager, \n",
-    "    verbose=True,\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### 6. Run the example\n",
-    "\n",
-    "With the model set up, you are now ready to ask some questions. \n",
-    "\n",
-    "Here is an example of the simplest way to ask the model some general questions."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Run the example\n",
-    "question = \"who wrote the book Pride and Prejudice?\"\n",
-    "answer = llm(question)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Alternatively, you can use LangChain's `PromptTemplate` for some flexibility in your prompts and questions. For more information on LangChain's prompt template visit this [link](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "prompt = PromptTemplate.from_template(\n",
-    "    \"who wrote {book}?\"\n",
-    ")\n",
-    "chain = LLMChain(llm=llm, prompt=prompt)\n",
-    "answer = chain.run(\"A tale of two cities\")"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

+ 166 - 0
recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb

@@ -0,0 +1,166 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Running Llama 3 on Mac, Windows or Linux\n",
+    "This notebook goes over how you can set up and run Llama 3 locally on a Mac, Windows or Linux using [Ollama](https://ollama.com/)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Steps at a glance:\n",
+    "1. Download and install Ollama.\n",
+    "2. Download and test run Llama 3.\n",
+    "3. Use local Llama 3 via Python.\n",
+    "4. Use local Llama 3 via LangChain.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 1. Download and install Ollama\n",
+    "\n",
+    "On Mac or Windows, go to the Ollama download page [here](https://ollama.com/download) and select your platform to download it, then double click the downloaded file to install Ollama.\n",
+    "\n",
+    "On Linux, you can simply run on a terminal `curl -fsSL https://ollama.com/install.sh | sh` to download and install Ollama."
+   ]
+  },
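+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To confirm the install, you can check the Ollama CLI version in a terminal (this assumes `ollama` is now on your PATH; on Mac and Windows the Ollama app should also be running so that the local server is available):\n",
+    "```\n",
+    "ollama --version\n",
+    "```"
+   ]
+  },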
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2. Download and test run Llama 3\n",
+    "\n",
+    "On a terminal or console, run `ollama pull llama3` to download the Llama 3 8b chat model, in the 4-bit quantized format with size about 4.7 GB.\n",
+    "\n",
+    "Run `ollama pull llama3:70b` to download the Llama 3 70b chat model, also in the 4-bit quantized format with size 39GB.\n",
+    "\n",
+    "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token (vs over 10 tokens per second with Llama 3 7b chat).\n",
+    "\n",
+    "You can also run the following command to test Llama 3:\n",
+    "```\n",
+    " curl http://localhost:11434/api/chat -d '{\n",
+    "  \"model\": \"llama3\",\n",
+    "  \"messages\": [\n",
+    "    {\n",
+    "      \"role\": \"user\",\n",
+    "      \"content\": \"who wrote the book godfather?\"\n",
+    "    }\n",
+    "  ],\n",
+    "  \"stream\": false\n",
+    "}'\n",
+    "```\n",
+    "\n",
+    "The complete Ollama API doc is [here](https://github.com/ollama/ollama/blob/main/docs/api.md)."
+   ]
+  },
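+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To check which models you have downloaded locally, and their sizes, you can run:\n",
+    "```\n",
+    "ollama list\n",
+    "```"
+   ]
+  },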
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 3. Use local Llama 3 via Python\n",
+    "\n",
+    "The Python code below is the port of the curl command above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "url = \"http://localhost:11434/api/chat\"\n",
+    "\n",
+    "def llama3(prompt):\n",
+    "    data = {\n",
+    "        \"model\": \"llama3\",\n",
+    "        \"messages\": [\n",
+    "            {\n",
+    "              \"role\": \"user\",\n",
+    "              \"content\": prompt\n",
+    "            }\n",
+    "        ],\n",
+    "        \"stream\": False\n",
+    "    }\n",
+    "    \n",
+    "    headers = {\n",
+    "        'Content-Type': 'application/json'\n",
+    "    }\n",
+    "    \n",
+    "    response = requests.post(url, headers=headers, json=data)\n",
+    "    \n",
+    "    return(response.json()['message']['content'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = llama3(\"who wrote the book godfather\")\n",
+    "print(response)"
+   ]
+  },
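+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The function above sets `\"stream\": False`, so the whole answer comes back in a single response. Below is a minimal sketch of a streaming variant, assuming Ollama's documented streaming behavior for `/api/chat`: with `\"stream\": True`, each response line is a JSON object carrying a partial piece of the answer in `message.content`, until a final object with `\"done\": true`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import requests\n",
+    "\n",
+    "def llama3_stream(prompt):\n",
+    "    # same endpoint as above, but with \"stream\": True to receive the answer chunk by chunk\n",
+    "    data = {\n",
+    "        \"model\": \"llama3\",\n",
+    "        \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n",
+    "        \"stream\": True\n",
+    "    }\n",
+    "\n",
+    "    with requests.post(\"http://localhost:11434/api/chat\", json=data, stream=True) as response:\n",
+    "        for line in response.iter_lines():\n",
+    "            if not line:\n",
+    "                continue\n",
+    "            chunk = json.loads(line)\n",
+    "            if chunk.get(\"done\"):\n",
+    "                break\n",
+    "            # print each partial piece of the answer as it arrives\n",
+    "            print(chunk[\"message\"][\"content\"], end=\"\", flush=True)\n",
+    "\n",
+    "llama3_stream(\"who wrote the book godfather? answer in one sentence.\")"
+   ]
+  },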
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 4. Use local Llama 3 via LangChain\n",
+    "\n",
+    "Code below use LangChain with Ollama to query Llama 3 running locally. For a more advanced example of using local Llama 3 with LangChain and agent-powered RAG, see [this](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install langchain"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.chat_models import ChatOllama\n",
+    "\n",
+    "llm = ChatOllama(model=\"llama3\", temperature=0)\n",
+    "response = llm.invoke(\"who wrote the book godfather?\")\n",
+    "print(response.content)\n"
+   ]
+  },
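+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can also use a prompt template for more flexibility in your prompts and questions. Below is a minimal sketch using LangChain's `ChatPromptTemplate` piped into the `ChatOllama` model defined above; the book title is just an illustration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_core.prompts import ChatPromptTemplate\n",
+    "\n",
+    "# build a simple chain: fill in the prompt template, then send the prompt to the local Llama 3 model\n",
+    "prompt = ChatPromptTemplate.from_template(\"who wrote {book}? answer in one sentence.\")\n",
+    "chain = prompt | llm\n",
+    "response = chain.invoke({\"book\": \"A tale of two cities\"})\n",
+    "print(response.content)"
+   ]
+  }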
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}