@@ -1,219 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Running Llama2 on Mac\n",
- "This notebook goes over how to set up and run Llama2 locally on a Mac, using llama-cpp-python and llama.cpp's quantized Llama2 model. It also shows how to use LangChain to ask Llama general questions."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Steps at a glance:\n",
- "1. Use CMake and install the required packages\n",
- "2. Request download of model weights from the Llama website\n",
- "3. Clone the llama repo and get the weights\n",
- "4. Clone the llama.cpp repo and quantize the model\n",
- "5. Prepare the script\n",
- "6. Run the example\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "<br>\n",
- "\n",
- "#### 1. Use CMake and install the required packages\n",
- "\n",
- "Run the following commands:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 sets the build configuration options for the\n",
- "# llama-cpp-python package, enabling Metal on Mac and forcing CMake as the build system\n",
- "!CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 pip install llama-cpp-python\n",
- "\n",
- "# Install the remaining packages used in this demo\n",
- "!pip install pypdf sentence-transformers chromadb langchain"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "If you are running outside a Jupyter notebook, use the commands without the leading `!`."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A brief look at the installed libraries:\n",
- "- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) provides simple Python bindings for the [llama.cpp](https://github.com/ggerganov/llama.cpp) library\n",
- "- pypdf lets us work with PDF files\n",
- "- sentence-transformers for text embeddings\n",
- "- chromadb provides vector database capabilities\n",
- "- langchain provides the RAG tools used in this demo"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "<br>\n",
- "\n",
- "#### 2. Request download of model weights from the Llama website\n",
- "Before you can run the model locally, you will need the model weights. To get them, visit the [Llama website](https://llama.meta.com/) and click on “download models”.\n",
- "Fill in the required information, select the “Llama 2 & Llama Chat” models, and accept the terms & conditions. You will shortly receive a download URL by email.\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "<br>\n",
- "\n",
- "#### 3. Clone the llama repo and get the weights\n",
- "Git clone the [Llama repo](https://github.com/facebookresearch/llama.git), run its download script, and enter the URL from your email to get the 13B weights. This example uses the 13B-parameter Llama2 model, but the steps are similar for the other Llama models and parameter sizes; a sketch of these commands follows below.\n",
- "\n"
- ]
- },
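- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A minimal sketch of this step as notebook commands. Note that `download.sh` is interactive (it asks for the URL from your email and which model sizes to fetch), so you may prefer to run it in a terminal:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Clone the Llama repo\n",
- "!git clone https://github.com/facebookresearch/llama.git\n",
- "\n",
- "# Run the interactive download script; paste the URL from your email and choose 13B when prompted\n",
- "!cd llama && bash download.sh"
- ]
- },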
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "<br>\n",
- "\n",
- "#### 4. Clone the llama.cpp repo and quantize the model\n",
- "* Git clone the [llama.cpp repo](https://github.com/ggerganov/llama.cpp).\n",
- "* Enter the repo:\n",
- "`cd llama.cpp`\n",
- "* Install the requirements:\n",
- "`python3 -m pip install -r requirements.txt`\n",
- "* Convert the weights:\n",
- "`python convert.py <path_to_your_downloaded_llama-2-13b_model>`\n",
- "* Run make to build the `quantize` binary that we will use in the next step:\n",
- "`make`\n",
- "* Quantize the weights:\n",
- "`./quantize <path_to_your_downloaded_llama-2-13b_model>/ggml-model-f16.gguf <path_to_your_downloaded_llama-2-13b_model>/ggml-model-q4_0.gguf q4_0`"
- ]
- },
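- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "If you prefer to run these steps from the notebook, the cell below is a sketch that mirrors the commands above; it assumes `llama.cpp` is cloned into the current directory, and you still need to replace the placeholder path with your model directory:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Clone the llama.cpp repo and install its requirements\n",
- "!git clone https://github.com/ggerganov/llama.cpp\n",
- "!cd llama.cpp && python3 -m pip install -r requirements.txt\n",
- "\n",
- "# Convert the downloaded weights to gguf, build the quantize binary, and quantize to 4-bit\n",
- "# (replace <path_to_your_downloaded_llama-2-13b_model> with your actual model directory)\n",
- "!cd llama.cpp && python convert.py <path_to_your_downloaded_llama-2-13b_model>\n",
- "!cd llama.cpp && make\n",
- "!cd llama.cpp && ./quantize <path_to_your_downloaded_llama-2-13b_model>/ggml-model-f16.gguf <path_to_your_downloaded_llama-2-13b_model>/ggml-model-q4_0.gguf q4_0"
- ]
- },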
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- "#### 5. Prepare the script\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# LlamaCpp is the LangChain wrapper around the llama.cpp model we will run\n",
- "from langchain.llms import LlamaCpp\n",
- "\n",
- "# LLMChain defines a chain of operations performed on text input to generate output using the LLM\n",
- "from langchain.chains import LLMChain\n",
- "\n",
- "# manages callbacks that are triggered at various stages during the execution of an LLMChain\n",
- "from langchain.callbacks.manager import CallbackManager\n",
- "\n",
- "# defines a callback that streams the output of the LLMChain to the console in real time as it is generated\n",
- "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
- "\n",
- "# lets us define prompt templates that can be used to generate custom inputs for the LLM\n",
- "from langchain.prompts import PromptTemplate\n",
- "\n",
- "\n",
- "# Initialize the LangChain CallbackManager. It handles callbacks from LangChain; here we use it\n",
- "# for token-wise streaming, so you will see the answer generated token by token as Llama answers your question\n",
- "callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])\n",
- "\n",
- "# Set up the model\n",
- "llm = LlamaCpp(\n",
- " model_path=\"<path-to-llama-gguf-file>\",\n",
- " temperature=0.0,\n",
- " top_p=1,\n",
- " n_ctx=6000,\n",
- " callback_manager=callback_manager,\n",
- " verbose=True,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### 6. Run the example\n",
- "\n",
- "With the model set up, you are now ready to ask some questions.\n",
- "\n",
- "Here is the simplest way to ask the model a general question."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Run the example; the streaming callback prints the answer as it is generated\n",
- "question = \"Who wrote the book Pride and Prejudice?\"\n",
- "answer = llm(question)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Alternatively, you can use LangChain's `PromptTemplate` for more flexibility in your prompts and questions. For more information on LangChain's prompt templates, visit this [link](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "prompt = PromptTemplate.from_template(\n",
- " \"Who wrote {book}?\"\n",
- ")\n",
- "chain = LLMChain(llm=llm, prompt=prompt)\n",
- "answer = chain.run(\"A Tale of Two Cities\")"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}