@@ -1,219 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Running Llama2 on Mac\n",
- "This notebook shows how to set up and run Llama 2 locally on a Mac using llama-cpp-python and a llama.cpp-quantized Llama 2 model. It also shows how to use LangChain to ask Llama general questions."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Steps at a glance:\n",
- "1. Use CMake and install the required packages\n",
- "2. Request download of the model weights from the Llama website\n",
- "3. Clone the llama repo and get the weights\n",
- "4. Clone the llama.cpp repo and quantize the model\n",
- "5. Prepare the script\n",
- "6. Run the example\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "<br>\n",
- "\n",
- "#### 1. Use CMake and install required packages\n",
- "\n",
- "Run the following commands:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 sets the build configuration options for the llama-cpp-python package:\n",
- "# it enables Metal support on Mac and forces CMake as the build system.\n",
- "!CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 pip install llama-cpp-python\n",
- "\n",
- "# Install the other packages used in this demo:\n",
- "!pip install pypdf sentence-transformers chromadb langchain"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "If you are running outside a Jupyter notebook, use the commands without the leading `!`."
- ]
- },
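- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As an optional sanity check, you can confirm that the build installed and imports correctly. This is a minimal sketch; the `llama_cpp.__version__` attribute is assumed to be present in recent llama-cpp-python releases."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Optional sanity check: confirm llama-cpp-python installed and imports cleanly.\n",
- "# __version__ is assumed to exist in recent llama-cpp-python releases.\n",
- "import llama_cpp\n",
- "print(llama_cpp.__version__)"
- ]
- },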
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A brief look at the installed libraries:\n",
- "- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python): simple Python bindings for the [llama.cpp](https://github.com/ggerganov/llama.cpp) library\n",
- "- pypdf for working with PDFs\n",
- "- sentence-transformers for text embeddings\n",
- "- chromadb for vector-database capabilities\n",
- "- langchain for the prompt, chain, and RAG tooling used in this demo"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "<br>\n",
- "\n",
- "#### 2. Request download of the model weights from the Llama website\n",
- "Before you can run the model locally, you will need the model weights. To get them, visit the [Llama website](https://llama.meta.com/) and click on “download models”.\n",
- "Fill in the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a download URL in your email shortly.\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "<br>\n",
- "\n",
- "#### 3. Clone the llama repo and get the weights\n",
- "Git clone the [Llama repo](https://github.com/facebookresearch/llama.git), run its download script, and enter the URL from your email to get the 13B weights. This example uses a Llama 2 model with 13B parameters, but the steps are similar for the other Llama models and parameter sizes. A sketch of these commands follows below.\n",
- "\n"
- ]
- },
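- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A minimal sketch of the clone-and-download flow; the script name `download.sh` is taken from the Llama repo and may differ between repo versions:\n",
- "* `git clone https://github.com/facebookresearch/llama.git`\n",
- "* `cd llama`\n",
- "* `./download.sh` (paste the URL from your email when prompted and select the 13B model)"
- ]
- },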
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "<br>\n",
- "\n",
- "#### 4. Clone the llama.cpp repo and quantize the model\n",
- "* Git clone the [llama.cpp repo](https://github.com/ggerganov/llama.cpp).\n",
- "* Enter the repo:\n",
- "`cd llama.cpp`\n",
- "* Install the requirements:\n",
- "`python3 -m pip install -r requirements.txt`\n",
- "* Convert the downloaded weights to GGUF:\n",
- "`python3 convert.py <path_to_your_downloaded_llama-2-13b_model>`\n",
- "* Run make to build the `quantize` binary that we will use in the next step:\n",
- "`make`\n",
- "* Quantize the weights:\n",
- "`./quantize <path_to_your_downloaded_llama-2-13b_model>/ggml-model-f16.gguf <path_to_your_downloaded_llama-2-13b_model>/ggml-model-q4_0.gguf q4_0`"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- "#### 5. Prepare the script\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# LlamaCpp is the LangChain wrapper for the llama.cpp model we will use\n",
- "from langchain.llms import LlamaCpp\n",
- "\n",
- "# LLMChain defines a chain of operations performed on text input to generate output with the LLM\n",
- "from langchain.chains import LLMChain\n",
- "\n",
- "# CallbackManager manages callbacks triggered at various stages during the execution of an LLMChain\n",
- "from langchain.callbacks.manager import CallbackManager\n",
- "\n",
- "# StreamingStdOutCallbackHandler streams the output of the LLMChain to the console in real time as it is generated\n",
- "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
- "\n",
- "# PromptTemplate lets you define prompt templates that generate custom inputs for the LLM\n",
- "from langchain.prompts import PromptTemplate\n",
- "\n",
- "\n",
- "# Initialize the LangChain CallbackManager. It handles callbacks from LangChain; here we use it\n",
- "# for token-wise streaming, so you'll see the answer generated token by token as Llama answers your question\n",
- "callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])\n",
- "\n",
- "# Set up the model\n",
- "llm = LlamaCpp(\n",
- "    model_path=\"<path-to-llama-gguf-file>\",\n",
- "    temperature=0.0,\n",
- "    top_p=1,\n",
- "    n_ctx=6000,\n",
- "    callback_manager=callback_manager,\n",
- "    verbose=True,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### 6. Run the example\n",
- "\n",
- "With the model set up, you are now ready to ask some questions.\n",
- "\n",
- "Here is the simplest way to ask the model a general question."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Run the example\n",
- "question = \"who wrote the book Pride and Prejudice?\"\n",
- "answer = llm(question)"
- ]
- },
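- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "If you downloaded the chat-tuned weights (Llama 2 Chat), wrapping the question in Llama 2's chat format usually yields better-behaved answers. Below is a minimal sketch using Meta's published `[INST]`/`<<SYS>>` tags; the system message is just an illustrative placeholder."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Llama 2 chat prompt format (applies to the chat-tuned weights only)\n",
- "chat_question = (\n",
- "    \"[INST] <<SYS>>\\n\"\n",
- "    \"You are a helpful, concise assistant.\\n\"\n",
- "    \"<</SYS>>\\n\\n\"\n",
- "    \"who wrote the book Pride and Prejudice? [/INST]\"\n",
- ")\n",
- "answer = llm(chat_question)"
- ]
- },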
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Alternatively, you can use LangChain's `PromptTemplate` for some flexibility in your prompts and questions. For more information on LangChain's prompt templates, visit this [link](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "prompt = PromptTemplate.from_template(\n",
- "    \"who wrote {book}?\"\n",
- ")\n",
- "chain = LLMChain(llm=llm, prompt=prompt)\n",
- "answer = chain.run(\"A Tale of Two Cities\")"
- ]
- },
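- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Finally, the remaining packages installed earlier (pypdf, sentence-transformers, chromadb) support retrieval-augmented generation (RAG), where answers are grounded in your own documents. Below is a minimal sketch, assuming a local file `example.pdf` (a hypothetical placeholder) and the LangChain APIs contemporary with `langchain.llms.LlamaCpp`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from langchain.document_loaders import PyPDFLoader\n",
- "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
- "from langchain.embeddings import HuggingFaceEmbeddings\n",
- "from langchain.vectorstores import Chroma\n",
- "from langchain.chains import RetrievalQA\n",
- "\n",
- "# Load a local PDF and split it into chunks (\"example.pdf\" is a placeholder)\n",
- "docs = PyPDFLoader(\"example.pdf\").load()\n",
- "chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)\n",
- "\n",
- "# Embed the chunks with sentence-transformers and index them in Chroma\n",
- "db = Chroma.from_documents(chunks, HuggingFaceEmbeddings())\n",
- "\n",
- "# Answer questions grounded in the retrieved chunks\n",
- "qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())\n",
- "qa.run(\"What is this document about?\")"
- ]
- }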
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}