
Adding context to the demo notebooks (#254)

Geeta Chauhan 1 year ago
commit a7504a2e99

+ 10 - 7
README.md

@@ -1,4 +1,7 @@
-# Llama 2 Fine-tuning / Inference Recipes and Examples
+# Llama 2 Fine-tuning / Inference Recipes, Examples and Demo Apps
+
+**[Update Oct. 20, 2023] We have just released a series of Llama 2 demo apps [here](./demo_apps). These apps show how to run Llama 2 locally and in the cloud to chat about data (PDF, DB, or live) and generate video summaries.**
+
 
 The 'llama-recipes' repository is a companion to the [Llama 2 model](https://github.com/facebookresearch/llama). The goal of this repository is to provide examples to quickly get started with fine-tuning for domain adaptation and how to run inference for the fine-tuned models. For ease of use, the examples use Hugging Face converted versions of the models. See steps for conversion of the model [here](#model-conversion-to-hugging-face).
 
@@ -181,14 +184,14 @@ You can read more about our fine-tuning strategies [here](./docs/LLM_finetuning.
 # Demo Apps
 This folder contains a series of Llama2-powered apps:
 * Quickstart Llama deployments and basic interactions with Llama
-1. Llama on your Mac and ask Llama general questions
-2. Llama on Google Colab
-3. Llama on Cloud and ask Llama questions about unstructured data in a PDF
+  1. Llama on your Mac and ask Llama general questions
+  2. Llama on Google Colab
+  3. Llama on Cloud and ask Llama questions about unstructured data in a PDF
 
 * Specialized Llama use cases:
-1. Ask Llama to summarize a video content
-2. Ask Llama questions about structured data in a DB
-3. Ask Llama questions about live data on the web
+  1. Ask Llama to summarize video content
+  2. Ask Llama questions about structured data in a DB
+  3. Ask Llama questions about live data on the web
 
 # Repository Organization
 This repository is organized in the following way:

File diff suppressed because it is too large
+ 177 - 156
demo_apps/HelloLlamaCloud.ipynb


File diff suppressed because it is too large
+ 155 - 262
demo_apps/HelloLlamaLocal.ipynb


File diff suppressed because it is too large
+ 114 - 230
demo_apps/LiveData.ipynb


+ 24 - 0
demo_apps/Llama2_Gradio.ipynb

@@ -1,5 +1,29 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "47a9adb3",
+   "metadata": {},
+   "source": [
+    "## This demo app shows how to query Llama 2 using the Gradio UI.\n",
+    "\n",
+    "Since we are using Replicate in this example, you will need to replace `<your replicate api token>` with your API token.\n",
+    "\n",
+    "To get the Replicate token: \n",
+    "\n",
+    "- You will need to first sign in with Replicate with your github account\n",
+    "- Then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while \n",
+    "\n",
+    "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on Replicate.\n",
+    "\n",
+    "To run this example:\n",
+    "- Set up your Replicate API token and enter it in place of `<your replicate api token>`\n",
+    "- Run the notebook\n",
+    "- Enter your question and click Submit\n",
+    "\n",
+    "In the notebook or a browser with URL http://127.0.0.1:7860 you should see a UI with your answer."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,

+ 66 - 11
demo_apps/StructuredLlama.ipynb

@@ -5,9 +5,25 @@
    "id": "e8cba0b6",
    "metadata": {},
    "source": [
-    "## This demo shows how to use LangChain's SQLDatabaseChain with Llama2 to query about structured data stored in a SQL DB.  \n",
-    "* As the 2023-24 NBA season is around the corner, we use the NBA roster info saved in a SQLite DB to show you how to ask Llama2 questions about your favorite teams or players. \n",
-    "* Because the SQLDatabaseChain API implementation is still in the langchain_experimental package, you'll see more issues that come with using the cutting edge experimental features, and how we succeed resolving some of the issues but fail on some others."
+    "## This demo shows how to use LangChain's SQLDatabaseChain with Llama2 to query structured data stored in a SQL DB.  \n",
+    "* We use the 2023-24 NBA roster info saved in a SQLite DB to show you how to ask Llama2 questions about your favorite teams or players \n",
+    "* At the time of writing this, the SQLDatabaseChain API implementation is still in the langchain_experimental package. With this in mind you will see more issues that come with using the cutting edge experimental features, and how we succeed resolving some of the issues but fail on some others"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f839d07d",
+   "metadata": {},
+   "source": [
+    "We start by installing the necessary packages:\n",
+    "- [Replicate](https://replicate.com/) to host the Llama 2 model\n",
+    "- [langchain](https://python.langchain.com/docs/get_started/introduction) provides necessary RAG tools for this demo\n",
+    "- langchain_experimental Langchain's experimental version to get us access to SQLDatabaseChain\n",
+    "\n",
+    "And setting up the Replicate token.\n",
+    "\n",
+    "**Note** To get a Replicate token, you will need to first sign in with Replicate with your github account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. \n",
+    "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on Replicate."
    ]
   },
   {
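The corresponding setup cell is not shown in this hunk. A sketch of what it presumably contains, based on the packages listed above and the token prompt visible in the next hunk:

```python
# Sketch of the setup step (assumed): install dependencies, then read the
# Replicate API token without echoing it, as the getpass prompt below suggests.
# !pip install replicate langchain langchain_experimental
from getpass import getpass
import os

REPLICATE_API_TOKEN = getpass()  # paste your Replicate API token here
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
```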
@@ -40,7 +56,7 @@
    "metadata": {},
    "outputs": [
     {
-     "name": "stdin",
+     "name": "stdout",
      "output_type": "stream",
      "text": [
       " ········\n"
@@ -55,6 +71,16 @@
     "os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "1e586b75",
+   "metadata": {},
+   "source": [
+    "Next we call the Llama 2 model from replicate. In this example we will use the llama 2 13b chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).\n",
+    "\n",
+    "You can add them here in the format: model_name/version"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 3,
@@ -70,6 +96,20 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "6d421ae7",
+   "metadata": {},
+   "source": [
+    "Next you will need create the `nba_roster.db` file. \n",
+    "\n",
+    "To do this run the following commands while in this folder:\n",
+    "- `python txt2csv.py`  This will convert the `nba.txt` file to `nba_roster.csv`. The `nba.txt` file was created by scraping the NBA roster info from the web.\n",
+    "- Then run `python csv2db.py` to convert `nba_roster.csv` to `nba_roster.db`.\n",
+    "\n",
+    "Once you have your `nba_roster.db` ready, we set up the database to be queried by Llama 2 through Langchain's [SQL chains](https://python.langchain.com/docs/use_cases/qa_structured/sql)."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -77,9 +117,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# The nba_roster.db was created by running the two scripts:\n",
-    "# python txt2csv.py # convert the `nba.txt` file, created by scraping the NBA roster info from the web, to nba_roster.csv\n",
-    "# python csv2db.py # convert nba_roster.csv to nba_roster.db\n",
+    "\n",
     "db = SQLDatabase.from_uri(\"sqlite:///nba_roster.db\", sample_rows_in_table_info= 0)\n",
     "\n",
     "PROMPT_SUFFIX = \"\"\"\n",
@@ -93,6 +131,14 @@
     "                                     template=PROMPT_SUFFIX))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "afcf423a",
+   "metadata": {},
+   "source": [
+    "We will go ahead and turn on LangChain debug to get an idea of how many calls are made to Llama 2 and what the inputs and outputs are."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 5,
@@ -158,7 +204,7 @@
     }
    ],
    "source": [
-    "# turn on the debug of LangChain so we can see how many calls to Llama are made and exactly what are inputs and outputs\n",
+    "\n",
     "import langchain\n",
     "langchain.debug = True\n",
     "\n",
@@ -304,6 +350,18 @@
     "db_chain.run(\"What's his salary?\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "98b2c523",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Since we did not pass any context along with the follow-up to the model it did not know who \"his\" is and just picked LeBron James.\n",
+    "\n",
+    "Let's try to fix the issue that the context (the previous question and answer) was not sent to the model along with the new question.\n",
+    "`SQLDatabaseChain.from_llm` has a parameter \"memory\" which can be set to a `ConversationBufferMemory` instance, which looks promising.\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 8,
@@ -311,10 +369,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# since we didn't pass the context along with the follow-up to llm so it didn't know what \"his\" is and just picked LeBron James\n",
     "\n",
-    "# let's try to fix the issue that the context (the previous question and answer) was not sent to LLM along with the new question\n",
-    "# SQLDatabaseChain.from_llm has a parameter \"memory\" which can be set to a ConversationBufferMemory instance, which looks promising.\n",
     "from langchain.memory import ConversationBufferMemory\n",
     "\n",
     "memory = ConversationBufferMemory()\n",

+ 118 - 11
demo_apps/VideoSummary.ipynb

@@ -6,9 +6,24 @@
    "metadata": {},
    "source": [
     "## This demo app shows:\n",
-    "* how to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video;\n",
-    "* how to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method;\n",
-    "* how to bypass the limit of Llama's max input token size by using more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info."
+    "* How to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video\n",
+    "* How to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method\n",
+    "* How to bypass the limit of Llama's max input token size by using a more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c866f6be",
+   "metadata": {},
+   "source": [
+    "We start by installing the necessary packages:\n",
+    "- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) API to get transcript/subtitles of a YouTube video\n",
+    "- [langchain](https://python.langchain.com/docs/get_started/introduction) provides necessary RAG tools for this demo\n",
+    "- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer\n",
+    "- [pytube](https://pytube.io/en/latest/) Utility for downloading YouTube videos\n",
+    "\n",
+    "**Note** This example uses Replicate to host the Llama model. If you have not set up/or used Replicate before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up Replicate before continuing with this example.\n",
+    "If you do not want to use Replicate, you will need to make some changes to this notebook as you go along."
    ]
   },
   {
@@ -21,6 +36,14 @@
     "!pip install langchain youtube-transcript-api tiktoken pytube"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "af3069b1",
+   "metadata": {},
+   "source": [
+    "Let's load the YouTube video transcript using the YoutubeLoader."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -69,6 +92,25 @@
     "len(docs[0].page_content), docs[0].page_content[:300]"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4af7cc16",
+   "metadata": {},
+   "source": [
+    "We are using Replicate in this example to host our Llama 2 model so you will need to get a Replicate token.\n",
+    "\n",
+    "To get the Replicate token: \n",
+    "\n",
+    "- You will need to first sign in with Replicate with your github account\n",
+    "- Then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. \n",
+    "\n",
+    "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on Replicate.\n",
+    "\n",
+    "Alternatively, you can run Llama locally. See:\n",
+    "- [HelloLlamaCloud](HelloLlamaCloud.ipynb) for further information on how to run Llama using Replicate.\n",
+    "- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -76,7 +118,7 @@
    "metadata": {},
    "outputs": [
     {
-     "name": "stdin",
+     "name": "stdout",
      "output_type": "stream",
      "text": [
       " ········\n"
@@ -92,6 +134,18 @@
     "os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "6b911efd",
+   "metadata": {},
+   "source": [
+    "Next we call the Llama 2 model from Replicate. In this example we will use the llama 2 13b chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).\n",
+    "\n",
+    "You can add them here in the format: model_name/version\n",
+    "\n",
+    "If you using local Llama, just set llm accordingly - see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -99,7 +153,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# set llm to be Llama2-13b model; if you use local Llama, just set llm accordingly - see the HelloLlamaLocal notebook\n",
+    "\n",
     "from langchain.llms import Replicate\n",
     "\n",
     "llama2_13b = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n",
@@ -109,6 +163,14 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "8e3baa56",
+   "metadata": {},
+   "source": [
+    "Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,
@@ -141,6 +203,14 @@
     "print(summary)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "8b684b29",
+   "metadata": {},
+   "source": [
+    "Next we try to summarize all the content of the transcript and we should get a `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 7,
@@ -174,8 +244,18 @@
     "# try to get a summary of the whole content\n",
     "text = docs[0].page_content\n",
     "summary = chain.run(text)\n",
-    "print(summary)\n",
-    "# and you'll get - RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens."
+    "print(summary)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1ad1881a",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.\n",
+    "\n",
+    "We will use the LangChain's `load_summarize_chain` and play around with the `chain_type`.\n"
    ]
   },
   {
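A sketch of the `load_summarize_chain` attempt with the `stuff` chain type, matching the `chain.run(docs)` context line visible in the next hunk:

```python
# Sketch: chain_type="stuff" concatenates every document into one prompt,
# so running it on the whole transcript still exceeds the context window.
from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(docs)  # expected to fail with the same "input is too long" error
```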
@@ -260,6 +340,15 @@
     "chain.run(docs)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "aecf6328",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` to iteratively create an answer."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 10,
@@ -321,6 +410,14 @@
     "chain.run(split_docs)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c3976c92",
+   "metadata": {},
+   "source": [
+    "You can also use [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type` to implement a map reduce like architecture while summarizing the documents."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 14,
@@ -400,6 +497,15 @@
     "chain.run(split_docs)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "77d580de",
+   "metadata": {},
+   "source": [
+    "To investigate further, let's turn on Langchain's debug mode on to get an idea of how many calls are made to the model and the details of the inputs and outputs.\n",
+    "We will then run our summary using the `stuff` and `refine` `chain_types` and take a look at our output."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 15,
@@ -559,12 +665,13 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "id": "61ccd0fb-5cdb-43c4-afaf-05bc9f7cf959",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "\n",
+    "As you can see, `stuff` fails because it tries to treat all the split documents as one and \"stuffs\" it into one prompt which leads to a much larger prompt than Llama 2 can handle while `refine` iteratively runs over the documents updating its answer as it goes."
+   ]
   }
  ],
  "metadata": {

File diff suppressed because it is too large
+ 21 - 7
docs/FAQ.md