
Update VideoSummary.ipynb

sekyonda 1 year ago
parent
commit
11addc7a0a
1 changed file with 115 additions and 11 deletions

+ 115 - 11
demo_apps/VideoSummary.ipynb

@@ -6,9 +6,24 @@
    "metadata": {},
    "source": [
     "## This demo app shows:\n",
-    "* how to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video;\n",
-    "* how to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method;\n",
-    "* how to bypass the limit of Llama's max input token size by using more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info."
+    "* How to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video.\n",
+    "* How to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method.\n",
+    "* How to bypass the limit of Llama's max input token size by using a more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c866f6be",
+   "metadata": {},
+   "source": [
+    "We start by installing the necessary packages:\n",
+    "- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) API to get transcript/subtitles of a YouTube video.\n",
+    "- [langchain](https://python.langchain.com/docs/get_started/introduction) provides necessary RAG tools for this demo.\n",
+    "- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer.\n",
+    "- [pytube](https://pytube.io/en/latest/) Utility for downloading YouTube videos.\n",
+    "\n",
+    "**Note** This example uses Replicate to host the Llama model. If you have not set up/or used Replicate before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up Replicate before continuing with this example.\n",
+    "If you do not want to use Replicate, you will need to make some changes to this notebook as you go along."
    ]
   },
   {
@@ -21,6 +36,14 @@
     "!pip install langchain youtube-transcript-api tiktoken pytube"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "af3069b1",
+   "metadata": {},
+   "source": [
+    "Next we load the YouTube video transcript using the YoutubeLoader."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -69,6 +92,25 @@
     "len(docs[0].page_content), docs[0].page_content[:300]"
    ]
   },
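The cell that actually loads the transcript is unchanged by this commit, so the diff only shows its output being inspected. For context, a minimal sketch of such a loader cell using LangChain's `YoutubeLoader` (the URL below is a placeholder, not necessarily the video the notebook uses):

```python
from langchain.document_loaders import YoutubeLoader

# placeholder URL - the actual video used by the notebook is not shown in this diff
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ", add_video_info=True
)
docs = loader.load()  # a list with one Document holding the full caption text
```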
+  {
+   "cell_type": "markdown",
+   "id": "4af7cc16",
+   "metadata": {},
+   "source": [
+    "We are using Replicate in this example to host our Llama 2 model so you will need to get a Replicate token.\n",
+    "\n",
+    "To get the Replicate token: \n",
+    "\n",
+    "- You will need to first sign in with Replicate with your github account\n",
+    "- Then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. \n",
+    "\n",
+    "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on Replicate.\n",
+    "\n",
+    "Alternatively, you can run Llama locally. See:\n",
+    "- [HelloLlamaCloud](HelloLlamaCloud.ipynb) for further information on how to run Llama using Replicate.\n",
+    "- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -76,7 +118,7 @@
    "metadata": {},
    "outputs": [
     {
-     "name": "stdin",
+     "name": "stdout",
      "output_type": "stream",
      "text": [
       " ········\n"
@@ -92,6 +134,17 @@
     "os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n"
    ]
   },
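The top of this cell is elided by the diff; judging from the masked `········` output, it plausibly reads the token with `getpass`, along these lines:

```python
from getpass import getpass
import os

# read the Replicate API token without echoing it in the notebook output
REPLICATE_API_TOKEN = getpass()
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
```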
+  {
+   "cell_type": "markdown",
+   "id": "6b911efd",
+   "metadata": {},
+   "source": [
+    "Next we call the Llama 2 model from Replicate. In this example we will use the llama 2 13b chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).\n",
+    "You can add them here in the format: model_name/version\n",
+    "\n",
+    "If you using local Llama, just set llm accordingly - see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -99,7 +152,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# set llm to be Llama2-13b model; if you use local Llama, just set llm accordingly - see the HelloLlamaLocal notebook\n",
+    "\n",
     "from langchain.llms import Replicate\n",
     "\n",
     "llama2_13b = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n",
@@ -109,6 +162,14 @@
     ")"
    ]
   },
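The middle of this cell is also elided. Based on the surrounding context lines, the `llm` construction plausibly looks like the sketch below; the `model_kwargs` values are illustrative assumptions, not necessarily the notebook's:

```python
# wrap the Replicate-hosted model in LangChain's LLM interface;
# temperature/top_p/max_new_tokens are assumed values for illustration
llm = Replicate(
    model=llama2_13b,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens": 500},
)
```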
+  {
+   "cell_type": "markdown",
+   "id": "8e3baa56",
+   "metadata": {},
+   "source": [
+    "Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,
@@ -141,6 +202,14 @@
     "print(summary)"
    ]
   },
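The body of this cell is elided too. A sketch of a naive summary over the first 4000 characters, assuming LangChain's `load_summarize_chain` with the `stuff` chain type (the notebook's exact prompt may differ):

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

# naive approach: truncate to the first 4000 characters so the prompt
# stays under Llama 2's 4096-token input limit
truncated = Document(page_content=docs[0].page_content[:4000])
chain = load_summarize_chain(llm, chain_type="stuff")
summary = chain.run([truncated])
print(summary)
```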
+  {
+   "cell_type": "markdown",
+   "id": "8b684b29",
+   "metadata": {},
+   "source": [
+    "Next we try to summarize all the content of the transcript and we should get a `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 7,
@@ -174,8 +243,16 @@
     "# try to get a summary of the whole content\n",
     "text = docs[0].page_content\n",
     "summary = chain.run(text)\n",
-    "print(summary)\n",
-    "# and you'll get - RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens."
+    "print(summary)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1ad1881a",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.\n"
    ]
   },
   {
@@ -260,6 +337,15 @@
     "chain.run(docs)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "aecf6328",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` iteratively to create an answer."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 10,
@@ -321,6 +407,14 @@
     "chain.run(split_docs)"
    ]
   },
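The chunking code is elided from the diff. A sketch of how the splitting and the `refine` chain fit together, with assumed chunk sizes:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain

# chunk_size/chunk_overlap are illustrative; choose values that keep each
# chunk comfortably under the model's 4096-token input limit
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=100)
split_docs = text_splitter.split_documents(docs)

# refine: summarize the first chunk, then revisit that summary with each
# subsequent chunk, refining the answer as it goes
chain = load_summarize_chain(llm, chain_type="refine")
print(chain.run(split_docs))
```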
+  {
+   "cell_type": "markdown",
+   "id": "c3976c92",
+   "metadata": {},
+   "source": [
+    "You can also use [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type` to implement a map reduce like architecture while summarizing the documents."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 14,
@@ -400,6 +494,15 @@
     "chain.run(split_docs)"
    ]
   },
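Again the cell body is elided; the `map_reduce` variant differs from the `refine` sketch above only in the `chain_type`:

```python
# map_reduce: summarize every chunk independently (map), then combine
# the per-chunk summaries into a single final summary (reduce)
chain = load_summarize_chain(llm, chain_type="map_reduce")
print(chain.run(split_docs))
```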
+  {
+   "cell_type": "markdown",
+   "id": "77d580de",
+   "metadata": {},
+   "source": [
+    "To investigate further, let's turn on Langchain's debug mode on to get an idea of how many calls are made to the model and the details of the inputs and outputs.\n",
+    "We will then run our summary using the `stuff` and `refine` `chain_types` and take a look at our output."
+   ]
+  },
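In this version of LangChain, enabling debug mode is a one-line global switch; a sketch:

```python
import langchain

# log every LLM call, including the full prompts and raw completions,
# so we can see how many calls each chain_type makes
langchain.debug = True
```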
   {
    "cell_type": "code",
    "execution_count": 15,
@@ -559,12 +662,13 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "id": "61ccd0fb-5cdb-43c4-afaf-05bc9f7cf959",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "\n",
+    "As you can see, `stuff` fails because it tries to treat all the split documents as one and \"stuffs\" it into one prompt which leads to a much larger prompt than Llama 2 can handle while `refine` iteratively runs over the documents updating its answer as it goes."
+   ]
   }
  ],
  "metadata": {