@@ -6,9 +6,24 @@
"metadata": {},
"source": [
"## This demo app shows:\n",
- "* how to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video;\n",
- "* how to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method;\n",
- "* how to bypass the limit of Llama's max input token size by using more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info."
+ "* How to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video\n",
|
|
|
|
+ "* How to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method\n",
|
|
|
|
+ "* How to bypass the limit of Llama's max input token size by using a more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info"
|
|
|
|
+ ]
|
|
|
|
+ },
|
|
|
|
+ {
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
+ "id": "c866f6be",
|
|
|
|
+ "metadata": {},
|
|
|
|
+ "source": [
|
|
|
|
+ "We start by installing the necessary packages:\n",
|
|
|
|
+ "- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) API to get transcript/subtitles of a YouTube video\n",
|
|
|
|
+ "- [langchain](https://python.langchain.com/docs/get_started/introduction) provides necessary RAG tools for this demo\n",
|
|
|
|
+ "- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer\n",
|
|
|
|
+ "- [pytube](https://pytube.io/en/latest/) Utility for downloading YouTube videos\n",
|
|
|
|
+ "\n",
|
|
|
|
+ "**Note** This example uses Replicate to host the Llama model. If you have not set up/or used Replicate before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up Replicate before continuing with this example.\n",
|
|
|
|
+ "If you do not want to use Replicate, you will need to make some changes to this notebook as you go along."
|
|
]
|
|
]
|
|
},
|
|
},
|
|
{
|
|
{
|
|
@@ -21,6 +36,14 @@
"!pip install langchain youtube-transcript-api tiktoken pytube"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "af3069b1",
+ "metadata": {},
+ "source": [
+ "Let's load the YouTube video transcript using the YoutubeLoader."
+ ]
+ },
{
"cell_type": "code",
"execution_count": 1,
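The loading cell itself is largely outside this diff; below is a minimal sketch of what it likely contains, assuming the standard `YoutubeLoader` API. The video URL is a placeholder, since the notebook's actual URL is not shown here.

```python
# Hypothetical sketch of the transcript-loading cell; the URL is a placeholder.
from langchain.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=<video_id>",  # placeholder URL
    add_video_info=True,  # requires pytube, one of the packages installed above
)
docs = loader.load()

# The next hunk inspects the result like this:
len(docs[0].page_content), docs[0].page_content[:300]
```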
@@ -69,6 +92,25 @@
"len(docs[0].page_content), docs[0].page_content[:300]"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "4af7cc16",
+ "metadata": {},
+ "source": [
+ "We are using Replicate in this example to host our Llama 2 model, so you will need to get a Replicate token.\n",
+ "\n",
+ "To get the Replicate token:\n",
+ "\n",
+ "- You will need to first sign in to Replicate with your GitHub account\n",
+ "- Then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while.\n",
+ "\n",
+ "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama 2 hosted on Replicate.\n",
+ "\n",
+ "Alternatively, you can run Llama locally. See:\n",
+ "- [HelloLlamaCloud](HelloLlamaCloud.ipynb) for further information on how to run Llama using Replicate.\n",
+ "- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally."
+ ]
+ },
{
"cell_type": "code",
"execution_count": 4,
@@ -76,7 +118,7 @@
"metadata": {},
"outputs": [
{
- "name": "stdin",
+ "name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
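The output corrected above is the masked echo of a token prompt. A minimal sketch of that cell, assuming `getpass` is used to read the token; the `os.environ` line appears verbatim in the next hunk.

```python
# Read the Replicate API token without echoing it, then export it so the
# LangChain Replicate wrapper can pick it up. The getpass call is an
# assumption consistent with the masked "········" output above.
from getpass import getpass
import os

REPLICATE_API_TOKEN = getpass()
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
```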
@@ -92,6 +134,18 @@
"os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "6b911efd",
+ "metadata": {},
+ "source": [
+ "Next we call the Llama 2 model from Replicate. In this example we will use the Llama 2 13b chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).\n",
+ "\n",
+ "You can add them here in the format: model_name/version\n",
+ "\n",
+ "If you are using local Llama, just set llm accordingly - see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb)"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -99,7 +153,7 @@
"metadata": {},
"outputs": [],
"source": [
- "# set llm to be Llama2-13b model; if you use local Llama, just set llm accordingly - see the HelloLlamaLocal notebook\n",
+ "\n",
"from langchain.llms import Replicate\n",
"\n",
"llama2_13b = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n",
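The hunk above only shows the import and the model identifier. A minimal sketch of the full model setup, assuming typical `Replicate` wrapper parameters; the `model_kwargs` values are illustrative, not taken from this diff.

```python
from langchain.llms import Replicate

llama2_13b = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"

# model_kwargs values are illustrative defaults; tune them as needed.
llm = Replicate(
    model=llama2_13b,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens": 500},
)
```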
@@ -109,6 +163,14 @@
")"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "8e3baa56",
+ "metadata": {},
+ "source": [
+ "Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us."
+ ]
+ },
{
"cell_type": "code",
"execution_count": 6,
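A minimal sketch of that first, naive summary, assuming a plain `LLMChain` with a simple summarization prompt; the prompt wording is an assumption, while `summary = chain.run(text)` and `print(summary)` match lines visible in later hunks.

```python
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# The prompt text is hypothetical; the notebook's exact prompt is not shown.
prompt = PromptTemplate.from_template("Give me a summary of the text below: {text}")
chain = LLMChain(llm=llm, prompt=prompt)

# Truncate to the first 4000 characters to stay within Llama 2's context window.
summary = chain.run(docs[0].page_content[:4000])
print(summary)
```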
@@ -141,6 +203,14 @@
"print(summary)"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "8b684b29",
+ "metadata": {},
+ "source": [
+ "Next we try to summarize the entire transcript; this should fail with a `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`"
+ ]
+ },
{
"cell_type": "code",
"execution_count": 7,
@@ -174,8 +244,18 @@
"# try to get a summary of the whole content\n",
"text = docs[0].page_content\n",
"summary = chain.run(text)\n",
- "print(summary)\n",
- "# and you'll get - RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens."
+ "print(summary)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1ad1881a",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.\n",
+ "\n",
+ "We will use LangChain's `load_summarize_chain` and experiment with the `chain_type`.\n"
+ ]
+ },
{
@@ -260,6 +340,15 @@
"chain.run(docs)"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "aecf6328",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` to iteratively create an answer."
+ ]
+ },
{
"cell_type": "code",
"execution_count": 10,
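A minimal sketch of the chunk-and-refine approach described above, assuming `RecursiveCharacterTextSplitter`; the chunk size is illustrative, while `split_docs` and `chain.run(split_docs)` match the names visible in the next hunk.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain

# Chunk size is illustrative: each chunk must fit in the 4096-token context
# window together with the summarization prompt.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=0)
split_docs = text_splitter.split_documents(docs)

# "refine" summarizes the first chunk, then updates that running summary with
# each subsequent chunk, so no single call exceeds the input limit.
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(split_docs)
```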
@@ -321,6 +410,14 @@
"chain.run(split_docs)"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "c3976c92",
+ "metadata": {},
+ "source": [
+ "You can also use the [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type`, which implements a map-reduce-style architecture for summarizing the documents."
+ ]
+ },
{
"cell_type": "code",
"execution_count": 14,
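The map_reduce variant differs only in the `chain_type`; a short sketch reusing `split_docs` from the refine step above.

```python
from langchain.chains.summarize import load_summarize_chain

# "map_reduce" summarizes each chunk independently (map), then combines the
# partial summaries into one final summary (reduce).
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(split_docs)
```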
@@ -400,6 +497,15 @@
"chain.run(split_docs)"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "77d580de",
+ "metadata": {},
+ "source": [
+ "To investigate further, let's turn on LangChain's debug mode to get an idea of how many calls are made to the model and the details of the inputs and outputs.\n",
+ "We will then run our summary using the `stuff` and `refine` `chain_types` and take a look at our output."
+ ]
+ },
{
"cell_type": "code",
"execution_count": 15,
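A sketch of the debug comparison described above; `langchain.debug` is LangChain's global verbose-logging switch, and the failing `stuff` run is left commented out since it raises.

```python
import langchain
from langchain.chains.summarize import load_summarize_chain

# Log every model call with its full inputs and outputs.
langchain.debug = True

# "stuff" packs all split documents into a single prompt, so on this
# transcript it raises the input-too-long RuntimeError seen earlier:
chain = load_summarize_chain(llm, chain_type="stuff")
# chain.run(split_docs)

# "refine" makes one call per chunk, carrying the summary forward:
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(split_docs)
```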
@@ -559,12 +665,13 @@
]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"id": "61ccd0fb-5cdb-43c4-afaf-05bc9f7cf959",
"metadata": {},
- "outputs": [],
- "source": []
+ "source": [
+ "\n",
+ "As you can see, `stuff` fails because it treats all the split documents as one and \"stuffs\" them into a single prompt, which is much larger than Llama 2 can handle, while `refine` runs over the documents iteratively, updating its answer as it goes."
+ ]
}
],
"metadata": {