@@ -0,0 +1,385 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "30b1235c-2f3e-4628-9c90-30385f741550",
+   "metadata": {},
+   "source": [
+    "## This demo app shows:\n",
+    "* How to use LangChain's YoutubeLoader to retrieve the captions of a YouTube video\n",
+    "* How to ask Llama to summarize the video content (within Llama's input size limit) in a naive way using LangChain's stuff method\n",
+    "* How to get around Llama's max input token limit using the more sophisticated map_reduce and refine methods in LangChain - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c866f6be",
+   "metadata": {},
+   "source": [
+    "We start by installing the necessary packages:\n",
+    "- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) API to get the transcript/subtitles of a YouTube video\n",
+    "- [langchain](https://python.langchain.com/docs/get_started/introduction) provides the document loading and summarization chains used in this demo\n",
+    "- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer\n",
+    "- [pytube](https://pytube.io/en/latest/) utility for downloading YouTube videos\n",
+    "\n",
+    "**Note** This example uses OctoAI to host the Llama model. If you have not set up or used OctoAI before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up OctoAI before continuing with this example.\n",
+    "If you do not want to use OctoAI, you will need to make some changes to this notebook as you go along."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "02482167",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install langchain octoai-sdk youtube-transcript-api tiktoken pytube"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "af3069b1",
+   "metadata": {},
+   "source": [
+    "Let's load the YouTube video transcript using the YoutubeLoader."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3e4b8598",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.document_loaders import YoutubeLoader\n",
+    "\n",
+    "loader = YoutubeLoader.from_youtube_url(\n",
+    "    \"https://www.youtube.com/watch?v=1k37OcjH7BM\", add_video_info=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dca32ebb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load the youtube video caption into Documents\n",
+    "docs = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "afba128f-b7fd-4b2f-873f-9b5163455d54",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check the docs length and content\n",
+    "len(docs[0].page_content), docs[0].page_content[:300]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4af7cc16",
+   "metadata": {},
+   "source": [
+    "We are using OctoAI in this example to host our Llama 2 model, so you will need to get an OctoAI token.\n",
+    "\n",
+    "To get the OctoAI token:\n",
+    "\n",
+    "- You will need to first sign in to OctoAI with your GitHub account\n",
+    "- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever runs out first)\n",
+    "\n",
+    "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama 2 hosted on OctoAI.\n",
+    "\n",
+    "Alternatively, you can run Llama locally. See:\n",
+    "- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ab3ac00e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# enter your OctoAI API token, or you can use local Llama. See README for more info\n",
+    "from getpass import getpass\n",
+    "import os\n",
+    "\n",
+    "OCTOAI_API_TOKEN = getpass()\n",
+    "os.environ[\"OCTOAI_API_TOKEN\"] = OCTOAI_API_TOKEN\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b911efd",
+   "metadata": {},
+   "source": [
+    "Next we call the Llama 2 model from OctoAI. In this example we will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
+    "\n",
+    "At the time of writing this notebook, the following Llama models are available on OctoAI:\n",
+    "* llama-2-13b-chat-fp16\n",
+    "* llama-2-70b-chat-int4\n",
+    "* llama-2-70b-chat-fp16\n",
+    "* codellama-7b-instruct-fp16\n",
+    "* codellama-13b-instruct-fp16\n",
+    "* codellama-34b-instruct-int4\n",
+    "* codellama-34b-instruct-fp16\n",
+    "* codellama-70b-instruct-fp16\n",
+    "\n",
+    "If you are using a local Llama model, just set `llm` accordingly - see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb) and the sketch after the next cell."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "adf8cf3d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
+    "\n",
+    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
+    "llm = OctoAIEndpoint(\n",
+    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
+    "    model_kwargs={\n",
+    "        \"model\": llama2_13b,\n",
+    "        \"messages\": [\n",
+    "            {\n",
+    "                \"role\": \"system\",\n",
+    "                \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
+    "            }\n",
+    "        ],\n",
+    "        \"max_tokens\": 500,\n",
+    "        \"top_p\": 1,\n",
+    "        \"temperature\": 0.01\n",
+    "    },\n",
+    ")"
+   ]
+  },
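+  {
+   "cell_type": "markdown",
+   "id": "f0a1b2c3",
+   "metadata": {},
+   "source": [
+    "If you would rather run Llama locally, here is a minimal sketch using LangChain's `LlamaCpp` wrapper. It assumes you have installed `llama-cpp-python` and downloaded a quantized Llama 2 model file; the `model_path` below is a hypothetical placeholder. The cell is left commented out so the OctoAI `llm` above stays in effect - see [HelloLlamaLocal](HelloLlamaLocal.ipynb) for a full walkthrough."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f0a1b2c4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Alternative (sketch): a local Llama 2 via llama-cpp-python.\n",
+    "# Assumes llama-cpp-python is installed and a quantized model file\n",
+    "# exists at model_path (hypothetical path below).\n",
+    "# from langchain.llms import LlamaCpp\n",
+    "# llm = LlamaCpp(\n",
+    "#     model_path=\"llama-2-13b-chat.Q4_0.gguf\",  # hypothetical local path\n",
+    "#     temperature=0.01,\n",
+    "#     max_tokens=500,\n",
+    "#     n_ctx=4096,  # Llama 2's context window\n",
+    "# )"
+   ]
+  },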
+  {
+   "cell_type": "markdown",
+   "id": "8e3baa56",
+   "metadata": {},
+   "source": [
+    "Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "51739e11",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.prompts import ChatPromptTemplate\n",
+    "from langchain.chains import LLMChain\n",
+    "prompt = ChatPromptTemplate.from_template(\n",
+    "    \"Give me a summary of the text below: {text}?\"\n",
+    ")\n",
+    "chain = LLMChain(llm=llm, prompt=prompt)\n",
+    "# be careful of the input text length sent to LLM\n",
+    "text = docs[0].page_content[:4000]\n",
+    "summary = chain.run(text)\n",
+    "# this is the summary of the first 4000 characters of the video content\n",
+    "print(summary)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8b684b29",
+   "metadata": {},
+   "source": [
+    "Next, we try to summarize the entire transcript. This time we should get a `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "88a2c17f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# try to get a summary of the whole content\n",
+    "text = docs[0].page_content\n",
+    "summary = chain.run(text)\n",
+    "print(summary)"
+   ]
+  },
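+  {
+   "cell_type": "markdown",
+   "id": "c9a0b1d2",
+   "metadata": {},
+   "source": [
+    "To see why this fails, we can estimate the transcript's token count with `tiktoken`. Note this is only an approximation: `cl100k_base` is an OpenAI encoding, not Llama 2's tokenizer, so the count will differ somewhat from what the endpoint reports."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c9a0b1d3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import tiktoken\n",
+    "\n",
+    "# rough token count of the whole transcript (approximate - not Llama's tokenizer)\n",
+    "enc = tiktoken.get_encoding(\"cl100k_base\")\n",
+    "len(enc.encode(docs[0].page_content))"
+   ]
+  },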
+  {
+   "cell_type": "markdown",
+   "id": "1ad1881a",
+   "metadata": {},
+   "source": [
+    "Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.\n",
+    "\n",
+    "We will use LangChain's `load_summarize_chain` and play around with the `chain_type`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9bfee2d3-3afe-41d9-8968-6450cc23f493",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.chains.summarize import load_summarize_chain\n",
+    "# see https://python.langchain.com/docs/use_cases/summarization for more info\n",
+    "chain = load_summarize_chain(llm, chain_type=\"stuff\") # other supported chain_types are map_reduce and refine\n",
+    "chain.run(docs)\n",
+    "# raises the same RuntimeError: Your input is too long. stuff only works for text within the model's 4096-token input limit"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "682799a8-3846-41b1-a908-02ab5ac3ecee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
+    "# still get the \"RuntimeError: Your input is too long. Max input length is 4096 tokens\"\n",
+    "chain.run(docs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "aecf6328",
+   "metadata": {},
+   "source": [
+    "Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` to iteratively create an answer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3be1236a-fe6a-4bf6-983f-0e72dde39fee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+    "\n",
+    "# we need to split the long input text\n",
+    "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
+    "    chunk_size=3000, chunk_overlap=0\n",
+    ")\n",
+    "split_docs = text_splitter.split_documents(docs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "12ae9e9d-3434-4a84-a298-f2b98de9ff01",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check the split docs lengths\n",
+    "len(split_docs), len(docs), len(split_docs[0].page_content), len(docs[0].page_content)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "127f17fe-d5b7-43af-bd2f-2b47b076d0b1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# now get the summary of the whole docs - the whole youtube content\n",
+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
+    "print(str(chain.run(split_docs)))"
+   ]
+  },
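+  {
+   "cell_type": "markdown",
+   "id": "ab12cd34",
+   "metadata": {},
+   "source": [
+    "Conceptually, `refine` works like the simplified sketch below: summarize the first chunk, then fold each following chunk into the running summary one at a time. The prompt wording here is illustrative only, not LangChain's actual internal prompts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ab12cd35",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# simplified sketch of the refine pattern (illustrative prompts,\n",
+    "# not LangChain's actual internals)\n",
+    "running_summary = llm(\"Summarize the following text:\\n\" + split_docs[0].page_content)\n",
+    "for doc in split_docs[1:]:\n",
+    "    running_summary = llm(\n",
+    "        \"Here is an existing summary:\\n\" + running_summary\n",
+    "        + \"\\nRefine it with this additional context:\\n\" + doc.page_content\n",
+    "    )\n",
+    "print(running_summary)"
+   ]
+  },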
+  {
+   "cell_type": "markdown",
+   "id": "c3976c92",
+   "metadata": {},
+   "source": [
+    "You can also use the [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type`, which uses a map-reduce-like architecture to summarize the documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8991df49-8578-46de-8b30-cb2cd11e30f1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# another method is map_reduce\n",
+    "chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
+    "print(str(chain.run(split_docs)))"
+   ]
+  },
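+  {
+   "cell_type": "markdown",
+   "id": "de56fa78",
+   "metadata": {},
+   "source": [
+    "Conceptually, `map_reduce` summarizes each chunk independently (the map step) and then combines the partial summaries into one (the reduce step). A simplified sketch with illustrative prompts, again not LangChain's actual internals:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "de56fa79",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# simplified sketch of the map_reduce pattern (illustrative prompts,\n",
+    "# not LangChain's actual internals)\n",
+    "partial_summaries = [llm(\"Summarize:\\n\" + doc.page_content) for doc in split_docs]\n",
+    "combined = llm(\"Combine these summaries into one concise summary:\\n\" + \"\\n\".join(partial_summaries))\n",
+    "print(combined)"
+   ]
+  },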
+  {
+   "cell_type": "markdown",
+   "id": "77d580de",
+   "metadata": {},
+   "source": [
+    "To investigate further, let's turn on LangChain's debug mode to get an idea of how many calls are made to the model and the details of the inputs and outputs of each call.\n",
+    "We will then run our summary using the `stuff` and `refine` `chain_types` and take a look at our output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f2138911-d2b9-41f3-870f-9bc37e2043d9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# to find how many calls to Llama have been made and the details of inputs and outputs of each call, set langchain to debug\n",
+    "import langchain\n",
+    "langchain.debug = True\n",
+    "\n",
+    "# stuff method will cause the error in the end\n",
+    "chain = load_summarize_chain(llm, chain_type=\"stuff\")\n",
+    "chain.run(split_docs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "60d1a531-ab48-45cc-a7de-59a14e18240d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# but refine works\n",
+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
+    "chain.run(split_docs)"
+   ]
+  },
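+  {
+   "cell_type": "markdown",
+   "id": "fa90bc12",
+   "metadata": {},
+   "source": [
+    "Before moving on, turn LangChain's debug mode back off so later runs are not flooded with logs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fa90bc13",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# turn debug mode back off now that we have inspected the calls\n",
+    "langchain.debug = False"
+   ]
+  },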
+  {
+   "cell_type": "markdown",
+   "id": "61ccd0fb-5cdb-43c4-afaf-05bc9f7cf959",
+   "metadata": {},
+   "source": [
+    "As you can see, `stuff` fails because it tries to treat all the split documents as one and \"stuffs\" them into a single prompt, which is larger than Llama 2 can handle, while `refine` iteratively runs over the documents, updating its answer as it goes."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}