
adding video summary OctoAI example

Thierry Moreau 1 year ago
parent
commit
0db941413b
1 changed file with 385 additions and 0 deletions

+ 385 - 0
demo_apps/OctoAI_API_example/VideoSummary.ipynb

@@ -0,0 +1,385 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "30b1235c-2f3e-4628-9c90-30385f741550",
+   "metadata": {},
+   "source": [
+    "## This demo app shows:\n",
+    "* How to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video\n",
+    "* How to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method\n",
+    "* How to bypass the limit of Llama's max input token size by using a more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c866f6be",
+   "metadata": {},
+   "source": [
+    "We start by installing the necessary packages:\n",
+    "- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) API to get transcript/subtitles of a YouTube video\n",
+    "- [langchain](https://python.langchain.com/docs/get_started/introduction) provides necessary RAG tools for this demo\n",
+    "- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer\n",
+    "- [pytube](https://pytube.io/en/latest/) Utility for downloading YouTube videos\n",
+    "\n",
+    "**Note** This example uses OctoAI to host the Llama model. If you have not set up/or used OctoAI before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up OctoAI before continuing with this example.\n",
+    "If you do not want to use OctoAI, you will need to make some changes to this notebook as you go along."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "02482167",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install langchain octoai-sdk youtube-transcript-api tiktoken pytube"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "af3069b1",
+   "metadata": {},
+   "source": [
+    "Let's load the YouTube video transcript using the YoutubeLoader."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3e4b8598",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.document_loaders import YoutubeLoader\n",
+    "\n",
+    "loader = YoutubeLoader.from_youtube_url(\n",
+    "    \"https://www.youtube.com/watch?v=1k37OcjH7BM\", add_video_info=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dca32ebb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load the youtube video caption into Documents\n",
+    "docs = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "afba128f-b7fd-4b2f-873f-9b5163455d54",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check the docs length and content\n",
+    "len(docs[0].page_content), docs[0].page_content[:300]"
+   ]
+  },
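+  {
+   "cell_type": "markdown",
+   "id": "1f2a9b3c-5d6e-4f70-8a91-b2c3d4e5f601",
+   "metadata": {},
+   "source": [
+    "Before sending anything to the model, it helps to estimate how many tokens the transcript contains. The sketch below uses tiktoken's `cl100k_base` encoding, which is not Llama's tokenizer, so treat the count only as a rough indication of whether we are near the 4096-token input limit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2e3b8c4d-6f70-4a81-9b02-c3d4e5f60712",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# rough token estimate of the transcript; cl100k_base is not Llama's\n",
+    "# tokenizer, so this is only an approximation of the real token count\n",
+    "import tiktoken\n",
+    "\n",
+    "enc = tiktoken.get_encoding(\"cl100k_base\")\n",
+    "print(\"approximate token count:\", len(enc.encode(docs[0].page_content)))"
+   ]
+  },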
+  {
+   "cell_type": "markdown",
+   "id": "4af7cc16",
+   "metadata": {},
+   "source": [
+    "We are using OctoAI in this example to host our Llama 2 model so you will need to get a OctoAI token.\n",
+    "\n",
+    "To get the OctoAI token:\n",
+    "\n",
+    "- You will need to first sign in with OctoAI with your github account\n",
+    "- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first)\n",
+    "\n",
+    "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI.\n",
+    "\n",
+    "Alternatively, you can run Llama locally. See:\n",
+    "- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ab3ac00e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# enter your OctoAI API token, or you can use local Llama. See README for more info\n",
+    "from getpass import getpass\n",
+    "import os\n",
+    "\n",
+    "OCTOAI_API_TOKEN = getpass()\n",
+    "os.environ[\"OCTOAI_API_TOKEN\"] = OCTOAI_API_TOKEN\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b911efd",
+   "metadata": {},
+   "source": [
+    "Next we call the Llama 2 model from OctoAI. In this example we will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
+    "\n",
+    "At the time of writing this notebook the following Llama models are available on OctoAI:\n",
+    "* llama-2-13b-chat-fp16\n",
+    "* llama-2-70b-chat-int4\n",
+    "* llama-2-70b-chat-fp16\n",
+    "* codellama-7b-instruct-fp16\n",
+    "* codellama-13b-instruct-fp16\n",
+    "* codellama-34b-instruct-int4\n",
+    "* codellama-34b-instruct-fp16\n",
+    "* codellama-70b-instruct-fp16\n",
+    "\n",
+    "If you using local Llama, just set llm accordingly - see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "adf8cf3d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
+    "\n",
+    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
+    "llm = OctoAIEndpoint(\n",
+    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
+    "    model_kwargs={\n",
+    "        \"model\": llama2_13b,\n",
+    "        \"messages\": [\n",
+    "            {\n",
+    "                \"role\": \"system\",\n",
+    "                \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
+    "            }\n",
+    "        ],\n",
+    "        \"max_tokens\": 500,\n",
+    "        \"top_p\": 1,\n",
+    "        \"temperature\": 0.01\n",
+    "    },\n",
+    ")"
+   ]
+  },
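+  {
+   "cell_type": "markdown",
+   "id": "3c4d9e5f-7081-4b92-a013-d4e5f6071823",
+   "metadata": {},
+   "source": [
+    "If you prefer not to use OctoAI, one possible alternative (a sketch, not the only way) is to point `llm` at a locally downloaded model via LangChain's `LlamaCpp` wrapper. The `model_path` below is a placeholder for your own weights; see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb) for a full walkthrough."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4d5e0f60-8192-4ca3-b124-e5f607182934",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# optional: run Llama locally instead of on OctoAI\n",
+    "# (requires llama-cpp-python; the model_path is a placeholder for your own weights)\n",
+    "# from langchain.llms import LlamaCpp\n",
+    "# llm = LlamaCpp(\n",
+    "#     model_path=\"path/to/llama-2-13b-chat.Q4_0.gguf\",  # placeholder path\n",
+    "#     temperature=0.01,\n",
+    "#     max_tokens=500,\n",
+    "#     n_ctx=4096,  # Llama 2 context window\n",
+    "# )"
+   ]
+  },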
+  {
+   "cell_type": "markdown",
+   "id": "8e3baa56",
+   "metadata": {},
+   "source": [
+    "Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "51739e11",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.prompts import ChatPromptTemplate\n",
+    "from langchain.chains import LLMChain\n",
+    "prompt = ChatPromptTemplate.from_template(\n",
+    "    \"Give me a summary of the text below: {text}?\"\n",
+    ")\n",
+    "chain = LLMChain(llm=llm, prompt=prompt)\n",
+    "# be careful of the input text length sent to LLM\n",
+    "text = docs[0].page_content[:4000]\n",
+    "summary = chain.run(text)\n",
+    "# this is the summary of the first 4000 characters of the video content\n",
+    "print(summary)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8b684b29",
+   "metadata": {},
+   "source": [
+    "Next we try to summarize all the content of the transcript and we should get a `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "88a2c17f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# try to get a summary of the whole content\n",
+    "text = docs[0].page_content\n",
+    "summary = chain.run(text)\n",
+    "print(summary)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1ad1881a",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.\n",
+    "\n",
+    "We will use the LangChain's `load_summarize_chain` and play around with the `chain_type`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9bfee2d3-3afe-41d9-8968-6450cc23f493",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.chains.summarize import load_summarize_chain\n",
+    "# see https://python.langchain.com/docs/use_cases/summarization for more info\n",
+    "chain = load_summarize_chain(llm, chain_type=\"stuff\") # other supported methods are map_reduce and refine\n",
+    "chain.run(docs)\n",
+    "# same RuntimeError: Your input is too long. but stuff works for shorter text with input length <= 4096 tokens"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "682799a8-3846-41b1-a908-02ab5ac3ecee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
+    "# still get the \"RuntimeError: Your input is too long. Max input length is 4096 tokens\"\n",
+    "chain.run(docs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "aecf6328",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` to iteratively create an answer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3be1236a-fe6a-4bf6-983f-0e72dde39fee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+    "\n",
+    "# we need to split the long input text\n",
+    "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
+    "    chunk_size=3000, chunk_overlap=0\n",
+    ")\n",
+    "split_docs = text_splitter.split_documents(docs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "12ae9e9d-3434-4a84-a298-f2b98de9ff01",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check the splitted docs lengths\n",
+    "len(split_docs), len(docs), len(split_docs[0].page_content), len(docs[0].page_content)"
+   ]
+  },
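+  {
+   "cell_type": "markdown",
+   "id": "5e6f1708-92a3-4db4-8235-f60718293a45",
+   "metadata": {},
+   "source": [
+    "As a sanity check, we can confirm that each chunk stays comfortably under the 4096-token limit (again using tiktoken's `cl100k_base` encoding as an approximation, since it is not Llama's tokenizer):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6f708192-a3b4-4ec5-9346-0718293a4b56",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# approximate token count of the largest chunk; it should be well under 4096\n",
+    "import tiktoken\n",
+    "\n",
+    "enc = tiktoken.get_encoding(\"cl100k_base\")\n",
+    "print(max(len(enc.encode(d.page_content)) for d in split_docs))"
+   ]
+  },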
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "127f17fe-d5b7-43af-bd2f-2b47b076d0b1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# now get the summary of the whole docs - the whole youtube content\n",
+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
+    "print(str(chain.run(split_docs)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c3976c92",
+   "metadata": {},
+   "source": [
+    "You can also use [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type` to implement a map reduce like architecture while summarizing the documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8991df49-8578-46de-8b30-cb2cd11e30f1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# another method is map_reduce\n",
+    "chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
+    "print(str(chain.run(split_docs)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "77d580de",
+   "metadata": {},
+   "source": [
+    "To investigate further, let's turn on Langchain's debug mode on to get an idea of how many calls are made to the model and the details of the inputs and outputs.\n",
+    "We will then run our summary using the `stuff` and `refine` `chain_types` and take a look at our output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f2138911-d2b9-41f3-870f-9bc37e2043d9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# to find how many calls to Llama have been made and the details of inputs and outputs of each call, set langchain to debug\n",
+    "import langchain\n",
+    "langchain.debug = True\n",
+    "\n",
+    "# stuff method will cause the error in the end\n",
+    "chain = load_summarize_chain(llm, chain_type=\"stuff\")\n",
+    "chain.run(split_docs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "60d1a531-ab48-45cc-a7de-59a14e18240d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# but refine works\n",
+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
+    "chain.run(split_docs)"
+   ]
+  },
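+  {
+   "cell_type": "markdown",
+   "id": "708192a3-b4c5-4fd6-a457-18293a4b5c67",
+   "metadata": {},
+   "source": [
+    "Now that we have inspected the calls, let's turn debug mode back off so any further cells are not flooded with call traces."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8192a3b4-c5d6-4e07-b568-293a4b5c6d78",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# turn LangChain debug mode back off\n",
+    "langchain.debug = False"
+   ]
+  },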
+  {
+   "cell_type": "markdown",
+   "id": "61ccd0fb-5cdb-43c4-afaf-05bc9f7cf959",
+   "metadata": {},
+   "source": [
+    "\n",
+    "As you can see, `stuff` fails because it tries to treat all the split documents as one and \"stuffs\" it into one prompt which leads to a much larger prompt than Llama 2 can handle while `refine` iteratively runs over the documents updating its answer as it goes."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}