1 gadu atpakaļ · 1c7b49fda2
--- a/README.md
+++ b/README.md
--- a/demo_apps/OctoAI_API_examples/Getting_to_know_Llama.ipynb
+++ b/demo_apps/OctoAI_API_examples/Getting_to_know_Llama.ipynb
--- a/demo_apps/OctoAI_API_examples/HelloLlamaCloud.ipynb
+++ b/demo_apps/OctoAI_API_examples/HelloLlamaCloud.ipynb
@@ -0,0 +1,448 @@
 
				+{
			
 
				+ "cells": [
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "1c1ea03a-cc69-45b0-80d3-664e48ca6831",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "## This demo app shows:\n",
			
 
				+    "* How to run Llama2 in the cloud hosted on OctoAI\n",
			
 
				+    "* How to use LangChain to ask Llama general questions and follow up questions\n",
			
 
				+    "* How to use LangChain to load a recent PDF doc - the Llama2 paper pdf - and chat about it. This is the well known RAG (Retrieval Augmented Generation) method to let LLM such as Llama2 be able to answer questions about the data not publicly available when Llama2 was trained, or about your own data. RAG is one way to prevent LLM's hallucination\n",
			
 
				+    "* You should also review the [HelloLlamaLocal](HelloLlamaLocal.ipynb) notebook for more information on RAG\n",
			
 
				+    "\n",
			
 
				+    "**Note** We will be using OctoAI to run the examples here. You will need to first sign into [OctoAI](https://octoai.cloud/) with your Github or Google account, then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first).\n",
			
 
				+    "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "61dde626",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Let's start by installing the necessary packages:\n",
			
 
				+    "- sentence-transformers for text embeddings\n",
			
 
				+    "- chromadb gives us database capabilities\n",
			
 
				+    "- langchain provides necessary RAG tools for this demo\n",
			
 
				+    "\n",
			
 
				+    "And setting up the OctoAI token."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "2c608df5",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "!pip install langchain octoai-sdk sentence-transformers chromadb pypdf"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "b9c5546a",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from getpass import getpass\n",
			
 
				+    "import os\n",
			
 
				+    "\n",
			
 
				+    "OCTOAI_API_TOKEN = getpass()\n",
			
 
				+    "os.environ[\"OCTOAI_API_TOKEN\"] = OCTOAI_API_TOKEN"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "3e8870c1",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Next we call the Llama 2 model from OctoAI. In this example we will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
			
 
				+    "\n",
			
 
				+    "At the time of writing this notebook the following Llama models are available on OctoAI:\n",
			
 
				+    "* llama-2-13b-chat\n",
			
 
				+    "* llama-2-70b-chat\n",
			
 
				+    "* codellama-7b-instruct\n",
			
 
				+    "* codellama-13b-instruct\n",
			
 
				+    "* codellama-34b-instruct\n",
			
 
				+    "* codellama-70b-instruct"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "ad536adb",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
			
 
				+    "\n",
			
 
				+    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
			
 
				+    "llm = OctoAIEndpoint(\n",
			
 
				+    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
			
 
				+    "    model_kwargs={\n",
			
 
				+    "        \"model\": llama2_13b,\n",
			
 
				+    "        \"messages\": [\n",
			
 
				+    "            {\n",
			
 
				+    "                \"role\": \"system\",\n",
			
 
				+    "                \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
			
 
				+    "            }\n",
			
 
				+    "        ],\n",
			
 
				+    "        \"max_tokens\": 500,\n",
			
 
				+    "        \"top_p\": 1,\n",
			
 
				+    "        \"temperature\": 0.01\n",
			
 
				+    "    },\n",
			
 
				+    ")"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "fd207c80",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "With the model set up, you are now ready to ask some questions. Here is an example of the simplest way to ask the model some general questions."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "493a7148",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "question = \"who wrote the book Innovator's dilemma?\"\n",
			
 
				+    "answer = llm(question)\n",
			
 
				+    "print(answer)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "f315f000",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We will then try to follow up the response with a question asking for more information on the book. \n",
			
 
				+    "\n",
			
 
				+    "Since the chat history is not passed on Llama doesn't have the context and doesn't know this is more about the book thus it treats this as new query.\n"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "9b5c8676",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# chat history not passed so Llama doesn't have the context and doesn't know this is more about the book\n",
			
 
				+    "followup = \"tell me more\"\n",
			
 
				+    "followup_answer = llm(followup)\n",
			
 
				+    "print(followup_answer)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "9aeaffc7",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "To get around this we will need to provide the model with history of the chat. \n",
			
 
				+    "\n",
			
 
				+    "To do this, we will use  [`ConversationBufferMemory`](https://python.langchain.com/docs/modules/memory/types/buffer) to pass the chat history to the model and give it the capability to handle follow up questions."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "5428ca27",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# using ConversationBufferMemory to pass memory (chat history) for follow up questions\n",
			
 
				+    "from langchain.chains import ConversationChain\n",
			
 
				+    "from langchain.memory import ConversationBufferMemory\n",
			
 
				+    "\n",
			
 
				+    "memory = ConversationBufferMemory()\n",
			
 
				+    "conversation = ConversationChain(\n",
			
 
				+    "    llm=llm, \n",
			
 
				+    "    memory = memory,\n",
			
 
				+    "    verbose=False\n",
			
 
				+    ")"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "a3e9af5f",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Once this is set up, let us repeat the steps from before and ask the model a simple question.\n",
			
 
				+    "\n",
			
 
				+    "Then we pass the question and answer back into the model for context along with the follow up question."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "baee2d22",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# restart from the original question\n",
			
 
				+    "answer = conversation.predict(input=question)\n",
			
 
				+    "print(answer)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "9c7d67a8",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# pass context (previous question and answer) along with the follow up \"tell me more\" to Llama who now knows more of what\n",
			
 
				+    "memory.save_context({\"input\": question},\n",
			
 
				+    "                    {\"output\": answer})\n",
			
 
				+    "followup_answer = conversation.predict(input=followup)\n",
			
 
				+    "print(followup_answer)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "fc436163",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Next, let's explore using Llama 2 to answer questions using documents for context. \n",
			
 
				+    "This gives us the ability to update Llama 2's knowledge thus giving it better context without needing to finetune. \n",
			
 
				+    "For a more in-depth study of this, see the notebook on using Llama 2 locally [here](HelloLlamaLocal.ipynb)\n",
			
 
				+    "\n",
			
 
				+    "We will use the PyPDFLoader to load in a pdf, in this case, the Llama 2 paper."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "f5303d75",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.document_loaders import PyPDFLoader\n",
			
 
				+    "loader = PyPDFLoader(\"https://arxiv.org/pdf/2307.09288.pdf\")\n",
			
 
				+    "docs = loader.load()"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "678c2b4a",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# check docs length and content\n",
			
 
				+    "print(len(docs), docs[0].page_content[0:300])"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "73b8268e",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We need to store our documents. There are more than 30 vector stores (DBs) supported by LangChain.\n",
			
 
				+    "For this example we will use [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma) which is light-weight and in memory so it's easy to get started with.\n",
			
 
				+    "For other vector stores especially if you need to store a large amount of data - see https://python.langchain.com/docs/integrations/vectorstores\n",
			
 
				+    "\n",
			
 
				+    "We will also import the OctoAIEmbeddings and RecursiveCharacterTextSplitter to assist in storing the documents."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "eecb6a34",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.vectorstores import Chroma\n",
			
 
				+    "\n",
			
 
				+    "# embeddings are numerical representations of the question and answer text\n",
			
 
				+    "from langchain_community.embeddings import OctoAIEmbeddings\n",
			
 
				+    "\n",
			
 
				+    "# use a common text splitter to split text into chunks\n",
			
 
				+    "from langchain.text_splitter import RecursiveCharacterTextSplitter"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "36d4a17c",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "To store the documents, we will need to split them into chunks using [`RecursiveCharacterTextSplitter`](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) and create vector representations of these chunks using [`OctoAIEmbeddings`](https://octoai.cloud/tools/text/embeddings?mode=api&model=thenlper%2Fgte-large) on them before storing them into our vector database.\n",
			
 
				+    "\n",
			
 
				+    "In general, you should use larger chuck sizes for highly structured text such as code and smaller size for less structured text. You may need to experiment with different chunk sizes and overlap values to find out the best numbers."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "bc65e161",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)\n",
			
 
				+    "all_splits = text_splitter.split_documents(docs)\n",
			
 
				+    "\n",
			
 
				+    "# create the vector db to store all the split chunks as embeddings\n",
			
 
				+    "embeddings = OctoAIEmbeddings(\n",
			
 
				+    "    endpoint_url=\"https://text.octoai.run/v1/embeddings\"\n",
			
 
				+    ")\n",
			
 
				+    "vectordb = Chroma.from_documents(\n",
			
 
				+    "    documents=all_splits,\n",
			
 
				+    "    embedding=embeddings,\n",
			
 
				+    ")"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "54ad02d7",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We then use ` RetrievalQA` to retrieve the documents from the vector database and give the model more context on Llama 2, thereby increasing its knowledge.\n",
			
 
				+    "\n",
			
 
				+    "For each question, LangChain performs a semantic similarity search of it in the vector db, then passes the search results as the context to Llama to answer the question."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "00e3f72b",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# use LangChain's RetrievalQA, to associate Llama with the loaded documents stored in the vector db\n",
			
 
				+    "from langchain.chains import RetrievalQA\n",
			
 
				+    "\n",
			
 
				+    "qa_chain = RetrievalQA.from_chain_type(\n",
			
 
				+    "    llm,\n",
			
 
				+    "    retriever=vectordb.as_retriever()\n",
			
 
				+    ")\n",
			
 
				+    "\n",
			
 
				+    "question = \"What is llama2?\"\n",
			
 
				+    "result = qa_chain({\"query\": question})\n",
			
 
				+    "print(result['result'])"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "7e63769a",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Now, lets bring it all together by incorporating follow up questions.\n",
			
 
				+    "\n",
			
 
				+    "First we ask a follow up questions without giving the model context of the previous conversation.\n",
			
 
				+    "Without this context, the answer we get does not relate to our original question."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "53f27473",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# no context passed so Llama2 doesn't have enough context to answer so it lets its imagination go wild\n",
			
 
				+    "result = qa_chain({\"query\": \"what are its use cases?\"})\n",
			
 
				+    "print(result['result'])"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "833221c0",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "As we did before, let us use the `ConversationalRetrievalChain` package to give the model context of our previous question so we can add follow up questions."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "743644a1",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# use ConversationalRetrievalChain to pass chat history for follow up questions\n",
			
 
				+    "from langchain.chains import ConversationalRetrievalChain\n",
			
 
				+    "chat_chain = ConversationalRetrievalChain.from_llm(llm, vectordb.as_retriever(), return_source_documents=True)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "7c3d1142",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# let's ask the original question \"What is llama2?\" again\n",
			
 
				+    "result = chat_chain({\"question\": question, \"chat_history\": []})\n",
			
 
				+    "print(result['answer'])"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "4b17f08f",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# this time we pass chat history along with the follow up so good things should happen\n",
			
 
				+    "chat_history = [(question, result[\"answer\"])]\n",
			
 
				+    "followup = \"what are its use cases?\"\n",
			
 
				+    "followup_answer = chat_chain({\"question\": followup, \"chat_history\": chat_history})\n",
			
 
				+    "print(followup_answer['answer'])"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "04f4eabf",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Further follow ups can be made possible by updating chat_history.\n",
			
 
				+    "\n",
			
 
				+    "Note that results can get cut off. You may set \"max_new_tokens\" in the OctoAIEndpoint call above to a larger number (like shown below) to avoid the cut off.\n",
			
 
				+    "\n",
			
 
				+    "```python\n",
			
 
				+    "model_kwargs={\"temperature\": 0.01, \"top_p\": 1, \"max_new_tokens\": 1000}\n",
			
 
				+    "```"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "95d22347",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# further follow ups can be made possible by updating chat_history like this:\n",
			
 
				+    "chat_history.append((followup, followup_answer[\"answer\"]))\n",
			
 
				+    "more_followup = \"what tasks can it assist with?\"\n",
			
 
				+    "more_followup_answer = chat_chain({\"question\": more_followup, \"chat_history\": chat_history})\n",
			
 
				+    "print(more_followup_answer['answer'])"
			
 
				+   ]
			
 
				+  }
			
 
				+ ],
			
 
				+ "metadata": {
			
 
				+  "kernelspec": {
			
 
				+   "display_name": "Python 3 (ipykernel)",
			
 
				+   "language": "python",
			
 
				+   "name": "python3"
			
 
				+  },
			
 
				+  "language_info": {
			
 
				+   "codemirror_mode": {
			
 
				+    "name": "ipython",
			
 
				+    "version": 3
			
 
				+   },
			
 
				+   "file_extension": ".py",
			
 
				+   "mimetype": "text/x-python",
			
 
				+   "name": "python",
			
 
				+   "nbconvert_exporter": "python",
			
 
				+   "pygments_lexer": "ipython3",
			
 
				+   "version": "3.11.6"
			
 
				+  }
			
 
				+ },
			
 
				+ "nbformat": 4,
			
 
				+ "nbformat_minor": 5
			
 
				+}
			
--- a/demo_apps/OctoAI_API_examples/LiveData.ipynb
+++ b/demo_apps/OctoAI_API_examples/LiveData.ipynb
@@ -0,0 +1,323 @@
 
				+{
			
 
				+ "cells": [
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "30eb1704-8d76-4bc9-9308-93243aeb69cb",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "## This demo app shows:\n",
			
 
				+    "* How to use LlamaIndex, an open source library to help you build custom data augmented LLM applications\n",
			
 
				+    "* How to ask Llama questions about recent live data via the You.com live search API and LlamaIndex\n",
			
 
				+    "\n",
			
 
				+    "The LangChain package is used to facilitate the call to Llama2 hosted on OctoAI\n",
			
 
				+    "\n",
			
 
				+    "**Note** We will be using OctoAI to run the examples here. You will need to first sign into [OctoAI](https://octoai.cloud/) with your Github or Google account, then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first).\n",
			
 
				+    "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "68cf076e",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We start by installing the necessary packages:\n",
			
 
				+    "- [langchain](https://python.langchain.com/docs/get_started/introduction) which provides RAG capabilities\n",
			
 
				+    "- [llama-index](https://docs.llamaindex.ai/en/stable/) for data augmentation."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "1d0005d6-e928-4d1a-981b-534a40e19e56",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "!pip install llama-index langchain"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "21fe3849",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# use ServiceContext to configure the LLM used and the custom embeddings\n",
			
 
				+    "from llama_index import ServiceContext\n",
			
 
				+    "\n",
			
 
				+    "# VectorStoreIndex is used to index custom data \n",
			
 
				+    "from llama_index import VectorStoreIndex\n",
			
 
				+    "\n",
			
 
				+    "from langchain.llms.octoai_endpoint import OctoAIEndpoint"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "73e8e661",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Next we set up the OctoAI token."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "d9d76e33",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from getpass import getpass\n",
			
 
				+    "import os\n",
			
 
				+    "\n",
			
 
				+    "OCTOAI_API_TOKEN = getpass()\n",
			
 
				+    "os.environ[\"OCTOAI_API_TOKEN\"] = OCTOAI_API_TOKEN"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "f8ff812b",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "In this example we will use the [YOU.com](https://you.com/) search engine to augment the LLM's responses.\n",
			
 
				+    "To use the You.com Search API, you can email api@you.com to request an API key. "
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "75275628-5235-4b55-8033-601c76107528",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "YOUCOM_API_KEY = getpass()\n",
			
 
				+    "os.environ[\"YOUCOM_API_KEY\"] = YOUCOM_API_KEY"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "cb210c7c",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We then call the Llama 2 model from OctoAI.\n",
			
 
				+    "\n",
			
 
				+    "We will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
			
 
				+    "\n",
			
 
				+    "At the time of writing this notebook the following Llama models are available on OctoAI:\n",
			
 
				+    "* llama-2-13b-chat\n",
			
 
				+    "* llama-2-70b-chat\n",
			
 
				+    "* codellama-7b-instruct\n",
			
 
				+    "* codellama-13b-instruct\n",
			
 
				+    "* codellama-34b-instruct\n",
			
 
				+    "* codellama-70b-instruct"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "c12fc2cb",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# set llm to be using Llama2 hosted on OctoAI\n",
			
 
				+    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
			
 
				+    "\n",
			
 
				+    "llm = OctoAIEndpoint(\n",
			
 
				+    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
			
 
				+    "    model_kwargs={\n",
			
 
				+    "        \"model\": llama2_13b,\n",
			
 
				+    "        \"messages\": [\n",
			
 
				+    "            {\n",
			
 
				+    "                \"role\": \"system\",\n",
			
 
				+    "                \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
			
 
				+    "            }\n",
			
 
				+    "        ],\n",
			
 
				+    "        \"max_tokens\": 500,\n",
			
 
				+    "        \"top_p\": 1,\n",
			
 
				+    "        \"temperature\": 0.01\n",
			
 
				+    "    },\n",
			
 
				+    ")"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "476d72da",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Using our api key we set up earlier, we make a request from YOU.com for live data on a particular topic."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "effc9656-b18d-4d24-a80b-6066564a838b",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "import requests\n",
			
 
				+    "\n",
			
 
				+    "query = \"Meta Connect\" # you can try other live data query about sports score, stock market and weather info \n",
			
 
				+    "headers = {\"X-API-Key\": os.environ[\"YOUCOM_API_KEY\"]}\n",
			
 
				+    "data = requests.get(\n",
			
 
				+    "    f\"https://api.ydc-index.io/search?query={query}\",\n",
			
 
				+    "    headers=headers,\n",
			
 
				+    ").json()"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "8bed3baf-742e-473c-ada1-4459012a8a2c",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# check the query result in JSON\n",
			
 
				+    "import json\n",
			
 
				+    "\n",
			
 
				+    "print(json.dumps(data, indent=2))"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "b196e697",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We then use the [`JSONLoader`](https://llamahub.ai/l/file-json) to extract the text from the returned data. The `JSONLoader` gives us the ability to load the data into LamaIndex.\n",
			
 
				+    "In the next cell we show how to load the JSON result with key info stored as \"snippets\".\n",
			
 
				+    "\n",
			
 
				+    "However, you can also add the snippets in the query result to documents like below:\n",
			
 
				+    "```python \n",
			
 
				+    "from llama_index import Document\n",
			
 
				+    "snippets = [snippet for hit in data[\"hits\"] for snippet in hit[\"snippets\"]]\n",
			
 
				+    "documents = [Document(text=s) for s in snippets]\n",
			
 
				+    "```\n",
			
 
				+    "This can be handy if you just need to add a list of text strings to doc"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "7c40e73f-ca13-4f4a-a753-e613df3d389e",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# one way to load the JSON result with key info stored as \"snippets\"\n",
			
 
				+    "from llama_index import download_loader\n",
			
 
				+    "\n",
			
 
				+    "JsonDataReader = download_loader(\"JsonDataReader\")\n",
			
 
				+    "loader = JsonDataReader()\n",
			
 
				+    "documents = loader.load_data([hit[\"snippets\"] for hit in data[\"hits\"]])\n"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "8e5e3b4e",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "With the data set up, we create a vector store for the data and a query engine for it.\n",
			
 
				+    "\n",
			
 
				+    "For our embeddings we will use `OctoAIEmbeddings` whose default embedding model is GTE-Large. This model provides a good balance between speed and performance.\n",
			
 
				+    "\n",
			
 
				+    "For more info see https://octoai.cloud/tools/text/embeddings?mode=demo&model=thenlper%2Fgte-large. "
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "a5de3080-2c4b-479c-baba-793b3bee36ed",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# use OctoAI embeddings \n",
			
 
				+    "from langchain_community.embeddings import OctoAIEmbeddings\n",
			
 
				+    "from llama_index.embeddings import LangchainEmbedding\n",
			
 
				+    "\n",
			
 
				+    "\n",
			
 
				+    "embeddings = LangchainEmbedding(OctoAIEmbeddings(\n",
			
 
				+    "    endpoint_url=\"https://text.octoai.run/v1/embeddings\"\n",
			
 
				+    "))\n",
			
 
				+    "print(embeddings)\n",
			
 
				+    "\n",
			
 
				+    "# create a ServiceContext instance to use Llama2 and custom embeddings\n",
			
 
				+    "service_context = ServiceContext.from_defaults(llm=llm, chunk_size=800, chunk_overlap=20, embed_model=embeddings)\n",
			
 
				+    "\n",
			
 
				+    "# create vector store index from the documents created above\n",
			
 
				+    "index = VectorStoreIndex.from_documents(documents, service_context=service_context)\n",
			
 
				+    "\n",
			
 
				+    "# create query engine from the index\n",
			
 
				+    "query_engine = index.as_query_engine(streaming=False)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "2c4ea012",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We are now ready to ask Llama 2 a question about the live data using our query engine."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "de91a191-d0f2-498e-88dc-b2b43423e0e5",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# ask Llama2 a summary question about the search result\n",
			
 
				+    "response = query_engine.query(\"give me a summary\")\n",
			
 
				+    "print(str(response))"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "72814b20-06aa-4da8-b4dd-f0b0d74a2ea0",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# more questions\n",
			
 
				+    "print(str(query_engine.query(\"what products were announced\")))"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "a65bc037-a689-476d-b529-0059a27bc949",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "print(str(query_engine.query(\"tell me more about Meta AI assistant\")))"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "16a56542",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "print(str(query_engine.query(\"what are Generative AI stickers\")))"
			
 
				+   ]
			
 
				+  }
			
 
				+ ],
			
 
				+ "metadata": {
			
 
				+  "kernelspec": {
			
 
				+   "display_name": "Python 3 (ipykernel)",
			
 
				+   "language": "python",
			
 
				+   "name": "python3"
			
 
				+  },
			
 
				+  "language_info": {
			
 
				+   "codemirror_mode": {
			
 
				+    "name": "ipython",
			
 
				+    "version": 3
			
 
				+   },
			
 
				+   "file_extension": ".py",
			
 
				+   "mimetype": "text/x-python",
			
 
				+   "name": "python",
			
 
				+   "nbconvert_exporter": "python",
			
 
				+   "pygments_lexer": "ipython3",
			
 
				+   "version": "3.11.6"
			
 
				+  }
			
 
				+ },
			
 
				+ "nbformat": 4,
			
 
				+ "nbformat_minor": 5
			
 
				+}
			
--- a/demo_apps/OctoAI_API_examples/Llama2_Gradio.ipynb
+++ b/demo_apps/OctoAI_API_examples/Llama2_Gradio.ipynb
@@ -0,0 +1,120 @@
 
				+{
			
 
				+ "cells": [
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "47a9adb3",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "## This demo app shows how to query Llama 2 using the Gradio UI.\n",
			
 
				+    "\n",
			
 
				+    "Since we are using OctoAI in this example, you'll need to obtain an OctoAI token:\n",
			
 
				+    "\n",
			
 
				+    "- You will need to first sign into [OctoAI](https://octoai.cloud/) with your Github or Google account\n",
			
 
				+    "- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first)\n",
			
 
				+    "\n",
			
 
				+    "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI.\n",
			
 
				+    "\n",
			
 
				+    "To run this example:\n",
			
 
				+    "- Run the notebook\n",
			
 
				+    "- Set up your OCTOAI API token and enter it when prompted\n",
			
 
				+    "- Enter your question and click Submit\n",
			
 
				+    "\n",
			
 
				+    "In the notebook or a browser with URL http://127.0.0.1:7860 you should see a UI with your answer.\n",
			
 
				+    "\n",
			
 
				+    "Let's start by installing the necessary packages:\n",
			
 
				+    "- langchain provides necessary RAG tools for this demo\n",
			
 
				+    "- octoai-sdk allows us to use OctoAI Llama 2 endpoint\n",
			
 
				+    "- gradio is used for the UI elements\n",
			
 
				+    "\n",
			
 
				+    "And setting up the OctoAI token."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "6ae4f858-6ef7-49d9-b45b-1ef79d0217a0",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "!pip install langchain octoai-sdk gradio"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "3306c11d-ed82-41c5-a381-15fb5c07d307",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from getpass import getpass\n",
			
 
				+    "import os\n",
			
 
				+    "\n",
			
 
				+    "OCTOAI_API_TOKEN = getpass()\n",
			
 
				+    "os.environ[\"OCTOAI_API_TOKEN\"] = OCTOAI_API_TOKEN"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "928041cc",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.schema import AIMessage, HumanMessage\n",
			
 
				+    "import gradio as gr\n",
			
 
				+    "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
			
 
				+    "\n",
			
 
				+    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
			
 
				+    "\n",
			
 
				+    "llm = OctoAIEndpoint(\n",
			
 
				+    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
			
 
				+    "    model_kwargs={\n",
			
 
				+    "        \"model\": llama2_13b,\n",
			
 
				+    "        \"messages\": [\n",
			
 
				+    "            {\n",
			
 
				+    "                \"role\": \"system\",\n",
			
 
				+    "                \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
			
 
				+    "            }\n",
			
 
				+    "        ],\n",
			
 
				+    "        \"max_tokens\": 500,\n",
			
 
				+    "        \"top_p\": 1,\n",
			
 
				+    "        \"temperature\": 0.01\n",
			
 
				+    "    },\n",
			
 
				+    ")\n",
			
 
				+    "\n",
			
 
				+    "\n",
			
 
				+    "def predict(message, history):\n",
			
 
				+    "    history_langchain_format = []\n",
			
 
				+    "    for human, ai in history:\n",
			
 
				+    "        history_langchain_format.append(HumanMessage(content=human))\n",
			
 
				+    "        history_langchain_format.append(AIMessage(content=ai))\n",
			
 
				+    "    history_langchain_format.append(HumanMessage(content=message))\n",
			
 
				+    "    llm_response = llm(message, history_langchain_format)\n",
			
 
				+    "    return llm_response.content\n",
			
 
				+    "\n",
			
 
				+    "gr.ChatInterface(predict).launch()"
			
 
				+   ]
			
 
				+  }
			
 
				+ ],
			
 
				+ "metadata": {
			
 
				+  "kernelspec": {
			
 
				+   "display_name": "Python 3 (ipykernel)",
			
 
				+   "language": "python",
			
 
				+   "name": "python3"
			
 
				+  },
			
 
				+  "language_info": {
			
 
				+   "codemirror_mode": {
			
 
				+    "name": "ipython",
			
 
				+    "version": 3
			
 
				+   },
			
 
				+   "file_extension": ".py",
			
 
				+   "mimetype": "text/x-python",
			
 
				+   "name": "python",
			
 
				+   "nbconvert_exporter": "python",
			
 
				+   "pygments_lexer": "ipython3",
			
 
				+   "version": "3.11.6"
			
 
				+  }
			
 
				+ },
			
 
				+ "nbformat": 4,
			
 
				+ "nbformat_minor": 5
			
 
				+}
			
--- a/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/RAG_Chatbot_Example.ipynb
+++ b/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/RAG_Chatbot_Example.ipynb
--- a/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/data/Llama
+++ b/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/data/Llama
--- a/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/requirements.txt
+++ b/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/requirements.txt
@@ -0,0 +1,7 @@
 
				+gradio==4.16.0
			
 
				+pypdf==4.0.0
			
 
				+langchain==0.1.7
			
 
				+sentence-transformers==2.2.2
			
 
				+faiss-cpu==1.7.4
			
 
				+text-generation==0.6.1
			
 
				+octoai-sdk==0.8.3
			
--- a/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/vectorstore/db_faiss/index.faiss
+++ b/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/vectorstore/db_faiss/index.faiss
--- a/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/vectorstore/db_faiss/index.pkl
+++ b/demo_apps/OctoAI_API_examples/RAG_Chatbot_example/vectorstore/db_faiss/index.pkl
--- a/demo_apps/OctoAI_API_examples/VideoSummary.ipynb
+++ b/demo_apps/OctoAI_API_examples/VideoSummary.ipynb
@@ -0,0 +1,383 @@
 
				+{
			
 
				+ "cells": [
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "30b1235c-2f3e-4628-9c90-30385f741550",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "## This demo app shows:\n",
			
 
				+    "* How to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video\n",
			
 
				+    "* How to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method\n",
			
 
				+    "* How to bypass the limit of Llama's max input token size by using a more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "c866f6be",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We start by installing the necessary packages:\n",
			
 
				+    "- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) API to get transcript/subtitles of a YouTube video\n",
			
 
				+    "- [langchain](https://python.langchain.com/docs/get_started/introduction) provides necessary RAG tools for this demo\n",
			
 
				+    "- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer\n",
			
 
				+    "- [pytube](https://pytube.io/en/latest/) Utility for downloading YouTube videos\n",
			
 
				+    "\n",
			
 
				+    "**Note** This example uses OctoAI to host the Llama model. If you have not set up/or used OctoAI before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up OctoAI before continuing with this example.\n",
			
 
				+    "If you do not want to use OctoAI, you will need to make some changes to this notebook as you go along."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "02482167",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "!pip install langchain octoai-sdk youtube-transcript-api tiktoken pytube"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "af3069b1",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Let's load the YouTube video transcript using the YoutubeLoader."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "3e4b8598",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.document_loaders import YoutubeLoader\n",
			
 
				+    "\n",
			
 
				+    "loader = YoutubeLoader.from_youtube_url(\n",
			
 
				+    "    \"https://www.youtube.com/watch?v=1k37OcjH7BM\", add_video_info=True\n",
			
 
				+    ")"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "dca32ebb",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# load the youtube video caption into Documents\n",
			
 
				+    "docs = loader.load()"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "afba128f-b7fd-4b2f-873f-9b5163455d54",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# check the docs length and content\n",
			
 
				+    "len(docs[0].page_content), docs[0].page_content[:300]"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "4af7cc16",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "We are using OctoAI in this example to host our Llama 2 model so you will need to get a OctoAI token.\n",
			
 
				+    "\n",
			
 
				+    "To get the OctoAI token:\n",
			
 
				+    "\n",
			
 
				+    "- You will need to first sign in with OctoAI with your github account\n",
			
 
				+    "- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first)\n",
			
 
				+    "\n",
			
 
				+    "**Note** After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI.\n",
			
 
				+    "\n",
			
 
				+    "Alternatively, you can run Llama locally. See:\n",
			
 
				+    "- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "ab3ac00e",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# enter your OctoAI API token, or you can use local Llama. See README for more info\n",
			
 
				+    "from getpass import getpass\n",
			
 
				+    "import os\n",
			
 
				+    "\n",
			
 
				+    "OCTOAI_API_TOKEN = getpass()\n",
			
 
				+    "os.environ[\"OCTOAI_API_TOKEN\"] = OCTOAI_API_TOKEN"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "6b911efd",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Next we call the Llama 2 model from OctoAI. In this example we will use the Llama 2 13b chat FP16 model. You can find more on Llama 2 models on the [OctoAI text generation solution page](https://octoai.cloud/tools/text).\n",
			
 
				+    "\n",
			
 
				+    "At the time of writing this notebook the following Llama models are available on OctoAI:\n",
			
 
				+    "* llama-2-13b-chat\n",
			
 
				+    "* llama-2-70b-chat\n",
			
 
				+    "* codellama-7b-instruct\n",
			
 
				+    "* codellama-13b-instruct\n",
			
 
				+    "* codellama-34b-instruct\n",
			
 
				+    "* codellama-70b-instruct\n",
			
 
				+    "\n",
			
 
				+    "If you using local Llama, just set llm accordingly - see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "adf8cf3d",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
			
 
				+    "\n",
			
 
				+    "llama2_13b = \"llama-2-13b-chat-fp16\"\n",
			
 
				+    "llm = OctoAIEndpoint(\n",
			
 
				+    "    endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n",
			
 
				+    "    model_kwargs={\n",
			
 
				+    "        \"model\": llama2_13b,\n",
			
 
				+    "        \"messages\": [\n",
			
 
				+    "            {\n",
			
 
				+    "                \"role\": \"system\",\n",
			
 
				+    "                \"content\": \"You are a helpful, respectful and honest assistant.\"\n",
			
 
				+    "            }\n",
			
 
				+    "        ],\n",
			
 
				+    "        \"max_tokens\": 500,\n",
			
 
				+    "        \"top_p\": 1,\n",
			
 
				+    "        \"temperature\": 0.01\n",
			
 
				+    "    },\n",
			
 
				+    ")"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "8e3baa56",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "51739e11",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.prompts import ChatPromptTemplate\n",
			
 
				+    "from langchain.chains import LLMChain\n",
			
 
				+    "prompt = ChatPromptTemplate.from_template(\n",
			
 
				+    "    \"Give me a summary of the text below: {text}?\"\n",
			
 
				+    ")\n",
			
 
				+    "chain = LLMChain(llm=llm, prompt=prompt)\n",
			
 
				+    "# be careful of the input text length sent to LLM\n",
			
 
				+    "text = docs[0].page_content[:4000]\n",
			
 
				+    "summary = chain.run(text)\n",
			
 
				+    "# this is the summary of the first 4000 characters of the video content\n",
			
 
				+    "print(summary)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "8b684b29",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "Next we try to summarize all the content of the transcript and we should get a `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "88a2c17f",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# try to get a summary of the whole content\n",
			
 
				+    "text = docs[0].page_content\n",
			
 
				+    "summary = chain.run(text)\n",
			
 
				+    "print(summary)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "1ad1881a",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "\n",
			
 
				+    "Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.\n",
			
 
				+    "\n",
			
 
				+    "We will use the LangChain's `load_summarize_chain` and play around with the `chain_type`.\n"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "9bfee2d3-3afe-41d9-8968-6450cc23f493",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.chains.summarize import load_summarize_chain\n",
			
 
				+    "# see https://python.langchain.com/docs/use_cases/summarization for more info\n",
			
 
				+    "chain = load_summarize_chain(llm, chain_type=\"stuff\") # other supported methods are map_reduce and refine\n",
			
 
				+    "chain.run(docs)\n",
			
 
				+    "# same RuntimeError: Your input is too long. but stuff works for shorter text with input length <= 4096 tokens"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "682799a8-3846-41b1-a908-02ab5ac3ecee",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
			
 
				+    "# still get the \"RuntimeError: Your input is too long. Max input length is 4096 tokens\"\n",
			
 
				+    "chain.run(docs)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "aecf6328",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "\n",
			
 
				+    "Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` to iteratively create an answer."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "3be1236a-fe6a-4bf6-983f-0e72dde39fee",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
			
 
				+    "\n",
			
 
				+    "# we need to split the long input text\n",
			
 
				+    "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
			
 
				+    "    chunk_size=3000, chunk_overlap=0\n",
			
 
				+    ")\n",
			
 
				+    "split_docs = text_splitter.split_documents(docs)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "12ae9e9d-3434-4a84-a298-f2b98de9ff01",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# check the splitted docs lengths\n",
			
 
				+    "len(split_docs), len(docs), len(split_docs[0].page_content), len(docs[0].page_content)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "127f17fe-d5b7-43af-bd2f-2b47b076d0b1",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# now get the summary of the whole docs - the whole youtube content\n",
			
 
				+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
			
 
				+    "print(str(chain.run(split_docs)))"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "c3976c92",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "You can also use [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type` to implement a map reduce like architecture while summarizing the documents."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "8991df49-8578-46de-8b30-cb2cd11e30f1",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# another method is map_reduce\n",
			
 
				+    "chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
			
 
				+    "print(str(chain.run(split_docs)))"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "77d580de",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "To investigate further, let's turn on Langchain's debug mode on to get an idea of how many calls are made to the model and the details of the inputs and outputs.\n",
			
 
				+    "We will then run our summary using the `stuff` and `refine` `chain_types` and take a look at our output."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "f2138911-d2b9-41f3-870f-9bc37e2043d9",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# to find how many calls to Llama have been made and the details of inputs and outputs of each call, set langchain to debug\n",
			
 
				+    "import langchain\n",
			
 
				+    "langchain.debug = True\n",
			
 
				+    "\n",
			
 
				+    "# stuff method will cause the error in the end\n",
			
 
				+    "chain = load_summarize_chain(llm, chain_type=\"stuff\")\n",
			
 
				+    "chain.run(split_docs)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "id": "60d1a531-ab48-45cc-a7de-59a14e18240d",
			
 
				+   "metadata": {},
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# but refine works\n",
			
 
				+    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
			
 
				+    "chain.run(split_docs)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "id": "61ccd0fb-5cdb-43c4-afaf-05bc9f7cf959",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "\n",
			
 
				+    "As you can see, `stuff` fails because it tries to treat all the split documents as one and \"stuffs\" it into one prompt which leads to a much larger prompt than Llama 2 can handle while `refine` iteratively runs over the documents updating its answer as it goes."
			
 
				+   ]
			
 
				+  }
			
 
				+ ],
			
 
				+ "metadata": {
			
 
				+  "kernelspec": {
			
 
				+   "display_name": "Python 3 (ipykernel)",
			
 
				+   "language": "python",
			
 
				+   "name": "python3"
			
 
				+  },
			
 
				+  "language_info": {
			
 
				+   "codemirror_mode": {
			
 
				+    "name": "ipython",
			
 
				+    "version": 3
			
 
				+   },
			
 
				+   "file_extension": ".py",
			
 
				+   "mimetype": "text/x-python",
			
 
				+   "name": "python",
			
 
				+   "nbconvert_exporter": "python",
			
 
				+   "pygments_lexer": "ipython3",
			
 
				+   "version": "3.11.6"
			
 
				+  }
			
 
				+ },
			
 
				+ "nbformat": 4,
			
 
				+ "nbformat_minor": 5
			
 
				+}
			
--- a/demo_apps/README.md
+++ b/demo_apps/README.md
--- a/examples/Purple_Llama_OctoAI.ipynb
+++ b/examples/Purple_Llama_OctoAI.ipynb
@@ -0,0 +1,289 @@
 
				+{
			
 
				+ "cells": [
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "metadata": {
			
 
				+    "id": "LERqQn5v8-ak"
			
 
				+   },
			
 
				+   "source": [
			
 
				+    "# **Purple Llama Using OctoAI**\n",
			
 
				+    "\n",
			
 
				+    "Drawing inspiration from the cybersecurity concept of \"purple teaming,\" Purple Llama embraces both offensive (red team) and defensive (blue team) strategies. Our goal is to empower developers in deploying generative AI models responsibly, aligning with best practices outlined in our Responsible Use Guide."
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "metadata": {
			
 
				+    "id": "PGPSI3M5PGTi"
			
 
				+   },
			
 
				+   "source": [
			
 
				+    "#### **1 - What is Purple Llama?**\n",
			
 
				+    "\n",
			
 
				+    "Purple Llama is a an umbrella project that over time will bring together tools and evals to help the community build responsibly with open generative AI models. The initial release will include tools and evals for Cyber Security and Input/Output safeguards but we plan to contribute more in the near future.\n",
			
 
				+    "\n",
			
 
				+    "* Instruction tuned on Llama2-7b model\n",
			
 
				+    "* [CyberSecurity Evals](https://github.com/facebookresearch/PurpleLlama/tree/main/CybersecurityBenchmarks_)\n",
			
 
				+    "* [Llama Guard Model](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/)\n",
			
 
				+    "* [Download Llama Guard](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)\n",
			
 
				+    "* [Purple Llama Website](https://ai.meta.com/llama/purple-llama/)\n",
			
 
				+    "* [Purple Llama Github Repo](https://github.com/facebookresearch/PurpleLlama)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "metadata": {
			
 
				+    "id": "aYeHVVh45bdT"
			
 
				+   },
			
 
				+   "source": [
			
 
				+    "#### **2 - Accessing Purple Llama**\n",
			
 
				+    "* Download + Self Host (i.e. [download Purple Llama](https://ai.meta.com/resources/models-and-libraries/llama-downloads/))\n",
			
 
				+    "* Hosted API Platform (e.g. [OctoAI](https://octoai.cloud/), [Anyscale](https://www.anyscale.com/), [Together](https://api.together.xyz/playground/chat/togethercomputer/llama-2-7b-chat), [Replicate](https://replicate.com/meta))\n",
			
 
				+    "* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "metadata": {
			
 
				+    "id": "sd54g0OHuqBY"
			
 
				+   },
			
 
				+   "source": [
			
 
				+    "#### **3 - Using Purple Llama**\n",
			
 
				+    "\n",
			
 
				+    "In this notebook, We will use the Llama Guard model managed by the [OctoAI](https://octoai.cloud/tools/text) for inferencing. You'll need to first register an account with OctoAI [here](https://octoai.cloud/) then obtain an OctoAI API token [here](https://docs.octoai.cloud/docs/how-to-create-an-octoai-access-token), that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first).\n"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "metadata": {
			
 
				+    "id": "C4P-b0eQcyd2"
			
 
				+   },
			
 
				+   "source": [
			
 
				+    "First, set the OctoAI API token as environment variables.\n",
			
 
				+    "\n"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "metadata": {
			
 
				+    "colab": {
			
 
				+     "base_uri": "https://localhost:8080/"
			
 
				+    },
			
 
				+    "id": "yE3sPjS-cyd2",
			
 
				+    "outputId": "93b36bc0-e6d4-493c-c88d-ec5c41266239"
			
 
				+   },
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from getpass import getpass\n",
			
 
				+    "import os\n",
			
 
				+    "\n",
			
 
				+    "# enter your OctoAI API key\n",
			
 
				+    "OCTOAI_API_TOKEN = getpass()\n",
			
 
				+    "os.environ[\"OCTOAI_API_TOKEN\"] = OCTOAI_API_TOKEN"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "metadata": {
			
 
				+    "id": "DOSiDW6hq9dI"
			
 
				+   },
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "from string import Template\n",
			
 
				+    "\n",
			
 
				+    "B_INST, E_INST = \"[INST]\", \"[/INST]\"\n",
			
 
				+    "\n",
			
 
				+    "PROMPT_TEMPLATE = Template(\"\"\"Task: Check if there is unsafe content in '$agent_type' messages in conversations according our safety policy with the below categories.\n",
			
 
				+    "<BEGIN UNSAFE CONTENT CATEGORIES>\n",
			
 
				+    "O1: Violence and Hate.\n",
			
 
				+    "Should not\n",
			
 
				+    "- Help people plan or engage in violence.\n",
			
 
				+    "- Encourage people to engage in violence.\n",
			
 
				+    "- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.\n",
			
 
				+    "- Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.\n",
			
 
				+    "Can\n",
			
 
				+    "-  Provide information on violence and discrimination.\n",
			
 
				+    "- Discuss the topics of hate and violence, and can discuss historical events involving violence.\n",
			
 
				+    "O2: Sexual Content.\n",
			
 
				+    "Should not\n",
			
 
				+    "- Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.\n",
			
 
				+    "Can\n",
			
 
				+    "- Discuss topics related to sexuality and sexual education.\n",
			
 
				+    "- Provide general information regarding sexual activities.\n",
			
 
				+    "- Provide guidance on topics related to sexual health.\n",
			
 
				+    "O3: Criminal Planning.\n",
			
 
				+    "Should not\n",
			
 
				+    "- Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.\n",
			
 
				+    "- Encourage people to engage in criminal activities or condone criminal behavior.\n",
			
 
				+    "Can\n",
			
 
				+    "- Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.\n",
			
 
				+    "- Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.\n",
			
 
				+    "O4: Guns and Illegal Weapons.\n",
			
 
				+    "Should not\n",
			
 
				+    "- Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.\n",
			
 
				+    "- Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.\n",
			
 
				+    "Can\n",
			
 
				+    "- Discuss firearms and the arguments for and against firearm ownership.\n",
			
 
				+    "O5: Regulated or Controlled Substances.\n",
			
 
				+    "Should not\n",
			
 
				+    "- Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.\n",
			
 
				+    "- Assist or encourage people to create such substances when it is illegal to do so.\n",
			
 
				+    "Can\n",
			
 
				+    "- Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are).\n",
			
 
				+    "- Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.\n",
			
 
				+    "O6: Self-Harm.\n",
			
 
				+    "Should not\n",
			
 
				+    "- Encourage people to harm themselves.\n",
			
 
				+    "- Romanticize or condone self-harm or suicide.\n",
			
 
				+    "- Provide information regarding the methods of suicide or self-harm.\n",
			
 
				+    "- Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.\n",
			
 
				+    "Can\n",
			
 
				+    "- Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).\n",
			
 
				+    "Should\n",
			
 
				+    "- Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.\n",
			
 
				+    "<END UNSAFE CONTENT CATEGORIES>\n",
			
 
				+    "\n",
			
 
				+    "<BEGIN CONVERSATION>\n",
			
 
				+    "\n",
			
 
				+    "$prompt\n",
			
 
				+    "\n",
			
 
				+    "<END CONVERSATION>\n",
			
 
				+    "\n",
			
 
				+    "Provide your safety assessment for $agent_type in the above conversation:\n",
			
 
				+    "- First line must read 'safe' or 'unsafe'.\n",
			
 
				+    "- If unsafe, a second line must include a comma-separated list of violated categories.\"\"\")\n",
			
 
				+    "\n",
			
 
				+    "\n",
			
 
				+    "def format_prompt(role, message):\n",
			
 
				+    "    prompt = PROMPT_TEMPLATE.substitute(prompt=message, agent_type=role)\n",
			
 
				+    "    prompt = f\"<s>{B_INST} {prompt.strip()} {E_INST}\"\n",
			
 
				+    "    return prompt\n"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "metadata": {
			
 
				+    "colab": {
			
 
				+     "base_uri": "https://localhost:8080/"
			
 
				+    },
			
 
				+    "id": "t6hkFlVD9XFw",
			
 
				+    "outputId": "25fd187e-a484-4b90-d104-a3320b98e8ea"
			
 
				+   },
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "%pip install openai"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "metadata": {
			
 
				+    "colab": {
			
 
				+     "base_uri": "https://localhost:8080/"
			
 
				+    },
			
 
				+    "id": "xC-OYLF4rGc7",
			
 
				+    "outputId": "452ca85a-fd9f-426c-d4ab-7898bdabfab8"
			
 
				+   },
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "import openai\n",
			
 
				+    "\n",
			
 
				+    "# Feed a prompt that does comply with the safety policy\n",
			
 
				+    "\n",
			
 
				+    "client = openai.OpenAI(\n",
			
 
				+    "   base_url = \"https://text.octoai.run/v1\",\n",
			
 
				+    "   api_key=OCTOAI_API_TOKEN\n",
			
 
				+    ")\n",
			
 
				+    "completion = client.completions.create(\n",
			
 
				+    "    model=\"llamaguard-7b-fp16\",\n",
			
 
				+    "    prompt=format_prompt(\"User\",\"Tell me about the history of racism in the US\")\n",
			
 
				+    ")\n",
			
 
				+    "print(completion.choices[0].text)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "code",
			
 
				+   "execution_count": null,
			
 
				+   "metadata": {
			
 
				+    "colab": {
			
 
				+     "base_uri": "https://localhost:8080/"
			
 
				+    },
			
 
				+    "id": "JtBTkJRxufb0",
			
 
				+    "outputId": "c581052d-348e-45ac-c874-80ecf4416a77"
			
 
				+   },
			
 
				+   "outputs": [],
			
 
				+   "source": [
			
 
				+    "# Feed a prompt that doesn't comply with the safety policy\n",
			
 
				+    "# Violates O3: Criminal Planning.\n",
			
 
				+    "\n",
			
 
				+    "client = openai.OpenAI(\n",
			
 
				+    "   base_url = \"https://text.octoai.run/v1\",\n",
			
 
				+    "   api_key=OCTOAI_API_TOKEN\n",
			
 
				+    ")\n",
			
 
				+    "completion = client.completions.create(\n",
			
 
				+    "    model=\"llamaguard-7b-fp16\",\n",
			
 
				+    "    prompt=format_prompt(\"Agent\", \"User: How do you buy a tiger in the US\\n Agent: Steal one from the zoo\")\n",
			
 
				+    ")\n",
			
 
				+    "print(completion.choices[0].text)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "#### **Resources <TBD>**\n",
			
 
				+    "- [Llama 2](https://ai.meta.com/llama/)\n",
			
 
				+    "- [Getting Started Guide - Llama 2](https://ai.meta.com/llama/get-started/)\n",
			
 
				+    "- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n",
			
 
				+    "- [Github - LLama 2 Recipes](https://github.com/facebookresearch/llama-recipes) and [Llama 2 Demo Apps](https://github.com/facebookresearch/llama-recipes/tree/main/demo_apps)\n",
			
 
				+    "- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n",
			
 
				+    "- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n",
			
 
				+    "- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n",
			
 
				+    "- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n",
			
 
				+    "- [OctoAI](https://octoai.cloud/)\n",
			
 
				+    "- [LangChain](https://www.langchain.com/)\n",
			
 
				+    "- [LlamaIndex](https://www.llamaindex.ai/)"
			
 
				+   ]
			
 
				+  },
			
 
				+  {
			
 
				+   "cell_type": "markdown",
			
 
				+   "metadata": {},
			
 
				+   "source": [
			
 
				+    "#### **Authors**\n",
			
 
				+    "1. Hakan Inan, Research Scientist, Meta\n",
			
 
				+    "2. Rashi Rungta, Software Engineer, Meta\n",
			
 
				+    "\n",
			
 
				+    "Ported to use OctoAI LlamaGuard endpoints by Thierry Moreau, OctoAI"
			
 
				+   ]
			
 
				+  }
			
 
				+ ],
			
 
				+ "metadata": {
			
 
				+  "colab": {
			
 
				+   "gpuType": "T4",
			
 
				+   "include_colab_link": true,
			
 
				+   "provenance": [],
			
 
				+   "toc_visible": true
			
 
				+  },
			
 
				+  "kernelspec": {
			
 
				+   "display_name": "Python 3 (ipykernel)",
			
 
				+   "language": "python",
			
 
				+   "name": "python3"
			
 
				+  },
			
 
				+  "language_info": {
			
 
				+   "codemirror_mode": {
			
 
				+    "name": "ipython",
			
 
				+    "version": 3
			
 
				+   },
			
 
				+   "file_extension": ".py",
			
 
				+   "mimetype": "text/x-python",
			
 
				+   "name": "python",
			
 
				+   "nbconvert_exporter": "python",
			
 
				+   "pygments_lexer": "ipython3",
			
 
				+   "version": "3.11.6"
			
 
				+  }
			
 
				+ },
			
 
				+ "nbformat": 4,
			
 
				+ "nbformat_minor": 4
			
 
				+}
			
--- a/examples/README.md
+++ b/examples/README.md
@@ -13,7 +13,7 @@ python examples/finetuning.py <parameters>
 
				 ```
			
 
				 Please see [README.md](../README.md) for details.
			
 
				 
			
 
				-## Inference 
			
 
				+## Inference
			
 
				 So far, we have provide the following inference examples:
			
 
				 
			
 
				 1. [inference script](./inference.py) script provides support for Hugging Face accelerate, PEFT and FSDP fine tuned models. It also demonstrates safety features to protect the user from toxic or harmful content.
			
@@ -26,7 +26,7 @@ So far, we have provide the following inference examples:
 
				 
			
 
				 5. [Code Llama](./code_llama/) folder which provides examples for [code completion](./code_llama/code_completion_example.py), [code infilling](./code_llama/code_infilling_example.py) and [Llama2 70B code instruct](./code_llama/code_instruct_example.py).
			
 
				 
			
 
				-6. The [Purple Llama Using Anyscale](./Purple_Llama_Anyscale.ipynb) is a notebook that shows how to use Anyscale hosted Llama Guard model to classify user inputs as safe or unsafe.
			
 
				+6. The [Purple Llama Using Anyscale](./Purple_Llama_Anyscale.ipynb) and the [Purple Llama Using OctoAI](./Purple_Llama_OctoAI.ipynb) are notebooks that shows how to use Llama Guard model on Anyscale and OctoAI to classify user inputs as safe or unsafe.
			
 
				 
			
 
				 7. [Llama Guard](./llama_guard/) inference example and [safety_checker](../src/llama_recipes/inference/safety_utils.py) for the main [inference](./inference.py) script. The standalone scripts allows to test Llama Guard on user input, or user input and agent response pairs. The safety_checker integration providers a way to integrate Llama Guard on all inference executions, both for the user input and model output.