
Update LiveData.ipynb

sekyonda 1 year ago
parent
commit
6008742df1
1 changed file with 93 additions and 16 deletions
demo_apps/LiveData.ipynb

@@ -6,9 +6,23 @@
    "metadata": {},
    "source": [
     "## This demo app shows:\n",
-    "* how to use LlamaIndex, an open source library to help you build custom data augmented LLM applications;\n",
-    "* how to ask Llama questions about recent live data via the You.com live search API and LlamaIndex.\n",
-    "The LangChain package is used to facilitate the call to Llama2 hosted on Replicate."
+    "* How to use LlamaIndex, an open source library to help you build custom data augmented LLM applications.\n",
+    "* How to ask Llama questions about recent live data via the You.com live search API and LlamaIndex.\n",
+    "\n",
+    "The LangChain package is used to facilitate the call to Llama2 hosted on Replicate.\n",
+    "\n",
+    "**Note:** We will be using Replicate to run the examples here. You will first need to sign in to Replicate with your GitHub account, then create a free API token [here](https://replicate.com/account/api-tokens).\n",
+    "After the free trial ends, you will need to enter billing information to continue using Llama2 hosted on Replicate."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "68cf076e",
+   "metadata": {},
+   "source": [
+    "We start by installing the necessary packages:\n",
+    "- [langchain](https://python.langchain.com/docs/get_started/introduction), which provides RAG capabilities\n",
+    "- [llama-index](https://docs.llamaindex.ai/en/stable/) for data augmentation."
    ]
   },
   {
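The two packages named in the new markdown cell can be installed with pip. A sketch of that setup step (the notebook's own install cell, if any, falls outside this diff's hunks):

```shell
# Install the two libraries the notebook depends on:
# langchain for the Replicate LLM call, llama-index for data augmentation.
pip install langchain llama-index
```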
@@ -37,6 +51,14 @@
     "from langchain.llms import Replicate"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "73e8e661",
+   "metadata": {},
+   "source": [
+    "Next we set up the Replicate token."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -44,7 +66,7 @@
    "metadata": {},
    "outputs": [
     {
-     "name": "stdin",
+     "name": "stdout",
      "output_type": "stream",
      "text": [
       " ········\n"
@@ -59,6 +81,15 @@
     "os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "f8ff812b",
+   "metadata": {},
+   "source": [
+    "In this example we will use the [You.com](https://you.com/) search engine to augment the LLM's responses.\n",
+    "To use the You.com Search API, you can email api@you.com to request an API key."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 3,
@@ -66,7 +97,7 @@
    "metadata": {},
    "outputs": [
     {
-     "name": "stdin",
+     "name": "stdout",
      "output_type": "stream",
      "text": [
       " ········\n"
@@ -74,12 +105,20 @@
     }
    ],
    "source": [
-    "# You.com’s Search API is used to augment LLM responses with the snippets that contains rich info on the query results\n",
-    "# To use the You.com Search API, you can email api@you.com to request an API key. \n",
+    "\n",
     "YOUCOM_API_KEY = getpass()\n",
     "os.environ[\"YOUCOM_API_KEY\"] = YOUCOM_API_KEY"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "cb210c7c",
+   "metadata": {},
+   "source": [
+    "We then call the Llama 2 model from Replicate. In this example we will use the Llama 2 13b chat model. You can find more Llama 2 models by searching on the [Replicate model explore page](https://replicate.com/explore?query=llama).\n",
+    "You can add them here in the format: `model_name/version`."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -96,6 +135,14 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "476d72da",
+   "metadata": {},
+   "source": [
+    "Using the API key we set up earlier, we make a request to You.com for live data on a particular topic."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 5,
@@ -103,7 +150,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# chat about live data by first making an API request to the You.com search API\n",
+    "\n",
     "import requests\n",
     "\n",
     "query = \"Meta Connect\" # you can try other live data query about sports score, stock market and weather info \n",
@@ -248,6 +295,23 @@
     "print(json.dumps(data, indent=2))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "b196e697",
+   "metadata": {},
+   "source": [
+    "We then use the [JSONLoader](https://llamahub.ai/l/file-json) to extract the text from the returned data. The JSONLoader gives us the ability to load the data into LlamaIndex.\n",
+    "In this example we show how to load the JSON result with the key info stored as \"snippets\".\n",
+    "\n",
+    "You can also add the snippets in the query result to documents for example:\n",
+    "```python \n",
+    "from llama_index import Document\n",
+    "snippets = [snippet for hit in data[\"hits\"] for snippet in hit[\"snippets\"]]\n",
+    "documents = [Document(text=s) for s in snippets]\n",
+    "```\n",
+    "This can be handy if you just need to add a list of text strings to documents."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 7,
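The snippet-flattening pattern shown in the markdown cell above can be exercised on a toy response. The `data` dict here is made-up sample data mirroring only the "hits"/"snippets" keys the notebook relies on, not a real You.com API response:

```python
# Made-up sample of the You.com response shape, trimmed to the
# "hits"/"snippets" keys the notebook uses.
data = {
    "hits": [
        {"snippets": ["Meta Connect is Meta's annual developer event.",
                      "It covers news about VR, AR, and AI."]},
        {"snippets": ["Quest 3 was announced at Connect 2023."]},
    ]
}

# Flatten every snippet across all hits into a single list of strings,
# ready to be wrapped in llama_index Document objects.
snippets = [snippet for hit in data["hits"] for snippet in hit["snippets"]]
print(snippets)
```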
@@ -260,12 +324,19 @@
     "\n",
     "JsonDataReader = download_loader(\"JsonDataReader\")\n",
     "loader = JsonDataReader()\n",
-    "documents = loader.load_data([hit[\"snippets\"] for hit in data[\"hits\"]])\n",
+    "documents = loader.load_data([hit[\"snippets\"] for hit in data[\"hits\"]])\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8e5e3b4e",
+   "metadata": {},
+   "source": [
+    "With the data set up, we create a vector store for the data and a query engine for it.\n",
     "\n",
-    "# another way to add the snippets in the query result to documents - this can be handy if you just need to add a list of text strings to doc:\n",
-    "# from llama_index import Document\n",
-    "# snippets = [snippet for hit in data[\"hits\"] for snippet in hit[\"snippets\"]]\n",
-    "# documents = [Document(text=s) for s in snippets]"
+    "For our embeddings we will use `HuggingFaceEmbeddings`, whose default embedding model is `sentence-transformers/all-mpnet-base-v2`. This model provides a good balance between speed and performance.\n",
+    "To change the default, call `HuggingFaceEmbeddings(model_name=<another_embedding_model>)`.\n",
+    "For more info see https://huggingface.co/blog/mteb."
    ]
   },
   {
@@ -287,9 +358,7 @@
     "from langchain.embeddings.huggingface import HuggingFaceEmbeddings\n",
     "from llama_index import LangchainEmbedding\n",
     "\n",
-    "# the default embedding model is sentence-transformers/all-mpnet-base-v2, which provides a good balance between speed and performance\n",
-    "# to change the default one, call HuggingFaceEmbeddings(model_name=<another_embedding_model>). \n",
-    "# for more info see https://huggingface.co/blog/mteb. \n",
+    "\n",
     "embeddings = LangchainEmbedding(HuggingFaceEmbeddings())\n",
     "print(embeddings)\n",
     "\n",
@@ -303,6 +372,14 @@
     "query_engine = index.as_query_engine(streaming=True)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "2c4ea012",
+   "metadata": {},
+   "source": [
+    "We are now ready to ask Llama 2 a question about the live data using our query engine."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 9,