LiveData.ipynb

This demo app shows:

  • How to use LlamaIndex, an open-source library that helps you build custom data-augmented LLM applications
  • How to ask Llama questions about recent live data via the You.com live search API and LlamaIndex

The LangChain package is used to facilitate the call to Llama 2 hosted on Replicate.

Note: We will be using Replicate to run the examples here. You will need to first sign in to Replicate with your GitHub account, then create a free API token here that you can use for a while. After the free trial ends, you will need to enter billing info to continue to use Llama 2 hosted on Replicate.

We start by installing the necessary packages:

!pip install llama-index langchain
# use ServiceContext to configure the LLM used and the custom embeddings 
from llama_index import ServiceContext

# VectorStoreIndex is used to index custom data 
from llama_index import VectorStoreIndex

from langchain.llms import Replicate

Next we set up the Replicate token.

from getpass import getpass
import os

REPLICATE_API_TOKEN = getpass()
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN

In this example we will use the You.com search engine to augment the LLM's responses with live data. To use the You.com Search API, you can email api@you.com to request an API key.


YOUCOM_API_KEY = getpass()
os.environ["YOUCOM_API_KEY"] = YOUCOM_API_KEY

We then set up the Llama 2 model hosted on Replicate.

We will use the Llama 2 13B chat model. You can find more Llama 2 models on the Replicate explore page and add them here in the format model_owner/model_name:version (a hypothetical example follows the next cell).

# set llm to the Llama 2 13B chat model hosted on Replicate
llama2_13b_chat = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"

llm = Replicate(
    model=llama2_13b_chat,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens":500}
)
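
To try a different Llama 2 model from the explore page, swap the model string in the same format. A hypothetical sketch follows; the version hash is a placeholder you would copy from the model's page on Replicate:

# hypothetical: another Replicate-hosted Llama 2 model in the same owner/name:version format
# llama2_70b_chat = "meta/llama-2-70b-chat:<version-hash-from-replicate>"
# llm = Replicate(
#     model=llama2_70b_chat,
#     model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens": 500}
# )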

Using the API key we set up earlier, we make a request to You.com for live data on a particular topic.


import requests

query = "Meta Connect" # you can try other live data query about sports score, stock market and weather info 
headers = {"X-API-Key": os.environ["YOUCOM_API_KEY"]}
data = requests.get(
    f"https://api.ydc-index.io/search?query={query}",
    headers=headers,
).json()
# check the query result in JSON
import json

print(json.dumps(data, indent=2))
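
Before building documents, it helps to sanity check the shape of the response. The cells below assume the result has a "hits" list, each hit with a "snippets" list of strings; this quick check is a minimal sketch, and the field names may differ if the API response format changes.

# quick sanity check of the response structure used in the following cells
print("number of hits:", len(data["hits"]))
print("snippets in first hit:", len(data["hits"][0]["snippets"]))
print("first snippet preview:", data["hits"][0]["snippets"][0][:100])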

We then use the JsonDataReader loader to extract the text from the returned data and load it into LlamaIndex. A couple of cells below we show how to load the JSON result with the key info stored as "snippets".

However, you can also add the snippets in the query result to documents directly, like below:

from llama_index import Document
snippets = [snippet for hit in data["hits"] for snippet in hit["snippets"]]
documents = [Document(text=s) for s in snippets]

This can be handy if you just need to turn a list of text strings into documents.

# one way to load the JSON result with key info stored as "snippets"
from llama_index import download_loader

JsonDataReader = download_loader("JsonDataReader")
loader = JsonDataReader()
documents = loader.load_data([hit["snippets"] for hit in data["hits"]])

With the data set up, we create a vector store for the data and a query engine for it.

For our embeddings we will use HuggingFaceEmbeddings, whose default embedding model is sentence-transformers/all-mpnet-base-v2. This model provides a good balance between speed and performance. To change the default model, call HuggingFaceEmbeddings(model_name=<another_embedding_model>).

For more info see https://huggingface.co/blog/mteb.
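
For example, a hypothetical swap to a lighter sentence-transformers model would look like the sketch below (the model name is just an illustrative choice; we stick with the default in the next cell):

# optional sketch: use a smaller, faster embedding model instead of the default
# from langchain.embeddings.huggingface import HuggingFaceEmbeddings
# from llama_index import LangchainEmbedding
# embeddings = LangchainEmbedding(
#     HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# )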

# use HuggingFace embeddings 
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding


embeddings = LangchainEmbedding(HuggingFaceEmbeddings())
print(embeddings)

# create a ServiceContext instance to use Llama2 and custom embeddings
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=800, chunk_overlap=20, embed_model=embeddings)

# create vector store index from the documents created above
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# create query engine from the index
query_engine = index.as_query_engine(streaming=True)

We are now ready to ask Llama 2 a question about the live data using our query engine.

# ask Llama2 a summary question about the search result
response = query_engine.query("give me a summary")
response.print_response_stream()
# more questions
query_engine.query("what products were announced").print_response_stream()
query_engine.query("tell me more about Meta AI assistant").print_response_stream()
query_engine.query("what are Generative AI stickers").print_response_stream()