
## This demo app shows:
* How to use LlamaIndex, an open-source library that helps you build custom data-augmented LLM applications
* How to ask Llama 3 questions about recent live data via the [Tavily](https://tavily.com) live search API

In [None]:
!pip install llama-index 
!pip install llama-index-core
!pip install llama-index-llms-replicate
!pip install llama-index-embeddings-huggingface
!pip install tavily-python
!pip install replicate

You will use [Replicate](https://replicate.com/meta/meta-llama-3-8b-instruct) to run the examples here. First sign in to Replicate with your GitHub account, then create a free API token [here](https://replicate.com/account/api-tokens), which you can use for a limited time. You can also use other Llama 3 cloud providers such as [Groq](https://console.groq.com/), [Together](https://api.together.xyz/playground/language/meta-llama/Llama-3-8b-hf), or [Anyscale](https://app.endpoints.anyscale.com/playground). See Section 2 of the Getting to Know Llama [notebook](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb) for more information.

If you'd like to run Llama 3 locally for privacy, zero cost, and no rate limits (some Llama 3 hosting providers cap the number of queries or tokens per second or minute on their free plans), see [Running Llama Locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb).

In [None]:
from getpass import getpass
import os

REPLICATE_API_TOKEN = getpass()
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
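
If you re-run the notebook often, typing the token each time gets tedious. The following is a minimal sketch, assuming you may already have `REPLICATE_API_TOKEN` exported in your environment: it reuses that value when present and falls back to the hidden prompt otherwise. The helper name `get_secret` and its `prompt_fn` parameter are hypothetical and not part of any library.

```python
import os
from getpass import getpass


def get_secret(name, prompt_fn=getpass):
    """Return a secret from the environment, prompting only when it is missing."""
    value = os.environ.get(name)
    if not value:
        # Nothing usable in the environment: ask interactively without echoing.
        value = prompt_fn(f"{name}: ")
        # Store it so libraries that read the environment variable can find it.
        os.environ[name] = value
    return value
```

With this helper, the cell above could be reduced to `REPLICATE_API_TOKEN = get_secret("REPLICATE_API_TOKEN")`.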

Next, set up the Llama 3 8B instruct model hosted on Replicate. To use the Llama 3 70B model instead, replace the `model` name with `"meta/meta-llama-3-70b-instruct"`.

In [None]:
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.replicate import Replicate

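# Use Llama 3 on Replicate as the default LLM for all LlamaIndex components.
Settings.llm = Replicate(
    model="meta/meta-llama-3-8b-instruct",
    temperature=0.0,
    additional_kwargs={"top_p": 1, "max_new_tokens": 500},
)

# Use a small Hugging Face embedding model to embed documents and queries.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)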

Next you will use the [Tavily](https://tavily.com/) search engine to augment Llama 3's responses. To create a free trial Tavily Search API key, sign in with your Google or GitHub account [here](https://app.tavily.com/sign-in).

In [None]:
from tavily import TavilyClient

TAVILY_API_KEY = getpass()
tavily = TavilyClient(api_key=TAVILY_API_KEY)

Do a live web search on "Llama 3 fine-tuning".

In [None]:
response = tavily.search(query="Llama 3 fine-tuning")
context = [{"url": obj["url"], "content": obj["content"]} for obj in response['results']]
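
The list comprehension above assumes every result carries both a `url` and a non-empty `content`. As a hedged sketch of a more defensive variant, the hypothetical helper `extract_context` below skips results whose content is missing or blank. The `sample_response` dictionary is made-up data that only mimics the `results` / `url` / `content` shape used above; it is not real search output.

```python
def extract_context(response):
    """Keep only the url and content of results that actually carry text."""
    context = []
    for obj in response.get("results", []):
        content = (obj.get("content") or "").strip()
        if content:
            context.append({"url": obj.get("url", ""), "content": content})
    return context


# Hypothetical response with the same shape the cell above relies on.
sample_response = {
    "results": [
        {"url": "https://example.com/a", "content": "Fine-tuning guide."},
        {"url": "https://example.com/b", "content": "   "},  # blank: skipped
        {"url": "https://example.com/c"},  # no content key: skipped
    ]
}
sample_context = extract_context(sample_response)
```

Filtering here keeps empty documents out of the index that is built in the next step.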

In [None]:
context

Create documents based on the search results, index and save them to a vector store, then create a query engine.

In [None]:
from llama_index.core import Document

documents = [Document(text=ct['content']) for ct in context]
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(streaming=True)
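
To build intuition for what the index and query engine do, here is a toy illustration of similarity-based retrieval. It is not LlamaIndex's implementation: it assumes a bag-of-words count vector in place of the dense vectors produced by the embedding model, and the names `embed`, `cosine`, `retrieve`, and `toy_texts` are all hypothetical. The idea it sketches is the same, though: embed the query, score each stored text against it, and pass the best matches to the LLM as context.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a real embedding model returns dense floats)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, texts, top_k=2):
    """Return the top_k texts most similar to the query."""
    q = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:top_k]


toy_texts = [
    "llama 3 fine-tuning with lora adapters",
    "recipe for sourdough bread",
    "fine-tuning llama 3 on a single gpu",
]
top = retrieve("llama 3 fine-tuning", toy_texts, top_k=2)
print(top)
```

The unrelated bread recipe shares no words with the query, so it scores zero and is left out of the retrieved context.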

You are now ready to ask Llama 3 questions about the live data using the query engine.

In [None]:
response = query_engine.query("give me a summary")
response.print_response_stream()

In [None]:
query_engine.query("what's the latest about Llama 3 fine-tuning?").print_response_stream()

In [None]:
query_engine.query("tell me more about Llama 3 fine-tuning").print_response_stream()