Llama2_Gradio.ipynb

## This demo app shows how to query Llama 2 using the Gradio UI.

Since we are using Replicate in this example, you will need to replace `<your replicate api token>` with your API token. To get the Replicate token:

- First sign in to Replicate with your GitHub account
- Then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while

**Note:** After the free trial ends, you will need to enter billing info to continue using Llama 2 hosted on Replicate.

To run this example:

- Set up your Replicate API token and enter it in place of `<your replicate api token>`
- Run the notebook
- Enter your question and click Submit

In the notebook, or in a browser at http://127.0.0.1:7860, you should see a UI with your answer.
from langchain.schema import AIMessage, HumanMessage
import gradio as gr
from langchain.llms import Replicate
import os

os.environ["REPLICATE_API_TOKEN"] = "<your replicate api token>"

llama2_13b_chat = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"

llm = Replicate(
    model=llama2_13b_chat,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens":500}
)


def predict(message, history):
    # Gradio passes the chat history as a list of (human, ai) pairs;
    # convert it to LangChain message objects for use with a chat model.
    history_langchain_format = []
    for human, ai in history:
        history_langchain_format.append(HumanMessage(content=human))
        history_langchain_format.append(AIMessage(content=ai))
    history_langchain_format.append(HumanMessage(content=message))
    # The Replicate wrapper here is a completion-style LLM, so only the
    # latest message is sent; pass history_langchain_format instead if
    # you switch to a chat model.
    response = llm(message)
    return response

gr.ChatInterface(predict).launch()
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
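Because the Replicate wrapper above is a completion-style LLM, one way to give it conversational context is to flatten the Gradio history into a single prompt string instead of sending LangChain message objects. A minimal sketch, assuming a simple `User:`/`Assistant:` transcript format (the `format_history_prompt` helper is hypothetical, not part of the notebook):

```python
def format_history_prompt(message, history):
    """Flatten Gradio-style (human, ai) history pairs plus the new
    message into one prompt string for a completion-style LLM."""
    lines = []
    for human, ai in history:
        lines.append(f"User: {human}")
        lines.append(f"Assistant: {ai}")
    lines.append(f"User: {message}")
    lines.append("Assistant:")  # cue the model to answer as the assistant
    return "\n".join(lines)

# Example: one prior turn plus a new question
prompt = format_history_prompt(
    "What about Llama 2?",
    [("Hi", "Hello! How can I help?")],
)
```

With this helper, `predict` could call `llm(format_history_prompt(message, history))` so earlier turns influence the answer, at the cost of a longer prompt per request.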