Llama2_Gradio.ipynb

This demo app shows how to query Llama 2 using the Gradio UI.

Since we are using Replicate in this example, you will need to replace `<your replicate api token>` with your API token.

To get the Replicate token:

  • First, sign in to Replicate with your GitHub account
  • Then create a free API token on your Replicate account page, which you can use for a while

Note: After the free trial ends, you will need to enter billing info to continue using Llama 2 hosted on Replicate.

To run this example:

  • Set up your Replicate API token and enter it in place of `<your replicate api token>`
  • Run the notebook
  • Enter your question and click Submit
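These steps assume the notebook's dependencies are installed and a token is available; a minimal setup sketch (package names are inferred from the imports below, and the token can be exported in the shell instead of hard-coded in the notebook — the value shown is a placeholder):

```shell
# install the libraries the notebook imports
pip install gradio langchain replicate
# make the token available to the notebook's environment
export REPLICATE_API_TOKEN="<your replicate api token>"
```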

After you click Submit, the answer appears in the chat UI, which you can use inside the notebook output or in a browser at http://127.0.0.1:7860.
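Gradio's `ChatInterface` drives a function with the signature `fn(message, history)`, where `history` is (in the tuple-style chat format assumed here) a list of `(user, assistant)` string pairs from earlier turns. A minimal sketch with a stubbed model, so no Replicate call is needed:

```python
# Sketch of the fn(message, history) contract used by gr.ChatInterface,
# assuming tuple-style history: a list of (user, assistant) string pairs.
def echo_predict(message, history):
    turns = len(history)  # completed user/assistant exchanges so far
    return f"Turn {turns + 1}: you said '{message}'"

# What ChatInterface would pass on a second user turn:
print(echo_predict("How are you?", [("Hi", "Turn 1: you said 'Hi'")]))
# → Turn 2: you said 'How are you?'
```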

import os

import gradio as gr
from langchain.llms import Replicate
from langchain.schema import AIMessage, HumanMessage

os.environ["REPLICATE_API_TOKEN"] = "<your replicate api token>"

llama2_13b_chat = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"

llm = Replicate(
    model=llama2_13b_chat,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens": 500},
)


def predict(message, history):
    # Rebuild the conversation as LangChain messages, then flatten it into a
    # single prompt string, since the Replicate wrapper is a plain-text LLM
    # and cannot consume a message list directly.
    history_langchain_format = []
    for human, ai in history:
        history_langchain_format.append(HumanMessage(content=human))
        history_langchain_format.append(AIMessage(content=ai))
    history_langchain_format.append(HumanMessage(content=message))
    prompt = "\n".join(m.content for m in history_langchain_format)
    return llm(prompt)

gr.ChatInterface(predict).launch()
Init param `input` is deprecated, please use `model_kwargs` instead.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
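Llama 2 chat checkpoints are trained on turns wrapped in `[INST] ... [/INST]` markers, so carrying conversation history forward usually means folding prior turns into one such prompt. A minimal sketch (`format_llama2_prompt` is a hypothetical helper, not part of the notebook; system-prompt/`<<SYS>>` handling is omitted):

```python
def format_llama2_prompt(message, history):
    """Fold (human, ai) chat turns plus the new message into one
    Llama 2-style prompt string using [INST] ... [/INST] markers."""
    prompt = ""
    for human, ai in history:
        prompt += f"[INST] {human} [/INST] {ai} "
    prompt += f"[INST] {message} [/INST]"
    return prompt

print(format_llama2_prompt("How are you?", [("Hi", "Hello!")]))
# → [INST] Hi [/INST] Hello! [INST] How are you? [/INST]
```

The resulting string can be passed to `llm()` in place of the bare message.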