# Llama2_Gradio.ipynb

## This demo app shows how to query Llama 2 using the Gradio UI.

Since we are using OctoAI in this example, you'll need to obtain an OctoAI token:

- You will need to first sign into [OctoAI](https://octoai.cloud/) with your GitHub or Google account
- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever runs out first)

**Note** After the free trial ends, you will need to enter billing info to continue to use Llama 2 hosted on OctoAI.

To run this example:

- Run the notebook
- Set up your OCTOAI_API_TOKEN and enter it when prompted
- Enter your question and click Submit

You should then see a UI with your answer, either in the notebook or in a browser at http://127.0.0.1:7860.

Let's start by installing the necessary packages:

- langchain provides the necessary RAG tools for this demo
- octoai-sdk allows us to use the OctoAI Llama 2 endpoint
- gradio is used for the UI elements

And setting up the OctoAI token.
!pip install langchain octoai-sdk gradio
from getpass import getpass
import os

# Prompt for the OctoAI API token without echoing it, then expose it via the
# environment variable that langchain's OctoAIEndpoint reads.
OCTOAI_API_TOKEN = getpass()
os.environ["OCTOAI_API_TOKEN"] = OCTOAI_API_TOKEN
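
Optionally, you can sanity-check the token before wiring up langchain. This is a minimal sketch, assuming the OctoAI text endpoint accepts the OpenAI-compatible chat completions payload that the rest of this notebook also relies on; the request itself is illustrative and not part of the original demo.

import requests

# One small chat completion request against the same endpoint used below,
# just to confirm the token is valid (expects HTTP 200 on success).
resp = requests.post(
    "https://text.octoai.run/v1/chat/completions",
    headers={"Authorization": f"Bearer {OCTOAI_API_TOKEN}"},
    json={
        "model": "llama-2-13b-chat-fp16",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 10,
    },
)
print(resp.status_code)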
from langchain.schema import AIMessage, HumanMessage
import gradio as gr
from langchain.llms.octoai_endpoint import OctoAIEndpoint

# Model name of the Llama 2 13B chat endpoint hosted on OctoAI
llama2_13b = "llama-2-13b-chat-fp16"

llm = OctoAIEndpoint(
    endpoint_url="https://text.octoai.run/v1/chat/completions",
    model_kwargs={
        "model": llama2_13b,
        # System message included with every request
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful, respectful and honest assistant."
            }
        ],
        "max_tokens": 500,
        "top_p": 1,
        # Near-zero temperature for (mostly) deterministic answers
        "temperature": 0.01
    },
)
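
A quick way to verify the endpoint is wired up correctly is to make one direct call before building the UI. With langchain's string-in, string-out calling convention for LLMs (the same one predict below uses), that is just:

# Smoke test: llm(...) sends the prompt to OctoAI and returns the completion string.
print(llm("Who wrote Hamlet?"))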


def predict(message, history):
    # Convert the Gradio chat history (a list of (human, ai) pairs) into
    # langchain message objects. OctoAIEndpoint is a completion-style LLM that
    # takes a single string prompt, so only the latest message is sent below;
    # history_langchain_format is kept as a starting point for passing the
    # full conversation instead.
    history_langchain_format = []
    for human, ai in history:
        history_langchain_format.append(HumanMessage(content=human))
        history_langchain_format.append(AIMessage(content=ai))
    history_langchain_format.append(HumanMessage(content=message))
    response = llm(message)
    return response

gr.ChatInterface(predict).launch()
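
Note that predict sends only the latest message to the model. If you want the model to see the whole conversation, one option is to serialize the history into the prompt string. The sketch below is illustrative: the predict_with_history name and the plain Human/Assistant transcript format are assumptions, not part of the original notebook.

def predict_with_history(message, history):
    # Fold prior turns into a single prompt string, since OctoAIEndpoint
    # takes one string rather than a list of chat messages.
    turns = []
    for human, ai in history:
        turns.append(f"Human: {human}\nAssistant: {ai}")
    turns.append(f"Human: {message}\nAssistant:")
    return llm("\n".join(turns))

# Swap this in for predict to give the model conversational context:
# gr.ChatInterface(predict_with_history).launch()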