# Llama2_Gradio.ipynb

## This demo app shows how to query Llama 2 using the Gradio UI.

Since we are using OctoAI in this example, you'll need to obtain an OctoAI token:

- You will need to first sign into [OctoAI](https://octoai.cloud/) with your GitHub or Google account
- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever runs out first)

**Note** After the free trial ends, you will need to enter billing info to continue to use Llama 2 hosted on OctoAI.

To run this example:

- Run the notebook
- Set up your OCTOAI_API_TOKEN and enter it when prompted
- Enter your question and click Submit

You should then see a UI with your answer, either in the notebook or in a browser at http://127.0.0.1:7860.

Let's start by installing the necessary packages:

- langchain provides the necessary RAG tools for this demo
- octoai-sdk allows us to use the OctoAI Llama 2 endpoint
- gradio is used for the UI elements

And setting up the OctoAI token.
!pip install langchain octoai-sdk gradio
from getpass import getpass
import os

# Prompt for the OctoAI API token without echoing it, then expose it via the
# environment variable that langchain's OctoAIEndpoint reads.
OCTOAI_API_TOKEN = getpass()
os.environ["OCTOAI_API_TOKEN"] = OCTOAI_API_TOKEN
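
Optionally, you can sanity-check the token before wiring up langchain. This is a minimal sketch, assuming the OctoAI text endpoint accepts the OpenAI-compatible chat completions payload that the rest of this notebook also relies on; the request itself is illustrative and not part of the original demo.

import requests

# One small chat completion request against the same endpoint used below,
# just to confirm the token is valid (expects HTTP 200 on success).
resp = requests.post(
    "https://text.octoai.run/v1/chat/completions",
    headers={"Authorization": f"Bearer {OCTOAI_API_TOKEN}"},
    json={
        "model": "llama-2-13b-chat-fp16",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 10,
    },
)
print(resp.status_code)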
from langchain.schema import AIMessage, HumanMessage
import gradio as gr
from langchain.llms.octoai_endpoint import OctoAIEndpoint

# Model name of the Llama 2 13B chat endpoint hosted on OctoAI
llama2_13b = "llama-2-13b-chat-fp16"

llm = OctoAIEndpoint(
    endpoint_url="https://text.octoai.run/v1/chat/completions",
    model_kwargs={
        "model": llama2_13b,
        # System message included with every request
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful, respectful and honest assistant."
            }
        ],
        "max_tokens": 500,
        "top_p": 1,
        # Near-zero temperature for (mostly) deterministic answers
        "temperature": 0.01
    },
)
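
A quick way to verify the endpoint is wired up correctly is to make one direct call before building the UI. With langchain's string-in, string-out calling convention for LLMs (the same one predict below uses), that is just:

# Smoke test: llm(...) sends the prompt to OctoAI and returns the completion string.
print(llm("Who wrote Hamlet?"))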


def predict(message, history):
    # Convert the Gradio chat history (a list of (human, ai) pairs) into
    # langchain message objects. OctoAIEndpoint is a completion-style LLM that
    # takes a single string prompt, so only the latest message is sent below;
    # history_langchain_format is kept as a starting point for passing the
    # full conversation instead.
    history_langchain_format = []
    for human, ai in history:
        history_langchain_format.append(HumanMessage(content=human))
        history_langchain_format.append(AIMessage(content=ai))
    history_langchain_format.append(HumanMessage(content=message))
    response = llm(message)
    return response

gr.ChatInterface(predict).launch()
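
Note that predict sends only the latest message to the model. If you want the model to see the whole conversation, one option is to serialize the history into the prompt string. The sketch below is illustrative: the predict_with_history name and the plain Human/Assistant transcript format are assumptions, not part of the original notebook.

def predict_with_history(message, history):
    # Fold prior turns into a single prompt string, since OctoAIEndpoint
    # takes one string rather than a list of chat messages.
    turns = []
    for human, ai in history:
        turns.append(f"Human: {human}\nAssistant: {ai}")
    turns.append(f"Human: {message}\nAssistant:")
    return llm("\n".join(turns))

# Swap this in for predict to give the model conversational context:
# gr.ChatInterface(predict_with_history).launch()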