Update azure_api_example.ipynb

Chester Hu, 1 year ago
Commit 27455cf3b6
1 changed file with 29 additions and 29 deletions

+ 29 - 29
demo_apps/Azure_API_example/azure_api_example.ipynb

@@ -7,8 +7,8 @@
     "# Use Azure API with Llama 2\n",
     "\n",
     "This notebook shows examples of how to use Llama 2 APIs offered by Microsoft Azure. We will cover:  \n",
-    "* HTTP requests API usage for Llama 2 70B pretrained and chat models in CLI\n",
-    "* HTTP requests API usage for Llama 2 70B pretrained and chat models in Python\n",
+    "* HTTP requests API usage for Llama 2 pretrained and chat models in CLI\n",
+    "* HTTP requests API usage for Llama 2 pretrained and chat models in Python\n",
     "* Plug the APIs into LangChain\n",
     "* Wire the model with Gradio to build a simple chatbot with memory\n",
     "\n"
@@ -22,13 +22,13 @@
     "\n",
     "Before we start building with Azure Llama 2 APIs, there are certain steps we need to take to deploy the models:\n",
     "\n",
-    "* Register for a valid Azure account with subscription \n",
-    "* Make sure you have access to [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home)\n",
-    "* Create a project and resource group\n",
+    "* Register for a valid Azure account with subscription [here](https://azure.microsoft.com/en-us/free/search/?ef_id=_k_CjwKCAiA-P-rBhBEEiwAQEXhH5OHAJLhzzcNsuxwpa5c9EJFcuAjeh6EvZw4afirjbWXXWkiZXmU2hoC5GoQAvD_BwE_k_&OCID=AIDcmm5edswduu_SEM__k_CjwKCAiA-P-rBhBEEiwAQEXhH5OHAJLhzzcNsuxwpa5c9EJFcuAjeh6EvZw4afirjbWXXWkiZXmU2hoC5GoQAvD_BwE_k_&gad_source=1&gclid=CjwKCAiA-P-rBhBEEiwAQEXhH5OHAJLhzzcNsuxwpa5c9EJFcuAjeh6EvZw4afirjbWXXWkiZXmU2hoC5GoQAvD_BwE)\n",
+    "* Take a quick look at what [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home) is and navigate to the website from the link in the article\n",
+    "* Follow the demos in the article to create a project and a [resource group](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal), or you can also follow the guide [here](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-llama?tabs=azure-studio)\n",
     "* Select Llama models from Model catalog\n",
     "* Deploy with \"Pay-as-you-go\"\n",
     "\n",
-    "Once deployed successfully, you should be assigned for an API endpoint and a security key for inference. You can also deploy the model by using Azure ML Python SDK.   \n",
+    "Once deployed successfully, you should be assigned an API endpoint and a security key for inference.  \n",
     "\n",
     "For more information, you should consult Azure's official documentation [here](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-llama?tabs=azure-studio) for model deployment and inference."
    ]
@@ -41,10 +41,10 @@
     "\n",
     "### Basics\n",
     "\n",
-    "For using the REST API, You will need to have a Endpoint url and Authentication Key associated with that endpoint.  \n",
+    "For using the REST API, you will need an endpoint URL and the authentication key associated with that endpoint.  \n",
     "This can be acquired from previous steps.  \n",
     "\n",
-    "In this text completion example for 70B pre-trained model, we use a simple curl call for illustration. There are three major components:  \n",
+    "In this text completion example for the pre-trained model, we use a simple curl call for illustration. There are three major components:  \n",
     "\n",
     "* The `host-url` is your endpoint url with completion schema. \n",
     "* The `headers` defines the content type as well as your api key. \n",
@@ -66,7 +66,7 @@
    "source": [
     "For chat completion, the API schema and request payload are slightly different.\n",
     "\n",
-    "For `host-url` the path changed to `/v1/chat/completions` and the request payload also changed to include roles in conversations. Here is a sample payload:  \n",
+    "For `host-url`, the path needs to be `/v1/chat/completions`, and the request payload must include roles in conversations. Here is a sample payload:  \n",
     "\n",
     "```\n",
     "{ \n",
@@ -93,7 +93,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"What is good about Wuhan?\",\"role\":\"user\"}], \"max_tokens\": 50}'"
+    "!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"Who wrote the book Innovators dilemma?\",\"role\":\"user\"}], \"max_tokens\": 50}'"
    ]
   },
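As an editorial aside, the curl chat-completion call above maps directly onto Python's standard library. A minimal sketch using `urllib.request` (the endpoint URL and key are placeholders; the request is only constructed here, not sent):

```python
import json
import urllib.request

# Placeholders -- substitute your real endpoint and key.
host_url = "https://your-endpoint.inference.ai.azure.com/v1/chat/completions"
api_key = "your-auth-key"

# Same payload shape as the curl example above.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who wrote the book Innovators dilemma?"},
    ],
    "max_tokens": 50,
}

# Build (but do not send) the request, mirroring the curl flags.
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    host_url,
    data=body,
    headers={"Content-Type": "application/json", "Authorization": api_key},
    method="POST",
)
# urllib.request.urlopen(req) would send it once real credentials are in place.
```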
   {
@@ -103,7 +103,7 @@
    "If you compare the generation results for both text and chat completion API calls, you will notice that:  \n",
     "\n",
     "* Text completion returns a list of `choices` for the input prompt, each contains generated text and completion information such as `logprobs`.\n",
-    "* Chat completion returns a list of `cnoices` each has a `message` object with completion result and using the same `message` object in the request.  \n",
+    "* Chat completion returns a list of `choices`, each with a `message` object containing the completion result, matching the `messages` object in the request.  \n",
     "\n",
     "\n"
    ]
@@ -114,7 +114,7 @@
    "source": [
     "### Streaming\n",
     "\n",
-    "One fantastic feature the API offered is the streaming capability.  \n",
+    "One fantastic feature the API offers is the streaming capability.  \n",
     "Streaming allows the generated tokens to be sent as data-only server-sent events whenever they become available.  \n",
     "This is extremely important for interactive applications such as chatbots, so the user is always engaged.  \n",
     "\n",
@@ -130,7 +130,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"What is good about Wuhan?\",\"role\":\"user\"}], \"max_tokens\": 500, \"stream\": \"True\"}'"
+    "!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"Who wrote the book Innovators dilemma?\",\"role\":\"user\"}], \"max_tokens\": 500, \"stream\": \"True\"}'"
    ]
   },
   {
@@ -147,10 +147,10 @@
    "source": [
     "### Content Safety Filtering\n",
     "\n",
-    "All Azure Llama 2 API endpoint will have content safety feature turned on. Both input prompt and output tokens are filtered by this service automatically.  \n",
+    "All Azure Llama 2 API endpoints have the content safety feature turned on. Both the input prompt and output tokens are filtered by this service automatically.  \n",
    "To know more about the impact on the request/response payload, please refer to the official guide [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter?tabs=python).   \n",
     "\n",
-    "For model input and output, if the filter detected there is harmful content. The generation will error out with reponse payload containing the reasoning, along with which type of content violation it is and severity.  \n",
+    "For model input and output, if the filter detects there is harmful content, the generation will error out with a response payload containing the reasoning, along with information on the type of content violation and its severity. \n",
     "\n",
     "Here is an example prompt that triggered content safety filtering:\n"
    ]
@@ -170,9 +170,9 @@
    "source": [
     "## HTTP Requests API Usage in Python\n",
     "\n",
-    "Besides calling the API directly from command line tools. You can also programatically call them in Python.  \n",
+    "Besides calling the API directly from command line tools, you can also programmatically call them in Python.  \n",
     "\n",
-    "Here is an example for text completion model:\n",
+    "Here is an example for the text completion model:\n",
     "\n",
     "\n"
    ]
@@ -236,7 +236,7 @@
     "#Configure payload data sending to API endpoint\n",
     "data = {\"messages\":[\n",
     "            {\"role\":\"system\", \"content\":\"You are a helpful assistant.\"},\n",
-    "            {\"role\":\"user\", \"content\":\"What is good about Wuhan?\"}], \n",
+    "            {\"role\":\"user\", \"content\":\"Who wrote the book Innovators dilemma?\"}], \n",
     "        \"max_tokens\": 500,\n",
     "        \"temperature\": 0.9,\n",
     "        \"stream\": \"True\",\n",
@@ -271,7 +271,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "However in this example, the streamed data content returns back as a single payload. It didn't stream as a serial of data events as we wished. To build true streaming capabilities utilizing the API endpoint, we will utilize [`requests`](https://requests.readthedocs.io/en/latest/) library instead."
+    "However, in this example the streamed data content returned as a single payload. It didn't stream as a series of data events as we wished. To build true streaming capabilities utilizing the API endpoint, we will utilize the [`requests`](https://requests.readthedocs.io/en/latest/) library instead."
    ]
   },
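With the stream flag set, the endpoint sends data-only server-sent events: each event line starts with `data: ` and the stream is terminated by `data: [DONE]`. A minimal sketch of the parsing step you would run over the line iterator from a streaming `requests` response (the sample events below are fabricated for illustration, not real endpoint output):

```python
import json

def parse_sse_events(lines):
    """Collect the JSON payload of each `data:` event, stopping at [DONE]."""
    events = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        chunk = line[len("data: "):]
        if chunk.strip() == "[DONE]":
            break
        events.append(json.loads(chunk))
    return events

# Fabricated sample stream, shaped like chat-completion chunks.
sample = [
    b'data: {"choices": [{"delta": {"content": "Clayton"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": " Christensen"}}]}',
    b"data: [DONE]",
]
tokens = [e["choices"][0]["delta"]["content"] for e in parse_sse_events(sample)]
```

In real use you would pass `response.iter_lines()` from a `requests.post(..., stream=True)` call instead of the fabricated list.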
   {
@@ -296,7 +296,7 @@
     "\n",
     "data = {\"messages\":[\n",
     "            {\"role\":\"system\", \"content\":\"You are a helpful assistant.\"},\n",
-    "            {\"role\":\"user\", \"content\":\"What is good about Wuhan?\"}],\n",
+    "            {\"role\":\"user\", \"content\":\"Who wrote the book Innovators dilemma?\"}],\n",
     "        \"max_tokens\": 500,\n",
     "        \"temperature\": 0.9,\n",
     "        \"stream\": \"True\"\n",
@@ -325,9 +325,9 @@
    "source": [
     "## Use Llama 2 API with LangChain\n",
     "\n",
-    "In this section, we will demonstrate how to use Llama 2 APIs with LangChain, one of the most popoular framework to accelerate building your AI product.  \n",
+    "In this section, we will demonstrate how to use Llama 2 APIs with LangChain, one of the most popular frameworks to accelerate building your AI product.  \n",
     "One common solution here is to create your customized LLM instance, so you can add it to various chains to complete different tasks.  \n",
-    "In this example, we will use `AzureMLOnlineEndpoint` class LangChain provided to build this customized LLM instance. This particular class is designed to take in Azure endpoint and API keys as inputs and wired it with HTTP calls. So the underlying of it is very similar to how we used `urllib.request` library to send RESTful calls in previous examples to Azure Endpoint.   \n",
+    "In this example, we will use the `AzureMLOnlineEndpoint` class LangChain provides to build a customized LLM instance. This particular class is designed to take in an Azure endpoint and API key as inputs and wire them up with HTTP calls. So under the hood it is very similar to how we used the `urllib.request` library in previous examples to send RESTful calls to the Azure endpoint.   \n",
     "\n",
     "Note Azure is working on a standard solution for LangChain integration in this [PR](https://github.com/langchain-ai/langchain/pull/14560), you should consider migrating to that in the future. \n",
     "\n",
@@ -348,7 +348,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Once all dependencies installed, you can directly create a `llm` instance based on `AzureMLOnlineEndpoint` as follow:  "
+    "Once all dependencies are installed, you can directly create a `llm` instance based on `AzureMLOnlineEndpoint` as follows:  "
    ]
   },
   {
@@ -411,7 +411,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "print(llm(\"What is good about Wuhan?\"))"
+    "print(llm(\"Who wrote the book Innovators dilemma?\"))"
    ]
   },
   {
@@ -443,7 +443,7 @@
     "        ),\n",
     ")\n",
     "\n",
-    "print(translator_chain.run(input_language=\"English\", output_language=\"French\", input_content=\"What is good about Wuhan?\"))\n"
+    "print(translator_chain.run(input_language=\"English\", output_language=\"French\", input_content=\"Who wrote the book Innovators dilemma?\"))\n"
    ]
   },
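The translator chain above fills a prompt template with three variables before calling the model. A plain-Python stand-in for that formatting step (the template wording here is an assumption for illustration, not the notebook's exact prompt):

```python
# Hypothetical template, analogous to a LangChain PromptTemplate.
template = (
    "Translate the following {input_language} text to {output_language}: "
    "{input_content}"
)

def format_prompt(input_language, output_language, input_content):
    """Fill the template the same way chain.run(...) would before inference."""
    return template.format(
        input_language=input_language,
        output_language=output_language,
        input_content=input_content,
    )

prompt = format_prompt(
    "English", "French", "Who wrote the book Innovators dilemma?"
)
```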
   {
@@ -482,9 +482,9 @@
    "source": [
    "Let's use the `AzureMLOnlineEndpoint` class from the previous example.  \n",
     "In this example, we have three major components:  \n",
-    "1. Chatbot UI hosted as web interface by Gradio. These are the UI logics render our model predictions.\n",
-    "2. Model itself, which is the core component that ingest prompts and return an answer back.\n",
-    "3. Memory component, which stores previous conversation context. In this example, we will use [conversation window buffer](https://python.langchain.com/docs/modules/memory/types/buffer_window) which only logs context in certain time window in the past. \n",
+    "1. Chatbot UI hosted as a web interface by Gradio. This is the UI logic that renders our model predictions.\n",
+    "2. The model itself, which is the core component that ingests prompts and returns an answer.\n",
+    "3. The memory component, which stores previous conversation context. In this example, we will use [conversation window buffer](https://python.langchain.com/docs/modules/memory/types/buffer_window), which only keeps the most recent turns of the conversation as context. \n",
     "\n",
     "All of them are chained together using LangChain."
    ]
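To make the memory component concrete, here is a simplified stand-in for a conversation window buffer (not LangChain's actual `ConversationBufferWindowMemory`, just an illustration of keeping only the last `k` exchanges as context):

```python
from collections import deque

class WindowBufferMemory:
    """Keep only the last k (user, assistant) exchanges as context."""

    def __init__(self, k=3):
        self.buffer = deque(maxlen=k)  # old exchanges fall off automatically

    def save_context(self, user_msg, assistant_msg):
        self.buffer.append((user_msg, assistant_msg))

    def load_context(self):
        # Flatten the retained exchanges into a prompt-ready transcript.
        lines = []
        for user_msg, assistant_msg in self.buffer:
            lines.append(f"Human: {user_msg}")
            lines.append(f"AI: {assistant_msg}")
        return "\n".join(lines)

memory = WindowBufferMemory(k=2)
for i in range(4):
    memory.save_context(f"question {i}", f"answer {i}")
# Only the last two exchanges survive in the window.
```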
@@ -578,7 +578,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "After successfully execute the code above, a chat interface should appear as the interactive output or you can open the localhost url in your selected browser window.  \n",
+    "After successfully executing the code above, a chat interface should appear as the interactive output or you can open the localhost url in your selected browser window.  \n",
     "\n",
    "This concludes our tutorial and examples. Here are some additional references:  \n",
     "* [Fine-tune Llama](https://learn.microsoft.com/azure/ai-studio/how-to/fine-tune-model-llama)\n",