Adding the new llama inference code into the relevant readmes

Beto, 1 year ago
Commit fe40d7f60e
4 changed files with 20 additions and 4 deletions
  1. docs/inference.md (+14 -2)
  2. examples/README.md (+2 -0)
  3. examples/llama_guard/README.md (+3 -1)
  4. examples/llama_guard/inference.py (+1 -1)

+ 14 - 2
docs/inference.md

@@ -41,14 +41,14 @@ model.resize_token_embeddings(model.config.vocab_size + 1)
 ```
 Padding would be required for batch inference. In this [example](../examples/inference.py), the batch size is 1, so padding is essentially not required; however, we added the code pointer as an example for the batch inference case.
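 
 As a minimal sketch of what batched inference with padding could look like (illustrative only; the model path is a placeholder, and the `<PAD>` token setup follows the snippet above):
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 tokenizer = AutoTokenizer.from_pretrained("PATH/TO/MODEL/7B/")
 tokenizer.add_special_tokens({"pad_token": "<PAD>"})
 tokenizer.padding_side = "left"  # left-pad prompts for decoder-only generation
 
 model = AutoModelForCausalLM.from_pretrained("PATH/TO/MODEL/7B/")
 model.resize_token_embeddings(model.config.vocab_size + 1)
 
 prompts = ["Tell me a joke.", "Summarize Hamlet in one sentence."]
 batch = tokenizer(prompts, padding=True, return_tensors="pt")
 outputs = model.generate(**batch, max_new_tokens=64, pad_token_id=tokenizer.pad_token_id)
 print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 ```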
 
-**Chat completion**
+### Chat completion
 The inference folder also includes a chat completion example that applies the built-in safety features of fine-tuned models to the prompt tokens. To run the example:
 
 ```bash
 python examples/chat_completion/chat_completion.py --model_name "PATH/TO/MODEL/7B/" --prompt_file examples/chat_completion/chats.json  --quantization --use_auditnlg
 
 ```
-**Code Llama**
+### Code Llama
 
 Code Llama was recently released with three flavors: a base model that supports multiple programming languages, a Python fine-tuned model, and an instruction fine-tuned and aligned variation of Code Llama; please read more [here](https://ai.meta.com/blog/code-llama-large-language-model-coding/). Also note that the Python fine-tuned model and the 34B models are not trained on the infilling objective and hence cannot be used for the infilling use case.
 
@@ -80,6 +80,18 @@ python examples/code_llama/code_infilling_example.py --model_name MODEL_NAME --p
 
 ```
 
+### Llama Guard
+
+Llama Guard is a new experimental model that provides input and output guardrails for LLM deployments. For more details, please visit the main [repository](https://github.com/facebookresearch/PurpleLlama/tree/main/Llama-Guard).
+
+Find the inference script for Llama Guard [here](../examples/llama_guard/).
+
+**Note** Please find the correct model on the Hugging Face Hub [here](https://huggingface.co/meta-llama/LlamaGuard-7b).
+
+Edit [inference.py](../examples/llama_guard/inference.py) to add test prompts for Llama Guard and execute it with this command:
+
+`python examples/llama_guard/inference.py`
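+
+As a rough illustration, a test entry in the script's prompts list might look like the sketch below (hypothetical; the exact structure and the `AgentType` enum are defined in the script itself):
+
+```python
+from enum import Enum
+from typing import List, Tuple
+
+class AgentType(Enum):  # mirrors the enum in examples/llama_guard/inference.py
+    USER = "User"
+    AGENT = "Agent"
+
+# Each entry pairs one or more turns with the role Llama Guard should classify.
+prompts: List[Tuple[List[str], AgentType]] = [
+    (["<your test user prompt>"], AgentType.USER),
+    (["<user prompt>", "<agent response to check>"], AgentType.AGENT),
+]
+```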
+
 ## Flash Attention and Xformer Memory Efficient Kernels
 
 Setting `use_fast_kernels` will enable the use of Flash Attention or Xformers memory-efficient kernels, based on the hardware being used. This can speed up inference for batched inputs. It has been enabled in the `optimum` library from Hugging Face as a one-liner API; please read more [here](https://pytorch.org/blog/out-of-the-box-acceleration/).
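 
 As a sketch of the one-liner being referred to (illustrative; the recipes wire this up for you when `use_fast_kernels` is set, and the model path is a placeholder):
 
 ```python
 from optimum.bettertransformer import BetterTransformer
 from transformers import AutoModelForCausalLM
 
 model = AutoModelForCausalLM.from_pretrained("PATH/TO/MODEL/7B/")
 model = BetterTransformer.transform(model)  # swaps in memory-efficient attention kernels
 ```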

+ 2 - 0
examples/README.md

@@ -28,6 +28,8 @@ So far, we have provided the following inference examples:
 
 6. The [Purple Llama Using Anyscale](./Purple_Llama_Anyscale.ipynb) is a notebook that shows how to use Anyscale hosted Llama Guard model to classify user inputs as safe or unsafe.
 
+7. [Llama Guard](./llama_guard/) inference example and [safety_checker](../src/llama_recipes/inference/safety_utils.py) for the main [inference](./inference.py) script. The standalone script allows testing Llama Guard on user input, or on user input and agent response pairs. The safety_checker integration provides a way to run Llama Guard on every inference execution, for both the user input and the model output.
+
 For more in depth information on inference including inference safety checks and examples, see the inference documentation [here](../docs/inference.md).
 
 **Note** The [sensitive topics safety checker](../src/llama_recipes/inference/safety_utils.py) utilizes AuditNLG which is an optional dependency. Please refer to installation section of the main [README.md](../README.md#install-with-optional-dependencies) for details.

+ 3 - 1
examples/llama_guard/README.md

@@ -27,7 +27,7 @@ For testing, you can add User or User/Agent interactions into the prompts list a
 
     ]
 ```
-The complete prompt is built with the `build_prompt` function, defined in [prompt_format.py](../../src/llama_recipes/inference/prompt_format.py#L110). The file contains the default Llama Guard  categories. These categories can adjusted and new ones can be added, as described in the [research paper](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), on section 4.5 Studying the adaptability of the model.
+The complete prompt is built with the `build_prompt` function, defined in [prompt_format.py](../../src/llama_recipes/inference/prompt_format.py). The file contains the default Llama Guard categories. These categories can be adjusted and new ones can be added, as described in the [research paper](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), in section 4.5, Studying the adaptability of the model.
 <!-- markdown-link-check-enable -->
 
 To run the samples, with all the dependencies installed, execute this command:
@@ -37,6 +37,8 @@ To run the samples, with all the dependencies installed, execute this command:
 ## Inference Safety Checker
 When running the regular inference script with prompts, Llama Guard will be used as a safety checker on the user prompt and the model output. If both are safe, the result will be shown; otherwise, an error message will be shown containing the word unsafe and a comma-separated list of the infringed categories. Llama Guard is always loaded quantized, using the Hugging Face Transformers library.
 
+In this case, the default categories are applied by the tokenizer, using the `apply_chat_template` method.
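+
+As an illustrative sketch (based on the public model card, not the safety checker's actual code), loading the model quantized and applying the default categories through the chat template could look like this:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "meta-llama/LlamaGuard-7b"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto")
+
+chat = [{"role": "user", "content": "How do I bake bread?"}]
+input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
+output = model.generate(input_ids=input_ids, max_new_tokens=32)
+print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+```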
+
 Use this command for testing with a quantized Llama model, modifying the values accordingly:
 
 `python examples/inference.py --model_name <path_to_regular_llama_model> --prompt_file <path_to_prompt_file> --quantization --enable_llamaguard_content_safety`

+ 1 - 1
examples/llama_guard/inference.py

@@ -2,7 +2,7 @@ import fire
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
 
-from llama_recipes.inference.prompt_format import build_prompt, create_conversation, LLAMA_GUARD_CATEGORY
+from llama_recipes.inference.prompt_format_utils import build_prompt, create_conversation, LLAMA_GUARD_CATEGORY
 from typing import List, Tuple
 from enum import Enum