Hamid Shojanazeri db7accfbe7 dix PR link		1 rok temu
..
hf-text-generation-inference	b3067b55dc fix typos and spelling errors	1 rok temu
README.md	db7accfbe7 dix PR link	1 rok temu
chat_completion.py	557e881fcc aliginng the pad token with HF latest	1 rok temu
chat_utils.py	f6152893d8 update prompts	1 rok temu
chats.json	f6152893d8 update prompts	1 rok temu
checkpoint_converter_fsdp_hf.py	50e9d17045 add the default option for find the HF model_name/path from train_param.yaml	1 rok temu
inference.py	c4e96af6ee clean up	1 rok temu
model_utils.py	76a187c4d2 clean up	1 rok temu
safety_utils.py	4767f09ecd Initial commit	1 rok temu
samsum_prompt.txt	4767f09ecd Initial commit	1 rok temu
vLLM_inference.py	4767f09ecd Initial commit	1 rok temu

Inference

This folder contains inference examples for Llama 2. So far, we have provided support for three methods of inference:

inference script script provides support for Hugging Face accelerate, PEFT and FSDP fine tuned models.
vLLM_inference.py script takes advantage of vLLM's paged attention concept for low latency.
The hf-text-generation-inference folder contains information on Hugging Face Text Generation Inference (TGI).

For more in depth information on inference including inference safety checks and examples, see the inference documentation here.

System Prompt Update

Observed Issue

We received feedback from the community on our prompt template and we are providing an update to reduce the false refusal rates seen. False refusals occur when the model incorrectly refuses to answer a question that it should, for example due to overly broad instructions to be cautious in how it provides responses.

Updated approach

Based on evaluation and analysis, we recommend the removal of the system prompt as the default setting. Pull request #105 removes the system prompt as the default option, but still provides an example to help enable experimentation for those using it.

README.md

Inference

System Prompt Update

Observed Issue

Updated approach