marx/llama-recipes @ 2bc70bdd54d1d323c39b806b8050e60d2f466c71

Running Llama2 On-Prem with vLLM and TGI

This tutorial shows how to use Llama 2 with vLLM and Hugging Face TGI to build Llama 2 on-prem apps.

* To run a quantized Llama2 model on iOS and Android, you can use the open-source MLC LLM or llama.cpp. You can even build a Linux OS that boots to Llama2 (repo).
