
# **Getting to know Llama 2: Everything you need to start building**
Our goal in this session is to provide a guided tour of Llama 2: the different Llama 2 models, how and where to access them, Generative AI and chatbot architectures, prompt engineering, Retrieval Augmented Generation (RAG), fine-tuning, and more. All of this comes with starter code that you can adapt for your own Llama 2 projects.
## **0 - Prerequisites**
* Basic understanding of Large Language Models
* Basic understanding of Python
##**1 - Understanding Llama 2**
### **1.1 - What is Llama 2?**
* State-of-the-art (SOTA), open-source LLM
* Available in 7B, 13B, and 70B parameter sizes
* Pretrained + Chat variants
* Choosing a model: size, quality, cost, speed
* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)
* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)
### **1.2 - Accessing Llama 2**
* Download + Self Host (on-premise)
* Hosted API Platform (e.g. [Replicate](https://replicate.com/meta))
* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))
### **1.3 - Use Cases of Llama 2**
* Content Generation
* Chatbots
* Summarization
* Programming (e.g. Code Llama)
* and many more...
## **2 - Using Llama 2**
In this notebook, we access the [Llama 2 13B chat model](https://replicate.com/meta/llama-2-13b-chat) through Replicate's hosted API.
### **2.1 - Install dependencies**
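The notebook's dependencies can be installed with pip; the exact package set below is an assumption based on the tools used later in this session (the Replicate client, LangChain, and a PDF loader):

```shell
# Python clients used in this notebook (unpinned here; pin versions
# for reproducible environments)
pip install replicate langchain pypdf
# Replicate's hosted API reads an API token from the environment:
export REPLICATE_API_TOKEN=<your-token>
```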
### **2.2 - Basic completion**
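A minimal sketch of a basic completion through Replicate's Python client. The model slug and input parameter names follow Replicate's `meta/llama-2-13b-chat` listing, and the hosted call only runs when `REPLICATE_API_TOKEN` is set:

```python
import os

MODEL = "meta/llama-2-13b-chat"  # hosted model slug on Replicate

def complete(prompt: str, temperature: float = 0.75,
             max_new_tokens: int = 256) -> str:
    """Run a basic completion and join the streamed output chunks."""
    import replicate  # requires `pip install replicate` and an API token
    output = replicate.run(
        MODEL,
        input={"prompt": prompt,
               "temperature": temperature,
               "max_new_tokens": max_new_tokens},
    )
    return "".join(output)

# Only call the hosted API when credentials are available:
if os.environ.get("REPLICATE_API_TOKEN"):
    print(complete("What is the capital of France?"))
```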
### **2.3 - System prompts**
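Replicate's chat models accept a separate `system_prompt` input, but it is useful to see what the underlying Llama 2 chat template looks like. A sketch of building it by hand, following the `<<SYS>>` format described in the Llama 2 paper and model card:

```python
def with_system_prompt(system: str, user: str) -> str:
    """Wrap a user message with a system prompt using the Llama 2 chat
    template: the system prompt sits inside <<SYS>> tags within the
    first [INST] block."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = with_system_prompt("You always answer like a pirate.",
                            "Who wrote Hamlet?")
print(prompt)
```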
### **2.4 - Response formats**
* Supports differently formatted outputs, e.g. plain text or JSON
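One common pattern is to ask for machine-readable output directly in the prompt and then validate the reply before using it. A sketch (the model reply shown here is a hypothetical example, not real model output):

```python
import json

def json_prompt(task: str) -> str:
    # Steer the model toward structured output by stating the format
    # explicitly. This is a request, not a guarantee: always validate.
    return f"{task}\nRespond with only a JSON object, no extra text."

# Validate a (hypothetical) model reply before using it:
reply = '{"city": "Paris", "country": "France"}'
data = json.loads(reply)
print(data["city"])
```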
## **3 - Gen AI Application Architecture**
Here is the high-level tech stack/architecture of a Generative AI application.
## **4 - Chatbot Architecture**
Here are the key components and the information flow in a chatbot.
* User Prompts
* Input Safety
* Llama 2
* Output Safety
* Memory & Context
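The components above can be sketched as a single pipeline. The safety check below is a hypothetical stand-in for a real classifier, and the LLM is injected as a plain function so the flow can run end to end without an API:

```python
def is_safe(text: str) -> bool:
    # Hypothetical stand-in for a real input/output safety classifier.
    blocked = {"BLOCKED_TERM"}
    return not any(term in text for term in blocked)

def chatbot_turn(user_prompt: str, memory: list, llm) -> str:
    if not is_safe(user_prompt):            # input safety
        return "Sorry, I can't help with that."
    memory.append(("user", user_prompt))    # memory & context
    reply = llm(memory)                     # the Llama 2 call (injected)
    if not is_safe(reply):                  # output safety
        return "Sorry, I can't share that response."
    memory.append(("assistant", reply))
    return reply

# Toy LLM stub that just echoes the latest user message:
echo_llm = lambda memory: f"You said: {memory[-1][1]}"
print(chatbot_turn("Hello!", [], echo_llm))
```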
### **4.1 - Chat conversation**
* LLMs are stateless
* Single Turn
* Multi Turn (Memory)
Because the model is stateless, a chat app must send the previous conversation context to the LLM along with each new message to get coherent responses. Below is an example of multi-turn chat.
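A sketch of flattening prior turns into a single prompt, following the Llama 2 multi-turn chat template (each past exchange wrapped in its own `[INST]` block):

```python
def build_chat_prompt(history, new_message):
    """Flatten prior (user, assistant) turns into one Llama 2 chat
    prompt, since the model itself keeps no state between calls."""
    prompt = ""
    for user_turn, assistant_turn in history:
        prompt += f"<s>[INST] {user_turn} [/INST] {assistant_turn} </s>"
    prompt += f"<s>[INST] {new_message} [/INST]"
    return prompt

history = [("What is the capital of France?", "Paris.")]
print(build_chat_prompt(history, "And its population?"))
```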
### **4.2 - Prompt Engineering**
* Prompt engineering is the practice of designing effective prompts to elicit the desired responses
* Helps reduce hallucinations
#### **4.2.1 - In-Context Learning (e.g. Zero-shot, Few-shot)**
* In-context learning is a prompt-engineering method in which demonstrations of the task are provided as part of the prompt.
1. Zero-shot learning - the model performs the task without any input examples.
2. Few-shot (or "N-shot") learning - the model performs the task guided by input examples provided in the user's prompt.
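A sketch of assembling a few-shot prompt; passing an empty example list gives the zero-shot case. The `Input:`/`Output:` labels are one common convention, not a fixed format:

```python
def few_shot_prompt(instruction, examples, query):
    # Build an in-context learning prompt: task instruction, a few
    # demonstration pairs, then the actual query to complete.
    lines = [instruction]
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

examples = [("I loved this movie", "positive"),
            ("Terrible service", "negative")]
print(few_shot_prompt("Classify the sentiment.", examples, "Great food!"))
```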
#### **4.2.2 - Chain of Thought**
"Chain of thought" enables complex reasoning through logical step by step thinking and generates meaningful and contextually relevant responses.
### **4.3 - Retrieval Augmented Generation (RAG)**
* Prompt engineering limitations - knowledge cutoff & lack of specialized data
* Retrieval Augmented Generation (RAG) lets us retrieve snippets of information from external data sources and add them to the user's prompt, so Llama 2 can give tailored responses.
For our demo, we will download an external PDF file from a URL and query the content of that PDF to get contextually relevant answers back with the help of Llama!
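The retrieve-then-augment idea can be illustrated without any framework. The keyword-overlap retriever below is a toy stand-in for the embedding-based vector search used in the actual demo:

```python
def retrieve(query: str, documents: list, k: int = 1) -> list:
    # Toy retriever: rank snippets by word overlap with the query.
    # Real RAG systems use embeddings and a vector store instead.
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def augment(query: str, snippets: list) -> str:
    # Prepend the retrieved context to the user's question.
    context = "\n".join(snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["Llama 2 comes in 7B, 13B and 70B sizes.",
        "Paris is the capital of France."]
question = "What sizes does Llama 2 come in?"
print(augment(question, retrieve(question, docs)))
```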
#### **4.3.1 - LangChain**
LangChain is a framework that simplifies implementing RAG.
#### **4.3.2 - LangChain Q&A Retriever**
* ConversationalRetrievalChain
* Query the Source documents
## **5 - Fine-Tuning Models**
* Limitations of prompt engineering and RAG
* Fine-Tuning Arch
* Types (PEFT, LoRA, QLoRA)
* Using PyTorch for Pre-Training & Fine-Tuning
* Evals + Quality
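To see why parameter-efficient methods like LoRA help, compare the parameter counts of a full weight update versus a low-rank one. The dimensions below are toy values, not Llama's real sizes:

```python
import numpy as np

d, r = 1024, 8           # hidden size and LoRA rank (toy values)
W = np.zeros((d, d))     # frozen pretrained weight (stays untrained)
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # trainable low-rank factor

W_adapted = W + B @ A    # LoRA update: W + BA, same shape as W
full_params = d * d
lora_params = A.size + B.size
print(full_params, lora_params)  # the low-rank update trains ~64x fewer
```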
## **6 - Responsible AI**
* Power + Responsibility
* Hallucinations
* Input & Output Safety
* Red-teaming (simulating real-world cyber attackers)
* [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)
## **7 - Conclusion**
* Active research on LLMs and Llama
* Leverage the power of Llama and its open community
* Safety and responsible use is paramount!
* Call-To-Action
* [Replicate Free Credits](https://replicate.fyi/connect2023) for Connect attendees!
* This notebook is available in the Llama GitHub recipes repository
* Use Llama in your projects and give us feedback
#### **Resources**
- [GitHub - Llama 2](https://github.com/facebookresearch/llama)
- [GitHub - Llama 2 Recipes](https://github.com/facebookresearch/llama-recipes)
- [Llama 2](https://ai.meta.com/llama/)
- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)
- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)
- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)
- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)
- [Replicate](https://replicate.com/meta/)
- [LangChain](https://www.langchain.com/)
#### **Authors & Contact**
* asangani@meta.com, [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)
* mohsena@meta.com, [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)