@@ -38,12 +38,12 @@ pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 llama-
To install from source, e.g. for development, use the following command. We're using hatchling as our build backend, which requires an up-to-date pip as well as the setuptools package.
```
pip install -U pip setuptools
-pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 -e .
+pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 git+https://github.com/facebookresearch/llama-recipes
```
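As a quick sanity check after installation, the package should be importable; a minimal sketch (the module name `llama_recipes` matches the fine-tuning entry point used further below):

```bash
# should exit silently if the install succeeded
python -c "import llama_recipes"
```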
For development and contributing to llama-recipes, please install all optional dependencies:
```
pip install -U pip setuptools
-pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 -e .[tests,auditnlg,vllm]
+pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 git+https://github.com/facebookresearch/llama-recipes#egg=llama-recipes\[tests,auditnlg,vllm\]
```
## Install with optional dependencies
Llama-recipes offers the installation of optional packages. There are three optional dependency groups.
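For example, using the extras named in the commands above (`tests`, `auditnlg`, `vllm`); quoting keeps the shell from expanding the brackets:

```bash
# install with a single optional dependency group
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 'llama-recipes[vllm]'
# optional dependency groups can also be combined
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 'llama-recipes[tests,auditnlg]'
```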
@@ -65,8 +65,6 @@ Optional dependencies can also be combined with [option1,option2].
**Note** All the settings defined in [config files](src/llama_recipes/configs/) can be passed as args through the CLI when running the script; there is no need to change the config files directly.
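For example, a minimal sketch of such an override (assuming `lr` and `num_epochs` are field names in the training config; check the config files for the exact names):

```bash
# override config fields on the command line instead of editing the files
python -m llama_recipes.finetuning --model_name /path_of_model_folder/7B --lr 1e-4 --num_epochs 3
```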
-**Note** In case need to run PEFT model with FSDP, please make sure to use the PyTorch Nightlies.
-
**For more in-depth information check out the following:**
* [Single GPU Fine-tuning](./docs/single_gpu.md)
@@ -78,7 +76,7 @@ Optional dependencies can also be combined with [option1,option2].
# Where to find the models?
-You can find llama v2 models on Hugging Face hub [here](https://huggingface.co/meta-llama), where models with `hf` in the name are already converted to Hugging Face checkpoints so no further conversion is needed. The conversion step below is only for original model weights from Meta that are hosted on Hugging Face model hub as well.
+You can find Llama 2 models on the Hugging Face hub [here](https://huggingface.co/meta-llama), where models with `hf` in the name are already converted to Hugging Face checkpoints, so no further conversion is needed. The conversion step below is only for original model weights from Meta that are also hosted on the Hugging Face model hub.
# Model conversion to Hugging Face
The recipes and notebooks in this folder use the Llama 2 model definition provided by Hugging Face's transformers library.
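For reference, a sketch of the conversion utility that ships with transformers (paths are placeholders; run it from a checkout of the transformers repository):

```bash
# convert original Meta weights into the Hugging Face checkpoint format
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /path/to/hf/model
```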
@@ -118,7 +116,7 @@ All the parameters in the examples and recipes below need to be further tuned to
# if running on a multi-GPU machine
export CUDA_VISIBLE_DEVICES=0
-python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --model_name /patht_of_model_folder/7B --output_dir Path/to/save/PEFT/model
+python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --model_name /path_of_model_folder/7B --output_dir path/to/save/PEFT/model
```
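Once fine-tuning finishes, the adapter saved in `--output_dir` can be loaded back onto the base model; a minimal sketch using the transformers and peft APIs (paths are placeholders):

```bash
python - <<'EOF'
from transformers import LlamaForCausalLM
from peft import PeftModel

# placeholder paths: the converted base model and the --output_dir used above
base_model = LlamaForCausalLM.from_pretrained("/path_of_model_folder/7B")
model = PeftModel.from_pretrained(base_model, "path/to/save/PEFT/model")
model.eval()
EOF
```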
@@ -135,7 +133,7 @@ Here we make use of Parameter Efficient Methods (PEFT) as described in the next
```bash
-torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --use_peft --peft_method lora --model_name /patht_of_model_folder/7B --fsdp_config.pure_bf16 --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --use_peft --peft_method lora --model_name /path_of_model_folder/7B --fsdp_config.pure_bf16 --output_dir path/to/save/PEFT/model
```
@@ -146,7 +144,7 @@ Here we use FSDP as discussed in the next section which can be used along with P
Setting `use_fast_kernels` will enable the use of Flash Attention or Xformers memory-efficient kernels based on the hardware being used. This can speed up the fine-tuning job. This has been enabled in the `optimum` library from Hugging Face as a one-liner API; please read more [here](https://pytorch.org/blog/out-of-the-box-acceleration/).
```bash
-torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --use_peft --peft_method lora --model_name /patht_of_model_folder/7B --fsdp_config.pure_bf16 --output_dir Path/to/save/PEFT/model --use_fast_kernels
+torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --use_peft --peft_method lora --model_name /path_of_model_folder/7B --fsdp_config.pure_bf16 --output_dir path/to/save/PEFT/model --use_fast_kernels
```
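Outside of the fine-tuning script, the `optimum` one-liner mentioned above looks roughly like this (a sketch; support for a given model depends on the installed optimum and transformers versions):

```bash
python - <<'EOF'
from transformers import LlamaForCausalLM
from optimum.bettertransformer import BetterTransformer

model = LlamaForCausalLM.from_pretrained("/path_of_model_folder/7B")
# one-liner swap to memory-efficient attention kernels
model = BetterTransformer.transform(model)
EOF
```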
### Fine-tuning using FSDP Only