1 year ago · 5446ea7999
--- a/docs/Dataset.md
+++ b/docs/Dataset.md
@@ -1,6 +1,6 @@
 
				 # Datasets and Evaluation Metrics
			
 
				 
			
 
				-The provided fine tuning script allows you to select between three datasets by passing the `dataset` arg to the `llama_recipes.finetuning` module or `llama_finetuning.py` script. The current options are `grammar_dataset`, `alpaca_dataset`and `samsum_dataset`. Note: Use of any of the datasets should be in compliance with the dataset's underlying licenses (including but not limited to non-commercial uses)
			
 
				+The provided fine tuning script allows you to select between three datasets by passing the `dataset` arg to the `llama_recipes.finetuning` module or `examples/finetuning.py` script. The current options are `grammar_dataset`, `alpaca_dataset`and `samsum_dataset`. Note: Use of any of the datasets should be in compliance with the dataset's underlying licenses (including but not limited to non-commercial uses)
			
 
				 
			
 
				 * [grammar_dataset](https://huggingface.co/datasets/jfleg) contains 150K pairs of english sentences and possible corrections.
			
 
				 * [alpaca_dataset](https://github.com/tatsu-lab/stanford_alpaca) provides 52K instruction-response pairs as generated by `text-davinci-003`.
			
@@ -21,7 +21,7 @@ To add a custom dataset the following steps need to be performed.
 
				 1. Create a dataset configuration after the schema described above. Examples can be found in [configs/datasets.py](../src/llama_recipes/configs/datasets.py).
			
 
				 2. Create a preprocessing routine which loads the data and returns a PyTorch style dataset. The signature for the preprocessing function needs to be (dataset_config, tokenizer, split_name) where split_name will be the string for train/validation split as defined in the dataclass.
			
 
				 3. Register the dataset name and preprocessing function by inserting it as key and value into the DATASET_PREPROC dictionary in [utils/dataset_utils.py](../src/llama_recipes/utils/dataset_utils.py)
			
 
				-4. Set dataset field in training config to dataset name or use --dataset option of the `llama_recipes.finetuning` module or llama_finetuning.py training script.
			
 
				+4. Set dataset field in training config to dataset name or use --dataset option of the `llama_recipes.finetuning` module or examples/finetuning.py training script.
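
Reviewer sketch (not part of the diff): the four steps for adding a custom dataset could look roughly like the following. This is a minimal illustration under stated assumptions, not the repository's implementation. The names `my_dataset`, `MyDataset`, `get_my_dataset`, and `WhitespaceTokenizer` are hypothetical, the `train_split`/`test_split` field names are assumed from the schema the doc refers to, and `DATASET_PREPROC` here is a local stand-in for the dictionary in `utils/dataset_utils.py`. The dataset class is duck-typed (only `__len__` and `__getitem__`) so the sketch runs without PyTorch installed.

```python
from dataclasses import dataclass


# Step 1: a dataset configuration (hypothetical; field names are assumptions).
@dataclass
class my_dataset:
    dataset: str = "my_dataset"
    train_split: str = "train"
    test_split: str = "validation"


class MyDataset:
    """Map-style dataset exposing __len__ and __getitem__."""

    def __init__(self, samples, tokenizer):
        self.samples = samples
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        input_ids = self.tokenizer.encode(self.samples[idx])
        return {
            "input_ids": input_ids,
            "labels": list(input_ids),
            "attention_mask": [1] * len(input_ids),
        }


# Step 2: preprocessing routine with the documented signature
# (dataset_config, tokenizer, split_name).
def get_my_dataset(dataset_config, tokenizer, split_name):
    # Placeholder in-memory data; a real routine would load files here.
    data = {
        "train": ["fix this sentence", "another short example"],
        "validation": ["held out text"],
    }
    return MyDataset(data[split_name], tokenizer)


# Step 3: register name -> preprocessing function.
# Local stand-in for DATASET_PREPROC in utils/dataset_utils.py.
DATASET_PREPROC = {"my_dataset": get_my_dataset}


class WhitespaceTokenizer:
    """Toy tokenizer (hypothetical): maps each word to its length."""

    def encode(self, text):
        return [len(word) for word in text.split()]


# Step 4 (simulated): look the dataset up by the name set in the config.
cfg = my_dataset()
train_set = DATASET_PREPROC[cfg.dataset](cfg, WhitespaceTokenizer(), cfg.train_split)
print(len(train_set), train_set[0]["input_ids"])
```

In the real package, the lookup in step 4 is driven by the `dataset` field of the training config or the `--dataset` option rather than by a direct dictionary access.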
			
 
				 
			
 
				 ## Application
			
 
				 Below we list other datasets and their main use cases that can be used for fine tuning.
			
--- a/docs/multi_gpu.md
+++ b/docs/multi_gpu.md
@@ -9,7 +9,7 @@ To run fine-tuning on multi-GPUs, we will  make use of two packages:
 
				 Given the combination of PEFT and FSDP, we would be able to fine tune a Llama 2 model on multiple GPUs in one node or multi-node.
			
 
				 
			
 
				 ## Requirements 
			
 
				-To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`llama_finetuning.py`](../llama_finetuning.py) script with torchrun (See [README.md](../README.md) for details).
			
 
				+To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`examples/finetuning.py`](../examples/finetuning.py) script with torchrun (See [README.md](../README.md) for details).
			
 
				 
			
 
				 **Please note that the llama_recipes package will install PyTorch 2.0.1 version, in case you want to run FSDP + PEFT, please make sure to install PyTorch nightlies.**