Fix broken links in Dataset.md (#49)

Geeta Chauhan 1 year ago
parent
commit
3cc2b3787f
1 changed file with 2 additions and 2 deletions

+ 2 - 2
docs/Dataset.md

@@ -10,7 +10,7 @@ The provided fine tuning script allows you to select between three datasets by p
 
 The list of available datasets can easily be extended with custom datasets by following these instructions.
 
-Each dataset has a corresponding configuration (dataclass) in [configs/dataset.py](../configs/dataset.py) which contains the dataset name, training/validation split names, as well as optional parameters like datafiles etc.
+Each dataset has a corresponding configuration (dataclass) in [configs/datasets.py](../configs/datasets.py) which contains the dataset name, training/validation split names, as well as optional parameters like datafiles etc.
 
 Additionally, there is a preprocessing function for each dataset in the [ft_datasets](../ft_datasets) folder.
 The returned data of the dataset needs to be consumable by the forward method of the fine-tuned model by calling ```model(**data)```.
@@ -18,7 +18,7 @@ For CausalLM models this usually means that the data needs to be in the form of
 
 To add a custom dataset the following steps need to be performed.
 
-1. Create a dataset configuration after the schema described above. Examples can be found in [configs/dataset.py](../configs/dataset.py).
+1. Create a dataset configuration after the schema described above. Examples can be found in [configs/datasets.py](../configs/datasets.py).
 2. Create a preprocessing routine which loads the data and returns a PyTorch style dataset. The signature for the preprocessing function needs to be (dataset_config, tokenizer, split_name) where split_name will be the string for train/validation split as defined in the dataclass.
 3. Register the dataset name and preprocessing function by inserting it as key and value into the DATASET_PREPROC dictionary in [utils/dataset_utils.py](../utils/dataset_utils.py)
 4. Set dataset field in training config to dataset name or use --dataset option of the llama_finetuning.py training script.
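
The four steps in the hunk above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `MyCustomDatasetConfig`, its field names, `ToyTokenizer`, `InMemoryDataset`, and `get_my_custom_dataset` are all hypothetical, and the local `DATASET_PREPROC` dict only stands in for the one in utils/dataset_utils.py. The sample keys (`input_ids`, `attention_mask`, `labels`) are an assumption based on the common causal LM convention, and a real dataset would typically return tensors rather than plain lists.

```python
from dataclasses import dataclass


# Step 1: a dataset configuration (hypothetical field names).
@dataclass
class MyCustomDatasetConfig:
    dataset: str = "my_custom_dataset"
    train_split: str = "train"
    test_split: str = "validation"
    data_path: str = ""  # optional parameter, unused in this sketch


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for a real one."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        # Assign the next free integer id to each unseen token.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.split()]


class InMemoryDataset:
    """Map-style dataset: anything with __len__ and __getitem__."""

    def __init__(self, texts, tokenizer):
        self.texts = texts
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        ids = self.tokenizer.encode(self.texts[idx])
        # Assumed keys, so that the sample could be passed as model(**data).
        return {
            "input_ids": ids,
            "attention_mask": [1] * len(ids),
            "labels": list(ids),
        }


# Step 2: preprocessing routine with the (dataset_config, tokenizer, split_name) signature.
def get_my_custom_dataset(dataset_config, tokenizer, split_name):
    # Hard-coded toy data; a real routine would load from dataset_config.data_path.
    samples = {
        "train": ["hello world", "hello again"],
        "validation": ["hello there"],
    }
    return InMemoryDataset(samples[split_name], tokenizer)


# Step 3: register name and function (local stand-in for the real dictionary).
DATASET_PREPROC = {}
DATASET_PREPROC["my_custom_dataset"] = get_my_custom_dataset

# Step 4: select the dataset by name, as the training config would.
config = MyCustomDatasetConfig()
train_set = DATASET_PREPROC[config.dataset](config, ToyTokenizer(), config.train_split)
sample = train_set[0]
```

Keeping the lookup keyed by the name stored in the config means the training script only needs the dataset name to find both the configuration and its preprocessing function.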