Browse source code

Adding Supporting Files for Link and Spell Check (#26)

Geeta Chauhan 1 year ago
parent
commit
74bde65a62

+ 20 - 17
README.md

@@ -7,12 +7,12 @@ Llama 2 is a new technology that carries potential risks with use. Testing condu
 
 # Table of Contents
 1. [Quick start](#quick-start)
-2. [Fine-tuning](#fine-tuning)
+2. [Model Conversion](#model-conversion-to-hugging-face)
+3. [Fine-tuning](#fine-tuning)
     - [Single GPU](#single-gpu)
     - [Multi GPU One Node](#multiple-gpus-one-node)
     - [Multi GPU Multi Node](#multi-gpu-multi-node)
-3. [Inference](./docs/inference.md)
-4. [Model Conversion](#model-conversion-to-hugging-face)
+4. [Inference](./docs/inference.md)
 5. [Repository Organization](#repository-organization)
 6. [License and Acceptable Use Policy](#license)
 
@@ -46,6 +46,23 @@ pip install -r requirements.txt
 
 **Please note that the above requirements.txt will install PyTorch 2.0.1. If you want to run FSDP + PEFT, make sure to install the PyTorch nightlies.**
 
+# Model conversion to Hugging Face
+The recipes and notebooks in this folder use the Llama 2 model definition provided by Hugging Face's transformers library.
+
+Given that the original checkpoint resides under models/7B, you can install all requirements and convert the checkpoint with:
+
+```bash
+## Install Hugging Face Transformers from source
+git clone git@github.com:huggingface/transformers.git
+cd transformers
+pip install protobuf
+pip install .
+pip freeze | grep transformers ## verify it is version 4.31.0 or higher
+
+python src/transformers/models/llama/convert_llama_weights_to_hf.py \
+   --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
+```
+
 # Fine-tuning
 
 For fine-tuning Llama 2 models for your domain-specific use cases, recipes for PEFT, FSDP, and PEFT+FSDP have been included, along with a few test datasets. For details, see [LLM Fine-tuning](./docs/LLM_finetuning.md).
@@ -112,20 +129,6 @@ sbatch multi_node.slurm
 You can read more about our fine-tuning strategies [here](./docs/LLM_finetuning.md).
 
 
-# Model conversion to Hugging Face
-The recipes and notebooks in this folder are using the Llama 2 model definition provided by Hugging Face's transformers library.
-
-Given that the original checkpoint resides under models/7B you can install all requirements and convert the checkpoint with:
-
-```bash
-## Install HuggingFace Transformers from source
-pip install git+https://github.com/huggingface/transformers
-cd transformers
-
-python src/transformers/models/llama/convert_llama_weights_to_hf.py \
-    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir models_hf/7B
-```
-
 # Repository Organization
 This repository is organized in the following way:
 
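The added README block asks the reader to verify via `pip freeze | grep transformers` that transformers is at version 4.31.0 or higher. As a minimal sketch of that check done programmatically (the helper name is hypothetical, not part of this repo, and it assumes plain `X.Y.Z` version strings):

```python
# Hypothetical helper mirroring the README's "verify it is version 4.31.0
# or higher" step; compares dotted version strings numerically.
def version_at_least(version, minimum="4.31.0"):
    to_tuple = lambda v: tuple(int(part) for part in v.split(".")[:3])
    return to_tuple(version) >= to_tuple(minimum)

print(version_at_least("4.31.0"))  # True: exactly the minimum
print(version_at_least("4.30.2"))  # False: too old
```

A plain string comparison would get this wrong ("4.9.0" > "4.31.0" lexically), which is why the components are compared as integers.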

+ 1 - 1
docs/inference.md

@@ -31,7 +31,7 @@ inference/samsum_prompt.txt
 The inference folder also includes a chat completion example, that adds built-in safety features in fine-tuned models to the prompt tokens. To run the example:
 
 ```bash
-python chat_completion.py --model_name "PATH/TO/MODEL/7B/" --prompt_file chats.json  --quantization --use_auditnlg
+python inference/chat_completion.py --model_name "PATH/TO/MODEL/7B/" --prompt_file inference/chats.json  --quantization --use_auditnlg
 
 ```
 

+ 24 - 0
scripts/markdown_link_check_config.json

@@ -0,0 +1,24 @@
+{
+  "retryOn429": true,
+  "retryCount": 5,
+  "fallbackRetryDelay": "10s",
+  "httpHeaders": [
+    {
+      "urls": [
+        "https://docs.github.com/",
+        "https://help.github.com/"
+      ],
+      "headers": {
+        "Accept-Encoding": "zstd, br, gzip, deflate"
+      }
+    }
+  ],
+  "ignorePatterns": [
+    {
+      "pattern": "^http(s)?://127.0.0.1.*"
+    },
+    {
+      "pattern": "^http(s)?://localhost.*"
+    }
+  ]
+}
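The `ignorePatterns` entries above are regular expressions matched against each link the checker visits; anything matching is skipped. A small sketch of how those two patterns behave (the helper name is illustrative, not part of markdown-link-check):

```python
import re

# The two ignore patterns from markdown_link_check_config.json above.
IGNORE_PATTERNS = [r"^http(s)?://127.0.0.1.*", r"^http(s)?://localhost.*"]

def is_ignored(url):
    # A link is skipped if any ignore pattern matches from the start.
    return any(re.match(p, url) for p in IGNORE_PATTERNS)

print(is_ignored("http://localhost:8000/docs"))  # True: local link, skipped
print(is_ignored("https://127.0.0.1/health"))    # True: loopback, skipped
print(is_ignored("https://github.com/"))         # False: checked normally
```

Skipping loopback URLs keeps CI from failing on example links that only resolve on a developer's machine.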

+ 20 - 0
scripts/spellcheck.sh

@@ -0,0 +1,20 @@
+# Source: https://github.com/pytorch/torchx/blob/main/scripts/spellcheck.sh
+set -ex
+sudo apt-get install -y aspell
+
+if [[ $# -eq 0 ]]; then
+    sources=$(find . -name '*.md')
+else
+    sources="$@"
+fi
+
+sources_arg=""
+for src in $sources; do
+    sources_arg="${sources_arg} -S $src"
+done
+
+if [ ! "$sources_arg" ]; then
+	echo "No files to spellcheck"
+else
+	pyspelling -c scripts/spellcheck_conf/spellcheck.yaml --name Markdown $sources_arg
+fi
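The loop in the script above flattens each Markdown file into a repeated `-S <file>` flag for pyspelling. A sketch of the same accumulation in Python (the function name is hypothetical):

```python
def build_sources_args(sources):
    # Mirror the shell loop: each source file becomes a "-S <file>" pair.
    args = []
    for src in sources:
        args.extend(["-S", src])
    return args

print(build_sources_args(["README.md", "docs/inference.md"]))
# ['-S', '-S'-interleaved list: ['-S', 'README.md', '-S', 'docs/inference.md']
```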

+ 22 - 0
scripts/spellcheck_conf/spellcheck.yaml

@@ -0,0 +1,22 @@
+matrix:
+- name: Markdown
+  aspell:
+    lang: en
+    d: en_US
+  dictionary:
+    wordlists:
+    - scripts/spellcheck_conf/wordlist.txt
+    output: scripts/spellcheck_conf/wordlist.dic
+    encoding: utf-8
+  pipeline:
+  - pyspelling.filters.context:
+      context_visible_first: true
+      delimiters:
+      - open: '(?s)^ *(?P<open>`{3,})[a-z0-9]*?$'
+        close: '^(?P=open)$'
+      - open: ''
+        content: 'https?://[-a-zA-Z0-9.]+?\.[a-z]{2,6}[-?=&%.0-9a-zA-Z/_#]*'
+        close: ''
+  - pyspelling.filters.markdown:
+      markdown_extensions:
+      - markdown.extensions.extra:
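The `context` filter's first delimiter pair uses a named group and back-reference (`(?P=open)`), so a fenced code block is only closed by the same number of backticks that opened it, keeping code out of the spell check. A quick sketch of how the opening-fence pattern behaves:

```python
import re

# Opening-fence pattern from the spellcheck.yaml context filter above.
OPEN = re.compile(r'(?s)^ *(?P<open>`{3,})[a-z0-9]*?$', re.MULTILINE)

print(bool(OPEN.match("```bash")))  # True: fence with a lowercase language tag
print(bool(OPEN.match("````")))     # True: longer fences also match
print(bool(OPEN.match("`` code")))  # False: fewer than three backticks
```

Note the language tag is restricted to `[a-z0-9]`, so only lowercase info strings (as used throughout this repo's fences) are recognized.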

File diff suppressed because it is too large
+ 1070 - 0
scripts/spellcheck_conf/wordlist.txt