lchu | 80a4c36707 | further fix #90 | 1 year ago
Hamid Shojanazeri | 88d3e1febc | fix the save_train_param condition | 1 year ago
Hamid Shojanazeri | 62be60355a | resolving conflicts | 1 year ago
Hamid Shojanazeri | 017cadd04b | Merge branch 'checkpoint_handler_path_fix' of https://github.com/facebookresearch/llama-recipes into checkpoint_handler_path_fix | 1 year ago
Hamid Shojanazeri | 4f70348b94 | remove the redundant lr step | 1 year ago
Hamid Shojanazeri | 5b916114eb | merge main branch | 1 year ago
Hamid Shojanazeri | 668c364f6b | add rank to save_train_params | 1 year ago
Hamid Shojanazeri | 231c9e7da9 | adding train_param.yaml saving for fsdp checkpoint loading for inference | 1 year ago
Hamid Shojanazeri | 41dd7ff1cb | Merge branch 'main' into checkpoint_handler_path_fix | 1 year ago
Hamid Shojanazeri | a955ed1999 | added checks for dist barrier and commented cuda expandable segments and dist_dbug | 1 year ago
Hamid Shojanazeri | a2403c7c1a | clean up | 1 year ago
Hamid Shojanazeri | e9559d2669 | fixing the train/eval_loss calculation | 1 year ago
Hamid Shojanazeri | 4ba4400a75 | adding dist barrier before and after checkpointing | 1 year ago
Hamid Shojanazeri | a49a2c2804 | adding PT cuda allocation expand flag | 1 year ago
Hamid Shojanazeri | 442c1ccf7c | adding barrier to end of trainer loop | 1 year ago
Hamid Shojanazeri | f74d57dc08 | printing scores based on fsdp usage or single gpu | 1 year ago
Hamid Shojanazeri | 3d887ea483 | update with active memory and removing rank0 for eval score | 1 year ago
Hamid Shojanazeri | bedb96b78a | fixing the full state path in checkpoint handler | 1 year ago
Hamid Shojanazeri | 563e572f7c | adding active mem stat | 1 year ago
Hamid Shojanazeri | bd01f64cbd | Merge branch 'main' into fix-cuda_id | 1 year ago
Andrew Gu | 71fdc4920a | Save memory and fix typos | 1 year ago
Hamid Shojanazeri | a7156dfb5d | fixing the cuda id | 1 year ago
Hamid Shojanazeri | 707af7ea24 | adding cuda:0 for non-fsdp situations | 1 year ago
Hamid Shojanazeri | 6678be75ad | fixing indentation | 1 year ago
Hamid Shojanazeri | 6a84e9e4d5 | fixing scaler for both fsdp and non fsdp | 1 year ago
Hamid Shojanazeri | 065ddaa77b | fixing the condition for moving to cuda | 1 year ago
Hamid Shojanazeri | 20b061e01c | modify to step the lr scheduler each epoch | 1 year ago
chauhang | 4767f09ecd | Initial commit | 1 year ago