Tags: cskyan/apex
Cherry-pick Megatron-LM's changes in pipeline model parallel for T5 (N… …VIDIA#1232)

* update parallel_state
* update pipeline common funcs - forward_step and backward_step
* update pipelining w/o interleaving
* type hint
* merge utils into without_interleaving
  Motivation: functions in utils are only used by forward_backward_pipelining_without_interleaving
* fix handling of `model_type`
* fix import of DDP
* update set_input_tensor method
* fix
* cosmetic
* update model
* refactor pipeline test scripts