NVIDIA / TransformerEngine Public

Pull requests: NVIDIA/TransformerEngine

44 open, 955 closed

Pull requests list

Update README.rst

#1385 opened Dec 23, 2024 by sbhavani

1 of 6 tasks

bug fix for using return_layernorm_output=True

#1382 opened Dec 20, 2024 by LiyuanLucasLiu

8 tasks done

[PyTorch] Add caching for attention backend selection results

#1381 opened Dec 19, 2024 by cyanguwa • Draft

8 of 13 tasks

Don't touch nor send messages to the root logger.

#1380 opened Dec 19, 2024 by sagostinho-nvidia

4 of 13 tasks

[MoE][PyTorch] Add mask-based MoE permutation

#1373 opened Dec 13, 2024 by hxbai

8 of 13 tasks

[common/PyTorch] Add FusedAttention support for SWA (left, right)

#1369 opened Dec 12, 2024 by cyanguwa • Draft

8 of 13 tasks

Add paged attention support

#1355 opened Dec 4, 2024 by cyanguwa

8 of 13 tasks

[Draft] Introduce NVSHMEM based communication API for pytorch

#1346 opened Nov 22, 2024 by gdengk • Draft

13 tasks

[PyTorch] Adding TP overlap support for te.Linear with parallel_mode="column" (labels: 1.14.0, enhancement)

#1343 opened Nov 20, 2024 by denera

8 of 13 tasks

[PyTorch] Bugfix for wgrad bulk overlap conflict when dgrad overlap is reduce-scatter (label: bug)

#1341 opened Nov 18, 2024 by denera

6 of 13 tasks

[C/JAX] Comm+GEMM Overlap API for TE/JAX (labels: enhancement, jax)

#1337 opened Nov 15, 2024 by denera • Draft

3 of 13 tasks

[COMMON/JAX] Support sliding window on THD format

#1327 opened Nov 11, 2024 by zlsh80826

6 of 13 tasks

Build with uv instead of just pip

#1324 opened Nov 8, 2024 by jennifgcrl

5 of 13 tasks

TP communication overlap: enable the overlap between GEMM chunk at Ho…

#1311 opened Nov 4, 2024 by erhoo82

1 of 13 tasks

[JAX] Collective GEMM custom op with nvte_cublas_gemm (no comm. overlap) (label: jax)

#1307 opened Nov 2, 2024 by denera

7 of 17 tasks

[PyTorch] Add heuristics for initializing FP8 params (label: enhancement)

#1300 opened Oct 30, 2024 by timmoon10

8 of 13 tasks

Offloading example

#1299 opened Oct 29, 2024 by sanandaraj5597

[PyTorch] Fix autocast deprecation warnings

#1277 opened Oct 21, 2024 by yaox12

13 tasks

[PyTorch] Remove sequence parallel check for setting dropout RNG context

#1272 opened Oct 18, 2024 by ksivaman • Draft

1 of 13 tasks

attention_mask fill with -inf for UnfusedDotProductAttention

#1268 opened Oct 18, 2024 by Agoniii

1 of 13 tasks

Draft: reduce cudagraph mem via preallocations

#1253 opened Oct 15, 2024 by JimmyZhang12

13 tasks

[PyTorch] Infrastructure for C++ QuantizedTensor

#1251 opened Oct 14, 2024 by ptrendx • Draft

13 tasks

fused out correction in CP

#1248 opened Oct 14, 2024 by xiaoyao0115

12 tasks

Save CUDA Graph memory by reusing input and output tensors

#1234 opened Oct 9, 2024 by buptzyb

5 of 13 tasks

[PyTorch] Improve CP P2P efficiency

#1208 opened Sep 26, 2024 by yenchenlin

1 of 6 tasks
