Tags: TrustLLMeu/Megatron-LM
Do not include `mmap`s in `_IndexReader` pickle
  This should not affect `IndexedDataset`, which does not include the `_IndexReader` in its own pickle, but excluding the `mmap`s here is safer and more future-proof.
Do not include `mmap`ed objects in `pickle`ing
  We customize how the datasets are pickled so that `mmap`ed objects are left out of the pickled state, which avoids running out of memory on some machines.
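
The general technique these commit messages describe can be sketched as follows. This is a minimal illustration, not the repository's actual code: the class `MmapBackedReader` and the helper `demo` are hypothetical names, and the standard-library `mmap` module stands in for whatever memory-mapped objects the real datasets hold. The idea is that `__getstate__` returns only the information needed to reopen the file, and `__setstate__` recreates the mapping in the receiving process.

```python
import mmap
import os
import pickle
import tempfile


class MmapBackedReader:
    """Hypothetical reader that keeps a file memory-mapped."""

    def __init__(self, path):
        self.path = path
        self._open()

    def _open(self):
        # Map the file read-only; both handles are local to this process.
        self._file = open(self.path, "rb")
        self._mmap = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __getstate__(self):
        # Pickle only what is needed to reopen the file, never the mapping.
        return {"path": self.path}

    def __setstate__(self, state):
        # Recreate the mapping from the stored path after unpickling.
        self.path = state["path"]
        self._open()

    def read(self, start, end):
        return self._mmap[start:end]

    def close(self):
        self._mmap.close()
        self._file.close()


def demo():
    # Write a small temporary file to map.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello mmap")
        path = f.name
    reader = MmapBackedReader(path)
    # Round-trip through pickle; the clone reopens the file itself.
    clone = pickle.loads(pickle.dumps(reader))
    result = clone.read(0, 5)
    reader.close()
    clone.close()
    os.remove(path)
    return result
```

With this pattern the pickled payload stays a few bytes regardless of file size, since the mapped data is never part of the serialized state.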