Closed
Description
In the `_yield_tokens` implementation in src/data.py, the third argument `src` is expected to be True or False:
```python
# Turns an iterable into a generator
def _yield_tokens(iterable_data, tokenizer, src):
    # iterable_data stores the samples as (src, tgt), so this selects one language or the other
    index = 0 if src else 1
    for data in iterable_data:
        yield tokenizer(data[index])
```
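The problem can be reproduced in isolation (a minimal sketch; `pick_index` is a hypothetical helper mirroring the index selection above):

```python
# Any non-empty string is truthy, so `0 if src else 1` always selects
# index 0 when a language code is passed instead of a boolean.
def pick_index(src):
    return 0 if src else 1

# Intended usage: a boolean selects the correct side of the (src, tgt) pair.
print(pick_index(True))    # 0 -> source side, as intended
print(pick_index(False))   # 1 -> target side, as intended

# Actual usage: a language string such as 'de' or 'en' is always truthy,
# so the target side is never selected.
print(pick_index('de'))    # 0
print(pick_index('en'))    # 0 -- should have been 1 for the target language
```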
But the argument actually passed is a str (e.g. 'de' or 'en'), which is always truthy, so `_yield_tokens` always builds the tgt vocab from src tokens, and the loaded tgt tensors are wrong:
```python
tgt_vocab = build_vocab_from_iterator(
    _yield_tokens(train_iterator, tgt_tokenizer, tgt_lang),  # <-- tgt_lang is 'de' or 'en'
    min_freq=1,
    specials=list(special_symbols.keys()),
    special_first=True,
)
```
Example of a wrong tgt tensor, with far too many 0 values (0 means unknown):
```python
tensor([[ 2, 2, 2, 2, 2, 2, 2, 2],
        [ 0, 0, 0, 0, 0, 0, 0, 0],
        [ 0, 0, 0, 0, 0, 0, 0, 0],
        [ 0, 0, 0, 7, 0, 7, 0, 0],
        [ 0, 0, 0, 0, 3425, 0, 0, 0],
        [ 0, 0, 7, 0, 0, 0, 0, 0],
        [ 0, 0, 0, 0, 0, 0, 0, 0],
        [ 0, 0, 0, 0, 0, 0, 28, 0],
        [ 7, 5, 0, 0, 0, 15, 5, 0],
        [ 0, 3, 0, 0, 0, 0, 3, 0],
        [ 0, 1, 5, 0, 5, 0, 1, 0],
        [ 0, 1, 3, 0, 3, 0, 1, 5315],
        [ 0, 1, 1, 0, 1, 0, 1, 0],
        [ 5, 1, 1, 0, 1, 0, 1, 0],
        [ 3, 1, 1, 0, 1, 0, 1, 5],
        [ 1, 1, 1, 5, 1, 0, 1, 3],
        [ 1, 1, 1, 3, 1, 5, 1, 1],
        [ 1, 1, 1, 1, 1, 3, 1, 1]], device='cuda:0')
```