Merge pull request huggingface#512 from nuass/main
docs(zh-cn): Reviewed No. 23 - What is dynamic padding?
xianbaoqian authored Feb 27, 2023
2 parents b03d1b1 + 593070d commit ace04ee
Showing 1 changed file with 33 additions and 31 deletions.
64 changes: 33 additions & 31 deletions subtitles/zh-CN/23_what-is-dynamic-padding.srt
@@ -15,12 +15,12 @@ In the "Batching Inputs together" video,

4
00:00:10,890 --> 00:00:12,720
我们已经看到能够对输入进行分组
我们已经看到为了能够对(不同长度的同批次)输入进行分组
we have seen that to be able to group inputs

5
00:00:12,720 --> 00:00:15,300
同一批不同长度的,
同一批不同长度的
of different lengths in the same batch,

6
@@ -40,12 +40,12 @@ Here, for instance, the longest sentence is the third one,

9
00:00:24,600 --> 00:00:27,270
我们需要添加五个、两个或七个填充令牌
我们需要添加五个、两个或七个填充标记
and we need to add five, two, or seven pad tokens

10
00:00:27,270 --> 00:00:30,090
到其他句子有四个句子
到其他句子使得四个句子具有
to the other sentences to have four sentences

11
@@ -65,12 +65,12 @@ there are various padding strategies we can apply.

14
00:00:37,560 --> 00:00:39,540
最明显的一种是填充所有元素
最明显的一种是填充整个数据集所有的样本
The most obvious one is to pad all the elements

15
00:00:39,540 --> 00:00:40,923
数据集的相同长度
达到相同的长度
of the dataset to the same length:

16
@@ -80,67 +80,67 @@ the length of the longest sample.

17
00:00:44,070 --> 00:00:45,330
这会给我们批次
我们得到具有相同形状的批次
This will then give us batches

18
00:00:45,330 --> 00:00:46,890
都具有相同的形状

that all have the same shape

19
00:00:46,890 --> 00:00:49,800
由最大序列长度决定。
(其长度)由最大序列长度决定。
determined by the maximum sequence length.

20
00:00:49,800 --> 00:00:52,893
缺点是批次由短句组成
缺点是(如果)批次样本由短句组成
The downside is that batches composed from short sentences

21
00:00:52,893 --> 00:00:54,960
会有很多填充令牌
将带来很多填充符号
will have a lot of padding tokens

22
00:00:54,960 --> 00:00:57,660
这将在模型中引入更多计算
并且在模型中引入更多不必要的计算。
which will introduce more computations in the model

23
00:00:57,660 --> 00:00:58,910
我们最终不需要。

we ultimately don't need.

24
00:01:00,060 --> 00:01:03,300
为了避免这种情况,另一种策略是填充元素
为了避免这种情况,另一种策略是填充(较短的)样本
To avoid this, another strategy is to pad the elements

25
00:01:03,300 --> 00:01:05,280
当我们把它们批在一起时
当把它们放在一批时
when we batch them together,

26
00:01:05,280 --> 00:01:08,190
到批次中最长的句子
达到本批次中最长句子的长度
to the longest sentence inside the batch.

27
00:01:08,190 --> 00:01:12,000
这样,由短输入组成的批次会更小
这样,由短样本输入组成的批次
This way, batches composed of short inputs will be smaller

28
00:01:12,000 --> 00:01:13,920
比包含最长句子的批次
会比按整个数据集最长句子的长度(补齐)的批次更小
than the batch containing the longest sentence

29
00:01:13,920 --> 00:01:15,510
在数据集中。

in the dataset.

30
@@ -155,7 +155,7 @@ The downside is that all batches

32
00:01:20,490 --> 00:01:22,140
然后会有不同的形状
会有不同的形状
will then have different shapes,

33
@@ -170,7 +170,7 @@ Let's see how to apply both strategies in practice.

35
00:01:29,370 --> 00:01:31,280
我们实际上已经看到了如何应用固定填充
我们实际上已经知道了如何使用固定填充
We have actually seen how to apply fixed padding

36
@@ -190,22 +190,22 @@ after loading the dataset and tokenizer,

39
00:01:38,250 --> 00:01:40,680
我们将标记化应用于所有数据集
我们将符号化应用于所有数据集
we applied the tokenization to all the dataset

40
00:01:40,680 --> 00:01:42,480
带填充和截断
包括填充和截断
with padding and truncation

41
00:01:42,480 --> 00:01:45,273
制作所有长度为 128 的样本
保证所有样本的长度为 128 。
to make all samples of length 128.

42
00:01:46,530 --> 00:01:48,360
结果,如果我们传递这个数据集
最后,如果我们传递这个数据集
As a result, if we pass this dataset

43
@@ -215,7 +215,8 @@ to a PyTorch DataLoader,

44
00:01:50,520 --> 00:01:55,503
我们得到形状批量大小的批次,这里是 16,乘以 128。
我们得到形状为 batch_size(这里是 16)乘以 128 的批次。

we get batches of shape batch size, here 16, by 128.
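
In code, the fixed-padding setup these subtitles describe looks roughly like the sketch below; the GLUE MRPC dataset and the bert-base-uncased checkpoint are illustrative assumptions, not names quoted from the video.

```python
# Minimal sketch of fixed padding: every sample is padded/truncated to length 128,
# so every batch coming out of the DataLoader has the same shape.
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

raw_datasets = load_dataset("glue", "mrpc")                      # assumed dataset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed checkpoint

def tokenize_function(examples):
    return tokenizer(
        examples["sentence1"],
        examples["sentence2"],
        padding="max_length",
        truncation=True,
        max_length=128,
    )

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["sentence1", "sentence2", "idx"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")

train_dataloader = DataLoader(tokenized_datasets["train"], batch_size=16, shuffle=True)
batch = next(iter(train_dataloader))
print(batch["input_ids"].shape)  # torch.Size([16, 128]) for every batch
```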

45
@@ -230,7 +231,7 @@ we must defer the padding to the batch preparation,

47
00:02:01,440 --> 00:02:04,740
所以我们从标记化函数中删除了那部分
所以我们从标记函数中删除了那部分
so we remove that part from our tokenize function.

48
@@ -295,7 +296,7 @@ We pass it to the PyTorch DataLoader as a collate function,

60
00:02:35,310 --> 00:02:37,620
然后观察生成的批次
然后观察到生成的批次
then observe that the batches generated

61
@@ -310,12 +311,13 @@ all way below the 128 from before.

63
00:02:42,660 --> 00:02:44,820
动态批处理几乎总是更快
动态批处理在 CPU 和 GPU 上几乎总是更快,

Dynamic batching will almost always be faster

64
00:02:44,820 --> 00:02:47,913
在 CPU 和 GPU 上,所以如果可以的话你应该应用它。
所以如果可以的话你应该应用它。
on CPUs and GPUs, so you should apply it if you can.
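
A corresponding sketch of the dynamic-padding setup: padding is dropped from the tokenize function and deferred to a DataCollatorWithPadding passed to the DataLoader as collate_fn, so each batch is padded only to its own longest sample (same assumed dataset and checkpoint as above).

```python
# Minimal sketch of dynamic padding with DataCollatorWithPadding.
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "mrpc")                      # assumed dataset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed checkpoint

def tokenize_function(examples):
    # No padding here: each sample keeps its own length.
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["sentence1", "sentence2", "idx"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")

# The collator pads each batch to its longest sample and returns PyTorch tensors.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
train_dataloader = DataLoader(
    tokenized_datasets["train"],
    batch_size=16,
    shuffle=True,
    collate_fn=data_collator,
)

for step, batch in enumerate(train_dataloader):
    print(batch["input_ids"].shape)  # second dimension varies per batch, usually well below 128
    if step == 4:
        break
```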

65
@@ -330,7 +332,7 @@ if you run your training script on TPU

67
00:02:53,490 --> 00:02:55,293
或者需要成批的固定形状
或者需要固定形状的批次输入
or need batches of fixed shapes.

68