58_what-is-domain-adaptation.srt
1
00:00:00,000 --> 00:00:01,402
(air whooshing)
2
00:00:01,402 --> 00:00:02,720
(smiley snapping)
3
00:00:02,720 --> 00:00:05,910
(air whooshing)
4
00:00:05,910 --> 00:00:07,923
- What is domain adaptation?
5
00:00:09,540 --> 00:00:12,540
When fine-tuning a pre-trained
model on a new dataset,
6
00:00:12,540 --> 00:00:15,480
the fine-tuned model we
obtain will make predictions
7
00:00:15,480 --> 00:00:17,433
that are attuned to this new dataset.
8
00:00:18,840 --> 00:00:21,840
When the two models are
trained on the same task,
9
00:00:21,840 --> 00:00:25,320
we can then compare their
predictions on the same input.
10
00:00:25,320 --> 00:00:27,870
The predictions of the two
models will be different
11
00:00:27,870 --> 00:00:29,790
in a way that reflects the differences
12
00:00:29,790 --> 00:00:31,680
between the two datasets,
13
00:00:31,680 --> 00:00:34,053
a phenomenon we call domain adaptation.
14
00:00:35,310 --> 00:00:38,640
Let's look at an example
with masked language modeling
15
00:00:38,640 --> 00:00:41,910
by comparing the outputs of the
pre-trained DistilBERT model
16
00:00:41,910 --> 00:00:43,080
with the version fine-tuned
17
00:00:43,080 --> 00:00:45,273
in chapter 7 of the course, linked below.
18
00:00:46,500 --> 00:00:49,140
The pre-trained model
makes generic predictions,
19
00:00:49,140 --> 00:00:50,580
whereas the fine-tuned model
20
00:00:50,580 --> 00:00:53,253
has its first two
predictions linked to cinema.
21
00:00:54,390 --> 00:00:57,210
Since it was fine-tuned on
a movie reviews dataset,
22
00:00:57,210 --> 00:00:58,680
it's perfectly normal to see
23
00:00:58,680 --> 00:01:01,440
it adapt its suggestions like this.
24
00:01:01,440 --> 00:01:03,090
Notice how it keeps the same prediction
25
00:01:03,090 --> 00:01:05,220
as the pre-trained model afterward.
26
00:01:05,220 --> 00:01:08,100
Even if the fine-tuned model
adapts to the new dataset,
27
00:01:08,100 --> 00:01:10,450
it's not forgetting what
it was pre-trained on.
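To try this comparison yourself, here is a minimal sketch using the fill-mask pipeline. The fine-tuned checkpoint name is an assumption based on the chapter 7 model (DistilBERT fine-tuned on the IMDb movie-review dataset).

```python
from transformers import pipeline

# Pre-trained DistilBERT: makes generic predictions.
pretrained = pipeline("fill-mask", model="distilbert-base-uncased")

# Version fine-tuned on movie reviews in chapter 7 of the course
# (Hub checkpoint name is an assumption).
finetuned = pipeline(
    "fill-mask",
    model="huggingface-course/distilbert-base-uncased-finetuned-imdb",
)

text = "This is a great [MASK]."

# Compare the top predictions of the two models on the same input.
print([pred["token_str"] for pred in pretrained(text)])
print([pred["token_str"] for pred in finetuned(text)])
```

The fine-tuned model typically ranks cinema-related words first, while its remaining predictions stay close to the pre-trained ones.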
28
00:01:11,490 --> 00:01:14,220
This is another example
on a translation task.
29
00:01:14,220 --> 00:01:17,310
On top, we use a pre-trained
French/English model,
30
00:01:17,310 --> 00:01:21,330
and at the bottom, the version
we fine-tuned in chapter 7.
31
00:01:21,330 --> 00:01:23,610
The top model is pre-trained
on lots of texts,
32
00:01:23,610 --> 00:01:25,170
and leaves technical English terms,
33
00:01:25,170 --> 00:01:28,350
like plugin and email,
unchanged in the translation.
34
00:01:28,350 --> 00:01:31,350
Both are perfectly
understood by French people.
35
00:01:31,350 --> 00:01:33,780
The dataset picked for the
fine-tuning is a dataset
36
00:01:33,780 --> 00:01:36,660
of technical texts where
special attention was paid
37
00:01:36,660 --> 00:01:39,150
to translating everything into French.
38
00:01:39,150 --> 00:01:42,090
As a result, the fine-tuned
model picked up that habit
39
00:01:42,090 --> 00:01:44,193
and translated both plugin and email.
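The same side-by-side check works for the translation example. A minimal sketch, assuming the pre-trained English-to-French checkpoint and the chapter 7 model fine-tuned on the KDE4 technical dataset (both Hub names are assumptions):

```python
from transformers import pipeline

# Pre-trained English-to-French model (Hub name assumed).
pretrained = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

# Version fine-tuned on technical texts in chapter 7 of the course
# (Hub checkpoint name is an assumption).
finetuned = pipeline(
    "translation",
    model="huggingface-course/marian-finetuned-kde4-en-to-fr",
)

text = "Unable to load the plugin, please check your email."

# The pre-trained model may leave "plugin" and "email" in English,
# while the fine-tuned one should translate them into French.
print(pretrained(text)[0]["translation_text"])
print(finetuned(text)[0]["translation_text"])
```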
40
00:01:45,942 --> 00:01:49,181
(air whooshing)
41
00:01:49,181 --> 00:01:50,592
(air whooshing)