58_what-is-domain-adaptation.srt
1
00:00:00,000 --> 00:00:01,402
(air whooshing)
2
00:00:01,402 --> 00:00:02,720
(smiley snapping)
3
00:00:02,720 --> 00:00:05,910
(air whooshing)
4
00:00:05,910 --> 00:00:07,923
- What is domain adaptation?
5
00:00:09,540 --> 00:00:12,540
When fine-tuning a pre-trained
model on a new dataset,
6
00:00:12,540 --> 00:00:15,480
the fine-tuned model we
obtain will make predictions
7
00:00:15,480 --> 00:00:17,433
that are attuned to this new dataset.
8
00:00:18,840 --> 00:00:21,840
When the two models are
trained on the same task,
9
00:00:21,840 --> 00:00:25,320
we can then compare their
predictions on the same input.
10
00:00:25,320 --> 00:00:27,870
The predictions of the two
models will be different
11
00:00:27,870 --> 00:00:29,790
in a way that reflects the differences
12
00:00:29,790 --> 00:00:31,680
between the two datasets,
13
00:00:31,680 --> 00:00:34,053
a phenomenon we call domain adaptation.
14
00:00:35,310 --> 00:00:38,640
Let's look at an example
with masked language modeling
15
00:00:38,640 --> 00:00:41,910
by comparing the outputs of the
pre-trained DistilBERT model
16
00:00:41,910 --> 00:00:43,080
with the version fine-tuned
17
00:00:43,080 --> 00:00:45,273
in chapter 7 of the course, linked below.
18
00:00:46,500 --> 00:00:49,140
The pre-trained model
makes generic predictions,
19
00:00:49,140 --> 00:00:50,580
whereas the fine-tuned model
20
00:00:50,580 --> 00:00:53,253
has its first two
predictions linked to cinema.
21
00:00:54,390 --> 00:00:57,210
Since it was fine-tuned on
a movie reviews dataset,
22
00:00:57,210 --> 00:00:58,680
it's perfectly normal to see
23
00:00:58,680 --> 00:01:01,440
it adapt its suggestions like this.
24
00:01:01,440 --> 00:01:03,090
Notice how it keeps the same prediction
25
00:01:03,090 --> 00:01:05,220
as the pre-trained model afterward.
26
00:01:05,220 --> 00:01:08,100
Even if the fine-tuned model
adapts to the new dataset,
27
00:01:08,100 --> 00:01:10,450
it's not forgetting what
it was pre-trained on.
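To try this comparison yourself, here is a minimal sketch using the fill-mask pipeline. The fine-tuned checkpoint name is an assumption based on the chapter 7 model (DistilBERT fine-tuned on the IMDb movie-review dataset).

```python
from transformers import pipeline

# Pre-trained DistilBERT: makes generic predictions.
pretrained = pipeline("fill-mask", model="distilbert-base-uncased")

# Version fine-tuned on movie reviews in chapter 7 of the course
# (Hub checkpoint name is an assumption).
finetuned = pipeline(
    "fill-mask",
    model="huggingface-course/distilbert-base-uncased-finetuned-imdb",
)

text = "This is a great [MASK]."

# Compare the top predictions of the two models on the same input.
print([pred["token_str"] for pred in pretrained(text)])
print([pred["token_str"] for pred in finetuned(text)])
```

The fine-tuned model typically ranks cinema-related words first, while its remaining predictions stay close to the pre-trained ones.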
28
00:01:11,490 --> 00:01:14,220
This is another example
on a translation task.
29
00:01:14,220 --> 00:01:17,310
On top, we use a pre-trained
French/English model,
30
00:01:17,310 --> 00:01:21,330
and at the bottom, the version
we fine-tuned in chapter 7.
31
00:01:21,330 --> 00:01:23,610
The top model is pre-trained
on lots of texts,
32
00:01:23,610 --> 00:01:25,170
and leaves technical English terms,
33
00:01:25,170 --> 00:01:28,350
like plugin and email,
unchanged in the translation.
34
00:01:28,350 --> 00:01:31,350
Both are perfectly
understood by French people.
35
00:01:31,350 --> 00:01:33,780
The dataset picked for the
fine-tuning is a dataset
36
00:01:33,780 --> 00:01:36,660
of technical texts where
special attention was paid
37
00:01:36,660 --> 00:01:39,150
to translating everything into French.
38
00:01:39,150 --> 00:01:42,090
As a result, the fine-tuned
model picked up that habit
39
00:01:42,090 --> 00:01:44,193
and translated both plugin and email.
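The same side-by-side check works for the translation example. A minimal sketch, assuming the pre-trained English-to-French checkpoint and the chapter 7 model fine-tuned on the KDE4 technical dataset (both Hub names are assumptions):

```python
from transformers import pipeline

# Pre-trained English-to-French model (Hub name assumed).
pretrained = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

# Version fine-tuned on technical texts in chapter 7 of the course
# (Hub checkpoint name is an assumption).
finetuned = pipeline(
    "translation",
    model="huggingface-course/marian-finetuned-kde4-en-to-fr",
)

text = "Unable to load the plugin, please check your email."

# The pre-trained model may leave "plugin" and "email" in English,
# while the fine-tuned one should translate them into French.
print(pretrained(text)[0]["translation_text"])
print(finetuned(text)[0]["translation_text"])
```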
40
00:01:45,942 --> 00:01:49,181
(air whooshing)
41
00:01:49,181 --> 00:01:50,592
(air whooshing)