10 - 3 - Model Selection and Traination.srt

1
00:00:00,160 --> 00:00:04,570
Suppose you'd like to decide what degree of polynomial to fit to a data set,

2
00:00:04,570 --> 00:00:08,750
or what features to include to give you a learning algorithm.

3
00:00:08,750 --> 00:00:13,160
Or suppose you'd like to choose the regularization parameter lambda for a

4
00:00:13,160 --> 00:00:14,550
learning algorithm.

5
00:00:14,550 --> 00:00:15,830
How do you do that?

6
00:00:15,830 --> 00:00:17,510
These are called model selection problems.

7
00:00:17,510 --> 00:00:22,411
And in our discussion of how to do this, we'll talk about not just how to

8
00:00:22,411 --> 00:00:27,031
split your data into the train and test sets, but how to split your data into what we

9
00:00:27,031 --> 00:00:31,020
will discover is called the train, validation, and test sets.

10
00:00:31,020 --> 00:00:33,860
We'll see in this video just what these things are, and

11
00:00:33,860 --> 00:00:36,890
how to use them to do model selection.

12
00:00:36,890 --> 00:00:40,350
We've already seen many times the problem of overfitting,

13
00:00:40,350 --> 00:00:44,380
in which, just because a learning algorithm fits a training set well,

14
00:00:44,380 --> 00:00:47,490
that doesn't mean it's a good hypothesis.

15
00:00:47,490 --> 00:00:52,290
More generally, this is why the training set's error is not a good predictor for

16
00:00:52,290 --> 00:00:56,000
how well the hypothesis will do on new examples.

17
00:00:56,000 --> 00:00:59,150
Concretely, if you fit some set of parameters,

18
00:00:59,150 --> 00:01:02,810
theta0, theta1, theta2, and so on, to your training set,

19
00:01:02,810 --> 00:01:07,210
then the fact that your hypothesis does well on the training set,

20
00:01:07,210 --> 00:01:11,630
well, this doesn't mean much in terms of predicting how well your hypothesis will

21
00:01:11,630 --> 00:01:15,970
generalize to new examples not seen in the training set.

22
00:01:15,970 --> 00:01:18,730
And the more general principle is that once your

23
00:01:18,730 --> 00:01:21,290
parameters have been fit to some set of data,

24
00:01:21,290 --> 00:01:23,720
maybe the training set, maybe something else,

25
00:01:23,720 --> 00:01:27,900
then the error of your hypothesis as measured on that same data set,

26
00:01:27,900 --> 00:01:29,600
such as the training error,

27
00:01:29,600 --> 00:01:33,380
is unlikely to be a good estimate of your actual generalization error,

28
00:01:33,380 --> 00:01:38,330
that is, of how well the hypothesis will generalize to new examples.

29
00:01:38,330 --> 00:01:41,170
Now let's consider the model selection problem.

30
00:01:41,170 --> 00:01:45,100
Let's say you're trying to choose what degree polynomial to fit to data.

31
00:01:45,100 --> 00:01:48,800
So, should you choose a linear function, a quadratic function, a cubic function?

32
00:01:48,800 --> 00:01:50,740
All the way up to a 10th-order polynomial.

33
00:01:51,940 --> 00:01:55,810
So it's as if there's one extra parameter in this algorithm,

34
00:01:55,810 --> 00:02:01,210
which I'm going to denote d, which is what degree of polynomial

35
00:02:01,210 --> 00:02:02,310
do you want to pick.

36
00:02:02,310 --> 00:02:06,610
So it's as if, in addition to the theta parameters,

37
00:02:06,610 --> 00:02:10,680
there's one more parameter, d, that you're trying to determine using your data set.

38
00:02:10,680 --> 00:02:14,940
So the first option is d equals one, that is, you fit a linear function.

39
00:02:14,940 --> 00:02:19,760
We can choose d equals two, d equals three, all the way up to d equals 10.

40
00:02:19,760 --> 00:02:24,830
So we'd like to fit this extra sort of parameter, which I'm denoting by d.

41
00:02:24,830 --> 00:02:28,880
And concretely, let's say that you want to choose a model,

42
00:02:28,880 --> 00:02:33,110
that is, choose a degree of polynomial, choose one of these 10 models,

43
00:02:33,110 --> 00:02:37,920
and fit that model, and also get some estimate of how well your

44
00:02:37,920 --> 00:02:42,570
fitted hypothesis would generalize to new examples.

45
00:02:42,570 --> 00:02:44,130
Here's one thing you could do.

46
00:02:44,130 --> 00:02:49,790
What you could do is first take your first model and minimize the training error.

47
00:02:49,790 --> 00:02:53,260
And this would give you some parameter vector theta.

48
00:02:53,260 --> 00:02:58,600
And you could then take your second model, the quadratic function, and

49
00:02:58,600 --> 00:03:01,290
fit that to your training set, and this will give you some other

50
00:03:01,290 --> 00:03:03,040
parameter vector theta.

51
00:03:03,040 --> 00:03:06,790
In order to distinguish between these different parameter vectors, I'm going

52
00:03:06,790 --> 00:03:11,550
to use a superscript one, superscript two there, where theta superscript one

53
00:03:11,550 --> 00:03:16,170
just means the parameters I get by fitting this model to my training data,

54
00:03:16,170 --> 00:03:19,130
and theta superscript two just means the parameters I

55
00:03:19,130 --> 00:03:23,940
get by fitting this quadratic function to my training data, and so on.

56
00:03:23,940 --> 00:03:30,500
By fitting a cubic model I get theta superscript three, up to, well, say, theta superscript 10.

57
00:03:30,500 --> 00:03:36,200
And one thing we could do is take these parameters and look at the test error.

58
00:03:36,200 --> 00:03:38,800
So I can compute on my test set J test of theta one,

59
00:03:38,800 --> 00:03:46,330
J test of theta two, and so on,

60
00:03:47,410 --> 00:03:51,930
J test of theta three, and so on.

61
00:03:53,050 --> 00:03:57,546
So I'm going to take each of my hypotheses with the corresponding parameters and

62
00:03:57,546 --> 00:04:00,270
just measure their performance on the test set.

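As a rough sketch of the procedure so far (illustrative Python, not code from the course; the synthetic sine-wave data and the helper name `j_cost` are assumptions), one could fit each candidate degree on the training portion and score every fitted hypothesis on the test portion:

```python
# Sketch: fit one hypothesis per degree d = 1..10 on the training set,
# then measure each fitted hypothesis J_test(theta^(d)) on the test set.
import numpy as np

def j_cost(theta, x, y):
    # Average squared error (with the course's 1/2 factor) of polynomial theta on (x, y).
    return np.mean((np.polyval(theta, x) - y) ** 2) / 2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)                   # synthetic inputs
y = np.sin(3 * x) + rng.normal(0, 0.1, 100)   # synthetic targets
x_train, y_train = x[:60], y[:60]
x_test, y_test = x[60:], y[60:]

# theta^(1), ..., theta^(10): least-squares fit of each degree to the training set.
thetas = {d: np.polyfit(x_train, y_train, d) for d in range(1, 11)}
test_errors = {d: j_cost(thetas[d], x_test, y_test) for d in range(1, 11)}
```
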
63
00:04:00,270 --> 00:04:05,010
Now, one thing I could do then is, in order to select one of these models,

64
00:04:05,010 --> 00:04:09,160
I could then see which model has the lowest test set error.

65
00:04:09,160 --> 00:04:09,930
And let's just say for

66
00:04:09,930 --> 00:04:14,480
this example that I ended up choosing the fifth order polynomial.

68
00:04:14,480 --> 00:04:16,940
So, this seems reasonable so far.

69
00:04:16,940 --> 00:04:21,060
But now let's say I want to take my fifth hypothesis, this

70
00:04:21,060 --> 00:04:26,080
fifth order model, and let's say I want to ask, how well does this model generalize?

71
00:04:27,190 --> 00:04:30,560
One thing I could do is look at how well my fifth order

72
00:04:30,560 --> 00:04:34,710
polynomial hypothesis had done on my test set.

73
00:04:34,710 --> 00:04:39,450
But the problem is this will not be a fair estimate of how well my

74
00:04:39,450 --> 00:04:42,360
hypothesis generalizes.

75
00:04:42,360 --> 00:04:48,140
And the reason is that what we've done is we've fit this extra parameter d,

76
00:04:48,140 --> 00:04:50,870
that is, this degree of polynomial,

77
00:04:50,870 --> 00:04:54,720
and we fit that parameter d using the test set, namely,

78
00:04:54,720 --> 00:05:00,310
we chose the value of d that gave us the best possible performance on the test set.

79
00:05:00,310 --> 00:05:06,340
And so the performance of my parameter vector theta five on the test set

80
00:05:06,340 --> 00:05:11,160
is likely to be an overly optimistic estimate of generalization error.

81
00:05:11,160 --> 00:05:15,640
Right, so, because I had fit this parameter d to my test set, it is no longer

82
00:05:15,640 --> 00:05:21,410
fair to evaluate my hypothesis on this test set, because I fit my parameters

83
00:05:21,410 --> 00:05:25,900
to this test set: I've chosen the degree d of polynomial using the test set.

84
00:05:25,900 --> 00:05:29,430
And so my hypothesis is likely to do better on

85
00:05:29,430 --> 00:05:33,650
this test set than it would on new examples that it hasn't seen before, and

86
00:05:33,650 --> 00:05:36,140
it's the latter that I really care about.

87
00:05:36,140 --> 00:05:41,050
So just to reiterate: on the previous slide, we saw that if we fit some set of

88
00:05:41,050 --> 00:05:45,210
parameters, you know, say theta0, theta1, and so on, to some training set,

89
00:05:45,210 --> 00:05:50,300
then the performance of the fitted model on the training set is not predictive of

90
00:05:50,300 --> 00:05:53,500
how well the hypothesis will generalize to new examples.

91
00:05:53,500 --> 00:05:56,770
That's because these parameters were fit to the training set,

92
00:05:56,770 --> 00:05:59,110
so they're likely to do well on the training set,

93
00:05:59,110 --> 00:06:03,200
even if the parameters don't do well on other examples.

94
00:06:03,200 --> 00:06:07,460
And in the procedure I just described on this slide, we just did the same thing.

95
00:06:07,460 --> 00:06:13,060
Specifically, what we did was fit this parameter d to the test set.

96
00:06:13,060 --> 00:06:16,770
And by having fit the parameter to the test set, this means that

97
00:06:16,770 --> 00:06:22,010
the performance of the hypothesis on that test set may not be a fair estimate of how

98
00:06:22,010 --> 00:06:26,670
well the hypothesis is likely to do on examples we haven't seen before.

99
00:06:26,670 --> 00:06:30,630
To address this problem, in a model selection setting,

100
00:06:30,630 --> 00:06:35,550
if we want to evaluate a hypothesis, this is what we usually do instead.

101
00:06:35,550 --> 00:06:40,200
Given the data set, instead of just splitting it into a training and test set,

102
00:06:40,200 --> 00:06:43,930
what we're going to do is split it into three pieces.

103
00:06:43,930 --> 00:06:49,130
And the first piece is going to be called the training set, as usual.

104
00:06:50,130 --> 00:06:53,300
So let me call this first part the training set.

105
00:06:54,780 --> 00:07:00,056
And the second piece of this data I'm going to call the cross validation set.

106
00:07:00,056 --> 00:07:04,711
[SOUND] Cross validation.

107
00:07:04,711 --> 00:07:08,860
And I'll abbreviate cross validation as CV.

108
00:07:08,860 --> 00:07:13,520
Sometimes it's also called the validation set instead of the cross validation set.

109
00:07:13,520 --> 00:07:18,060
And then the last piece I'm going to call the usual test set.

110
00:07:18,060 --> 00:07:21,930
And a pretty typical ratio at which to split these things will be

111
00:07:21,930 --> 00:07:24,990
to send 60% of your data to your training set,

112
00:07:24,990 --> 00:07:29,290
maybe 20% to your cross validation set, and 20% to your test set.

113
00:07:29,290 --> 00:07:33,600
And these numbers can vary a little bit, but this sort of split is pretty typical.

114
00:07:33,600 --> 00:07:38,922
And so our training set will now be only maybe 60% of the data, and our

115
00:07:38,922 --> 00:07:44,860
cross validation set, or our validation set, will have some number of examples.

116
00:07:44,860 --> 00:07:47,290
I'm going to denote that m subscript cv.

117
00:07:47,290 --> 00:07:50,860
So that's the number of cross validation examples.

118
00:07:52,040 --> 00:07:59,110
Following our earlier notational convention, I'm going to use x(i)cv, y(i)cv

119
00:07:59,110 --> 00:08:02,720
to denote the ith cross validation example.

120
00:08:02,720 --> 00:08:07,290
And finally, we also have a test set over here, with our

121
00:08:07,290 --> 00:08:11,420
m subscript test being the number of test examples.

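A minimal sketch of this 60/20/20 split (illustrative Python; the shuffle-then-slice approach and the name `split_data` are assumptions, not from the lecture):

```python
# Sketch: shuffle the data once, then slice it 60% / 20% / 20%
# into training, cross validation, and test sets.
import numpy as np

def split_data(X, y, seed=0):
    m = len(y)
    idx = np.random.default_rng(seed).permutation(m)   # shuffle before splitting
    n_train, n_cv = int(0.6 * m), int(0.2 * m)
    tr, cv, te = np.split(idx, [n_train, n_train + n_cv])
    return (X[tr], y[tr]), (X[cv], y[cv]), (X[te], y[te])

# m_cv and m_test from the video are then just len(y_cv) and len(y_test).
```
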
122
00:08:11,420 --> 00:08:14,570
So, now that we've defined the training, validation or

123
00:08:14,570 --> 00:08:16,740
cross validation, and test sets,

124
00:08:16,740 --> 00:08:21,420
we can also define the training error, cross validation error, and test error.

125
00:08:21,420 --> 00:08:23,790
So here's my training error, and

126
00:08:23,790 --> 00:08:26,820
I'm just writing this as J subscript train of theta.

127
00:08:26,820 --> 00:08:29,030
This is pretty much the same thing

128
00:08:29,030 --> 00:08:32,260
as the J of theta that I've been writing so

129
00:08:32,260 --> 00:08:37,110
far; this is just the error as measured on the training set. And then

130
00:08:37,110 --> 00:08:41,470
J subscript cv is my cross validation error; this is pretty much what you'd expect,

131
00:08:41,470 --> 00:08:45,970
just like the training error, except measured on the cross validation data set.

132
00:08:45,970 --> 00:08:48,450
And here's my test set error, same as before.

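Written out, these three quantities are the usual squared-error costs from this course, each averaged over its own set:

$$J_{train}(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)^2$$

$$J_{cv}(\theta)=\frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\big(h_\theta(x_{cv}^{(i)})-y_{cv}^{(i)}\big)^2$$

$$J_{test}(\theta)=\frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\big(h_\theta(x_{test}^{(i)})-y_{test}^{(i)}\big)^2$$
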
133
00:08:49,530 --> 00:08:53,410
So when faced with a model selection problem like this, what we're going to

134
00:08:53,410 --> 00:08:58,709
do is, instead of using the test set to select the model, we're instead

135
00:08:58,709 --> 00:09:04,580
going to use the validation set, or the cross validation set, to select the model.

136
00:09:04,580 --> 00:09:10,570
Concretely, we're going to first take our first hypothesis, take this first model,

137
00:09:10,570 --> 00:09:13,580
and minimize the cost function, and

138
00:09:13,580 --> 00:09:17,520
this would give me some parameter vector theta for the linear model.

139
00:09:17,520 --> 00:09:20,300
And, as before, I'm going to put a superscript one,

140
00:09:20,300 --> 00:09:23,560
just to denote that this is the parameter for the linear model.

141
00:09:23,560 --> 00:09:25,660
We do the same thing for the quadratic model,

142
00:09:25,660 --> 00:09:27,927
and get some parameter vector theta two,

143
00:09:27,927 --> 00:09:31,601
get some parameter vector theta three, and so

144
00:09:31,601 --> 00:09:35,470
on, down to theta ten for the tenth-order polynomial.

145
00:09:35,470 --> 00:09:40,440
And what I'm going to do is, instead of testing these hypotheses on the test set,

146
00:09:40,440 --> 00:09:43,130
I'm instead going to test them on the cross validation set,

147
00:09:43,130 --> 00:09:46,600
and measure J subscript cv

148
00:09:46,600 --> 00:09:52,180
to see how well each of these hypotheses does on my cross validation set.

149
00:09:53,250 --> 00:09:57,180
And then I'm going to pick the hypothesis with the lowest cross validation error.

150
00:09:57,180 --> 00:10:00,180
So for this example, let's say, for the sake of argument,

151
00:10:00,180 --> 00:10:06,550
that it was my 4th order polynomial that had the lowest cross validation error.

152
00:10:06,550 --> 00:10:11,070
So in that case I'm going to pick this fourth order polynomial model.

153
00:10:11,070 --> 00:10:15,250
And finally, what this means is that that parameter d,

154
00:10:15,250 --> 00:10:17,200
remember d was the degree of polynomial, right?

155
00:10:17,200 --> 00:10:20,270
So d equals two, d equals three, all the way up to d equals 10.

156
00:10:20,270 --> 00:10:25,040
What we've done is we've fit that parameter d, and we've said d equals four.

157
00:10:25,040 --> 00:10:27,290
And we did so using the cross validation set.

158
00:10:27,290 --> 00:10:32,320
And so this degree of polynomial, so the parameter, is no longer fit to the test

159
00:10:32,320 --> 00:10:39,260
set, and so we've now saved away the test set, and we can use the test set to

160
00:10:39,260 --> 00:10:44,325
measure, or to estimate, the generalization error of the model that was selected

161
00:10:44,325 --> 00:10:47,680
by the algorithm.

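Putting the corrected procedure together as one sketch (illustrative Python under the same assumptions as the earlier snippets: synthetic data, squared-error cost): select d on the cross validation set, then touch the test set only once, for the final generalization estimate.

```python
# Sketch: fit every degree on the training set, pick d by lowest J_cv,
# and use the held-out test set only for the final generalization estimate.
import numpy as np

def j_cost(theta, x, y):
    # Average squared error (with the 1/2 factor) of polynomial theta on (x, y).
    return np.mean((np.polyval(theta, x) - y) ** 2) / 2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)
x_tr, y_tr = x[:120], y[:120]        # 60% training set
x_cv, y_cv = x[120:160], y[120:160]  # 20% cross validation set
x_te, y_te = x[160:], y[160:]        # 20% test set

thetas = {d: np.polyfit(x_tr, y_tr, d) for d in range(1, 11)}      # theta^(1)..theta^(10)
best_d = min(thetas, key=lambda d: j_cost(thetas[d], x_cv, y_cv))  # lowest J_cv wins
gen_error = j_cost(thetas[best_d], x_te, y_te)                     # fair estimate
print(f"selected d = {best_d}, estimated generalization error = {gen_error:.4f}")
```
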
162
00:10:47,680 --> 00:10:51,140
So, that was model selection, and how you can take your data,

163
00:10:51,140 --> 00:10:54,310
split it into a training, validation, and test set,

164
00:10:54,310 --> 00:10:57,310
and use your cross validation data to select the model and

165
00:10:57,310 --> 00:10:58,570
evaluate it on the test set.

166
00:10:59,630 --> 00:11:02,860
One final note: I should say that in

167
00:11:02,860 --> 00:11:06,210
machine learning as it's practiced today, there are many

168
00:11:06,210 --> 00:11:10,470
people that will do that earlier thing that I talked about, even though, you know,

169
00:11:10,470 --> 00:11:15,360
it isn't such a good idea: selecting your model using the test set,

170
00:11:15,360 --> 00:11:19,590
and then using the same test set to report the error, as though

171
00:11:19,590 --> 00:11:24,120
selecting your degree of polynomial on the test set, and then reporting the error on

172
00:11:24,120 --> 00:11:27,840
the test set, gave a good estimate of generalization error.

173
00:11:27,840 --> 00:11:31,160
That sort of practice, unfortunately, is something many, many people do.

174
00:11:31,160 --> 00:11:35,550
If you have a massive, massive test set, that is maybe not a terrible thing to do,

175
00:11:35,550 --> 00:11:38,090
but many practitioners,

176
00:11:38,090 --> 00:11:42,360
most practitioners of machine learning, tend to advise against that.

177
00:11:42,360 --> 00:11:45,760
And it's considered better practice to have separate train, validation, and

178
00:11:45,760 --> 00:11:46,728
test sets.

179
00:11:46,728 --> 00:11:50,620
But, just to warn you, sometimes people do, you know, use the same data for

180
00:11:50,620 --> 00:11:54,320
the purpose of the validation set and for the purpose of the test set,

181
00:11:54,320 --> 00:11:57,430
so they have only a training set and a test set, and that's considered

182
00:11:57,430 --> 00:12:00,020
less good practice, though you will see some people do it.

183
00:12:00,020 --> 00:12:03,090
But, if possible, I would recommend against doing that yourself.