\chapter{One-Locus Models of Selection.}
\label{Chapter:OneLocusSelection}
\begin{quotation}
``Socrates consisted of the genes his parents gave him, the experiences they and his environment later provided, and a growth and development mediated by numerous meals. For all I know, he may have been very successful in the evolutionary sense of leaving numerous offspring. His phenotype, nevertheless, was utterly destroyed by the hemlock and has never since been duplicated. The same argument holds also for genotypes. With Socrates' death, not only did his phenotype disappear, but also his genotype. [...] The loss of Socrates' genotype is not assuaged by any consideration of how prolifically he may have reproduced. Socrates' genes may be with us yet, but not his genotype, because meiosis and recombination destroy genotypes as surely as death.'' --\citet{Williams:66}
\end{quotation}
Individuals are temporary, their phenotypes are temporary, and their
genotypes are temporary. However, the alleles that individuals
transmit across generations have permanence. Sustained phenotypic
evolutionary change due to natural selection occurs because of changes
in the allelic composition of the population. To understand these
changes, we need to understand how the frequency of alleles (genes)
changes over time due to natural selection. We'll also see that,
because an individual's genotype is just an ephemeral collection of
alleles, genetic conflicts can arise that actually lower the fitness of
individuals.
As we have seen, natural selection occurs when there are differences between individuals in fitness. We may define fitness in various ways. Most commonly, it is defined with respect to the contribution of a phenotype or genotype to the next generation.
Differences in fitness can arise at any point during the life
cycle. For instance, different genotypes or phenotypes may have
different survival probabilities from one stage in their life to the
stage of reproduction (viability), or they may differ in the number of
offspring produced (fertility), or both. Here, we define the absolute
fitness of a genotype as the expected number of offspring of an
individual of that genotype. Differences in fitness among genotypes
drive allele frequency change. In this chapter we'll study the
dynamics of alleles at a single locus. For now, we'll ignore
the effects of genetic drift, and just study the deterministic
dynamics of selection. We'll return to discuss the interaction of
selection and drift in a couple of chapters.\\
\subsection{Haploid selection model}
\begin{quotation}
``The dream of every cell is to become two cells.'' -- Francois Jacob.
\end{quotation}
We start out by modeling selection in a haploid model, as this is mathematically relatively simple.
Let the number of individuals carrying alleles $A_1$ and $A_2$ in generation $t$ be $P_t$ and $Q_t$. Then, the relative frequencies at time $t$ of alleles $A_1$ and $A_2$ are $p_t = P_t / (P_t + Q_t)$ and $q_t = Q_t / (P_t + Q_t) = 1 - p_t$. Further, assume that individuals of type $A_1$ and $A_2$ on average produce $W_1$ and $W_2$ offspring individuals, respectively. We call $W_i$ the absolute fitness.\\
Therefore, in the next generation, the absolute numbers of carriers of $A_1$ and $A_2$ are $P_{t+1} = W_1 P_t$ and $Q_{t+1} = W_2 Q_t$, respectively. The mean absolute fitness of the population at time $t$ is
\begin{equation}
\label{eq:meanAbsFit}
\Wbar_t = W_1 \frac{P_t}{P_t + Q_t} + W_2 \frac{Q_t}{P_t + Q_t} = W_1 p_t + W_2 q_t,
\end{equation}
i.e.\ the sum of the fitness of the two types weighted by their
relative frequencies. Note that the mean fitness depends on time, as
it is a function of the allele frequencies, which are themselves time
dependent. \\
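To make eqn.\ \eqref{eq:meanAbsFit} concrete, here is a small worked example. The numbers are invented purely for illustration and are not taken from any data set. Suppose that $P_t = 300$, $Q_t = 700$, $W_1 = 1.5$, and $W_2 = 0.8$, so that $p_t = 0.3$ and $q_t = 0.7$. Then
\begin{align*}
\Wbar_t &= 1.5 \times 0.3 + 0.8 \times 0.7 = 0.45 + 0.56 = 1.01, \\
P_{t+1} + Q_{t+1} &= 1.5 \times 300 + 0.8 \times 700 = 450 + 560 = 1010.
\end{align*}
The total population size has grown by a factor of $\Wbar_t = 1.01$, even though the $A_2$ type on its own is declining ($W_2 < 1$).\\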
As an example of a rapid response to selection on an allele in a haploid population, we can consider some data on the evolution of drug-resistant viruses. \citet{feder2017} studied viral dynamics in a macaque infected with a strain of simian-human immunodeficiency virus (SHIV) that carries the HIV-1 reverse transcriptase coding region. \marginnote{The main focus of \citeauthor{feder2017}'s work was modeling the complicated spatial dynamics of drug-resistant SHIV adaptation in different organ systems. } The viral load of the macaque's blood plasma is shown as a black line in Figure \ref{fig:HIV_viral_freqs}. Twelve weeks after infection, the macaque was treated with an anti-retroviral drug that targeted the virus' reverse transcriptase protein. Note how the viral load initially starts to drop once the drug is administered, suggesting that the absolute fitness of the original strain is less than one ($W_{2}<1$) in the presence of the drug (as its numbers are decreasing). However, the viral population rebounds as a mutation that confers resistance to the anti-retroviral drug arises in the SHIV and starts to spread. Viruses carrying this mutation (let's call them allele $1$) likely have absolute fitness $W_1>1$. The frequency of the drug-resistant allele is shown in red; it quickly spreads from being undetectable in week 13, to being fixed in the SHIV population in week 20.
\begin{figure}
\begin{center}
\includegraphics[width= \textwidth]{Journal_figs/single_locus_selection/Feder_HIV/Feder_HIV.pdf}
\end{center}
\caption{The rapid evolution of drug-resistant SHIV. The viral load of
SHIV in the blood of a macaque (black line) and the frequency of a drug
resistance mutation (red line). Data from \citet{feder2017}. \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Journal_figs/single_locus_selection/Feder_HIV/Feder_HIV.R}} \label{fig:HIV_viral_freqs}
\end{figure}
The rapid spread of this drug-resistant allele through the population is driven by the much greater relative fitness of the drug-resistant allele over the original strain in the presence of the anti-retroviral drug.
The frequency of allele $A_1$ in the next generation is given by
\begin{equation}
\label{eq:eq:recHaplMod1}
p_{t+1} = \frac{P_{t+1}}{P_{t+1} + Q_{t+1}} = \frac{W_1 P_t}{W_1 P_t + W_2 Q_t}
%= \frac{W_1 (P_t + Q_t)p_t}{W_1 (P_t + Q_t)p_t + W_2 (P_t + Q_t)q_t}
= \frac{W_1 p_t}{W_1 p_t + W_2 q_t} = \frac{W_1}{\Wbar_t} p_t.
\end{equation}
Importantly, eqn.\ (\ref{eq:eq:recHaplMod1}) tells us that the change in $p$ only depends on a ratio of fitnesses. Therefore, we need to specify fitness only up to an arbitrary constant. As long as we multiply all fitnesses by the same value, that constant will cancel out and eqn.\ (\ref{eq:eq:recHaplMod1}) will hold. Based on this argument, it is very common to scale absolute fitnesses by the absolute fitness of one of the genotypes, e.g.\ the most or the least fit genotype, to obtain relative fitnesses. Here, we will use $w_i$ for the relative fitness of genotype $i$. If we choose to scale by the absolute fitness of genotype $A_1$, we obtain the relative fitnesses $w_1 = W_1/W_1 = 1$ and $w_2 = W_2/W_1$.\\
Without loss of generality, we can therefore rewrite eqn.\ (\ref{eq:eq:recHaplMod1}) as
\begin{equation}
\label{eq:recHaplMod2}
p_{t+1} = \frac{w_1}{\wbar} p_t,
\end{equation}
where, for notational simplicity, we drop the subscript $t$ from the mean fitness, while remembering that it still depends on time.
The change in frequency from one generation to the next is then given by
\begin{equation}
\Delta p_t = p_{t+1} - p_t= \frac{ w_1 p_t}{ \wbar} - p_t = \frac{w_1 p_t - \wbar p_t}{\wbar} = \frac{w_1 p_t - (w_1 p_t + w_2 q_t) p_t}{\wbar} = \frac{w_1 - w_2}{\wbar} p_t q_t,
\label{eq:deltap_haploid}
\end{equation}
recalling that $q_t = 1 - p_t$.\\
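As a quick numerical check of eqn.\ \eqref{eq:deltap_haploid}, using made-up values chosen only for illustration, take $w_1 = 1$, $w_2 = 0.9$, and $p_t = 0.2$. Then
\begin{align*}
\wbar &= 1 \times 0.2 + 0.9 \times 0.8 = 0.92, \\
p_{t+1} &= \frac{1 \times 0.2}{0.92} \approx 0.2174, \\
\Delta p_t &= \frac{(1 - 0.9)}{0.92} \times 0.2 \times 0.8 \approx 0.0174,
\end{align*}
so both routes give the same one-generation change, as they should.\\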
Assuming that the fitnesses of the two alleles are constant over time,
the numbers of the two allelic types $\tau$ generations after time $0$ are
$P_{\tau} = (W_1)^{\tau} P_0$ and $Q_{\tau}= (W_2)^{\tau} Q_0$, respectively. Therefore, the relative frequency of allele $A_1$ $\tau$ generations after time $0$ is
\begin{equation}
p_{\tau} = \frac{ (W_1)^{\tau} P_0}{ (W_1)^{\tau} P_0+(W_2)^{\tau} Q_0} = \frac{ (w_1)^{\tau} P_0}{ (w_1)^{\tau} P_0+(w_2)^{\tau} Q_0} = \frac{p_0}{p_0 + (w_2/w_1)^{\tau} q_0},
\label{eq:haploid_tau_gen}
\end{equation}
where the last step involves dividing the numerator and denominator by $(w_1)^{\tau}$ and by $P_0 + Q_0$, thereby switching from allele counts to allele frequencies.
%Rearranging eqn.\ \eqref{eq:haploid_tau_gen} and setting $t = 0$, we can work out the time $\tau$ for the frequency of $A_1$ to change from $p_0$ to $p_{\tau}$. First, we write
%\begin{equation}
% p_{\tau} = \frac{p_0}{p_0 + (w_2/w_1)^{\tau} q_0}
%\end{equation}
Rearranging eqn.\ \eqref{eq:haploid_tau_gen}, we obtain
\begin{equation}
\label{eq:estTau}
\frac{p_{\tau}}{q_{\tau}} = \frac{p_0}{q_0} \left(\frac{w_1}{w_2}\right)^{\tau}.
\end{equation}
Solving this for $\tau$ yields
\begin{equation}
\label{eq:solTau}
\tau = \log \left(\frac{p_{\tau} q_0}{q_{\tau} p_0}\right) / \log\left( \frac{w_1}{w_2} \right).
\end{equation}
\\
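For instance, under the illustrative (assumed) values $w_1 = 1$, $w_2 = 0.9$, and $p_0 = 0.01$, eqn.\ \eqref{eq:solTau} suggests that the time for $A_1$ to reach $p_{\tau} = 0.5$ is roughly
\begin{equation*}
\tau = \log\left( \frac{0.5 \times 0.99}{0.5 \times 0.01} \right) / \log\left( \frac{1}{0.9} \right) = \frac{\log(99)}{\log(1/0.9)} \approx \frac{4.595}{0.1054} \approx 43.6
\end{equation*}
generations, where $\log$ denotes the natural logarithm (although the ratio is the same for any base).\\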
In practice, it is often helpful to parametrize the relative fitnesses $w_i$ in a specific way. For example, we may set $w_1 = 1$ and $w_2 = 1 - s$, where $s$ is called the selection coefficient. Using this parametrization, $s$ is simply the difference in relative fitnesses between the two alleles. Equation \eqref{eq:haploid_tau_gen} becomes
\begin{equation}
\label{eq:haploid_tau_gen_expl}
p_{\tau} = \frac{p_{0}}{p_0 + q_0 (1 - s)^{\tau}},
\end{equation}
as $w_2 / w_1 = 1 - s$. Then, if $s \ll 1$, we can approximate $(1-s)^{\tau}$ in the denominator by $\exp(-s\tau)$ to obtain
\begin{equation} \label{eq:haploid_logistic growth}
p_{\tau} \approx \frac{p_0}{p_0 + q_0 e^{-s\tau}}.
\end{equation}
This equation takes the form of a logistic function. That is because
we are looking at the relative frequencies of two `populations' (of
alleles $A_1$ and $A_2$) that are growing (or declining)
exponentially, under the constraint that $p$ and $q$ always sum to 1. \\
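To get a rough sense of how good this approximation is, consider the illustrative (assumed) values $s = 0.01$, $p_0 = 0.1$, and $\tau = 100$. Here $(1-s)^{\tau} = 0.99^{100} \approx 0.3660$, while $e^{-s\tau} = e^{-1} \approx 0.3679$, so that
\begin{align*}
\text{exact:} \quad p_{\tau} &= \frac{0.1}{0.1 + 0.9 \times 0.3660} \approx 0.2329, \\
\text{approximate:} \quad p_{\tau} &\approx \frac{0.1}{0.1 + 0.9 \times 0.3679} \approx 0.2320.
\end{align*}
The two agree to within about $0.001$ in this case; the agreement would be expected to worsen as $s$ grows.\\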
Moreover, eqn.\ \eqref{eq:solTau} for the number of generations $\tau$ it takes for a certain change in frequency to occur becomes
\begin{equation}
\label{eq:estTauExpl}
\tau = - \log \left(\frac{p_{\tau} q_0}{q_{\tau} p_0}\right) / \log\left(1-s\right).
\end{equation}
Assuming again that $s \ll 1$, this simplifies to
\begin{equation}
\label{eq:estTauExplSimpl}
\tau \approx \frac{1}{s} \log \left(\frac{p_{\tau} q_0}{q_{\tau} p_0}\right).
\end{equation}
One particular case of interest is the time it takes for an allele to go from a
single copy to near fixation in a population of size $N$. In this case, we
have $p_0 = 1/N$, and we may set $p_{\tau} = 1 - 1/N$, which is very close to
fixation. Then, plugging these values into eqn.\ \eqref{eq:estTauExplSimpl}, we
obtain
\begin{align}
\tau &= \frac{1}{s} \log\left( \frac{1 - \nicefrac{2}{N} +
\nicefrac{1}{N^2}}{\nicefrac{1}{N^2}} \right) \nonumber \\
&\approx \frac{1}{s} (\log(N) + \log(N-2)) \nonumber \\
&\approx \frac{2}{s} \log(N) \label{eq:fixTimeSimpl}
\end{align}
%
where we make the approximations $N^2 - 2N + 1 \approx N^2 - 2N$ and later
$N-2 \approx N$.
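As a rough illustration with assumed values (not taken from any data set), a new allele with $s = 0.01$ in a population of $N = 10^4$ would take approximately
\begin{equation*}
\tau \approx \frac{2}{0.01} \log(10^4) \approx 200 \times 9.21 \approx 1840
\end{equation*}
generations to approach fixation under this deterministic model. Because $\tau$ depends on $N$ only logarithmically, increasing $N$ a hundredfold to $10^6$ only increases this time by a factor of $1.5$, to about $2760$ generations.\\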
\begin{question}
In our example of the evolution of drug resistance, the drug-resistant SHIV virus spread from undetectable frequencies to $\sim 65\%$ frequency by 16 weeks post infection. An estimated effective population size of SHIV is $1.5 \times 10^5$, and its generation time is $\sim 1$ day. Assuming that the mutation arose as a single copy allele very shortly after the start of drug treatment at 12 weeks, what is the selection coefficient favouring the drug resistance allele?
\end{question}
%\begin{question}
%You are studying the frequency of antibiotic-resistant bacteria in a
%patient. Before administering the antibiotic the frequency of the
%resistance allele is $\nicefrac{1}{1000}$. You adminster the antibiotic,
%alarming just 8 days later you find the frequency %of the
%resistance allele to be $99\%$. Assume a generation time of $\nicefrac{1}{4}$ a day for
%these bacteria. \\
%What is the selection coefficient associated with the resistance to antibiotics?
%\end{question}
\subsection{Diploid model}
\begin{marginfigure}
\begin{center}
\includegraphics[width= \textwidth]{illustration_images/single_locus_selection/cow_auroch/auroch.png}
\end{center}
\caption{Aurochs ({\it Bos primigenius}). Aurochs are an extinct species of large wild cattle from which
cattle were domesticated. \IANC{Dictionnaire des sciences naturelles. 1816 Cuvier,
F.G. }{https://www.flickr.com/photos/internetarchivebookimages/20713828960/in/photolist-owtfpr-owkoQN-obMTQg-owc9mc-rgpdRz-otq59G-oeZFD1-ottAnF-otuXhK-odKHJY-oqYxSb-oviGuD-ytox4c-owa3cJ-yc73Ji-wtrahu-ouf1fo-wXHoQ6-t97h27-owfa8h-xisfNf-waBt8s-x8859A-xwY4eG-wpCm8P-oev6vL-oy1AhH-tNJj8g-xGgALJ-x2kj8g-xDphGC-oxvRgt-x8eFQp-xypMG5-wKqr2k-xnCp1u-xpC2zS-wt5Lpp-xUjHhG-wGJBAQ-wv5dnr-xqLVc3-wPhru1}{NCSU Libraries}} \label{fig:auroch}
\end{marginfigure}
We will now move on to a diploid model of a single locus with two segregating alleles. As an example of the change in the frequency of an allele driven by selection, let's consider the evolution of lactase persistence. A number of different human populations that historically have raised cattle have convergently evolved to maintain the expression of the protein lactase into adulthood (in most mammals the protein is switched off after childhood), with different lactase-persistence mutations having arisen and spread in different pastoral human populations.
This continued expression of lactase allows adults to break down lactose, the main carbohydrate in milk, and so benefit nutritionally from milk-drinking. This seems to have offered a strong fitness benefit to individuals in pastoral populations.
With the advent of techniques to sequence ancient human DNA, researchers can now potentially track the frequency of selected mutations over thousands of years. The frequency of a lactase persistence allele in ancient Central European populations is shown in Figure \ref{fig:LCT_freqs}. The allele is absent more than 5,000 years ago, but is now found at a frequency of upward of $70\%$ in many European populations.
\begin{figure}
\begin{center}
\includegraphics[width= \textwidth]{Rcode/Lactase_example/Lactase_freq_time.pdf}
\end{center}
\caption{Frequency of the lactase persistence allele in ancient and
modern samples from Central Europe. Data compiled by
\citet{marciniak2017} from various sources. Thanks to Stephanie
Marciniak for sharing these data. \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Rcode/Lactase_example/Lactase_plots.R}} \label{fig:LCT_freqs}
\end{figure}
We will assume that the difference in fitness between the three
genotypes comes from differences in viability, i.e.\ differential
survival of individuals from the formation of zygotes to reproduction.
We denote the absolute fitnesses of genotypes $A_1A_1$, $A_1A_2$, and $A_2A_2$ by $W_{11}$, $W_{12}$, and $W_{22}$. Specifically, $W_{ij}$ is the probability that a zygote of genotype $A_iA_j$ survives to reproduction.
Assuming that individuals mate at random and that a total of $N$ zygotes are formed, the expected numbers of zygotes
of the three genotypes in generation $t$ are
\begin{equation}
Np_t^2, ~~~ N2p_tq_t, ~~~ Nq_t^2.
\end{equation}
\marginnote{These diploid models of selection were first laid out in
\citet{fisher1923xxi}, \citet{haldane1924}, and
\citet{wright1931evolution}. \citet{haldane1924} marked the
start of a series of ten papers, over ten years, where Haldane worked through the
implications and applications of these models.
}
The mean fitness of the population of zygotes is then
\begin{equation}
\Wbar_t = W_{11} p_t^2+W_{12} 2p_tq_t + W_{22} q_t^2.
\end{equation}
Again, this is simply the weighted mean of the genotypic fitnesses.
\\
How many zygotes of each of the three genotypes survive to reproduce? \erin{plot needs caption and title}
An individual of genotype $A_1A_1$ has a probability of $W_{11}$ of
surviving to reproduce, and similarly for other genotypes. Therefore, the expected number of $A_1A_1$, $A_1A_2$, and $A_2A_2$ individuals who survive to reproduce is
\begin{equation}
NW_{11} p_t^2, ~~~ NW_{12} 2p_tq_t , ~~~ N W_{22} q_t^2.
\end{equation}
It then follows that the total number of individuals who survive to
reproduce is
\begin{equation}
N \left(W_{11} p_t^2+W_{12} 2p_tq_t + W_{22} q_t^2 \right).
\end{equation}
This is simply the mean fitness of the population multiplied by the
population size (i.e.\ $N \Wbar$).\\
The relative frequency of $A_1A_1$ individuals at reproduction
is simply the number of $A_1A_1$ genotype individuals at reproduction ($NW_{11} p_t^2$)
divided by the total number of individuals who survive to reproduce
($N \Wbar$), and likewise for the other two genotypes.
Therefore, the relative frequency of individuals with the three different genotypes at reproduction is
\begin{equation}
\frac{NW_{11} p_t^2}{N\Wbar}, ~~~ \frac{NW_{12} 2p_tq_t}{N\Wbar} , ~~~ \frac{N W_{22} q_t^2}{N\Wbar}
\end{equation}
(see Table \ref{dip_fitness_table}).\\
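As a worked illustration of this bookkeeping, with numbers invented for the example, suppose $N = 1000$ zygotes, $p_t = q_t = 0.5$, and viabilities $W_{11} = 0.8$, $W_{12} = 0.6$, and $W_{22} = 0.4$. The expected numbers of zygotes are then $250$, $500$, and $250$, and the expected numbers surviving to reproduce are
\begin{equation*}
0.8 \times 250 = 200, ~~~ 0.6 \times 500 = 300, ~~~ 0.4 \times 250 = 100,
\end{equation*}
for a total of $600 = N \Wbar$ survivors, with $\Wbar = 0.6$. The genotype frequencies among the survivors are therefore $\nicefrac{200}{600} = \nicefrac{1}{3}$, $\nicefrac{300}{600} = \nicefrac{1}{2}$, and $\nicefrac{100}{600} = \nicefrac{1}{6}$.\\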
\begin{table*}
\begin{center}
\begin{tabular}{lccc}
\hline
& $A_1A_1$ & $A_1A_2$ & $A_2A_2$\\
\hline
Absolute no.\ at birth & $Np_t^2$ & $N2p_tq_t$ & $Nq_t^2$\\
Fitnesses & $W_{11}$ & $W_{12}$& $W_{22}$\\
Absolute no.\ at reproduction & $NW_{11} p_t^2$ & $NW_{12} 2p_tq_t$& $N W_{22} q_t^2$\\
Relative freq.\ at reproduction & $ \frac{W_{11}}{\Wbar} p_{t}^2$ & $ \frac{W_{12}}{\Wbar} 2 p_{t} q_{t}$ & $\frac{W_{22}}{\Wbar} q_{t}^2$\\
\end{tabular}
\end{center}
\caption{Relative genotype frequencies after one episode of viability selection.} \label{dip_fitness_table}
\end{table*}
%\gc{Dobzhansky}
%\begin{center}
%\begin{tabular}{lccc}
%\hline
% & ST/ST & ST/CH & CH/CH \\ ##From Evolution encylopedia
%Eggs & 41 & 82 &27\\
%Adults & 25 & 74 & 12\\
%\end{tabular}
%\end{center}
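As there is no difference in the fecundity of the three genotypes, the
allele frequencies in the zygotes forming the next generation are simply the
allele frequencies among the reproducing individuals of the previous generation. All of the alleles carried by $A_1A_1$ individuals are $A_1$, whereas only half of those carried by $A_1A_2$ heterozygotes are, so the factor of $2$ in $2p_tq_t$ cancels with this $\nicefrac{1}{2}$. Hence, the frequency of $A_1$ in generation $t+1$ is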
\begin{equation}
p_{t+1} = \frac{W_{11} p_t^2 + W_{12} p_tq_t}{\Wbar}
\label{pgen_dip}.
\end{equation}
\erin{it might help students understand this equation more to mention here that the expected 2 in 2pq is being cancelled out by the 1/2 for only one A1 allele in hets...or to just include that step though it's fairly trivial} Note that, again, the absolute value of the fitnesses is irrelevant to
the frequency of the allele. Therefore, we can just as easily replace
the absolute fitnesses with the relative fitnesses. That is, we may replace $W_{ij}$ by $w_{ij} = W_{ij}/W_{11}$, for instance. \\
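To illustrate both eqn.\ \eqref{pgen_dip} and this invariance, consider the assumed (purely illustrative) viabilities $W_{11} = 0.8$, $W_{12} = 0.6$, and $W_{22} = 0.4$, with $p_t = q_t = 0.5$. Using absolute fitnesses, $\Wbar = 0.6$ and
\begin{equation*}
p_{t+1} = \frac{0.8 \times 0.25 + 0.6 \times 0.25}{0.6} = \frac{0.35}{0.6} \approx 0.583.
\end{equation*}
Dividing through by $W_{11}$ gives relative fitnesses $w_{11} = 1$, $w_{12} = 0.75$, and $w_{22} = 0.5$, for which $\wbar = 0.75$ and
\begin{equation*}
p_{t+1} = \frac{1 \times 0.25 + 0.75 \times 0.25}{0.75} = \frac{0.4375}{0.75} \approx 0.583,
\end{equation*}
the same answer.\\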
Each of our genotype frequencies is responding to selection in a
manner that depends just on its fitness compared to the mean fitness
of the population. For example, the frequency of the $A_1A_1$ homozygotes
increases from birth to adulthood in proportion to $\nicefrac{W_{11}}{\Wbar}$. In
fact, we can estimate this fitness ratio for each genotype by comparing
its frequency at birth to its frequency in adults. As an example of this calculation, we'll
look at some data from sticklebacks.
\begin{marginfigure}
\begin{center}
\includegraphics[width= 1.2 \textwidth]{illustration_images/single_locus_selection/Stickleback/Gasterosteus_aculeatus_1879.jpg}
\end{center}
\caption{Freshwater threespine stickleback ({\it
G. aculeatus}). \BHLNC{British fresh-water fishes. Houghton W
1879.}{https://commons.wikimedia.org/wiki/File:Gasterosteus_aculeatus_1879.jpg}{Ernst
Mayr Library, Harvard.}} \label{fig:stickleback}
\end{marginfigure}
Marine threespine stickleback ({\it Gasterosteus aculeatus})
independently colonized and adapted to many freshwater lakes
as glaciers receded following the last ice age, making sticklebacks a wonderful system for studying the genetics of adaptation. In marine habitats, most of the stickleback have armour plates to protect them
from predation, but freshwater populations repeatedly evolve the
loss of armour plates due to selection on an allele at the
Ectodysplasin gene (EDA). This allele is found as a standing variant at very low frequency in marine populations;
\citet{Barrett:08} took advantage of this fact and collected and bred
a population of marine individuals carrying both the low- (L) and
completely-plated (C) alleles. They introduced the offspring of this
cross into four freshwater ponds and monitored genotype frequencies
\sidenote{The actual dynamics observed by \citeauthor{Barrett:08} are more complicated, as in the very young fish selection reverses direction.}
over their life courses:
\begin{center}
\begin{tabular}{lccc}
& CC & LC & LL \\
Juveniles & 0.55 & 0.23 & 0.22\\
Adults & 0.21 & 0.53 & 0.26\\
Adults/Juv. ($W_{\bullet}/\Wbar$) & 0.4 & 2.3 & 1.2 \\
rel. fitness ($W_{\bullet}/W_{12}$) & 0.17 & 1.0 & 0.54 \\
\end{tabular}
\end{center}
\erin{I changed the fitness calculated above in the 3rd row to being labelled Adults/Juv. not Juv./Adult unfortunately after notes were distributed to class} The heterozygotes have increased in frequency dramatically in the
population as their fitness is more than double the mean fitness of
the population. We can also calculate the relative fitness of each
genotype by dividing through by the fitness of the fittest genotype,
the heterozygote in this case (doing this cancels through
$\Wbar$). The relative fitness of the $CC$ homozygote is only $\sim 1/6$ that of the
heterozygote. Note that this calculation does not rely on the genotype frequencies in the juveniles being at their HWE expectations.
\begin{question}{}
{\bf A)} What is the frequency of the low-plated EDA allele ($L$) at the start of the stickleback experiment? \\
{\bf B)} What is the frequency in the adults? \\
{\bf C)} Calculate the frequency in adults, this time by using the
relative fitnesses.
\end{question}
The change in frequency from generation $t$ to $t+1$ is
\begin{equation}
\Delta p_t = p_{t+1} -p_{t}= \frac{w_{11} p_t^2 + w_{12} p_tq_t}{\wbar} - p_t. \label{deltap_dip1}
\end{equation}
To simplify this equation, we will first define two variables $\wbar_1$ and $\wbar_2$ as
\begin{align}
\wbar_1 & = w_{11} p_t + w_{12} q_t, \\
\wbar_2 & = w_{12} p_t+ w_{22} q_t.
\end{align}
These are called the marginal fitnesses of allele $A_1$
and $A_2$, respectively. They are so called as $\wbar_1$ is the
average fitness of an allele $A_1$, i.e.\ the fitness of $A_1$ in a
homozygote weighted by the probability it is in a homozygote ($p_t$)
plus the fitness of $A_1$ in a
heterozygote weighted by the probability it is in a heterozygote
($q_t$). \sidenote{The marginal fitnesses are also the phenotypic additive effects
of our two alleles on fitness, defined in \eqn
\eqref{eqn:add_effect1} and \eqref{eqn:add_effect1}}.
We further note that the mean relative fitness can be expressed in terms of the marginal fitnesses as
\begin{equation}
\label{eq:meanFitInTermsOfMargFit}
\wbar = \wbar_1 p_t + \wbar_2 q_t,
\end{equation}
where, for notational simplicity, we have omitted the subscript $t$ for the dependence of mean and marginal fitnesses on time.\\
We can then rewrite eqn.\ \eqref{deltap_dip1} using $\wbar_1$ and $\wbar_2$ as
\begin{equation}
\Delta p_t = \frac{ (\wbar_1-\wbar_2)}{\wbar} p_t q_t.
\label{deltap_dip2}
\end{equation}
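One way to see this is to note that the numerator of the first term in eqn.\ \eqref{deltap_dip1} is $w_{11} p_t^2 + w_{12} p_tq_t = \wbar_1 p_t$, and that $\wbar = \wbar_1 p_t + \wbar_2 q_t$, so that
\begin{align*}
\Delta p_t &= \frac{\wbar_1 p_t}{\wbar} - p_t = \frac{(\wbar_1 - \wbar)}{\wbar} p_t \\
&= \frac{\wbar_1 (1 - p_t) - \wbar_2 q_t}{\wbar} p_t = \frac{(\wbar_1 - \wbar_2)}{\wbar} p_t q_t.
\end{align*}
As a numerical sketch, with values assumed purely for illustration, take $w_{11} = 1$, $w_{12} = 0.75$, $w_{22} = 0.5$, and $p_t = 0.5$. Then $\wbar_1 = 0.875$, $\wbar_2 = 0.625$, and $\wbar = 0.75$, giving $\Delta p_t = (0.25/0.75) \times 0.25 \approx 0.083$.\\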
The sign of $\Delta p_t$, i.e.\ whether allele $A_1$ increases or decreases
in frequency, depends only on the sign of
$(\wbar_1-\wbar_2)$. \sidenote{This difference between our marginal
fitnesses is the difference between the additive effects of the two
alleles, thus it is also the regression slope ($\alpha_{\ell}$) of the fitness (phenotype) on
additive genotype ($0$, $1$, $2$) see discussion around \eqn \eqref{eqn:additive_var_additive_effect}.}
The frequency of $A_1$ will keep increasing over the generations so
long as its marginal fitness is higher than that of $A_2$,
i.e.\ $\wbar_1 > \wbar_2$, while if $\wbar_1 < \wbar_2$, the
frequency of $A_1$ will decrease. Note the similarity between eqn.\ \eqref{deltap_dip2} and the respective expression for the haploid model in eqn.\ \eqref{eq:deltap_haploid}. (We will return to the
special case where $\wbar_1 = \wbar_2$ shortly).\\
We can also rewrite eqn.\ \eqref{deltap_dip1} as
\begin{equation}
\Delta p_t =\frac{1}{2} \frac{p_tq_t}{\wbar} \frac{d \wbar}{dp},
\label{deltap_dip3}
\end{equation}
\marginnote{To see this we can write, using $q = 1-p$,
\begin{align*}
\frac{d\bar{w}}{dp} &= \frac{d}{dp} \left( w_{11} p^2 + 2 w_{12} p \right. \\
& ~~ \left. - 2 w_{12} p^2 + w_{22} - 2 w_{22} p + w_{22} p^2\right) \\
&= 2\left(w_{11} p + w_{12} - 2pw_{12} - w_{22} + w_{22} p\right)
\end{align*}
On expansion of $\bar{w}_1 - \bar{w}_2$, we see that it matches the terms in
the parentheses in the expression above. Thus, we see that we can replace
$\bar{w}_1 - \bar{w}_2$ with $\nicefrac{1}{2} \frac{d\bar{w}}{dp}$.
}
This form shows that the frequency of $A_1$ will increase ($\Delta p_t > 0$) if the mean fitness is an increasing function of the frequency of $A_1$ (i.e.\ if $\frac{d \wbar}{dp}>0$). On the other hand, the frequency of $A_1$ will decrease ($\Delta p_t < 0$) if the mean fitness is a decreasing function of the frequency of $A_1$ (i.e.\ if $\frac{d \wbar}{dp}<0$).
%This form shows that
%$\Delta p_t$ in increase if $\frac{d \wbar}{dp}>1$, i.e. increasing the
%frequency of $1$ increases the mean fitness, while the frequency of
%the allele with decrease if this increases the mean fitness of the
%population ($\frac{d \wbar}{dp}>1$).
Thus, although selection acts on
individuals, under this simple model, selection is acting to increase
the mean fitness of the population. The rate of this increase is proportional to
the variance in allele frequencies within the population
($p_tq_t$). This formulation suggested to \citet{wright1932} the view of natural
selection as moving populations up local fitness peaks, as we
encountered in Section \ref{section:pheno_fitness_landscapes} in
discussing phenotypic fitness peaks. Again this view of selection as
maximizing mean fitness only holds true
if the genotypic fitnesses are frequency independent; later in this
chapter we'll discuss some important cases where that doesn't hold. \\
%\begin{question}
%Show that eqns.\ \eqref{deltap_dip3} and \eqref{deltap_dip2} are
%equivalent. (Trickier question.)\\
%\end{question}
\begin{question}{}
For many generations you have been studying an annual wildflower that has two color morphs, orange and white. You have discovered that a single bi-allelic locus controls flower color, with the white allele being recessive. The pollinator of these plants is an almost blind bat, so individuals are pollinated at random with respect to flower color. Your population census of 200 individuals showed that the population consisted of 168 orange-flowered individuals, and 32 white-flowered individuals.\\
Heavy February rainfall creates optimal growing conditions for an
exotic herbivorous beetle with a preference for orange-flowered
individuals. This year it arrives at your study site with a ravenous
appetite. Only 50\% of orange-flowered individuals survive its wrath,
while 90\% of white-flowered individuals survive until the end of the
growing season. \\
%Additionally, surviving orange flowered individuals produce 80 seeds on average, while surviving white-flowered individuals produce 100 seeds on average.
{\bf A)} What is the initial frequency of the white allele, and what do you
have to assume to obtain this?\\
{\bf B)} What is the frequency of the white allele in the seeds forming the next generation?\\
\end{question}
%%Selection coeffs in diploid model
\subsection{Diploid directional selection}
So far, our treatment of the diploid model of selection has been in terms of generic fitnesses $w_{ij}$. In the following, we will use particular parameterizations to gain insight about two specific modes of selection: directional selection and heterozygote advantage.
Directional selection means that one of the two alleles always has higher marginal fitness than the other one. Let us assume that $A_1$ is the fitter allele, so that $w_{11} \geq w_{12} \geq w_{22}$ with $w_{11} > w_{22}$, and hence $\wbar_1 > \wbar_2$. As we are interested in changes in allele frequencies, we \sa{may use} relative fitnesses. We parameterize the reduction in relative fitness in terms of a selection coefficient, similar to the
one we met in the haploid selection section, as follows:\\
\begin{center}
\begin{tabular}{lccc}
genotype & $A_1A_1$ & $A_1A_2$ & $A_2A_2$ \\
absolute fitness & $W_{11}$ & $ \geq W_{12} \geq$ & $W_{22}$ \\
relative fitness (generic) & $w_{11} = W_{11}/W_{11}$ & $w_{12} = W_{12}/W_{11}$ & $w_{22} = W_{22}/W_{11}$ \\
relative fitness (specific) & $1$ & $1-sh$ & $1-s$. \\
\end{tabular}\\
\end{center}
Here, the selection coefficient $s$ is the difference in relative
fitness between the two homozygotes, and $h$ is the
dominance coefficient. \sa{For selection to be directional, we require that $0 \leq h \leq 1$ holds. The dominance coefficient allows us to move between two extremes. One is when $h = 0$, such that allele $A_1$ is fully dominant and $A_2$ fully recessive. In this case, the heterozygote $A_1A_2$ is as fit as the $A_1A_1$ homozygote. The inverse holds when $h = 1$, such that allele $A_1$ is fully recessive and $A_2$ fully dominant.}\\
\begin{marginfigure}
\begin{center}
\includegraphics[width=1.2 \textwidth]{figures/simple_diploid_trajs.png}
\end{center}
\caption{The trajectory of the frequency of allele $A_1$, starting
from $p_{0}=0.01$, for a selection coefficient $s=0.01$ and three
different dominance coefficients. The recessive beneficial allele ($h=1$) will
eventually fix in the population, but it takes a long
time. \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Rcode/diploid_sel.R}}
\label{fig:diploid_traj}
\end{marginfigure}
%\gc{Yellow monkey flowers ({\it Mimulus guttatus}) have repeatedly adapted
%to the toxic soils found at copper mines throughout the Californian
%foothills in the past 150 years. Kevin Wright }
%, of the $12$ and
%$22$ genotypes we will use selection coefficients $s_{12} \leq 0$ and
%$s_{22} \leq s_{12}$
We can then rewrite eqn.\ \eqref{deltap_dip2} as
\begin{equation}
\Delta p_t = \frac{p_ths + q_t s(1-h)}{\wbar}p_tq_t ,
\label{deltap_direct}
\end{equation}
where
\begin{equation}
\wbar = 1-2p_tq_t sh-q_t^2s.
\end{equation}\\
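To see where eqn.\ \eqref{deltap_direct} comes from, it may help to write out the marginal fitnesses under this parameterization. This is only a sketch of the intermediate algebra, assuming random mating so that an allele is paired with an $A_1$ allele with probability $p_t$ and with an $A_2$ allele with probability $q_t$:
\begin{align*}
\wbar_1 &= p_t \cdot 1 + q_t(1-sh) = 1 - q_t sh, \\
\wbar_2 &= p_t(1-sh) + q_t(1-s) = 1 - p_t sh - q_t s, \\
\wbar_1 - \wbar_2 &= p_t sh + q_t s(1-h).
\end{align*}
The difference $\wbar_1 - \wbar_2$ is the term in the numerator of eqn.\ \eqref{deltap_direct}; it is non-negative whenever $s \geq 0$ and $0 \leq h \leq 1$, so that $A_1$ never decreases in frequency under directional selection. Likewise, $\wbar = p_t \wbar_1 + q_t \wbar_2 = 1 - 2p_tq_tsh - q_t^2 s$.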
\begin{marginfigure}
\begin{center}
\includegraphics[width = 1.2 \textwidth]{illustration_images/single_locus_selection/Copperopolis/KeystoneCopperMineCopperopolisCalaverasCounty.jpg}
\end{center}
\caption{Keystone Copper Mine 1866, Copperopolis, Calaveras
County. \newline \noindent \tiny{ Image from
\href{https://picryl.com/media/keystone-copper-mine-copperopolis-calaveras-county}{picryl}.
Source Library of Congress, Public Domain. }}
\label{fig:Copperopolis}
\end{marginfigure}
\begin{question}{}
Throughout the Californian foothills are old copper and gold mines, which have dumped out soils that are polluted with heavy metals. While these toxic mine tailings are often depauperate of plants, {\it Mimulus guttatus} and a number of other plant species have managed to adapt to these harsh soils. \citet{wright2015adaptation} have mapped one of the major loci contributing to the adaptation to soils at two mines near Copperopolis, CA. \citeauthor{wright2015adaptation} planted homozygote seedlings out in the mine tailings and found that only $10\%$ of the homozygotes for the non-copper-tolerant allele survived to flower, while $40\%$ of the homozygotes for the copper-tolerant allele survived to flower.\\
{\bf A)} What is the selection coefficient acting against the non-copper-tolerant allele on the mine tailing?\\
{\bf B)} The copper-tolerant allele is fairly dominant in its action on fitness. If we assume that $h=0.1$, what percentage of heterozygotes should survive to flower?
\end{question}
\begin{question}{}
Comparing the red ($h=0$) and black ($h=0.5$) trajectories in Figure \ref{fig:diploid_traj}, provide an explanation for why $A_1$ increases faster initially if $h=0$, but then approaches fixation more slowly compared to the case of $h=0.5$.
\end{question}
%%%% Another possible fox image
%%% https://twitter.com/BioDivLibrary/status/1046777289416081408
%%%%%%%%%%%%%%%%%%%%%FOXSES
\begin{figure}
\begin{center}
\includegraphics[width = 0.8 \textwidth]{Journal_figs/single_locus_selection/silver_fox/fox_morph_freqs.pdf}
\end{center}
\caption[][3cm]{The frequency of red, cross, and silver fox morphs over the
decades in Eastern Canada. These data are well described by
recessive selection acting against the silver fox morph. Data from
\citet{elton:42}, compiled by \citet{Allendorf:09}. \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Journal_figs/single_locus_selection/silver_fox/fox_morphs.R}} \label{fig:Fox_morph_freqs}
\end{figure}
To see how dominance affects the trajectory of a real
polymorphism, we'll consider an example from a colour polymorphism in
red foxes ({\it Vulpes vulpes}). \begin{marginfigure}\begin{center}
\includegraphics[width = \textwidth]{illustration_images/single_locus_selection/fox_morphs/fox_morphs_silver_cross.png}
\end{center}
\caption{Three colour morphs in red fox {\it V. vulpes}, cross, red,
and silver foxes from left to right. \BHLNKC{The larger North American
mammals" Nelson, E.W., Fuertes,
L.A. 1916.}{https://www.flickr.com/photos/internetarchivebookimages/20578302420/in/photolist-wZ1CDZ-x4aTMj-tCTNnY-sFEbZG-xphfQ3-xmrbnA-xiXcDj-xejHVF-xtiB5G-xbxj1h-xsQdrP-wvPad5-xsFHvi-xqbZ1n-wsJA56-wrzbGj-xhvUJC-xgyia4-wYQ2pR-wXZf6j-wiuZ1t-wWKbS1-whsqaP-whio1h-xeiFTH-wWNQYe-xeiq1a-xdwa1s-wQExt6-x8BrsK-wPgGBE-w9DN9W-x75ojD-wP27dM-w9D6Ye-x6tXdt-wNRKTC-w9AXfX-x5rdVc-x25Puc-vvxmtP-tJ16gt-tAVz57-tmv9Zh-tCXVo2-owo4PL-oum6R1-oeCRWg-oeg5dH-ot9SVz}{Cornell University Library}} \label{fig:Fox_morphs}
\end{marginfigure} There are three colour morphs of red foxes: silver, cross, and
red (see Figure \ref{fig:Fox_morphs}), with this difference primarily
controlled by a single polymorphism with genotypes RR, Rr, and rr respectively. The fur pelts of the silver morph
fetched three times the price for hunters compared to cross (a smoky red) and red
pelts, the latter two being seen as roughly equivalent in worth. Thus
the desirability of the pelts acts as a recessive trait, with much
stronger selection against the silver homozygotes. As a
result of this price difference, silver foxes were hunted more
intensely and declined as a proportion of the population in Eastern Canada, see Figure
\ref{fig:Fox_morph_freqs}, as documented by \citeauthor{elton:42},
from $16\%$ to $5\%$ from 1834 to 1937.
\citet{haldane:42} reanalyzed these data and showed that they
were consistent with recessive selection acting against the silver
morph alone.
Note how the heterozygotes (cross) decline somewhat as a
result of selection on the silver homozygotes, but overall the R
allele is slow to respond to selection as it is `hidden' from
selection in the heterozygote state.
\graham{Add selection lines or get students to do that as an exercise.}
%%dominant colour poly in owls
%%https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3105316/
\paragraph{Directional selection on an additive allele.}
A special case arises when $h = 0.5$. This is the case of no dominance: the two alleles contribute to fitness in a strictly additive fashion. Then eqn.\ \eqref{deltap_direct} simplifies to
\begin{equation}
\Delta p_t = \frac{1}{2}\frac{s}{\wbar}p_tq_t .
\label{deltap_add}
\end{equation}
If selection is very weak, i.e.\ $s \ll 1$, the denominator ($\wbar$) is close to $1$ and we have
\begin{equation}
\Delta p_t \approx \frac{1}{2} s p_t q_t .
\label{deltap_add_simpl}
\end{equation}
It is useful to compare \eqn \eqref{deltap_add_simpl} to our haploid
model for $\Delta p_t$, \eqn \eqref{eq:deltap_haploid}, setting $w_1
= 1$ and $w_2 = 1-s$. Again assuming that $s$ is small, our
haploid \eqn \eqref{eq:deltap_haploid} becomes $\Delta p_t \approx s p_t
q_t$, which differs from our diploid model only by a factor of two.
Under our additive diploid model with weak selection, each copy of the
$A_2$ allele reduces relative fitness by $\nicefrac{s}{2}$, so the diploid model is equivalent to the haploid model
with $s$ replaced by $\nicefrac{s}{2}$.
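As a quick check on this comparison, we can keep the denominators rather than setting them to one. The lines below are a sketch that assumes the haploid update takes the standard form $\Delta p_t = p_tq_t(w_1-w_2)/\wbar$ with $\wbar = p_tw_1+q_tw_2$, and that uses $p_t+q_t=1$ to simplify the diploid mean fitness when $h=\nicefrac{1}{2}$:
\begin{align*}
\textrm{haploid:} \quad \Delta p_t &= \frac{s\, p_tq_t}{1-sq_t}, \\
\textrm{additive diploid:} \quad \Delta p_t &= \frac{\nicefrac{s}{2}\; p_tq_t}{1-sq_t}.
\end{align*}
In both models the mean fitness is $1-sq_t$, so at a given allele frequency the two updates differ by exactly a factor of two; for $s \ll 1$ the denominators are close to one and we recover the weak-selection approximations.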
%Hence, if $s$ is small, the diploid model of directional selection without dominance is identical to the haploid model, up to a factor of $1/2$. That factor is due to the choice of the parametrisation; we could have set $w_{11} = 1$, $w_{12} = 1-s$, and $w_{22} = 1-2s$ in our diploid model instead, in which case the agreement with the haploid model would be perfect.\\
From this analogy, we can borrow some insight we gained from the
haploid model. Specifically, the trajectory of the frequency of allele
$A_1$ in the diploid model without dominance follows a logistic growth
curve similar to eqn. \eqref{eq:haploid_logistic growth}. From this similarity, we can extrapolate from Equation \eqref{eq:estTauExplSimpl} to find the time it takes for our diploid, beneficial, additive allele ($A_1$) to move from frequency $p_0$ to $p_{\tau}$:
\begin{equation}
\tau \approx \frac{2}{s} \log \left(\frac{p_{\tau} q_0}{q_{\tau} p_0}\right)
\end{equation}
generations; this just differs by a factor of $2$ from our haploid model. Using this result we can find the time it takes for our favourable, additive allele ($A_1$) to transit from its entry into the population ($p_0 =1/(2N)$)
to close to fixation ($p_{\tau} =1-1/(2N)$):
\begin{equation}
\tau \approx \frac{4}{s} \log(2N) \label{eq:diploid_fix_time}
\end{equation}
generations. Note the similarity to eqn.\ \eqref{eq:fixTimeSimpl} for the haploid model, with a difference
by a factor of 2 due to the choice of parametrization
(and that the number of alleles is $2N$ in the diploid model, rather than $N$). Doubling our selection coefficient halves the time it takes for our allele to move through the population.\\
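To make the step from the general transit time to eqn.\ \eqref{eq:diploid_fix_time} explicit, and to get a feel for the time scales involved, here is a short worked sketch. The values $s=0.01$ and $N=10^4$ are purely illustrative assumptions, not estimates from any of the examples in this chapter. Substituting $p_0 = q_{\tau} = 1/(2N)$ and $p_{\tau} = q_0 = 1-1/(2N)$ gives
\begin{equation*}
\tau \approx \frac{2}{s} \log \left( \frac{(1-\nicefrac{1}{2N})^2}{(\nicefrac{1}{2N})^2} \right) = \frac{4}{s} \log (2N-1) \approx \frac{4}{s} \log(2N),
\end{equation*}
where the last step uses $2N \gg 1$. For the illustrative values,
\begin{equation*}
\tau \approx \frac{4}{0.01} \log(2 \times 10^4) \approx 400 \times 9.9 \approx 4000 \textrm{ generations}.
\end{equation*}
Because $N$ enters only through a logarithm, the transit time is far more sensitive to $s$ than to the population size.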
% https://www.flickr.com/photos/internetarchivebookimages/17578330873/
\begin{marginfigure}
\begin{center}
\includegraphics[width = \textwidth]{illustration_images/single_locus_selection/killifish/20974603315_1f9775189e_z.jpg}
\end{center}
\caption{Gulf killifish ({\it Fundulus grandis}). \BHLNKC{Distribution and
abundance of fishes and invertebrates in Gulf of Mexico
estuaries. Nelson D M and Pattillo M E}{https://www.flickr.com/photos/internetarchivebookimages/20974603315/in/photolist-xXsjPD-xXtgv2-xXtH6R-xXqWDP-wLTvyb-vNtwuV-w5YUE8-x3LoqL-w6oGAu-v8XTGY-xeLtHH-x55G1H-x3LEab-xqMKAg-wjksEw-x2brmo-w5hbhG-x2xSUG-wLUKE7-wLTSzb-yiwYWH-vNn18J-w5m7Hh-wLFFPg-w5Zyip-x547wa-wxjgKE-owf2BN-tryPJL-xERrRA-xfpKrJ-x4sNs6-x1sk9W-xec1S6-xEP4rE-x36kBN-waaBtg-wLf5jY-x29N9E-xB1126-tFDcgi-xjju3s-w6Fm6M-w3D8Ej-xzFCbj-xEZcSX-wLkVzJ-xrV46d-xJwuXV-x4tZjV}{MBLWHOI Library} } \label{fig:killifish}
\end{marginfigure}
\begin{question}{}
Gulf killifish ({\it Fundulus grandis}) have rapidly adapted to the
very high pollution levels in the Houston shipping canal since the
1950s. One of the ways that they've adapted is through the deletion of
their aryl hydrocarbon receptor (AHR)
gene. \citet{oziolor2019adaptive} estimated that individuals who were
homozygous for the intact AHR gene had a relative fitness of 20\% of
that of homozygotes for the deletion. Assuming an additive selection
model, and an effective population size of 200 thousand individuals, how long would it take for the deletion to reach fixation, starting as a single copy in this population?
\end{question}
%% Can you specify that h=0.5 here? -- EBJ
%\begin{tcolorbox}
%\begin{question}
%An autosomal pesticide resistance allele is at 50\% frequency in a species of flies. We stop using the pesticide, and within 20 years the frequency of the allele is 5\% in the new-born flies. There are two fly generations per year. Assuming that the allele affects fitness in an additive fashion, estimate the selection coefficient acting against homozygotes for the resistance allele.
%\end{question}
%\end{tcolorbox}
\section{Balancing selection and the selective maintenance of polymorphism.}
Directional selection on genotypes is expected to remove variation
from populations, yet we see plentiful phenotypic and genetic
variation in every natural population. Why is this? Three broad
explanations for the maintenance of polymorphisms are
\begin{enumerate}
\item Variation is maintained by a balance of genetic drift and
mutation (we discussed this explanation in Chapter
\ref{Chapter:Drift}).
\item Selection can sometimes act to maintain variation in
populations (balancing selection).
\item Deleterious variation can be maintained in the population as
a balance between selection removing variation and mutation
constantly introducing new variation into the population.
\end{enumerate}
We'll turn to these latter two explanations through this chapter and
the next.
Note that these explanations are not mutually exclusive. Each
explanation will explain some proportion of the variation, and these
proportions will differ over species and classes of polymorphism. A
central challenge in population genomics is to apportion variation among
these explanations in a systematic way.
\begin{marginfigure}[-3cm]
\begin{center}
\includegraphics[width = \textwidth]{figures/het_advant_traj.pdf}
\end{center}
\caption{Two allele frequency trajectories of the $A_1$ allele subject to
heterozygote advantage ($w_{11}=0.9$, $w_{12}=1$, and $w_{22}=0.85$). In
one simulation the allele is started from being rare in the population
($p=\nicefrac{1}{1000}$, solid line) and increases in frequency. In
the other simulation the allele is almost
fixed ($p=\nicefrac{999}{1000}$, dashed line). In both cases the
frequency moves toward the equilibrium frequency. The red line shows
the equilibrium frequency ($p_e$). \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Rcode/diploid_sel_het_advantage.R}} \label{fig:het_advant_traj}
\end{marginfigure}
\subsection{Heterozygote advantage}
One form of balancing selection occurs when the heterozygotes are fitter than either
of the homozygotes. In this case, it is useful to parameterize the relative fitnesses as follows:\\
\begin{center}
\begin{tabular}{lccc}
genotype & $A_1A_1$ & $A_1A_2$ & $A_2A_2$ \\
absolute fitness & $W_{11}$ & $< W_{12} >$ & $W_{22}$ \\
relative fitness (generic) & $w_{11}=W_{11}/W_{12}$ & $w_{12} = W_{12}/W_{12}$ & $w_{22} = W_{22}/W_{12}$ \\
relative fitness (specific) & $1-s_1$ & $1$ & $1-s_2$ \\
\end{tabular}\\
\end{center}
Here, $s_1$ and $s_2$ are the differences between the relative fitnesses
of the two homozygotes and the heterozygote. Note that to obtain
relative fitnesses we have divided
absolute fitness by the heterozygote fitness. We could use the
same parameterization as in the model of directional selection, but
the reparameterization we have chosen here makes the math easier.\\
In this case, when allele $A_1$ is rare, it is often found in a
heterozygous state, while the $A_2$ allele is usually in the
homozygous state, and so $A_1$ is more fit and increases in frequency. However, when
the allele $A_1$ is common, it is often found in a less fit homozygous state, while
the allele $A_2$ is often found in a heterozygous state; thus it is
now allele $A_2$ that increases in frequency at the expense of allele
$A_1$. Thus, at least in the deterministic model, neither allele can
reach fixation and both alleles will be maintained at an equilibrium frequency as a balanced
polymorphism in the population.
We can solve for this equilibrium frequency by setting $\Delta p_t = 0$ in eqn.\ \eqref{deltap_dip2},
i.e.\ $p_tq_t (\wbar_1-\wbar_2)=0$. Doing so, we find that there are
three equilibria. Two of them are not very interesting ($p=0$ or
$q=0$), but the third one is a stable polymorphic equilibrium, where
$\wbar_1-\wbar_2=0$ holds.
Using our $s_1$ and $s_2$ parametrization above, we see that the marginal fitnesses of
the two alleles are equal when
\begin{equation}
p_e = \frac{s_2}{s_1+s_2} \label{eqn:het_ad_eq}
\end{equation}
\begin{marginfigure}
\begin{center}
\includegraphics[width = \textwidth]{figures/het_advant_dp_wbar.pdf}
\end{center}
\caption{{\bf Top)} The change in frequency of an allele with heterozygote
advantage within a generation ($\Delta p$) as a function of the allele
frequency. Fitnesses as in Figure \ref{fig:het_advant_traj}. Note how the frequency change is positive below the
equilibrium frequency ($p_e$) and negative above. {\bf Bottom)} Mean
fitness ($\bar{w}$) as a function of the allele frequency. The red line shows
the equilibrium frequency ($p_e$). \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Rcode/diploid_sel_het_advantage.R}} \label{fig:het_advant_dp_wbar}
\end{marginfigure} for the equilibrium frequency of interest. This is also the frequency
of $A_1$ at which the mean fitness of the population is maximized. The
highest possible fitness of the population would be achieved if every
individual was a heterozygote. However, Mendelian segregation of alleles in the
gametes of heterozygotes means that a sexual population can never
achieve a completely heterozygote population. This equilibrium
frequency represents an evolutionary compromise between the advantages
of the heterozygote and the comparative costs of the two
homozygotes.\\
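The equilibrium in eqn.\ \eqref{eqn:het_ad_eq} can be recovered in a couple of lines. As a sketch, assuming random mating as before, the marginal fitnesses under the $s_1$, $s_2$ parameterization are
\begin{align*}
\wbar_1 &= p(1-s_1) + q = 1 - ps_1, \\
\wbar_2 &= p + q(1-s_2) = 1 - qs_2,
\end{align*}
so that $\wbar_1 = \wbar_2$ requires $p_e s_1 = (1-p_e)s_2$, which rearranges to $p_e = s_2/(s_1+s_2)$. The mean fitness is $\wbar = 1 - p^2s_1 - q^2s_2$, and setting its derivative with respect to $p$, namely $-2ps_1 + 2qs_2$, to zero gives the same frequency, which is why $p_e$ also maximizes $\wbar$. For the fitnesses used in Figure \ref{fig:het_advant_traj} ($s_1 = 0.1$ and $s_2=0.15$), this gives $p_e = 0.15/0.25 = 0.6$ and $\wbar = 1 - 0.36 \times 0.1 - 0.16 \times 0.15 = 0.94$ at equilibrium.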
\begin{figure}
\begin{center}
\includegraphics[width = \textwidth]{Rcode/Soay_Sheep/Hopping_sheep_all.pdf}
\end{center}
\caption[][6cm]{For the three Soay sheep genotypes: the offspring per year ({\bf left}), the probability of
surviving a year ({\bf middle}), and the product of the two ({\bf
right}). Thanks to Susan Johnston for supplying these simplified
numbers from \citet{johnston2013life}. \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Rcode/Soay_Sheep/Soay_Sheep_fitness.R}
} \label{fig:Soay_fitness}
\end{figure}
One example of a polymorphism maintained by heterozygote advantage is
a horn-size polymorphism found in Soay sheep, a population of feral sheep on the island of Soay
(about 40 miles off the coast of Scotland). The horns of the Soay sheep resemble
those of the wild Mouflon sheep, and the male Soay sheep use their horns to defend females during
the rut. \citet{johnston2013life} found a large-effect locus, at the
{\it RXFP2} gene, that controls much of the genetic variation for horn size. Two
alleles, {\it Ho$^p$} and {\it Ho$^+$}, segregate at this locus. The {\it Ho$^+$} allele is associated with
growing larger horns, while the {\it Ho$^p$} allele is associated with smaller
horns, with a reasonable proportion of {\it Ho$^p$} homozygotes developing no
horns at all. \citet{johnston2013life} found that the {\it Ho} locus had substantial
effects on male, but not female, fitness (see Figure
\ref{fig:Soay_fitness}). \begin{marginfigure}
\begin{center}
\includegraphics[width = \textwidth]{illustration_images/single_locus_selection/mouflon/18195657882_eb207e4e9e_z.jpg}
\end{center}
\caption{Mouflon ({\it Ovis orientalis orientalis}). \BHLNC{Animate
creation. (1898). Wood, J. G.}{https://www.flickr.com/photos/internetarchivebookimages/18195657882/in/photolist-oeVkzm-oeVyZM-owqffV-otpyr4-abvEzA-osMSNY-vdEfRm-odZgfB-ot1epP-xvDeKd-yb3aL3-xRUfo4-owdr4T-wJhfme-xj57bn-xAzfXc-xiEcGn-sMnpMH-xP6U3J-w6Wm26-xuMUdr-tHTvam-w6aexB-oubB13-wYKiWR-xnrRBv-xptbR9-wYeGR2-xBHTk1-xvoaiK-xdb5i7-xst4MC-w3QTNe-x9SM4R-xnwGwR-xGd6Qk-xyfL9Q-w3QtPx-wYeJqe}{Smithsonian Libraries} } \label{fig:Mouflon}
\end{marginfigure} The {\it Ho$^p$} allele has a mostly recessive effect on
male fecundity, with the {\it Ho$^p$} homozygotes having lower yearly reproductive
success presumably due to the fact that they perform poorly in male-male
competition (left plot Figure \ref{fig:Soay_fitness}). Conversely, the
{\it Ho$^{+}$} allele has a mostly recessive effect on viability, with {\it Ho$^{+}$} homozygotes having lower
yearly survival (middle plot Figure \ref{fig:Soay_fitness}), likely because they spend little time feeding during the rut and so lose substantial body weight. Thus both of the
homozygotes suffer from trade-offs between viability and
fecundity. As a result, the {\it Ho$^p$Ho$^+$} heterozygotes have the highest
fitness (right plot Figure \ref{fig:Soay_fitness}). The polymorphism is
thus balanced at intermediate frequency ($\sim 50\%$) in the population due to
this trade-off between fitness at different life history stages.
\marginnote{The fitnesses here are chosen to roughly match those of
the real Soay sheep example, as a full model would
require us to more carefully model the life-histories of the sheep. }
\begin{question}{}
Assume that the frequency of the {\it Ho$^p$} allele is 10\%, that there are 1000 males at birth, and that individual adults mate at random.\\
{\bf A)} What is the expected number of males with each of the three genotypes in the population at birth? \\
{\bf B)} Assume that a typical male individual of each genotype has the following probability of surviving to adulthood:\\
\begin{tabular}{ccc}
{\it Ho$^+$ Ho$^+$} & {\it Ho$^+$ Ho$^p$} & {\it Ho$^p$ Ho$^p$} \\
0.5 & 0.8 & 0.8
\end{tabular}
Making the assumptions from above, how many males of each genotype
survive to reproduce?\\
{\bf C)} Of the males who survive to reproduce, let's say that males
with the {\it Ho$^+$Ho$^+$} and {\it Ho$^+$Ho$^p$} genotypes have on average 2.5 offspring, while {\it Ho$^p$Ho$^p$} males have on average 1 offspring. Taking into account both survival and reproduction, how many offspring do you expect each of the three genotypes to contribute to the total population in the next generation? \\
{\bf D)} What is the frequency of the {\it Ho$^+$} allele in the sperm that will form this next generation? \\
{\bf E)} How would your answers to B-D change if the {\it Ho$^p$} allele was at 90\% frequency? \\
\end{question}
\begin{figure}
\begin{center}
\includegraphics[width = \textwidth]{figures/additive_effect_OverDom.pdf}
\end{center}
\caption{The deviations of the fitness of each genotype away from the mean population
fitness (0) are shown as black dots. The area of each circle is proportional to the fraction of
the population in each genotypic class ($p^2$, $2pq$, and $q^2$). The
additive genetic fitness of each genotype is shown as
a red dot. The linear regression between fitness and additive
genotype is shown as a red line. The black vertical arrows show the
difference between the average mean-centered phenotype and additive genetic value for each genotype.
The left panel shows $p=0.1$ and the right panel shows $p=0.9$; in the
middle panel the frequency is set to the equilibrium frequency. \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Rcode/Quant_gen/additive_effect.R} } \label{fig:additive_effect_OverDom}
\end{figure}
To push our understanding of heterozygote advantage a little further, note that the marginal fitnesses of our alleles are equivalent to the additive effects of our alleles on fitness. Recall from our discussion of non-additive variation (Section \ref{section:nonAddVar}) that the difference in the additive effects of the two alleles gives the slope of the regression of fitness on additive genotype, and that there is additive variance in fitness
when this slope is non-zero.
So what's happening here in our heterozygote advantage model is that the marginal fitness of the $A_1$ allele, the additive effect of allele $A_1$ on fitness, is greater than the marginal fitness of the $A_2$ allele ($\bar{w}_1 > \bar{w}_2 $) when $A_1$ is at low frequency in the population. In this case, the regression of fitness on the number of $A_1$ alleles in a genotype has a positive slope. This is true when the
frequency of the $A_1$ allele is below the equilibrium frequency. If the frequency of $A_1$ is above the equilibrium frequency, then
the marginal fitness of allele $A_2$ is higher than the marginal fitness of allele $A_1$ ($\bar{w}_1 < \bar{w}_2 $) and the regression of fitness on the number of copies of allele $A_1$ that individuals carry has a negative slope. In both cases there is additive genetic variance for fitness ($V_A > 0$) and the population has a directional response. Only when the population is at its equilibrium frequency, i.e.\ when $\bar{w}_1 =
\bar{w}_2$, is there no additive genetic variance ($V_A = 0$), as the slope of the linear regression of fitness on genotype is zero.
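One way to make this concrete is to write the additive variance in fitness in terms of the allele frequency. The expression below is a sketch that assumes the standard random-mating result that the additive genetic variance equals $2pq$ times the squared difference in the additive effects of the two alleles, here $\bar{w}_1 - \bar{w}_2$, together with the heterozygote advantage fitnesses $1-s_1$, $1$, and $1-s_2$:
\begin{align*}
\bar{w}_1 - \bar{w}_2 &= qs_2 - ps_1 = (s_1+s_2)(p_e - p), \\
V_A &= 2pq\,(\bar{w}_1 - \bar{w}_2)^2 = 2pq\,(s_1+s_2)^2 (p_e-p)^2 .
\end{align*}
The sign of $\bar{w}_1 - \bar{w}_2$ flips as $p$ crosses $p_e$, matching the change in sign of the regression slope, while $V_A$ is positive on either side of $p_e$ and falls to zero at $p=p_e$ (or when one allele is fixed).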
\begin{marginfigure}[-0.5cm]
\begin{center}
\includegraphics[width = \textwidth]{illustration_images/single_locus_selection/Pseudacraea_eurytus/Pseudacraea_eurytus.JPG}
\end{center}
\caption{In {\it Pseudacraea eurytus} there are two homozygous morphs that each mimic
a different butterfly, one blue and one orange; the heterozygote fails to mimic
either successfully and so suffers a high rate of predation
\citep{owen1972polymorphic}. \BHLNC{Illustrations of new species of
exotic butterflies (1868) Hewitson.}{https://commons.wikimedia.org/wiki/File:Pseudacraea_eurytus.JPG}{Smithsonian Libraries}} \label{fig:underdom_buttfly} % https://books.google.com/books?id=4XSmCwAAQBAJ&pg=PA399&lpg=PA399&dq=pseudacraea+eurytus+heterozygotes&source=bl&ots=Dw6f7Wl6rO&sig=oT6VYvO80LEh9dbYKgri-X9OHHQ&hl=en&sa=X&ved=2ahUKEwinoYnWxtXeAhVnw1QKHRj6CzMQ6AEwDXoECAUQAQ#v=onepage&q=pseudacraea%20eurytus%20heterozygotes&f=false
\end{marginfigure}
%%add Drosophila underdom. selection experiement http://courses.biology.utah.edu/seger/3410_spr_09/feb_4_2pp.pdf
\paragraph{Underdominance.} Another case that is of potential interest is the case of fitness
underdominance, where the heterozygote is less fit than either of the two
homozygotes. Underdominance can be parametrized as follows: \\
\begin{center}
\begin{tabular}{lccc}
genotype & $A_1A_1$ & $A_1A_2$ & $A_2A_2$ \\
absolute fitness & $W_{11}$ & $> W_{12} <$ & $W_{22}$ \\
relative fitness (generic) & $w_{11}=W_{11}/W_{12}$ & $w_{12} = W_{12}/W_{12}$ & $w_{22} = W_{22}/W_{12}$ \\
relative fitness (specific) & $1+s_1$ & $1$ & $1+s_2$ \\
\end{tabular}\\
\end{center}
Underdominance also permits three equilibria: $p=0$, $p=1$, and a
polymorphic equilibrium $p=p_U$. However, now only the first two equilibria are stable, while the polymorphic
equilibrium ($p_U$) is unstable. If $p<p_U$, then $\Delta p_t $ is negative %emjo: should it be p_e instead of p_u?
and allele $A_1$ will be lost, while if $p>p_U$, allele
$A_1$ will become fixed.\\
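The location and instability of the polymorphic equilibrium follow from the same steps as for heterozygote advantage. As a sketch under the parameterization in the table above (and assuming random mating), the marginal fitnesses are $\wbar_1 = 1+ps_1$ and $\wbar_2 = 1+qs_2$, so
\begin{equation*}
\wbar_1 - \wbar_2 = ps_1 - qs_2 = (s_1+s_2)(p - p_U), \qquad p_U = \frac{s_2}{s_1+s_2} .
\end{equation*}
Since $\Delta p_t$ has the same sign as $\wbar_1-\wbar_2$, the frequency of $A_1$ declines when $p<p_U$ and rises when $p>p_U$, which is what makes this equilibrium unstable. For example, the fitnesses $w_{11}=1.1$, $w_{12}=1$, and $w_{22}=1.2$ correspond to $s_1=0.1$ and $s_2=0.2$, giving $p_U = \nicefrac{2}{3}$.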
While strongly selected underdominant alleles might not spread within populations (if $p_U \gg
0$), they are of special interest in the study of speciation and hybrid zones. That is because alleles $A_1$
and $A_2$ may have arisen in a stepwise fashion, i.e.\ not by a single
mutation, but in separate subpopulations. In this case, heterozygote disadvantage may play a role in species maintenance.\\
\begin{figure*}
\begin{center}
\includegraphics[width = 0.8 \textwidth]{figures/het_disadvant_dp_wbar.pdf}
\end{center}
\caption{
{\bf Left)} Two allele frequency trajectories of an $A_1$ allele subject to
heterozygote disadvantage ($w_{11}=1.1$, $w_{12}=1$, and
$w_{22}=1.2$). The allele is started from just above and just below the
equilibrium frequency; in both cases the frequency moves away from the equilibrium frequency. The red line shows
the unstable equilibrium frequency ($p_e$).
{\bf Middle)} The change in frequency of an allele with heterozygote
disadvantage within a generation ($\Delta p$) as a function of the allele
frequency. Fitnesses as in the left panel. Note how the frequency change is negative below the
equilibrium frequency ($p_e$) and positive above. {\bf Right)} Mean
fitness ($\bar{w}$) as a function of the allele frequency. \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Rcode/diploid_sel_het_advantage.R}} \label{fig:het_disadvant_dp_wbar}
\end{figure*}
\paragraph{Negative frequency-dependent selection.}
In the models and examples above, heterozygote advantage maintains multiple alleles in the population because the common allele has a disadvantage compared to the
other rarer allele. In the case of heterozygote advantage, the
relative fitnesses of our three genotypes are not a function of the
other genotypes present in the population. However, there's a broader set of models where the relative fitness of a genotype depends on the
genotypic composition of the population; this broad family of models
is called frequency-dependent selection. Negative frequency-dependent selection, where the fitness of an allele
(or phenotype) decreases as it becomes more common in the population, can act to maintain genetic and phenotypic diversity within populations. While cases of long-term heterozygote advantage may be somewhat rare in nature, negative frequency-dependent selection is likely a common form of
balancing selection.
One common mechanism that may create negative frequency-dependent
selection is the interaction between individuals within or among
species. For example, negative frequency-dependent dynamics can
arise in predator-prey or pathogen-host dynamics, where
alleles conferring common phenotypes are at a disadvantage because
predators or pathogens learn or evolve to counter the phenotypic effects of
common alleles.
As one example of negative frequency-dependent selection, consider the two flower colour morphs in the
deceptive elderflower orchid ({\it Dactylorhiza
sambucina}). Throughout Europe, there are populations of these orchids polymorphic for
yellow- and purple-flowered individuals, with the
yellow flower corresponding to a recessive allele. Neither of
these morphs provides any nectar or pollen reward to their bumblebee
pollinators. \begin{marginfigure}
\begin{center}
\includegraphics[width = \textwidth]{illustration_images/single_locus_selection/Elderflower_orchid/albumdesorchid1899corr_0209.jpg}
\end{center}
\caption{Elderflower orchid ({\it Dactylorhiza
sambucina}). \BHLNC{Abbildungen der in Deutschland und den angrenzenden
gebieten vorkommenden grundformen der orchideenarten (1904). Müller, W.}{https://www.biodiversitylibrary.org/page/15349868\#page/126/mode/1up}{New York Botanical Garden} } \label{fig:ElderflowerOrchid} %Illustrations of the basic forms of orchid species found in Germany and neighboring areas
\end{marginfigure} Thus these plants are typically pollinated by newly emerged
bumblebees who are learning about which plants offer food rewards,
with the bees alternating to try a different coloured flower if they
find no food associated with a particular flower-colour morph \citep{smithson1997negative}.
\citet{gigord2001negative} explored whether this behaviour by bees
could result in negative frequency-dependent selection; out in the field, the researchers set up
experimental orchid plots in which they varied
the frequency of the two colour morphs. Figure \ref{fig:Elderflower_orchids_fitness} shows their measurements of the relative
male and female reproductive success of the yellow morph across these experimental plots. When the yellow morph is rare, it has
higher reproductive success than the purple morph, as it receives a
disproportionate number of visits from bumblebees that are dissatisfied
with the purple flowers. This situation is reversed when the yellow
morph becomes common in the population; now the purple morph
outperforms the yellow morph. Therefore, both colour morphs are
maintained in this population, and presumably Europe-wide, due to this negative frequency-dependent
selection. %The yellow morph is found at $\sim 69\%$ in the region of
%France where this experiment was conducted, consistent with the frequency at which the two
%morphs are predicted to have equal fitness.
% \graham{check
% recessive?} %http://courses.biology.utah.edu/seger/3410_spr_09/feb_4_2pp.pdf
\begin{figure}
\begin{center}
\includegraphics[width = \textwidth]{Journal_figs/single_locus_selection/Elderflower_orchid/Elderflower_orchids_fitness.pdf}
\end{center}
\caption[][0cm]{{\bf Left)} Measures of the relative male and female reproductive success of the yellow elderflower orchid morph
as a function of the frequency of the yellow morph in experimental plots. {\bf
Right)} Two allele frequency trajectories of the yellow allele
subject to the negative frequency-dependent selection scheme given in the left plot
(for an initial frequency of $0.01$ and $0.99$, solid and dotted
line respectively).
Male
reproductive success is measured in terms of the \% of pollinia
removed from a plant, and female reproductive success is measured in terms of the
\% of stigmas receiving pollinia on a plant. These measures are made
relative by dividing the reproductive success of the yellow morph by the
mean of the yellow and purple morphs. Pollinia are the pollen masses of
orchids, and other plants, where each pollinium is transferred
as a single unit by pollinators. Data from
\citet{gigord2001negative}. \gitcode{https://github.com/cooplab/popgen-notes/blob/master/Journal_figs/single_locus_selection/Elderflower_orchid/Elderflower_orchids.R}} \label{fig:Elderflower_orchids_fitness}
\end{figure}
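
The logic of the left panel can be sketched with a deliberately simple model. As an illustration only, suppose that the yellow and purple morphs behave as two haploid types (ignoring dominance), that the yellow type is at frequency $p$, and that the fitness of yellow relative to purple declines linearly with $p$, with $w_Y(p) = 1 + s(\hat{p}-p)$ and $w_P = 1$. Here the slope $s$ and the frequency $\hat{p}$ at which the two morphs have equal fitness are hypothetical parameters introduced for this sketch, not quantities estimated from the data of \citet{gigord2001negative}. The mean fitness is then $\bar{w} = 1 + s p (\hat{p}-p)$, and the change in the frequency of yellow over one generation is
\begin{equation}
\Delta p = \frac{p \left( w_Y(p) - \bar{w} \right)}{\bar{w}} = \frac{s \, p (1-p) (\hat{p}-p)}{1 + s \, p (\hat{p}-p)}.
\end{equation}
For $0 < s < 1$ the denominator is positive, so $\Delta p$ is positive when $p < \hat{p}$ and negative when $p > \hat{p}$. Under these assumptions the yellow type increases when rare and decreases when common, and the population is pushed towards the polymorphic equilibrium $p = \hat{p}$ from either direction.
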
%The Independents are very aggressive toward each other, but tolerate
%the submissive Satellite males as females are attracted to leks where multiple males display.
Negative frequency-dependent selection can also maintain different
breeding strategies due to interactions amongst individuals within a population. One
dramatic example of this occurs in ruffs ({\it Philomachus pugnax}), a
marsh-wading sandpiper that summers in Northern Eurasia. The males of this species
lek, with the males gathering on open ground to display and attract females. There are three different male morphs differing in their breeding
strategy. The large majority of males are `Independent', with black or
chestnut ruff plumage, and try to defend and display on small territories. `Satellite' males, with white ruff plumage, make
up $\sim 16\%$ of males and do not defend territories, but rather join
in displays with Independent males and opportunistically mate with
females visiting the lek. Finally, the rare `Faeder' morph was only discovered
in 2006 \citep{jukema2006permanent} and makes up less than 1\% of males. These Faeder males are female mimics who hang
around the territories of Independents and try to `sneak' in matings with females. Faeder males have plumage closely resembling
that of females and a smaller body size than other males, but with larger testicles (presumably to
take advantage of rare mating opportunities).
\begin{figure}
\begin{center} %% pic of a single ruff https://www.biodiversitylibrary.org/page/58888512#page/291/mode/1up
\includegraphics[width = 0.8 \textwidth]{illustration_images/single_locus_selection/Ruffs/Philomachus_pugnax_naumann.jpg}
\end{center}
\caption{Lekking ruffs ({\it Philomachus pugnax}). Three Independent males, one Satellite male, and one female
(or Faeder male?). {\newline \noindent \tiny{ Painting by Johann
Friedrich Naumann (1780–1857). Public Domain,
\href{https://en.wikipedia.org/wiki/Ruff\#/media/File:Philomachus_pugnax_naumann.jpg}{wikimedia}.}}}\label{fig:Ruff}
\end{figure}
All three of the ruff morphs, with their complex behavioural and morphological differences,
are controlled by three alleles at a single autosomal locus, with the
Satellite and Faeder alleles being genetically dominant over the high-frequency
Independent allele. The genetic variation for these three morphs is potentially maintained by
negative frequency-dependent selection, as all three male strategies
are likely at an advantage when they are rare in the population. For
example, while the Satellites mostly lose out on mating opportunities
to Independents, they may have longer life-spans and so may have equal
life-time reproductive success \citep{widemo1998alternative}. However, Satellite and Faeder males
are totally reliant on the lekking Independent males, and so neither of
these alternative strategies can become overly common in the
population. The locus controlling these differences has been mapped,
and the underlying alleles have persisted for roughly four million years
\citep{kupper2016supergene,lamichhaney2016structural}. While this mating system is
bizarre, the frequency-dependent dynamics mean that it has been around
longer than we've been using stone tools. \\
While these examples may seem somewhat involved, they must be simple
compared to the complex dynamics that maintain the hundreds of alleles
present at the genes in the major histocompatibility complex (MHC). MHC
genes are key to the coordination of the vertebrate immune
system in response to pathogens, and are likely caught in an endless arms
race with pathogens adapting to common MHC alleles, allowing rare MHC
alleles to be favoured. Balancing selection at the MHC locus has maintained
some polymorphisms for tens of millions of years, such that some of
your MHC alleles may be genetically more closely related to MHC alleles in other primates than they are to alleles in your
close human friends.
\section{Fluctuating selection pressures}
Because of environmental change, selection pressures are rarely
constant through time. As selection pressures on a polymorphism change,
the frequency of the allele can fluctuate along with them. This can have important implications for which
alleles can survive and spread. We'll see that when selection
fluctuates, the success of alleles and genotypes can often be summarized
by their ``geometric mean fitness'', and so alleles and genotypes
that bet-hedge in their strategies can win out in long-term competitions
between individuals in fluctuating environments.
\paragraph{Haploid model with fluctuating selection}
We can use our haploid model to consider this case where the fitnesses depend on time \citep{Dempster:55}, and
say that $w_{1,t}$ and $w_{2,t}$ are the fitnesses of the two types in