diff --git a/chapter-05.tex b/chapter-05.tex
index 540d983..bdd8093 100644
--- a/chapter-05.tex
+++ b/chapter-05.tex
@@ -687,6 +687,10 @@ \subsection{Migration--selection balance}
 migration-selection balance (at least under strong selection) is
 analogous to mutation selection balance.\\
 
+We can use this same model by analogy for the case of
+migration-selection balance in a diploid model; in that case we replace
+our haploid $s$ by the cost to heterozygotes $hs$.
+
 \begin{tcolorbox} 
 \begin{question}
 You are investigating a small river population of sticklebacks, which receives infrequent migrants from a very large marine population. At a set of (putatively) neutral biallelic markers the freshwater population has frequencies:
diff --git a/chapter-06.tex b/chapter-06.tex
index bfb885a..2843b32 100644
--- a/chapter-06.tex
+++ b/chapter-06.tex
@@ -24,6 +24,7 @@ \subsection{Stochastic loss of strongly selected alleles}
 P_i= \frac{(1+s)^i e^{-(1+s)}}{i!}
 \end{equation}
 
+
 Consider starting from a single individual with the selected allele, and ask
 about the probability of eventual loss of our selected allele starting
 from this single copy ($p_L$). To derive this we'll make use of a
@@ -33,16 +34,18 @@ \subsection{Stochastic loss of strongly selected alleles}
 \begin{enumerate}
 \item In our first generation
 with probability $P_0$ our individual leaves no copies of itself to
-the next generation, in which case our allele is lost.
+the next generation, in which case our allele is lost (Figure \ref{fig:Proof_of_pL_2s}A).
 \item Alternatively
 it could leave one copy of itself to the next generation (with
 probability $P_1$), in which
-case with probability $p_L$ this copy eventually goes extinct.
+case with probability $p_L$ this copy eventually goes extinct (Figure \ref{fig:Proof_of_pL_2s}B).
 \item It could leave two copies of itself to the next generation (with
 probability $P_2$), in which
-case with probability $p_L^2$ both of these copies eventually goes extinct.
+case with probability $p_L^2$ both of these copies eventually go
+extinct (Figure \ref{fig:Proof_of_pL_2s}C).
 \item More generally it could leave could leave $k$ copies ($k>0$) of itself to the next generation (with
-probability $P_k$), in which case with probability $p_L^k$  all of these copies eventually go extinct.
+probability $P_k$), in which case with probability $p_L^k$  all of
+these copies eventually go extinct (e.g. Figure \ref{fig:Proof_of_pL_2s}D).
 \end{enumerate}
 summing over this probabilities we see that
 \begin{eqnarray}
@@ -78,6 +81,13 @@ \subsection{Stochastic loss of strongly selected alleles}
 probability of being lost when it is first introduced into the
 population by mutation. \\
 
+\begin{figure}
+\begin{center}
+\includegraphics[width=\textwidth]{figures/Proof_of_pL_2s}
+\end{center}
+\caption{} \label{fig:Proof_of_pL_2s}
+\end{figure}
+
 %%consider reparameterizing 1+(1-hs)s
 We can also adapt this result to a diploid setting.
 Assuming that heterozygotes for the $1$ allele have $1+(1-h)s$ children, the
diff --git a/figures/Proof_of_pL_2s.pdf b/figures/Proof_of_pL_2s.pdf
new file mode 100644
index 0000000..2642f5a
Binary files /dev/null and b/figures/Proof_of_pL_2s.pdf differ
diff --git a/figures/additive_effect.pdf b/figures/additive_effect.pdf
index 2157cb6..5393c0a 100644
Binary files a/figures/additive_effect.pdf and b/figures/additive_effect.pdf differ
diff --git a/figures/additive_effect_OverDom.pdf b/figures/additive_effect_OverDom.pdf
index 4211cc7..ff8dc3f 100644
Binary files a/figures/additive_effect_OverDom.pdf and b/figures/additive_effect_OverDom.pdf differ
diff --git a/html/index.html b/html/index.html
index f473767..e00faa7 100644
--- a/html/index.html
+++ b/html/index.html
@@ -847,7 +847,7 @@ <h4 id="diploid-fluctuating-fitness">Diploid fluctuating fitness</h4>
 <span><strong>B)</strong></span> What conditions do you need for a polymorphic equilibrium to be maintained? At what is the equilibrium frequency of this balanced polymorphism?<br />
 <span><strong>C)</strong></span> Imagine the cost of the driver were additive <span class="math inline">\(w_{dd}=1\)</span>, <span class="math inline">\(w_{Dd}=1-e\)</span>, <span class="math inline">\(w_{DD}=1-2e\)</span>. Under what conditions can the driver invade the population? Can a polymorphic equilibrium be maintained?</p>
 <h2 id="mutationselection-balance">Mutation–selection balance</h2>
-<p>Mutation is constantly introducing new alleles into the population. Therefore, variation can be maintained within a population not only if selection is balancing (e.g. through heterozygote advantage or fluctuating selection over time, as we have seen in the previous section), but also due to a balance between mutation <span><span> and selection. A case of particular interest is when mutation introduces deleterious</span></span> alleles and selection acts against these alleles. To study this balance, we return to the model of directional selection, where allele <span class="math inline">\(A_1\)</span> is advantageous, i.e.</p>
+<p>Mutation is constantly introducing new alleles into the population. Therefore, variation can be maintained within a population not only if selection is balancing (e.g. through heterozygote advantage or fluctuating selection over time, as we have seen in the previous section), but also due to a balance between mutation and selection. A case of particular interest is when mutation introduces deleterious alleles and selection acts against these alleles. To study this balance, we return to the model of directional selection, where allele <span class="math inline">\(A_1\)</span> is advantageous, i.e.</p>
 <table>
 <tbody>
 <tr class="odd">
@@ -872,7 +872,7 @@ <h2 id="mutationselection-balance">Mutation–selection balance</h2>
 </table>
 <p><br />
 </p>
-<p><span><span> For a start, we consider the case where allele <span class="math inline">\(A_2\)</span> is not completely recessive (<span class="math inline">\(h&gt;0\)</span>), so that the heterozygotes suffer at least some disadvantage. We denote by <span class="math inline">\(\mu = \mu_{1\rightarrow2}\)</span> the mutation rate per generation from <span class="math inline">\(A_1\)</span> to the deleterious allele <span class="math inline">\(A_2\)</span>, and assume that there is no reverse mutation (<span class="math inline">\(\mu_{2\rightarrow1} = 0\)</span>). Let us assume that selection against <span class="math inline">\(A_2\)</span> is relatively strong compared to the mutation rate, so that it is justified to assume that <span class="math inline">\(A_2\)</span> is always rare, i.e. <span class="math inline">\(q_t = 1-p_t \ll 1\)</span>. Compared to previous sections, for mathematical clarity, we also switch from following the frequency <span class="math inline">\(p_t\)</span> of <span class="math inline">\(A_1\)</span> to following the frequency <span class="math inline">\(q_t\)</span> of <span class="math inline">\(A_2\)</span>. Of course, this is without loss of generality. The change in frequency of <span class="math inline">\(A_2\)</span> due to selection can be written as</span></span> <span class="math display">\[\Delta_S q_t = \frac{{\overline{w}}_2 - {\overline{w}}_1}{{\overline{w}}} p_t q_t  \approx  -hs q_t.
+<p>For a start, we consider the case where allele <span class="math inline">\(A_2\)</span> is not completely recessive (<span class="math inline">\(h&gt;0\)</span>), so that the heterozygotes suffer at least some disadvantage. We denote by <span class="math inline">\(\mu = \mu_{1\rightarrow2}\)</span> the mutation rate per generation from <span class="math inline">\(A_1\)</span> to the deleterious allele <span class="math inline">\(A_2\)</span>, and assume that there is no reverse mutation (<span class="math inline">\(\mu_{2\rightarrow1} = 0\)</span>). Let us assume that selection against <span class="math inline">\(A_2\)</span> is relatively strong compared to the mutation rate, so that it is justified to assume that <span class="math inline">\(A_2\)</span> is always rare, i.e. <span class="math inline">\(q_t = 1-p_t \ll 1\)</span>. Compared to previous sections, for mathematical clarity, we also switch from following the frequency <span class="math inline">\(p_t\)</span> of <span class="math inline">\(A_1\)</span> to following the frequency <span class="math inline">\(q_t\)</span> of <span class="math inline">\(A_2\)</span>. Of course, this is without loss of generality. The change in frequency of <span class="math inline">\(A_2\)</span> due to selection can be written as <span class="math display">\[\Delta_S q_t = \frac{{\overline{w}}_2 - {\overline{w}}_1}{{\overline{w}}} p_t q_t  \approx  -hs q_t.
 	\label{eq:dirSelApprox}\]</span> This approximation can be found by assuming that <span class="math inline">\(q^2 \approx 0\)</span>, <span class="math inline">\(p \approx 1\)</span>, and that <span class="math inline">\({\overline{w}}\approx w_1\)</span>.</p>
 <p>All of these assumptions make sense if <span class="math inline">\(q \ll 1\)</span>. From eqn.  we see that selection acts to reduce the frequency of <span class="math inline">\(A_2\)</span> (as both <span class="math inline">\(h\)</span> and <span class="math inline">\(s\)</span> are positive), and it does so geometrically across the generations. That is, if the initial frequency of <span class="math inline">\(A_2\)</span> is <span class="math inline">\(q_0\)</span>, then its frequency at time <span class="math inline">\(t\)</span> is approximately <span class="math display">\[q_t = q_0 (1 - hs)^t.
 	\label{eq:dirSelExplApprox}\]</span></p>
@@ -924,7 +924,7 @@ <h2 id="migrationselection-balance">Migration–selection balance</h2>
 <p>As a simple model of migration lets suppose within a population a fraction of <span class="math inline">\(m\)</span> individuals are migrants from the other population, and <span class="math inline">\(1-m\)</span> individuals are from the same deme.<br />
 To quickly sketch a solution to this well set up a situation analogous to our mutation-selection balance model. to do this lets assume that selection is strong compared to migration (<span class="math inline">\(s \gg m\)</span>) then allele <span class="math inline">\(1\)</span> will be almost fixed in population <span class="math inline">\(1\)</span> and allele <span class="math inline">\(2\)</span> will be almost fixed in population <span class="math inline">\(2\)</span>. If that is the case, migration changes the frequency of allele <span class="math inline">\(2\)</span> in population <span class="math inline">\(1\)</span> (<span class="math inline">\(q_1\)</span>) by <span class="math display">\[\Delta_{Mig.} q_1 \approx m\]</span> while as noted above <span class="math inline">\(\Delta_{S} q_1= -sq_1\)</span>, so that migration and selection are at an equilibrium when <span class="math inline">\(\Delta_{S} q_1+
 \Delta_{Mig.}q_1\)</span>, i.e. an equilibrium frequency of allele <span class="math inline">\(2\)</span> in population <span class="math inline">\(1\)</span> of <span class="math display">\[q_{e,1} = \frac{m}{s}\]</span> so that migration is playing to role of mutation and so migration-selection balance (at least under strong selection) is analogous to mutation selection balance.<br />
-</p>
+We can use this same model by analogy for the case of migration-selection balance in a diploid model; in that case we replace our haploid <span class="math inline">\(s\)</span> by the cost to heterozygotes <span class="math inline">\(hs\)</span>.</p>
 <p>You are investigating a small river population of sticklebacks, which receives infrequent migrants from a very large marine population. At a set of (putatively) neutral biallelic markers the freshwater population has frequencies: 0.2, 0.7, 0.8 at the same markers the marine population has frequencies: 0.4, 0.5 and 0.7. From studying patterns of heterozygosity at a large collection of markers, you have estimated the long term effective size of your freshwater population is 2000 individuals.<br />
 <span><strong>A)</strong></span> What is <span class="math inline">\(F_{ST}\)</span> across these neutral markers in the freshwater population, with respect to the large marine population (i.e. treat the marine population as the total)?<br />
 <span><strong>B)</strong></span> You are also studying an unlinked locus involved in the regulation of salt uptake. In the marine population the ancestral allele is at close to fixation, but in your river population the derived allele is at 0.99 frequency. Estimate the selective disadvantage of the ancestral allele in your river population. [Hint how can you use neutral differentiation to estimate the migration rate?]</p>
@@ -951,28 +951,35 @@ <h3 id="some-theory-of-the-spatial-distribution-of-allele-frequencies-under-dete
 <figure>
 <img src="figures/equilib_cline.png" alt="An equilibrium cline in allele frequency. Our individuals dispersal an average distance of \sigma=1km per generation, and our allele 2 has a relative fitness of 1+s and 1-s on either side of the environmental change at x=0." /><figcaption>An equilibrium cline in allele frequency. Our individuals dispersal an average distance of <span class="math inline">\(\sigma=1\)</span>km per generation, and our allele <span class="math inline">\(2\)</span> has a relative fitness of <span class="math inline">\(1+s\)</span> and <span class="math inline">\(1-s\)</span> on either side of the environmental change at <span class="math inline">\(x=0\)</span>.<span data-label="fig:cline"></span></figcaption>
 </figure>
+<h4 id="the-cline-in-allele-frequency-associated-with-a-sharp-environmental-transition.">The cline in allele frequency associated with a sharp environmental transition.</h4>
 <p>To make progress lets consider a simple model of location adaptation where the environment abruptly changes. Specifically we assume that <span class="math inline">\(\gamma(x)= 1\)</span> for <span class="math inline">\(x&lt;0\)</span> and <span class="math inline">\(\gamma(x)= -1\)</span> for <span class="math inline">\(x \geq 0\)</span>, i.e. our allele <span class="math inline">\(2\)</span> has a selective advantage at locations to the left of zero, while this allele is at a disadvantage to the right of zero. In this case we can get an equilibrium distribution of our two alleles were to the left of <span class="math inline">\(zero\)</span> our allele <span class="math inline">\(2\)</span> is at higher frequency, while to the right of zero allele <span class="math inline">\(1\)</span> predominates. As we cross from the left to the right side of our range the frequency of our allele <span class="math inline">\(2\)</span> decreases in a smooth cline.<br />
 Our equilibrium spatial distribution of allele frequencies can be found by setting the LHS of eqn. to zero to arrive at <span class="math display">\[s\gamma(x) q(x) \left( 1 - q(x) \right) = - \frac{\sigma^2}{2} \frac{d^2q(x)}{dx^2}\]</span> We then could solve this differential equation with appropriate boundary conditions (<span class="math inline">\(q(-\infty)=1\)</span> and <span class="math inline">\(q(\infty) = 0\)</span>) to arrive at the appropriate functional form to our cline. While we won’t go into the solution of this equation here, we can note that by dividing our distance <span class="math inline">\(x\)</span> by <span class="math inline">\(\ell=\sigma/\sqrt{s}\)</span> we can remove the effect of our parameters from the above equation. This compound parameter <span class="math inline">\(\ell\)</span> is the characteristic length of our cline, and it is this parameter which determines over what geographic scale we change from allele <span class="math inline">\(2\)</span> predominating to allele <span class="math inline">\(1\)</span> predominating as we move across our environmental shift.<br />
 The width of our cline, i.e. over what distance do we make this shift from allele <span class="math inline">\(2\)</span> predominating to allele <span class="math inline">\(1\)</span>, can be defined in a number of different ways. One simple way to define the cline width, which is easy to define but perhaps hard to measure accurately, is the slope (i.e. the tangent) of <span class="math inline">\(q(x)\)</span> at <span class="math inline">\(x=0\)</span>. Under this definition the cline width is approximately <span class="math inline">\(0.6
 \sigma/\sqrt{s}\)</span>.<br />
 </p>
+<h4 id="the-rate-of-spatial-spread-of-a-beneficial-allele.">The rate of spatial spread of a beneficial allele.</h4>
+<p>Consider a beneficial mutation that has arisen in a specific spatial location and has begun to spread geographically.</p>
 <h1 id="stochasticity-and-genetic-drift-in-allele-frequencies">Stochasticity and Genetic Drift in allele frequencies</h1>
 <h2 id="stochastic-loss-of-strongly-selected-alleles">Stochastic loss of strongly selected alleles</h2>
 <p>Even strongly selected alleles can be lost from the population when they are sufficiently rare. This is because the number of offspring left by individuals to the next generation is fundamentally stochastic. A selection coefficient of s=<span class="math inline">\(1\%\)</span> is a strong selection coefficient, which can drive an allele through the population in a few hundred generations once the allele is established. However, if individuals have on average a small number of offspring per generation the first individual to carry our allele who has on average <span class="math inline">\(1\%\)</span> more children could easily have zero offspring, leading to the loss of our allele before it ever get a chance to spread.<br />
 To take a first stab at this problem lets think of a very large haploid population, and in order for this population to stay constant in size we’ll assume that individuals without the selected mutation have on average one offspring per generation. While individuals with our selected allele have on average <span class="math inline">\(1+s\)</span> offspring per generation. We’ll assume that the distribution of offspring number of an individual is Poisson distributed with this mean, i.e. the probability that an individual with the selected allele has <span class="math inline">\(i\)</span> children is <span class="math display">\[P_i= \frac{(1+s)^i e^{-(1+s)}}{i!}\]</span></p>
 <p>Consider starting from a single individual with the selected allele, and ask about the probability of eventual loss of our selected allele starting from this single copy (<span class="math inline">\(p_L\)</span>). To derive this we’ll make use of a simple argument (derived from branching processes). Our selected allele will be eventually lost from the population if every individual with the allele fails to leave descendants.</p>
 <ol>
-<li><p>In our first generation with probability <span class="math inline">\(P_0\)</span> our individual leaves no copies of itself to the next generation, in which case our allele is lost.</p></li>
-<li><p>Alternatively it could leave one copy of itself to the next generation (with probability <span class="math inline">\(P_1\)</span>), in which case with probability <span class="math inline">\(p_L\)</span> this copy eventually goes extinct.</p></li>
-<li><p>It could leave two copies of itself to the next generation (with probability <span class="math inline">\(P_2\)</span>), in which case with probability <span class="math inline">\(p_L^2\)</span> both of these copies eventually goes extinct.</p></li>
-<li><p>More generally it could leave could leave <span class="math inline">\(k\)</span> copies (<span class="math inline">\(k&gt;0\)</span>) of itself to the next generation (with probability <span class="math inline">\(P_k\)</span>), in which case with probability <span class="math inline">\(p_L^k\)</span> all of these copies eventually go extinct.</p></li>
+<li><p>In our first generation with probability <span class="math inline">\(P_0\)</span> our individual leaves no copies of itself to the next generation, in which case our allele is lost (Figure [fig:Proof_of_pL_2s]A).</p></li>
+<li><p>Alternatively it could leave one copy of itself to the next generation (with probability <span class="math inline">\(P_1\)</span>), in which case with probability <span class="math inline">\(p_L\)</span> this copy eventually goes extinct (Figure [fig:Proof_of_pL_2s]B).</p></li>
+<li><p>It could leave two copies of itself to the next generation (with probability <span class="math inline">\(P_2\)</span>), in which case with probability <span class="math inline">\(p_L^2\)</span> both of these copies eventually go extinct (Figure [fig:Proof_of_pL_2s]C).</p></li>
+<li><p>More generally it could leave <span class="math inline">\(k\)</span> copies (<span class="math inline">\(k&gt;0\)</span>) of itself to the next generation (with probability <span class="math inline">\(P_k\)</span>), in which case with probability <span class="math inline">\(p_L^k\)</span> all of these copies eventually go extinct (e.g. Figure [fig:Proof_of_pL_2s]D).</p></li>
 </ol>
 <p>summing over this probabilities we see that <span class="math display">\[\begin{aligned}
 p_L &amp;= \sum_{k=0}^{\infty} P_k p_L^{k}  \nonumber \\
 &amp;=  \sum_{k=0}^{\infty} \frac{(1+s)^ke^{-(1+s)}}{k!} p_L^{k} \nonumber
 \\
 &amp;= e^{-(1+s)} \left( \sum_{k=0}^{\infty} \frac{\left(p_L(1+s) \right)^k}{k!}  \right)\end{aligned}\]</span> well the term in the brackets is itself an exponential expansion, so we can rewrite this as <span class="math display">\[p_L = e^{(1+s)(p_L-1)} \label{prob_loss}\]</span> solving this would give us our probability of loss for any selection coefficient. Lets rewrite this in terms of the the probability of escaping loss <span class="math inline">\(p_F = 1-p_L\)</span>. We can rewrite eqn as <span class="math display">\[1-p_F = e^{-p_F(1+s)}\]</span> to gain an approximation to this lets consider a small selection coefficient <span class="math inline">\(s \ll 1\)</span> such that <span class="math inline">\(p_F \ll 1\)</span> and then expanded out the exponential on the right hand side (ignoring terms of higher order than <span class="math inline">\(s^2\)</span> and <span class="math inline">\(p_F^2\)</span>) then <span class="math display">\[1-p_F \approx 1-p_F(1+s)+p_F^2(1+s)^2/2\]</span> solving this we find that <span class="math display">\[p_F = 2s.\]</span> Thus even an allele with a <span class="math inline">\(1\%\)</span> selection coefficient has a <span class="math inline">\(98\%\)</span> probability of being lost when it is first introduced into the population by mutation.<br />
-We can also adapt this result to a diploid setting. Assuming that heterozygotes for the <span class="math inline">\(1\)</span> allele have <span class="math inline">\(1+(1-h)s\)</span> children, the probability of allele <span class="math inline">\(1\)</span> is not lost, starting from a single copy in the population, is <span class="math display">\[p_F = 2 (1-h)s \label{eqn:diploid_escape}\]</span> for <span class="math inline">\(h&gt;0\)</span>.<br />
+</p>
+<figure>
+<img src="figures/Proof_of_pL_2s" alt="" /><figcaption><span data-label="fig:Proof_of_pL_2s"></span></figcaption>
+</figure>
+<p>We can also adapt this result to a diploid setting. Assuming that heterozygotes for the <span class="math inline">\(1\)</span> allele have <span class="math inline">\(1+(1-h)s\)</span> children, the probability that allele <span class="math inline">\(1\)</span> is not lost, starting from a single copy in the population, is <span class="math display">\[p_F = 2 (1-h)s \label{eqn:diploid_escape}\]</span> for <span class="math inline">\(h&gt;0\)</span>.<br />
 </p>
 <h2 id="the-interaction-between-genetic-drift-and-weak-selection.">The interaction between genetic drift and weak selection.</h2>
 <p>For strongly selected alleles, once the allele has escaped initial loss at low frequencies, their path will be determined deterministically by their selection coefficients. However, if selection is weak the stochasticity of reproduction can play a role in the trajectory an allele takes even when it is common in the population.<br />
diff --git a/popgen_notes.pdf b/popgen_notes.pdf
index c2af4d0..a488d00 100644
Binary files a/popgen_notes.pdf and b/popgen_notes.pdf differ