\documentclass[a4paper]{article}
\def\npart {IB}
\def\nterm {Michaelmas}
\def\nyear {2015}
\def\nlecturer {G.\ R.\ Grimmett}
\def\ncourse {Markov Chains}
\input{header}
\begin{document}
\maketitle
{\small
\noindent\textbf{Discrete-time chains}\\
Definition and basic properties, the transition matrix. Calculation of $n$-step transition probabilities. Communicating classes, closed classes, absorption, irreducibility. Calculation of hitting probabilities and mean hitting times; survival probability for birth and death chains. Stopping times and statement of the strong Markov property.\hspace*{\fill} [5]
\vspace{5pt}
\noindent Recurrence and transience; equivalence of transience and summability of $n$-step transition probabilities; equivalence of recurrence and certainty of return. Recurrence as a class property, relation with closed classes. Simple random walks in dimensions one, two and three.\hspace*{\fill} [3]
\vspace{5pt}
\noindent Invariant distributions, statement of existence and uniqueness up to constant multiples. Mean return time, positive recurrence; equivalence of positive recurrence and the existence of an invariant distribution. Convergence to equilibrium for irreducible, positive recurrent, aperiodic chains *and proof by coupling*. Long-run proportion of time spent in a given state.\hspace*{\fill} [3]
\vspace{5pt}
\noindent Time reversal, detailed balance, reversibility, random walk on a graph.\hspace*{\fill} [1]}
\tableofcontents
\setcounter{section}{-1}
\section{Introduction}
So far, in IA Probability, we have always dealt with one random variable, or numerous independent variables, and we were able to handle them. However, in real life, things often \emph{are} dependent, and things become much more difficult.
In general, there are many ways in which variables can be dependent. Their dependence can be very complicated, or very simple. If we are just told two variables are dependent, we have no idea what we can do with them.
This is similar to our study of functions. We can develop theories about continuous functions, increasing functions, or differentiable functions, but if we are just given a random function without assuming anything about it, there really isn't much we can do.
Hence, in this course, we are just going to study a particular kind of dependent variables, known as \emph{Markov chains}. In fact, in IA Probability, we have already encountered some of these. One prominent example is the random walk, in which the next position depends on the previous position. This gives us some dependent random variables, but they are dependent in a very simple way.
In reality, a random walk is too simple a model to describe the world. We need something more general, and these are Markov chains. These, by definition, are sequences of random variables that satisfy the \emph{Markov assumption}. This assumption, intuitively, says that the future depends only upon the current state, and not on how we got to the current state. It turns out that just given this assumption, we can prove a lot about these chains.
\section{Markov chains}
\subsection{The Markov property}
We start with the definition of a Markov chain.
\begin{defi}[Markov chain]
Let $X = (X_0, X_1, \cdots)$ be a sequence of random variables taking values in some set $S$, the \emph{state space}. We assume that $S$ is countable (which could be finite).
We say $X$ has the \emph{Markov property} if for all $n\geq 0, i_0,\cdots, i_{n + 1}\in S$, we have
\[
\P(X_{n + 1} = i_{n + 1} \mid X_0 = i_0, \cdots, X_n = i_n) = \P(X_{n + 1} = i_{n + 1} \mid X_n = i_n).
\]
If $X$ has the Markov property, we call it a \emph{Markov chain}.
We say that a Markov chain $X$ is \emph{homogeneous} if the conditional probabilities $\P(X_{n + 1} = j \mid X_n = i)$ do not depend on $n$.
\end{defi}
All our chains $X$ will be Markov and homogeneous unless otherwise specified.
Since the state space $S$ is countable, we usually label the states by integers $i \in \N$.
\begin{eg}\leavevmode
\begin{enumerate}
\item A random walk is a Markov chain.
\item The branching process is a Markov chain.
\end{enumerate}
\end{eg}
In general, to fully specify a (homogeneous) Markov chain, we will need two items:
\begin{enumerate}
\item The initial distribution $\lambda_i = \P(X_0 = i)$. We can write this as a vector $\lambda = (\lambda_i: i \in S)$.
\item The transition probabilities $p_{i, j} = \P(X_{n + 1} = j \mid X_n = i)$. We can write this as a matrix $P = (p_{i, j})_{i, j\in S}$.
\end{enumerate}
We will start by proving a few properties of $\lambda$ and $P$. These let us know whether an arbitrary pair of vector and matrix $(\lambda, P)$ actually specifies a Markov chain.
\begin{prop}\leavevmode
\begin{enumerate}
\item $\lambda$ is a \emph{distribution}, i.e.\ $\lambda_i \geq 0, \sum_i \lambda_i = 1$.
\item $P$ is a \emph{stochastic matrix}, i.e.\ $p_{i, j} \geq 0$ and $\sum_j p_{i, j} = 1$ for all $i$.
\end{enumerate}
\end{prop}
\begin{proof}\leavevmode
\begin{enumerate}
\item Obvious since $\lambda$ is a probability distribution.
\item $p_{i, j} \geq 0$ since $p_{ij}$ is a probability. We also have
\[
\sum_j p_{i,j} = \sum_j \P(X_{1} = j \mid X_0 = i) = 1
\]
since $\P(X_1 = \ph \mid X_0 = i)$ is a probability distribution function.\qedhere
\end{enumerate}
\end{proof}
Note that we only require the row sum to be $1$, and the column sum need not be.
We will prove another seemingly obvious fact.
\begin{thm}
Let $\lambda$ be a distribution (on $S$) and $P$ a stochastic matrix. The sequence $X = (X_0, X_1, \cdots)$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$ iff
\[
\P(X_0 = i_0, X_1 = i_1, \cdots, X_n = i_n) = \lambda_{i_0}p_{i_0, i_1}p_{i_1, i_2}\cdots p_{i_{n - 1}, i_n}\tag{$*$}
\]
for all $n, i_0, \cdots, i_n$
\end{thm}
\begin{proof}
Let $A_k$ be the event $X_k = i_k$. Then we can write $(*)$ as
\[
\P(A_0\cap A_1\cap\cdots \cap A_n) = \lambda_{i_0}p_{i_0, i_1}p_{i_1, i_2}\cdots p_{i_{n - 1}, i_n}. \tag{$*$}
\]
We first assume that $X$ is a Markov chain. We prove $(*)$ by induction on $n$.
When $n = 0$, $(*)$ says $\P(A_0) = \lambda_{i_0}$. This is true by definition of $\lambda$.
Assume that it is true for all $n < N$. Then
\begin{align*}
\P(A_0 \cap A_1 \cap \cdots \cap A_N) &= \P(A_0,\cdots, A_{N - 1})\P(A_0, \cdots, A_N \mid A_0, \cdots, A_{N - 1})\\
&= \lambda_{i_0} p_{i_0, i_1}\cdots p_{i_{N - 2}, i_{N - 1}} \P(A_{N} \mid A_0,\cdots, A_{N - 1})\\
&= \lambda_{i_0} p_{i_0, i_1}\cdots p_{i_{N - 2}, i_{N - 1}} \P(A_{N} \mid A_{N - 1})\\
&= \lambda_{i_0} p_{i_0, i_1}\cdots p_{i_{N - 2}, i_{N - 1}} p_{i_{N - 1}, i_N}.
\end{align*}
So it is true for $N$ as well. Hence we are done by induction.
Conversely, suppose that ($*$) holds. Then for $n = 0$, we have $\P(X_0 = i_0) = \lambda_{i_0}$. Otherwise,
\begin{align*}
\P(X_n = i_n\mid X_0 = i_0, \cdots, X_{n - 1} = i_{n - 1}) &= \P(A_n \mid A_0 \cap \cdots\cap A_{n - 1})\\
&= \frac{\P(A_0\cap \cdots \cap A_n)}{\P(A_0\cap \cdots \cap A_{n - 1})}\\
&= p_{i_{n - 1}, i_n},
\end{align*}
which is independent of $i_0, \cdots, i_{n - 2}$. So this is Markov.
\end{proof}
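The product formula $(*)$ says exactly that we can sample the chain sequentially: draw $X_0$ from $\lambda$, then repeatedly draw $X_{n + 1}$ from the row of $P$ indexed by $X_n$. For concreteness, here is a minimal Python sketch of this (assuming \texttt{numpy} is available; the two-state matrix at the end is just an arbitrary illustration).
\begin{verbatim}
import numpy as np

def simulate_chain(lam, P, n_steps, seed=0):
    """Sample X_0, ..., X_{n_steps} with initial distribution lam and
    transition matrix P, states labelled 0, 1, ..., |S| - 1."""
    rng = np.random.default_rng(seed)
    states = np.arange(len(lam))
    x = rng.choice(states, p=lam)        # X_0 ~ lambda
    path = [int(x)]
    for _ in range(n_steps):
        x = rng.choice(states, p=P[x])   # X_{n+1} ~ (p_{x, j} : j in S)
        path.append(int(x))
    return path

P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
print(simulate_chain([1.0, 0.0], P, 10))
\end{verbatim}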
Often, we do not use the Markov property directly. Instead, we use the following:
\begin{thm}[Extended Markov property]
Let $X$ be a Markov chain. For $n \geq 0$, any $H$ given in terms of the past $\{X_i: i < n\}$, and any $F$ given in terms of the future $\{X_i: i > n\}$, we have
\[
\P(F\mid X_n = i, H) = \P(F\mid X_n = i).
\]
\end{thm}
To prove this, we need to stitch together many instances of the Markov property. The actual proof is omitted.
\subsection{Transition probability}
Recall that we can specify the dynamics of a Markov chain by the \emph{one-step} transition probability,
\[
p_{i, j} = \P(X_{n + 1} = j\mid X_n = i).
\]
However, we don't always want to take 1 step. We might want to take 2 steps, 3 steps, or, in general, $n$ steps. Hence, we define
\begin{defi}[$n$-step transition probability]
The $n$-step transition probability from $i$ to $j$ is
\[
p_{i, j}(n) = \P(X_n = j\mid X_0 = i).
\]
\end{defi}
How do we compute these probabilities? The idea is to break this down into smaller parts. We want to express $p_{i, j}(m + n)$ in terms of $n$-step and $m$-step transition probabilities. Then we can iteratively break down an arbitrary $p_{i, j}(n)$ into expressions involving the one-step transition probabilities only.
To compute $p_{i, j}(m + n)$, we can think of this as a two-step process. We first go from $i$ to some unknown point $k$ in $m$ steps, and then travel from $k$ to $j$ in $n$ more steps. To find the probability to get from $i$ to $j$, we consider all possible routes from $i$ to $j$, and sum up all the probability of the paths. We have
\begin{align*}
p_{i, j}(m + n) &= \P(X_{m + n} = j\mid X_0 = i)\\
&= \sum_k \P(X_{m + n} = j\mid X_m = k, X_0 = i)\P(X_m = k\mid X_0 = i)\\
&= \sum_k \P(X_{m + n} = j\mid X_m = k)\P(X_m = k \mid X_0 = i)\\
&= \sum_k p_{i, k}(m)p_{k, j}(n).
\end{align*}
Thus we get
\begin{thm}[Chapman-Kolmogorov equation]
\[
p_{i, j}(m + n) = \sum_{k\in S} p_{i, k}(m) p_{k, j}(n).
\]
\end{thm}
This formula is suspiciously familiar. It is just matrix multiplication!
\begin{notation}
Write $P(m) = (p_{i, j}(m))_{i, j\in S}$.
\end{notation}
Then we have
\[
P(m + n) = P(m)P(n).
\]
In particular, we have
\[
P(n) = P(1)P(n - 1) = \cdots = P(1)^n = P^n.
\]
This allows us to easily compute the $n$-step transition probability by matrix multiplication.
\begin{eg}
Let $S = \{1, 2\}$, with
\[
P =
\begin{pmatrix}
1 - \alpha & \alpha\\
\beta & 1 - \beta
\end{pmatrix}
\]
We assume $0 < \alpha, \beta < 1$. We want to find the $n$-step transition probability.
We can achieve this via diagonalization. We can write $P$ as
\[
P = U^{-1}
\begin{pmatrix}
\kappa_1 & 0\\
0 & \kappa_2
\end{pmatrix}U,
\]
where the $\kappa_i$ are eigenvalues of $P$, and $U$ is composed of the eigenvectors.
To find the eigenvalues, we calculate
\[
\det (P - \lambda I) = (1 - \alpha - \lambda)(1 - \beta - \lambda) - \alpha\beta = 0.
\]
We solve this to obtain
\[
\kappa_1 = 1,\quad \kappa_2 = 1 - \alpha - \beta.
\]
Usually, the next thing to do would be to find the eigenvectors to obtain $U$. However, here we can cheat a bit and not do that. Using the diagonalization of $P$, we have
\[
P^n = U^{-1}
\begin{pmatrix}
\kappa_1^n & 0\\
0 & \kappa_2^n
\end{pmatrix}U.
\]
We can now attempt to compute $p_{1, 2}(n)$. We know that it must be of the form
\[
p_{1, 2}(n) = A\kappa_1^n + B\kappa_2^n = A + B(1 - \alpha - \beta)^n
\]
where $A$ and $B$ are constants coming from $U$ and $U^{-1}$. However, we know well that
\[
p_{1, 2}(0) = 0,\quad p_{1, 2}(1) = \alpha.
\]
So we obtain
\begin{align*}
A + B &= 0\\
A + B(1 - \alpha - \beta) &= \alpha.
\end{align*}
This is something we can solve, and obtain
\[
p_{1, 2}(n) = \frac{\alpha}{\alpha + \beta}(1 - (1 - \alpha - \beta)^n) = 1 - p_{1, 1}(n).
\]
How about $p_{2, 1}$ and $p_{2, 2}$? Well we don't need additional work. We can obtain these simply by interchanging $\alpha$ and $\beta$. So we obtain
\[
P^n = \frac{1}{\alpha + \beta}
\begin{pmatrix}
\beta + \alpha(1 - \alpha - \beta)^n & \alpha - \alpha(1 - \alpha - \beta)^n \\
\beta - \beta(1 - \beta - \alpha)^n & \alpha + \beta(1 - \beta - \alpha)^n \\
\end{pmatrix}
\]
What happens as $n\to \infty$? We can take the limit and obtain
\[
P^n \to \frac{1}{\alpha + \beta}
\begin{pmatrix}
\beta & \alpha\\
\beta & \alpha
\end{pmatrix}
\]
We see that the two rows are the same. This means that as time goes on, where we end up does not depend on where we started. We will later (near the end of the course) see that this is generally true for most Markov chains.
Alternatively, we can solve this by a difference equation. The recurrence relation is given by
\[
p_{1, 1}(n + 1) = p_{1, 1}(n)(1 - \alpha) + p_{1, 2}(n)\beta.
\]
Writing in terms of $p_{11}$ only, we have
\[
p_{1, 1}(n + 1) = p_{1, 1}(n)(1 - \alpha) + (1 - p_{1, 1}(n))\beta.
\]
We can solve this as we have done in IA Differential Equations.
\end{eg}
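As a quick numerical sanity check of this closed form, here is a small Python sketch (assuming \texttt{numpy}; the values $\alpha = 0.3$, $\beta = 0.2$ are arbitrary) comparing the $(1, 2)$ entry of $P^n$ with the formula above.
\begin{verbatim}
import numpy as np

alpha, beta = 0.3, 0.2
P = np.array([[1 - alpha, alpha],
              [beta,      1 - beta]])

n = 7
matrix_power = np.linalg.matrix_power(P, n)[0, 1]   # p_{1,2}(n) by P^n
closed_form = alpha / (alpha + beta) * (1 - (1 - alpha - beta) ** n)
print(matrix_power, closed_form)   # the two numbers agree
\end{verbatim}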
We saw that the Chapman-Kolmogorov equation can be concisely stated as a rule about matrix multiplication. In general, many statements about Markov chains can be formulated in the language of linear algebra naturally.
For example, let $X_0$ have distribution $\lambda$. What is the distribution of $X_1$? By definition, it is
\[
\P(X_1 = j) = \sum_i \P(X_1 = j\mid X_0 = i)\P(X_0 = i) = \sum_i \lambda_i p_{i, j}.
\]
Hence this has a distribution $\lambda P$, where $\lambda$ is treated as a row vector. Similarly, $X_n$ has the distribution $\lambda P^n$.
In fact, historically, the theory of Markov chains was initially developed as a branch of linear algebra, and a lot of the proofs were just linear algebra manipulations. However, nowadays, we often look at it as a branch of probability theory instead, and this is what we will do in this course. So don't be scared if you hate linear algebra.
\section{Classification of chains and states}
\subsection{Communicating classes}
Suppose we have a Markov chain $X$ over some state space $S$. While we would usually expect the different states in $S$ to be mutually interacting, it is possible that we have a state $i \in S$ that can never be reached, or we might get stuck in some state $j \in S$ and can never leave. These are usually less interesting. Hence we would like to rule out these scenarios, and focus on what we call \emph{irreducible chains}, where we can freely move between different states.
We start with some elementary definitions.
\begin{defi}[Leading to and communicate]
Suppose we have two states $i, j\in S$. We write $i \to j$ ($i$ \emph{leads to} $j$) if there is some $n \geq 0$ such that $p_{i, j}(n) > 0$, i.e.\ it is possible for us to get from $i$ to $j$ (in multiple steps). Note that we allow $n = 0$. So we always have $i \to i$.
We write $i \leftrightarrow j$ if $i \to j$ and $j \to i$. If $i \leftrightarrow j$, we say $i$ and $j$ \emph{communicate}.
\end{defi}
\begin{prop}
$\leftrightarrow$ is an equivalence relation.
\end{prop}
\begin{proof}\leavevmode
\begin{enumerate}
\item Reflexive: we have $i \leftrightarrow i$ since $p_{i, i}(0) = 1$.
\item Symmetric: trivial by definition.
\item Transitive: suppose $i \to j$ and $j \to k$. Since $i \to j$, there is some $m \geq 0$ such that $p_{i, j}(m) > 0$. Since $j \to k$, there is some $n \geq 0$ such that $p_{j, k}(n) > 0$. Then $p_{i, k}(m + n) = \sum_{r} p_{i, r}(m)p_{r, k}(n) \geq p_{i, j}(m)p_{j, k}(n) > 0$. So $i \to k$.
Similarly, if $j \to i$ and $k \to j$, then $k \to i$. So $i \leftrightarrow j$ and $j \leftrightarrow k$ implies that $i \leftrightarrow k$.\qedhere
\end{enumerate}
\end{proof}
So we have an equivalence relation, and we know what to do with equivalence relations. We form equivalence classes!
\begin{defi}[Communicating classes]
The equivalence classes of $\leftrightarrow$ are \emph{communicating classes}.
\end{defi}
We have to be careful with these communicating classes. Different communicating classes are not completely isolated. Within a communicating class $A$, of course we can move between any two vertices. However, it is also possible that we can escape from a class $A$ to a different class $B$. It is just that after going to $B$, we cannot return to class $A$. From $B$, we might be able to get to another space $C$. We can jump around all the time, but (if there are finitely many communicating classes) eventually we have to stop when we have visited every class. Then we are bound to stay in that class.
Since we are eventually going to be stuck in that class anyway, often, we can just consider this final communicating class and ignore the others. So wlog we can assume that the chain only has one communicating class.
\begin{defi}[Irreducible chain]
A Markov chain is \emph{irreducible} if there is a unique communicating class.
\end{defi}
From now on, we will mostly care about irreducible chains only.
More generally, we call a subset \emph{closed} if we cannot escape from it.
\begin{defi}[Closed]
A subset $C\subseteq S$ is \emph{closed} if $p_{i, j} = 0$ for all $i \in C, j\not\in C$.
\end{defi}
\begin{prop}
A subset $C$ is closed iff ``$i\in C, i\to j$ implies $j \in C$''.
\end{prop}
\begin{proof}
Assume $C$ is closed. Let $i \in C, i \to j$. Since $i \to j$, there is some $m$ such that $p_{i, j}(m) > 0$. Expanding the Chapman-Kolmogorov equation, we have
\[
p_{i, j}(m) = \sum_{i_1, \cdots, i_{m - 1}} p_{i, i_1}p_{i_1, i_2} \cdots p_{i_{m - 1},j} > 0.
\]
So there is some route $i, i_1, \cdots, i_{m - 1}, j$ such that $p_{i, i_1}, p_{i_1, i_2}, \cdots, p_{i_{m - 1}, j} > 0$. Since $p_{i, i_1} > 0$, we have $i_1\in C$. Since $p_{i_1,i_2} > 0$, we have $i_2\in C$. By induction, we get that $j \in C$.
To prove the other direction, assume that ``$i\in C, i \to j$ implies $j\in C$''. So for any $i\in C, j\not\in C$, then $i\not\to j$. So in particular $p_{i, j} = 0$.
\end{proof}
\begin{eg}
Consider $S = \{1, 2, 3, 4, 5, 6\}$ with transition matrix
\[
P =
\begin{pmatrix}
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
\frac{1}{3} & 0 & 0 & \frac{1}{3} & \frac{1}{3} & 0\\
0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 1 & 0\\
\end{pmatrix}
\]
\begin{center}
\begin{tikzpicture}
\node [mstate] (1) at (0, 0) {$1$};
\node [mstate] (2) at (1, 1.4) {$2$};
\node [mstate] (3) at (2, 0) {$3$};
\node [mstate] (4) at (3, -1.4) {$4$};
\node [mstate] (5) at (4, 0) {$5$};
\node [mstate] (6) at (6, 0) {$6$};
\draw (1) edge [loop left, ->] (1);
\draw (1) edge [->] (2);
\draw (2) edge [->] (3);
\draw (3) edge [->] (1);
\draw (3) edge [->] (4);
\draw (3) edge [->] (5);
\draw (4) edge [->] (5);
\draw (4) edge [loop below, ->] (4);
\draw (5) edge [bend left, ->] (6);
\draw (6) edge [bend left, ->] (5);
\end{tikzpicture}
\end{center}
We see that the communicating classes are $\{1, 2, 3\}$, $\{4\}$, $\{5, 6\}$, where $\{5, 6\}$ is closed.
\end{eg}
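For a finite chain, the communicating classes can be found mechanically: compute the reachability relation $i \to j$ as the transitive closure of the positive entries of $P$, and intersect it with its transpose. Here is a rough Python sketch for the example above (states relabelled $0, \ldots, 5$; \texttt{numpy} assumed).
\begin{verbatim}
import numpy as np

P = np.array([
    [1/2, 1/2, 0,   0,   0,   0  ],
    [0,   0,   1,   0,   0,   0  ],
    [1/3, 0,   0,   1/3, 1/3, 0  ],
    [0,   0,   0,   1/2, 1/2, 0  ],
    [0,   0,   0,   0,   0,   1  ],
    [0,   0,   0,   0,   1,   0  ],
])

n = len(P)
reach = ((P > 0) | np.eye(n, dtype=bool)).astype(int)  # i -> j in <= 1 step
for _ in range(n):                                     # transitive closure
    reach = ((reach @ reach) > 0).astype(int)
comm = (reach > 0) & (reach.T > 0)                     # i <-> j
classes = {frozenset(map(int, np.flatnonzero(row))) for row in comm}
print(classes)   # {0, 1, 2}, {3} and {4, 5}
\end{verbatim}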
\subsection{Recurrence or transience}
The major focus of this chapter is recurrence and transience. This was something that came up when we discussed random walks in IA Probability --- given a random walk, say starting at $0$, what is the probability that we will return to $0$ later on? Recurrence and transience give a qualitative way of answering this question. As we mentioned, we will mostly focus on irreducible chains. So by definition there is always a non-zero probability of returning to $0$. Hence the question we want to ask is whether we are going to return to $0$ with certainty, i.e.\ with probability $1$. If we are bound to return, then we say the state is recurrent. Otherwise, we say it is transient.
It should be clear that this notion is usually interesting only for an infinite state space. In an infinite state space, we might get transience because we are very likely to drift away to a place far, far away and never return. However, in a finite state space, this can't happen. There, transience can occur only if we can escape into some other part of the state space and never come back, i.e.\ the chain is not irreducible. These cases are not very interesting.
\begin{notation}
For convenience, we will introduce some notations. We write
\[
\P_i(A) = \P(A\mid X_0 = i),
\]
and
\[
\E_i(Z) = \E(Z\mid X_0 = i).
\]
\end{notation}
Suppose we start from $i$, and randomly wander around. Eventually, we may or may not get to $j$. If we do, there is a time at which we first reach $j$. We call this the \emph{first passage time}.
\begin{defi}[First passage time and probability]
The \emph{first passage time} of $j \in S$ starting from $i$ is
\[
T_j = \min\{n \geq 1: X_n = j\}.
\]
Note that this implicitly depends on $i$. Here we require $n \geq 1$. Otherwise $T_i$ would always be $0$.
The \emph{first passage probability} is
\[
f_{ij}(n) = \P_i(T_j = n).
\]
\end{defi}
\begin{defi}[Recurrent state]
A state $i\in S$ is \emph{recurrent} (or \emph{persistent}) if
\[
\P_i (T_i < \infty) = 1,
\]
i.e.\ we will eventually get back to the state. Otherwise, we call the state \emph{transient}.
\end{defi}
Note that transient does \emph{not} mean we don't get back. It's just that we are not sure that we will get back. We can show that if a state is recurrent, then the probability that we return to $i$ infinitely many times is also $1$.
Our current objective is to show the following characterization of recurrence.
\begin{thm}
$i$ is recurrent iff $\sum_n p_{i, i}(n) = \infty$.
\end{thm}
The technique to prove this would be to use generating functions. We need to first decide what sequence to work with. For any fixed $i, j$, consider the sequence $p_{ij}(n)$ as a sequence in $n$. Then we define
\[
P_{i, j}(s) = \sum_{n = 0}^\infty p_{i, j}(n) s^n.
\]
We also define
\[
F_{i, j}(s) = \sum_{n = 0}^\infty f_{i, j}(n) s^n,
\]
where $f_{i, j}$ is our first passage probability. For the sake of clarity, we make it explicit that $p_{i, j}(0) = \delta_{i, j}$, and $f_{i, j}(0) = 0$.
Our proof would be heavily based on the result below:
\begin{thm}
\[
P_{i, j}(s) = \delta_{i, j} + F_{i, j}(s)P_{j, j}(s),
\]
for $-1 < s \leq 1$.
\end{thm}
\begin{proof}
Using the law of total probability
\begin{align*}
p_{i, j}(n) &= \sum_{m = 1}^n \P_i(X_n = j\mid T_j = m) \P_i(T_j = m) \\
\intertext{Using the Markov property, we can write this as}
&= \sum_{m = 1}^n \P(X_n = j\mid X_m = j) \P_i(T_j = m)\\
&= \sum_{m = 1}^n p_{j, j}(n - m) f_{i, j}(m).
\end{align*}
We can multiply through by $s^n$ and sum over all $n$ to obtain
\[
\sum_{n = 1}^\infty p_{i, j}(n) s^n = \sum_{n = 1}^\infty \sum_{m = 1}^n p_{j, j}(n - m)s^{n - m} f_{i, j}(m)s^m.
\]
The left hand side is \emph{almost} the generating function $P_{i, j}(s)$, except that we are missing an $n = 0$ term, which is $p_{i, j}(0) = \delta_{i, j}$. The right hand side is the ``convolution'' of the power series $P_{j, j}(s)$ and $F_{i, j}(s)$, which we can write as the product $P_{j, j}(s) F_{i, j}(s)$. So
\[
P_{i, j}(s) - \delta_{i, j} = P_{j, j}(s) F_{i, j}(s).\qedhere
\]
\end{proof}
Before we actually prove our theorem, we need one helpful result from Analysis that allows us to deal with power series nicely.
\begin{lemma}[Abel's lemma]
Let $u_1, u_2, \cdots$ be real numbers such that $U(s) = \sum_{n} u_n s^n$ converges for $0 < s < 1$. Then
\[
\lim_{s\to 1^-} U(s) = \sum_n u_n.
\]
\end{lemma}
Proof is an exercise in analysis, which happens to be on the first example sheet of Analysis II.
We now prove the theorem we initially wanted to show.
\begin{thm}
$i$ is recurrent iff $\sum_n p_{ii}(n) = \infty$.
\end{thm}
\begin{proof}
Using $j = i$ in the above formula, for $0 < s < 1$, we have
\[
P_{i, i}(s) = \frac{1}{1 - F_{i, i} (s)}.
\]
Here we need to be careful that we are not dividing by $0$. This would be a problem if $F_{ii}(s) = 1$. By definition, we have
\[
F_{i, i}(s) = \sum_{n = 1}^\infty f_{i, i}(n) s^n.
\]
Also, by definition of $f_{ii}$, we have
\[
F_{i, i}(1) = \sum_n f_{i, i}(n) = \P(\text{ever returning to }i) \leq 1.
\]
So for $|s| < 1$, $F_{i, i}(s) < 1$. So we are not dividing by zero. Now we use our original equation
\[
P_{i, i}(s) = \frac{1}{1 - F_{i, i} (s)},
\]
and take the limit as $s \to 1$. By Abel's lemma, we know that the left hand side is
\[
\lim_{s \to 1}P_{i, i}(s) = P_{i, i}(1) = \sum_n p_{i, i}(n).
\]
The other side is
\[
\lim_{s \to 1}\frac{1}{1 - F_{i, i}(s)} = \frac{1}{1 - \sum f_{i, i}(n)}.
\]
Hence we have
\[
\sum_n p_{i, i}(n) = \frac{1}{1 - \sum f_{i, i}(n)}.
\]
Since $\sum f_{i, i}(n)$ is the probability of ever returning, the probability of ever returning is 1 if and only if $\sum_n p_{i, i}(n) = \infty$.
\end{proof}
Using this result, we can check if a state is recurrent. However, a Markov chain has many states, and it would be tedious to check every state individually. Thus we have the following helpful result.
\begin{thm}
Let $C$ be a communicating class. Then
\begin{enumerate}
\item Either every state in $C$ is recurrent, or every state is transient.
\item If $C$ contains a recurrent state, then $C$ is closed.
\end{enumerate}
\end{thm}
\begin{proof}\leavevmode
\begin{enumerate}
\item Let $i \leftrightarrow j$ and $i \not =j$. Then by definition of communicating, there is some $m$ such that $p_{i, j}(m) = \alpha > 0$, and some $n$ such that $p_{j, i}(n) = \beta > 0$. So for each $k$, we have
\[
p_{i, i}(m + k + n) \geq p_{i, j}(m) p_{j, j}(k) p_{j, i}(n) = \alpha\beta p_{j, j}(k).
\]
So if $\sum_k p_{j, j}(k) = \infty$, then $\sum_r p_{i, i}(r) = \infty$. So $j$ recurrent implies $i$ recurrent. Similarly, $i$ recurrent implies $j$ recurrent.
\item If $C$ is not closed, then there is a non-zero probability that we leave the class and never get back. So the states are not recurrent.\qedhere
\end{enumerate}
\end{proof}
Note that there is a profound difference between a finite state space and an infinite state space. A finite state space can be represented by a finite matrix, and we are all very familiar with finite matrices. We can use everything we know about finite matrices from IA Vectors and Matrices. However, infinite matrices are weirder.
For example, any finite transition matrix $P$ has an eigenvalue of $1$. This is since the row sums of a transition matrix are always $1$. So if we multiply $P$ by $\mathbf{e} = (1, 1, \cdots, 1)$, then we get $\mathbf{e}$ again. However, this is not true for infinite matrices, since we don't usually allow arbitrary infinite vectors. To avoid getting infinitely large numbers when multiplying vectors and matrices, we usually restrict our focus to vectors $\mathbf{x}$ such that $\sum x_i^2$ is finite. In this case the vector $\mathbf{e}$ is not allowed, and the transition matrix need not have eigenvalue $1$.
Another thing about a finite state space is that probability ``cannot escape''. Each step of a Markov chain gives a probability distribution on the state space, and we can imagine the progression of the chain as a flow of probabilities around the state space. If we have a finite state space, then all the probability flow must be contained within our finite state space. However, if we have an infinite state space, then probabilities can just drift away to infinity.
More concretely, we get the following result about finite state spaces.
\begin{thm}
In a finite state space,
\begin{enumerate}
\item There exists at least one recurrent state.
\item If the chain is irreducible, every state is recurrent.
\end{enumerate}
\end{thm}
\begin{proof}
(ii) follows immediately from (i) since if a chain is irreducible, either all states are transient or all states are recurrent. So we just have to prove (i).
We first fix an arbitrary $i$. Recall that
\[
P_{i, j}(s) = \delta_{i, j} + P_{j, j}(s) F_{i, j}(s).
\]
If $j$ is transient, then $\sum_n p_{j, j}(n) = P_{j, j}(1) < \infty$. Also, $F_{i, j}(1)$ is the probability of ever reaching $j$ from $i$, and is hence finite as well. So we have $P_{i, j}(1) < \infty$. By Abel's lemma, $P_{i, j}(1)$ is given by
\[
P_{i, j}(1) = \sum_n p_{i, j}(n).
\]
Since this is finite, we must have $p_{i, j}(n)\to 0$.
Since we know that
\[
\sum_{j\in S}p_{i, j}(n) = 1,
\]
if every state is transient, then since the sum has finitely many terms, we know $\sum_{j \in S} p_{i, j}(n) \to 0$ as $n\to \infty$. This is a contradiction. So we must have a recurrent state.
\end{proof}
\begin{thm}[P\'olya's theorem]
Consider $\Z^d = \{(x_1, x_2, \cdots, x_d): x_i \in \Z\}$. This generates a graph with $x$ adjacent to $y$ if $|x - y| = 1$, where $|\ph |$ is the Euclidean norm.
\begin{center}
\begin{tikzpicture}[scale=0.75]
\node at (-4, 0) {$d = 1$};
\draw (-2.5, 0) -- (2.5, 0);
\foreach \x in {-2,-1,...,2} {
\node [circ] at (\x, 0) {};
}
\begin{scope}[shift={(0, -3.5)}]
\node at (-4, 2) {$d = 2$};
\foreach \x in {-2, -1,...,2} {
\foreach \y in {-2,-1,...,2} {
\node [circ] at (\x, \y) {};
}
}
\foreach \x in {-2, -1,...,2} {
\draw (\x, -2.5) -- (\x, 2.5);
\draw (-2.5, \x) -- (2.5, \x);
}
\end{scope}
\end{tikzpicture}
\end{center}
Consider a random walk in $\Z^d$. At each step, it moves to a neighbour, each chosen with equal probability, i.e.
\[
\P(X_{n + 1} = j\mid X_n = i) =
\begin{cases}
\frac{1}{2d} & |j - i| = 1\\
0 & \text{otherwise}
\end{cases}
\]
This is an irreducible chain, since it is possible to get from one point to any other point. So the points are either all recurrent or all transient.
The theorem says this is recurrent iff $d = 1$ or $2$.
\end{thm}
Intuitively, this makes sense that we get recurrence only for low dimensions, since if we have more dimensions, then it is easier to get lost.
\begin{proof}
We will start with the case $d = 1$. We want to show that $\sum p_{0, 0}(n) = \infty$. Then we know the origin is recurrent. However, we can simplify this a bit. It is impossible to get back to the origin in an odd number of steps. So we can instead consider $\sum p_{0, 0}(2n)$. However, we can write down this expression immediately. To return to the origin after $2n$ steps, we need to have made $n$ steps to the left, and $n$ steps to the right, in any order. So we have
\[
p_{0, 0}(2n) = \P(n\text{ steps left}, n\text{ steps right}) = \binom{2n}{n} \left(\frac{1}{2}\right)^{2n}.
\]
To decide whether the sum converges, it is not helpful to work with these binomial coefficients and factorials directly. So we use Stirling's formula $n! \simeq \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$. If we plug this in, we get
\[
p_{0, 0}(2n) \sim \frac{1}{\sqrt{\pi n}}.
\]
This tends to $0$, but really slowly, even more slowly than the terms $\frac{1}{n}$ of the harmonic series. So we have $\sum p_{0, 0}(2n) = \infty$.
In the $d = 2$ case, suppose after $2n$ steps, I have taken $r$ steps right, $\ell$ steps left, $u$ steps up and $d$ steps down. We must have $r + \ell + u + d = 2n$, and we need $r = \ell, u = d$ to return to the origin. So we let $r = \ell = m, u = d = n - m$. So we get
\begin{align*}
p_{0, 0}(2n) &= \left(\frac{1}{4}\right)^{2n} \sum_{m = 0}^n \binom{2n}{m, m, n - m, n - m} \\
&= \left(\frac{1}{4}\right)^{2n} \sum_{m = 0}^n \frac{(2n)!}{(m!)^2 ((n - m)!)^2} \\
&= \left(\frac{1}{4}\right)^{2n} \binom{2n}{n}\sum_{m = 0}^n \left(\frac{n!}{m!(n - m)!}\right)^2\\
&= \left(\frac{1}{4}\right)^{2n} \binom{2n}{n}\sum_{m = 0}^n \binom{n}{m}\binom{n}{n - m}\\
\intertext{We now use a well-known identity (proved in IA Numbers and Sets) to obtain}
&= \left(\frac{1}{4}\right)^{2n} \binom{2n}{n} \binom{2n}{n}\\
&= \left[\binom{2n}{n} \left(\frac{1}{2}\right)^{2n}\right]^2\\
&\sim \frac{1}{\pi n}.
\end{align*}
So the sum diverges. So this is recurrent. Note that the two-dimensional probability turns out to be the square of the one-dimensional probability. This is not a coincidence, and we will explain this after the proof. However, this does not extend to higher dimensions.
In the $d = 3$ case, we have
\[
p_{0, 0}(2n) = \left(\frac{1}{6}\right)^{2n}\sum_{i + j + k = n}\frac{(2n)!}{(i!j!k!)^2}.
\]
This time, there is no neat combinatorial formula. Since we want to show this is summable, we can try to bound this from above. We have
\begin{align*}
p_{0, 0}(2n) &= \left(\frac{1}{6}\right)^{2n} \binom{2n}{n} \sum \left(\frac{n!}{i!j!k!}\right)^2\\
&= \left(\frac{1}{2}\right)^{2n} \binom{2n}{n} \sum \left(\frac{n!}{3^n i!j!k!}\right)^2\\
\intertext{Why do we write it like this? We are going to use the identity $\displaystyle \sum_{i + j + k = n} \frac{n!}{3^n i!j!k!} = 1$. Where does this come from? Suppose we have three urns, and throw $n$ balls into it. Then the probability of getting $i$ balls in the first, $j$ in the second and $k$ in the third is exactly $\frac{n!}{3^n i!j!k!}$. Summing over all possible combinations of $i$, $j$ and $k$ gives the total probability of getting in any configuration, which is $1$. So we can bound this by}
&\leq \left(\frac{1}{2}\right)^{2n}\binom{2n}{n} \max\left(\frac{n!}{3^n i!j!k!}\right)\sum \frac{n!}{3^n i!j!k!}\\
&= \left(\frac{1}{2}\right)^{2n}\binom{2n}{n} \max\left(\frac{n!}{3^n i!j!k!}\right)
\intertext{To find the maximum, we can replace the factorial by the gamma function and use Lagrange multipliers. However, we would just argue that the maximum can be achieved when $i$, $j$ and $k$ are as close to each other as possible. So we get}
&\leq \left(\frac{1}{2}\right)^{2n}\binom{2n}{n} \frac{n!}{3^n}\left(\frac{1}{\lfloor n/3\rfloor!}\right)^3\\
&\leq Cn^{-3/2}
\end{align*}
for some constant $C$ using Stirling's formula. So $\sum p_{0,0}(2n) < \infty$ and the chain is transient. We can prove similarly for higher dimensions.
\end{proof}
Let's get back to why the two dimensional probability is the square of the one-dimensional probability. This square might remind us of independence. However, it is obviously not true that horizontal movement and vertical movement are independent --- if we go sideways in one step, then we cannot move vertically. So we need something more sophisticated.
We write $X_n = (A_n, B_n)$. What we do is that we try to rotate our space. We are going to record our coordinates in a pair of axes rotated by $45^\circ$.
\begin{center}
\begin{tikzpicture}
\draw [->] (-2, 0) -- (4, 0) node [right] {$A$};
\draw [->] (0, -3) -- (0, 3) node [above] {$B$};
\node [circ] at (3, 2) {};
\node [right] at (3, 2) {$(A_n, B_n)$};
\draw [mred, ->] (-1, -1) -- (3, 3) node [anchor = south west] {$V$};
\draw [mred, ->] (-1, 1) -- (3, -3) node [anchor = north west] {$U$};
\draw [mred, dashed] (3, 2) -- (0.5, -0.5) node [anchor = north east] {$U_n/\sqrt{2}$};
\draw [mred, dashed] (3, 2) -- (2.5, 2.5) node [anchor = south east] {$V_n/\sqrt{2}$};
\end{tikzpicture}
\end{center}
We can define the new coordinates as
\begin{align*}
U_n &= A_n - B_n\\
V_n &= A_n + B_n
\end{align*}
In each step, either $A_n$ or $B_n$ changes by one. So $U_n$ and $V_n$ \emph{both} change by $1$. Moreover, they are independent. So we have
\begin{align*}
p_{0, 0}(2n) &= \P(A_n = B_n = 0)\\
&= \P(U_n = V_n = 0)\\
&= \P(U_n = 0)\P (V_n = 0)\\
&= \left[\binom{2n}{n}\left(\frac{1}{2}\right)^{2n}\right]^2.
\end{align*}
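These asymptotics are easy to check numerically. Below is a small Python sketch comparing the exact return probabilities in one and two dimensions with $\frac{1}{\sqrt{\pi n}}$ and $\frac{1}{\pi n}$; the product form $\binom{2n}{n}4^{-n} = \prod_{k = 1}^n \frac{2k - 1}{2k}$ is used only to avoid enormous intermediate factorials.
\begin{verbatim}
from math import pi, sqrt

def p00_1d(n):
    """p_{0,0}(2n) for the simple random walk on Z."""
    p = 1.0
    for k in range(1, n + 1):
        p *= (2 * k - 1) / (2 * k)   # running product for C(2n, n) / 4^n
    return p

for n in (10, 100, 1000):
    p1 = p00_1d(n)                   # d = 1
    p2 = p1 ** 2                     # d = 2, by the rotation argument
    print(n, p1, 1 / sqrt(pi * n), p2, 1 / (pi * n))
\end{verbatim}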
\subsection{Hitting probabilities}
Recurrence and transience tell us whether we are going to return to the original state with (almost) certainty. Often, we would like to know something more quantitative. What is the actual probability of returning to the state $i$? If we do return, how long do we expect it to take?
We can formulate this in a more general setting. Let $S$ be our state space, and let $A \subseteq S$. We want to know how likely and how long it takes for us to reach $A$. For example, if we are in a casino, we want to win, say, a million, and don't want to go bankrupt. So we would like to know the probability of reaching $A = \{1\text{ million}\}$ and $A = \{0\}$.
\begin{defi}[Hitting time]
The \emph{hitting time} of $A \subseteq S$ is the random variable $H^A = \min\{n \geq 0: X_n \in A\}$. In particular, if we start in $A$, then $H^A = 0$. We also have
\[
h_i^A = \P_i(H^A < \infty) = \P_i(\text{ever reach }A).
\]
\end{defi}
To determine hitting times, we mostly rely on the following result:
\begin{thm}
The vector $(h_i^A: i \in S)$ satisfies
\[
h_i^A =
\begin{cases}
1 & i \in A\\
\sum_{j \in S}p_{i, j}h_j^A & i \not \in A
\end{cases},
\]
and is \emph{minimal} in that for any non-negative solution $(x_i: i \in S)$ to these equations, we have $h_i^A \leq x_i$ for all $i$.
\end{thm}
It is easy to show that $h_i^A$ satisfies the formula given, but it takes some more work to show that $h_i^A$ is the minimal solution. Recall, however, that we have proved a similar result for random walks in IA Probability, and the proof is more-or-less the same.
\begin{proof}
By definition, $h_i^A = 1$ if $i \in A$. Otherwise, we have
\[
h_i^A = \P_i(H^A < \infty) = \sum_{j\in S} \P_i(H^A < \infty \mid X_1 = j)p_{i, j} = \sum_{j\in S}h_{j}^A p_{i, j}.
\]
So $h_i^A$ is indeed a solution to the equations.
To show that $h_i^A$ is the minimal solution, suppose $x = (x_i: i \in S)$ is a non-negative solution, i.e.
\[
x_i =
\begin{cases}
1 & i \in A\\
\sum_{j \in S}p_{i, j}x_j & i \not \in A
\end{cases},
\]
If $i \in A$, we have $h_i^A = x_i = 1$. Otherwise, we can write
\begin{align*}
x_i &= \sum_j p_{i, j}x_j \\
&= \sum_{j \in A}p_{i, j}x_j + \sum_{j \not\in A}p_{i, j}x_j \\
&= \sum_{j \in A}p_{i, j} + \sum_{j \not\in A}p_{i, j}x_j \\
&\geq \sum_{j \in A}p_{i, j}\\
&= \P_i(H^A = 1).
\end{align*}
By iterating this process, we can write
\begin{align*}
x_i &= \sum_{j \in A} p_{i, j} + \sum_{j \not\in A} p_{i, j}\left(\sum_k p_{j, k} x_k\right) \\
&= \sum_{j \in A} p_{i, j} + \sum_{j \not\in A} p_{i, j}\left(\sum_{k\in A} p_{j, k} x_k + \sum_{k \not\in A} p_{j, k}x_k\right)\\
&\geq \P_i (H^A = 1) + \sum_{j \not \in A, k \in A}p_{i, j}p_{j, k}\\
&= \P_i(H^A = 1) + \P_i(H^A = 2)\\
&= \P_i (H^A \leq 2).
\end{align*}
By induction, we obtain
\[
x_i \geq \P_i(H^A \leq n)
\]
for all $n$. Taking the limit as $n \to \infty$, we get
\[
x_i \geq \P_i(H^A < \infty) = h_i^A.
\]
So $h_i^A$ is minimal.
\end{proof}
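On a finite state space, the minimality in this theorem also suggests a practical way to compute $h^A$: start from the zero vector and iterate the equations. Essentially as in the proof, the $n$-th iterate equals $\P_i(H^A \leq n - 1)$, which increases to the minimal solution. Here is a short Python sketch (\texttt{numpy} assumed), applied to the six-state chain from the communicating classes example with $A = \{3\}$ in the relabelled states $0, \ldots, 5$ (i.e.\ the state called $4$ in that example).
\begin{verbatim}
import numpy as np

def hitting_probabilities(P, A, n_iter=500):
    """Minimal non-negative solution of the hitting equations,
    obtained by iterating upwards from h = 0."""
    h = np.zeros(len(P))
    for _ in range(n_iter):
        new = P @ h                  # sum_j p_{i,j} h_j
        new[list(A)] = 1.0           # h_i = 1 for i in A
        h = new
    return h

P = np.array([
    [1/2, 1/2, 0,   0,   0,   0  ],
    [0,   0,   1,   0,   0,   0  ],
    [1/3, 0,   0,   1/3, 1/3, 0  ],
    [0,   0,   0,   1/2, 1/2, 0  ],
    [0,   0,   0,   0,   0,   1  ],
    [0,   0,   0,   0,   1,   0  ],
])
print(hitting_probabilities(P, {3}))  # approx (1/2, 1/2, 1/2, 1, 0, 0)
\end{verbatim}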
The next question we want to ask is how long it will take for us to hit $A$. We want to find $\E_i(H^A) = k_i^A$. Note that we have to be careful --- if there is a chance that we never hit $A$, then $H^A$ could be infinite, and $\E_i(H^A) = \infty$. This occurs if $h_i^A < 1$. So often we are only interested in the case where $h_i^A = 1$ (note that $h_i^A = 1$ does \emph{not} imply that $k_i^A < \infty$. It is merely a necessary condition).
We get a similar result characterizing the expected hitting time.
\begin{thm}
$(k_i^A: i \in S)$ is the minimal non-negative solution to
\[
k_i^A =
\begin{cases}
0 & i \in A\\
1 + \sum_j p_{i, j}k_j^A & i \not\in A.
\end{cases}
\]
\end{thm}
Note that we have this ``$1 +$'' since when we move from $i$ to $j$, one step has already passed.
The proof is almost the same as the proof we had above.
\begin{proof}
The proof that $(k_i^A)$ satisfies the equations is the same as before.
Now let $(y_i : i\in S)$ be a non-negative solution. We show that $y_i \geq k_i^A$.
If $i \in A$, we get $y_i = k_i^A = 0$. Otherwise, suppose $i\not\in A$. Then we have
\begin{align*}
y_i &= 1 + \sum_j p_{i, j}y_j\\
&= 1 + \sum_{j \in A}p_{i, j}y_j + \sum_{j\not\in A}p_{i, j}y_j\\
&= 1 + \sum_{j\not\in A}p_{i, j}y_j\\
&= 1 + \sum_{j \not\in A}p_{i, j}\left(1 + \sum_{k\not\in A} p_{j, k}y_k\right)\\
&\geq 1 + \sum_{j \not\in A} p_{i, j}\\
&= \P_i(H^A \geq 1) + \P_i(H^A \geq 2).
\end{align*}
By induction, we know that
\[
y_i \geq \P_i (H^A \geq 1) + \cdots + \P_i(H^A \geq n)
\]
for all $n$. Let $n\to \infty$. Then we get
\[
y_i \geq \sum_{m \geq 1}\P_i(H^A \geq m) = \sum_{m \geq 1} m\P_i(H^A = m)= k_i^A.\qedhere
\]
\end{proof}
\begin{eg}[Gambler's ruin]
This time, we will consider a random walk on $\N$. In each step, we either move to the right with probability $p$, or to the left with probability $q = 1 - p$. What is the probability of ever hitting $0$ from a given initial point? In other words, we want to find $h_i = h_i^{\{0\}}$.
We know $h_i$ is the minimal solution to
\[
h_i =
\begin{cases}
1 & i = 0\\
qh_{i - 1} + ph_{i + 1} & i \not= 0.
\end{cases}
\]
What are the solutions to these equations? We can view this as a difference equation
\[
ph_{i + 1} - h_i + qh_{i - 1} = 0,\quad i \geq 1.
\]
with the boundary condition that $h_0 = 1$. We all know how to solve difference equations, so let's just jump to the solution.
If $p \not= q$, i.e.\ $p \not= \frac{1}{2}$, then the solution has the form
\[
h_i = A + B\left(\frac{q}{p}\right)^i
\]
for $i \geq 0$. If $p < q$, then for large $i$, $\left(\frac{q}{p}\right)^i$ is very large and blows up. However, since $h_i$ is a probability, it can never blow up. So we must have $B = 0$. So $h_i$ is constant. Since $h_0 = 1$, we have $h_i = 1$ for all $i$. So we always get to $0$.
If $p > q$, since $h_0 = 1$, we have $A + B = 1$. So
\[
h_i = \left(\frac{q}{p}\right)^i + A\left(1 - \left(\frac{q}{p}\right)^i\right).
\]
This is in fact a solution for all $A$. So we want to find the smallest solution.
As $i \to\infty$, we get $h_i \to A$. Since $h_i \geq 0$, we know that $A \geq 0$. Subject to this constraint, the minimum is attained when $A = 0$ (since $(q/p)^i$ and $(1 - (q/p)^i)$ are both positive). So we have
\[
h_i = \left(\frac{q}{p}\right)^i.
\]
There is another way to solve this. We can give ourselves a ceiling, i.e.\ we also stop when we hit some $k > 0$, and instead compute the probability of hitting $0$ before $k$. This gives the extra boundary condition $h_k = 0$. We now have two boundary conditions and can find a unique solution. Then we take the limit as $k \to \infty$. This is the approach taken in IA Probability.
Here if $p = q$, then by the same arguments, we get $h_i = 1$ for all $i$.
\end{eg}
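The ceiling approach lends itself to a direct computation. Below is a rough Python sketch (assuming \texttt{numpy}) that solves the two-point boundary value problem for several ceilings $k$ and compares the answers with the limit $(q/p)^i$; the values $p = 0.6$ and $i = 3$ are arbitrary.
\begin{verbatim}
import numpy as np

def ruin_prob(i, k, p):
    """Probability of hitting 0 before k, starting from i, by solving
    h_0 = 1, h_k = 0, and p h_{j+1} - h_j + q h_{j-1} = 0 in between."""
    q = 1 - p
    A = np.zeros((k + 1, k + 1))
    b = np.zeros(k + 1)
    A[0, 0], b[0] = 1, 1            # h_0 = 1
    A[k, k], b[k] = 1, 0            # h_k = 0 (the ceiling)
    for j in range(1, k):
        A[j, j - 1], A[j, j], A[j, j + 1] = q, -1, p
    return np.linalg.solve(A, b)[i]

p, i = 0.6, 3
for k in (10, 20, 50):
    print(k, ruin_prob(i, k, p))
print("limit:", ((1 - p) / p) ** i)   # (q/p)^i, the answer for p > q
\end{verbatim}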
\begin{eg}[Birth-death chain]
Let $(p_i: i \geq 1)$ be an arbitrary sequence such that $p_i \in (0, 1)$. We let $q_i = 1 - p_i$. We let $\N$ be our state space and define the transition probabilities to be
\[
p_{i, i + 1} = p_i,\quad p_{i, i - 1} = q_i.
\]
This is a more general case of the random walk --- in the random walk we have a constant $p_i$ sequence.
This is a general model for population growth, where the change in population depends on what the current population is. Here each ``step'' does not correspond to some unit time, since births and deaths occur rather randomly. Instead, we just make a ``step'' whenever some birth or death occurs, regardless of what time they occur.
Here, if we have no people left, then it is impossible for us to reproduce and get more population. So we have
\[
p_{0, 0} = 1.
\]
We say $0$ is \emph{absorbing} in that $\{0\}$ is closed. We let $h_i = h_i^{\{0\}}$. We know that
\[
h_0 = 1,\quad p_i h_{i + 1} - h_i + q_i h_{i - 1} = 0,\quad i \geq 1.
\]
This is no longer a constant-coefficient difference equation, since the coefficients depend on the index $i$. To solve this, we need magic. We rewrite this as
\begin{align*}
p_i h_{i + 1} - h_i + q_i h_{i - 1} &= p_i h_{i + 1} - (p_i + q_i) h_i + q_i h_{i - 1} \\
&= p_i (h_{i + 1} - h_i) - q_i(h_i - h_{i - 1}).
\end{align*}
We let $u_i = h_{i - 1} - h_i$ (picking $h_i - h_{i - 1}$ might seem more natural, but this definition makes $u_i$ positive). Then our equation becomes
\[
u_{i + 1} = \frac{q_i}{p_i} u_i.
\]
We can iterate this to become
\[
u_{i + 1} = \left(\frac{q_i}{p_i}\right)\left(\frac{q_{i - 1}}{p_{i - 1}}\right) \cdots \left(\frac{q_1}{p_1}\right) u_1.
\]
We let
\[
\gamma_i = \frac{q_1q_2\cdots q_i}{p_1p_2\cdots p_i}.
\]
Then we get $u_{i + 1} = \gamma_i u_1$. For convenience, we let $\gamma_0 = 1$. Now we want to retrieve our $h_i$. We can do this by summing the equation $u_i = h_{i - 1} - h_i$. So we get
\[
h_0 - h_i = u_1 + u_2 + \cdots + u_i.
\]
Using the fact that $h_0 = 1$, we get
\[
h_i = 1 - u_1(\gamma_0 + \gamma_1 + \cdots + \gamma_{i - 1}).
\]
Here we have a parameter $u_1$, and we need to find out what it is. Our theorem tells us that we should pick the value of $u_1$ that minimizes $h_i$. This all depends on the value of
\[
S = \sum_{i = 0}^\infty \gamma_i.
\]
By the law of excluded middle, $S$ either diverges or converges. If $S = \infty$, then we must have $u_1 = 0$. Otherwise, $h_i$ would tend to $-\infty$ for large $i$, but we know that $0 \leq h_i \leq 1$. If $S$ is finite, then $u_1$ can be non-zero. We know that the $\gamma_i$ are all positive. So to minimize $h_i$, we need to maximize $u_1$. We cannot make $u_1$ arbitrarily large, since this would make $h_i$ negative. To find the maximum possible value of $u_1$, we take the limit as $i \to\infty$. Then we know that the maximum value of $u_1$ satisfies
\[
0 = 1 - u_1 S.
\]
In other words, $u_1 = 1/S$. So we have
\[
h_i = \frac{\sum_{k = i}^\infty \gamma_k}{\sum_{k = 0}^\infty \gamma_k}.
\]
\end{eg}
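Here is a direct implementation of this formula as a Python sketch (\texttt{numpy} assumed). Truncating the series at a finite number of terms is only sensible when $S$ converges; the choice $p_i = 2/3$ is purely illustrative, giving $\gamma_i = 2^{-i}$ and hence $h_i = 2^{-i}$.
\begin{verbatim}
import numpy as np

def hitting_zero_probs(p, n_terms=10000):
    """h_i = (sum_{k >= i} gamma_k) / (sum_{k >= 0} gamma_k), truncated
    after n_terms terms (only sensible when the series S converges)."""
    gamma = [1.0]                              # gamma_0 = 1
    for i in range(1, n_terms):
        gamma.append(gamma[-1] * (1 - p(i)) / p(i))
    tails = np.cumsum(gamma[::-1])[::-1]       # tails[i] = sum_{k >= i}
    return tails / tails[0]

h = hitting_zero_probs(lambda i: 2/3)
print(h[:5])    # approximately 1, 1/2, 1/4, 1/8, 1/16
\end{verbatim}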
\subsection{The strong Markov property and applications}
We are going to state the \emph{strong} Markov property and see applications of it. Before this, we should know what the \emph{weak} Markov property is. We have, in fact, already seen the weak Markov property. It's just that we called it the ``Markov property'' instead.
In probability, we often have ``strong'' and ``weak'' versions of things. For example, we have the strong and weak law of large numbers. The difference is that the weak versions are expressed in terms of probabilities, while the strong versions are expressed in terms of random variables.
Initially, when people first started developing probability theory, they just talked about probability distributions like the Poisson distribution or the normal distribution. However, later it turned out it is often nicer to talk about random variables instead. After messing with random variables, we can just take expectations or evaluate probabilities to get the corresponding statement about probability distributions. Hence usually the ``strong'' versions imply the ``weak'' version, but not the other way round.
In this case, recall that we defined the Markov property in terms of the probabilities at some fixed time. We have some fixed time $t$, and we want to know the probabilities of events after $t$ in terms of what happened before $t$. In the strong Markov property, we will allow this fixed time to be replaced by a random time $T$, and say something slightly more involved. However, we will not allow $T$ to be an arbitrary random variable, but just some nice ones.
\begin{defi}[Stopping time]
Let $X$ be a Markov chain. A random variable $T$ (which is a function $\Omega \to \N\cup \{\infty\}$) is a \emph{stopping time} for the chain $X = (X_n)$ if for $n \geq 0$, the event $\{T = n\}$ is given in terms of $X_0, \cdots, X_n$.
\end{defi}
For example, suppose we are in a casino and gambling. We let $X_n$ be the amount of money we have at time $n$. Then we can set our stopping time as ``the time when we have $\$10$ left''. This is a stopping time, in the sense that we can use this as a guide to when to stop --- it is certainly possible to set yourself a guide that you should leave the casino when you have $\$10$ left. However, it does not make sense to say ``I will leave if the next game will make me bankrupt'', since there is no way to tell if the next game will make you bankrupt (it certainly will not if you win the game!). Hence this is not a stopping time.
\begin{eg}
The hitting time $H^A$ is a stopping time. This is since $\{H^A = n\} = \{X_i \not\in A\text{ for }i < n\} \cap \{X_n \in A\}$. We also know that $H^A + 1$ is a stopping time since it only depends on $X_i$ for $i \leq n - 1$. However, $H^A - 1$ is not a stopping time since it depends on $X_{n + 1}$.
\end{eg}
We can now state the strong Markov property, which is expressed in a rather complicated manner but can be useful at times.
\begin{thm}[Strong Markov property]
Let $X$ be a Markov chain with transition matrix $P$, and let $T$ be a stopping time for $X$. Given that $T < \infty$ and $X_T = i$, the chain $(X_{T + k}: k \geq 0)$ is a Markov chain with transition matrix $P$ started from $i$, and this Markov chain is independent of $X_0, \cdots, X_T$.
\end{thm}
Proof is omitted.
\begin{eg}[Gambler's ruin]
Again, this is the Markov chain taking values on the non-negative integers, moving to the right with probability $p$ and left with probability $q = 1 - p$. $0$ is an absorbing state, since we have no money left to bet if we are broke.
Instead of computing the probability of hitting zero, we want to find the time it takes to get to $0$, i.e.
\[
H = \inf\{n \geq 0: X_n = 0\}.
\]
Note that the infimum of the empty set is $+\infty$, i.e.\ if we never hit zero, we say it takes infinite time. What is the distribution of $H$? We define the generating function
\[
G_i(s) = \E_i(s^H) = \sum_{n = 0}^\infty s^n \P_i(H = n),\quad |s| < 1.
\]
Note that we need the requirement that $|s| < 1$, since it is possible that $H$ is infinite, and we would not want to think whether the sum converges when $s = 1$. However, we know that it does for $|s| < 1$.
We have
\[
G_1(s) = \E_1(s^H) = p\E_1(s^H\mid X_1 = 2) + q\E_1(s^H\mid X_1 = 0).
\]
How can we simplify this? The second term is easy, since if $X_1 = 0$, then we must have $H = 1$. So $\E_1(s^H\mid X_1 = 0) = s$. The first term is more tricky. We are now at $2$. To get to $0$, we need to pass through $1$. So the time needed to get to $0$ is the time to get from $2$ to $1$ (say $H'$), plus the time to get from $1$ to $0$ (say $H''$). We know that $H'$ and $H''$ have the same distribution as $H$, and by the strong Markov property, they are independent. So
\[
G_1 = p\E_1(s^{H' + H'' + 1}) + qs = psG_1^2 + qs.\tag{$*$}
\]
Solving this, we get two solutions
\[
G_1 (s) = \frac{1 \pm \sqrt{1 - 4pqs^2}}{2ps}.
\]
We have to be careful here. This result says that for each value of $s$, $G_1(s)$ is \emph{either} $\frac{1 + \sqrt{1 - 4pqs^2}}{2ps}$ \emph{or} $\frac{1 - \sqrt{1 - 4pqs^2}}{2ps}$. It does \emph{not} say that there is some consistent choice of $+$ or $-$ that works for everything.
However, we know that if we suddenly change the sign, then $G_1(s)$ will be discontinuous at that point, but $G_1$, being a power series, has to be continuous. So the solution must be either $+$ for all $s$, or $-$ for all $s$.
To determine the sign, we can look at what happens when $s = 0$. We see that the numerator becomes $1 \pm 1$, while the denominator is $0$. We know that $G$ converges at $s = 0$. Hence the numerator must be $0$. So we must pick $-$, i.e.
\[
G_1 (s) = \frac{1 - \sqrt{1 - 4pqs^2}}{2ps}.
\]
We can find $\P_1(H = k)$ by expanding the Taylor series.
What is the probability of ever hitting $0$? This is
\[
\P_1(H < \infty) = \sum_{n = 1}^\infty \P_1(H = n) = \lim_{s\to 1} G_1(s) = \frac{1 - \sqrt{1 - 4pq}}{2p}.
\]
We can rewrite this using the fact that $q = 1 - p$. So $1 - 4pq = 1 - 4p(1 - p) = (1 - 2p)^2 = |q - p|^2$. So we can write
\[
\P_1(H < \infty) = \frac{1 - |p - q|}{2p} =
\begin{cases}
1 & p \leq q\\
\frac{q}{p} & p > q
\end{cases}.
\]
Using this, we can also find $\mu = \E_1(H)$. Firstly, if $p > q$, then it is possible that $H = \infty$. So $\mu = \infty$. If $p \leq q$, we can find $\mu$ by differentiating $G_1(s)$ and evaluating at $s = 1$. Doing this directly would result in horrible and messy algebra, which we want to avoid. Instead, we can differentiate $(*)$ and obtain
\[
G_1' = pG_1^2 + 2psG_1G_1' + q.
\]
We can rewrite this as
\[
G_1'(s) = \frac{pG_1(s)^2 + q}{1 - 2ps G_1(s)}.
\]
By Abel's lemma, we have
\[
\mu = \lim_{s \to 1}G_1'(s) =
\begin{cases}
\infty & p = \frac{1}{2}\\
\frac{1}{q - p} & p < \frac{1}{2}
\end{cases}.
\]
\end{eg}
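These formulae are easy to sanity-check by simulation. Below is a rough Monte Carlo sketch in Python (\texttt{numpy} assumed); capping the number of steps is a crude stand-in for walks with $H = \infty$, so the estimates are only approximate.
\begin{verbatim}
import numpy as np

def sample_H(p, rng, cap=10000):
    """One sample of H starting from 1; returns inf if 0 has not been
    hit after `cap` steps (a crude stand-in for H = infinity)."""
    x, t = 1, 0
    while x > 0 and t < cap:
        x += 1 if rng.random() < p else -1
        t += 1
    return t if x == 0 else np.inf

rng = np.random.default_rng(1)
for p in (0.4, 0.6):
    q = 1 - p
    H = np.array([sample_H(p, rng) for _ in range(20000)])
    hit = np.isfinite(H)
    print(p, hit.mean(), min(1, q / p))              # P_1(H < infinity)
    if p < 0.5:
        print("  mean:", H[hit].mean(), 1 / (q - p))  # E_1(H)
\end{verbatim}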
\subsection{Further classification of states}
So far, we have classified chains into, say, irreducible and reducible chains. We have also seen that states can be recurrent or transient. By definition, a state is recurrent if the probability of returning to it is 1. However, we can further classify recurrent states. Even if a state is recurrent, it is possible that the expected time of returning is infinite. So while we will eventually return to the original state, this is likely to take a long, long time. The opposite behaviour is also possible --- the original state might be very attracting, and we are likely to return quickly. It turns out this distinction can affect the long-term behaviour of the chain.
First we have the following proposition, which tells us that if a state is recurrent, then we are expected to return to it infinitely many times.
\begin{thm}
Suppose $X_0 = i$. Let $V_i = |\{n \geq 1: X_n = i\}|$. Let $f_{i, i} = \P_i(T_i < \infty)$. Then
\[
\P_i (V_i = r) = f_{i, i}^r (1 - f_{i, i}),
\]
since we have to return $r$ times, each with probability $f_{i, i}$, and then never return.
Hence, if $f_{i, i} = 1$, then $\P_i(V_i = r) = 0$ for all $r$. So $\P_i(V_i = \infty) = 1$. Otherwise, $\P_i(V_i = r)$ is a genuine geometric distribution, and we get $\P_i(V_i < \infty) = 1$.
\end{thm}
\begin{proof}
Exercise, using the strong Markov property.
\end{proof}
\begin{defi}[Mean recurrence time]
Let $T_i$ be the returning time to a state $i$. Then the \emph{mean recurrence time} of $i$ is
\[
\mu_i = \E_i(T_i) =
\begin{cases}
\infty & i\text{ transient}\\
\sum_{n = 1}^\infty n f_{i, i}(n) & i\text{ recurrent}
\end{cases}
\]
\end{defi}
\begin{defi}[Null and positive state]
If $i$ is recurrent, we call $i$ a \emph{null state} if $\mu_i = \infty$. Otherwise $i$ is \emph{non-null} or \emph{positive}.
\end{defi}
This is mostly all we care about. However, there is still one more technical consideration. Recall that in the random walk starting from the origin, we can only return to the origin after an even number of steps. This causes problems for a lot of our future results. For example, we will later look at the ``convergence'' of Markov chains. If we are very likely to return to $0$ after an even number of steps, but it is impossible to return after an odd number of steps, we don't get convergence. Hence we would like to prevent this from happening.
\begin{defi}[Period]
The \emph{period} of a state $i$ is $d_i = \gcd\{n \geq 1: p_{i, i}(n)> 0\}$.
A state is \emph{aperiodic} if $d_i = 1$.
\end{defi}
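For a finite chain the period can be computed mechanically: take the gcd of the return times that occur with positive probability. A small Python sketch (\texttt{numpy} assumed; strictly the definition ranges over all $n$, but the gcd stabilises after finitely many terms):
\begin{verbatim}
from math import gcd
import numpy as np

def period(P, i, n_max=200):
    """gcd of all n <= n_max with p_{i,i}(n) > 0."""
    d, Pn = 0, np.eye(len(P))
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            d = gcd(d, n)
    return d

# simple random walk on the cycle Z/4Z: period 2 at every state
P = np.array([[0,   1/2, 0,   1/2],
              [1/2, 0,   1/2, 0  ],
              [0,   1/2, 0,   1/2],
              [1/2, 0,   1/2, 0  ]])
print(period(P, 0))   # 2
\end{verbatim}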
In general, we like aperiodic states. This is not a very severe restriction. For example, in the random walk, we can get rid of periodicity by saying there is a very small chance of staying at the same spot in each step. We can make this chance so small that we can ignore it for most practical purposes, but it will help us get rid of the technical problem of periodicity.
\begin{defi}[Ergodic state]
A state $i$ is \emph{ergodic} if it is aperiodic and positive recurrent.
\end{defi}
These are the really nice states.
Recall that recurrence is a class property --- if two states are in the same communicating class, then they are either both recurrent, or both transient. Is this true for the properties above as well? The answer is yes.
\begin{thm}
If $i \leftrightarrow j$ are communicating, then
\begin{enumerate}
\item $d_i = d_j$.
\item $i$ is recurrent iff $j$ is recurrent.
\item $i$ is positive recurrent iff $j$ is positive recurrent.
\item $i$ is ergodic iff $j$ is ergodic.
\end{enumerate}
\end{thm}
\begin{proof}\leavevmode
\begin{enumerate}