I'm starting from the output derived from bubblepop:
Paths | pos1 | pos2 | pos3 |
---|---|---|---|
pathx | A | T | A |
pathy | A | T | T |
pathz | - | - | A |
Here the number of sequences is the number of rows in the matrix, and the number of segregating sites is the number of columns. The next step is to calculate the total number of pairwise differences observed between all sequences.
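A minimal Python sketch of this matrix view. It assumes the bubblepop table has already been parsed into a dictionary keyed by path name (the name `alignment` is hypothetical) and keeps a gap `-` as an ordinary character:

```python
# Hypothetical in-memory copy of the bubblepop table above:
# one row (sequence) per path, one column per segregating site.
alignment = {
    "pathx": ["A", "T", "A"],
    "pathy": ["A", "T", "T"],
    "pathz": ["-", "-", "A"],
}

# Number of sequences = number of rows in the matrix.
n_sequences = len(alignment)

# Number of segregating sites = number of columns.
n_sites = len(next(iter(alignment.values())))
assert all(len(row) == n_sites for row in alignment.values()), "ragged matrix"

# Each unordered pair of sequences is compared once: n * (n - 1) / 2 pairs.
n_pairs = n_sequences * (n_sequences - 1) // 2

print(n_sequences, n_sites, n_pairs)  # 3 3 3
```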
Total number of pairwise differences on VG:
- With itertools I get all possible pairwise combinations of paths.
Combinations | Value |
---|---|
x,y | ('A,T,A', 'A,T,T') |
x,z | ('A,T,A', '-,-,A') |
y,z | ('A,T,T', '-,-,A') |
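The pairing step could look like this with `itertools.combinations`, which yields every unordered pair exactly once and never pairs a path with itself. This is a sketch; storing each row as the comma-joined string shown in the Value column is an assumption:

```python
from itertools import combinations

# Rows stored as comma-joined strings, as in the Value column above (assumed layout).
paths = {
    "x": "A,T,A",
    "y": "A,T,T",
    "z": "-,-,A",
}

# combinations() follows the input order, so with 3 paths it yields
# exactly 3 pairs: (x, y), (x, z), (y, z).
pairs = {
    f"{a},{b}": (paths[a], paths[b])
    for a, b in combinations(paths, 2)
}

for name, value in pairs.items():
    print(name, "|", value)
```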
- I compare each value of the first tuple with the value at the same position in the second tuple. If the two values are the same I record True, otherwise False.
- count_differences: count how many False values there are; that is the number of pairwise differences for that pair. For example, for x,y the comparison gives (True, True, False), so the count of False is 1.
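Putting the comparison and the count together, a minimal sketch. The helper `compare` and the signature of `count_differences` are hypothetical (the name only mirrors the text), and a gap `-` against a nucleotide is counted as a difference, which is a modelling choice rather than something bubblepop decides:

```python
from itertools import combinations


def compare(seq_a: str, seq_b: str) -> tuple:
    """Position-by-position comparison: True where the two values match."""
    a, b = seq_a.split(","), seq_b.split(",")
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of sites")
    return tuple(x == y for x, y in zip(a, b))


def count_differences(matches: tuple) -> int:
    """Number of False values, i.e. sites where the two sequences differ."""
    return matches.count(False)


paths = {"x": "A,T,A", "y": "A,T,T", "z": "-,-,A"}

differences = {}
for a, b in combinations(paths, 2):
    matches = compare(paths[a], paths[b])
    differences[f"{a},{b}"] = count_differences(matches)
    print(f"{a},{b}", matches, differences[f"{a},{b}"])

# Total number of pairwise differences over all pairs of sequences.
total = sum(differences.values())
print("total:", total)  # total: 6
```

With the example table, x,y differ at 1 site, x,z at 2 and y,z at 3.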