Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GWAS] All SNPs plotted only in first contig #146

Open
lawleyjw opened this issue Aug 23, 2023 · 2 comments
Open

[GWAS] All SNPs plotted only in first contig #146

lawleyjw opened this issue Aug 23, 2023 · 2 comments

Comments

@lawleyjw
Copy link

Hi James,

I am trying to plot GWAS data, and the GFF3 annotation file loads well showing the position of each chromosome (contig), but when I load the SNP data file it plots all SNPs in chromosome 01, apparently disregarding the information in the CHR column. The base pair positions in my annotation file resets for every chromosome, but I assumed phandango could handle that since I input chromosome information in the SNP data file. See attached annotation and SNP data files I am using to generate the Manhattan plot. I appreciate any advice.
GWAS-files.zip

Thank you very much,
Jon

@jameshadfield
Copy link
Owner

Hey Jon - yeah that seems like the contig's just being ignored. I have some housekeeping admin to do in Phandango at some point so I'll try to get to this, but it'll probably be a while until I get to it.

In the meantime, incrementing the BP column in the plot file by the prior (cumulative) chromosome length will make it work, although a bit of a pain. This python script will do that (although might want to double check I haven't got any off-by-ones):

contig_adjust = {}
counter = 0
with open('./ArME14-genes.gff3', 'r') as fh:
    for line in fh:
        if line.startswith('##sequence-region'):
            contig_adjust[line.split()[1]] = counter
            counter += int(line.split()[3])
with open('./hostGE-wg_snps.plot', 'r') as fhi:
    with open('./hostGE-wg_snps.adjusted-coordinates.plot', 'w') as fho:
        for line in fhi:
            fields = line.strip().split('\t')
            if fields[0] in contig_adjust:
                fields[2] = str(int(fields[2]) + contig_adjust[fields[0]])
            print("\t".join(fields), file=fho, )

image

Interestingly, the high (log10) values all seem to be intergenic.

@lawleyjw
Copy link
Author

Great, thanks very much for this James!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants