Error in glm.score #43

hkj7 opened this issue Mar 30, 2022 · 13 comments

Comments

@hkj7

hkj7 commented Mar 30, 2022

Hello Dr Chen,

I am running the glmm.score command below. It takes a gzipped bgen file and my linear mixed model (BSmodel). Both the bgen file and the linear mixed model contain IIDs, though not necessarily in the same order. Because my bgen file is so big, I have gzipped it (the gzipped file is 8 GB) and want to test whether the first 100 rows are read in:

> glmm.score(BSmodel, infile = "~/Desktop/PROs_GWAS.bgen.gz", BGEN.samplefile = "~/Desktop/PROs_GWAS.sample", infile.nrow = 100, outfile = "glmm.score.bgen.testoutfile.txt")

The error message:

Error: cannot open gzipped file ~/Desktop/PROs_GWAS.bgen.gz
Warning message:
In glmm.score(BSmodel, infile = "~/Desktop/PROs_GWAS.bgen.gz", BGEN.samplefile = "~/Desktop/PROs_GWAS.sample",  :
  Argument select is unspecified... Assuming the order of individuals in infile matches unique id_include in obj...

I'm not sure what this error means. Does this mean I have to supply the select matrix? I assumed the IIDs would be automatically matched between the genetic file and regression model.

Please let me know if you would require any more information/corresponding data or commands. Thanks very much!

@hanchenphd
Owner

Thank you for your interest in GMMAT! Currently, the function does not take gzipped bgen files as the input, and you would need to gunzip it to a .bgen file.

Best,
Han
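
For reference, a minimal shell sketch of the decompression step described above. The file name is taken from the original command; the demo file created at the top is only a stand-in so the snippet runs on its own. `gzip -dc` writes the uncompressed data to a new file and leaves the `.gz` in place, and `gzip -t` is an optional integrity check.

```shell
# Stand-in data so the sketch is self-contained; use the real file instead.
workdir=$(mktemp -d)
cd "$workdir"
printf 'demo bgen bytes\n' > PROs_GWAS.bgen
gzip PROs_GWAS.bgen   # produces PROs_GWAS.bgen.gz and removes the original

# Check the archive is intact, then decompress to a plain .bgen file.
gzip -t PROs_GWAS.bgen.gz
gzip -dc PROs_GWAS.bgen.gz > PROs_GWAS.bgen   # the .gz is kept alongside

ls -l PROs_GWAS.bgen PROs_GWAS.bgen.gz
```

The resulting `.bgen` path is what would then be passed as `infile`.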

@hkj7
Author

hkj7 commented Mar 31, 2022

Hi Han,

Thank you for your response. I have unzipped the file but I still get an error reading it in. I'm not sure whether it's the geno.file and samplefile commands that are causing it. I have put the names of the files in the commands below:

> geno.file <- system.file("extdata", "PROs_GWAS.bgen", package = "GMMAT")
> samplefile <- system.file("extdata", "PROs_GWAS.sample", package = "GMMAT")

> glmm.score(BSmodel, infile = "~/Desktop/PROs_GWAS.bgen", BGEN.samplefile = "~/Desktop/PROs_GWAS.sample", infile.nrow = 100, outfile = "glmm.score.bgen.testoutfile.txt")

Error reading BGEN file: ~/Desktop/PROs_GWAS.bgen

@hanchenphd
Owner

Were you able to run any analyses using this BGEN file with a different software program (e.g. PLINK2)?

@hkj7
Author

hkj7 commented Apr 3, 2022

Hi Chen,

Thanks for your response. I converted the pgen file to a bgen file using the command below. I also filtered the SNPs by MAF and imputation score.

#!/bin/bash
#PBS -N Imputation 
#PBS -l walltime=06:00:00
#PBS -l nodes=1:ppn=8
#PBS -l vmem=16gb
#PBS -m bea
#PBS -M email


i=$PBS_ARRAYID
cd /data/genome/PROs_GWAS

./plink2 --threads 8 --pfile output_init_PROs --extract extract0.3.txt --maf 0.05 --export bgen-1.2 --out PROs_GWAS

I was able to use the pgen file to run other analyses but have not tried with bgen.

@hanchenphd
Owner

Please export to bgen-1.3 and let me know if it works or not.

Thanks,
Han

@hkj7
Author

hkj7 commented Apr 4, 2022

Hi Han,

Thank you for your response. I have exported to bgen 1.3 and still get the same error:

> geno.file <- system.file("extdata", "PROs_GWAS_1.3.bgen", package = "GMMAT")
> samplefile <- system.file("extdata", "PROs_GWAS_1.3.sample", package = "GMMAT")
> glmm.score(BSmodel, infile = "~/Desktop/PROs_GWAS_1.3.bgen", BGEN.samplefile = "~/Desktop/PROs_GWAS_1.3.sample", infile.nrow = 100, outfile = "glmm.score.bgen.testoutfile.txt")
Error reading BGEN file: ~/Desktop/PROs_GWAS_1.3.bgen

@hanchenphd
Owner

Can you send me a simulated reproducible example? I will take a look.

@hkj7
Author

hkj7 commented Apr 6, 2022

Dear Dr Chen,

Thank you for your response. I have figured out the problem: I made a stupid error and my file was not stored in the right directory. However, I am now getting a new error:

Warning in glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", BGEN.samplefile = "~/Desktop/PROs_GWAS_1.3.sample",  :
  Check your data... Some id_include in obj are missing in BGEN.samplefile!
Error in glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", BGEN.samplefile = "~/Desktop/PROs_GWAS_1.3.sample",  : 
  Error: id_include in obj does not match sample.id in BGEN.samplefile!

My linear mixed model includes all patients with genotyped data and therefore exactly matches the patient IDs in the bgen/sample file, though of course in a different order. However, my bgen file and bgen sample file contain two IDs: FID and IID.

My bgen sample file looks like this, with 1931 patient IDs. The first ID is the FID and the second ID is the IID; e.g. for the first individual the FID is 1032 and the IID is 468768.

ID_1 ID_2 missing sex
0 0 0 D
1032 468768 0 2
1405 468769 0 2
1564 468770 0 2
1610 468771 0 2
998 468774 0 2
975 468775 0 2
1066 468776 0 2
1038 468778 0 2

The data frame for my linear regression model is in long format and includes the patient IID and the list of covariates ...

 IID age   bmi smoking chemo bed_breast_late    etc...
470502  62 29.00       1     0           75.69       
470502  62 29.00       1     0           75.69       
470502  62 29.00       1     0           75.69     
470502  62 29.00       1     0           75.69    
470514  47 21.72       1     0           75.69      
470514  47 21.72       1     0           75.69      

All the IIDs in the data frame above match IIDs in the bgen sample file. I thought the software would ignore the FIDs in my genetic file?

Any help you could provide would be much appreciated.
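
One way to check the claim that every model IID appears in the sample file is to compare the ID lists directly. This is a hedged sketch, not part of GMMAT: it assumes the model data frame has been exported to a whitespace-delimited text file with the IID in the first column (the name `model_data.txt` is hypothetical) and that the sample file has the two header lines shown above. The stand-in files only mimic those layouts.

```shell
export LC_ALL=C   # keep sort and comm on the same collation
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in files mirroring the layouts quoted above (illustrative rows only).
cat > PROs_GWAS_1.3.sample <<'EOF'
ID_1 ID_2 missing sex
0 0 0 D
1032 468768 0 2
1405 468769 0 2
EOF
cat > model_data.txt <<'EOF'
IID age bmi
468768 62 29.00
468768 62 29.00
468769 47 21.72
EOF

# Unique IIDs from the long-format model data (skip the one-line header).
awk 'NR > 1 { print $1 }' model_data.txt | sort -u > model_ids.txt
# ID_1 and ID_2 from the sample file (skip the two header lines).
awk 'NR > 2 { print $1 }' PROs_GWAS_1.3.sample | sort -u > sample_id1.txt
awk 'NR > 2 { print $2 }' PROs_GWAS_1.3.sample | sort -u > sample_id2.txt

# Count model IDs that are absent from each column of the sample file.
missing_vs_id1=$(comm -23 model_ids.txt sample_id1.txt | wc -l | tr -d ' ')
missing_vs_id2=$(comm -23 model_ids.txt sample_id2.txt | wc -l | tr -d ' ')
echo "model IDs not found in ID_1: $missing_vs_id1"
echo "model IDs not found in ID_2: $missing_vs_id2"
```

A non-zero count for ID_1 together with zero for ID_2 would indicate that only the second column of the sample file lines up with the model IDs.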

@hanchenphd
Owner

I guess your BGEN 1.3 file should already include a single identifier (see the BGEN format), so you probably don't need BGEN.samplefile?

Best,
Han

@hkj7
Author

hkj7 commented Apr 7, 2022

Hi Han,

I've tried running without the sample file:

geno.file <- system.file("extdata", "PROs_GWAS_1.3.bgen", package = "GMMAT")
glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", outfile = "glmm.score.bgen.testoutfile.txt")

I still get the same error:

Warning in glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", outfile = "glmm.score.bgen.testoutfile.txt") :
  Check your data... Some id_include in obj are missing in sample.id of infile!
Error in glmm.score(BSmodel, infile = "PROs_GWAS_1.3.bgen", outfile = "glmm.score.bgen.testoutfile.txt") : 
  Error: id_include in obj does not match sample.id in infile!

Is this because my genetic file contains two IDs but my regression model only contains one?

@hanchenphd
Owner

Hello,

If you are not using the family ID in this analysis, could you please create a fake sample file that shows both ID_1 and ID_2 as individual ID? If your null model included the individual ID, then they should be automatically matched to the genotype file. Let me know if it fixes the problem or not.

Thanks,
Han

@hkj7
Author

hkj7 commented Apr 8, 2022

Hi Han,

Thanks for your response. Just to confirm: should I change the genetic file (take out the FID and repeat the IID twice), then recreate the bgen file and bgen sample file and re-run?

Thanks

@hanchenphd
Owner

I don't think you need to create the bgen file again. You can use BGEN.samplefile to overwrite the FID with IID (if that is what you used in your null model).
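
A minimal sketch of that suggestion: copy the sample file and overwrite ID_1 (the FID) with ID_2 (the IID) on every data line, leaving the two header lines alone. The output name `PROs_GWAS_1.3.iid.sample` is hypothetical, the stand-in input only mimics the file shown earlier, and the sketch assumes a space-delimited sample file.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in sample file with the layout quoted earlier in the thread.
cat > PROs_GWAS_1.3.sample <<'EOF'
ID_1 ID_2 missing sex
0 0 0 D
1032 468768 0 2
1405 468769 0 2
EOF

# Keep the two header lines unchanged; on data lines copy ID_2 into ID_1.
awk 'NR <= 2 { print; next } { $1 = $2; print }' \
    PROs_GWAS_1.3.sample > PROs_GWAS_1.3.iid.sample

cat PROs_GWAS_1.3.iid.sample
```

The rewritten file can then be supplied through the `BGEN.samplefile` argument in place of the original, with the existing bgen file left as it is.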
