Duplicate ALT allele in VCF causes exception #1718

Open
bartcharbon opened this issue Oct 4, 2024 · 3 comments

@bartcharbon

Description of the issue:

Reading a VCF with HTSJDK throws an exception when a record contains a duplicate ALT allele.
I believe duplicate ALT alleles are allowed in VCF and are used when there are multiple symbolic alleles at the same position,
e.g. <CNV:TR>,<CNV:TR>

The exception that occurs:
The provided VCF file is malformed at approximately line number 530: Duplicate allele added to VariantContext: <CNV:TR>, for input source:input.vcf.gz

Both bcftools and EBIvariation/vcf-validator read and validate the lines with the duplicate ALT without problems.

Your environment:

  • version of htsjdk: 4.1.0
  • version of java: 17
  • which OS: CentOS

Expected behaviour

The file should not cause an exception

Actual behaviour

The file causes an exception

@lindenb
Contributor

lindenb commented Oct 4, 2024

I believe duplicate ALT alleles are allowed in VCF,

Hmm, I don't think so. How would you know which ALT allele is associated with a genotype?

@bartcharbon
Author

The INFO fields related to the ALT alleles are required to be comma-separated in the same order as the ALTs.
Genotypes are associated by index, just like at any other multiallelic site.

I cannot find any mention in the VCF 4.2 spec that ALT alleles must be unique.
The VCF 4.5 draft even includes examples of this case (<CNV:TR>,<CNV:TR>) in the spec itself.
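
To illustrate the index-based association described above, here is a minimal plain-Java sketch. It is not HTSJDK code: the class `AltIndexDemo`, its methods, and the placeholder INFO values `first,second` are all hypothetical and exist only for this example. It shows that two identical ALT strings can still be told apart by their position in the ALT list.

```java
import java.util.List;

// Hypothetical illustration only; none of these names come from HTSJDK.
public class AltIndexDemo {

    /** Resolves a GT index to an allele: 0 is REF, i >= 1 is the (i-1)-th ALT entry. */
    public static String alleleForIndex(String ref, List<String> alts, int gtIndex) {
        if (gtIndex == 0) {
            return ref;
        }
        return alts.get(gtIndex - 1);
    }

    /** Picks the per-ALT INFO value (one value per ALT, in ALT order) for a GT index >= 1. */
    public static String infoForIndex(String commaSeparatedInfo, int gtIndex) {
        return commaSeparatedInfo.split(",")[gtIndex - 1];
    }

    public static void main(String[] args) {
        List<String> alts = List.of("<CNV:TR>", "<CNV:TR>");
        String perAltInfo = "first,second"; // placeholder values, one per ALT

        // A genotype of 1/2 refers to two distinct ALT entries, even though
        // both entries carry the same symbolic allele string.
        for (int gtIndex : new int[] {1, 2}) {
            System.out.println(gtIndex + " -> " + alleleForIndex("A", alts, gtIndex)
                    + " with INFO value " + infoForIndex(perAltInfo, gtIndex));
        }
    }
}
```

The two ALT entries resolve to the same allele string but to different INFO values, so the position in the list, not the string itself, identifies the allele.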

@lbergelson added the vcf, Waiting for Review, and SV labels and removed the Waiting for Review label on Oct 4, 2024
@lbergelson
Member

@bartcharbon Interesting. I believe the restriction makes sense for non-symbolic alleles, even if it's not in the spec specifically. It looks like we should relax it for structural variants though.
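
One way the relaxed rule could look is sketched below in plain Java. This is an assumption about the shape of such a check, not the actual HTSJDK implementation or a proposed patch: the class `RelaxedAltCheck` and its methods are hypothetical, and "symbolic" is simplified here to mean any allele written as `<ID>`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not HTSJDK's validation code.
public class RelaxedAltCheck {

    /** Simplified test: treats any allele of the form <ID> as symbolic. */
    public static boolean isSymbolic(String allele) {
        return allele.length() > 2 && allele.startsWith("<") && allele.endsWith(">");
    }

    /** Rejects duplicate non-symbolic ALT alleles but lets symbolic ones repeat. */
    public static void validate(List<String> alts) {
        Set<String> seen = new HashSet<>();
        for (String alt : alts) {
            if (isSymbolic(alt)) {
                continue; // e.g. <CNV:TR>,<CNV:TR> is accepted
            }
            // Compare sequence alleles case-insensitively.
            if (!seen.add(alt.toUpperCase())) {
                throw new IllegalArgumentException("Duplicate allele: " + alt);
            }
        }
    }

    public static void main(String[] args) {
        validate(List.of("<CNV:TR>", "<CNV:TR>")); // passes
        try {
            validate(List.of("T", "T"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch keeps the existing uniqueness restriction for sequence alleles and only skips it for symbolic ones, which matches the distinction drawn in the comment above.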
