Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: The number of reported differential peaks is much lower than the number of originally called peaks: Why? #402

Open
eregenyi opened this issue Jun 11, 2020 · 1 comment
Assignees

Comments

@eregenyi
Copy link

Hi Taoliu,

I am comparing two transcription factor's peaks. I analyzed unique peaks, meaning peaks that had no overlap (0 overlapping bases) between the two TFs. I was also interested in differentially enriched peaks, so I followed the differential binding event wiki article (https://github.com/macs3-project/MACS/wiki/Call-differential-binding-events).

However, inspecting the results, I found that much less of the peaks are reported in the differential analysis compared to the originally called peaks. Looking at the diff_.....bed peaks in IGV, some very obvious peaks at known target promoters are missed (they appear neither among the common, nor the unique peaks).

Number of lines in TF1- and TF2_peaks.xls: ~90 000 and ~130 000
Number of lines in diff_common: 31 360
diff_TF1: 1 172
diff_TF2: 329

I kept the 'd' parameter constant.

Why is this?
Could I tweak the parameters to include more peaks?
Can I somehow use these peaks? I have strong strong feelings against using them after seeing that some obvious peaks are not reported in _common.bed.

Thank you for your help in advance

@taoliu
Copy link
Contributor

taoliu commented Jun 26, 2020

Sorry for my late reply! This is a common issue in ChIP-seq differential analysis. If you only look at 'unique peaks' that are not overlapping between conditions. Many of them may be called due to the cutoff of the original peak calling, especially when the peak significance (measured in p-value/q-value) is around the set cutoff. I think you understand this and this is your 'strong strong feeling against using them'. Regarding MACS way to call differential peaks, it won't just divide all peaks into common, diff_TF1, and diff_TF2. It won't report those in between -- which means the region is neither significantly biased to either condition or significantly 'equal'. You can try to lower the 'likelihood ratio cutoff' of bdgdiff -- macs2 bdgdiff -C 2 to set a likelihood ratio=100 instead of the default -C 3 (1000).

@taoliu taoliu self-assigned this Jun 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants