Q: The number of reported differential peaks is much lower than the number of originally called peaks: Why? #402

eregenyi · 2020-06-11T11:49:12Z

Hi Taoliu,

I am comparing two transcription factor's peaks. I analyzed unique peaks, meaning peaks that had no overlap (0 overlapping bases) between the two TFs. I was also interested in differentially enriched peaks, so I followed the differential binding event wiki article (https://github.com/macs3-project/MACS/wiki/Call-differential-binding-events).

However, inspecting the results, I found that much less of the peaks are reported in the differential analysis compared to the originally called peaks. Looking at the diff_.....bed peaks in IGV, some very obvious peaks at known target promoters are missed (they appear neither among the common, nor the unique peaks).

Number of lines in TF1- and TF2_peaks.xls: ~90 000 and ~130 000
Number of lines in diff_common: 31 360
diff_TF1: 1 172
diff_TF2: 329

I kept the 'd' parameter constant.

Why is this?
Could I tweak the parameters to include more peaks?
Can I somehow use these peaks? I have strong strong feelings against using them after seeing that some obvious peaks are not reported in _common.bed.

Thank you for your help in advance

taoliu · 2020-06-26T20:23:29Z

Sorry for my late reply! This is a common issue in ChIP-seq differential analysis. If you only look at 'unique peaks' that are not overlapping between conditions. Many of them may be called due to the cutoff of the original peak calling, especially when the peak significance (measured in p-value/q-value) is around the set cutoff. I think you understand this and this is your 'strong strong feeling against using them'. Regarding MACS way to call differential peaks, it won't just divide all peaks into common, diff_TF1, and diff_TF2. It won't report those in between -- which means the region is neither significantly biased to either condition or significantly 'equal'. You can try to lower the 'likelihood ratio cutoff' of bdgdiff -- macs2 bdgdiff -C 2 to set a likelihood ratio=100 instead of the default -C 3 (1000).

eregenyi added the General Question label Jun 11, 2020

taoliu self-assigned this Jun 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Q: The number of reported differential peaks is much lower than the number of originally called peaks: Why? #402

Q: The number of reported differential peaks is much lower than the number of originally called peaks: Why? #402

eregenyi commented Jun 11, 2020

taoliu commented Jun 26, 2020

Q: The number of reported differential peaks is much lower than the number of originally called peaks: Why? #402

Q: The number of reported differential peaks is much lower than the number of originally called peaks: Why? #402

Comments

eregenyi commented Jun 11, 2020

taoliu commented Jun 26, 2020