Test data sets

No existing tools are available to benchmark mHapTools, so we manually checked random regions to confirm that our implementation is correct. In principle, CpG site methylation derived from a BAM file should be highly correlated, if not identical, with that derived from the corresponding mHap file. We therefore tested the concordance of mean methylation on three samples, which were sequenced as single-end or paired-end and processed by Bismark and BSMAP. The test datasets and associated results are available here. Fastq files for two samples (HUES64 and HCT116) were downloaded from GEO and processed in house. BAM and CpG bedMethylation files for Adrenal Gland were downloaded from ENCODE and processed by mHapTools.

Data sets

Sample | HUES64 | HCT116 | Fetal Adrenal Gland
Library | Single-end | Paired-end | Paired-end
Source | GEO | GEO | ENCODE
Fastq | SRX759486.fastq.gz | SRX999983_1.fastq.gz, SRX999983_2.fastq.gz | ENCLB358OMC
BAM | HUES64_BSMAP, HUES64_BISMARK | HCT116_BSMAP, HCT116_BISMARK | ENCFF285ZGH
CpG methylation (Bismark/BSMAP) | HUES64_BSMAP_CpGs, HUES64_BISMARK_CpGs | HCT116_BSMAP_CpGs, HCT116_BISMARK_CpGs | ENCFF826PSA
mHap | HUES64_BSMAP_mHap, HUES64_BISMARK_mHap | HCT116_BSMAP_mHap, HCT116_BISMARK_mHap | Adrenal_Gland_mHap
CpG methylation (mHapTools) | HUES64_BSMAP_mHap_CpG, HUES64_BISMARK_mHap_CpG | HCT116_BSMAP_mHap_CpG, HCT116_BISMARK_mHap_CpG | Adrenal_Gland_mHap_CpG

Concordance

Using the demo samples (HUES64 and HCT116), we compared CGI-level mean methylation calculated from the aligner output with that calculated by mHapTools. In general, good correlation is observed for reads aligned by either BSMAP or Bismark.
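
As an illustration of this kind of check, the following is a minimal Python sketch and not the script used to produce the results above. It assumes that per-CpG counts from each source have already been loaded into dictionaries, that region mean methylation is defined as methylated reads divided by total reads over all covered CpGs in the region, and that concordance is summarised with a Pearson correlation. The function names and input layout are hypothetical and are not part of mHapTools.

```python
from math import sqrt


def region_mean_methylation(sites, regions):
    """Return {region: mean methylation} for regions with covered CpGs.

    sites   : dict mapping (chrom, pos) -> (methylated_reads, total_reads)
    regions : iterable of (chrom, start, end), half-open [start, end)
    Assumption: the mean is read-weighted (sum of methylated / sum of total).
    """
    means = {}
    for chrom, start, end in regions:
        meth = total = 0
        for (c, pos), (m, t) in sites.items():
            if c == chrom and start <= pos < end:
                meth += m
                total += t
        if total > 0:  # skip regions without any coverage
            means[(chrom, start, end)] = meth / total
    return means


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def concordance(sites_a, sites_b, regions):
    """Correlate region-level means over regions covered in both sources."""
    means_a = region_mean_methylation(sites_a, regions)
    means_b = region_mean_methylation(sites_b, regions)
    shared = sorted(set(means_a) & set(means_b))
    return pearson([means_a[r] for r in shared],
                   [means_b[r] for r in shared])
```

Here `sites_a` would hold the per-CpG counts reported by the aligner and `sites_b` those reported by mHapTools for the same sample. The nested scan over all sites is only suitable for small test regions; an interval index would be needed at genome scale.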