DNA methylation haplotype format and mHapTools

Abstract

Bisulfite sequencing (BS-seq) is the gold standard for measuring genome-wide DNA methylation profiles at single-nucleotide resolution. Most analyses focus on mean CpG methylation and ignore the methylation status of CpGs on the same DNA fragment, a pattern known as a DNA methylation haplotype. Here, we propose mHap, a simple DNA methylation haplotype format for storing BS-seq DNA methylation data. It reduces the size of a BAM file 40- to 140-fold while retaining all read-level CpG methylation information. It is also compatible with the Tabix tool for fast random access. We implemented a command-line tool, mHapTools, for converting BAM/SAM files from existing platforms to mHap files and for post-processing DNA methylation data in mHap format. With this tool, we processed all publicly available human reduced representation bisulfite sequencing (RRBS) data and provide the results as a comprehensive mHap database.

Note: The start and end columns in the mHap format represent the genomic coordinates of the first and last CpG sites in each haplotype. They have been updated here to correct an error in the online publication.
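
To illustrate the idea of a methylation haplotype record, the following minimal Python sketch collapses read-level CpG calls into records whose start and end are the coordinates of the first and last CpG on each read, as described in the note above. It is not the mHapTools implementation. The input layout, the function name collapse_reads, the 0/1 encoding (1 = methylated), and the order of fields in each record are assumptions made for this example only.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse read-level CpG calls into haplotype records (illustrative only).

    Each read is a tuple (chrom, strand, calls), where calls is a list of
    (cpg_position, is_methylated) pairs. This input layout is hypothetical.
    Returns tuples (chrom, start, end, pattern, count, strand); the field
    order is an assumption for this sketch, not a format specification.
    """
    counts = Counter()
    for chrom, strand, calls in reads:
        if not calls:
            continue  # a read covering no CpG carries no haplotype
        calls = sorted(calls)
        start = calls[0][0]   # coordinate of the first CpG on the read
        end = calls[-1][0]    # coordinate of the last CpG on the read
        # Assumed encoding: "1" = methylated CpG, "0" = unmethylated CpG.
        pattern = "".join("1" if methylated else "0" for _, methylated in calls)
        counts[(chrom, start, end, pattern, strand)] += 1
    # Identical haplotypes are stored once, with the number of supporting reads.
    return [key[:4] + (n, key[4]) for key, n in sorted(counts.items())]


reads = [
    ("chr1", "+", [(100, True), (120, False)]),
    ("chr1", "+", [(100, True), (120, False)]),
    ("chr1", "-", [(150, False)]),
    ("chr1", "+", []),
]
records = collapse_reads(reads)
for record in records:
    print("\t".join(str(field) for field in record))
```

Storing each distinct haplotype once together with a read count, rather than one line per read, is one plausible reason such a representation can be much smaller than the alignment file while preserving read-level CpG methylation information.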