Description
Motivation: Counting molecules using next-generation sequencing (NGS) suffers from PCR amplification bias, which reduces the accuracy of many quantitative NGS-based experimental methods such as RNA-Seq. This is true even if molecules are made distinguishable using unique molecular identifiers (UMIs) before PCR amplification, and distinct UMIs are counted instead of reads: Molecules that are lost entirely during the sequencing process will still cause under-estimation of the molecule count, and amplification artifacts like PCR chimeras create phantom UMIs and thus cause over-estimation.
TRUmiCount uses a mechanistic model of PCR amplification to correct for both
types of errors. In our
paper we demonstrate that the phantom-filtered and loss-corrected molecule counts
computed by TRUmiCount measure the true number of molecules with considerably
higher accuracy than the raw number of distinct UMIs.
Publication
TRUmiCount is described in detail in our paper:
Florian G. Pflug, Arndt von Haeseler. (2018). TRUmiCount: Correctly counting absolute numbers of molecules using unique molecular identifiers. Bioinformatics, DOI: 10.1093/bioinformatics/bty283.
If you use TRUmiCount, please cite this publication!
Availability
Using Conda, you can install TRUmiCount from the Bioconda channel by running
conda install -c bioconda trumicount
For more detailed instructions and other installation options, see the manual.
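After installation, it can be useful to confirm that the command-line tools used in the example further down (curl, samtools and trumicount) are actually on the PATH. The following is a minimal sketch, not part of TRUmiCount itself; the helper name missing_tools is hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of TRUmiCount): print every command from the
# argument list that cannot be found on PATH, one per line, and return
# non-zero if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Tools used by the example commands on this page.
if missing="$(missing_tools curl samtools trumicount)"; then
    echo "all required tools found"
else
    # Word splitting of $missing is intentional: printf repeats the format
    # once per missing tool name.
    printf 'missing tool: %s\n' $missing >&2
fi
```

If a tool is reported as missing, installing it into the same Conda environment as TRUmiCount is usually the simplest fix.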
Source code
The code is available at https://github.com/Cibiv/trumicount.git
Datasets
kv_1000g.bam: Mapped single-end reads for the first 1000 genes from replicate 1 of the D. melanogaster data by Kivioja et al. (Counting absolute numbers of molecules using unique molecular identifiers. Nature Methods 9, 72-74, 2011)
sh_100g.bam: Mapped reads for the first 100 genes from replicate 1 of the E. coli data by Shiroguchi et al. (Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes. PNAS 109, 1347-1352, 2012)
Running TRUmiCount
To run TRUmiCount on the example data file kv_1000g.bam, the file must first be downloaded and then indexed with samtools:
curl -O https://cibiv.github.io/trumicount/kv_1000g.bam
samtools index kv_1000g.bam
The indexed BAM file can then be processed with trumicount to produce a table (kv_1000g.tab) containing bias-corrected numbers of transcript molecules for each gene:
trumicount --input-bam kv_1000g.bam --molecules 2 --threshold 2 --genewise-min-umis 3 --output-counts kv_1000g.tab
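Once the command has finished, a quick sanity check of the resulting table can catch an empty or truncated output early. The sketch below assumes that kv_1000g.tab is tab-separated and starts with a header line; neither is stated on this page, so check the manual for the actual format. The function name summarize_table is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report the number of columns (taken from the first
# line, ASSUMED to be a header) and the number of data rows after it in a
# tab-separated file.
summarize_table() {
    awk -F '\t' '
        NR == 1 { ncols = NF }
        END     { printf "columns=%d rows=%d\n", ncols, (NR > 0 ? NR - 1 : 0) }
    ' "$1"
}

table="kv_1000g.tab"
if [ -f "$table" ]; then
    summarize_table "$table"
    # Show the first few lines for a visual check.
    head -n 5 "$table"
else
    echo "$table not found; run trumicount first" >&2
fi
```

A row count of zero, or a column count of one, would suggest that the run failed or that the separator assumption does not hold.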
For a brief list of TRUmiCount's command-line options, run trumicount --help.
For an in-depth description of the possible operating modes, input and output
formats, and command-line options, see the manual.
TRUmiCount is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.