TRUmiCount

Correctly counting molecules using unique molecular identifiers

Description

Motivation: Counting molecules using next-generation sequencing (NGS) suffers from PCR amplification bias, which reduces the accuracy of many quantitative NGS-based experimental methods such as RNA-Seq. This is true even if molecules are made distinguishable using unique molecular identifiers (UMIs) before PCR amplification, and distinct UMIs are counted instead of reads: Molecules that are lost entirely during the sequencing process will still cause under-estimation of the molecule count, and amplification artifacts like PCR chimeras create phantom UMIs and thus cause over-estimation.

Results: TRUmiCount uses a mechanistic model of PCR amplification to correct for both types of errors. In our paper we demonstrate that the phantom-filtered and loss-corrected molecule counts computed by TRUmiCount measure the true number of molecules with considerably higher accuracy than the raw number of distinct UMIs.

Publication

TRUmiCount is described in detail in our paper:

Florian G. Pflug, Arndt von Haeseler. (2018). TRUmiCount: Correctly counting absolute numbers of molecules using unique molecular identifiers. Bioinformatics, DOI: 10.1093/bioinformatics/bty283.

If you use TRUmiCount, please cite this publication!

Availability

Install from

If you're already using Conda, you can install TRUmiCount from the Bioconda channel by doing
conda install -c bioconda trumicount

For more detailed instruction and other installation options see the manual.

Sourecode on

The code is avaiable on https://github.com/Cibiv/trumicount.git

Usage Examples

Datasets

kv_1000g.bam: Mapped single-end reads for first 1000 genes from replicate 1 of the D. melanogaster data by Kivioja et al. (Counting absolute numbers of molecules using unique molecular identifiers. Nature Methods 9, 72-74, 2011)

sh_100g.bam: Mapped reads for first 100 genes from replicate 1 of the E. coli data by Shiroguchi et al. (Shiroguchi et al. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes. PNAS 109, 1347-1352, 2012)

Running TRUmiCount

To run TRUmiCount on the example data file kv_1000g.bam, the firs must first be downloaded, and index with samtools by doing:
curl -O https://cibiv.github.io/trumicount/kv_1000g.bam
samtools index kv_1000g.bam

This indexed BAM-File can then be processed with trumicount to produce a table containing bias-corrected numbers of transcript molecules for each gene (kv_1000g.tab)

trumicount --input-bam kv_1000g.bam --molecules 2 --threshold 2 --genewise-min-umis 3 --output-counts kv_1000g.tab

For a brief list of command-line options of TRUmiCount see trumicount --help, for an in-depth description of the possible operating modes, input and output formats and command-line options see the manual

License

TRUmiCount is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

TRUmiCount is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.