iPool-Seq Pipeline

An analysis pipeline for insertion pool sequencing data

Description

The iPool-Seq protocol (Uhse et al. 2018, Uhse et al. 2019) enables large-scale insertional mutagenesis screens of pathogens such as U. maydis (maize smut). It combines tagmentation, affinity purification and unique molecular identifiers (UMIs) to overcome the severe underrepresentation of the pathogen's genetic material within the host, and allows mutant abundances to be quantified via next-generation sequencing (NGS) accurately enough to detect differences in infection efficacy between mutants and the wild type.

Apart from the wet-lab protocol, achieving this level of accuracy requires careful analysis of the sequencing data to remove artifacts and to account for differences in mutant abundance in the pre-infection mutant pool, for mutant-specific PCR biases, and for differing sequencing depths and detection efficiencies between libraries.

Our iPool-Seq analysis pipeline is based on the TRUmiCount algorithm (Pflug and von Haeseler 2018) for the quantitative analysis of UMI data, and takes care of all steps of the analysis of iPool-Seq data. Starting from raw sequencing reads, it computes the differential virulences of the mutants in the pre-infection pool compared to a set of reference mutants.

Using the Pipeline

Downloading from GitHub

Download the latest release of the iPool-Seq analysis pipeline, and unzip it. On a Linux terminal, this is achieved with
VER=latest-release
URL=http://github.com/Cibiv/ipoolseq-pipeline/archive
curl -L -O $URL/$VER.tar.gz
tar xzf $VER.tar.gz
cd ipoolseq-pipeline-$VER
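
The download steps above can also be wrapped so that a failed download does not silently leave a broken archive behind. The following is a minimal sketch, not part of the pipeline: the helper names release_url and fetch_release are hypothetical, and the sketch assumes (as the commands above do) that the archive unpacks into ipoolseq-pipeline-$VER. It uses https instead of http.

```shell
# release_url: hypothetical helper, not part of the pipeline.
# Prints the archive URL for the release tag given as $1.
release_url() {
  printf 'https://github.com/Cibiv/ipoolseq-pipeline/archive/%s.tar.gz\n' "$1"
}

# fetch_release: hypothetical helper, not part of the pipeline.
# Downloads and unpacks one release tag, then changes into the unpacked
# directory. The chain stops at the first step that fails.
fetch_release() {
  ver=$1
  # -f makes curl fail on HTTP errors instead of saving an error page,
  # -L follows redirects, -O saves under the file name from the URL.
  curl -fL -O "$(release_url "$ver")" &&
    tar xzf "$ver.tar.gz" &&
    cd "ipoolseq-pipeline-$ver"
}

# Example:
#   fetch_release latest-release
```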

Installing a Conda environment containing all necessary dependencies

The file environment.yaml defines a Conda environment that provides all programs necessary for running the iPool-Seq analysis pipeline. To ensure that this environment stays reproducible even if Conda packages are replaced or removed, our source code repository also contains environment.tar.gz, a conda-pack archive of that environment. To unpack that environment into ./environment and make it usable, run
./install-environment.sh
Remember that, as with all Conda environments, this environment must be activated for the current terminal session before it can be used:
source ./environment/bin/activate
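
To confirm that the activation took effect, one can check that the programs the pipeline relies on resolve on the PATH. The following is a minimal sketch: check_tools is a hypothetical helper, not part of the pipeline, and which tools to check is an assumption (snakemake is the one command this README invokes directly).

```shell
# check_tools: hypothetical helper, not part of the pipeline.
# Prints where each named program was found and returns non-zero
# if any of them is missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if location=$(command -v "$tool"); then
      printf 'found    %s -> %s\n' "$tool" "$location"
    else
      printf 'MISSING  %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (after activating the environment):
#   check_tools snakemake
```

If the environment is active, the reported path would typically point into ./environment/bin.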

Testing the installation

The iPool-Seq protocol was introduced by Uhse et al. (2018). To download and analyse their experiment A1 with the iPool-Seq pipeline, run
snakemake data/Uhse_et_al.2018/expA.r1.dv.tab
The pipeline will generate the table data/Uhse_et_al.2018/expA.r1.dv.tab containing the results of the differential virulence analysis for the mutants screened by Uhse et al., and will produce an accompanying report data/Uhse_et_al.2018/expA.r1.dv.html that can be viewed with a web browser.
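
Once the run has finished, a quick check that both files exist and are non-empty can catch a run that stopped early. The following is a minimal sketch; report_outputs is a hypothetical helper, not part of the pipeline, and it makes no assumption about the contents or columns of the table.

```shell
# report_outputs: hypothetical helper, not part of the pipeline.
# Succeeds only if every listed file exists and is non-empty;
# prints the line count of each file it finds.
report_outputs() {
  rc=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      printf 'ok       %s (%s lines)\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
    else
      printf 'missing  %s\n' "$f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (after the snakemake run above has finished):
#   report_outputs data/Uhse_et_al.2018/expA.r1.dv.tab \
#                  data/Uhse_et_al.2018/expA.r1.dv.html
```

To preview which steps Snakemake would run without executing them, the same target can be passed together with Snakemake's dry-run option -n.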

Analyzing your own data

See our publication (Uhse et al. 2019) in Current Protocols in Plant Biology, which describes both the wet-lab and the data-analysis parts of iPool-Seq in detail and includes a step-by-step description of how to use this pipeline.

For a brief overview of the necessary input files, run

snakemake help

Publications

Simon Uhse, Florian G. Pflug, Arndt von Haeseler, Armin Djamei (2019). Insertion pool sequencing for insertional mutant analysis in complex host-microbe interactions. Current Protocols in Plant Biology 4: e20097. DOI: 10.1002/cppb.20097

Simon Uhse, Florian G. Pflug, Alexandra Stirnberg, Klaus Ehrlinger, Arndt von Haeseler, Armin Djamei (2018). In vivo insertion pool sequencing identifies virulence factors in a complex fungal–host interaction. PLoS Biology 16(4): e2005129. DOI: 10.1371/journal.pbio.2005129

Florian G. Pflug, Arndt von Haeseler (2018). TRUmiCount: correctly counting absolute numbers of molecules using unique molecular identifiers. Bioinformatics 34(18): 3137–3144. DOI: 10.1093/bioinformatics/bty283

License

The iPool-Seq pipeline is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The iPool-Seq pipeline is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.