Differential Virulence Report for expA.r1 (Uhse_et_al.2018)

1. Files
2. Quality Control
3. Results

1. Files

The following table lists the main input files used to compute differential virulences, and the name of the generated results table. The results table lists, for each knockout, the change in virulence (as a log2 fold change of normalized output abundance relative to the neutral reference set), p-values, and FDR-corrected (using the method of Benjamini-Hochberg) q-values for the significance of the change.

	file	report
list of insertional knockouts (mutants)	cfg/Uhse_et_al.2018/knockouts.gff
knockout abundances in (pre-infection) input pool	expA.r1-in.count.tab	TRUmiCount report
knockout abundances in (post-infection) output pool	expA.r1-out.count.tab	TRUmiCount report
differential virulence results table	expA.r1.dv.tab
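
The Benjamini-Hochberg correction mentioned above can be sketched as follows. This is a minimal illustration of the standard procedure, not the code used to generate the results table; the function name benjamini_hochberg is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg q-values in the original order of pvals."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals
```

A knockout would then be called significant at an FDR threshold of 0.05 if its q-value is at most 0.05.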

2. Quality Control

Sequencing and Trimming

FastQC reports for the trimmed reads. There should be little or no remaining adapter content, and per-base qualities should not drop too much at the end of the reads, otherwise further trimming might be required.

	Input pool	Output pool
Read 1	FastQC report	FastQC report
Read 2	FastQC report	FastQC report

Read and UMI count statistics

The following table lists the number of remaining read pairs, and the number of unique UMIs within these pairs, after each analysis step. The percentages are relative to the first number within each column. Since unique UMIs are determined for the 5’ and 3’ flank of each knockout, no UMI counts are reported until reads have been assigned to the individual knockouts (and their flanks).

Also check the TRUmiCount report (found above under Files) for details about the performance of the TRUmiCount UMI filtering step.

	#Reads Input	#UMIs Input	#Reads Output	#UMIs Output
Sequencing	2085955 (100%)	-	2306739 (100%)	-
Trimming	1947564 (93%)	-	2157059 (94%)	-
Mapping	1835272 (88%)	-	2013912 (87%)	-
Assignment	1780327 (85%)	1690649 (100%)	1949356 (85%)	1621117 (100%)
TRUmiCount	1700374 (82%)	1571493 (93%)	1836147 (80%)	1478861 (91%)
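
The percentages in the table can be reproduced from the counts alone. The sketch below assumes the percentages are rounded to the nearest integer (the report does not state its rounding rule); the helper name percent_of_first is hypothetical.

```python
def percent_of_first(counts):
    """Express each count as a rounded percentage of the first count."""
    first = counts[0]
    return [round(100 * c / first) for c in counts]

# "#Reads Input" column, from Sequencing down to TRUmiCount.
reads_input = [2085955, 1947564, 1835272, 1780327, 1700374]
print(percent_of_first(reads_input))  # [100, 93, 88, 85, 82]
```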

Correlation of 3’ and 5’ Flank Abundances

Since the abundance of each knockout is measured twice, once for the 5’ and once for the 3’ flank of the knockout cassette integration, the correlation of these two measurements provides a quality check of the data. The following table lists the correlation of both the raw and the TRUmiCount-corrected UMI counts detected for the 5’ and 3’ flanks of each knockout.

Pool	Type	Correlation 5’ vs. 3’
input	Raw (after UMI-Tools and read-count threshold)	0.769
output	Raw (after UMI-Tools and read-count threshold)	0.777
input	Loss-corrected (after TRUmiCount)	0.912
output	Loss-corrected (after TRUmiCount)	0.874

The following plots show the correlation of 5’ and 3’ UMI counts (again raw and loss-corrected by TRUmiCount) in more detail.

(Zero detected UMIs are shown in these logarithmic plots with a cross “x” instead of an open circle “o”)

Correlation of Input and Output Abundances

In iPool-Seq-based screens, the abundances of the different knockout strains are often spread over multiple orders of magnitude, and differences in input abundances can thus affect the abundance in the output more strongly than the mutant’s phenotypes. The statistical model used to detect significant changes of virulence must thus take the input abundance into account, and assumes that for neutral knockouts, input and output abundances are proportional.

The following table shows the observed correlation of raw and loss-corrected input and output abundances (averaged across the 5’ and 3’ flank measurements).

Type	Correlation Input vs. Output
Raw (after UMI-Tools and read-count threshold)	0.950
Loss-corrected (after TRUmiCount)	0.954

The following plots show the correlation of input and output abundances in more detail.

(Zero detected UMIs are shown in these logarithmic plots with a cross “x” instead of an open circle “o”)

3. Results

Statistical Model and Parameters

The (raw) output UMI count \(N_\text{out}\) of a neutral knockout, given its (raw) input UMI count \(N_\text{in}\) and the TRUmiCount-estimated losses (i.e. fractions of lost UMIs) \(\ell_\text{in}\) for the input and \(\ell_\text{out}\) for the output, is modelled with the following negative binomial model:

\[ N_\text{out} \,|\, N_\text{in} \;\sim\; \text{NegBin}\left(\mu:=N_\text{in}\cdot\lambda\cdot\frac{1-\ell_\text{out}}{1-\ell_\text{in}}, r:=\frac{N_\text{in}}{1+d\cdot N_\text{in}}\right). \]

(UMI counts of 5’ and 3’ flanks are added to obtain \(N_\text{in}\) and \(N_\text{out}\), losses for 5’ and 3’ flanks are averaged to obtain \(\ell_\text{in}\) and \(\ell_\text{out}\)). Parameter \(\lambda\) measures the relative size of the output library (i.e. loss-corrected total number of UMIs per neutral knockout) compared to the input. Parameter \(d\) represents the biological contribution to the squared coefficient of variation of the (raw) output UMI counts \(N_\text{out}\). The total squared coefficient of variation \(\text{CV}^2 = \sigma^2 / \mu^2\) of \(N_\text{out}\) (where \(\mu\) is the mean and \(\sigma^2\) the variance of \(N_\text{out}\)) comprises three contributors,

\[ \text{CV}^2 = \frac{1}{\mu} + \frac{1}{N_\text{in}} + d, \]

where the first two are of technical nature and represent the variation due to (Poissonian, i.e. non-exhaustive) sampling of genomes in the input (\(1/N_\text{in}\)) and output (\(1/\mu\)) pools, and the third (\(d\)) represents the biological variation due to growth differences between neutral mutants.
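
The decomposition of the squared coefficient of variation follows from the negative binomial variance \(\mu + \mu^2/r\). The sketch below checks this numerically; it is an illustration only, with hypothetical function names. The input count of 100 and the losses of 0.10 and 0.20 are made-up example values, while lambda = 0.552 and d = 0.0182 are the fitted values given in this report.

```python
def negbin_parameters(n_in, lam, d, loss_in, loss_out):
    """Mean mu and dispersion r of the model for a neutral knockout."""
    mu = n_in * lam * (1.0 - loss_out) / (1.0 - loss_in)
    r = n_in / (1.0 + d * n_in)
    return mu, r

def squared_cv(mu, r):
    """Squared coefficient of variation of NegBin(mu, r)."""
    variance = mu + mu * mu / r  # negative binomial variance
    return variance / (mu * mu)

# Hypothetical knockout: 100 input UMIs, losses of 10% (input) and 20% (output).
n_in, lam, d = 100, 0.552, 0.0182
mu, r = negbin_parameters(n_in, lam, d, loss_in=0.10, loss_out=0.20)

# The model CV^2 equals the sum of the three contributions.
assert abs(squared_cv(mu, r) - (1.0 / mu + 1.0 / n_in + d)) < 1e-12
```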

The optimal (likelihood-maximizing) values for parameters \(\lambda\) and \(d\) for the reference set of neutral knockouts are:

Relative output library size (lambda)	Biological contribution to squared CV (d)
0.552	0.0182

Differential virulence compared to the neutral reference set

For every knockout strain detected in both input and output with (raw) UMI counts \(N_\text{in}\) and \(N_\text{out}\), respectively (counts of 5’ and 3’ flanks added), and TRUmiCount-estimated losses (i.e. fractions of lost UMIs) \(\ell_\text{in}\) and \(\ell_\text{out}\) (losses for 5’ and 3’ flanks averaged), the virulence log2 fold-change compared to the neutral reference set is

\[ \log_2 \Delta v = \log_2 \frac{N_\text{out} / (1-\ell_\text{out})}{\lambda\cdot N_\text{in} / (1-\ell_\text{in})}. \]

(\(\lambda\) is the relative size of the output library compared to the input library, see Statistical Model and Parameters)
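
The fold-change formula above translates directly into code. The following is a minimal sketch with a hypothetical function name; the example counts and losses in the usage lines are made up for illustration.

```python
import math

def log2_virulence_change(n_in, n_out, lam, loss_in, loss_out):
    """Log2 fold-change of virulence relative to the neutral reference set."""
    # Loss-corrected output abundance.
    corrected_out = n_out / (1.0 - loss_out)
    # Output abundance expected for a neutral knockout with this input.
    expected_out = lam * n_in / (1.0 - loss_in)
    return math.log2(corrected_out / expected_out)

# Hypothetical knockout with equal losses in both pools and lambda = 0.5:
# half the expected output abundance gives a log2 fold-change of -1.
print(log2_virulence_change(n_in=100, n_out=25, lam=0.5, loss_in=0.2, loss_out=0.2))
```

A value of zero means the knockout behaves like the neutral reference set; negative values indicate reduced and positive values increased virulence.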

Due to the negative binomial model (see Statistical Model and Parameters), the power to detect changes in virulence increases with the input abundance of a knockout. The following plot shows the change in virulence (as log2 fold-change) against the input abundance, and indicates which knockouts have a virulence significantly different from the neutral reference set at a false discovery rate (FDR) threshold of 0.05.

(The grey area of insignificant virulence changes is only approximate: it does not take any false discovery rate (FDR) or multiple testing correction into account, and it is computed from an average loss instead of the per-knockout loss percentage computed by TRUmiCount. Knockouts are called significant or insignificant based on an FDR threshold of 0.05.)

Reduced virulence compared to the neutral reference set

The following table lists p-values and (FDR-corrected) q-values for the virulence log2 fold-change being significantly smaller than zero, i.e. for the virulence being reduced compared to the neutral reference set.

(log2fc contains the log2 fold-changes of the virulence compared to the neutral reference set, pval contains p-values for the significance of log2fc < 0 under a negative binomial model, and qval contains FDR-corrected p-values. In the sig column, ***** means qval <= 0.00001, **** means qval <= 0.0001, *** means qval <= 0.001, ** means qval <= 0.01, * means qval <= 0.05, and + means qval <= 0.1. By default only knockouts with qval <= 0.1 are shown; remove the column filter to show all.)
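
One plausible way to obtain a one-sided p-value for reduced virulence is the lower tail of the negative binomial model, i.e. the probability that a neutral knockout yields at most the observed output count. The sketch below assumes this definition, which the report does not spell out; the function names are hypothetical, and \(\mu\) and \(r\) are the per-knockout model parameters from Statistical Model and Parameters.

```python
import math

def negbin_log_pmf(k, mu, r):
    """Log-probability of count k under NegBin with mean mu and dispersion r."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu)))

def lower_tail_pvalue(n_out, mu, r):
    """P(N <= n_out) under the neutral model (assumed p-value definition)."""
    total = sum(math.exp(negbin_log_pmf(k, mu, r)) for k in range(n_out + 1))
    return min(1.0, total)  # guard against rounding slightly above 1
```

Under the same assumption, the p-value for increased virulence would be the upper tail, 1 - lower_tail_pvalue(n_out - 1, mu, r).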

Increased virulence compared to the neutral reference set

The following table lists p-values and (FDR-corrected) q-values for the virulence log2 fold-change being significantly larger than zero, i.e. for the virulence being increased compared to the neutral reference set.

(log2fc contains the log2 fold-changes of the virulence compared to the neutral reference set, pval contains p-values for the significance of log2fc > 0 under a negative binomial model, and qval contains FDR-corrected p-values. In the sig column, ***** means qval <= 0.00001, **** means qval <= 0.0001, *** means qval <= 0.001, ** means qval <= 0.01, * means qval <= 0.05, and + means qval <= 0.1. By default only knockouts with qval <= 0.1 are shown; remove the column filter to show all.)
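
The sig legend used in the tables of reduced and increased virulence maps q-values to labels by threshold. A minimal sketch of that mapping, with a hypothetical function name, could look like this:

```python
def significance_label(qval):
    """Map a q-value to the sig label described in the table legend."""
    thresholds = [
        (0.00001, "*****"),
        (0.0001, "****"),
        (0.001, "***"),
        (0.01, "**"),
        (0.05, "*"),
        (0.1, "+"),
    ]
    # Thresholds are ordered from strictest to most lenient.
    for cutoff, label in thresholds:
        if qval <= cutoff:
            return label
    return ""  # not significant at any listed threshold

print(significance_label(0.03))  # *
```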