EMOGEE and EMOGEE Tools (Last Update: 04.07.2007)
Introduction:
Recent studies report
that differences in gene expression levels between species are positively
correlated with the time that has passed since the species split
from a common ancestor (Ranz et al., 2006). Moreover,
Khaitovich et al. (2004) found a linear relationship between
divergence time and expression differences. This linearity can be
explained by the neutral theory (Kimura, 1983). Consequently,
a neutral model for gene expression evolution was suggested
(Khaitovich et al., 2005). The model describes mutations in the
regulatory region of a gene by a compound Poisson process. The
size of each change in the expression level is described by a
continuous distribution, referred to here as the mutation effect
distribution. That is, whenever a mutation occurs, the gene
expression level changes according to the mutation effect
distribution.
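The compound Poisson description can be illustrated with a short simulation. This is only a minimal sketch and not the EMOGEE implementation: the function names, the mutation rate, and the zero-mean normal mutation effect distribution are assumptions chosen for illustration. Mutations arrive with exponentially distributed waiting times, and each one shifts the expression level by a draw from the mutation effect distribution.

```python
import random


def simulate_expression_change(rate, time, effect_sampler, rng):
    """Net expression change accumulated over `time` (hypothetical helper).

    Mutations arrive as a Poisson process with the given `rate`; each
    mutation adds one draw from the mutation effect distribution.
    """
    change = 0.0
    elapsed = rng.expovariate(rate)  # waiting time until the first mutation
    while elapsed <= time:
        change += effect_sampler(rng)
        elapsed += rng.expovariate(rate)  # waiting time until the next one
    return change


def mean_squared_change(rate, time, effect_sampler, n_genes, seed=0):
    """Average squared expression change over `n_genes` independent genes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_genes):
        total += simulate_expression_change(rate, time, effect_sampler, rng) ** 2
    return total / n_genes


def normal_effect(rng):
    """Assumed mutation effect distribution: normal, mean 0, sd 0.5."""
    return rng.gauss(0.0, 0.5)


if __name__ == "__main__":
    for t in (1.0, 2.0, 4.0):
        msc = mean_squared_change(2.0, t, normal_effect, n_genes=20000)
        print(f"time={t}: mean squared change = {msc:.3f}")
```

For a zero-mean effect distribution, the expected squared change equals rate × time × E[effect²], so in this sketch it grows linearly with divergence time, in line with the linear relationship mentioned above.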
The EMOGEE package implements the model by Khaitovich et al. (2005) and several extensions of that model.
In the first extension (M-gamma model), a gamma distribution is
used to describe mutation effects; it is more flexible than the
distributions used in the original model. In the second extension (M&E model),
non-mutational effects are taken into account. These effects
(e.g., metabolic and environmental effects) overlay mutational
changes of gene expression. They are described by an additional
parameter, which provides a better fit to real data and
makes it possible to estimate the respective influences of mutational and
non-mutational changes on the gene expression level.
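The idea of separating mutational from non-mutational contributions can be sketched as follows. This is a toy illustration under stated assumptions, not the parameterisation used in EMOGEE: gamma-distributed effect magnitudes with a random sign, an additive normal term for non-mutational effects, and the parameter names `shape`, `scale` and `noise_sd` are all hypothetical.

```python
import random


def mutational_change(rate, time, shape, scale, rng):
    """Sum of mutation effects over `time` (hypothetical helper).

    Assumption: effect magnitudes are gamma distributed and the
    direction of each effect (up or down) is equally likely.
    """
    change = 0.0
    elapsed = rng.expovariate(rate)  # waiting time until the first mutation
    while elapsed <= time:
        magnitude = rng.gammavariate(shape, scale)  # shape, then scale
        sign = 1.0 if rng.random() < 0.5 else -1.0
        change += sign * magnitude
        elapsed += rng.expovariate(rate)
    return change


def mean_squares(rate, time, shape, scale, noise_sd, n_genes, seed=0):
    """Mean squared mutational and non-mutational contributions.

    Assumption: non-mutational effects are modelled as additive normal
    noise with standard deviation `noise_sd`.
    """
    rng = random.Random(seed)
    mutational_ss = 0.0
    non_mutational_ss = 0.0
    for _ in range(n_genes):
        mutational_ss += mutational_change(rate, time, shape, scale, rng) ** 2
        non_mutational_ss += rng.gauss(0.0, noise_sd) ** 2
    return mutational_ss / n_genes, non_mutational_ss / n_genes


if __name__ == "__main__":
    mut, non_mut = mean_squares(1.0, 2.0, 2.0, 0.25, 0.5, n_genes=20000)
    print(f"mutational: {mut:.3f}, non-mutational: {non_mut:.3f}")
```

In this toy setting the mutational part grows with divergence time while the non-mutational part does not, which is what allows the two influences to be told apart in principle.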
Two applications were implemented on the basis of a variant of the M&E model with normally distributed mutation effects (M&E-normal model). They are located in the EMOGEE Tools package.
The first application is a Bayesian method to detect genes with mutations in their
regulatory regions. The second one is a non-neutrality
test which can be applied to gene expression data
sampled from individuals of a population. Based on this test one
can detect those genes that show a significant deviation from
expression levels under neutrality. The test is an adaptation of
the widely used Tajima's D test (Tajima, 1989). Before using the Bayesian method or the Tajima-type test, the model parameters
of the corresponding
data must be estimated with EMOGEE. The resulting estimates must then be entered into
the configuration file of EMOGEE Tools. Please see the manual for more details.
Reference:
The methods are described in the following PhD thesis:
If you are using EMOGEE or EMOGEE Tools, please cite this thesis.
Download of the software:
Data sets:
The following data sets were analysed in the thesis "Development and Applications of Neutral Models for Evolution of Gene Expression" and can be used directly with EMOGEE and EMOGEE Tools. The bracketed numbers denote the class labels of the individuals in the data sets. Please note that all data sets are preprocessed versions of published data. Please refer to the thesis for details and references.
Data of human (1), chimpanzee (2) and orangutan (3) (Chapter 3):
Data of human (1) and chimpanzee (2) (Chapter 4):
Data of human (1) and chimpanzee (2) without sex-related genes (Chapter 5):
Data of Mus musculus domesticus (1), Mus musculus musculus (2), Mus musculus ssp. (3), Mus musculus castaneus (4) and Mus spretus (5) (Chapters 3 and 4):
(for details on the mouse data, please see Voolstra C., Tautz D., Farbrother P., Eichinger L., Harr B. (2007)
Contrasting evolution of expression differences in the testis between species and subspecies of the house mouse,
Genome Research, 17, 42-49)
Medical data sets: adenocarcinomas (1) vs. squamous cell carcinomas (2), and normal bone marrow (1) vs. chronic myeloid leukaemia (2) (Chapter 6):