EMOGEE and EMOGEE Tools (Last Update: 04.07.2007)

Introduction:

Recent studies report that differences in gene expression levels between species are positively correlated with the time that has passed since the species split from a common ancestor (Ranz et al., 2006). Moreover, Khaitovich et al. (2004) found a linear relationship between divergence time and expression differences. This linearity can be explained by the neutral theory (Kimura, 1983). Consequently, a neutral model for the evolution of gene expression was proposed (Khaitovich et al., 2005). The model describes mutations in the regulatory region of a gene by a compound Poisson process. The effect of each mutation on the expression level is described by a continuous distribution, referred to here as the mutation effect distribution. That is, whenever a mutation occurs, the gene expression level changes by an amount drawn from the mutation effect distribution.
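To make the compound Poisson description concrete, the following Python sketch simulates expression divergence between two species that split t time units ago. It is a minimal illustration under stated assumptions, not the EMOGEE implementation: the names `lineage_change` and `mean_squared_divergence`, the normal mutation effect distribution, the additive accumulation of effects and the use of the mean squared difference as the divergence measure are all hypothetical choices. Under these assumptions the expected squared difference is 2·rate·t·effect_sd², i.e. it grows linearly with divergence time.

```python
import random


def lineage_change(rate, t, effect_sd, rng):
    """Total expression change accumulated along one lineage over time t.

    Mutations arrive as a Poisson process with intensity `rate`
    (exponential waiting times). Each mutation shifts the expression level
    by an effect drawn from N(0, effect_sd**2), an ASSUMED mutation effect
    distribution chosen only for illustration.
    """
    change = 0.0
    elapsed = rng.expovariate(rate)
    while elapsed < t:
        change += rng.gauss(0.0, effect_sd)
        elapsed += rng.expovariate(rate)
    return change


def mean_squared_divergence(t, n_genes=20000, rate=1.0, effect_sd=1.0, seed=1):
    """Mean squared expression difference over many genes for species split t ago.

    Both lineages evolve independently from a common ancestral level,
    so only the difference of their accumulated changes matters.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_genes):
        d = lineage_change(rate, t, effect_sd, rng) - lineage_change(rate, t, effect_sd, rng)
        total += d * d
    return total / n_genes


for t in (1.0, 2.0, 4.0):
    print(t, round(mean_squared_divergence(t), 2))
```

With the default settings (rate = 1, effect_sd = 1) the printed averages should lie close to 2, 4 and 8, reflecting the linear relationship; the small deviations are simulation noise.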

The EMOGEE package implements the model of Khaitovich et al. (2005) and several extensions of it. The first extension (M-gamma model) describes mutation effects by a gamma distribution, which is more flexible than the distributions used in the original model. The second extension (M&E model) also takes non-mutational effects into account. These effects (e.g., metabolic and environmental effects) overlay the mutational changes of gene expression. They are described by an additional parameter, which provides a better fit to real data. This makes it possible to estimate the separate influences of mutational and non-mutational changes on the gene expression level.
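The two extensions can be illustrated with a similar simulation. In this hypothetical sketch, each mutation effect has a gamma-distributed magnitude with a random sign (the exact parameterization of the M-gamma model is not given here, so this is an assumption), and each observed expression level additionally carries independent normal noise with standard deviation `env_sd`, standing in for the non-mutational term of the M&E model. The sketch shows that mutational divergence grows linearly with time, whereas non-mutational noise adds a constant offset, so a straight line fitted to divergence versus time acquires a positive intercept of about 2·env_sd². All function and parameter names are illustrative.

```python
import random
import statistics


def lineage_change(rate, t, shape, scale, rng):
    """Mutational expression change along one lineage over time t.

    Mutations arrive as a Poisson process; each effect has a
    Gamma(shape, scale) magnitude and, as an ASSUMPTION, a random sign.
    """
    change = 0.0
    elapsed = rng.expovariate(rate)
    while elapsed < t:
        magnitude = rng.gammavariate(shape, scale)
        change += magnitude if rng.random() < 0.5 else -magnitude
        elapsed += rng.expovariate(rate)
    return change


def mean_squared_difference(t, rate, shape, scale, env_sd, n_genes, rng):
    """Average squared difference between two species' observed expression levels.

    Each observed level is the mutational change plus independent
    non-mutational noise N(0, env_sd**2), a hypothetical stand-in for
    the extra parameter of the M&E model.
    """
    total = 0.0
    for _ in range(n_genes):
        y1 = lineage_change(rate, t, shape, scale, rng) + rng.gauss(0.0, env_sd)
        y2 = lineage_change(rate, t, shape, scale, rng) + rng.gauss(0.0, env_sd)
        total += (y1 - y2) ** 2
    return total / n_genes


def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(7)
times = [0.5, 1.0, 2.0, 4.0]
curve = [
    mean_squared_difference(t, rate=1.0, shape=2.0, scale=0.5,
                            env_sd=1.0, n_genes=20000, rng=rng)
    for t in times
]
slope, intercept = least_squares(times, curve)
# Expected for these settings: slope = 2 * rate * shape * (shape + 1) * scale**2 = 3.0,
# intercept = 2 * env_sd**2 = 2.0 (the non-mutational noise does not grow with time).
print(round(slope, 2), round(intercept, 2))
```

Separating slope and intercept in this way is one simple way to see why an extra parameter for non-mutational effects can improve the fit; the estimation procedure actually used by EMOGEE may differ.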

Based on a variant of the M&E model with normally distributed mutation effects (M&E-normal model), two applications were implemented; they are provided in the EMOGEE Tools package. The first is a Bayesian method to detect genes with mutations in their regulatory regions. The second is a non-neutrality test that can be applied to gene expression data sampled from individuals of a population; it detects genes whose expression levels deviate significantly from those expected under neutrality. The test is an adaptation of the widely used Tajima's D test (Tajima, 1989). Before the Bayesian method or the Tajima-type test can be used, the model parameters for the data at hand must be estimated with EMOGEE, and the resulting estimates must be entered into the configuration file of EMOGEE Tools. Please see the manual for more details.
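The idea behind a Bayesian detection of regulatory mutations can be sketched with Bayes' rule. The following Python sketch assumes an M&E-normal-type setting: mutation effects are normal with variance `mut_var`, each species' observed level carries normal non-mutational noise with variance `env_var`, and the number of mutations separating the two species is Poisson with mean `expected_mutations` (for example 2·λ·t for a per-lineage mutation rate λ and divergence time t). It returns the posterior probability that at least one mutation occurred, given an observed between-species difference `d`. The function name, the parameters and this exact formulation are illustrative assumptions, not the method implemented in EMOGEE Tools.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def poisson_pmf(n, mean):
    """P(N = n) for N ~ Poisson(mean); requires mean > 0."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def prob_regulatory_mutation(d, expected_mutations, mut_var, env_var, max_mutations=200):
    """Posterior P(at least one mutation | observed difference d).

    Hypothetical model:
      n ~ Poisson(expected_mutations)            (mutations on both lineages)
      d | n ~ Normal(0, n * mut_var + 2 * env_var)
    The sum over n is truncated at max_mutations.
    """
    joint = [
        poisson_pmf(n, expected_mutations) * normal_pdf(d, n * mut_var + 2.0 * env_var)
        for n in range(max_mutations + 1)
    ]
    evidence = sum(joint)
    # Bayes' rule: 1 - P(n = 0 | d)
    return 1.0 - joint[0] / evidence


for d in (0.0, 0.5, 2.0):
    print(d, round(prob_regulatory_mutation(d, expected_mutations=0.5,
                                            mut_var=1.0, env_var=0.05), 3))
```

Small differences are better explained by non-mutational noise alone, so the posterior drops below the prior probability 1 - e^(-0.5) ≈ 0.39, whereas a large difference pushes it close to 1.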

Reference:

The methods are described in the PhD thesis "Development and Applications of Neutral Models for Evolution of Gene Expression". If you use EMOGEE or EMOGEE Tools, please cite this thesis.

Download of the software:

Data sets:

The following data sets were analysed in the thesis "Development and Applications of Neutral Models for Evolution of Gene Expression" and can be used directly with EMOGEE and EMOGEE Tools. The numbers in brackets are the class labels of the individuals in the data sets. Please note that all data sets are preprocessed versions of published data; see the thesis for details and references.
Data of human (1), chimpanzee (2) and orangutan (3) (Chapter 3):
Data of human (1) and chimpanzee (2) (Chapter 4):
Data of human (1) and chimpanzee (2) without sex-related genes (Chapter 5):
Data of Mus musculus domesticus (1), Mus musculus musculus (2), Mus musculus ssp. (3), Mus musculus castaneus (4) and Mus spretus (5) (Chapters 3 and 4):
(For details on the mouse data, please see Voolstra C., Tautz D., Farbrother P., Eichinger L., Harr B. (2007) Contrasting evolution of expression differences in the testis between species and subspecies of the house mouse. Genome Research, 17, 42-49.)
Medical data sets of adenocarcinomas (1) vs. squamous cell carcinomas (2), and of normal bone marrow (1) vs. chronic myeloid leukaemia (2) (Chapter 6):