The CIBIV wants to understand the
processes that have shaped the genomes of contemporary species.
To this end we apply methods from statistics, computer science,
mathematics and computational statistics to develop models that
mimic the process of evolution.
These methods are further investigated in close collaboration with
"wet" biologists to address real biological questions.
Currently we are working (in collaboration with various colleagues) on
the following aspects of molecular evolution:
- Alignments
Statistics of sequence alignment (e.g. mcmcalgn).
Recently we have extended this approach to reconstruct an alignment and
a phylogenetic tree simultaneously.
- Sequence evolution
To understand sequence evolution it is necessary to model the substitution
process. We are working on sequence models that allow dependencies among
sequence sites (Markov fields seem to be an appropriate tool).
We are developing test statistics to select the "best" model and to detect
groups of sequences that evolve differently from the rest of, say, a gene
family. We have developed a test to detect change points (branches where the
substitution model changes) in a phylogenetic tree.
Currently we are working on methods to detect the dependency structure
among sequence positions in an alignment.
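
For orientation, the simplest substitution model, Jukes-Cantor (JC69), can be written in a few lines. It assumes that sites evolve independently and that all substitutions are equally likely, which is exactly the kind of assumption that models with site dependencies relax. The sketch below is a textbook illustration only, not one of the models or tests described above; the function names are chosen for this example.

```python
import math

def jc69_transition_prob(t, same):
    """JC69 probability of ending in the same nucleotide (same=True) or in one
    specific different nucleotide (same=False) after time t, where t is
    measured in expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay

def jc69_distance(seq_a, seq_b):
    """Estimate the JC69 distance between two aligned sequences from the
    fraction of sites at which they differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The distance formula is the inverse of the transition probabilities: the expected fraction of differing sites after time t is 3/4 (1 - exp(-4t/3)), and solving for t gives the logarithmic correction.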
- Gene trees
We develop efficient heuristic algorithms to reconstruct trees based
on sequence data (e.g. TREE-PUZZLE).
To this end we have developed a parallel version of the TREE-PUZZLE program.
Moreover, we are currently developing a variant of TREE-PUZZLE,
which computes (maximum) likelihood trees for up to 1,000 sequences
in reasonable time. We are also working on super tree methods to
merge different gene trees to form one species tree.
Quartet-based tree reconstruction methods appear to be a versatile tool
to study super trees from a new perspective.
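
The basic unit of any quartet-based method is the choice among the three possible unrooted topologies for four sequences. As a minimal sketch, the code below makes that choice from pairwise distances using the four-point condition. This is a deliberately simpler criterion swapped in for illustration: TREE-PUZZLE evaluates quartets by likelihood, and the function name, the nested-dictionary distance format, and the example distances are all assumptions of this sketch.

```python
def quartet_topology(dist, a, b, c, d):
    """Return the split ((x, y), (z, w)) whose two within-pair distances
    have the smallest sum (four-point condition on additive distances)."""
    candidates = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def score(split):
        (x, y), (z, w) = split
        return dist[x][y] + dist[z][w]

    return min(candidates, key=score)

# Hypothetical distances from the tree ((A,B),(C,D)) with all branch lengths 1:
# taxa on the same side of the inner branch are 2 apart, all others 3 apart.
EXAMPLE_DIST = {
    "A": {"A": 0, "B": 2, "C": 3, "D": 3},
    "B": {"A": 2, "B": 0, "C": 3, "D": 3},
    "C": {"A": 3, "B": 3, "C": 0, "D": 2},
    "D": {"A": 3, "B": 3, "C": 2, "D": 0},
}
```

Calling `quartet_topology(EXAMPLE_DIST, "A", "B", "C", "D")` selects the split that groups A with B and C with D. A full method then has to combine many such quartet decisions into one tree, which is the hard part and is not shown here.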
- Population genetics
Gene trees also arise naturally in populations. Here, however, the gene
tree is a random variable, because it depends on which sample of sequences
is drawn from the population. We are interested in the development and
application of coalescence-based methods to infer the demographic history
of populations. In the future we plan to work on coalescence processes with
complex interaction patterns. In this context we have constructed the
so-called hvrbase, in which most of the hypervariable regions of primate
mitochondrial genomes are currently collected in a multiple sequence
alignment. This user-friendly database is currently being extended to store
other genomic regions.
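
To make concrete the statement that the gene tree of a sample is a random variable, the following sketch simulates the time to the most recent common ancestor under the standard coalescent with constant population size. This is the simplest textbook setting and an assumption of the example, not one of the methods described above; time is measured in coalescent units, so the population size is scaled out, and all function names are illustrative.

```python
import random

def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n sampled
    sequences: the sum of 1 / C(k, 2) for k = 2..n, which is 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)

def simulate_tmrca(n, rng):
    """Draw one TMRCA: while k lineages remain, the waiting time to the next
    coalescence is exponential with rate C(k, 2) = k * (k - 1) / 2."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t

def mean_simulated_tmrca(n, replicates, seed=0):
    """Average the simulated TMRCA over independent replicates."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(replicates)) / replicates
```

Each call to `simulate_tmrca` yields a different tree depth for the same sample size, while the average over many replicates approaches `expected_tmrca(n)`. Inferring demographic history amounts to asking which population-size history makes the observed data most plausible under such a process.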
- Complex patterns of evolution
To reconstruct the evolutionary history it is necessary to take more
complex events like lateral gene transfer (between species),
gene duplication, and gene loss into account. A combination of these
events may disturb the relation between species trees and gene trees.
Recently, we have developed a maximum-likelihood-based method to estimate
the amount of gene flow among prokaryotes by analyzing the COG database.
This full genome analysis poses a collection of new computational problems
as well as modeling problems. Our "Jukes-Cantor" type model of gene
transfer needs refinement. Moreover, we have to take into account the
duplication and loss of genes. This will be done in the near future.
- Species tree
The topics outlined above will eventually be employed to reconstruct
one gigantic species tree utilizing all the sequence data available for
the different species. Models of sequence evolution are necessary to
detect differently evolving regions in complete genomes. Tree
reconstruction methods for a large number of sequences allow the
reconstruction of gene trees with several hundred sequences, and finally
the patchiness of the available sequence data for different species
makes it necessary to apply super tree methods. A better understanding
of complex evolutionary patterns will also reveal instances where the gene
trees are different from the species tree. Once this is well understood
it seems reasonable to construct a sequence-based tree of life.