MISFITS: Evaluating the goodness of fit between a phylogenetic model and an alignment.
NEWS
- April 21, 2011: The updated MANUAL provides detailed information on how to compute the confidence interval.
- August 11, 2010: A binary for Windows is now provided, although it has not yet been thoroughly tested.
- August 10, 2010: A binary and bash scripts for MacOS X are now provided.
Introduction:
MISFITS is a program to evaluate the goodness of fit of a model to an alignment in phylogeny reconstruction. It looks back
at the alignment to pinpoint site patterns that do not fit the model and the resulting tree (hereafter referred to as the
tree-model). MISFITS then introduces a number of extra-substitutions on the tree, in a parsimonious manner, so that these site
patterns fit the tree-model. Together with the evolutionary model, these extra-substitutions fully explain the alignment.
Thus, the number of extra-substitutions may be interpreted as a measure of the goodness of fit of the model to the alignment:
the fewer extra-substitutions, the better the fit.
Methods:
The method is described in the following article:
A brief description of the method is as follows:
1. Count the observed frequency of each site pattern in the alignment.
2. Compute the likelihood of each pattern under the model and the inferred tree.
3. Determine the set D+ of over-represented patterns (observed more often than expected under the tree-model) and the set D- of under-represented patterns (observed less often than expected).
4. For every pair of patterns (p, p') with p ∈ D+ and p' ∈ D-, compute the minimal number of extra-substitutions needed to convert p into p'.
5. Select a matching between the patterns in D+ and D- that minimizes the total number of extra-substitutions.
6. Map the extra-substitutions onto the tree.
7. Determine the significance of the number of extra-substitutions computed in step 5.
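Steps 1 to 5 can be illustrated with a minimal Python sketch on a toy alignment. It is not the MISFITS implementation (which is written in C++), and all function names are hypothetical. Several simplifications are assumptions: the pattern likelihoods are made-up numbers standing in for TREE-PUZZLE output, expected counts are rounded to integers, step 4 is replaced by a simple Hamming-distance upper bound rather than the tree-aware minimum, and step 5 uses brute force, which only scales to a handful of patterns. Steps 6 and 7 are not covered.

```python
from collections import Counter
from itertools import permutations


def site_patterns(alignment):
    """Step 1: count each site pattern (column) of an alignment.

    `alignment` is a list of equal-length sequences, one per taxon; a pattern
    is a column read from the first to the last taxon, e.g. "AAGG".
    """
    return Counter("".join(column) for column in zip(*alignment))


def expected_counts(likelihoods, n_sites):
    """Step 2 (given pattern likelihoods): expected count = n_sites * likelihood."""
    return {p: n_sites * lik for p, lik in likelihoods.items()}


def split_patterns(observed, expected):
    """Step 3: build D+ (over-represented) and D- (under-represented).

    ASSUMPTION: expected counts are rounded to integers, and each unit of
    surplus or deficit becomes one list entry.
    """
    d_plus, d_minus = [], []
    for p in set(observed) | set(expected):
        diff = observed.get(p, 0) - round(expected.get(p, 0.0))
        if diff > 0:
            d_plus += [p] * diff
        elif diff < 0:
            d_minus += [p] * -diff
    return sorted(d_plus), sorted(d_minus)


def extra_substitutions_upper_bound(p, q):
    """Step 4 (simplified): number of taxa whose states differ in p and q.

    Changing each differing taxon on its own terminal branch always converts
    p into q, so this is an upper bound. The tree-aware minimum may be lower,
    since one substitution on an internal branch can change several taxa.
    """
    return sum(a != b for a, b in zip(p, q))


def min_cost_matching(d_plus, d_minus, cost):
    """Step 5: brute-force minimum-cost one-to-one matching (tiny inputs only)."""
    if len(d_plus) != len(d_minus):
        raise ValueError("this sketch assumes |D+| == |D-|")
    best_total, best_pairs = None, []
    for perm in permutations(d_minus):
        pairs = list(zip(d_plus, perm))
        total = sum(cost(p, q) for p, q in pairs)
        if best_total is None or total < best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


# Toy data: 4 taxa, 8 sites. The likelihoods are invented for illustration;
# in MISFITS they are computed by TREE-PUZZLE.
alignment = ["AAAAAAAT",
             "AAAAACCT",
             "AAGGGCCT",
             "AAGGGCCT"]
likelihoods = {"AAAA": 0.275, "AAGG": 0.1125, "ACCC": 0.1375,
               "TTTT": 0.1, "AGGG": 0.2375, "CCCC": 0.1375}

observed = site_patterns(alignment)
expected = expected_counts(likelihoods, n_sites=len(alignment[0]))
d_plus, d_minus = split_patterns(observed, expected)
total, pairs = min_cost_matching(d_plus, d_minus, extra_substitutions_upper_bound)
print(d_plus, d_minus, total)  # ['AAGG', 'AAGG', 'ACCC'] ['AGGG', 'AGGG', 'CCCC'] 3
```

In this toy case the best matching pairs both surplus copies of AAGG with AGGG and ACCC with CCCC, one extra-substitution each. A real implementation would replace the brute force with a polynomial-time assignment algorithm such as the Hungarian method.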
Availability:
- For steps 1-6: the program is written in C++ and is available free of charge. The executable currently
runs on Unix platforms (Linux and MacOS X) as well as on Windows.
- For step 7: since this step depends on the simulation and tree-reconstruction programs the user chooses, we provide
several bash scripts for Unix systems that carry out this task using SEQ-GEN for simulation and PHYML
for tree reconstruction.
Users may modify these scripts to use other programs or to run misfits with different options.
- Refer to the download page for the source code, the program manual, the misfits binaries
and the bash scripts.
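Step 7 compares the observed number of extra-substitutions with numbers obtained from alignments simulated under the tree-model. The Python sketch below assumes a parametric-bootstrap-style, one-sided test: it computes the fraction of simulated replicates with at least as many extra-substitutions as the real alignment. The function name and the numbers are hypothetical, and the MISFITS manual remains the reference for the actual procedure, including the confidence interval.

```python
def empirical_p_value(observed, simulated):
    """One-sided Monte Carlo p-value for the number of extra-substitutions.

    `simulated` holds the counts obtained by running the whole pipeline
    (e.g. SEQ-GEN, PHYML, misfits) on alignments simulated under the fitted
    tree-model. The (k + 1) / (n + 1) form is a common convention that keeps
    the p-value strictly above zero.
    """
    if not simulated:
        raise ValueError("need at least one simulated replicate")
    k = sum(1 for s in simulated if s >= observed)
    return (k + 1) / (len(simulated) + 1)


# Hypothetical numbers, for illustration only.
observed_extra = 12
simulated_extra = [5, 7, 8, 6, 9, 12, 4, 7, 10, 6]
p = empirical_p_value(observed_extra, simulated_extra)
print(f"p = {p:.3f}")  # p = 0.182
```

A small p-value means the real alignment needs unusually many extra-substitutions compared with data generated under the tree-model itself, which points to a poor fit.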
External programs required:
- For steps 1-6: MISFITS requires TREE-PUZZLE to compute the likelihoods
of the patterns (written as a PHYLIP-format alignment) given the tree and the model with the corresponding
parameter values. Please make sure that the TREE-PUZZLE executable is named puzzle.
- For step 7: a simulator and a tree-reconstruction program are needed. If you use the bash scripts we provide, you need the SEQ-GEN
and PHYML packages, with executables named seq-gen and PhyML_3.0, respectively.
- Make sure that the binaries of these external programs are compatible with your system (Linux, MacOS, Windows).
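Before a run, it can help to check that executables with the required names are actually available. The Python sketch below assumes the programs are looked up on the PATH; the manual describes where they must really be placed. The names come from the list above, while the function and constant names are hypothetical.

```python
import shutil

# Names required by MISFITS (puzzle) and by the provided bash scripts
# (seq-gen, PhyML_3.0), as listed above.
REQUIRED_EXECUTABLES = ("puzzle", "seq-gen", "PhyML_3.0")


def missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the names that shutil.which cannot find on the PATH."""
    return [name for name in names if shutil.which(name) is None]


for name in missing_executables():
    print(f"not found on PATH: {name}")
```

An empty result only shows that programs with these names exist; it does not check that they are the right versions or builds for your system.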
Version history:
- March 2010: The first version misfits-1.0 was launched.
Note:
Please read the manual carefully before using the program for the first time. If you encounter bugs, please report them to minh.anh.nguyen(AT)univie.ac.at together with the log files.