MISFITS: Evaluating the goodness of fit between a phylogenetic model and an alignment.

NEWS

  • April 21, 2011: The updated MANUAL provides detailed information on how to compute the confidence interval.
  • August 11, 2010: A binary for Windows is provided, though it has not yet been carefully tested.
  • August 10, 2010: A binary and bash scripts for MacOS X are provided.

Introduction:

MISFITS is a program to evaluate the goodness of fit of a model to an alignment in phylogeny reconstruction. It looks back at the alignment to pinpoint site patterns that do not fit the model and the resulting tree (hereafter referred to as the tree-model). MISFITS then introduces a number of extra-substitutions on the tree, in a parsimonious manner, to fit these site patterns into the tree-model. Together, these extra-substitutions and the evolutionary model fully explain the alignment. The number of extra-substitutions can therefore be interpreted as a measure of the goodness of fit of the model to the alignment: the fewer extra-substitutions, the better the fit.

Methods:

The method is described in the following article:

A brief description of the method is as follows:
  1. Count the observed frequency of each site pattern in the alignment.
  2. Compute the likelihood of each pattern under the model and the inferred tree.
  3. Determine the set D+ of over-represented patterns (observed more often than expected) and the set D- of under-represented patterns (observed less often than expected).
  4. For all pairs of patterns (p, p'), p ∈ D+, p' ∈ D-, compute the minimal number of extra-substitutions to convert p into p'.
  5. Select a matching between patterns in D+ and D- such that the total number of extra-substitutions is minimal.
  6. Map the extra-substitutions on the tree.
  7. Determine the significance of the number of extra-substitutions computed at step 5.
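Steps 1 and 3 to 5 can be illustrated with a small Python sketch. It is not the MISFITS C++ implementation and rests on several simplifying assumptions: the expected pattern counts (which MISFITS derives from TREE-PUZZLE pattern likelihoods in step 2) are passed in as hypothetical integers that sum to the number of sites; the cost of converting p into p' is approximated by the number of taxa in which the two patterns differ, an upper bound on the tree-based minimum because every differing taxon can be fixed by one substitution on its terminal branch; and the matching is found by brute force, which only works for toy inputs (a real implementation would use an assignment or min-cost-flow algorithm).

```python
from collections import Counter
from itertools import permutations


def pattern_counts(alignment):
    """Step 1: count site patterns; a pattern is one alignment column read top to bottom."""
    return Counter("".join(column) for column in zip(*alignment))


def split_patterns(observed, expected):
    """Step 3: return (D+, D-) with one list entry per unit of surplus or deficit.

    `expected` holds hypothetical integer expected counts (MISFITS obtains them from
    TREE-PUZZLE likelihoods); both inputs are assumed to sum to the same total.
    """
    d_plus, d_minus = [], []
    for pattern in set(observed) | set(expected):
        diff = observed.get(pattern, 0) - expected.get(pattern, 0)
        if diff > 0:
            d_plus.extend([pattern] * diff)
        elif diff < 0:
            d_minus.extend([pattern] * -diff)
    return sorted(d_plus), sorted(d_minus)


def conversion_cost(p, q):
    """Step 4, simplified: taxa where p and q differ (an upper bound on the tree-based minimum)."""
    return sum(a != b for a, b in zip(p, q))


def min_extra_substitutions(d_plus, d_minus):
    """Step 5: cheapest one-to-one matching of D+ to D-, by brute force (toy sizes only)."""
    if len(d_plus) != len(d_minus):
        raise ValueError("total surplus and total deficit must be equal")
    best_cost, best_pairs = None, []
    for candidate in permutations(d_minus):
        cost = sum(conversion_cost(p, q) for p, q in zip(d_plus, candidate))
        if best_cost is None or cost < best_cost:
            best_cost, best_pairs = cost, list(zip(d_plus, candidate))
    return best_cost, best_pairs


if __name__ == "__main__":
    rows = ["AAAACC", "AAAACC", "AAGGCC", "AAGGCT"]  # 4 taxa, 6 sites (toy data)
    observed = pattern_counts(rows)
    expected = {"AAAA": 3, "AAGG": 1, "CCCC": 2}     # hypothetical expected counts
    d_plus, d_minus = split_patterns(observed, expected)
    print(d_plus, d_minus)
    print(min_extra_substitutions(d_plus, d_minus))
```

On this toy input D+ is ["AAGG", "CCCT"] and D- is ["AAAA", "CCCC"]; the cheapest matching pairs AAGG with AAAA and CCCT with CCCC, for a total of 3 extra-substitutions.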

Availability:

  • For steps 1-6: the program is written in C++ and is available free of charge. The executable currently runs on Unix platforms (Linux and MacOS X) as well as on Windows.
  • For step 7: since this step depends on which simulation and tree-reconstruction programs users want to use, we provide a number of bash scripts for Unix systems that carry out the task with SEQ-GEN for simulation and PHYML for tree reconstruction. Users may modify these scripts to use other programs, or to run misfits with different options.
  • See Download below for the source code, the program manual, the misfits binaries and the bash scripts.
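Step 7 compares the observed number of extra-substitutions with the numbers obtained from alignments simulated under the fitted tree-model (e.g. with SEQ-GEN) and re-analysed with the tree-reconstruction program (e.g. PHYML) and misfits; presumably this is what the bash scripts automate. Once those simulated counts have been collected into a list, the final comparison could look like the Python sketch below. The one-sided Monte Carlo p-value with the +1 correction and the nearest-rank 95% point are generic choices made for illustration, not necessarily the procedure described in the MANUAL.

```python
import math


def empirical_p_value(observed, simulated):
    """One-sided Monte Carlo p-value: how often a simulated count is at least as large."""
    if not simulated:
        raise ValueError("need at least one simulated replicate")
    at_least = sum(1 for s in simulated if s >= observed)
    return (at_least + 1) / (len(simulated) + 1)


def nearest_rank_percentile(values, q):
    """Smallest value with at least q percent of the values at or below it (0 < q <= 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical extra-substitution counts from 10 simulated alignments.
    simulated = [3, 5, 4, 6, 2, 5, 7, 4, 3, 5]
    observed = 6
    print(empirical_p_value(observed, simulated))
    print(nearest_rank_percentile(simulated, 95))
```

With these hypothetical numbers, two replicates reach 6 or more, so the p-value is 3/11 (about 0.27) and the upper 95% point of the simulated counts is 7. In practice many more replicates would be needed for a stable estimate.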

External programs required:

  • For steps 1-6: MISFITS requires TREE-PUZZLE to compute the likelihood of the patterns (written as a PHYLIP-format alignment) given the tree and the model with its parameter values. Make sure that the TREE-PUZZLE executable is named puzzle.
  • For step 7: a simulator and a tree-reconstruction program are needed. If you use the bash scripts we provide, you need the SEQ-GEN and PHYML packages, and the executables must be named seq-gen and PhyML_3.0, respectively.
  • Make sure you use binaries of these external programs that are compatible with your system (Linux, MacOS X, Windows).
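Because misfits and the bundled scripts locate the external programs by their executable names, a quick pre-flight check can save a failed run. The Python sketch below assumes the programs are looked up on the PATH; the text above only fixes the names (puzzle, seq-gen, PhyML_3.0), so consult the manual for where misfits actually expects them. The stage labels are illustrative.

```python
import shutil

# Executable names stated in the text; searching the PATH for them is an assumption.
REQUIRED = {
    "steps 1-6 (misfits)": ["puzzle"],
    "step 7 (bundled bash scripts)": ["seq-gen", "PhyML_3.0"],
}


def missing_executables(required=REQUIRED):
    """Return {stage: [names not found on PATH]}, listing only stages with something missing."""
    missing = {}
    for stage, names in required.items():
        absent = [name for name in names if shutil.which(name) is None]
        if absent:
            missing[stage] = absent
    return missing


if __name__ == "__main__":
    problems = missing_executables()
    if not problems:
        print("all external programs found")
    for stage, names in problems.items():
        print(f"{stage}: not found on PATH: {', '.join(names)}")
```

On Windows, shutil.which also tries the extensions listed in PATHEXT, so puzzle.exe is found under the name puzzle.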

Version history:

  • March 2010: the first version, misfits-1.0, was released.

Note:

Please read the manual carefully if you are using the program for the first time. If you encounter bugs, please report them to minh.anh.nguyen(AT)univie.ac.at together with the log files.

Download:

  • Manual: misfits_manual.pdf
  • Source code: misfits-1.0.tar.gz
  • Binary files for Linux (misfits-1.0, including a 32-bit binary): Linux-binary-files.tar.gz
  • Binary files for MacOS X (misfits-1.0): MacOS-binary-files.tar.gz
  • Binary files for Windows (misfits-1.0): Windows-binary-files.tar.gz
  • Example: example.tar.gz
  • Bash scripts for Unix (Linux and MacOS X): Unix-phyml-misfits-1.0.tar.gz, Unix-seqgen-phyml-misfits-1.0.tar.gz