Welcome to Epi-Speller

Introduction:

Epi-Speller is a program for analyzing multiple genome-wide epigenomic profiling data sets. It includes:

- Signal discretization based on automatic inference of cut-offs for genome-wide signal levels.

- Clustering based on the epi-letter representation.

- Summarization of frequent signal patterns as sequence logos, using WebLogo.

Availability:

Source code:

Epi-Speller package

Installation:

Unzip the package and compile the epi_letter.cpp program with the following command (run it in the unzipped folder):

g++ epi_letter.cpp -o epi_letter

How to use:

Input data:

A list of genomic coordinates (e.g. windows, bins, tiles, ...) and the corresponding signals for each chromatin mark (microarray intensities, or numbers of mapped short reads with or without normalization), in the following tab-separated format:

- First row: Names of chromatin marks

- Each following row: the genomic coordinate (format start:end), followed by the signals for the chromatin marks, in the same order as in the first row.

- Example input file: example.txt consists of 21K tiles from 12 tiling arrays of histone modification marks and DNA methylation in Arabidopsis.
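As an illustration of this format, here is a minimal Python sketch that reads it, assuming a plain tab-separated text file whose coordinate field is exactly start:end. The function name read_epi_input is hypothetical and not part of Epi-Speller. Because the text does not say whether the first row also labels the coordinate column, the sketch accepts either layout.

```python
import csv
import io
from typing import List, Tuple

def read_epi_input(text: str) -> Tuple[List[str], List[Tuple[int, int]], List[List[float]]]:
    """Parse Epi-Speller-style input (hypothetical helper, not part of the package).

    Returns (mark_names, coordinates, signals); signals[i][j] is the signal
    of mark j at tile i.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        raise ValueError("empty input")
    header, data = rows[0], rows[1:]
    coords, signals = [], []
    for row in data:
        start, end = row[0].split(":")  # assumed coordinate field: 'start:end'
        coords.append((int(start), int(end)))
        signals.append([float(x) for x in row[1:]])
    n_marks = len(signals[0]) if signals else len(header)
    if any(len(s) != n_marks for s in signals):
        raise ValueError("every row must have one signal per mark")
    # The header may or may not carry a label for the coordinate column.
    if len(header) not in (n_marks, n_marks + 1):
        raise ValueError("header does not match the number of signal columns")
    marks = header[len(header) - n_marks:]
    return marks, coords, signals
```

For a file on disk, pass its contents, e.g. `read_epi_input(open("example.txt").read())`.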

Running Epi-Speller step by step

Grouping epigenetic signatures in the example input data (example.txt)
Assigning epi-letters

R --vanilla < alphabet_chrom.R --f <input_file> --k <number_of_epi_letters> --d <dictionary_file> --r <0|1> --o <multiple_epigenome_filename>

Example: R --vanilla < alphabet_chrom.R --f example.txt --k 3 --d epi_letter.dict --r 0 --o example.epi

Before running, create a text file containing the acronyms you want for the epi-letters, one letter per row, and pass it with --d (e.g. epi_letter.dict). The --r parameter sets whether random epi-letter-represented epigenomes are created (0 = no, 1 = yes; default 0).

This writes the epi-letter representation of all chromatin marks (the multiple epigenome) to a single output file, named with --o.

It also creates a look-up dictionary (.dict) listing all tiles with their coordinates, signals and assigned letter_ID, and an epi-letter string file (.dna) for each individual mark. A coordinate file (.coor) is also created for use in the next step.
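The automatic cut-off inference performed by alphabet_chrom.R is not described here. As a plainly simpler stand-in, the Python sketch below discretizes one mark's signals into k letters using empirical quantile cut-offs (roughly equal-frequency bins), producing a per-mark letter string in the spirit of the .dna file. The letters L, M, H for k = 3 are an assumption borrowed from the WebLogo alphabet 'LMH' used in the logo step; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence

def quantile_cutoffs(values: Sequence[float], k: int) -> List[float]:
    """k-1 cut-offs taken from the sorted values, giving roughly equal-frequency bins.

    Stand-in technique: NOT the automatic cut-off inference used by Epi-Speller.
    """
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]

def to_epi_letters(values: Sequence[float], letters: str = "LMH") -> str:
    """Encode one mark's signals as letters: letters[0] = lowest bin, letters[-1] = highest.

    The 'LMH' default is an assumption, not read from any dictionary file.
    """
    if not values:
        return ""
    cuts = quantile_cutoffs(values, len(letters))
    # bisect_right counts the cut-offs <= v, which is the bin index of v.
    return "".join(letters[bisect_right(cuts, v)] for v in values)
```

For example, six increasing signals 1 to 6 become "LLMMHH". Quantiles keep the sketch short and deterministic; a real cut-off inference would instead adapt to the shape of the signal distribution.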

Searching/clustering epigenetic signatures, using either the conventional profiling signals or the epi-letter representation, as follows:

3.1 Scanning for the epigenetic patterns

perl epimotif_scanning.pl -f <multiple_epigenome_filename>

Currently only column patterns are supported.

Example: perl epimotif_scanning.pl -f example.epi

This creates a .cols file listing all column patterns, and a .cols.freq file with the frequency with which each pattern appears.
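In the epi-letter representation each mark is a string with one letter per tile, so stacking the strings gives a matrix whose columns hold the letters of all marks at one tile. Reading "column pattern" that way (an interpretation; the exact layout of the .epi, .cols and .cols.freq files is not assumed), a minimal Python sketch that builds the patterns and counts how often each occurs is shown below. The dictionary input and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def column_patterns(epigenome: Dict[str, str]) -> List[str]:
    """Stack per-mark epi-letter strings; return one pattern per tile (column).

    Hypothetical input: {mark_name: epi-letter string}, all strings the same length.
    """
    strings = list(epigenome.values())
    if len({len(s) for s in strings}) > 1:
        raise ValueError("all marks must cover the same tiles")
    # zip(*strings) walks the strings in parallel, yielding one column at a time.
    return ["".join(column) for column in zip(*strings)]

def pattern_frequencies(patterns: List[str]) -> Counter:
    """Number of occurrences of each distinct column pattern."""
    return Counter(patterns)
```

With two marks "LLH" and "MMH", the tiles give the patterns LM, LM and HH, so LM occurs twice and HH once.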

3.2 Removing repeated patterns: for efficient computation of the Hamming distances between patterns, create a file of unique column patterns, for example in R:

write.table(unique(read.table("example.epi.cols.freq")), "example.epi.cols.freq.uniq", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

Or, using the shell command line:

sort example.epi.cols.freq | uniq > example.epi.cols.freq.uniq

example.epi.cols.freq.uniq contains the unique column patterns. The original pattern file (example.epi.cols) is still needed to trace patterns back to their locations in the genome.
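Tracing back works because the full pattern list keeps one entry per tile, while the unique list does not. A minimal Python sketch, assuming that the i-th pattern of the full list belongs to the i-th tile coordinate (for example, as listed in the .coor file); the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coordinate = Tuple[int, int]  # (start, end) of a tile

def locations_by_pattern(patterns: List[str], coords: List[Coordinate]) -> Dict[str, List[Coordinate]]:
    """Map each unique column pattern to every tile where it occurs.

    Assumes patterns[i] is the column pattern of the tile at coords[i].
    """
    if len(patterns) != len(coords):
        raise ValueError("need exactly one coordinate per pattern")
    index: Dict[str, List[Coordinate]] = defaultdict(list)
    for pattern, coord in zip(patterns, coords):
        index[pattern].append(coord)
    return dict(index)
```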

3.3 Computing Hamming distance matrix for clustering

perl hamming_distance.pl -f <column_pattern_file>

Example: perl hamming_distance.pl -f example.epi.cols.freq.uniq

This outputs a .hamming file that can be used for clustering, for example with the k-means method in R, as in the next step.
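The Hamming distance between two equal-length patterns is the number of positions at which they differ; LMH and LHH, for instance, differ at one position. A minimal Python sketch of the pairwise matrix follows. It shows the computation only; the layout of the .hamming file written by hamming_distance.pl is not assumed. Because the matrix is n-by-n for n patterns, removing repeated patterns in step 3.2 directly reduces the work.

```python
from typing import List

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    return sum(x != y for x, y in zip(a, b))

def hamming_matrix(patterns: List[str]) -> List[List[int]]:
    """Symmetric pairwise distance matrix; each pair is computed only once."""
    n = len(patterns)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = hamming(patterns[i], patterns[j])
    return d
```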

3.4 Clustering

R --vanilla < try_clustering.R --f <hamming_distance_file> --u <unique_pattern_file> --c <column_pattern_file> --k <number_of_cluster>

Example: R --vanilla < try_clustering.R --f example.epi.cols.freq.uniq.hamming --u example.epi.cols.freq.uniq --c example.epi.cols --k 4

This outputs one file per cluster (named cluster_xx, where xx is the cluster_id) containing the patterns, their coordinates and the cluster_id. It also extracts the patterns (the 2nd column) into a .logo file (e.g. cluster_1.logo) for the logo representation in the next step.
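The exact algorithm inside try_clustering.R is not documented here. Because the input of this step is a distance matrix rather than coordinates, the Python sketch below swaps in a different, plainly named technique: alternating (Voronoi-iteration) k-medoids, which works directly on pairwise distances. It is an illustration under that assumption, not the script's method, and the function name is hypothetical.

```python
from typing import List, Tuple

def k_medoids(d: List[List[int]], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids on a precomputed distance matrix d (stand-in sketch).

    Returns (labels, medoids): labels[i] is the cluster index of item i and
    medoids[c] is the index of the item at the centre of cluster c.
    """
    n = len(d)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")

    def assign(meds: List[int]) -> List[int]:
        # Attach each item to its nearest medoid (ties go to the lower cluster index).
        return [min(range(k), key=lambda c: d[i][meds[c]]) for i in range(n)]

    # Farthest-first initialisation: start from item 0, then repeatedly add the
    # item farthest from all medoids chosen so far (deterministic, order-dependent).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max((i for i in range(n) if i not in medoids),
                           key=lambda i: min(d[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # Keep the old medoid if a cluster is empty; otherwise pick the member
            # with the smallest total distance to the other members.
            new_medoids.append(
                min(members, key=lambda m: sum(d[m][j] for j in members))
                if members else medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Recompute labels so they always match the returned medoids.
    return assign(medoids), medoids
```

Unlike k-means, each cluster centre here is an actual pattern from the input, which keeps every centre a valid epi-letter pattern.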

Logo representation using the WebLogo 3.2 program (download the source code and unzip the files before use)

Example: ./weblogo-3.2/weblogo --format pdf --ylabel '' --show-xaxis no --alphabet 'LMH' --errorbars no --color red H 'High' --color green L 'Low' --color blue M 'Middle' <cluster_1.logo >cluster_1.pdf
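A sequence logo stacks the letters at each position in proportion to their frequency, with the total stack height equal to that position's information content in bits. The Python sketch below computes these quantities for a list of patterns, assuming each line of a .logo file such as cluster_1.logo is one column pattern (so positions correspond to marks). It omits any small-sample correction WebLogo may apply, so heights can differ from WebLogo's output; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def position_frequencies(patterns: List[str], alphabet: str = "LMH") -> List[Dict[str, float]]:
    """Relative frequency of each alphabet letter at each position of equal-length patterns."""
    n = len(patterns)
    freqs = []
    for column in zip(*patterns):  # one tuple of letters per position
        counts = Counter(column)
        freqs.append({a: counts[a] / n for a in alphabet})
    return freqs

def information_content(freq: Dict[str, float]) -> float:
    """Bits at one position: log2(alphabet size) minus Shannon entropy, no small-sample correction."""
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return math.log2(len(freq)) - entropy
```

For the patterns LH, LM, LH and LL, the first position is always L and reaches the maximum log2(3) bits, while the mixed second position carries far less information.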