ALFRED : Alignment Free Distance Estimator

AlFRED is an Alignment Free Distance Estimator software. Currently, it provides the following options :

  1. Compute k-mismatch longest common substring using the prefix-chopping algorithm for a pair of sequneces.
  2. Compute k-mistmatch average common substring for every pair of sequences present in a database of sequences.
  3. Estimate a greedy distance estimaton for phylogenetic tree inference.

Installation

Dependencies

External Dependencies that need to be installed

Dependices made available with the gz file

Compilation

Download the alfred.tar.gz file available here. Extract the contents as follows:

tar xvzf alfred.tar.gz

Next, create a build directory. For example,

 mkdir build
 cd build

Finally, configure and build the executable 'alfred.x'. Continuing the example,

 cmake ..
 make

If successfully built, alfred.x should be available in the build directory.

Usage

Preparing Input

Place the input sequences into a single fasta files. Make sure that the header of the fasta file has the appropriate sequence name. For example, PHYLIP doesn't accept sequence names longer than 10 characters. Also, make sure that the input file has one of 'fasta' or 'fa' or 'fas' file extension. Note that the program accepts only sequences with valid DNA or protien alphabets. An example is as shown below.

>SeqName
ACGTTAGAGTAAATGGAGTAGAAT

Compute k-mismatch Average common substrings for a string database

Run the program by providing the pepared input file with -f option, the output file with the -o option, and the number of mismatches to allow with -k option. An example is shown below:

build/alfred.x -f primates.full.fas -o acs.primates.k3.out -k 3

Output is generated in a matrix format, which can be fed directly to PHYLIP

Compute k-mismatch LCP between two sequences X and Y

Run the program in the same manner shown above, and add the -p option to stop after generating the k-mismatch LCP values. An example is shown below:

build/alfred.x -f primates.full.fas -o acs.primates.k3.out -k 3 -p

Output is generated in the json format. json ouput has two arrays, LX and LY -- one corresponding to each input string. The array LX is of the length |X|, where i-th entry of LX has the length and the position of the longest common substring in Y for the suffix of X starting at i. LY holds similar values for the string Y.

Citation

Thankachan, Sharma V, Chockalingam, Sriram P, Liu, Yongchao, Apostolico, Alberto, and Aluru, Srinivas. ALFRED: A practical method for alignment-free distance computation. (Submitted.)