ALFRED : Alignment Free Distance Estimator
AlFRED is an Alignment Free Distance Estimator software. Currently, it provides the following options :
- Compute k-mismatch longest common substring using the prefix-chopping algorithm for a pair of sequneces.
- Compute k-mistmatch average common substring for every pair of sequences present in a database of sequences.
- Estimate a greedy distance estimaton for phylogenetic tree inference.
External Dependencies that need to be installed
- A modern, C++11 ready compiler such as
g++version 4.8 or higher or
clangversion 3.2 or higher.
- The cmake build system (Version >= 2.8.11).
- A 64-bit operating system. Either Mac OS X or Linux are currently supported.
- Git version control system
Dependices made available with the gz file
- googletest is included as git submodule.
- libdivsufsort libraries are copied in the directory.
Download the alfred.tar.gz file available here. Extract the contents as follows:
tar xvzf alfred.tar.gz
Next, create a build directory. For example,
mkdir build cd build
Finally, configure and build the executable 'alfred.x'. Continuing the example,
cmake .. make
If successfully built, alfred.x should be available in the build directory.
Place the input sequences into a single fasta files. Make sure that the header of the fasta file has the appropriate sequence name. For example, PHYLIP doesn't accept sequence names longer than 10 characters. Also, make sure that the input file has one of 'fasta' or 'fa' or 'fas' file extension. Note that the program accepts only sequences with valid DNA or protien alphabets. An example is as shown below.
Compute k-mismatch Average common substrings for a string database
Run the program by providing the pepared input file with -f option, the output file with the -o option, and the number of mismatches to allow with -k option. An example is shown below:
build/alfred.x -f primates.full.fas -o acs.primates.k3.out -k 3
Output is generated in a matrix format, which can be fed directly to PHYLIP
Compute k-mismatch LCP between two sequences X and Y
Run the program in the same manner shown above, and add the -p option to stop after generating the k-mismatch LCP values. An example is shown below:
build/alfred.x -f primates.full.fas -o acs.primates.k3.out -k 3 -p
Output is generated in the json format. json ouput has two arrays, LX and LY -- one corresponding to each input string. The array LX is of the length |X|, where i-th entry of LX has the length and the position of the longest common substring in Y for the suffix of X starting at i. LY holds similar values for the string Y.
Thankachan, Sharma V, Chockalingam, Sriram P, Liu, Yongchao, Apostolico, Alberto, and Aluru, Srinivas. ALFRED: A practical method for alignment-free distance computation. (Submitted.)