The Automated Simultaneous Analysis Phylogenetics (ASAP) approach is designed to facilitate the process of downloading, formatting, and ultimately performing the phylogenetic analyses often associated with multi-partition datasets. ASAP analyses begin with a set of systematically annotated FASTA formatted files. An ASAP script then organizes these data into a single, multi-partition NEXUS file, which is used to infer individual trees for each partition as well as a simultaneous analysis (SA) tree that incorporates all data from all partitions. ASAP then calculates the branch support (BS) and the partition Bremer support (PBS) for each node in the SA tree. The final statistics are formatted into a tab-delimited file that can be imported into spreadsheet applications such as Microsoft Excel.
System Requirements
The scripts are optimized for OS X (10.4.x PPC or Intel); however, they are designed for use on any *NIX platform (including earlier versions of OS X) that also supports the requisite tools, specifically:
- PAUP*. The command-line version should be installed. Version 4b10 or 4b11 is recommended. Specific instructions for its installation can be found at the PAUP* Website. PAUP* should be installed into your path. You can verify that PAUP* is correctly installed by typing "which paup" at the command line -- it should return something like "/usr/local/bin/paup" or "/usr/bin/paup".
- MUSCLE. Download and installation instructions can be found at the MUSCLE Website. The muscle binary should be installed into /usr/local/bin/. You can verify that MUSCLE is correctly installed by typing "which muscle" at the command line -- it should return "/usr/bin/muscle" or "/usr/local/bin/muscle".
(For more help on how to install the command-line version of PAUP* or MUSCLE into your path, contact your local *NIX/OS X guru.)
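The two "which" checks above can also be scripted. The following is a minimal sketch, assuming Python 3 is available; the function name find_tools is a hypothetical helper and is not part of the ASAP package.

```python
import shutil

# Tool names exactly as they are typed at the command line in this document.
REQUIRED_TOOLS = ("paup", "muscle")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

A tool reported as not found usually means that its binary is not in a directory listed in your PATH.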
Downloading and Installing ASAP
Once you have met the system requirements listed above, installation is a two-step process:
For OS X:
- Download and uncompress the latest ASAP installation package (ASAP_OSX.pkg.zip)
- Double-click the downloaded file.
For *NIX:
- Download and uncompress the latest ASAP .zip file (ASAP.zip)
- Move or copy the contents of the .zip file into /usr/bin.
Creating data sets for ASAP
The general ASAP process is to first obtain partitioned data that are annotated in a systematic manner (the ASAP package includes scripts to help with some of this) and then to run the ASAP script on these data. Data can originate in three forms: (1) a customized data set; (2) an existing NEXUS file with partitions defined as CharSets; or (3) a text file containing a list of partition names and NCBI GI numbers. General instructions follow for each of these methods.
Please take note that ALL input files, regardless of method, must be formatted with UNIX line-breaks.
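Files saved on Windows or older Mac systems may carry other line-break styles. The following is a minimal sketch of one way to convert them, assuming Python 3; the helper names to_unix_line_breaks and convert_file are hypothetical and are not part of the ASAP package.

```python
from pathlib import Path


def to_unix_line_breaks(data: bytes) -> bytes:
    """Convert Windows (CRLF) and classic Mac (CR) line breaks to UNIX (LF)."""
    # Replace CRLF first so that the lone-CR pass does not double the breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file in place; return True if its contents changed."""
    target = Path(path)
    original = target.read_bytes()
    converted = to_unix_line_breaks(original)
    if converted != original:
        target.write_bytes(converted)
        return True
    return False
```

Working on bytes rather than text avoids any guess about the character encoding of the input files.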
Method 1: Create a customized data set from FASTA formatted files (CLICK HERE FOR BRIEF TUTORIAL)
ASAP expects a particular directory structure in which the ASAP script finds the data and subjects them to analysis. Within a parent ASAP analysis directory, two sub-directories are required: "_partitions" and "_pendingAlignment". Take note that the prefixing '_' is required, and that the names are cAsE-SeNsItiVe. An example of the required directory structure can be downloaded here (ASAP.dirStructure.zip). One approach is to use this template directory structure as a starting point for ASAP runs.
A separate FASTA file is required for each partition. The name of the partition in all subsequent analyses is derived from the FASTA file name, and the names of the taxa are generated from the comment lines within the FASTA files. It is suggested that all FASTA files end with the extension ".fasta". Spaces, underscores, and generally any non-alphanumeric characters are not permitted in either partition or taxon names and may cause errors in the analysis (some of these characters are used internally within ASAP to keep track of things as the script progresses). Also note that all taxon names and partition names are cAsE-SeNsItiVe (e.g., "Dmela" will be treated as a different taxon than one annotated as "dmela"). It is strongly suggested that the following name structure be followed:
PartitionName.fasta for file names, where PartitionName is the name of the partition.
An example of a "_partitions" directory containing correctly formatted partitions can be downloaded here.
The type and status of a given partition determines which directory you place it in:
- For partitions that require sequence alignment, place the respective FASTA files into the "_pendingAlignment" directory.
- For partitions that do not require sequence alignment (e.g., already aligned sequence or morphological data), place the respective FASTA files into the "_partitions" directory.
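The directory layout and naming rules for Method 1 can be checked before a run. The following is a minimal sketch, assuming Python 3. The helper names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the whole comment line after '>' is assumed to be the taxon name; ASAP itself may parse comment lines differently.

```python
import os
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
ALNUM = re.compile(r"[A-Za-z0-9]+")


def make_asap_dirs(parent):
    """Create the two required sub-directories under a parent directory."""
    for sub in ("_partitions", "_pendingAlignment"):
        os.makedirs(os.path.join(parent, sub), exist_ok=True)


def check_fasta_names(fasta_path):
    """Return a list of problems found in the partition and taxon names."""
    problems = []
    # The partition name is taken from the file name without ".fasta".
    partition = os.path.basename(fasta_path)
    if partition.endswith(".fasta"):
        partition = partition[: -len(".fasta")]
    if not ALNUM.fullmatch(partition):
        problems.append(f"partition name {partition!r} is not alphanumeric")
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            # Assumption: the text after '>' is the taxon name.
            taxon = line[1:].strip()
            if not ALNUM.fullmatch(taxon):
                problems.append(f"taxon name {taxon!r} is not alphanumeric")
    return problems
```

An empty list from check_fasta_names means that no naming problem was detected under these assumptions; it does not validate the sequence data.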
Method 2: Create a data set using an existing NEXUS file with CharSets
If a NEXUS file (formatted with UNIX line breaks) contains a multi-partition dataset annotated using the CharSet commands, a single ASAP script can be used to create the requisite directory structure and export the data from the NEXUS file into individual FASTA files in the "_partitions" directory. An example of a NEXUS file that can be used as input is Gatesy, 1999 [Syst Biol 48(1)]. Then,
- Use the paup2Asap script to convert the NEXUS file into the directory structure and FASTA partition files. First, change to the directory that contains the NEXUS file. Then, type in the command "paup2Asap inputNexusFileName", where inputNexusFileName is the name of the NEXUS file.
A directory named inputNexusFileName.ASAP will be created in the directory in which the command was executed. Within this directory, the required ASAP directory structure (as described above) will be created. In addition to the "_partitions" and "_pendingAlignment" directories, a directory called "_phylipFiles" will also be created, which contains the partition data in individual Phylip files. The partition data used by ASAP, as derived from the input NEXUS file, are located in the "_partitions" directory. It is assumed that the data in the NEXUS file are aligned sequence data or morphological data; however, if any of these data partitions should be re-aligned, they can be moved to the "_pendingAlignment" directory.
Method 3: Create a data set using a set of NCBI GI numbers organized into partition names
A list of partition labels and NCBI GI sequence numbers that are organized into a tab-delimited file can be used to automatically download and generate the ASAP directory structure discussed above. The general format for the file is: partitionLabel (tab) GInumber. Remember to save this file using UNIX line-breaks. An example file, consisting of two partitions (ND1 and COX1) can be seen here.
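A list file in this format can be written from a script, which also guarantees UNIX line-breaks. The following is a minimal sketch, assuming Python 3; the helper name write_gi_list and the file name are hypothetical, and the GI numbers shown are placeholders, not real NCBI records.

```python
def write_gi_list(path, entries):
    """Write (partitionLabel, GInumber) pairs as tab-delimited lines with LF breaks."""
    # newline="\n" stops Python from translating line breaks on any platform.
    with open(path, "w", newline="\n") as handle:
        for label, gi in entries:
            handle.write(f"{label}\t{gi}\n")


if __name__ == "__main__":
    # Placeholder GI numbers for illustration only; substitute real ones.
    write_gi_list("myList.txt", [("ND1", 1111111), ("COX1", 2222222)])
```

Each partition label may appear on several lines, one line per GI number.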
- To create the ASAP directory structure and download the sequence files, type in the command "gi2Asap listFileName", where listFileName is the name of the file containing the tab-delimited list of partitions and NCBI GI numbers.
The above-described directory structure will be created, and the data will be downloaded into the "_pendingAlignment" directory. A separate FASTA file will be generated for each partition named in the list file and will be named after that partition. The taxon names are crafted using the first letter of the genus and the first four letters of the species name.
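The taxon naming rule can be illustrated with a short sketch, assuming Python 3. It applies only the rule stated above (first letter of the genus plus the first four letters of the species name); the helper names are hypothetical, the letter case is kept as given, and the actual behavior of gi2Asap may differ in details that this document does not describe.

```python
def short_taxon_name(binomial):
    """First letter of the genus followed by the first four letters of the species."""
    words = binomial.split()
    if len(words) < 2:
        raise ValueError(f"expected 'Genus species', got {binomial!r}")
    genus, species = words[0], words[1]
    return genus[0] + species[:4]


def find_collisions(binomials):
    """Group organism names that would collapse onto the same short name."""
    groups = {}
    for name in binomials:
        groups.setdefault(short_taxon_name(name), []).append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


print(short_taxon_name("Drosophila melanogaster"))  # Dmela
```

Because the short names are only five characters long, two species in the same genus whose names share their first four letters would receive the same taxon name, so it may be worth checking a list for collisions before an analysis.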
Running ASAP
After the data are downloaded, the ASAP script is then used to process the contents within the prescribed ASAP data structure. If there are any sequences in the "_pendingAlignment" directory, they are aligned using MUSCLE and then moved to the "_partitions" directory. The files within the "_partitions" directory are then assembled into a single Interleaved NEXUS matrix, with all the required supporting command blocks. The phylogenetic analyses are then performed using this matrix. To run the script, the general syntax is:
ASAP asapDirName
Here, asapDirName is the name of the ASAP directory containing the data to be processed. A number of new directories and files are generated as the script processes the data and runs the phylogenetic analyses through PAUP*:
- _branchSupports.txt - this file contains the tab-delimited results of the branch supports at each node.
- _NEXUS.nex - this file is the multi-partitioned dataset assembled from the _partitions directory.
- currList - this is an internal file that ASAP uses to keep track of the files in the _pendingAlignment and _partitions directories. This prevents ASAP from repeating an already completed analysis. This can be overridden by entering "forceASAP" at the command line (e.g., ASAP asapDirName forceASAP).
- getTrees.cmd - this file is a PAUP command file that is used to generate all the individual partition trees as well as the simultaneous analysis tree.
- supportTests - this directory contains all the individual results from each of the performed tests.
- trees - this directory contains the results of the individual partition and simultaneous analysis trees (three files exist for each tree search performed: AllTrees, Bootstrap, and Consensus).
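The tab-delimited _branchSupports.txt file can also be read in a script rather than a spreadsheet. The following is a minimal sketch, assuming Python 3; the helper name is hypothetical, and because the column layout of the file is not described here, the rows are returned as plain lists without interpreting any header.

```python
import csv


def read_tab_delimited(path):
    """Return the rows of a tab-delimited text file as lists of strings."""
    # newline="" lets the csv module handle line breaks itself.
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]


if __name__ == "__main__":
    import sys

    # Usage (path is an assumption): python read_supports.py asapDirName/_branchSupports.txt
    if len(sys.argv) > 1:
        for row in read_tab_delimited(sys.argv[1]):
            print(row)
```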
Outgroup taxa and customized partitions can be specified to ASAP through a settings file. This is done by creating a file called ASAP.settings within the ASAP directory structure. A general form for this file is included in the ASAP directory structure file mentioned above (ASAP.dirStructure.zip) and can also be downloaded here (ASAP.settings). General guidelines are within the comment lines in these template files. Briefly, all lines that begin with a '#' are ignored; outgroup taxa are specified using the command OUTGROUP = taxon1, taxon2; and, custom partitions are defined using existing partitions (CharSets) using CHARSET partitionName = partName1, partName2.
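A settings file of this form can be generated from a script. The following is a minimal sketch, assuming Python 3. The helper name render_settings is hypothetical, and the line formats are taken only from the brief description above; consult the comment lines in the ASAP.settings template for the exact punctuation that ASAP expects.

```python
def render_settings(outgroup, charsets, known_partitions):
    """Build the text of a settings file from outgroup taxa and custom partitions.

    outgroup: list of taxon names
    charsets: dict mapping a custom partition name to a list of existing partitions
    known_partitions: names of the partitions that already exist
    """
    lines = ["# Lines beginning with '#' are ignored by ASAP."]
    if outgroup:
        lines.append("OUTGROUP = " + ", ".join(outgroup))
    for name, parts in charsets.items():
        # Custom partitions must be built from existing partitions.
        missing = [part for part in parts if part not in known_partitions]
        if missing:
            raise ValueError(f"CHARSET {name} refers to unknown partitions: {missing}")
        lines.append(f"CHARSET {name} = " + ", ".join(parts))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Taxon and partition names here are illustrative placeholders.
    print(render_settings(["Dmela"], {"mito": ["ND1", "COX1"]}, {"ND1", "COX1"}), end="")
```

Checking the referenced partition names against the FASTA file names in the _partitions directory catches misspellings, which matter because partition names are case-sensitive.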