New Mexico State University

Molecular Biology Program Bioinformatics and Educational Resources - Bioinformatics Tools Help

Batch BLAST Help


Contents

Guidelines for Use
Input format for sequences
Submission Form
Results

Overview

NOTE: As of 21 February 2005 there are two versions of Batch BLAST. The original ("Full") version requires a password and is limited to use by the NMSU research community or approved users. The "Fungal" version also requires a password, but the number of query sequences that may be submitted is limited, and only local fungal sequence databases are offered. Otherwise, the information below applies to both versions.

Batch BLAST is a service provided by SWBIC to the NMSU research community. It runs the Basic Local Alignment Search Tool (BLAST) once for each sequence in a set of input (i.e., query) sequences, which are submitted together in FASTA format. The user selects the type of BLAST to run (blastp, blastn, or blastx), the sequence database to search, and other BLAST options. The results can be sorted so that the queries with the most significant matches to database sequences are listed first. The BLAST runs are performed in parallel on SWBIC's 32-node Beowulf cluster, which can complete thousands of BLAST runs per hour. Because a job may still take a long time, however, results are not returned in the browser; instead, you receive an e-mail containing the URL of the main page of your BLAST results. These results provide more than the standard BLAST report: they include a graphical alignment of High Scoring Segment Pairs (HSPs), more complete descriptions of subject sequences, and links to the GenBank entries of the matched subject sequences.

Batch BLAST uses a recent version of BLAST from NCBI, as well as sequence databases from NCBI that are updated bimonthly. The NCBI BLAST Help page provides information about the various BLAST options.
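Conceptually, a Batch BLAST job is a set of independent BLAST runs, one per query, which is what makes it easy to spread across cluster nodes. The Python sketch below shows one way such a split could work, assuming a simple round-robin assignment over 32 nodes. The function name and the scheme are hypothetical; they are not a description of SWBIC's actual scheduler.

```python
# Hypothetical round-robin assignment of queries to cluster nodes.
# SWBIC's real scheduler is not documented here; this only illustrates
# "one BLAST run per query, spread across many nodes".

def assign_queries(query_ids, n_nodes=32):
    """Return {node_index: [query_id, ...]} using round-robin assignment."""
    assignments = {node: [] for node in range(n_nodes)}
    for i, query_id in enumerate(query_ids):
        assignments[i % n_nodes].append(query_id)
    return assignments

if __name__ == "__main__":
    jobs = assign_queries([f"seq{i:03d}" for i in range(100)])
    # 100 queries over 32 nodes: nodes 0-3 get 4 queries, the rest get 3.
    print(len(jobs[0]), len(jobs[31]))
```

Because each query is searched independently, any assignment that gives every node a similar share of the work would do; round-robin is simply the easiest to show.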

Guidelines for Use

Batch BLAST runs are queued on a first-come-first-served basis. How long it takes to return results depends on how busy the cluster is, where your job is in the queue, and how large your job is. Please submit large jobs during off-peak hours if possible, and wait for each large job to complete (you will receive an e-mail) before submitting the next one. Job size depends on several factors: the number and length of the query sequences, the type of BLAST being run, and the size of the database searched. blastx takes approximately six times as long to run as blastp and three times as long as blastn. A job that searches more than 1,000 query sequences against a large database is considered a large job; the largest databases are the non-redundant and EST databases. Please keep these considerations in mind when you submit a job.
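The run-time ratios above also imply that blastn takes about twice as long as blastp. The Python sketch below turns the guideline into a rough size estimate and applies the stated large-job rule of thumb. It ignores sequence length and exact database size, and the database identifiers "nr" and "est" are hypothetical stand-ins for the non-redundant and EST databases, not the names used in the Batch BLAST form.

```python
# Relative run-time factors derived from the guideline text:
# blastx ~ 6x blastp and ~ 3x blastn, so blastn ~ 2x blastp.
RELATIVE_COST = {"blastp": 1.0, "blastn": 2.0, "blastx": 6.0}

# Hypothetical identifiers for the "largest databases" named in the text.
LARGE_DATABASES = {"nr", "est"}

def relative_cost(program, n_queries):
    """Rough job size in 'blastp-query units'; ignores sequence and database size."""
    return RELATIVE_COST[program] * n_queries

def is_large_job(n_queries, database):
    """Rule of thumb from the text: more than 1,000 queries against a large database."""
    return n_queries > 1000 and database in LARGE_DATABASES
```

For example, under these factors a 500-query blastx job costs about as much as 3,000 blastp queries, so it may be worth treating as large even though it is under the 1,000-query threshold.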

The result web pages will be retained on the cluster for at least two weeks. If you want to keep your results, use your browser's "save page as" function to save all results locally. If you need to keep them online for a longer time, contact John Spalding.

Input format for sequences

You may paste one or more query sequences into the text box, or click the UPLOAD button to select a file of sequences on your computer. The maximum number of sequences that can be submitted is shown above the input box. All sequences must be in FASTA format: each sequence begins with the ">" character in the first position, followed by descriptive text (the "definition line", or defline). One or more lines containing the sequence follow. These lines may be of varying length and should contain only sequence characters that are valid for BLAST. The input may consist of multiple sequences, one after another. The following example of two sequences is valid:

>seq001 this is the description
MGIKALAGRDLLAIADLTIEEMKSLLQLAADLKSGVLKPHCRKILGLLFYKASTRTRVSF
TAAMYQLGGQ VLDLNPSVTQ VGRGEPIQDT ARVLDRYIDI
LAVRTFKQTDLQTFADHAKM
>seq002 this is the description
MRVFLAICLSLTVALAAETGKYTPFQYNRVYSTVSPFVYKPGRYVADPGR
GFYTGSGTAGGPGGAYVGTKEDLSKYLGDAYKGSSIVPLPVVKPTIPVPV
APEATTT

IMPORTANT: The first field of each defline must be unique, because Batch BLAST identifies query sequences by this field. The first field consists of the characters between the leading '>' and the first blank. The example above follows this rule. The next example does not, because both deflines have the same first field ("experiment"), and it will cause Batch BLAST to fail:

>experiment 1
MGIKALAGRDLLAIADLTIEEMKSLLQLAADLKSGVLKPHCRKILGLLFYKASTRTRVSF
>experiment 2
MRVFLAICLSLTVALAAETGKYTPFQYNRVYSTVSPFVYKPGRYVADPGR
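Both rules (FASTA layout and a unique first field in each defline) can be checked before you submit. The Python sketch below does this, assuming that blanks inside sequence lines can simply be dropped, as in the first valid example. The function names are hypothetical, and this is not Batch BLAST's own parser.

```python
from collections import Counter

def parse_fasta(text):
    """Split FASTA text into (defline, sequence) pairs.

    Blanks inside sequence lines are removed, so sequence lines of varying
    length (or containing spaces) are joined into one string.
    """
    records = []
    defline, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if defline is not None:
                records.append((defline, "".join(chunks)))
            defline, chunks = line[1:], []
        elif line.strip():
            if defline is None:
                raise ValueError("sequence data appears before the first '>' line")
            chunks.append("".join(line.split()))
    if defline is not None:
        records.append((defline, "".join(chunks)))
    return records

def first_field(defline):
    """Characters between the leading '>' (already stripped) and the first blank."""
    return defline.split(" ", 1)[0]

def duplicate_first_fields(records):
    """Return first fields that occur more than once; these would make a job fail."""
    counts = Counter(first_field(defline) for defline, _ in records)
    return sorted(field for field, n in counts.items() if n > 1)
```

Applied to the second example above, duplicate_first_fields reports "experiment", while the first example passes with no duplicates.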

Submission Form

There are five types of inputs to Batch BLAST:

  • Query sequences: these must be in FASTA format. Each sequence contains a definition line (starting with a ">" followed by a description of the sequence) and then one or more lines of sequence data. This set of sequences may be pasted into the text window or uploaded from a file on your computer.
  • BLAST options:
    • BLAST program: the choices are blastn (nucleotide query and database), blastp (protein query and database), and blastx (nucleotide query translated into six reading frames and searched against a protein database).
    • Database to search: the available databases are offered in this list. For more information on these databases see Sequence Databases Help.
    • Max E-value: this is a cut-off value; HSPs with an E-value greater than this will not be included in the BLAST output.
    • Max number of matches: this is a cut-off value; the number of matched subject sequences will not exceed this number.
    • Protein substitution matrix: applicable for blastp and blastx searches.
    • Filter query: uncheck this box if you do not want to apply a low-complexity filter.
    • Costs to open and extend a gap: these are set to defaults, but you may specify your own values. Be aware, however, that BLAST does not accept every combination of gap costs and may stop if it detects an invalid pair of values.
    • Advanced options: this text box allows you to enter other BLAST options not offered in the Batch BLAST form. Options must be given in the Unix "-flag value" format required as input to the "blastall" program (NCBI Help). Do not enter options that are already defined in the input form. Do not use the "-m" option (alignment view), because it alters the format of the BLAST report and prevents Batch BLAST from completing.
  • Sort options: This option determines the order of query sequences on the main results page. The queries may be listed
    • in order of the smallest E-value among their matches to subject sequences, smallest first,
    • in order of the number of BLAST matches to subject sequences, most first, or
    • unsorted (i.e., in the order in which the query sequences were input).
  • BLAST report format: This checkbox reads "Use html4blast report format (forced if number of sequences > 1000)". If it is checked, the BLAST report for each query is in "html4blast" format, described below. If it is not checked, the regular Batch BLAST format is used; this format, also described below, contains more information. Because the "html4blast" format uses less disk space, it is forced when there are more than 1000 query sequences.
  • E-mail address: This is a required field.
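The form fields above correspond roughly to options of NCBI's legacy blastall program. The Python sketch below assembles such a command line (without running it) and enforces the advanced-option rules stated above: "-flag value" pairs only, no "-m", and no flags the form already sets. The field-to-flag mapping, in particular using "-v" and "-b" for the maximum number of matches, is an assumption; this page does not show the command Batch BLAST actually runs, and the function names are hypothetical.

```python
import shlex

# Assumed mapping of form fields to legacy NCBI blastall flags; Batch BLAST's
# actual command line is not documented on this page.
FORM_FLAGS = {"-p", "-d", "-i", "-e", "-v", "-b", "-M", "-F", "-G", "-E"}

def check_advanced_options(advanced):
    """Split "-flag value" pairs and reject flags the help text forbids."""
    tokens = shlex.split(advanced)
    if len(tokens) % 2 != 0:
        raise ValueError("advanced options must be '-flag value' pairs")
    for flag in tokens[0::2]:
        if not flag.startswith("-"):
            raise ValueError(f"expected a flag, got {flag!r}")
        if flag == "-m":
            raise ValueError("-m (alignment view) changes the report format")
        if flag in FORM_FLAGS:
            raise ValueError(f"{flag} is already set by the form")
    return tokens

def build_blastall_command(program, database, query_file, max_evalue, max_matches,
                           matrix="BLOSUM62", filter_query=True,
                           gap_open=None, gap_extend=None, advanced=""):
    """Assemble a blastall argument list from the form fields (not executed)."""
    cmd = ["blastall", "-p", program, "-d", database, "-i", query_file,
           "-e", str(max_evalue),
           # Assumption: "max number of matches" limits both descriptions and alignments.
           "-v", str(max_matches), "-b", str(max_matches),
           "-F", "T" if filter_query else "F"]
    if program in ("blastp", "blastx"):  # the matrix applies only to protein searches
        cmd += ["-M", matrix]
    if gap_open is not None and gap_extend is not None:
        cmd += ["-G", str(gap_open), "-E", str(gap_extend)]
    return cmd + check_advanced_options(advanced)
```

Keeping the form-defined flags in one set makes the "do not enter options already defined in the form" rule a simple membership test. Gap costs are passed only as a pair, since BLAST validates them together.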

Results

Batch BLAST results are organized in a series of web pages stored on darwin. When all of the analyses are complete, you are sent an e-mail with the URL of the main results page. This page contains a table in which each row summarizes, and links to, the results for one query sequence. Each row contains the following information: the number of subject sequences matched, a link to the query sequence, the query description with a link to the detailed results, the description of the top-scoring subject, and the scores (bit score and E-value) of that subject.

The results are pre-sorted by the methods described above. To view the results in a different order, click the corresponding sorting method at the top of the results page.
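The three orderings of the main results table can be expressed as sort keys over one summary row per query. The Python sketch below is illustrative only; the QuerySummary fields and the method names are hypothetical and do not reflect how Batch BLAST stores its results. It places queries without any match last when sorting by E-value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuerySummary:
    """One row of the main results table (hypothetical fields)."""
    query_id: str
    n_matches: int
    best_evalue: Optional[float]  # None if the query matched nothing

def sort_summaries(rows, method):
    """Order result rows as the three sort options describe."""
    if method == "evalue":
        # Smallest E-value first; queries without matches go last.
        return sorted(rows, key=lambda r: float("inf") if r.best_evalue is None else r.best_evalue)
    if method == "matches":
        # Most matches first; sorted() is stable, so ties keep input order.
        return sorted(rows, key=lambda r: r.n_matches, reverse=True)
    if method == "input":
        return list(rows)
    raise ValueError(f"unknown sort method: {method!r}")
```

Because Python's sort is stable even with reverse=True, queries that tie on the sort key stay in their input order.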

Clicking on a query description opens a page of complete results for that query in a second window. This is the normal BLAST report with additional information. In the regular Batch BLAST report format, a graphical display at the top shows the alignment of all HSPs against the query, with reverse-complement HSPs (blastn and blastx) shown as hollow bars. Statistics for each HSP are shown to the right in the following order: E-value, bit score, percent identity, percent positives, percent gaps, and length of the subject sequence. When you hold the mouse pointer over an HSP bar, a short description of that subject sequence is shown. Click on an HSP bar to jump to that sequence in the summary table below.

If you selected "html4blast report format" (or it was forced because there were more than 1000 query sequences), the results are similar, except that the graphical display of HSPs is simpler.
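The percent identity, positives and gaps listed for each HSP are fractions of that HSP's alignment length. As a hedged illustration, the Python sketch below recomputes them from the three rows of a pairwise alignment as they appear in a standard BLAST report. The function name is hypothetical, and Batch BLAST may compute or round these values differently (the BLAST report itself shows whole-number percentages).

```python
def hsp_percentages(query_aln, midline, subject_aln):
    """Percent identity, positives and gaps over the alignment length of one HSP.

    Assumes the pairwise layout of a standard BLAST report: in the middle line a
    letter (protein) or '|' (nucleotide) marks an identity and '+' marks a
    positive (similar) pair; '-' in either sequence row marks a gap.
    """
    if not (len(query_aln) == len(midline) == len(subject_aln)) or not query_aln:
        raise ValueError("alignment rows must be non-empty and of equal length")
    length = len(query_aln)
    identities = sum(1 for c in midline if c.isalpha() or c == "|")
    positives = identities + midline.count("+")
    gaps = sum(1 for q, s in zip(query_aln, subject_aln) if q == "-" or s == "-")
    return {
        "identity": 100.0 * identities / length,
        "positives": 100.0 * positives / length,
        "gaps": 100.0 * gaps / length,
    }
```

For instance, the protein HSP "MKV-LA" over "MKIALA" (middle line "MK+ LA") has 4 identities, 5 positives and 1 gap in 6 columns.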

Below the HSP graphical alignments (above them in "html4blast" format) is a table of data for each subject sequence, sorted by the score of its top-scoring HSP (the same order as in the BLAST report). Depending on the report format, fields in this table link to other parts of the BLAST report or to the GenBank or other database entries for each subject sequence.

The last section of the results is the BLAST report itself. You may jump to the alignment (HSP) results for a particular subject sequence by clicking links in the table above.