Upload a FASTA-format file containing multiple protein sequences to be
searched for matching Pfam families. Results of the search will be
returned to you at the email address that you specify. Please check the
notes below for the restrictions on uploaded sequence files.
File contents
We accept only protein sequences and your uploaded file
must conform to a fairly strict interpretation of the
FASTA file format.
We apply the following checks to the format of uploaded sequence files.
Files that do not conform to these rules will be rejected by the
server.
File contents
Files must contain only header lines and sequence lines. Header lines,
which begin with ">", can be used to describe the sequence
that follows. There is no fixed format for header lines but we restrict
the characters that are allowed. If your header lines contain any of the
following characters, your file will be rejected:
; \ ! *
Note that the semicolon (;) is explicitly forbidden, even though some
versions of the FASTA format use it to denote comments. Please do not
use comments in the FASTA files that you upload here.
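
Before uploading, the forbidden-character rule above can be checked
locally. The following is a minimal sketch, not the server's own
validation code; the function name find_forbidden_header_chars is
hypothetical, and it assumes that only lines beginning with ">" count as
header lines.

```python
# Characters that must not appear in header lines, as listed above.
FORBIDDEN_HEADER_CHARS = set(";\\!*")


def find_forbidden_header_chars(fasta_text):
    """Return (line_number, header, offending_chars) for each bad header.

    Illustrative helper only: it inspects header lines (those starting
    with ">") and ignores sequence lines entirely.
    """
    problems = []
    for line_number, line in enumerate(fasta_text.splitlines(), start=1):
        if line.startswith(">"):
            offending = sorted(set(line) & FORBIDDEN_HEADER_CHARS)
            if offending:
                problems.append((line_number, line, offending))
    return problems
```

An empty result means that no header line contains a forbidden
character. A "*" on a sequence line is not reported, because this check
applies to header lines only.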
Header rows
The FASTA-format specification recommends that header lines are kept
shorter than 80 characters. Batch searches are run
using our pfam_scan.pl script, which uses programs from
the HMMER suite. Because of the way that HMMER handles header lines in
FASTA files, only the first 60 characters are actually used. Please
make sure that your header lines are no longer than 60 characters.
In order to avoid generating strange or ambiguous search results, we
require that header lines in your file are unique, at least within the
first 60 characters. Uploaded files which contain duplicate header lines
will be rejected.
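
The length and uniqueness rules for header lines can be sketched as
follows. This is an illustration under stated assumptions rather than
the check that the server runs: the function name check_headers is
hypothetical, and the 60-character count is assumed to include the
leading ">", which is the stricter reading.

```python
# Only the first 60 characters of a header line are used (see above).
HEADER_KEY_LENGTH = 60


def check_headers(fasta_text):
    """Return (too_long, duplicates) for the header lines in fasta_text.

    too_long   -- line numbers of headers longer than 60 characters
    duplicates -- (first_line, repeat_line) pairs for headers that are
                  identical within their first 60 characters

    Assumption: the leading ">" counts towards the 60 characters.
    """
    too_long = []
    duplicates = []
    first_seen = {}  # truncated header -> line number of first occurrence
    for line_number, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">"):
            continue
        if len(line) > HEADER_KEY_LENGTH:
            too_long.append(line_number)
        key = line[:HEADER_KEY_LENGTH]
        if key in first_seen:
            duplicates.append((first_seen[key], line_number))
        else:
            first_seen[key] = line_number
    return too_long, duplicates
```

Two headers that differ only after the 60th character are reported both
as too long and as duplicates of each other, since they become
indistinguishable once truncated.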
Sequence symbols
Each sequence must be a valid protein sequence, so sequence lines
should contain only amino-acid symbols, i.e.
capital letters excluding "J". In the context of a Pfam
search, gaps and translation stops have little meaning and should not
normally be used, but we accept "-" or "*" to
denote gaps and translation stops respectively.
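
The symbol rule above maps naturally onto a regular expression. The
sketch below is an assumption-laden illustration, not the server's
implementation: the function name invalid_sequence_lines is
hypothetical, and blank lines and embedded whitespace are treated as
invalid, which the rules above do not state explicitly.

```python
import re

# Capital letters except "J", plus "-" (gap) and "*" (translation stop).
# Assumption: a sequence line must contain at least one symbol, so blank
# lines are flagged.
VALID_SEQUENCE_LINE = re.compile(r"[A-IK-Z*\-]+")


def invalid_sequence_lines(fasta_text):
    """Return the line numbers of sequence lines with disallowed symbols."""
    bad_lines = []
    for line_number, line in enumerate(fasta_text.splitlines(), start=1):
        if line.startswith(">"):
            continue  # header lines are checked separately
        if not VALID_SEQUENCE_LINE.fullmatch(line):
            bad_lines.append(line_number)
    return bad_lines
```

Lower-case residues are reported as invalid here, because the rule asks
for capital letters; converting sequences to upper case before upload
avoids that rejection.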
Service limits
Searches run on a "compute farm" with a limited number of
"slots". Each search takes one slot and once all slots are
in use, new jobs wait in a queue for the next slot to become free. In
order to prevent large jobs occupying slots for very long periods,
which can impact the availability of the system for other users, we
place a number of restrictions on the size of job that we will accept.
File size
Files must have fewer than 500,000 lines and fewer than
5,000 sequences.
Sequence length
Each sequence must be no longer than 20,000 amino acids.
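
The file size and sequence length limits can be checked together in a
single pass over the file. This is a rough sketch: the function name
check_size_limits and the wording of its messages are hypothetical, and
it assumes that every non-whitespace character on a sequence line,
including "-" and "*", counts towards the sequence length.

```python
MAX_LINES = 500_000            # files must have fewer lines than this
MAX_SEQUENCES = 5_000          # files must have fewer sequences than this
MAX_SEQUENCE_LENGTH = 20_000   # each sequence may be at most this long


def check_size_limits(fasta_text):
    """Return a list of messages, one per limit that the file exceeds."""
    lines = fasta_text.splitlines()
    errors = []
    if len(lines) >= MAX_LINES:
        errors.append(
            f"file has {len(lines)} lines; must be fewer than {MAX_LINES}"
        )

    # Sum the length of the sequence lines that follow each header.
    lengths = []
    for line in lines:
        if line.startswith(">"):
            lengths.append(0)
        elif lengths:
            lengths[-1] += len(line.strip())

    if len(lengths) >= MAX_SEQUENCES:
        errors.append(
            f"file has {len(lengths)} sequences; "
            f"must be fewer than {MAX_SEQUENCES}"
        )
    for index, length in enumerate(lengths, start=1):
        if length > MAX_SEQUENCE_LENGTH:
            errors.append(
                f"sequence {index} has {length} residues "
                f"(limit {MAX_SEQUENCE_LENGTH})"
            )
    return errors
```

An empty list means that the file is within all three limits. Files that
exceed a limit can be split into several smaller files and submitted as
separate jobs.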
Sequence variation
We use heuristics to check that a sequence has a reasonable level of
variation, in order to prevent large strings of identical sequence or
a large number of duplicate residues being searched. If you find that
you cannot submit a valid sequence because of this restriction, please
let us know.
E-value limit
If you specify an E-value cut-off for your search, that E-value must
be a positive number.
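
A client-side check of the E-value before submission might look like the
sketch below. The function name parse_evalue is hypothetical, and
rejecting "nan" and "inf" is an assumption; the rule above only requires
a positive number.

```python
import math


def parse_evalue(text):
    """Parse an E-value cut-off and return it as a float.

    Raises ValueError if the text is not a number or is not positive.
    Assumption: non-finite values ("nan", "inf") are also rejected.
    """
    value = float(text)  # raises ValueError for non-numeric input
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"E-value must be a positive number, got {text!r}")
    return value
```

Scientific notation such as "1e-5" is accepted, since float() parses it
directly.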