

SAM - Sequence Analysis Methods for automatic annotation of unreviewed entries

Last modified June 13, 2019

UniProt’s Automatic Annotation pipeline enhances the unreviewed TrEMBL records in UniProtKB by enriching them with automatic classification and annotation. In this context, we use a suite of Sequence Analysis Methods (SAM) to add further sequence-specific annotation.

Methods

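Predictions of sequence features such as Signal, Transmembrane, Coiled coil and intrinsically disordered regions (the latter described in Region and Compositional bias annotations) are generated using the following software from external providers: TMHMM, Phobius and SignalP (transmembrane regions and signal peptides), Coils (coiled coil regions) and MobiDB-lite (intrinsically disordered regions).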

InterPro runs these methods on UniProtKB sequences to predict sequence features. Further annotations, mainly keywords, are then added automatically to enrich the resulting predictions. The new predictions are propagated to all UniProtKB/TrEMBL records that do not already carry such feature predictions from the UniRule automatic annotation system.
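The propagation rule can be illustrated with a minimal sketch. It assumes that "such feature predictions" means "a feature of the same type", so a SAM prediction is skipped whenever UniRule has already annotated that feature type on the record. The `Feature` and `Record` classes and the `propagate_sam` function are hypothetical and do not reflect UniProt's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    kind: str    # feature type, e.g. "Signal", "Transmembrane" (hypothetical labels)
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str  # origin of the prediction, e.g. "SAM" or "UniRule"


@dataclass
class Record:
    accession: str
    features: list = field(default_factory=list)


def propagate_sam(record, sam_features):
    """Add SAM predictions only for feature types UniRule has not already annotated."""
    unirule_kinds = {f.kind for f in record.features if f.source == "UniRule"}
    for feat in sam_features:
        if feat.kind not in unirule_kinds:
            record.features.append(feat)
    return record
```

For example, a record that already has a UniRule signal peptide would receive a SAM transmembrane prediction but not a SAM signal peptide.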

Overlaps and sanity checks

We use the overlap of different methods to confirm the presence of a predicted sequence feature.

Transmembrane region

TMHMM and Phobius predictors are used to infer transmembrane regions. If TMHMM and Phobius results overlap by at least 10 amino acids, the transmembrane region is annotated using the sequence range predicted by Phobius. If there is no such overlap, no transmembrane region is annotated.
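The transmembrane rule can be sketched as follows. This assumes the 10-residue overlap is evaluated per pair of predicted helices and that ranges are 1-based and inclusive; both are assumptions, and the function names are hypothetical.

```python
def overlap_length(a, b):
    """Number of residues shared by two 1-based inclusive ranges (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def consensus_tm_regions(tmhmm, phobius, min_overlap=10):
    """Keep each Phobius helix that overlaps some TMHMM helix by >= min_overlap residues.

    The annotated range is the one reported by Phobius, not by TMHMM.
    """
    return [
        p for p in phobius
        if any(overlap_length(p, t) >= min_overlap for t in tmhmm)
    ]
```

With TMHMM helices `[(20, 42), (100, 122)]` and Phobius helices `[(22, 43), (115, 135), (200, 220)]`, only `(22, 43)` is kept: it shares 21 residues with a TMHMM helix, while `(115, 135)` shares only 8 and `(200, 220)` has no TMHMM counterpart.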

Figure: Transmembrane region prediction

Signal peptide

TMHMM, SignalP and Phobius predictors are used to infer signal peptides. A signal peptide is annotated if SignalP predicts one and TMHMM predicts no transmembrane region in the same range.
A signal peptide is also annotated if both SignalP and Phobius predict one.
When an N-terminal signal peptide predicted by SignalP overlaps a transmembrane region predicted by TMHMM, the Phobius prediction is used to discriminate between the two possibilities.
In all of the above cases, the annotated region is the one predicted by SignalP.
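One way to combine the signal peptide rules into a single decision is sketched below. It assumes that a SignalP/TMHMM conflict is resolved in favour of the signal peptide only when Phobius also predicts a signal peptide, and that any overlap counts as "the same range"; both readings are assumptions, and `call_signal_peptide` is a hypothetical name.

```python
def overlaps(a, b):
    """True if two 1-based inclusive ranges (start, end) share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def call_signal_peptide(signalp, tmhmm, phobius_sp):
    """Return the SignalP range to annotate, or None.

    signalp:    (start, end) of the SignalP signal peptide, or None
    tmhmm:      list of (start, end) TMHMM transmembrane helices
    phobius_sp: (start, end) of the Phobius signal peptide, or None
    """
    if signalp is None:
        return None      # nothing to annotate without a SignalP prediction
    if phobius_sp is not None:
        return signalp   # SignalP and Phobius agree (this also settles any TMHMM conflict)
    if not any(overlaps(signalp, t) for t in tmhmm):
        return signalp   # no competing TMHMM helix in the same range
    return None          # SignalP/TMHMM conflict and Phobius does not support a signal peptide
```

Whatever the path, the returned range is always the SignalP one, matching the rule that the annotated region comes from SignalP.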

Coiled coil region

Only the Coils method is used to predict coiled coil regions. If a disordered region is predicted in the same range by MobiDB-lite, the coiled coil region is not annotated.
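The coiled coil filter amounts to discarding Coils predictions that collide with a disordered region. The sketch assumes that any shared residue counts as "the same range"; the real criterion may be stricter, and `filter_coiled_coils` is a hypothetical name.

```python
def overlaps(a, b):
    """True if two 1-based inclusive ranges (start, end) share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def filter_coiled_coils(coils, disordered):
    """Drop Coils predictions that overlap any MobiDB-lite disordered region."""
    return [c for c in coils if not any(overlaps(c, d) for d in disordered)]
```

For instance, with Coils regions `[(10, 40), (100, 130)]` and a disordered region `(120, 160)`, only `(10, 40)` would be annotated.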

Intrinsically disordered region

The MobiDB-lite method uses several different predictors to derive a consensus prediction for regions that are intrinsically disordered (or have a compositional bias).
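The idea of a consensus over several predictors can be illustrated with a simple per-residue vote. This is a generic sketch, not MobiDB-lite's actual algorithm: the agreement threshold `min_agree` and the minimum region length `min_length` are illustrative placeholders rather than MobiDB-lite's parameters, and compositional bias detection is not covered.

```python
def consensus_disorder(predictions, min_agree, min_length):
    """Derive consensus disordered regions from several per-residue predictors.

    predictions: list of boolean lists (True = residue predicted disordered),
                 one per predictor, all as long as the sequence.
    Returns 1-based inclusive (start, end) runs where at least min_agree
    predictors agree, keeping only runs of at least min_length residues.
    """
    n = len(predictions[0])
    votes = [sum(pred[i] for pred in predictions) for i in range(n)]
    regions, start = [], None
    # A trailing sentinel of -1 is always below the threshold, closing any final run.
    for i, v in enumerate(votes + [-1]):
        if v >= min_agree and start is None:
            start = i
        elif v < min_agree and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))  # 0-based start..i-1 -> 1-based start+1..i
            start = None
    return regions
```

Requiring both a vote threshold and a minimum length suppresses short, isolated calls from a single predictor, which is the general motivation for building a consensus.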

Related information

Transmembrane regions in reviewed entries
Signal peptides in reviewed entries
Coiled coil regions in reviewed entries

UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health
