Protein Sequence Database Examples

Primary databases contain sequence data (nucleic acid or protein).

Examples of protein sequence databases are discussed below.

a. PIR (Protein Information Resource):

It is one of the largest and most comprehensive annotated protein sequence databases in the public domain. It is a collection of sequences organized for investigating evolutionary relationships among proteins. The PIR database is split into four distinct sections, PIR1 to PIR4, according to how the protein data were entered and their annotation status. Fully classified and annotated entries are given the most importance and are therefore placed in PIR1. Entries that are not fully classified are stored in PIR2; because they have not been fully reviewed, they may contain redundant information. Unverified entries are placed in PIR3, and PIR4 holds unusual entries, such as conceptual translations of artefactual sequences or sequences that are not genetically encoded. PIR serves the scientific community through online access and, historically, by distributing the data on magnetic tape.


b. SWISS-PROT:

It is a very helpful, manually annotated database of protein sequences, maintained by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI). The database is highly integrated with other databases and has a very low level of redundancy, meaning that few duplicate sequences are present. It provides high-level annotation, including descriptions of a protein's function, its variants, the structure of its domains, and so on. SWISS-PROT is one of the most popular protein sequence resources because of the quality of its entries; at one stage it contained about 70,000 entries from more than 5,000 different species. Its structured format makes computational access both straightforward and efficient, which is one reason SWISS-PROT became the most widely used protein sequence database in the world. Because it is minimally redundant, it stores only the vital information rather than repeated copies of the same data.

Swiss-Prot provides descriptions of a non-redundant set of proteins, including their function, domain structure, post-translational modifications and variants. It is tightly integrated with other databases. Swiss-Prot concentrates on model organisms from distinct taxonomic groups to ensure high-quality annotation.

The Swiss-Prot group also develops and maintains other databases, including PROSITE, a database of protein families, and ENZYME, a database of enzyme nomenclature.

Structure of Swiss-Prot: Swiss-Prot became a well-known database because of the quality of its annotations (comments), its structure and the way in which the data are stored. Each line of an entry begins with a two-letter code that identifies the type of information on that line. The general structure of an entry is given in the table below.

Two-letter code: content of the line
ID: Identification line; each entry begins with an ID line.
AC: Accession number, which provides an additional identifier.
DT: Dates, such as the date of entry and the date of last modification.
DE: Description lines giving the name(s) by which the protein is known.
GN: Gene name.
OS: Organism species.
OC: Organism classification.
RN, RP, RA, RT, RL, etc. (R lines): List of supporting references.
CC: Comment lines giving protein details such as its function, subcellular location and similarity to particular protein families.
DR: Database cross-reference lines, providing links to other biomolecular databases, including primary and secondary databases.
KW: Applicable keywords.
FT: Feature table, describing the main regions and sites of the sequence.
SQ: Sequence header line; the sequence itself follows on the lines below it.
//: Terminator line marking the end of the entry.
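To make the line-code structure concrete, here is a minimal Python sketch that groups the lines of a single flat-file entry by their two-letter code and reassembles the sequence that follows the SQ line. It assumes the usual layout in which the code occupies the first two columns and the data start at column 6. The entry used in the example (TOY1_HUMAN, accession X00000) is made up and heavily abridged; it is not a real Swiss-Prot record. For real work, an established parser such as Biopython's `Bio.SwissProt` module is the safer choice.

```python
from collections import defaultdict


def parse_swissprot_entry(text):
    """Group the lines of one Swiss-Prot-style entry by two-letter code.

    Returns (fields, sequence): fields maps each code (ID, AC, DE, ...) to the
    list of its line contents; sequence is the residue string after SQ.
    """
    fields = defaultdict(list)
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore stray blank lines
        if line.startswith("//"):
            break  # terminator line: end of the entry
        if in_sequence:
            # Sequence lines carry no code; residues are in space-separated blocks.
            seq_parts.append("".join(line.split()))
            continue
        code, content = line[:2], line[5:].strip()  # code in columns 1-2, data from column 6
        fields[code].append(content)
        if code == "SQ":
            in_sequence = True  # everything after the SQ header is sequence
    return dict(fields), "".join(seq_parts)


# Made-up, abridged entry for illustration only (not a real record).
toy_entry = """\
ID   TOY1_HUMAN              Reviewed;          12 AA.
AC   X00000;
DE   RecName: Full=Hypothetical toy protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QI
//
"""

fields, sequence = parse_swissprot_entry(toy_entry)
print(fields["AC"])  # ['X00000;']
print(sequence)      # MKTAYIAKQRQI
```

Keying on the two-letter code is what makes the format easy to process by computer: a program can pick out, say, all DR cross-references without understanding the rest of the entry.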

c. TrEMBL (Translated EMBL):

TrEMBL is a very large protein sequence database in Swiss-Prot format. It is generated by computer translation of the coding sequences (CDS) in the EMBL Nucleotide Sequence Database; a special feature is that it contains translations of all coding sequences in EMBL that are not yet in Swiss-Prot. The main aim of TrEMBL is to allow very rapid access to sequence data from genome projects. Computer translation is not entirely reliable, so proteins predicted by TrEMBL may be hypothetical, and many TrEMBL entries are poorly annotated. TrEMBL has two main sections, designated SP-TrEMBL and REM-TrEMBL.

SP-TrEMBL (SWISS-PROT TrEMBL): Contains entries that will eventually be incorporated into Swiss-Prot. Swiss-Prot accession numbers have already been assigned to all SP-TrEMBL entries.

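REM-TrEMBL (remaining TrEMBL): Contains sequences that are not intended to be included in SWISS-PROT, such as immunoglobulins, T-cell receptors, synthetic sequences and patented sequences.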
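The computer translation that produces TrEMBL entries can be illustrated with a short Python sketch that converts a coding sequence into a protein sequence codon by codon. It assumes the standard genetic code (NCBI translation table 1) and stops at the first stop codon. The function name `translate_cds` is hypothetical, and the real EMBL/TrEMBL pipeline also deals with alternative genetic codes, reading frames and annotation rules that this toy ignores.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_cds(cds):
    """Translate a coding sequence, stopping at the first stop codon ('*')."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for ambiguous codons such as NNN
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCAAGTGGTAA"))  # MAKW
```

Because the translation is purely mechanical, any error in the underlying nucleotide sequence or CDS annotation carries straight into the protein, which is why TrEMBL entries are treated as less reliable than curated Swiss-Prot entries.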

Composite Protein Sequence Databases:

Composite databases amalgamate a variety of different primary sources, so a single search covers all of them. Different methods can be used to create composite resources. Composite databases make sequence searching much more efficient because they avoid the need to query multiple resources separately. The main composite databases are:

a) NRDB (Non-Redundant Database)
b) OWL

This database has the advantage of containing fewer errors than many other composite databases. Different composite databases use different primary sources.
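The idea behind a non-redundant composite database can be sketched in Python: several primary sources are merged in a chosen priority order, identical sequences are collapsed into one entry, and a single search then covers every source. This is a simplified illustration, not the actual NRDB or OWL procedure; it treats only exact duplicates as redundant, and the source names, identifiers and sequences in the example are hypothetical.

```python
def build_composite(sources):
    """Merge (source_name, {identifier: sequence}) pairs, given in priority order.

    Returns a dict mapping each unique sequence to the list of (source, id)
    entries that contain it; the first entry is the highest-priority one.
    """
    composite = {}
    for source_name, entries in sources:
        for identifier, seq in entries.items():
            # Identical sequences share one key, so redundant copies collapse together.
            composite.setdefault(seq.upper(), []).append((source_name, identifier))
    return composite


def search(composite, fragment):
    """Search all merged sources at once; return the representative entry of each hit."""
    fragment = fragment.upper()
    return [refs[0] for seq, refs in composite.items() if fragment in seq]


# Hypothetical sources, listed from highest to lowest priority.
sources = [
    ("source_A", {"a1": "MKTAYIAKQR", "a2": "MSDNEGLV"}),
    ("source_B", {"b1": "MKTAYIAKQR", "b2": "MGLSDGEW"}),
]
composite = build_composite(sources)
print(len(composite))           # 3 (unique sequences from 4 entries)
print(composite["MKTAYIAKQR"])  # [('source_A', 'a1'), ('source_B', 'b1')]
print(search(composite, "gl"))  # [('source_A', 'a2'), ('source_B', 'b2')]
```

Keeping the list of all contributing entries, rather than discarding duplicates outright, preserves the links back to each primary source while still searching every sequence only once.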

Sreejith Hrishikesan

Sreejith Hrishikesan is an ME postgraduate and has worked as an Assistant Professor in the Electronics Department at KMP College of Engineering, Ernakulam.
