Secondary Database Searching & Hidden Markov Model


Primary database search tools are effective for identifying sequence similarities, but their output is difficult to analyze. The main principle behind the development of secondary databases is that they capture the structural and functional characteristics shared by their constituent sequences.
Different secondary databases are formed as a result of different analysis methods. HMMs, profiles, blocks and fingerprints are among the pattern recognition methods used in the major secondary databases. Some of these analysis methods are described below.

a) Fingerprints 

Within a sequence alignment, we can find several motifs (a motif is a consecutive string of amino acids in a protein sequence whose general character is repeated across the aligned sequences). Such motifs can be stored in a secondary database so that related sequences are easy to identify during searching. A group of motifs is used to create a signature, or fingerprint, which is stored in the secondary database. The technique of fingerprinting is not commonly used.
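To make the idea concrete, here is a minimal sketch of matching a sequence against a fingerprint. The fingerprint, the motif contents and the function names are all hypothetical toy data invented for illustration, not entries from any real database. Each motif lists the residues allowed at each position, and the motifs are expected to occur in order along the sequence.

```python
# Hypothetical fingerprint: an ordered list of motifs. Each motif position
# is a string of the residues allowed there (toy data, for illustration only).
FINGERPRINT = [
    ["GA", "K", "ST", "LIV"],   # motif 1
    ["D", "FY", "G", "ND"],     # motif 2
    ["W", "KR", "P"],           # motif 3
]


def find_motif(seq, motif, start=0):
    """Return the index of the first match of motif at or after start, else -1."""
    width = len(motif)
    for i in range(start, len(seq) - width + 1):
        if all(seq[i + j] in allowed for j, allowed in enumerate(motif)):
            return i
    return -1


def match_fingerprint(seq, fingerprint):
    """Count how many motifs are found, in order, along the sequence."""
    pos, hits = 0, 0
    for motif in fingerprint:
        i = find_motif(seq, motif, pos)
        if i >= 0:
            hits += 1
            pos = i + len(motif)  # the next motif must start after this one
    return hits


print(match_fingerprint("MMAKSLQQDFGNTTWRPAA", FINGERPRINT))  # all three motifs
print(match_fingerprint("MMAKSLQQWRPAA", FINGERPRINT))        # motif 2 missing
```

A sequence that matches every motif is a strong candidate for the family, while a partial match (some motifs missing) may indicate a more distant relative.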


b) Blocks

Similar to protein fingerprinting, blocks may be used to search sequence databases to find additional family members. Here the blocks within a family are used to make independent database searches. For a given sequence, the more blocks are matched, the greater the possibility that the sequence belongs to that family.
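The "more blocks matched, more likely a family member" rule can be sketched as follows. This is only an illustration under stated assumptions: the block contents, the simple column-membership score and the 0.8 threshold are hypothetical choices, not the scoring scheme of any real blocks database. Each block is searched independently and the matched blocks are collected.

```python
# Hypothetical family: each block is a list of equal-length, ungapped
# segments taken from an alignment (toy data, for illustration only).
FAMILY_BLOCKS = {
    "block_A": ["GKSTL", "GKTTL", "GRSTL"],
    "block_B": ["DFGND", "DYGND", "DFGSD"],
}


def window_score(window, block):
    """Fraction of positions whose residue occurs in the matching block column."""
    columns = list(zip(*block))
    return sum(res in col for res, col in zip(window, columns)) / len(columns)


def blocks_matched(seq, blocks, threshold=0.8):
    """Search each block independently; return names of blocks that match."""
    matched = []
    for name, block in blocks.items():
        width = len(block[0])
        best = max(
            (window_score(seq[i:i + width], block)
             for i in range(len(seq) - width + 1)),
            default=0.0,  # sequence shorter than the block
        )
        if best >= threshold:
            matched.append(name)
    return matched


print(blocks_matched("AAGKSTLQQDYGSDPP", FAMILY_BLOCKS))  # both blocks match
print(blocks_matched("AAGKSTLQQ", FAMILY_BLOCKS))         # only block_A matches
```

The length of the returned list is the evidence: a sequence hitting both blocks is a better family candidate than one hitting only a single block.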


c) Profiles

A profile is a pattern recognition method used in secondary databases. Profiles define which residues are allowed at given positions and which positions are highly conserved, and so profiles help in defining full domain alignments.
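As a rough sketch of what a profile records, the code below builds a position-specific scoring table from a small ungapped alignment. The alignment, the uniform background frequency and the pseudocount of 1 are assumptions made for illustration; real profiles also handle gaps and use substitution-matrix weighting, which this sketch leaves out.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical ungapped alignment (toy data, for illustration only).
ALIGNMENT = ["GKSTL", "GKTTL", "GRSTL"]


def build_profile(alignment, pseudocount=1.0):
    """Per-column log-odds scores against a uniform background (an assumption)."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(segment, profile):
    """Sum the per-position scores of a segment as long as the profile."""
    return sum(column[aa] for aa, column in zip(segment, profile))


def conserved_positions(alignment):
    """Indices of columns in which every sequence has the same residue."""
    return [i for i, col in enumerate(zip(*alignment)) if len(set(col)) == 1]


profile = build_profile(ALIGNMENT)
print(conserved_positions(ALIGNMENT))                       # fully conserved columns
print(score("GKSTL", profile) > score("AAAAA", profile))    # family-like scores higher
```

Residues seen often in a column get positive scores and unseen residues get negative ones, so the table encodes both which residues are allowed at a position and how strongly the position is conserved.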

d) Hidden Markov Model (HMM) 

An HMM is a probabilistic model consisting of a number of interconnected states. It can determine the most likely MSA (multiple sequence alignment) or a set of possible MSAs. HMMs have some limitations, which can lead to false matches.
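Finding the "most likely" explanation of a sequence under an HMM is usually done with the Viterbi algorithm. The sketch below runs it on a toy two-state model. The state names, the three-symbol alphabet and every probability are hypothetical values chosen for illustration; a real profile HMM has match, insert and delete states for each alignment column.

```python
import math

# Toy HMM (all names and numbers are hypothetical, for illustration only).
STATES = ("conserved", "variable")
START = {"conserved": 0.8, "variable": 0.2}
TRANS = {
    "conserved": {"conserved": 0.9, "variable": 0.1},
    "variable": {"conserved": 0.4, "variable": 0.6},
}
EMIT = {
    "conserved": {"A": 0.7, "G": 0.2, "C": 0.1},
    "variable": {"A": 0.1, "G": 0.3, "C": 0.6},
}


def viterbi(obs):
    """Return (log-probability, state path) of the most likely state path."""
    # Log-scores of the best path ending in each state after the first symbol.
    v = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            prev = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            scores[s] = (v[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(EMIT[s][symbol]))
            pointers[s] = prev
        v.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return v[-1][last], path


log_p, path = viterbi("AAGCCC")
print(path, log_p)
```

Working in log space avoids numerical underflow on long sequences. The returned path assigns each symbol to a state, which is the same kind of assignment a profile HMM uses to place residues into alignment columns.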

The first approach for discovering disease-related genes is the technique of positional cloning. Here the chromosome related to the disease in question is established by analyzing a population of subjects. The whole process of positional cloning is time-consuming.
