Multiple Sequence Alignment Methods

Methods of producing Multiple Sequence Alignments (MSA):

1. Dynamic programming:

There are different methods of producing a MSA.  The most direct method uses a dynamic programming technique to identify the globally optimal alignment solution. For proteins, the method of dynamic programming involves two set of parameters called gap penalty and substitution matrix. Here scores are assigned to the alignment of each pair of amino acids based on the similarity of amino acids' chemical properties and the evolutionary probability of mutation.

In MSA, for n-individual sequences, an n-dimensional equivalent matrix is formed in standard pairwise sequence alignment. The search space thus increases exponentially with increasing 'n'. The MSA program optimizes the sum of all the pairs of characters at each position in the alignment. It is called sum of pair score.

2. Progressive alignment construction

Progressive alignment is the most widely used approach to multiple sequence alignments. It is also called hierarchical or tree method. It builds up a final MSA by pairwise alignments beginning with the most similar pair and progressing distantly related pair. All progressive alignment methods require two stages.  At first stage, the relationship between sequences is represented as a tree and in the second stage the MSA is built up.

The primary problem in progressive alignment is that when errors are made at any stage in growing MSA , these errors are then propagated through to the final result, Performance also degrades when all of the sequences in the set are distantly related. Progressive alignment methods are efficient enough even if we use about 1000 sequences.

3. Iterative methods:

A major problem in the progressive alignment method is that the accuracy of alignment heavily depend on the accuracy of initial pairwise alignment. The iterative methods work similarly to progressive methods, but repeatedly realign the initial sequences and sometime add new sequences to the growing MSA.

4. Hidden Markov Models

It can determine the most likely MSA or set of possible MSAs. HMM is a probabilistic model consisting of a number of interconnecting states. Typical HMM based methods work by representing an MSA as a partial order graph which consists of a series of nodes representing possible entries in the columns of an MSA. In this representation a column that is absolutely conserved is coded as a single node with many outgoing connections.