Coding for sources with memory:
A drawback of Huffman’s code is that it requires knowledge of the symbol probabilities. For real-time applications, Huffman encoding becomes impractical because the source statistics are not always known a priori. A code that can exploit the statistical interdependence of the letters in the alphabet, along with their individual probabilities of occurrence, is the Lempel-Ziv algorithm. This algorithm belongs to the class of universal source coding algorithms.
The logic behind Lempel-Ziv universal coding is as follows:
* An arbitrary binary sequence can be compressed by coding a series of zeros and ones as some previously seen string (a prefix string) + one new bit.
* The new string formed by such parsing becomes a potential prefix string for future strings.
These substrings are called phrases (subsequences). The phrases are listed in a dictionary or code book, which stores the existing phrases and their locations. To encode a new phrase, we specify the location of its prefix (an existing phrase) in the code book and append the new letter.
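The parsing rule above can be sketched in Python. This is a minimal illustration, not a full implementation; the function name `lz_parse` and the seeding of the code book with the single bits 0 and 1 (as in the worked example that follows) are assumptions.

```python
def lz_parse(bits):
    """Split a binary string into phrases: each new phrase is the
    longest phrase already in the code book plus one new bit."""
    book = ["0", "1"]        # seed phrases at positions 1 and 2
    phrases = []
    current = ""
    for bit in bits:
        current += bit
        if current not in book:
            # current = known prefix + one innovation bit -> new phrase
            book.append(current)
            phrases.append(current)
            current = ""
    return phrases

print(lz_parse("000101110010100101"))
# ['00', '01', '011', '10', '010', '100', '101']
```

Note that each emitted phrase differs from an earlier code-book entry only in its last bit, which is what makes the pointer-plus-innovation-bit encoding possible.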
Consider an example:
1. Determine the Lempel-Ziv code for the given sequence:
000101110010100101…
Answer:
Parsing the given sequence (with the single bits 0 and 1 assumed to be already stored in the code book) gives the subsequences:
00, 01, 011, 10, 010, 100, 101
Based on this parsing, the code book can be tabulated as:
| Numeric position | Subsequence | Numerical representation | Binary coded sequence |
|------------------|-------------|--------------------------|-----------------------|
| 1                | 0           |                          |                       |
| 2                | 1           |                          |                       |
| 3                | 00          | 11                       | 0010                  |
| 4                | 01          | 12                       | 0011                  |
| 5                | 011         | 42                       | 1001                  |
| 6                | 10          | 21                       | 0100                  |
| 7                | 010         | 41                       | 1000                  |
| 8                | 100         | 61                       | 1100                  |
| 9                | 101         | 62                       | 1101                  |
The maximum pointer value in the numerical representation is 6. Since 6 = 110 in binary, each pointer needs 3 bits, i.e.:
0 = 000
1 = 001
2 = 010
3 = 011
4 = 100
5 = 101
6 = 110
The last symbol of each subsequence in the code book is an innovation symbol. Correspondingly, the last bit of each uniform block of bits in the binary encoded representation of the data stream represents the innovation symbol for the particular subsequence under consideration. The remaining bits provide the equivalent binary representation of the pointer to the root subsequence that matches the one in question, except for the innovation symbol.
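The encoding step described above can be sketched as follows. This is a minimal Python illustration under the same assumptions as the worked example: the code book is seeded with the single bits 0 and 1, each pointer is written with 3 bits, and the function names are illustrative.

```python
def lz_encode(bits, pointer_bits=3):
    """Emit one codeword per phrase: a fixed-width pointer to the prefix
    phrase in the code book, followed by the one-bit innovation symbol."""
    book = ["0", "1"]                 # positions 1 and 2
    codewords = []
    current = ""
    for bit in bits:
        current += bit
        if current not in book:
            prefix_pos = book.index(current[:-1]) + 1   # 1-based pointer
            codewords.append(format(prefix_pos, "0{}b".format(pointer_bits))
                             + current[-1])             # pointer + innovation bit
            book.append(current)
            current = ""
    return codewords

def lz_decode(codewords, pointer_bits=3):
    """Invert lz_encode: rebuild each phrase from pointer + innovation bit."""
    book = ["0", "1"]
    phrases = []
    for cw in codewords:
        prefix = book[int(cw[:pointer_bits], 2) - 1]    # look up prefix phrase
        phrase = prefix + cw[pointer_bits:]
        book.append(phrase)
        phrases.append(phrase)
    return "".join(phrases)

print(lz_encode("000101110010100101"))
# ['0010', '0011', '1001', '0100', '1000', '1100', '1101']
```

Decoding the seven codewords reproduces the original 18-bit sequence, confirming that the pointer and innovation bit together carry all the information in each phrase.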