Let T = (V, E) be a tree, and assume its vertices are distinctly labeled by the numbers 1, 2, ..., n. We now describe an encoding algorithm P encode(T): it outputs a sequence of numbers seq of length n - 1. Until there are no nodes left in T, it always removes the leaf v with the lowest label from the current T (note that labels are the integers 1 through n) and appends the label of v’s (only) neighbor to seq. The last remaining node has no neighbor, so nothing is appended for it.
1. Write down the pseudocode for P encode(T). What is the complexity of this algorithm? Hint: First draw a tree for yourself and try to see what the encoding of this tree is. Your algorithm should use an adjacency list to store T as a graph. Use a priority queue to store the leaves of T.
2. Assume that seq is a sequence output by the P encode algorithm. Design the algorithm P decode(seq) that takes this sequence as input and outputs the tree that it was generated from. (Thus seq == P encode(P decode(seq)).) Write the pseudocode for P decode. Your algorithm should run in O(n) time.
Solution
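Consider the tree T stored as an adjacency list. Each removed leaf contributes the label of its only neighbor to seq, so the encoding needs fast access to the lowest-labeled leaf. Pseudocode for P encode(T):
1. Record the degree of every node and insert every leaf (node of degree 1) into a min heap. The min heap is the priority queue, and the label is used to compare the nodes.
2. Extract the leaf v with the minimum label from the min heap and find its only remaining neighbor u.
3. Append u to seq, remove v from T, and decrease the degree of u by one. If u now has degree 1, insert u into the min heap.
4. Repeat steps 2 and 3 until a single node remains. This takes n - 1 extractions, and the remaining node is always n, because a tree with at least two nodes has at least two leaves and so n is never the lowest-labeled leaf.
The run time of the algorithm is O(n log n): building the adjacency list and finding each leaf's remaining neighbor take O(n) in total, and each of the at most n heap insertions and n - 1 extractions takes O(log n).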
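As a rough sketch of P encode as defined in the problem statement (repeatedly remove the lowest-labeled leaf and record its neighbor), the Python below uses an adjacency list and the standard heapq module as the priority queue. The function name p_encode and its (n, edges) signature are assumptions made for illustration; they do not come from the original exercise.

```python
import heapq

def p_encode(n, edges):
    """Hypothetical helper: encode a tree on labels 1..n given as an edge list."""
    # Adjacency list; index 0 is unused so labels map directly to indices.
    adj = [[] for _ in range(n + 1)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    degree = [len(nbrs) for nbrs in adj]
    removed = [False] * (n + 1)

    # Min-heap holding the current leaves, ordered by label.
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)

    seq = []
    for _ in range(n - 1):
        v = heapq.heappop(leaves)          # lowest-labeled leaf
        removed[v] = True
        # v's adjacency list is scanned only once, so all scans cost O(n) total.
        u = next(w for w in adj[v] if not removed[w])
        seq.append(u)
        degree[u] -= 1
        if degree[u] == 1:                 # u has just become a leaf
            heapq.heappush(leaves, u)
    return seq

print(p_encode(6, [(1, 4), (2, 4), (4, 5), (3, 5), (5, 6)]))  # [4, 4, 5, 5, 6]
```

Each node enters the heap at most once, which is where the O(n log n) bound comes from. Marking nodes as removed, instead of deleting them from the adjacency lists, keeps the neighbor lookup simple.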
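Pseudocode to decode the tree:
1. Let n = length(seq) + 1. For every node x, count how often x occurs in seq. A node that has not been removed yet and whose count is 0 is a leaf of the current tree.
2. Set a pointer p to the smallest label with count 0, and let the current leaf be v = p.
3. Process seq from left to right. For each entry x, add the edge (v, x) and decrease the count of x by one. If the count of x is now 0 and x < p, the next leaf is v = x; otherwise advance p to the next label with count 0 and set v = p.
4. Repeat step 3 until all entries of seq are decoded. The n - 1 edges added form the tree.
Every label below p is either already removed or still has a positive count, so a node x < p that reaches count 0 is the lowest-labeled leaf. The pointer p only moves forward, so it advances at most n times in total, and apart from that every entry of seq is handled in constant time. Hence the runtime is O(n).
For example, the sequence 4, 4, 5, 5, 6 decodes to the tree with edges (1, 4), (2, 4), (3, 5), (4, 5), (5, 6).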
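A linear-time P decode for the sequence defined in the problem statement can be sketched as follows. The idea is to count how often each label occurs in seq and to sweep a pointer over the labels to find the next leaf, so no priority queue is needed. The function name p_decode and the edge-list return format are assumptions for illustration, and the sketch assumes seq is a valid output of P encode.

```python
def p_decode(seq):
    """Hypothetical helper: rebuild the edge list of the tree encoded by seq."""
    n = len(seq) + 1
    # remaining[x] = number of not-yet-processed occurrences of x in seq.
    # Index n + 1 is a spare zero slot that guards the scans below;
    # for a valid seq the pointer never passes n.
    remaining = [0] * (n + 2)
    for x in seq:
        remaining[x] += 1

    # Pointer to the smallest label with count 0, i.e. the first leaf.
    ptr = 1
    while remaining[ptr] != 0:
        ptr += 1
    leaf = ptr

    edges = []
    for x in seq:
        edges.append((leaf, x))            # the removed leaf hung off x
        remaining[x] -= 1
        if remaining[x] == 0 and x < ptr:
            leaf = x                       # x is now the lowest-labeled leaf
        else:
            ptr += 1                       # otherwise resume the forward scan
            while remaining[ptr] != 0:
                ptr += 1
            leaf = ptr
    return edges

print(p_decode([4, 4, 5, 5, 6]))  # [(1, 4), (2, 4), (3, 5), (4, 5), (5, 6)]
```

The pointer never moves backward, so the two scanning loops together do O(n) work, which gives the required O(n) bound overall.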
