
21 Feb 2020 - Dr. Nithin Nagaraj

Lecture No. 4 & 5: Conditional Entropy, Mutual Information, Shannon-Fano and Huffman Coding

[Deadline: February 21, 2020]

Instructions: Do not assume anything beyond what was defined in class. If you use any identity or theorem that was not covered in class, you must also prove it; it is better to avoid this as much as possible. Show your calculations in full. Justify all your answers with appropriate reasoning and arguments.

  1. Prove from first principles $H(X|Y) \leq H(X)$.

  2. Verify the Grouping Axiom of Shannon Entropy. To verify, use the definition of entropy $H(X) = -\sum_{i=1}^{n} p_i \log_2(p_i)$ bits, where $X$ is a discrete random variable taking $n$ values with respective probabilities $p_1, \ldots, p_n$.

  3. Starting from the definition of $I(X,Y)$, show that if $X$ and $Y$ are statistically independent then $I(X,Y)=0$ bits.

  4. In how many different ways can you write $H(X,Y)$? (You may use $H(X|Y)$, $H(Y|X)$, $I(X,Y)$, $H(X)$, $H(Y)$, $p(X)$, $p(Y)$, $p(X,Y)$, etc. in your expressions.) A numerical sanity check for these quantities is sketched after the problem list.

  5. Give an example of a uniquely decodable code which is not prefix-free and which has a decoding delay of 4 symbols.

  6. Design a prefix-free code with code-word lengths $\{1, 2, 3, 3, 3\}$. A Kraft-inequality helper is sketched after the problem list.

  7. Consider a source with alphabet: $\{a, b, c\}$. For what probabilities $\{p_a, p_b, p_c\}$ will the Huffman code have an average codeword length equal to the Entropy of this source? Extrapolate your answer to the case of an alphabet with $n$ symbols.

  8. Huffman claims that the two least frequently occurring symbols should always have codewords of the same length. Can you justify his claim? Is this true only for binary Huffman codes? (Hint: You can refer to Huffman’s original 1952 paper [1].)

  9. Analyze why Shannon-Fano coding is worse than Huffman coding and state your reasons. (Hint: the words ‘optimum’ and ‘minimum-redundancy’ are used in Huffman’s 1952 paper [1].)

  10. Design a ternary Huffman code for a source with symbol probabilities given by $\{0.22, 0.20, 0.18, 0.15, 0.10, 0.08, 0.05, 0.02\}$. How does this compare with the binary Huffman code and the Shannon-Fano code in terms of average codeword length? (Hint: refer to Huffman’s 1952 paper [1] to understand how you can construct a ternary code.) A binary Huffman sketch for these probabilities appears after the problem list.

    I would recommend reading Huffman’s 1952 paper [1] in its entirety. Not only will it enable you to answer the above questions, it also offers great insight and is a very good example of what fundamental research of the highest quality looks and feels like. The last time I checked on Google Scholar, the paper had $7982$ citations!
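The sketches below are not part of the original assignment; they are minimal numerical sanity checks that may help while working through the problems. This first one uses a made-up joint distribution `p_xy` (an assumption, chosen only for illustration) to compute $H(X)$, $H(Y)$, $H(X,Y)$, $H(X|Y)$ and $I(X,Y)$ directly from the definitions, which can be handy for checking your reasoning in Problems 1-4.

```python
# Numerical sanity check for Problems 1-4 (not a proof).
# The joint pmf p_xy below is a hypothetical example, not from the assignment.
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero terms skipped)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Hypothetical joint pmf p(X, Y) over X in {0, 1} and Y in {0, 1}.
p_xy = {(0, 0): 0.40, (0, 1): 0.10,
        (1, 0): 0.20, (1, 1): 0.30}

# Marginal distributions p(X) and p(Y).
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

H_X, H_Y, H_XY = entropy(p_x.values()), entropy(p_y.values()), entropy(p_xy.values())
H_X_given_Y = H_XY - H_Y      # chain rule: H(X,Y) = H(Y) + H(X|Y)
I_XY = H_X + H_Y - H_XY       # I(X,Y) = H(X) + H(Y) - H(X,Y)

print(f"H(X)   = {H_X:.4f} bits")
print(f"H(Y)   = {H_Y:.4f} bits")
print(f"H(X,Y) = {H_XY:.4f} bits")
print(f"H(X|Y) = {H_X_given_Y:.4f} bits (should not exceed H(X))")
print(f"I(X,Y) = {I_XY:.4f} bits (0 exactly when X and Y are independent)")
```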
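For Problems 5, 6 and 10, a quick sanity check on candidate code-word lengths is the Kraft inequality, $\sum_i r^{-l_i} \leq 1$ for an alphabet of radix $r$. The helper below is my own illustration; the function name and the example length lists are assumptions, not part of the assignment.

```python
# Kraft-inequality check for a list of code-word lengths (illustrative helper).
def kraft_sum(lengths, radix=2):
    """Return the sum of radix**(-l) over the given code-word lengths."""
    return sum(radix ** (-l) for l in lengths)

# Hypothetical example: these binary lengths give a Kraft sum of 1.0,
# so a prefix-free binary code with these lengths exists.
print(kraft_sum([1, 2, 3, 4, 4]))

# The same helper with radix=3 applies to ternary alphabets.
print(kraft_sum([1, 1, 2, 2], radix=3))
```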
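Finally, a rough binary Huffman sketch (an illustration only, not the assigned ternary construction) that computes code-word lengths and the average code-word length for the Problem 10 probabilities, so the result can be compared against the source entropy.

```python
# Rough binary Huffman sketch for sanity-checking average code-word lengths
# (Problems 7-10). The ternary construction asked for in Problem 10 is not shown.
import heapq
import math

def huffman_lengths(probs):
    """Return the binary Huffman code-word length for each symbol probability."""
    # Heap entries: (subtree probability, tie-breaker, symbol indices in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # the two least probable subtrees...
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:                 # ...are merged, pushing their symbols one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

probs = [0.22, 0.20, 0.18, 0.15, 0.10, 0.08, 0.05, 0.02]   # Problem 10
lengths = huffman_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
H = -sum(p * math.log2(p) for p in probs)
print("code-word lengths :", lengths)
print("average length    :", round(avg_len, 4), "bits/symbol")
print("source entropy    :", round(H, 4), "bits/symbol")
```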

  [1] D. A. Huffman, “A Method for the Construction of Minimum-Redundancy Codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, September 1952.