- Non-singular codes: The discussion in class and in the text focussed on
instantaneous codes, with extensions to uniquely decodable codes. Both these are required
in cases when the code is to be used repeatedly to encode a sequence of outcomes of a
random variable. But if we need to encode only one outcome and we know when we have
reached the end of a codeword, we do not need unique decodability - only the fact that the
code is non-singular would suffice. Eg. if a random variable X takes on 3 values a,
b and c, we could encode them by 0, 1, and 00. Such a code is non-singular but not
uniquely decodeable.
In the following, assume that we have a random variable X
which takes on m values with probabilities
and that the probabilities
are ordered so that
.
- By viewing the non-singular binary code as a ternary code with three symbols, 0,1 and
``STOP'', show that the expected length of a non-singular code
for a random variable X
satisfies the following inequality:
where
is the entropy of X in bits. Thus the average length of a
non-singular code is at least a constant fraction of the average length of an
instantaneous code.
- Let
be the expected length of the best instantaneous code and
be the expected
length of the best non-singular code for X. Argue that
.
- Give a simple example where the average length of the non-singular code is less than the
entropy.
- The set of codewords available for an non-singular code is
. Since
, show that this
is minimized if we allot the shortest codewords to the most probable symbols. Thus
,
,
etc. Show that in general
, and therefore
.
- The previous part shows that it is easy to find the optimal non-singular code for a
distribution. However, it is a little more tricky to deal with the average length of this
code. We now bound this average length. It follows from the previous part that
.
Consider the difference
Prove by the method of Lagrange multipliers that the maximum of
occurs when
,
where
and
is the sum of the harmonic series, i.e.,
(This can also be done using the non-negativity of relative entropy.)
- Complete the arguments for
Now it is well known (see, e.g. Knuth, ``Art of Computer Programming'', Vol. 1) that
(more
precisely,
where
, and
Euler's constant
).
Either using this or a simple approximation that
, which can be proved by
integration of
, it can be shown that
. Thus we have
A non-singular code cannot do much better than an instantaneous code!
- Huffman codes.
- Construct a binary Huffman code for the following distribution on 5 symbols
.
What is the average length of this code?
- Construct a probability distribution
on 5 symbols for which the code that you
constructed in part (a) has an average length (under
) equal to its entropy
.
- Huffman codes: Consider a random variable X which takes 6 values
with
probabilities (0.5,0.25,0.1, 0.05,0.05,0.05) respectively.
- Construct a binary Huffman code for this random variable. What is its average length?
- Construct a quaternary Huffman code for this random variable, i.e., a code over an
alphabet of four symbols (call them a,b,c and d). What is the
average length of this code?
- One way to construct a binary code for the random variable is to start with a quaternary
code, and convert the symbols into binary using the mapping
,
,
and
.
What is the average length of the binary code for the above random variable constructed by
this process?
- For any random variable X, let
be the average length of the binary Huffman
code for the random variable, and let
be the average length code constructed by
first building a quaternary Huffman code and converting it to binary. Show that
- The lower bound in the previous example is tight. Give an example where the code
constructed by converting an optimal quaternary code is also the optimal binary code?
- The upper bound, i.e.,
is not tight. In fact, a better bound is
.
Prove this bound, and provide an example where this bound is tight.
- Data compression. Find an optimal set of binary codeword lengths
(minimizing
) for an instantaneous code for each of the following probability mass
functions:
- Bad wine
One is given 6 bottles of wine. It is known that precisely one bottle has gone bad (tastes
terrible). From inspection of the bottles it is determined that the probability
that
the
bottle is bad is given by
. Tasting will determine the bad wine. Suppose
you taste the wines one at a time. Choose the order of tasting to minimize the expected
number of tastings required to determine the bad bottle. Remember, if the first 5 wines
pass the test you don't have to taste the last.
- What is the expected number of tastings required?
- Which bottle should be tasted first?
Now you get smart. For the first sample, you mix some of the wines in a fresh glass and
sample the mixture. You proceed, mixing and tasting, stopping when the bad bottle has been
determined.
- (c)
- What is the minimum expected number of tastings required to determine the bad wine?
- (d)
- What mixture should be tasted first?
- Huffman coding
Let X have probability mass function
- Find the binary Huffman code for X.
- Find the ternary (D=3) Huffman code for X.
- Huffman vs. Shannon
A random variable X takes on three values with
probabilities 0.6, 0.3, and 0.1.
- What are the lengths of the binary Huffman codewords for X? What are the lengths
of the binary Shannon codewords
for X?
- What is the smallest integer D such that the expected Shannon codeword length
with a D-ary alphabet equals the expected Huffman codeword length with a D-ary
alphabet?
- Arithmetic coding.
Let
be binary stationary Markov with transition matrix
- Find
- How many bits
can be known for sure if it is not known how X=01110 continues?
- Tunstall Coding: The normal setting for source coding maps a symbol (or a block
of symbols) from a finite alphabet onto a variable length string. An example of such a
code is the Huffman code, which is the optimal (minimal expected length) mapping from a
set of symbols to a prefix free set of codewords. Now consider the dual problem of
variable-to-fixed length codes, where we map a variable length sequence of source symbols
into a fixed length binary (or D-ary) representation. A variable-to-fixed length
code for an i.i.d. sequence of random variables
is defined by a prefix-free
set of phrases
, where
is the set of finite length strings of
symbols of
, and
. Given any sequence
, the string is
parsed into phrases from
(unique because of the prefix free property
of
), and represented by a sequence of symbols from a D-ary alphabet.
Define the efficiency of this coding scheme by
where
is the expected length of a phrase from
.
- Prove that
.
- The process of constructing
can be considered as a process of
constructing an m-ary tree whose leaves are the phrases in
. Assume that D=
1+k(m-1) for some integer
. Consider the following algorithm due
to Tunstall:
- Start with
with probabilities
. This corresponds to a complete m-ary
tree of depth 1.
- Expand the node with the highest probability. For example, if
is the node with
the highest probability, the new set is
.
- Repeat step 2 until the number of leaves (number of phrases) reaches the required value.
Show that the Tunstall algorithm is optimal, in the sense that it constructs a variable
to fixed code with the best
for a given D, i.e., the largest
value of
for a given D.
- Show that there exists a D such that
.
- Huffman algorithm for tree construction: Consider the following problem: m
binary signals
are available at times
, and we would like to find their sum
using 2-input gates, each gate with 1 time unit delay, so that the final result is
available as quickly as possible. A simple greedy algorithm is to combine the earliest two
results, forming the partial result at time
. We now have a new problem with
,
available at times
. We can now sort this list of T's, and
apply the same merging step again, repeating this until we have the final result.
- Argue that the above procedure is optimal, in that it constructs a circuit for which the
final result is available as quickly as possible.
- Show that this procedure finds the tree that minimizes
where
is the time at which the result alloted to the i-th leaf is
available, and
is the length of the path from the i-th leaf to the root.
- Show that
for any tree T.
- Show that there exists a tree such that
- Generating random variables: One wishes to generate a random variable X
You are given fair coin flips
. Let N be the (random) number of
flips needed to generate X. Show that
.
- Huffman coding.
- Give the binary Huffman code for p=(.54,.12,.08,.08,.08,.06,.04).
- Find the 5-ary (D=5) Huffman code for this distribution.
- Data compression. Find an optimal set of binary codeword lengths
(minimizing
) for an instantaneous code for each of the following probability mass
functions:
- Codes
Which of the following codes are
- uniquely decodable?
- instantaneous?
- Huffman
Give the Huffman code with alphabet size D=4 for the
distribution
.
- Huffman.
Find the Huffman D-ary code for
and the expected word length
- for D=2.
- for D=4.
- Entropy of encoded bits Let
be a nonsingular but nonuniquely decodable
code. Let X have entropy H(X).
- Compare H(C(X)) to H(X).
- Compare
to
.
- Code rate.
Let X be a random variable with alphabet
and distribution
The data compression code for X assigns codewords
Let
be independent identically distributed according to this distribution and
let
be the string of binary symbols resulting from concatenating the
corresponding codewords.For example, 122 becomes 01010.
- Find the entropy rate
and the entropy rate
in bits per
symbol. Note that Z is not compressible further.
- Now let the code be
and find the entropy rate
- Finally, let the code be
and find the entropy rate