Extra Problems for Chapter 8

T.M. Cover and J.A. Thomas

  1. Unused symbols. Show that the capacity of the channel with probability transition matrix

    equation791

    is achieved by a distribution that places zero probability on one of input symbols. What is the capacity of this channel? Give an intuitive reason why that letter is not used.

  2. Erasures and errors in a binary channel. Consider a channel with binary inputs that has both erasures and errors. Let the probability of error be tex2html_wrap_inline906 and the probability of erasure be tex2html_wrap_inline908 , so the the channel is as illustrated below:

    picture223

    1. Find the capacity of this channel.
    2. Specialize to the case of the binary symmetric channel ( tex2html_wrap_inline922 ).
    3. Specialize to the case of the binary erasure channel ( tex2html_wrap_inline924 ).
  3. Channels with dependence between the letters. Consider the following channel over a binary alphabet that takes in two bit symbols and produces a two bit output, as determined by the following mapping: tex2html_wrap_inline926 , tex2html_wrap_inline928 , tex2html_wrap_inline930 , and tex2html_wrap_inline932 . Thus if the two bit sequence 01 is input to the channel, the output is 10 with probability 1. Let tex2html_wrap_inline938 denote the two input symbols and tex2html_wrap_inline940 denote the corresponding output symbols.
    1. Calculate the mutual information tex2html_wrap_inline942 as a function of the input distribution on the four possible pairs of inputs.
    2. Show that the capacity of a pair of transmissions on this channel is 2 bits.
    3. Show that under the maximizing input distribution, tex2html_wrap_inline944 .

      Thus the distribution on the input sequences that achieves capacity does not necessarily maximize the mutual information between individual symbols and their corresponding outputs.

  4. Jointly typical sequences. As we did in problem gif of Chapter gif for the typical set for a single random variable, we will calculate the jointly typical set for a pair of random variables connected by a binary symmetric channel, and the probability of error for jointly typical decoding for such a channel.

    picture252

    We will consider a binary symmetric channel with crossover probability 0.1. As argued in class, the input distribution that achieves capacity is the uniform distribution. We will therefore take p(x) = (1/2, 1/2) and therefore the joint distribution p(x,y) for this channel is given by

    tabular267

    The marginal distribution of Y is also (1/2,1/2).

    1. Calculate H(X), H(Y), H(X,Y) and I(X;Y) for the joint distribution above.
    2. Let tex2html_wrap_inline964 be drawn i.i.d. according the Bernoulli(1/2) distribution. Of the tex2html_wrap_inline966 possible sequences of length n, which of them are typical, i.e., member of tex2html_wrap_inline970 for tex2html_wrap_inline972 ? Which are the typical sequences in tex2html_wrap_inline974 ?
    3. The jointly typical set tex2html_wrap_inline976 is defined as the set of sequences that satisfy equations (8.36-8.38) of EIT. The first two equations correspond to the conditions that tex2html_wrap_inline978 and tex2html_wrap_inline980 are in tex2html_wrap_inline970 and tex2html_wrap_inline974 respectively. Consider the last condition, which can be rewritten to state that tex2html_wrap_inline986 . Let k be the number of places in which the sequence tex2html_wrap_inline978 differs from tex2html_wrap_inline980 (k is a function of the two sequences). Then we can write

      eqnarray809

      An alternative way at looking at this probability is to look at the binary symmetric channel as in additive channel tex2html_wrap_inline996 , where Z is a binary random variable that is equal to 1 with probability p, and is independent of X. In this case,

      eqnarray811

      Show that the condition that tex2html_wrap_inline1004 being jointly typical is equivalent to the condition that tex2html_wrap_inline978 is typical and tex2html_wrap_inline1008 is typical.

    4. We now calculate the size of tex2html_wrap_inline1010 for n=25 and tex2html_wrap_inline972 . As in problem gif of Chapter gif, here is a table of the probabilities and numbers of sequences of with k ones

      tabular280

      (Sequences with more than 12 ones are omitted since their total probability is negligible (and they are not in the typical set).)

      What is the size of the set tex2html_wrap_inline1010 ?

    5. Now consider random coding for the channel, as in the proof of the channel coding theorem. Assume that tex2html_wrap_inline1028 codewords tex2html_wrap_inline1030 are chosen uniformly over the tex2html_wrap_inline966 possible binary sequences of length n. One of these codewords is chosen and sent over the channel. The receiver looks at the received sequence and tries to find a codeword in the code that is jointly typical with the received sequence. As argued above, this corresponds to finding a codeword tex2html_wrap_inline1036 such that tex2html_wrap_inline1038 . For a fixed codeword tex2html_wrap_inline1040 , what is the probability that the received sequence tex2html_wrap_inline1042 is such that tex2html_wrap_inline1044 is jointly typical?
    6. Now consider a particular received sequence tex2html_wrap_inline1046 , say. Assume that we choose a sequence tex2html_wrap_inline1048 at random, uniformly distributed among all the tex2html_wrap_inline966 possible binary n-sequences. What is the probability that the chosen sequence is jointly typical with this tex2html_wrap_inline980 ? (Hint: this is the probability of all sequences tex2html_wrap_inline978 such that tex2html_wrap_inline1058 .)
    7. Now consider a code with tex2html_wrap_inline1060 codewords of length 12 chosen at random, uniformly distributed among all the tex2html_wrap_inline966 sequences of length n=25. One of these codewords, say the one corresponding to i=1, is chosen and sent over the channel. As calculated in part (e), the received sequence, with high probability, is jointly typical with the codeword that was sent. What is probability that one or more of the other codewords (which were chosen at random, independently of the sent codeword) is jointly typical with the received sequence? (Hint: You could use the union bound but you could also calculate this probability exactly, using the result of part (f) and the independence of the codewords)
    8. Given that a particular codeword was sent, the probability of error (averaged over the probability distribution of the channel and over the random choice of other codewords) can be written as

      eqnarray825

      There are two kinds of error: the first occurs if the received sequence tex2html_wrap_inline980 is not jointly typical with the transmitted codeword, and the second occurs if there is another codeword jointly typical with the received sequence. Using the result of the previous parts, calculate this probability of error.

      By the symmetry of the random coding argument, this does not depend on which codeword was sent.

    The calculations above show that average probability of error for a random code with 512 codewords of length 25 over the binary symmetric channel of crossover probability 0.1 is about 0.34. This seems quite high, but the reason for this is that the value of tex2html_wrap_inline906 that we have chosen is too large. By choosing a smaller tex2html_wrap_inline906 , and a larger n in the definitions of tex2html_wrap_inline1076 , we can get the probability of error to be as small as we want, as long as the rate of the code is less than tex2html_wrap_inline1078 .

    Also note that the decoding procedure described in the problem is not optimal. The optimal decoding procedure is maximum likelihood, i.e., to choose the codeword that is closest to the received sequence. It is possible to calculate the average probability of error for a random code for which the decoding is based on an approximation to maximum likelihood decoding, where we decode a received sequence to the unique codeword that differs from the received sequence in tex2html_wrap_inline1080 bits, and declare an error otherwise. The only difference with the jointly typical decoding described above is that in the case when the codeword is equal to the received sequence! The average probability of error for this decoding scheme can be shown to be about 0.285.

  5. Sum Channel.
    1. Consider two discrete memoryless channels tex2html_wrap_inline1082 and tex2html_wrap_inline1084 with capacities tex2html_wrap_inline1086 and tex2html_wrap_inline1088 respectively. A new channel is defined with input alphabet tex2html_wrap_inline1090 , output alphabet tex2html_wrap_inline1092 , and probability transitiion function tex2html_wrap_inline1094 if tex2html_wrap_inline1096 and tex2html_wrap_inline1098 if tex2html_wrap_inline1100 . Thus the channel has both channels available for use, but only one channel can be used at a time. Show that the capacity of this sum channel is

      equation836

    2. Use the above result to calculate the capacity of the following channel

      picture322

  6. Geometric interpretation of channel capacity. Consider a channel tex2html_wrap_inline1112 , and assume that the channel input alphabet and output alphabet are of size m, e.g., let tex2html_wrap_inline1116 . We can write the probability transition function of the channel as a matrix:

    equation842

    where tex2html_wrap_inline1118 is the i-th row of the probability transistion matrix, i.e, tex2html_wrap_inline1122 . Consider an input distribution tex2html_wrap_inline1124 for the channel, and let tex2html_wrap_inline1126 be the corresponding output distribution, i.e., tex2html_wrap_inline1128 .

    1. Show that for this channel

      equation847

    2. Let tex2html_wrap_inline1130 be another distribution on tex2html_wrap_inline1132 . Show that

      equation852

      Use this to argue

      equation856

      and that for any distribution tex2html_wrap_inline1130 ,

      equation864

    3. Combining the first two parts, we have

      equation867

      Now justify the following argument:

      eqnarray872

      (This is a very general result that max min = min max for functions that are convex in one dimension and concave in the other and constitutes the fundamental theorem of game theory.)

      Combining the previous two equations, we have

        equation879

      A simple geometric interpretaion of this result is as follows: C is the minimum ``radius'' of an information ball that contains all the rows tex2html_wrap_inline1118 of the channel transition matrix, where all distances are measured with relative entropy. The center of this ball is located at tex2html_wrap_inline1130 that is induced by the capacity acheiving input distribution.

  7.   Encoder and decoder as part of the channel: Consider a Binary Symmetric Channel with crossover probability 0.1. A possible coding scheme for this channel with two codewords of length 3 is to encode message tex2html_wrap_inline1144 as 000 and tex2html_wrap_inline1146 as 111. With this coding scheme, we can consider the combination of encoder, channel and decoder as forming a new BSC, with two inputs tex2html_wrap_inline1144 and tex2html_wrap_inline1146 and two outputs tex2html_wrap_inline1144 and tex2html_wrap_inline1146 .
    1. Calculate the crossover probabilty of this channel.
    2. What is the capacity of this channel in bits per transmission of the original channel?
    3. What is the capacity of the original BSC with crossover probability 0.1?
    4. Prove a general result that for any channel, considering the encoder, channel and decoder together as a new channel from messages to estimated messages will not increase the capacity in bits per transmission of the original channel.
  8. Codes of length 3 for a BSC and BEC: In Problem gif, you calculated the probability of error for a code with two codewords of length 3 (000 and 111) sent over a binary symmetric channel with crossover probability tex2html_wrap_inline906 . For this problem, take tex2html_wrap_inline1158 .
    1. Find the best code of length 3 with four codewords for this channel. What is the probability of error for this code? (Note that all possible received sequences should be mapped onto possible codewords)
    2. What is the probability of error if we used all the 8 possible sequences of length 3 as codewords?
    3. Now consider a binary erasure channel with erasure probability 0.1. Again, if we used the two codeword code 000 and 111, then received sequences 00E,0E0,E00,0EE,E0E,EE0 would all be decoded as 0, and similarly we would decode 11E,1E1,E11,1EE,E1E,EE1 as 1. If we received the sequence EEE we would not know if it was a 000 or a 111 that was sent - so we choose one of these two at random, and are wrong half the time.

      What is the probability of error for this code over the erasure channel?

    4. What is the probability of error for the codes of parts (a) and (b) when used over the binary erasure channel?
  9. Channel Capacity: Calculate the capacity of the following channels with probability transition matrices:
    1. tex2html_wrap_inline1160

      equation888

    2. tex2html_wrap_inline1160

      equation892

    3. tex2html_wrap_inline1164

      equation896

  10. Capacity of the carrier pigeon channel. Consider a commander of an army besieged a fort for whom the only means of communication to his allies is a set of carrier pigeons. Assume that each carrier pigeon can carry one letter (8 bits), and assume that pigeons are released once every 5 minutes, and that they take exactly 3 minutes to reach their destination.
    1. Assuming all the pigeons reach safely, what is the capacity of this link in bits/hour?
    2. Now assume that the enemies try to shoot down the pigeons, and that they manage to hit a fraction tex2html_wrap_inline908 of them. Since the pigeons are sent at a constant rate, the receiver knows when the pigeons are missing. What is the capacity of this link?
    3. Now assume that the enemy is even more cunning, and every time they shoot down a pigeon, they send out a dummy pigeon carrying a random letter (chosen uniformly from all 8-bit letters). What is the capacity of this link in bits/hour?

    Set up an appropriate model for the channel in each of the above cases, and indicate how to go about finding the capacity.

  11. A channel with two independent looks at Y. Let tex2html_wrap_inline1168 and tex2html_wrap_inline1170 be conditionally independent and conditionally identically distributed given X.
    1. Show tex2html_wrap_inline1174
    2. Conclude that the capacity of the channel

      picture392

      is less than twice the capacity of the channel

      picture399

  12. Tall, fat people

    Suppose that average height of people in a room is 5 feet. Suppose the average weight is 100 lbs.

    1. Argue that no more than tex2html_wrap_inline1184 of the population is 15 feet tall.
    2. Find an upper bound on the fraction of 300 lb, 10 footers in the room.
  13. Can signal alternatives lower capacity? Show that adding a row to a channel transition matrix does not decrease capacity.
  14. Binary multiplier channel
    1. Consider the channel Y=XZ where X and Z are independent binary random variables that take on values 0 and 1. Z is Bernoulli( tex2html_wrap_inline908 ), i.e. tex2html_wrap_inline1196 . Find the capacity of this channel and the maximizing distribution on X.
    2. Now suppose the receiver can observe Z as well as Y. What is the capacity?
  15. Noise alphabets

    Consider the channel

    picture418

    tex2html_wrap_inline1212 , where Y=X+Z, and Z is uniformly distributed over three distinct integer values tex2html_wrap_inline1218

    1. What is the maximum capacity over all choices of the tex2html_wrap_inline1220 alphabet? Give distinct integer values tex2html_wrap_inline1222 and a distribution on tex2html_wrap_inline1224 achieving this.
    2. What is the minimum capacity over all choices for the tex2html_wrap_inline1220 alphabet? Give distinct integer values tex2html_wrap_inline1222 and a distribution on tex2html_wrap_inline1224 achieving this.
  16. Narrow channel

    Suppose a signal tex2html_wrap_inline1232 goes through an intervening transition tex2html_wrap_inline1234 :

    where tex2html_wrap_inline1236 , tex2html_wrap_inline1238 , and tex2html_wrap_inline1240 . Here p(v|x) and p(y|v) are arbitrary and the channel has transition probability tex2html_wrap_inline1246 .

    Show tex2html_wrap_inline1248 .

  17. Noisy typewriter. Consider the channel with tex2html_wrap_inline1250 and transition probabilities p(y|x) given by the following matrix:

    displaymath440

    1. Find the capacity of this channel.
    2. Define the random variable z=g(y) where

      displaymath462

      For the following two distributions for x, compute I(X;Z)

      1. displaymath470
      2. displaymath479
    3. Find the capacity of the channel between x and z, specifically where tex2html_wrap_inline1264 , tex2html_wrap_inline1266 , and the transition probabilities p(z|x) are given by

      displaymath489

    4. For the X distribution of part i. of b, does tex2html_wrap_inline1274 form a Markov chain?
  18. Capacity

    What are the capacities of the following channels?

    1. displaymath900
    2. displaymath901
  19. Erasure channel

    Let tex2html_wrap_inline1112 be a discrete memoryless channel with capacity C. Suppose this channel is immediately cascaded with an erasure channel tex2html_wrap_inline1280 that erases tex2html_wrap_inline908 of its symbols.

    Specifically, tex2html_wrap_inline1284 and

    displaymath902

    Determine the capacity of this channel.

  20. Choice of channels.

    Find the capacity C of the union of 2 channels tex2html_wrap_inline1288 and tex2html_wrap_inline1290 where, at each time, one can send a symbol over channel 1 or over channel 2 but not both. Assume the output alphabets are distinct and do not intersect.

    1. Show tex2html_wrap_inline1292 Thus tex2html_wrap_inline1294 is the effective alphabet size of a channel with capacity C.
    2. Compare with problem gif of Chapter gif, where tex2html_wrap_inline1298 , and interpret (a) in terms of the effective number of noise-free symbols.
  21. Binary multiplier channel.
    1. Consider the discrete memoryless channel Y=XZ where X and Z are independent binary random variables that take on values 0 and 1. Let tex2html_wrap_inline1196 . Find the capacity of this channel and the maximizing distribution on X.
    2. Now suppose the receiver can observe Z as well as Y. What is the capacity?
  22. Noise alphabets.

    Consider the channel

    picture418

    tex2html_wrap_inline1212 , where Y=X+Z, and Z is uniformly distributed over three distinct integer values tex2html_wrap_inline1218

    1. What is the maximum capacity over all choices of the tex2html_wrap_inline1220 alphabet? Give distinct integer values tex2html_wrap_inline1222 and a distribution on tex2html_wrap_inline1224 achieving this.
    2. What is the minimum capacity over all choices for the tex2html_wrap_inline1220 alphabet? Give distinct integer values tex2html_wrap_inline1222 and a distribution on tex2html_wrap_inline1224 achieving this.
  23. Source and channel.
    We wish to encode a Bernoulli( tex2html_wrap_inline908 ) process tex2html_wrap_inline1344 for transmission over a binary symmetric channel with error probability p.

    picture554

    Find conditions on tex2html_wrap_inline908 and p so that the probability of error tex2html_wrap_inline1368 can be made to go to zero as tex2html_wrap_inline1370 .



Latex file

Postscript file


Joy Thomas
Sat Aug 15 07:06:46 EDT 1998