Chris Hillman
hillman@math.washington.edu
This is a local copy of the original: http://www.math.washington.edu/~hillman/Entropy/infcode.html
In 1948, motivated by the problem of efficiently transmitting information
over a noisy communication channel,
Claude
Shannon introduced a revolutionary new probabilistic way of
thinking about communication and simultaneously created the first truly
mathematical theory of entropy. His ideas created a sensation and were
rapidly developed along two main lines: information
theory, which employs probability and ergodic theory to study
the statistical characteristics of data and communication systems, and
coding theory, which uses mainly algebraic and geometric tools to
contrive efficient codes for various situations. However, while the methods
of the two fields are different, they are so closely related in spirit
that it would be misleading to give them two separate pages on this website.
* Thanks to
Emre
Telatar and Lucent Technologies (formerly Bell Labs, where Shannon
worked for many years), the full text of Shannon's
classic 1948 paper, A
Mathematical Theory of Communication, is now available electronically.
This paper is very readable, and Parts I and II are in many ways still the
best introduction to the modern concept of entropy.
Highest recommendation!
Here are some truly fabulous expository papers which I think give an
excellent overview of this whole area:

Typical Sequences and All
That: Entropy, Pattern Matching, and Data Compression,
the 1994
Shannon Lecture by the late Aaron D. Wyner (formerly of the Mathematics
of Communication Department, Bell Labs). Focuses on the contrast between
the traditional interpretation of the entropy of a stationary ergodic source
in terms of the number of typical sequences of a given length and the interpretation
in terms of the recurrence of blocks of symbols ("patterns") in a single
typical sequence. The second interpretation leads directly to practical
methods for three problems:

estimating the entropy of an unknown information source from a finite amount
of data,

compressing a finite sequence produced by an unknown information source,

reliably deciding whether a given finite sequence could have been produced
by a given source.
Advanced undergraduate to beginning graduate level.
Highly recommended!
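As a quick illustration of the second interpretation (this sketch is mine, not from Wyner's lecture, and the source, window size, and averaging scheme are choices of convenience): for a stationary ergodic source, the length L of the longest prefix of the "future" that already occurs somewhere in a window of n past symbols grows like (log2 n)/H, so log2 n divided by an average match length gives a crude entropy-rate estimate.

```python
import math
import random

def longest_match(window: str, future: str) -> int:
    """Length of the longest prefix of `future` that occurs as a
    substring of `window` (the 'pattern matching' idea)."""
    l = 0
    while l < len(future) and future[: l + 1] in window:
        l += 1
    return l

def entropy_estimate(seq: str, n: int, starts: int = 50) -> float:
    """Crude entropy-rate estimate (bits/symbol) from the asymptotic
    relation L ~ (log2 n) / H for a window of size n, averaging the
    match length over several starting points in the future."""
    window = seq[:n]
    lengths = [longest_match(window, seq[n + 200 * k:]) for k in range(starts)]
    return math.log2(n) / (sum(lengths) / len(lengths))

# Biased coin with p = 0.1; the true entropy rate is about 0.469 bits/symbol.
random.seed(1)
p = 0.1
seq = "".join("1" if random.random() < p else "0" for _ in range(60000))
print(entropy_estimate(seq, 32768))
```

The estimate is biased and converges slowly, which is exactly why the rigorous treatment in Wyner's lecture is needed; but even this toy version lands in the right neighborhood of the true rate.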

A
Short Course in Information Theory,
by
David
J. C. MacKay (Cavendish Laboratory, Cambridge). An excellent overview
of the most fundamental ideas in classical information theory, at the advanced
undergraduate or beginning graduate level. Telegraphic in style, but quite
readable and well illustrated. His treatment of Shannon's Noiseless and
Noisy Coding Theorems is particularly interesting.
Highly recommended!

Shannon
Theory: Present and Future.
A fascinating panel discussion by

Richard Blahut (University of Illinois),

Imre Csiszár (Hungarian Academy of Sciences),

David Forney (Motorola),

Prakash Narayan (University of Maryland),

Mark Pinsker (Russian Academy of Sciences),

Sergio Verdú (Princeton University),
sponsored by the Information Theory Society
of the IEEE.
If you've forgotten what a logarithm is, you might want to start instead
with the
Primer on
Information Theory by
Thomas
D. Schneider (Laboratory of Molecular Biology, NIH), which gives a
very gentle introduction to Shannon's discrete entropy.
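For readers who want to see the definition in action, Shannon's discrete entropy of a probability distribution p_1, ..., p_n is H = -sum p_i log2 p_i, measured in bits; a short sketch (the function name is mine):

```python
import math

def shannon_entropy(probs):
    """Shannon's discrete entropy H = -sum p_i * log2(p_i), in bits.
    Terms with p_i = 0 contribute nothing (0 * log 0 = 0 by convention)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))    # fair coin: 1.0 bit
print(shannon_entropy([0.25] * 4))    # uniform over 4 symbols: 2.0 bits
```

A fair coin carries exactly one bit per toss, and a uniform choice among four symbols carries two; any non-uniform distribution on the same alphabet has strictly smaller entropy.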
Here are some expository papers giving a nice overview of coding theory:
For the serious
student of coding theory, here are some longer expository works, including
some book length textbooks:

The
Algebraic Theory of Convolutional Codes,
by
Robert
J. McEliece (Electrical Engineering, Caltech). One chapter in the Handbook
of Coding Theory.

Theory
of Codes.
Lecture notes from a course taught by
Jean
Berstel and Dominique Perrin (Université Marne-la-Vallée and Institut
Gaspard Monge). Available chapter by chapter and as a single 4 MB PostScript
file.

Information
Theory and Coding 314.
Lecture notes from a course taught by
Roberto
Togneri (Centre for Intelligent Information Processing Systems, Dept.
of Electrical & Electronic Engineering, The University of Western Australia).
Includes brief overviews of Markov sources, communication channels, mutual
information, and using the method of Lagrange multipliers to compute the
channel capacity, among other things.
Highly recommended!
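The capacity computation that Togneri's notes set up with Lagrange multipliers can also be carried out numerically by the Blahut-Arimoto alternating-maximization algorithm (my choice of method here, not necessarily the one in the notes); a minimal sketch for a binary symmetric channel:

```python
import math

def blahut_arimoto(P, iters=200):
    """Numerically maximize mutual information over input distributions
    for a discrete memoryless channel.  P[x][y] is the transition
    probability; returns (capacity in bits, optimal input distribution)."""
    nx, ny = len(P), len(P[0])
    r = [1.0 / nx] * nx                      # start from the uniform input
    for _ in range(iters):
        # Output distribution induced by the current input distribution.
        q = [sum(r[x] * P[x][y] for x in range(nx)) for y in range(ny)]
        # Relative entropy D(P(.|x) || q) for each input letter.
        D = [sum(P[x][y] * math.log2(P[x][y] / q[y])
                 for y in range(ny) if P[x][y] > 0)
             for x in range(nx)]
        Z = sum(r[x] * 2 ** D[x] for x in range(nx))
        r = [r[x] * 2 ** D[x] / Z for x in range(nx)]
    return sum(r[x] * D[x] for x in range(nx)), r

# Binary symmetric channel with crossover probability 0.1:
# the known answer is C = 1 - H(0.1), about 0.531 bits per use.
C, r = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(round(C, 3))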

Lectures
on Source Coding,
course notes by John
C. Kieffer (Information Theory Research Group, Electrical Engineering,
University of Minnesota). Covers prefix codes, Huffman codes, codes with
memory, LempelZiv codes, arithmetic codes, run length codes, scalar quantizer
design, vector quantizer design, rate distortion theory, differential coding,
trellis codes, and transform coders.
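Since Huffman codes appear in several of the works above, a compact sketch may be useful (the greedy merge below is the standard algorithm; the tie-breaking details are my own and real implementations vary):

```python
import heapq

def huffman_code(freqs):
    """Build a binary Huffman code (a prefix code of minimal expected
    length) from a dict mapping symbol -> weight.  Assumes at least
    two symbols.  Repeatedly merges the two lightest subtrees."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)                       # tie-breaker so dicts never compare
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Dyadic weights: the code lengths (1, 2, 3, 3) meet the entropy bound exactly.
code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(code)
```

Because the weights here are powers of 1/2, the average codeword length equals the source entropy (1.75 bits); for general weights Huffman coding comes within one bit of it, which is the noiseless coding theorem in miniature.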
Further Reading
Approximately 200 books on information and coding theory have been published
since Shannon's seminal paper. See the list
of textbooks in this area maintained by Werner
Heise (Mathematics, Technische Universität, Munich).
Some of the most recent textbooks (plus one classic) include the following:

Elements of Information Theory, by Thomas M. Cover and Joy A. Thomas,
Wiley, 1991. A readable and very enjoyable survey at the undergraduate
level. Covers the equipartition theorem, Shannon's coding theorems, maximal
entropy, discrimination and the theory of types (see
Entropy
in Statistical Inference and Prediction), and much more.
Highly
recommended as a first textbook!

Communication theory, by Charles M. Goldie and Richard G. E. Pinch,
Cambridge University Press, 1991. London Mathematical Society Student Texts
Vol. 20. A good undergraduate-level survey, more concise but less comprehensive
than Cover & Thomas.

Principles and practice of information theory, by Richard E. Blahut,
Addison-Wesley, 1987. Undergraduate level. Covers the connection between
Shannon's entropy and decision theory.

Entropy optimization principles with applications, by J. N. Kapur
and H. K. Kesavan, Academic Press, 1992. A very readable introduction to
the Principle of Maximal Entropy and its many uses, with a decidedly practical
orientation.

Maximum entropy solutions to scientific problems, by Robert M. Bevensee,
Prentice Hall, 1993. Another recent textbook.

Information theory with applications, by Silviu Guiasu, McGraw-Hill,
1977. Still the best source I know for applications of entropy to classification
theory. Many of the later chapters can be read independently of the earlier
chapters (which are seriously outdated and written in an unattractive notation).

The Data Compression
Book,
by Mark Nelson and Jean-loup Gailly, M & T Books, 1995.
Covers Shannon-Fano and Huffman coding, adaptive Huffman coding, arithmetic
coding, the JPEG compression algorithm, etc. Includes data compression
code.

Handbook of Coding Theory, edited by R. Brualdi, W. C. Huffman,
and V. Pless. Amsterdam: Elsevier Science Publishers, 1996.

Coding and information theory, by Steven Roman, Springer-Verlag,
1992. State of the art coding theorems. A graduate level text written from
the standpoint of probability theory.

Entropy and information theory, by Robert M. Gray, Springer-Verlag,
1990. Another up-to-date graduate-level textbook.
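The Principle of Maximal Entropy treated by Kapur and Kesavan above can be illustrated with Jaynes's classic dice example: among all distributions on the faces 1..6 with a prescribed mean, the one of maximal entropy has the exponential form p_i proportional to exp(-lam*i), where the Lagrange multiplier lam is fixed by the mean constraint. A small sketch (the bisection solver and function names are mine):

```python
import math

def maxent_dice(target_mean):
    """Maximum-entropy distribution on faces 1..6 with a given mean:
    p_i proportional to exp(-lam * i), the Lagrange multiplier lam
    found by bisection on the (decreasing) mean-versus-lam curve."""
    faces = list(range(1, 7))

    def mean_for(lam):
        w = [math.exp(-lam * i) for i in faces]
        Z = sum(w)
        return sum(i * wi for i, wi in zip(faces, w)) / Z

    lo, hi = -5.0, 5.0                 # bracket: mean_for(-5) ~ 6, mean_for(5) ~ 1
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_for(mid) > target_mean:
            lo = mid                   # mean too large: need bigger lam
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(-lam * i) for i in faces]
    Z = sum(w)
    return [wi / Z for wi in w]

# A loaded die with mean 4.5: probabilities increase geometrically with the face.
p = maxent_dice(4.5)
print([round(pi, 4) for pi in p])
```

With mean 3.5 the solver recovers the uniform distribution (lam = 0), the unconstrained entropy maximum; with mean 4.5 it produces the familiar geometrically increasing probabilities from Jaynes's discussions.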
Ioannis Kontoyiannis
(Statistics, Electrical and Computer Engineering, Purdue) has compiled
another
bibliography of suggested reading in this field.