In this chapter I'll tell you where my theory goes from there. LISP is only the first step. To make further progress you need to construct a programming language with which to measure the size of programs. I won't give any proofs, but I'll outline the basic ideas. I'll give a survey of what you get, of the subject that I call algorithmic information theory (AIT), which is concerned with program-size complexity, algorithmic information content, and algorithmic incompressibility or randomness. We'll get to my most devastating incompleteness theorems, theorems involving the random number Ω, the halting probability.
The bottom line is that I can show that in some areas of mathematics, mathematical truth is completely random, unstructured, patternless and incomprehensible. In fact, by using the work of Y. Matijasevic and J. Jones on Hilbert's 10th problem, I can even show that this occurs in elementary number theory, in Peano arithmetic. I exhibit an algebraic equation involving only whole numbers (a so-called diophantine equation) in which the number of solutions jumps from finite to infinite completely at random as you vary a parameter in the equation. In fact, this gives us the bits of the random number Ω. So we will never be able to know whether or not my equation has a finite number of solutions in each particular instance. More precisely, these are irreducible mathematical facts. They can only be deduced by adding them as axioms. Settling N cases requires N bits of axioms.
So not only was Hilbert's faith in the axiomatic method wrong, in some cases it was completely wrong. Because to say that some mathematical truths are irreducible means that they cannot be compressed into axioms at all, they cannot be deduced from any principles simpler than they are.
Let me start with a brief outline of the history of my field. Some people have already published their versions of this history. Here I'd like to tell you how it looked from my vantage point. I'll tell how I saw it, how I experienced it.
It was very unfortunate that publication of the second half of my original manuscript was delayed by the editor for three years. It was also unfortunate that the referee, Donald Loveland, immediately sent the entire uncut original manuscript to Kolmogorov in Moscow.
An earlier piece of work, involving the time and program-size
complexity of infinite sets, was my third paper in the ACM
Journal (1969). Then I turned to the size of programs for
computing finite binary sequences, i.e., bit strings, and to the
randomness or incompressibility of individual bit strings, which led
to my first two papers in the ACM Journal. So these papers
were not published in chronological order.
[An even earlier piece of work led to my first publication,
which was not in the ACM Journal. When I was in high school I
programmed on the computer all the algorithms in E.F. Moore's paper
``Gedanken-experiments on sequential machines.'' ``Sequential
machines'' were finite automata, and Moore's paper was in the very
first book on theoretical computer science, C.E. Shannon and J.
McCarthy's Automata Studies (Princeton University Press, 1956).
This led to my first publication, written while I was in high school:
``An improvement on a theorem of E.F. Moore,'' IEEE Transactions on
Electronic Computers EC-14 (1965), pp. 466-467.
Moore's paper dealt with a toy model of the problem of scientific induction, namely
the problem of identifying an
automaton by giving it inputs and looking at the outputs—hence
the title gedanken or thought experiments.
And this involves
a finite automata version of Occam's razor, because it's
desirable to find the simplest finite automaton—the finite automaton
with the smallest number of
states—that explains a series of experiments on a black box. As I
said when describing my APL2 physics course, wherever I look, I see
program-size complexity!
And as my gedanken-experiment project, my
APL2 gallery, my Springer book, and this book all illustrate, in
my opinion the best way to understand something is to program it out
and see if it works on the computer.]
Simultaneously there were two other independent inventors of AIT, R.J.
Solomonoff in Cambridge, Massachusetts, and A.N. Kolmogorov in Moscow.
Solomonoff was not a mathematician. He was interested in artificial
intelligence and in the problem of scientific induction, theory
building and prediction. His first paper, in two parts in
Information & Control, is full of interesting ideas.
Unfortunately his math isn't very good and he doesn't really succeed
in doing too much with these ideas. He does, however, state that
program-size complexity quantifies Occam's Razor by providing a
numerical measure of the degree of simplicity of a scientific theory.
Occam's Razor states that the simplest theory is best, that ``entities
should not be multiplied unnecessarily''. But it does not occur to
Solomonoff to propose a definition of randomness using program-size
complexity.
Kolmogorov and I independently come up with program-size complexity
and also propose (slightly different) definitions of randomness.
Roughly speaking, a random string is incompressible, there is no
simple theory for it, its program-size complexity is as large as
possible for bit strings having that length. Unlike Solomonoff,
Kolmogorov and I are mathematicians. Kolmogorov is at the end
of a distinguished career; I'm at the beginning of mine. I'm also a
computer programmer, which I think is a big help!... As far as I know,
Kolmogorov only publishes 3 or 4 pages on program-size complexity, in
two separate short papers, at least that's all I ever saw. I publish
many, many books and papers on AIT. AIT is my life!
In the initial formulations by Kolmogorov and myself of complexity
using binary programs, most N-bit strings, the random ones,
need N-bit programs, or close to it. Kolmogorov never realizes
that this theory is fatally flawed, and he never realizes that its
most fundamental application is not in redoing probability theory, but
in the new light that it sheds on the incompleteness phenomenon
discovered by Gödel.
But a young Swede visiting Kolmogorov in Moscow, P. Martin-Löf,
realizes that something is wrong, because Kolmogorov's proposal for
defining infinite random strings turns out to be vacuous.
Kolmogorov had required infinite random strings to have all prefixes
be incompressible, but this fails because Martin-Löf notes that
long runs of 0's and 1's produce logarithmic complexity dips. (I also
noticed this problem, and proposed a different complexity-based
definition of infinite random string, a more permissive one. This
leads to the opposite problem, namely that it accepts some
non-random strings.) So Martin-Löf abandons program-size
complexity and proposes a constructive measure-theoretic
definition of random infinite string.
[A real number is Martin-Löf random iff it is
not contained in any constructively covered set of measure zero.]
What do I do? I don't abandon program-size complexity, I stick with
it. I change the definition to use self-delimiting binary programs.
Then most N-bit strings require N +
log2N bit programs. It's now okay to demand that
the complexity of each N-bit prefix of an infinite random
string should never drop below N. The log2N
complexity dips now go from N + log2N to
N instead of from N to N −
log2N. (I'll explain this better later.) And my
complexity-based definition of randomness now works for both
finite and infinite strings. It turns out to be equivalent to
Martin-Löf's for infinite strings.
I'm invited to speak on this, the second major version of AIT,
at the opening plenary session of the 1974 IEEE International
Symposium on Information Theory in Notre Dame, Indiana,
with several well-known Soviet information-theorists in attendance.
I publish this new version of AIT in my 1975 ACM Journal paper
``A theory of program size formally identical to information theory,''
and later, in more complete form, in my 1987 Cambridge University
Press monograph Algorithmic Information Theory.
Meanwhile another Russian, L.A. Levin, also realizes that
self-delimiting programs are necessary, but he doesn't get it all
right, he doesn't do as good a job. For example, he doesn't
realize, as I did, that the definition of relative complexity
also has to be changed. (I'll explain this later, but the
basic idea is that you're not given something for free directly,
you're given a minimal-size program for it instead.)
And to my knowledge, no one else realizes that AIT can be reformulated
as a theory of the size of real programs in a usable programming
language, one based on LISP. But that's not too surprising, because I
had to invent the programming language and write all the
software for running it. That's the third major reformulation
of AIT, and this time I'm the only one who does it. It's presented in
my 1998 Springer-Verlag book The Limits of Mathematics.
Anyway, in my opinion AIT really begins with my 1975 ACM
Journal paper ``A theory of program size formally identical to
information theory;'' the rest was the pre-history of the
field!
On the side, just for the fun of it, I also developed three different
theories of LISP program-size complexity. These are in the second
World Scientific collection of my papers, the ``autobiography''
published in 1992. I did this work because (a) I like LISP and (b)
it's nice to look at the size of real programs and (c) because
these theories work much like one of my original theories that
measured Turing machine programs in states. My LISP program-size work
resurrected one of my first efforts, dealing with what I called
``bounded-transfer'' Turing machines. This is a somewhat peculiar
machine model, but I was fond of these youthful ideas, and hated to
see them completely disappear in the dust bin of history. I felt that
my work on LISP confirmed the validity of some of my youthful
intuitions about the right way to develop a theory of the size of real
programs, it was just that I hadn't applied these ideas to the right
programming language!
I also retain an interest in applying program-size complexity measures
to computing infinite sets. This part of the theory is much less
developed than the program-size complexity of computing individual
finite objects. I have only one paper on this subject,
``Algorithmic entropy of sets.''
It's in my first World Scientific
volume, the one published in 1987, and, in a second edition, in 1990.
However I do use this to define the complexity of a formal
axiomatic system as the size of the smallest program for generating
all of its theorems. I think that this is a better definition than
the one I used in Chapter V that the complexity of a formal axiomatic
system is given by the size of the smallest program for its
proof-checking algorithm. But of course they are closely related.
Many interesting open questions remain in the part of the theory
dealing with infinite computations instead of finite ones.
What I've presented above is a history of AIT proper, not of its
application to metamathematics and epistemology. In my first ACM
Journal paper I prove that program-size complexity is
uncomputable, which I was not the only person to notice. But I am the
only person to realize that AIT sheds dramatic new light on the
incompleteness phenomenon discovered by Gödel, which is not at
all equivalent to the remark that program-size complexity is uncomputable, a weak
and abstract observation. Why is this? It's because one can begin to
discuss the information content of axioms and also because in a sense
a formal axiomatic system amounts to a computation in the limit of
infinite time, so that saying that something cannot be proven (with
proofs of any size) is stronger than saying that it cannot be
computed. (Technically, what I'm saying amounts to the observation
that a formal axiomatic system is ``recursively enumerable,''
not ``recursive.'') [I believe that the current terminology is ``computably
enumerable'' and ``computable.'' At any rate, the meaning is this.
The set of theorems of a formal axiomatic system has the property
that there's an algorithm for generating
its elements (in some arbitrary order). But in general
there's no algorithm to decide if something is in the
set of theorems or not. (That's the Entscheidungsproblem, the decision problem,
in the title of Turing's 1936 paper, where he proved that these two ways of
defining a set are different.)]
I realize this at age 22 during a visit to a university in Rio de
Janeiro in 1970. It's just before Carnival in Rio and I recall
learning there the sad news that Bertrand Russell, one of my heroes,
had died. I show that a formal axiomatic system cannot establish any
lower bounds on the program-size complexity of individual objects, not if
the lower bound is substantially larger than the complexity of the
axioms themselves. This marks the beginning of my
information-theoretic approach to incompleteness.
When Jacob Schwartz of the Courant Institute visits Buenos Aires soon
after, he is astonished to hear my ideas and encourages me to develop
them.
[I lived from 1966 to 1975 in Buenos Aires—where I joined IBM in
1967—and the rest of the time in New York.]
I later discover that he had been in Moscow discussing AIT. (I
realize this when I see an acknowledgement of his participation in a
survey paper on AIT by A.K. Zvonkin and Levin in Russian
Mathematical Surveys.) Schwartz's astonishment shows clearly that
the essential point had not been grasped by the Moscow school of AIT.
I publish this idea in Rio in a research report in 1970, in an
abstract in the AMS Notices in 1970, in a short paper in the
ACM SIGACT News in 1971, in an invited paper in the IEEE
Information Theory Transactions in 1974, in a long paper in the
ACM Journal in 1974—my fourth ACM Journal
paper—and in an article in Scientific American in 1975.
I send the galley proofs of my invited paper in the IEEE Information
Theory Transactions to Gödel in early 1974 after
a phone conversation with him requesting an interview. He reads my
paper and in a second conversation grants me an appointment, but this
never comes to pass due to bad weather and the fact that my visit to
the U.S.A. is coming to an end.
My second major period of metamathematical activity is due to
an invitation in 1986 from Cambridge University Press to write the
first book in their series on theoretical computer science. In their
letter of invitation they explain that I was picked to be first in
order to make the point that computer science has deep intellectual
significance and is not just software engineering.
It is then that I realize that I can dress up my random Ω
number, the halting probability, as a diophantine equation, and that
there is therefore randomness in arithmetic, in elementary number
theory. And I am also able to show that an N-bit formal
axiomatic system can determine at most N bits of Ω,
even if the bits are scattered about instead of all at the beginning.
My book about this, Algorithmic Information Theory, is
published by Cambridge University Press in 1987 and causes a
commotion. In 1988 Ian Stewart praises it in a news item in
Nature entitled ``The ultimate in undecidability.'' Later in
1988 I'm surprised to find an article with my photograph entitled
``Une extension spectaculaire du théorème de Gödel:
l'équation de Chaitin'' (A spectacular extension of
Gödel's theorem: Chaitin's equation) by Jean-Paul Delahaye in
La Recherche. I'm asked to write about this in Scientific
American, in La Recherche, and
in New Scientist.
Two of the high points of my career follow. In 1991 John Casti and
Hans-Christian Reichel invite me to talk about my work in Gödel's
old classroom in Vienna. John announces my visit in a full-page
article with my photo entitled ``Gödeliger als Gödel''
(Out-Gödeling Gödel) in the Vienna newspaper Der
Standard. And in 1992 I visit Cambridge University, where Russell
and Turing worked. The occasion is a high-level meeting on
reductionism, and my talk is recorded as my paper on ``Randomness in
arithmetic and the decline and fall of reductionism in pure
mathematics'' in the book J. Cornwell, Nature's Imagination,
Oxford University Press, 1995. This paper, perhaps my most popular,
is reprinted several times.
My third major period of metamathematical activity is started by an
invitation from George Markowsky to give a course at the University of
Maine in Orono in 1994. I realize how to program my theory on the
computer, and I include in the course a much simpler proof of my
result about determining bits of Ω.
[I'm talking about my proof that an N-bit formal axiomatic
system can determine at most N bits of Ω, even if
the bits are scattered about instead of all at the beginning. In my
course I use the simple Berry-paradox program-size proof in my
``Information-theoretic incompleteness'' paper in Applied
Mathematics & Computation (1992), instead of the original
complicated measure-theoretic proof in my Cambridge University Press
monograph.]
I refine the course greatly as the result of a second invitation.
This time I'm invited by Veikko Keränen to give a course in
Rovaniemi, Finland, in May 1996. It's an amazing experience in every
possible way. It never gets dark, and Veikko and I drive to Norway's
North Cape, the top of Europe. The final result is my 1998 book
The Limits of Mathematics, actually published at the end of
1997, which sees the light only because of the enthusiastic support of
my good friend Cris Calude.
Again the results exceed all expectations. The Limits of
Mathematics is announced by Springer-Verlag, the world's leading
math publisher, with these words: ``Capture a piece of mathematics
history-in-the-making with Gregory Chaitin's New Book The Limits of
Mathematics.'' The ``energetic lectures'' and ``exuberant style''
in this book, noted by the Library of Science book club, reflect
both my astonishment at being able to present my strongest
metamathematical results so simply, and also the encouragement that I
received from George and Veikko and the exhilarating experience of
giving my course twice to interested and able audiences in beautiful
environments.
Two delightful consequences are that in 1998 an interview with
me is the lead article on the cover of the Sunday magazine of a Buenos
Aires newspaper Página/12, and I'm also
interviewed in
the Sunday magazine of the Lisbon, Portugal newspaper Expresso.
These magazine interviews include photographs of me, my home, and my
Springer book, an amazing experience for a mathematician whose main
interest is epistemology!
It's been a wonderful life. I never imagined as a child that things
could go this way, or that it could pass so quickly...
My biggest disappointment is that I'm unable to use program-size
complexity to make mathematics out of Darwin, to prove that life must
evolve, because it's very hard to make my kind of complexity increase.
But Wolfram uses the ubiquity of universality to argue that there is
nothing to explain, and perhaps he's right... I'll describe Wolfram's
ideas in the concluding chapter.
This personal story is designed to humanize what would otherwise be
a dry piece of mathematics, and to show what an adventure discovery
can be, to show the blood, sweat and tears...
But now let me quickly outline the mathematics...
We pick a computer U whose programs p are bit strings. Then we define the complexity or algorithmic information content H(X) of a LISP S-expression X to be the size in bits |p| of the smallest program p that produces X.
The result of this is that most N-bit strings require programs
very close to N bits long. These are the random or
incompressible N-bit strings.
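Why must most strings be incompressible? The text takes this for granted; the reason is a simple counting argument, sketched here:

```latex
% Programs are bit strings, so there are fewer than 2^{N-k}
% programs of size less than N-k bits:
\#\{\, p : |p| < N-k \,\} \;=\; \sum_{j=0}^{N-k-1} 2^{j} \;=\; 2^{N-k}-1 \;<\; 2^{N-k}.
% Each program produces at most one output, so at most 2^{N-k} of the
% 2^N strings of length N can have H(X) < N-k. In other words, at most
% a fraction 2^{-k} of the N-bit strings can be compressed by k or more bits.
```

For example, taking k = 10, fewer than one N-bit string in a thousand can be compressed by 10 or more bits.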
Unfortunately this theory still has some serious problems. One
symptom of this is that complexity is not additive. It is not
the case that the complexity of a pair is bounded by the sum of the
individual complexities. I.e., it's not the case that H((X Y)) ≤ H(X) + H(Y) + c.
In other words, you can't combine subroutines because you can't tell
where one ends and the other begins. To solve this, let's make
programs ``self-delimiting''.
[This (sub)additivity property played a big role in my
thinking: (a) because for a programmer it's a very natural requirement
and (b) because it was valid in my two earlier theories of Turing
machine program-size complexity measured in states and (c) because in
fact it had played an absolutely fundamental role in my theory of
program-size for ``bounded-transfer'' Turing machines. I had
regretfully given up additivity in going from my Turing machines
theories to binary programs. But I badly wanted additivity
back! That's why I came up with self-delimiting binary
programs. I had already been working with self-delimiting programs
before... By the way, remember that I showed that LISP complexity is
additive in Chapter V? That's another reason that I love LISP!]
The fact that ``read-bit'' does not return an end-of-file condition
but instead kills the computation is absolutely crucial. This forces
the program p to indicate its own size within itself somehow,
for example, by using a scheme like the length header that's placed at
the beginning of variable-length records. Now most N-bit
strings X have complexity greater than N, because
programs not only have to indicate the content of each bit in
X, they also have to indicate how many bits there are in order
to make the program self-delimiting.
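To make the length-header idea concrete, here is a toy self-delimiting encoding, sketched in Python rather than LISP; the doubled-bits header scheme is an illustrative invention, not the scheme actually used in the book's software. Each string is sent as the binary numeral for its length with every bit doubled, then the terminator 01, then the raw bits:

```python
def encode(bits):
    """Self-delimiting encoding: a header giving len(bits), then the bits.
    Header: each bit of the binary numeral for the length is doubled
    ('0'->'00', '1'->'11'), terminated by the pair '01'."""
    header = ''.join(b + b for b in format(len(bits), 'b')) + '01'
    return header + bits

def decode(stream, pos=0):
    """Read one self-delimiting string starting at pos;
    return (bits, position just past them)."""
    length_bits = ''
    while stream[pos:pos + 2] != '01':   # doubled pairs are '00' or '11'
        length_bits += stream[pos]
        pos += 2
    pos += 2                             # skip the '01' terminator
    n = int(length_bits, 2)
    return stream[pos:pos + n], pos + n

# Concatenated encodings can be split apart again:
x, y = '1011', '001'
stream = encode(x) + encode(y)
x2, pos = decode(stream)
y2, _ = decode(stream, pos)
assert (x2, y2) == (x, y)
```

This crude header costs about 2 log2N extra bits rather than the optimal H(N), but it already shows why self-delimiting programs can be concatenated: the decoder always knows where one program ends and the next begins.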
The final result is that most N-bit strings X now have
complexity H(X) very close to N +
H(N). That's N plus the size in bits of the
smallest program to calculate N, which is usually about
N + log2N. So roughly speaking, H(X) ≈ N + log2N for most N-bit strings X.
And now information is additive: H((X Y)) ≤
H(X) + H(Y) + a constant number of bits
c required to stitch the two subroutines for X and
Y together.
Solomonoff had considered all the programs that produce a given
output, not just the smallest ones, but he had not been able to get it
to work. The sums over all programs that he defined diverged, they
always gave infinity.
Well, with self-delimiting programs everything works like a dream. In
addition to the complexity H(X) of a LISP S-expression
X, which is the size of the smallest program for X, we
can define a complexity measure that includes all programs for
X. That's the probability that a program produced by
coin tossing produces X.
The probability that a program produced by coin tossing produces X turns out to be

P(X) = Σ 2^−|p|, summed over all programs p that produce X.

I.e., each k-bit program p that produces X adds 2^−k to the probability P(X) of producing X.
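Here is a toy illustration of this sum, in Python; the five-program prefix-free machine is entirely hypothetical, invented just to make the arithmetic visible:

```python
from math import log2

# A toy, hypothetical prefix-free machine: program bit string -> output.
# (No program is a prefix of another, as self-delimiting programs require.)
toy_machine = {
    '00':   'X',
    '010':  'X',
    '0110': 'Y',
    '0111': 'X',
    '10':   'Y',
}

def P(output):
    """Algorithmic probability: each k-bit program for `output` adds 2**-k."""
    return sum(2 ** -len(p) for p, out in toy_machine.items() if out == output)

def H(output):
    """Complexity: the size of the smallest program for `output`."""
    return min(len(p) for p, out in toy_machine.items() if out == output)

print(P('X'))          # 2**-2 + 2**-3 + 2**-4 = 0.4375
print(H('X'))          # 2
print(-log2(P('X')))   # about 1.19, within a bit of H('X')
```

On this toy machine −log2 P(X) ≈ 1.19 while H(X) = 2, a gap of less than one bit; the theorem discussed next says that on a real universal machine the gap is always bounded.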
I'm proud of my theorem in my 1975 ACM Journal paper that the
complexity H and the probability P are closely related.
In fact, the difference between H(X) and
−log2 P(X) is bounded.
In other words, most of the probability of computing X is
concentrated on the elegant programs for calculating X. And
this shows that the elegant program is essentially unique, i.e., that
Occam's razor picks out a bounded number of possibilities. And this
connection between program size and probability unlocks the door to
other deep results. I use it to prove a beautiful decomposition
theorem.
AIT in metamathematics
Why LISP program-size complexity is no good
It's easy to understand, it's nice, but it's no good because LISP
syntax makes LISP programs redundant. The bits in the program are not
being used optimally. Ideally each bit in a program should be equally
likely to be a 0 or a 1, should convey maximum information. That's
not the case with LISP programs. LISP program-size complexity
is still a nice theory, but not if the goal is to understand
incompressibility.
Program-size complexity with binary programs
So let's pick a computer U that works like this. The program
p will be a bit string that begins with the binary
representation of the definition of a LISP function.
[That's 8 bits for each character of LISP.]
Then there's a special delimiter character to indicate the end of the
LISP prefix. Then there's a bit string which is data. It's a list of
0's and 1's given to the LISP function defined in the prefix. I.e.,
the function in the prefix must be a function with one argument and
we'll apply it to the list consisting of the remaining bits of the
program. And the value of the LISP function is the output
U(p) produced by running the program p.
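A minimal sketch of this two-part program format, assuming a Python stand-in for the LISP evaluator; the '#' delimiter and the `evaluators` table are invented for illustration, not the book's actual 8-bit character encoding:

```python
DELIM = '#'   # stand-in for the special delimiter character (hypothetical)

def run(program, lisp_eval):
    """Toy computer U: a function-definition prefix, a delimiter, then data bits.
    The prefix is turned into a one-argument function, which is applied to the
    list of remaining bits; its value is the output U(p)."""
    prefix, data = program.split(DELIM, 1)
    f = lisp_eval(prefix)          # stand-in for evaluating the LISP prefix
    return f(list(data))

# A stand-in "evaluator" mapping prefix text to a Python function:
evaluators = {'reverse': lambda bits: bits[::-1]}

out = run('reverse#1101', evaluators.get)
print(out)   # ['1', '0', '1', '1']
```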
Program-size complexity with self-delimiting binary programs
What does self-delimiting mean? Now the computer U works like
this. The LISP prefix of our binary program p is no longer a
one-argument function that is given as argument the rest of the
program. Now the prefix is a LISP expression to be evaluated, and it
has to request the rest of the program bit by bit, one bit at a time,
and it explodes if it asks for too many bits. Bits are requested by
using a 0-argument LISP primitive function ``read-bit'' that returns a
0 or 1 or explodes the computation if all bits have been read and
another one is requested. And the value of the prefix LISP expression
is the output U(p) produced by running the program
p.
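The read-bit discipline can be sketched like this; it is a Python stand-in, with the exception `OutOfBits` playing the role of the exploding computation:

```python
class OutOfBits(Exception):
    """Requesting a bit past the end of the program kills the computation."""

def make_read_bit(data_bits):
    """Return a 0-argument read_bit function over the given bits."""
    it = iter(data_bits)
    def read_bit():
        try:
            return next(it)
        except StopIteration:
            raise OutOfBits("no more bits")
    return read_bit

# A prefix that reads exactly three data bits and returns them as its value:
def prefix(read_bit):
    return [read_bit() for _ in range(3)]

print(prefix(make_read_bit('101')))   # ['1', '0', '1']

# Asking for even one bit too many aborts the run:
try:
    prefix(make_read_bit('10'))
except OutOfBits:
    print('computation killed')
```

Because there is no end-of-file signal, the prefix must know from its own structure exactly how many data bits to request, which is what forces programs to be self-delimiting.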
The elegant programs for something vs. all programs; algorithmic
probability
Relative complexity, mutual complexity, algorithmic
independence
Here's the second major result in my 1975 paper. It's this
decomposition theorem: H((X Y)) = H(X) + H(Y|X) + O(1).
This states that the difference between (the complexity of a pair X, Y) and (the sum of the complexity of X plus the relative complexity of Y given X) is bounded. What's the relative complexity of Y given X? It's the size of the smallest program to calculate Y if we are given an elegant program to calculate X for free.
An important corollary concerns the mutual complexity or information content H(X:Y). That's defined to be the extent to which the complexity of a pair is less than the sum of the individual complexities: H(X:Y) = H(X) + H(Y) − H((X Y)).
Here's my result:

H(X:Y) = H(X) − H(X|Y) + O(1)
       = H(Y) − H(Y|X) + O(1)
In other words, I show that within a bounded difference the mutual complexity or the mutual information content is also the extent to which knowing Y helps us to know X and the extent to which knowing X helps us to know Y.
Finally, there is the important concept of (algorithmic) independence. Two objects are independent if the complexity of the pair is equal to the sum of the individual complexities: H((X Y)) = H(X) + H(Y) + O(1).
More precisely, they are independent if their mutual complexity is small compared to their individual complexities. For example, two N-bit strings are algorithmically independent if their mutual complexity is H(N), i.e., if the only thing that they have in common is their size. Where can we find such a pair of strings? That's easy, just take the two halves of a random (maximum complexity) 2N-bit string!
Now what's a finite random string? Well, the most random N-bit strings X have H(X) close to N + H(N). As I said before, most N-bit strings have close to this maximum possible complexity, i.e., are highly random. And as the complexity of an N-bit string drops below this, the string gets less and less random, and there are fewer and fewer such strings. [More precisely, the number of N-bit strings X such that H(X) < N + H(N) − K is less than 2^(N−K+c).] But where should we draw the line? How low should we let H(X) drop before we reject X as random? Well, as I said before, it's a matter of degree. But if you insist that I should provide a sharp cutoff, I can do it. How? The answer is provided by looking at infinite bit strings, in other words, at real numbers in binary notation, with an infinite number of binary digits of precision.
C.P. Schnorr showed that an infinite bit string X satisfies all computable statistical tests of randomness (which is Martin-Löf's randomness definition) iff there is a constant c such that H(XN) > N − c,
where XN is the first N bits of X (which is my randomness definition). In fact, I show that if this is the case then H(XN) − N must go to infinity. [In my Cambridge book, I prove that four randomness definitions for infinite bit strings are equivalent, including one by R.M. Solovay.]
So let's draw the cutoff for finite randomness when the complexity of an N-bit string drops below N. Then we can define an infinite random string X to be one with the property that almost all (all but finitely many) of its prefixes XN are finite random strings.
Some examples will clear the air. First, what's the least complex N-bit string? Well, obviously the string of N 0's. Its complexity is within a fixed number of bits of H(N). In other words, to calculate it we only need to know how many bits there are, not what they are.
Second, consider an N-bit elegant program. It turns out that its complexity is very close to N, within a fixed number of bits. So elegant programs are right on the borderline between structure and randomness. They have just enough structure to be self-delimiting!
Third, consider the N-bit base-two numeral for the number of N-bit strings which have exactly the maximum possible complexity. I showed in a 1993 note in Applied Mathematics & Computation that this number is itself an N-bit string within a fixed number of bits of the maximum possible complexity, which is N + H(N).
So these are three milestones on the complexity scale from least to most random.
Define the halting probability Ω for my computer U as follows: Ω is the sum of 2^−|p| over all programs p that halt when run on U. I.e., each k-bit program that halts contributes 2^−k to Ω.
Since Ω is a probability, we have 0 < Ω < 1.
Now let's write Ω in binary, i.e., in base-two notation, like this: Ω = 0.b1b2b3...
whatever it is. Knowing the first N bits of this real number Ω would enable us to solve the halting problem for all programs for U up to N bits in size. Using this fact, I show that Ω is an algorithmically incompressible real number. I.e., H(ΩN) > N − c,
where ΩN is the first N bits of Ω. It follows that the bits of Ω satisfy all computable statistical tests for randomness. Separately, I show that the bits of Ω are irreducible mathematical facts: it takes N bits of axioms to be able to determine N bits of Ω. More precisely, there is a constant c' such that it takes N + c' bits of axioms to be able to determine N bits of Ω.
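Here is a toy version, in Python, of the argument that the first N bits of Ω solve the halting problem for programs up to N bits: dovetail all computations, accumulate 2^−|p| for each program p seen to halt, and stop once the accumulated mass reaches the known N-bit lower bound on Ω. The three-program machine and its halting times are invented purely for illustration:

```python
from fractions import Fraction

# A toy, hypothetical prefix-free machine: the programs that halt,
# together with the (unknown-in-advance) step at which each one halts.
halting = {'0': 3, '10': 1, '110': 5}
omega = sum(Fraction(1, 2 ** len(p)) for p in halting)   # here 7/8 = 0.111...

def omega_prefix(n):
    """The first n binary digits of omega, as a dyadic lower bound."""
    return Fraction(int(omega * 2 ** n), 2 ** n)

def halts(q, n):
    """Decide halting for a program q with len(q) <= n, given n bits of omega:
    dovetail everything; once the observed probability mass reaches the known
    prefix, every halting program of <= n bits must already have shown up."""
    target = omega_prefix(n)
    seen, halted, t = Fraction(0), set(), 0
    while seen < target:
        t += 1                                # run all programs one more step
        for p, step in halting.items():
            if step == t:                     # p halts at step t
                seen += Fraction(1, 2 ** len(p))
                halted.add(p)
    return q in halted

print(halts('10', 2))   # True
print(halts('11', 2))   # False: '11' never halts on this toy machine
```

Why does stopping there suffice? If some program of at most N bits had not yet halted, its eventual contribution of at least 2^−N would push Ω past the value allowed by its first N bits.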
In my 1998 Springer-Verlag book, I actually determine these constants c and c'. c = 8000 and c' = 15328!
My diophantine equation has a parameter K. For each particular value of K, the equation has finitely or infinitely many natural number solutions (each solution is a 20,000-tuple with the values for X1, ..., X20000) if the Kth bit of Ω is, respectively, a 0 or a 1. Therefore determining whether this equation has finitely or infinitely many solutions is just as difficult as determining bits of Ω. [Actually, Hilbert in 1900 had asked a slightly different question. He had asked for a method to determine if an arbitrary diophantine equation has a solution or not. I'm interested in whether the number of solutions is finite. No solution is a finite number of solutions... Matijasevic showed in 1970 that Hilbert's 10th problem is equivalent to the halting problem. But Hilbert's 10th problem and the halting problem do not give randomness. They're not independent, irreducible mathematical facts. Why not? Because in order to solve N instances of either problem we just need to know how many of the N equations have a solution or how many of the N programs halt. And that's much less than N bits of information!]
I explain the detailed construction of this equation in my 1987 Cambridge University Press monograph. The final version of my software for constructing this equation is in Mathematica and C and is available from the Los Alamos e-print archives at http://xxx.lanl.gov as report chao-dyn/9312006.
Phew! That's a lot of math we've just raced our way through! The intent was to show you that program-size complexity is a serious business, that AIT is a serious, ``elegant'' (in the non-technical sense) well-developed field of mathematics, and that if I say that Ω is random, irreducible mathematical information, I know what I'm talking about.
Now it's time to wrap things up!