Hilbert's idea is the culmination of two thousand years of mathematical tradition going back to Euclid's axiomatic treatment of geometry, going back to Leibniz's dream of a symbolic logic and Russell and Whitehead's monumental Principia Mathematica. Hilbert's dream was to once and for all clarify the methods of mathematical reasoning. Hilbert wanted to formulate a formal axiomatic system which would encompass all of mathematics.
Formal Axiomatic System
→
→
→
Hilbert emphasized a number of key properties that such a formal axiomatic system should have. It's like a computer programming language. It's a precise statement about the methods of reasoning, the postulates and the methods of inference that we accept as mathematicians. Furthermore Hilbert stipulated that the formal axiomatic system encompassing all of mathematics that he wanted to construct should be ``consistent'' and it should be ``complete.''
Formal Axiomatic SystemConsistent means that you shouldn't be able to prove an assertion and the contrary of the assertion.
→ consistent
→ complete
→
Formal Axiomatic SystemYou shouldn't be able to prove A and not A. That would be very embarrassing.
→ consistent A ¬A
→ complete
→
Complete means that if you make a meaningful assertion you should be able to settle it one way or the other. It means that either A or not A should be a theorem, should be provable from the axioms using the rules of inference in the formal axiomatic system.
Formal Axiomatic SystemConsider a meaningful assertion A and its contrary not A. Exactly one of the two should be provable if the formal axiomatic system is consistent and complete.
→ consistent A ¬A
→ complete A ¬A
→
A formal axiomatic system is like a programming language. There's an alphabet and rules of grammar, in other words, a formal syntax. It's a kind of thing that we are familiar with now. Look back at Russell and Whitehead's three enormous volumes full of symbols and you'll feel you're looking at a large computer program in some incomprehensible programming language.
Now there's a very surprising fact. Consistent and complete means only truth and all the truth. They seem like reasonable requirements. There's a funny consequence, though, having to do with something called the decision problem. In German it's the Entscheidungsproblem.
Formal Axiomatic System
→ consistent A ¬A
→ complete A ¬A
→ decision problem
Hilbert ascribed a great deal of importance to the decision problem.
HILBERT
Formal Axiomatic System
→ consistent A ¬A
→ complete A ¬A
→ decision problem
Solving the decision problem for a formal axiomatic system is giving an algorithm that enables you to decide whether any given meaningful assertion is a theorem or not. A solution of the decision problem is called a decision procedure.
HILBERT
Formal Axiomatic System
→ consistent A ¬A
→ complete A ¬A
→ decision procedure
This sounds weird. The formal axiomatic system that Hilbert wanted to construct would have included all of mathematics: elementary arithmetic, calculus, algebra, everything. If there's a decision procedure, then mathematicians are out of work. This algorithm, this mechanical procedure, can check whether something is a theorem or not, can check whether it's true or not. So to require that there be a decision procedure for this formal axiomatic system sounds like you're asking for a lot.
However it's very easy to see that if it's consistent and it's complete that implies that there must be a decision procedure. Here's how you do it. You have a formal language with a finite alphabet and a grammar. And Hilbert emphasized that the whole point of a formal axiomatic system is that there must be a mechanical procedure for checking whether a purported proof is correct or not, whether it obeys the rules or not. That's the notion that mathematical truth should be objective so that everyone can agree whether a proof follows the rules or not.
So if that's the case you run through all possible proofs in size order, and look at all sequences of symbols from the alphabet one character long, two, three, four, a thousand, a thousand and one... a hundred thousand characters long. You apply the mechanical procedure which is the essence of the formal axiomatic system, to check whether each proof is valid. Most of the time, of course, it'll be nonsense, it'll be ungrammatical. But you'll eventually find every possible proof. It's like a million monkeys typing away. You'll find every possible proof, though only in principle of course. The number grows exponentially and this is something that you couldn't do in practice. You'd never get to proofs that are one page long.
But in principle you could run through all possible proofs, check which ones are valid, see what they prove, and that way you can systematically find all theorems. In other words, there is an algorithm, a mechanical procedure, for generating one by one every theorem that can be demonstrated in a formal axiomatic system. So if for every meaningful assertion within the system, either the assertion is a theorem or its contrary is a theorem, only one of them, then you get a decision procedure. To see whether an assertion is a theorem or not you just run through all possible proofs until you find the assertion coming out as a theorem or you prove the contrary assertion.
So it seems that Hilbert actually believed that he was going to solve once and for all, all mathematical problems. It sounds amazing, but apparently he did. He believed that he would be able to set down a consistent and complete formal axiomatic system for all of mathematics and from it obtain a decision procedure for all of mathematics. This is just following the formal, axiomatic tradition in mathematics.
But I'm sure he didn't think that it would be a practical decision procedure. The one I've outlined would only work in principle. It's exponentially slow, it's terribly slow! Totally impractical. But the idea was that if all mathematicians could agree whether a proof is correct and be consistent and complete, in principle that would give a decision procedure for automatically solving any mathematical problem. This was Hilbert's magnificent dream, and it was to be the culmination of Euclid and Leibniz, and Boole and Peano, and Russell and Whitehead.
Of course the only problem with this inspiring project is that it turned out to be impossible!
I think this is very inspiring. So did John von Neumann. When he was a young man he tried to carry through Hilbert's ambitious program. Because Hilbert couldn't quite get it all to work, in fact he started off just with elementary number theory, 1, 2, 3, 4, 5, ..., not even with real numbers at first.
And then in 1931 to everyone's great surprise (including von Neumann's), Gödel showed that it was impossible, that it couldn't be done, as I'm sure you all know.
Gödel 1931
This was the opposite of what everyone had expected. Von Neumann said it never occurred to him that Hilbert's program couldn't be carried out. Von Neumann admired Gödel enormously, and helped him to get a permanent position at the Institute for Advanced Study.
What Gödel showed was the following. Suppose that you have a formal axiomatic system dealing with elementary number theory, with 1, 2, 3, 4, 5 and addition and multiplication. And we'll assume that it's consistent, which is a minimum requirement---if you can prove false results it's really pretty bad. What Gödel showed was that if you assume that it's consistent, then you can show that it's incomplete. That was Gödel's result, and the proof is very clever and involves self-reference. Gödel was able to construct an assertion about the whole numbers that says of itself that it's unprovable. This was a tremendous shock. Gödel has to be admired for his intellectual imagination; everyone else thought that Hilbert was right.
However I think that Turing's 1936 approach is better.
Gödel 1931Gödel's 1931 proof is very ingenious, it's a real tour de force. I have to confess that when I was a kid trying to understand it, I could read it and follow it step by step but somehow I couldn't ever really feel that I was grasping it. Now Turing had a completely different approach.
Turing 1936
Turing's approach I think it's fair to say is in some ways more fundamental. In fact, Turing did more than Gödel. Turing not only got as a corollary Gödel's result, he showed that there could be no decision procedure.
You see, if you assume that you have a formal axiomatic system for arithmetic and it's consistent, from Gödel you know that it can't be complete, but there still might be a decision procedure. There still might be a mechanical procedure which would enable you to decide if a given assertion is true or not. That was left open by Gödel, but Turing settled it. The fact that there cannot be a decision procedure is more fundamental and you get incompleteness as a corollary.
How did Turing do it? I want to tell you how he did it because that's the springboard for my own work. The way he did it, and I'm sure all of you have heard about it, has to do with something called the halting problem. In fact if you go back to Turing's 1936 paper you will not find the words ``halting problem.'' But the idea is certainly there.
People also forget that Turing was talking about ``computable numbers.'' The title of his paper is ``On computable numbers, with an application to the Entscheidungsproblem.'' Everyone remembers that the halting problem is unsolvable and that comes from that paper, but not as many people remember that Turing was talking about computable real numbers. My work deals with computable and dramatically uncomputable real numbers. So I'd like to refresh your memory how Turing's argument goes.
Turing's argument is really what destroys Hilbert's dream, and it's a
simple argument. It's just Cantor's diagonal procedure (for those of
you who know what that is) applied to the computable real numbers.
That's it, that's the whole idea in a nutshell, and it's enough to
show that Hilbert's dream, the culmination of two thousand years of
what mathematicians thought mathematics was about, is wrong. So
Turing's work is tremendously deep.
What is Turing's argument? A real number, you know 3.1415926..., is a
length measured with arbitrary precision, with an infinite number of
digits. And a computable real number said Turing is one for which
there is a computer program or algorithm for calculating the digits
one by one. For example, there are programs for π, and there
are algorithms for solutions of algebraic equations with integer
coefficients. In fact most of the numbers that you actually find in
analysis are computable. However they're the exception, if you know
set theory, because the computable reals are denumerable and the reals
are nondenumerable (you don't have to know what that means). That's
the essence of Turing's idea.
The idea is this. You list all possible computer programs. At that
time there were no computer programs, and Turing had to invent the
Turing machine, which was a tremendous step forward. But now you just
say, imagine writing a list with every possible computer program.
If you consider computer programs to be in binary, then it's natural
to think of a computer program as a natural number. And next to each
computer program, the first one, the second one, the third one, write
out the real number that it computes if it computes a real (it may
not). But if it prints out an infinite number of digits, write them
out. So maybe it's 3.1415926 and here you have another and another
and another:
So you make this list. Maybe some of these programs don't print out
an infinite number of digits, because they're programs that halt or
that have an error in them and explode. But then there'll just be a
blank line in the list.
Following Cantor, Turing says go down the diagonal and look at the
first digit of the first number, the second digit of the second, the
third...
Well actually it's the digits after the decimal point. So it's the
first digit after the decimal point of the the first number, the
second digit after the decimal point of the second, the third digit of
the third number, the fourth digit of the fourth, the fifth digit of
the fifth. And it doesn't matter if the fifth program doesn't put out
a fifth digit, it really doesn't matter.
What you do is you change these digits. Make them different. Change
every digit on the diagonal. Put these changed digits together into a
new number with a decimal point in front, a new real number. That's
Cantor's diagonal procedure. So you have a digit which you choose to
be different from the first digit of the first number, the second
digit of the second, the third of the third, and you put these
together into one number.
This new number cannot be in the list because of the way it was
constructed. Therefore it's an uncomputable real number. How does
Turing go on from here to the halting problem? Well, just ask
yourself why can't you compute it? I've explained how to get
this number and it looks like you could almost do it. To compute the
Nth digit of this number, you get the Nth computer
program (you can certainly do that) and then you start it running
until it puts out an Nth digit, and at that point you change
it. Well what's the problem? That sounds easy.
The problem is, what happens if the Nth computer program never
puts out an Nth digit, and you sit there waiting? And that's
the halting problem---you cannot decide whether the Nth
computer program will ever put out an Nth digit! This is how
Turing got the unsolvability of the halting problem. Because if you
could solve the halting problem, then you could decide if the
Nth computer program ever puts out an Nth digit. And if
you could do that then you could actually carry out Cantor's diagonal
procedure and compute a real number which has to differ from any
computable real. That's Turing's original argument.
Why does this explode Hilbert's dream? What has Turing proved? That
there is no algorithm, no mechanical procedure, which will decide if
the Nth computer program ever outputs an Nth digit.
Thus there can be no algorithm which will decide if a computer program
ever halts (finding the Nth digit put out by the Nth
program is a special case). Well, what Hilbert wanted was a formal
axiomatic system from which all mathematical truth should follow, only
mathematical truth, and all mathematical truth. If Hilbert could do
that, it would give us a mechanical procedure to decide if a computer
program will ever halt. Why?
You just run through all possible proofs until you either find a proof
that the program halts or you find a proof that it never halts. So if
Hilbert's dream of a finite set of axioms from which all of
mathematical truth should follow were possible, then by running
through all possible proofs checking which ones are correct, you would
be able to decide if any computer program halts. In principle you
could. But you can't by Turing's very simple argument which is
just Cantor's diagonal argument applied to the computable reals.
That's how simple it is!
Gödel's proof is ingenious and difficult. Turing's argument is
so fundamental, so deep, that everything seems natural and inevitable.
But of course he's building on Gödel's work.
How do I get a much more uncomputable real? (And I'll have to tell
you how uncomputable it is.) Well, not with Cantor's diagonal
argument. I get this number, which I like to call Ω, like
this:
What exactly is the halting probability? I've written down an
expression for it:
By the way there's a technical detail which is very important and
didn't work in the early version of algorithmic information theory.
You couldn't write this:
Okay, so this is a probability between zero and one
Well, one thing is it's algorithmically incompressible. If you want
to get the first N bits of Ω out of a computer
program, if you want a computer program that will print out the first
N bits of Ω and then halt, that computer program has
to be N bits long. Essentially you're only printing out
constants that are in the program. You cannot squeeze the first
N bits of Ω. This
Now that's an abstract way of saying things. Let me give a more
concrete example of how random Ω is. Émile Borel at
the turn of this century was one of the founders of probability
theory.
Question: Can I ask a very simple question before you get ahead?
Answer: Sure.
Question: I can't see why Ω should be a probability.
What if the two one-bit programs both halt? I mean, what if the two
one-bit programs both halt and then some other program halts. Then
Ω is greater than one and not a probability.
Answer: I told you no extension of a valid program is a valid
program.
Question: Oh right, no other programs can halt.
Answer: The two one-bit programs would be all the programs
there are. That's the reason this number
So here we have Émile Borel, and he talked about something he
called a normal number.
Let me describe some work Borel did around the turn of the century
when he was pioneering modern probability theory. Pick a real number
in the unit interval, a real number with a decimal point in front,
with no integer part. If you pick a real number in the unit interval,
Borel showed that with probability one it's going to be ``normal.''
Normal means that when you write it in decimal each digit will occur
in the limit exactly 10% of the time, and this will also happen in
any other base. For example in binary 0 and 1 will each occur in the
limit exactly 50% of the time. Similarly with blocks of digits.
This was called an absolutely normal real number by Borel, and he
showed that with probability one if you pick a real number at random
between zero and one it's going to have this property. There's only
one problem. He didn't know whether π is normal, he didn't
know whether √2 is normal. In fact, he couldn't exhibit
a single individual example of a normal real number.
The first example of a normal real number was discovered by a friend
of Alan Turing's at Cambridge called David Champernowne, who is still
alive and who's a well-known economist. Turing was impressed with
him---I think he called him ``Champ''---because Champ had published
this in a paper as an undergraduate. This number is known as
Champernowne's number. Let me show you Champernowne's number.
It goes like this. You write down a decimal point, then you write 0,
1, 2, 3, 4, 5, 6, 7, 8, 9, then 10, 11, 12, 13, 14 until 99, then 100,
101. And you keep going in this funny way. This is called
Champernowne's number and Champernowne showed that it's normal in base
ten, only in base ten. Nobody knows if it's normal in other bases, I
think it's still open. In base ten though, not only will the digits 0
through 9 occur exactly 10% of the time in the limit, but each
possible block of two digits will occur exactly 1% of the time in the
limit, each block of three digits will occur exactly .1% of the time
in the limit, etc. That's called being normal in base ten. But nobody
knows what happens in other bases.
The reason I'm saying all this is because it follows from the fact
that the halting probability Ω is algorithmically
irreducible information that this
This number Ω is in fact random in many more senses. I
would say it this way. It cannot be distinguished from the result of
independent tosses of a fair coin. In fact this number
Why do I say this? Well, let's say you want to use axioms to prove
what the bits of this number Ω are. I've already told you
that it's uncomputable---right?---like the number that Turing
constructs using Cantor's diagonal argument. So we know there is no
algorithm which will compute digit by digit or bit by bit this number
Ω. But let's try to prove what individual bits are using a
formal axiomatic system. What happens?
The situation is very, very bad. It's like this. Suppose you have a
formal axiomatic system which is N bits of formal axiomatic
system (I'll explain what this means more precisely later). It turns
out that with a formal axiomatic system of complexity N, that
is, N bits in size, you can prove what the positions and values
are of at most N + c bits of Ω.
Now what do I mean by formal axiomatic system N bits in size?
Well, remember that the essence of a formal axiomatic system is a
mechanical procedure for checking whether a formal proof follows the
rules or not. It's a computer program. Of course in Hilbert's days
there were no computer programs, but after Turing invented Turing
machines you could finally specify the notion of computer program
exactly, and of course now we're very familiar with it.
So the proof-checking algorithm which is the essence of any formal
axiomatic system in Hilbert's sense is a computer program, and just
see how many bits long this computer program is.
(Technical Note:
Actually, it's best to think of the complexity of a formal axiomatic
system as the size in bits of the computer program that enumerates the
set of all theorems.)
That's essentially how many bits it takes to
specify the rules of the game, the axioms and postulates and the rules
of inference. If that's N bits, then you may be able to prove
say that the first bit of Ω in binary is 0, that the second
bit is 1, that the third bit is 0, and then there might be a gap, and
you might be able to prove that the thousandth bit is 1. But you're
only going to be able to settle N cases if your formal
axiomatic system is an N-bit formal axiomatic system.
Let me try to explain better what this means. It means that you can
only get out as much as you put in. If you want to prove whether an
individual bit in a specific place in the binary expansion of the real
number Ω is a 0 or a 1, essentially the only way to prove
that is to take it as a hypothesis, as an axiom, as a postulate. It's
irreducible mathematical information. That's the key phrase that
really gives the whole idea.
Okay, so what have we got? We have a rather simple mathematical
object that completely escapes us. Ω's bits have no
structure. There is no pattern, there is no structure that we as
mathematicians can comprehend. If you're interested in proving what
individual bits of this number at specific places are, whether they're
0 or 1, reasoning is completely useless. Here mathematical reasoning
is irrelevant and can get nowhere. As I said before, the only way a
formal axiomatic system can get out these results is essentially just
to put them in as assumptions, which means you're not using reasoning.
After all, anything can be demonstrated by taking it as a postulate
that you add to your set of axioms. So this is a worst possible
case---this is irreducible mathematical information. Here is a case
where there is no structure, there are no correlations, there is no
pattern that we can perceive.
Turing says that you cannot use proofs to decide whether a program
will halt. You can't always prove that a program will halt or not.
That's how he destroys Hilbert's dream of a universal mathematics. I
get us into more trouble by looking at a different kind of question,
namely, can you prove that the fifth bit of this particular real
number
Even though you can't have a formal axiomatic system which can always
prove whether a program halts or not, it might be good for everything
else and then you could have an amended version of Hilbert's
dream. And the same with the halting probability Ω. If
the halting problem looks a little bizarre, and it certainly did in
1936, well, Ω is brand new and certainly looks bizarre.
Who ever heard of a halting probability? It's not the kind of thing
that mathematicians normally do. So what do I care about all these
incompleteness results!
Well, Gödel had already faced this problem with his assertion
which is true but unprovable. It's an assertion which says of itself
that it's unprovable. That kind of thing also never comes up in real
mathematics. One of the key elements in Gödel's proof is that he
managed to construct an arithmetical assertion which says of
itself that it's unprovable. It was getting this self-referential
assertion to be in elementary number theory which took so much
cleverness.
There's been a lot of work building on Gödel's work, showing that
problems involving computations are equivalent to arithmetical
problems involving whole numbers. A number of names come to mind.
Julia Robinson, Hilary Putnam and Martin Davis did some of the
important work, and then a key result was found in 1970 by Yuri
Matijasevic. He constructed a diophantine equation, which is an
algebraic equation involving only whole numbers, with a lot of
variables. One of the variables, K, is distinguished as a
parameter. It's a polynomial equation with integer coefficients and
all of the unknowns have to be whole numbers---that's a diophantine
equation. As I said, one of the unknowns is a parameter.
Matijasevic's equation has a solution for a particular value of the
parameter K if and only if the Kth computer program
halts.
In the year 1900 Hilbert had asked for an algorithm which will decide
whether a diophantine equation, an algebraic equation involving only
whole numbers, has a solution. This was Hilbert's tenth problem. It
was tenth in his famous list of twenty-three problems. What
Matijasevic showed in 1970 was that this is equivalent to deciding
whether an arbitrary computer program halts. So Turing's halting
problem is exactly as hard as Hilbert's tenth problem. It's exactly
as hard to decide whether an arbitrary program will halt as to decide
whether an arbitrary algebraic equation in whole numbers has a
solution. Therefore there is no algorithm for doing that and
Hilbert's tenth problem cannot be solved---that was Matijasevic's 1970
result.
Matijasevic has gone on working in this area. In particular there is
a piece of work he did in collaboration with James Jones in 1984. I
can use it to follow in Gödel's footsteps, to follow Gödel's
example. You see, I've shown that there's complete randomness, no
pattern, lack of structure, and that reasoning is completely useless,
if you're interested in the individual bits of this number
Following Gödel, let's convert this into something in elementary
number theory. Because if you can get into all this trouble in
elementary number theory, that's the bedrock. Elementary number
theory, 1, 2, 3, 4, 5, addition and multiplication, that goes back to
the ancient Greeks and it's the most solid part of all of mathematics.
In set theory you're dealing with strange objects like large
cardinals, but here you're not even dealing with derivatives or
integrals or measure, only with whole numbers. And using the 1984
results of Jones and Matijasevic I can indeed dress up Ω
arithmetically and get randomness in elementary number theory.
What I get is an exponential diophantine equation with a parameter.
``Exponential diophantine equation'' just means that you allow
variables in the exponents. In contrast, what Matijasevic used to
show that Hilbert's tenth problem is unsolvable is just a polynomial
diophantine equation, which means that the exponents are always
natural number constants. I have to allow XY. It's
not known yet whether I actually need to do this. It might be the
case that I can manage with a polynomial diophantine equation. It's
an open question, I believe that it's not settled yet. But for now,
what I have is an exponential diophantine equation with seventeen
thousand variables. This equation is two-hundred pages long and again
one variable is the parameter.
This is an equation where every constant is a whole number, a natural
number, and all the variables are also natural numbers, that is,
positive integers. (Actually non-negative integers.) One of
the variables is a parameter, and you change the value of this
parameter---take it to be 1, 2, 3, 4, 5. Then you ask, does the
equation have a finite or infinite number of solutions? My equation
is constructed so that it has a finite number of solutions if a
particular individual bit of Ω is a 0, and it has an
infinite number of solutions if that bit is a 1. So deciding whether
my exponential diophantine equation in each individual case has a
finite or infinite number of solutions is exactly the same as
determining what an individual bit of this
Let me emphasize the difference between this and Matijasevic's work on
Hilbert's tenth problem. Matijasevic showed that there is a
polynomial diophantine equation with a parameter with the following
property: You vary the parameter and ask, does the equation have a
solution? That turns out to be equivalent to Turing's halting
problem, and therefore escapes the power of mathematical reasoning, of
formal axiomatic reasoning.
How does this differ from what I do? I use an exponential diophantine
equation, which means I allow variables in the exponent. Matijasevic
only allows constant exponents. The big difference is that Hilbert
asked for an algorithm to decide if a diophantine equation has a
solution. The question I have to ask to get randomness in elementary
number theory, in the arithmetic of the natural numbers, is slightly
more sophisticated. Instead of asking whether there is a solution, I
ask whether there are a finite or infinite number of solutions---a
more abstract question. This difference is necessary.
My two-hundred page equation is constructed so that it has a finite or
infinite number of solutions depending on whether a particular bit of
the halting probability is a 0 or a 1. As you vary the parameter, you
get each individual bit of Ω. Matijasevic's equation is
constructed so that it has a solution if and only if a particular
program ever halts. As you vary the parameter, you get each
individual computer program.
Thus even in arithmetic you can find Ω's absolute lack of
structure, Ω's randomness and irreducible mathematical
information. Reasoning is completely powerless in those areas of
arithmetic. My equation shows that this is so. As I said before, to
get this equation I use ideas that start in Gödel's original 1931
paper. But it was Jones and Matijasevic's 1984 paper that finally
gave me the tool that I needed.
So that's why I say that there is randomness in elementary number
theory, in the arithmetic of the natural numbers. This is an
impenetrable stone wall, it's a worst case. From Gödel we knew
that we couldn't get a formal axiomatic system to be complete. We
knew we were in trouble, and Turing showed us how basic it was, but
Ω is an extreme case where reasoning fails completely.
I won't go into the details, but let me talk in vague
information-theoretic terms. Matijasevic's equation gives you
N arithmetical questions with yes/no answers which turn out to
be only log N bits of algorithmic information. My equation
gives you N arithmetical questions with yes/no answers which
are irreducible, incompressible mathematical information.
First of all, the connection with physics. There was a big
controversy when quantum mechanics was developed, because quantum
theory is nondeterministic. Einstein didn't like that. He said,
``God doesn't play dice!'' But as I'm sure you all know, with chaos
and nonlinear dynamics we've now realized that even in classical
physics we get randomness and unpredictability. My work is in the
same spirit. It shows that pure mathematics, in fact even elementary
number theory, the arithmetic of the natural numbers, 1, 2, 3, 4, 5,
is in the same boat. We get randomness there too. So, as a newspaper
headline would put it, God not only plays dice in quantum mechanics
and in classical physics, but even in pure mathematics, even in
elementary number theory. So if a new paradigm is emerging,
randomness is at the heart of it. By the way, randomness is also at
the heart of quantum field theory, as virtual particles and Feynman
path integrals (sums over all histories) show very clearly. So my
work fits in with a lot of work in physics, which is why I often get
invited to talk at physics meetings.
However the really important question isn't physics, it's mathematics.
I've heard that Gödel wrote a letter to his mother who stayed in
Europe. You know, Gödel and Einstein were friends at the
Institute for Advanced Study. You'd see them walking down the street
together. Apparently Gödel wrote a letter to his mother saying
that even though Einstein's work on physics had really had a
tremendous impact on how people did physics, he was disappointed that
his work had not had the same effect on mathematicians. It hadn't
made a difference in how mathematicians actually carried on their
everyday work. So I think that's the key question: How should you
really do mathematics?
I'm claiming I have a much stronger incompleteness result. If so
maybe it'll be clearer whether mathematics should be done the ordinary
way. What is the ordinary way of doing mathematics? In spite of the
fact that everyone knows that any finite set of axioms is incomplete,
how do mathematicians actually work? Well suppose you have a
conjecture that you've been thinking about for a few weeks, and you
believe it because you've tested a large number of cases on a
computer. Maybe it's a conjecture about the primes and for two weeks
you've tried to prove it. At the end of two weeks you don't say, well
obviously the reason I haven't been able to show this is because of
Gödel's incompleteness theorem! Let us therefore add it as a new
axiom! But if you took Gödel's incompleteness theorem very
seriously this might in fact be the way to proceed. Mathematicians
will laugh but physicists actually behave this way.
Look at the history of physics. You start with Newtonian physics.
You cannot get Maxwell's equations from Newtonian physics. It's a new
domain of experience---you need new postulates to deal with it. As
for special relativity, well, special relativity is almost in
Maxwell's equations. But Schrödinger's equation does not come
from Newtonian physics and Maxwell's equations. It's a new domain of
experience and again you need new axioms. So physicists are used to
the idea that when you start experimenting at a smaller scale, or with
new phenomena, you may need new principles to understand and explain
what's going on.
Now in spite of incompleteness mathematicians don't behave at all like
physicists do. At a subconscious level they still assume that the
small number of principles, of postulates and methods of inference,
that they learned early as mathematics students, are enough. In their
hearts they believe that if you can't prove a result it's your own
fault. That's probably a good attitude to take rather than to blame
someone else, but let's look at a question like the Riemann
hypothesis. A physicist would say that there is ample experimental
evidence for the Riemann hypothesis and would go ahead and take it as
a working assumption.
What is the Riemann hypothesis? There are many unsolved questions
involving the distribution of the prime numbers that can be settled if
you assume the Riemann hypothesis. Using computers people check these
conjectures and they work beautifully. They're neat formulas but
nobody can prove them. A lot of them follow from the Riemann
hypothesis. To a physicist this would be enough: It's useful, it
explains a lot of data. Of course a physicist then has to be prepared
to say ``Oh oh, I goofed!'' because an experiment can subsequently
contradict a theory. This happens very often.
In particle physics you throw up theories all the time and most of
them quickly die. But mathematicians don't like to have to backpedal.
But if you play it safe, the problem is that you may be losing out,
and I believe you are.
I think it should be obvious where I'm leading. I believe that
elementary number theory and the rest of mathematics should be pursued
more in the spirit of experimental science, and that you should be
willing to adopt new principles. I believe that Euclid's statement
that an axiom is a self-evident truth is a big mistake. The
Schrödinger equation certainly isn't a self-evident truth! And
the Riemann hypothesis isn't self-evident either, but it's very
useful.
So I believe that we mathematicians shouldn't ignore incompleteness.
It's a safe thing to do but we're losing out on results that we could
get. It would be as if physicists said, okay no Schrödinger
equation, no Maxwell's equations, we stick with Newton, everything
must be deduced from Newton's laws. (Maxwell even tried it. He had a
mechanical model of an electromagnetic field. Fortunately they don't
teach that in college!)
I proposed all this twenty years ago when I started getting these
information-theoretic incompleteness results. But independently a new
school on the philosophy of mathematics is emerging called the
``quasi-empirical'' school of thought regarding the foundations of
mathematics. There's a book of Tymoczko's called New Directions in
the Philosophy of Mathematics (Birkhäuser, Boston, 1986).
It's a good collection of articles. Another place to look is
Searching for Certainty by John Casti (Morrow, New York, 1990)
which has a good chapter on mathematics. The last half of the chapter
talks about this quasi-empirical view.
By the way, Lakatos, who was one of the people involved in this new
movement, happened to be at Cambridge at that time. He'd left
Hungary.
The main schools of mathematical philosophy at the beginning of this
century were Russell and Whitehead's view that logic was the basis for
everything, the formalist school of Hilbert, and an ``intuitionist''
constructivist school of Brouwer. Some people think that Hilbert
believed that mathematics is a meaningless game played with marks of
ink on paper. Not so! He just said that to be absolutely clear and
precise what mathematics is all about, we have to specify the rules
determining whether a proof is correct so precisely that they become
mechanical. Nobody who thought that mathematics is meaningless would
have been so energetic and done such important work and been such an
inspiring leader.
Originally most mathematicians backed Hilbert. Even after Gödel
and even more emphatically Turing showed that Hilbert's dream didn't
work, in practice mathematicians carried on as before, in Hilbert's
spirit. Brouwer's constructivist attitude was mostly considered a
nuisance. As for Russell and Whitehead, they had a lot of problems
getting all of mathematics from logic. If you get all of mathematics
from set theory you discover that it's nice to define the whole
numbers in terms of sets (von Neumann worked on this). But then it
turns out that there's all kinds of problems with sets. You're not
making the natural numbers more solid by basing them on something
which is more problematical.
Now everything has gone topsy-turvy. It's gone topsy-turvy, not
because of any philosophical argument, not because of Gödel's
results or Turing's results or my own incompleteness results. It's
gone topsy-turvy for a very simple reason---the computer!
The computer as you all know has changed the way we do everything.
The computer has enormously and vastly increased mathematical
experience. It's so easy to do calculations, to test many cases, to
run experiments on the computer. The computer has so vastly increased
mathematical experience, that in order to cope, people are forced to
proceed in a more pragmatic fashion. Mathematicians are proceeding
more pragmatically, more like experimental scientists do. This new
tendency is often called ``experimental mathematics.'' This phrase
comes up a lot in the field of chaos, fractals and nonlinear dynamics.
It's often the case that when doing experiments on the computer,
numerical experiments with equations, you see that something happens,
and you conjecture a result. Of course it's nice if you can prove it.
Especially if the proof is short. I'm not sure that a thousand page
proof helps too much. But if it's a short proof it's certainly better
than not having a proof. And if you have several proofs from
different viewpoints, that's very good.
But sometimes you can't find a proof and you can't wait for someone
else to find a proof, and you've got to carry on as best you can. So
now mathematicians sometimes go ahead with working hypotheses on the
basis of the results of computer experiments. Of course if it's
physicists doing these computer experiments, then it's certainly okay;
they've always relied heavily on experiments. But now even
mathematicians sometimes operate in this manner. I believe that
there's a new journal called the Journal of Experimental
Mathematics. They should've put me on their editorial board,
because I've been proposing this for twenty years based on my
information-theoretic ideas.
So in the end it wasn't Gödel, it wasn't Turing, and it wasn't my
results that are making mathematics go in an experimental mathematics
direction, in a quasi-empirical direction. The reason that
mathematicians are changing their working habits is the computer. I
think it's an excellent joke! (It's also funny that of the three old
schools of mathematical philosophy, logicist, formalist, and
intuitionist, the most neglected was Brouwer, who had a constructivist
attitude years before the computer gave a tremendous impulse to
constructivism.)
Of course, the mere fact that everybody's doing something doesn't mean
that they ought to be. The change in how people are behaving isn't
because of Gödel's theorem or Turing's theorems or my theorems,
it's because of the computer. But I think that the sequence of work
that I've outlined does provide some theoretical justification for
what everybody's doing anyway without worrying about the theoretical
justification. And I think that the question of how we should
actually do mathematics requires at least another generation of
work. That's basically what I wanted to say---thank you very much!
p1
p2
p3
p4
p5
p6
...
Gödel 1931
Turing 1936
p1 3.1415926...
p2 ...
p3 ...
p4 ...
p5 ...
p6 ...
...
Gödel 1931
Turing 1936
p1 3.1415926...
It's not really important---let's forget about this possibility.
p2 ...
p3 ...
p4 ...
p5
p6 ...
...
Gödel 1931
Turing 1936
p1 -.
d11 d12 d13
d14 d15 d16 ...
p2 -.
d21 d22 d23
d24 d25 d26 ...
p3 -.
d31 d32 d33
d34 d35 d36 ...
p4 -.
d41 d42 d43
d44 d45 d46 ...
p5
p6 -.
d61 d62 d63
d64 d65 d66 ...
...
Gödel 1931
Turing 1936
p1 -.
d11 d12 d13
d14 d15 d16 ...
p2 -.
d21 d22 d23
d24 d25 d26 ...
p3 -.
d31 d32 d33
d34 d35 d36 ...
p4 -.
d41 d42 d43
d44 d45 d46 ...
p5
p6 -.
d61 d62 d63
d64 d65 d66 ...
...
. ≠d11 ≠d22
≠d33 ≠d44
≠d55 ≠d66 ...
Gödel 1931
Turing 1936
3. The halting probability and algorithmic randomness
The reason I talked to you about Turing and computable reals is that
I'm going to use a different procedure to construct an uncomputable
real, a much more uncomputable real than Turing does.
p1 -.
d11 d12 d13
d14 d15 d16 ...
And that's how we're going to get into much worse trouble.
p2 -.
d21 d22 d23
d24 d25 d26 ...
p3 -.
d31 d32 d33
d34 d35 d36 ...
p4 -.
d41 d42 d43
d44 d45 d46 ...
p5
p6 -.
d61 d62 d63
d64 d65 d66 ...
...
. ≠d11 ≠d22
≠d33 ≠d44
≠d55 ≠d66 ...
Gödel 1931
Turing 1936
uncomputable reals
Ω = ∑p halts 2−|p|
This is just the halting probability. It's sort of a mathematical
pun. Turing's fundamental result is that the halting problem is
unsolvable---there is no algorithm that'll settle the halting problem.
My fundamental result is that the halting probability is
algorithmically irreducible or algorithmically random.
Ω = ∑p halts 2−|p|
Instead of looking at individual programs and asking whether they
halt, you put all computer programs together in a bag. If you
generate a computer program at random by tossing a coin for each bit
of the program, what is the chance that the program will halt? You're
thinking of programs as bit strings, and you generate each bit by an
independent toss of a fair coin, so if a program is N bits
long, then the probability that you get that particular program is
2−N. Any program p that halts contributes
2−|p|, two to the minus its size in bits, the number
of bits in it, to this halting probability.
Ω = ∑p halts 2−|p|
It would give infinity. The technical detail is that no extension of a valid
program is a valid program. Then this sum
∑p halts 2−|p|
turns out to be between zero and one. Otherwise it turns out to be
infinity. It only took ten years until I got it right. The original
1960s version of algorithmic information theory is wrong. One of the
reasons it's wrong is that you can't even define this number
Ω = ∑p halts 2−|p|
In 1974 I redid algorithmic information theory with
``self-delimiting'' programs and then I discovered the halting
probability Ω.
0 <
Ω = ∑p halts 2−|p|
< 1
like all probabilities. The idea is you generate each bit of a
program by tossing a coin and ask what is the probability that it
halts. This number Ω, this halting probability, is not
only an uncomputable real---Turing already knew how to do that. It
is uncomputable in the worst possible way. Let me give you some clues
how uncomputable it is.
0 <
Ω = ∑p halts 2−|p|
< 1
is a real number, you could write it in binary. And if you want to
get out the first N bits from a computer program, essentially
you just have to put them in. The program has to be N bits
long. That's irreducible algorithmic information. There is no
concise description.
0 <
Ω = ∑p halts 2−|p|
< 1
Émile Borel
0 <
Ω = ∑p halts 2−|p|
< 1
can't be defined if you think of programs in the normal way...
0 <
Ω = ∑p halts 2−|p|
< 1
What is a normal real number? People have calculated π out to
a billion digits, maybe two billion. One of the reasons for doing
this, besides that it's like climbing a mountain and having the world
record, is the question of whether each digit occurs the same number
of times. It looks like the digits 0 through 9 each occur 10% of the
time in the decimal expansion of π. It looks that way, but
nobody can prove it. I think the same is true for √2,
although that's not as popular a number to ask this about.
Émile Borel --- normal reals
0 <
Ω = ∑p halts 2−|p|
< 1
Émile Borel --- normal reals
Champernowne
.01234567891011121314...99100101...
0 <
Ω = ∑p halts 2−|p|
< 1
is normal in any base. That's easy to prove using ideas about coding
and compressing information that go back to Shannon. So here we
finally have an example of an absolutely normal number. I don't know
how natural you think it is, but it is a specific real number that
comes up and is normal in the most demanding sense that Borel could
think of. Champernowne's number couldn't quite do that.
0 <
Ω = ∑p halts 2−|p|
< 1
shows that you have total randomness and chaos and unpredictability
and lack of structure in pure mathematics! The same way that all it
took for Turing to destroy Hilbert's dream was the diagonal argument,
you just write down this expression
0 <
Ω = ∑p halts 2−|p|
< 1
and this shows that there are regions of pure mathematics where
reasoning is totally useless, where you're up against an impenetrable
wall. This is all it takes. It's just this halting probability.
Irreducible Mathematical Information
0 <
Ω = ∑p halts 2−|p|
< 1
Émile Borel --- normal reals
Champernowne
.01234567891011121314...99100101...
4. Randomness in arithmetic
Okay, what does this have to do with randomness in arithmetic? Now
we're going back to Gödel---I skipped over him rather quickly,
and now let's go back.
0 <
Ω = ∑p halts 2−|p|
< 1
is a 0 or a 1, or that the eighth bit is a 0 or a 1. But these are
strange-looking questions. Who had ever heard of the halting problem
in 1936? These are not the kind of things that mathematicians
normally worry about. We're getting into trouble, but with questions
rather far removed from normal mathematics.
0 <
Ω = ∑p halts 2−|p|
< 1
0 <
Ω = ∑p halts 2−|p|
< 1
halting probability is. And this is completely intractable because
Ω is irreducible mathematical information.
5. Experimental mathematics
Okay, let me say a little bit in the minutes I have left about what
this all means.
Bibliography