
Interview with Bradley Efron
"Statistics is the science of information gathering, especially when the information arrives in little pieces instead of big ones."
Bradley Efron is Professor and Chairman of the Department of Statistics at Stanford University and has been awarded the Ford Prize, MacArthur Prize, and the Wilks Medal for his research work in computer applications in statistics, particularly with techniques known as the bootstrap and the jackknife. He has been an invited lecturer or keynote speaker more than 50 times, and his latest book, An Introduction to the Bootstrap, was published by Chapman & Hall in 1993.
Aczel: Professor Efron, you, probably more than anyone else, have brought the power of modern computers into statistics. How do you see the future of this marriage?
Efron: First of all, I think [John] Tukey deserves a lot of credit for bringing computers in. At least to my attention. And I believe we are heading toward a world in which people won't even notice that that is the way it is being done. That is, all statistics will be done with a lot of computing, just because you won't have to make a lot of assumptions or theoretical calculations. People already do it, the Box-Jenkins method for example. I think it will just take over after a period of time, that more and more of the theory will be packaged in a way that the consumer doesn't have to think through the theory each time.
Aczel: Do you think it (modern computers and software) will replace our profession (statistics), eventually?
Efron: No. No, I believe it will expand our profession. We will be the people who package the theory. It's sort of like saying: Do you think that computers will put the computer programmers out of business? No, there will be more of us: People will use statistics much more, because they will feel better about it. They will feel like it's not such an alien subject, and there will be much more demand for it.
Aczel: I want to ask you a question about computationally intensive methods in statistics…
Efron: I believe I - Persi [Diaconis] and I - invented the term computationally intensive statistics. I have never found a previous reference before our Scientific American article … so I claim that, ha ha ….
Aczel: Good! You deserve it! Did you devise the bootstrap [an important computationally intensive method] because you had an application in mind, or were you pursuing a theory?
Efron: What actually happened was that Rupert Miller in our department was working on the jackknife. He had written a paper called the "Trustworthy Jackknife" in which he tried to figure out when the jackknife method gave dependable variance estimates. The jackknife was considered very mysterious. It worked, but nobody could figure out why it worked. And sometimes it didn't work. What I thought was that the jackknife must be a differential, local kind of approximation for something else. And so when I started looking for the something else I came up with the bootstrap.
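The relationship Efron describes between the two methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the interview: the data values, the function names `jackknife_se` and `bootstrap_se`, and the choice of 2000 resamples are all made up for the example. The jackknife recomputes a statistic with one observation left out at a time, while the bootstrap recomputes it on samples drawn with replacement from the data.

```python
import math
import random
import statistics

def jackknife_se(data, stat):
    """Jackknife standard error: recompute stat leaving out one point at a time."""
    n = len(data)
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in leave_one_out))

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error: recompute stat on resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)

# Hypothetical data, invented purely for illustration.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]

jk = jackknife_se(data, statistics.mean)
bs = bootstrap_se(data, statistics.mean)
print(f"jackknife SE of the mean: {jk:.4f}")
print(f"bootstrap SE of the mean: {bs:.4f}")
```

For the sample mean the jackknife value coincides with the textbook standard error s/sqrt(n), and the bootstrap value should land close to it. For less smooth statistics, such as the median, the two can disagree, which fits Efron's remark that the jackknife sometimes didn't work.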
Aczel: What area of statistics do you see as becoming more important to society in the future?
Efron: OK, now that's a good question. I'm not sure it's areas. Do you mean subject areas like survival analysis? Survival analysis has been a wonderful statistical success story. As I go around the country I have been struck with how prosperous and successful the biostat departments are as opposed to the pure stat departments. And I think that biostat is really moving into the forefront of the national consciousness, or international consciousness, and it'll be as much a part of discourse as things about health cures are now. That is, biostatistics will naturally be a part of newspaper stories about controls and treatments and how the significance level is going, and things like that. So I think that will be tremendously important. I work in biostat and biological kinds of things so I don't get to see a large part of the other kinds of applications. The history since 1900 has been that statistics just takes over field after field in terms of being the methodology of choice, and I think that'll continue. I've done papers with people in astronomy and physics lately, and they're starting to use statistics a lot more for the simple reason that they have to be efficient now. It's hard to say, but I don't see any area where it's being resisted much.
Aczel: What are some of the more interesting problems and applications you've worked on?
Efron: The one I enjoyed most recently was one on Hubble's law. There's a debate in the astronomical community. Hubble's law is that the further a galaxy is away the faster it recedes in a linear way, that space is expanding linearly. Some very good astronomers doubt that. I worked on a set of galaxy redshifts with Vahi Petrosian at Stanford to see whether it was true or not. Much to my surprise, after a long time Hubble's law worked out to be pretty good. I didn't believe it. It just didn't look right to me. But it worked out really well.
I get a lot of good problems in biostatistical consulting. I work for drug companies. I am designing a trial for a company that makes male potency drugs, and it's a rough trial to design. The spurious effects are very easy to have happen, and so it was a lot of fun designing the trial and I hope it will be fun to analyze it too. One of the most fun data sets I ever worked on was that Shakespeare data set - the "how many words did Shakespeare know" problem with Ron Thisted which was a philological kind of data set. So they come up everywhere.
Aczel: Is Hubble's law, by the way, related to relativity? What does relativity predict?
Efron: When Hubble announced his law, Einstein had proved earlier that one of the stable forms for the universe was a constant expansion. And then since everybody told him that was impossible, he recanted. Then Hubble showed that that was what seemed to be happening. That's sort of the accepted majority view. However, there are some other solutions that other people concur are the right ones. I do not know the physics well enough to say.
Aczel: And the statistics prove Hubble's law?
Efron: The statistics at least were very consistent with Hubble's law and not with the other ones. And I was quite surprised because there's been a lot of good criticism of it. The data set was 476 galaxies collected from a bunch of different surveys - it's hard to do redshifts. It sure didn't look like it was going to work, but it did.
Aczel: I want to ask you about the foundations of statistics. Do you think there are holes in the foundations of statistics the way there are in set theory?
Efron: I don't think the holes are the way they are in set theory in that set theory is presumably perfectible, whereas statistics seems much further from that goal yet. I find myself returning again and again as a basis of statistics to things that are just examples like normal translation problems. If you can reduce a problem to a normal translation problem you sort of know the answer. So if you can reduce any problem to that you have the answer. Or confidence intervals. If you can get the kind of confidence property - that seems like the right answer, so if you get something else that approximately does that, that must be pretty good. The only complete theory of statistics is the Bayesian theory and even though it's unassailable it somehow misses part of the story, which is that you can't use it as an actual driving theory for complicated problems. You always are then forced to do something too complicated, and make up your mind on things you have no opinions on. So somehow Bayesian theory is wonderful but it doesn't tell the whole story. Frequentist theory is shot full of contradictions but it seems to work so well. Trying to reconcile those two things… one of my hopes is that computer intensive statistics will make it easier to reconcile philosophical issues because there won't be so many technical problems.
Aczel: You're talking about Bayesian versus Frequentist methods?
Efron: Yes.
Aczel: So you think computationally intensive methods will ….
Efron: Yes, that's just a hope. I use them that way at certain times. The two philosophies don't seem that separate when you can …. I'm working on empirical Bayes kinds of things again now. And I've been using the bootstrap on empirical Bayes, which starts bringing the two things together. They don't seem as separate as they [did] to me a long time ago. Maybe I'm just softening up.
Aczel: You started out as a math major. What made you change - or was it a change - into statistics?
Efron: My own history was that my Dad was a truck driver and was an amateur mathematician, and also was the sports statistician for the bowling league and the baseball league at home (St. Paul). And so I grew up with a lot of numbers around and a fairly sophisticated Dad who knew a lot about how to calculate things. I thought I was going to be a mathematician. I went to CalTech, and I think I would have stayed a mathematician if mathematics was like it was a hundred years ago where you computed things, but I have no talent at all for modern abstract mathematics. And so I wanted to go into something that was more computational. After CalTech I came to Stanford. And statistics was definitely better.
Aczel: You made the transition OK?
Efron: Actually, they didn't have statistics courses at CalTech, but some faculty member let me read Cramér's book. It was a reading course. And I really loved that book. I read it from front to back. I don't see anybody else do it the way that Cramér did. I still have that book completely marked up.
Aczel: Can you say something about statisticians who influenced you in your early years?
Efron: Yes, easily. At Stanford, there was Rupert Miller and Lincoln Moses, [who] ran the biostat program. I'd gone in as a mathematician. It was in the biostat program that I learned statistics as a living day-to-day kind of thing that helps people. It was a big eye opener seeing how they did statistics.
Aczel: You are the recipient of the prestigious MacArthur Award, among many other distinctions. Can you tell us something about the project that you were doing with the MacArthur?
Efron: Well, the MacArthur Award was not an award for doing projects. They specifically say that it isn't. They specifically say they won't say anything to make you do anything. They want the award to be a prize, not an impediment to your future work. And they're very nice people to deal with in that regard. So, I never did anything directly. However, it had a big effect on me in the following sense that I never expected. People both inside the university and outside the university suddenly paid a lot more attention to what I had to say.
Aczel: Is there no Nobel Prize in statistics? They consider it part of math [which has no Nobel Prize]?
Efron: Statistics does not have big prizes. And it's probably a good thing. It hasn't had a good effect, say, on physics or math to have the Fields Medals or Nobel Prizes. They tend to produce a lot of big egos and squabbling. Statistics has been, in a sense, a sort of intellectual working man's field, where you work for the love of the field and not for the fame, because there sure isn't any.
Aczel: What is statistics to you - is it mathematics, computer science, philosophy, all of the above, or something else?
Efron: My definition, that is in our book, is that statistics is the science of information gathering, especially when the information arrives in little pieces rather than in one or two big pieces.
Aczel: Like a puzzle?
Efron: It's really quite amazing that there can be a theory of statistics. You might think that you couldn't have theories about things… I mean, like astronomy is about stars and geology is about rocks, well statistics is about information and information gathering. It's not clear that that's a "thing" in the same way that rocks are rocks.
Aczel: You mean like a concept of a number or something like that?
Efron: Yes, it's pretty philosophical. The philosophers usually say that statistics is impossible. They say that you can't learn from experience. They can always think of counterexamples. But we live in a world where the examples outnumber the counterexamples by quite a bit.
Aczel: Where do you stand on estimation versus hypothesis testing and criticism of p-values? Like when a student asks "why is it alpha .05?" or "what happens when my p-value is .06, why is it not significant but when it's .04 it is?" Or in general, what can you say about hypothesis testing, confidence intervals, the future…?
Efron: Estimation is putting things together and testing is pulling things apart. Analysis versus synthesis. I've always liked confidence intervals because they let you straddle the middle ground. There are times for all those things. Hypothesis testing is immensely efficient if you're really just trying to know whether an effect is positive or not positive. It's the only thing you can use when you've got n=10, a lot of times. But it isn't nearly so interesting when you've got a bigger n and you really want to know more. It's usually fairly clear that the hypothesis is true or not true. The question is, how true. And then estimation and confidence intervals come more … I use confidence intervals a lot more. I think most good applied statisticians use confidence intervals more than they show up in the books. I wish they were in the books. The reason they are not in the books so much is they are much harder to explain than either estimation or hypothesis testing. But that's one of those things that I hope the future will bring us to - a much easier acceptance of things like intervals because you'll be able to get them out of the automatic machine and then you can spend your time understanding what they are instead of this horrible… The math is quite involved.
Aczel: They will be more accurate, right, if you use the bootstrap? I'm really fascinated with the idea that the bootstrap gives you any required level of accuracy and correctness, especially replicated bootstrap, and as the sample size grows, and that you're really doing better than normal theory, t-distribution.
Efron: Yes, you're doing one step better, and that one step is a big step.
Aczel: From 1 over root n to 1 over n …
Efron: Yes, I think that's very important. And the thing that I'm working on - and a lot of people are working on - is trying to make that really dependable so you really get it - and it really works - every time. Prepackaging this theory isn't so easy. It's hard work.
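The "one step better" in this exchange refers to how fast the coverage error of a confidence interval shrinks with the sample size n: on the order of 1/sqrt(n) for the standard normal-theory interval, and on the order of 1/n for the better bootstrap intervals. One construction usually described as having the faster rate is the bootstrap-t interval, sketched below for a sample mean. This is a minimal sketch under stated assumptions, not the method Efron was packaging at the time: the function name `bootstrap_t_interval`, the data values, and the number of resamples are all hypothetical.

```python
import math
import random
import statistics

def bootstrap_t_interval(data, alpha=0.05, n_boot=4000, seed=0):
    """Bootstrap-t confidence interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)

    t_stats = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=n)  # resample with replacement
        se_star = statistics.stdev(sample) / math.sqrt(n)
        if se_star == 0:
            continue  # skip degenerate resamples where every value is identical
        t_stats.append((statistics.mean(sample) - xbar) / se_star)

    # Use the empirical quantiles of the studentized statistic
    # in place of the t-table values.
    t_stats.sort()
    b = len(t_stats)
    t_lo = t_stats[int(b * alpha / 2)]
    t_hi = t_stats[int(b * (1 - alpha / 2)) - 1]
    return xbar - t_hi * se, xbar - t_lo * se

# Hypothetical skewed data, invented purely for illustration.
data = [0.3, 0.7, 1.1, 1.2, 1.9, 2.4, 3.0, 4.8, 7.5, 12.6]

low, high = bootstrap_t_interval(data)
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

Unlike the symmetric normal-theory interval, the endpoints here need not sit at equal distances from the sample mean, because the resampled t-statistics pick up the skewness of the data.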
Aczel: One more question. You mentioned earlier a new book you wrote about the bootstrap.
Efron: Yes. Now that our little book is out - [Robert] Tibshirani and I have a little book called An Introduction to the Bootstrap, published by Chapman and Hall, 1993 - it will make it easier to understand the bootstrap. What's happening is that the bootstrap has had its run, its first run, at the theory world, and now I think it will go into applications. The applied people have always seemed happier with it than the theory people.
Aczel: Can you say something about your outside interests? You're a human being and a very interesting one, not just a statistician.
Efron: I love statistics. I live right at the Stanford campus. I go in all the time. I don't work hard every day, though. I work every day. In the evenings I like to go to movies with my girlfriend, and I go to almost all movies. I'm interested in astronomy and science. I claim that I'm an amateur scientist as well as a professional scientist. And I love science. I think it's the greatest thing people ever thought of. So I guess my hobby is science, too. I like to sit around and talk …. Movies, stat, and science.