Recorded: 03 Mar 2006
Right, so we had our first council meeting I think in early September. And during that meeting, it became obvious that they were winding up the EST work, and they were going to do a major publication on it. So that work would be finishing at the end of April in 1994. The meeting was September of ’93 of course. So at one point I raised my hand and I said, “You call yourself a genomic institute, let’s do a genome.” Or, “how about doing a genome,” I forgot the exact words. And he said—I said, how about doing Haemophilus influenzae, it’s 1.9 megabases, it has a very favorable base composition for sequencing, similar to human. And I can make the libraries. And he was extremely enthusiastic, of course. I think they had contemplated doing a bacterial genome, but hadn’t actually gotten into it. They’d done vaccinia, I remember—or smallpox, which is about 200 kilobases, so they could potentially do a much larger scale project. But that was done in the old-fashioned way, not a shotgun.
Anyway, so I went back up to Hopkins, walked into my lab and said, we’ve got the best sequencing facility in the world willing to do Haemophilus influenzae. I need to make the libraries, map clones, get it ready so they can sequence. And nobody in the lab was interested. They each—you know, basically they said look at, we’ve got our projects, what’s in it for us? You know, we don’t have a grant to do this, it’ll take a year to map the clones in the fashion that they were doing for E. coli. So there was totally no interest, and I just went into my office and started sitting there thinking, and I thought, gee, you know, this is obvious that Craig is already doing random shotgun sequencing with ESTs. Instead of doing it on a cDNA library, let’s do it on a genome library.
I mean, it was just totally obvious, I think to Craig and everybody it was obvious, except it just hadn’t been focused. So I worked out the Poisson statistics for it and made up a table. Then I happened to run into Craig a few weeks later—a couple of weeks later, and he said, how is the library coming along? I said, well, I can’t do it the way we had discussed, but could I come down and we’ll talk about using your regular approach? So we got together, and I remember sitting in the room there; we put the Poisson table up on the screen, and I said, look if we go down to 40,000 clones—40,000 reads—it will close the genome. He said, no; he says, we’ll do 25,000, which will get us down to about forty breaks, and then we’ll close it. And he was right, of course. Because we never would—because the fact that the gaps are caused by the library itself, not cloning in coli, we never would close it by going forty. I think he just made a lucky guess, but anyway…He wanted to save money; it was cheaper to, he thought, to close it after twenty-five.
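The Poisson table mentioned above is not reproduced in the interview. As a rough sketch of the kind of calculation it would have contained, the following uses the standard random-shotgun approximation: with N reads of average length L on a genome of size G, coverage is c = N·L/G, the fraction of bases left unsequenced is about e^(-c), and the expected number of gaps is about N·e^(-c). The 1.9-megabase genome size and the 25,000 and 40,000 read counts come from the account above; the 480-base average read length and the function names are assumptions chosen only for illustration, and the model ignores the minimum overlap needed to join two reads.

```python
import math


def coverage(num_reads, read_len, genome_len):
    """Average number of times each base is sequenced (c = N * L / G)."""
    return num_reads * read_len / genome_len


def fraction_unsequenced(num_reads, read_len, genome_len):
    """Poisson probability that a given base is covered by zero reads."""
    return math.exp(-coverage(num_reads, read_len, genome_len))


def expected_gaps(num_reads, read_len, genome_len):
    """Approximate expected number of gaps: N * exp(-c).

    Assumes reads start at uniformly random positions and ignores the
    minimum overlap an assembler needs to detect a join.
    """
    return num_reads * fraction_unsequenced(num_reads, read_len, genome_len)


if __name__ == "__main__":
    GENOME_LEN = 1_900_000   # 1.9 megabases, from the interview
    READ_LEN = 480           # assumed average read length (hypothetical)
    for n in (10_000, 25_000, 40_000):
        c = coverage(n, READ_LEN, GENOME_LEN)
        gaps = expected_gaps(n, READ_LEN, GENOME_LEN)
        print(f"{n:>6} reads  coverage {c:5.2f}x  expected gaps {gaps:8.1f}")
```

Under these assumptions the model predicts a few dozen gaps at 25,000 reads and only a handful at 40,000, which matches the shape of the trade-off described above. It cannot capture the point Smith makes about gaps that come from the library itself: regions missing from the clones are never sampled, so adding more random reads does not close them.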
They had a—they were putting together an algorithm that would kind of cluster together cDNA sequences that overlapped. And of course you’re dealing with tens of thousands of sequences. And, so, I think that was—we didn’t use that, but it was something like that. Now I actually also wrote up an assembly program – it wasn’t a very good one – but it had some features that I think maybe they used. Yeah, right. Granger. Granger Sutton. Granger Sutton was extremely—he’s very very good. He wrote up the algorithm that worked. That was the TIGR assembler for several years. And then he worked of course on the human Celera assembler as well. No, you know I’m pretty good at writing C programs, but I didn’t have quite the background to do it.
Well, it was sort of a conceptual thing, in a sense, to realize that if you have a factory of machines, if you have a high throughput facility – and this is what Craig has made his name on, you know. That’s why he was able to—the EST work was successful because of the way he did it. If you have that sort of a thing, it’s easier just to make a single library and then just sequence thousands of clones, just reading the end sequences, and then assume that the computer will put it together. Now nobody at that time considered this method, because they didn’t think that computationally it was tractable. The same as they thought we couldn’t do the human genome, but now everybody is doing it by the assembler methods, shotgun.
Hamilton Smith is a U.S. microbiologist born Aug. 23, 1931, in New York, N.Y. Smith received an A.B. degree in mathematics from the University of California, Berkeley in 1952 and an M.D. degree from Johns Hopkins University in 1956. After six years of clinical work in medicine (1956-1962), he carried out research on Salmonella phage P22 lysogeny at the University of Michigan, Ann Arbor (1962-1967). In 1967, he joined the Microbiology Department at Johns Hopkins.
In 1968, he discovered the first Type II restriction enzyme (HindII) and determined the sequence of its cleavage site. In 1978, he was a co-recipient (with D. Nathans and W. Arber) of the Nobel Prize in Physiology or Medicine for this discovery.
He is currently the Scientific Director, Synthetic Biology and Bioenergy, and Distinguished Professor at the J. Craig Venter Institute in Rockville, Maryland.