David Haussler on Genome Research: Hidden Markov Models
Recorded: 08 May 2012

We developed hidden Markov models that were later applied extensively in bioinformatics. In particular, we were the first to run a hidden Markov model to find the genes in the E. coli genome.

Hidden Markov models go way back as models of time series data: any stochastic variation that happens through time, you can model in a very simple way with a hidden Markov model. They were used extensively in speech recognition and are the basis for most modern speech recognition systems. So when you talk to Siri on your iPhone or something like that, you have the benefit of hidden Markov model technology.

We were looking at applications of that in molecular biology when Anders Krogh first came to Santa Cruz to do a postdoc. His background was in physics and mine in math and computer science, but we both had an interest in molecular biology. One afternoon we were thinking about how we could apply the hidden Markov model to run along the chain of a protein sequence and predict the pattern of amino acids, or of DNA in a genome. From that, a whole blossoming of methodology and ideas rapidly grew. We pieced together a program, we started classifying proteins based on similarities recognized by hidden Markov models, and we ran the hidden Markov model through the entire E. coli genome, picking out where the genes were…
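To make the idea of running a hidden Markov model along a sequence concrete, here is a minimal sketch in Python. It is not the model Haussler and Krogh built: it assumes a hypothetical two-state HMM ("N" for non-coding, "C" for coding) with made-up probabilities in which the coding state slightly favors G and C. The most probable state path is recovered with the Viterbi algorithm, a dynamic programming recursion. Real gene finders use far richer state structures (for example, modeling codon positions and start and stop signals).

```python
import math

# Hypothetical toy parameters: two hidden states, "N" (non-coding) and "C" (coding).
# These numbers are illustrative only, not taken from any published gene-finding model.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.95, "C": 0.05},
    "C": {"N": 0.05, "C": 0.95},
}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # AT-rich background
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # GC-rich toy "genes"
}


def viterbi(seq):
    """Return the most probable hidden-state path for seq, one label per base."""
    # best[i][s]: log-probability of the best path that ends in state s at position i.
    # Working in log space avoids numerical underflow on long sequences.
    best = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = [{}]  # back[i][s]: predecessor state on that best path
    for i in range(1, len(seq)):
        best.append({})
        back.append({})
        for s in STATES:
            # Choose the previous state that maximizes the score of entering s.
            prev, score = max(
                ((p, best[i - 1][p] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            best[i][s] = score + math.log(EMIT[s][seq[i]])
            back[i][s] = prev
    # Trace back from the highest-scoring final state.
    last = max(STATES, key=lambda s: best[-1][s])
    path = [last]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return "".join(reversed(path))


if __name__ == "__main__":
    # AT-rich flank, GC-rich middle, AT-rich flank.
    dna = "ATATTAATAT" + "GCGGCCGCGGCGCCGGCGCG" + "TTATAATATA"
    print(dna)
    print(viterbi(dna))
```

On this toy input the decoder labels the 20-base GC-rich middle as "C" and the AT-rich flanks as "N". The low switching probability (0.05) is what stops it from flipping state at every base: a short GC run would not earn enough emission advantage to pay for two switches.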

There were soon many, many groups using hidden Markov model methodologies. In a way, the hidden Markov model actually unified methodologies that were already being used; dynamic programming methods, for example, are a special case of hidden Markov models. So it was, in a sense, a unifying theme for the gene-finding field.

In 1994, Anders Krogh and I published a paper on using a hidden Markov model to find all of the genes in the E. coli genome, and the same year Anders and others in my lab published a paper on comparing different protein families to each other with hidden Markov models, discovering similar proteins using this technology.

David Haussler (born 1953) is an American bioinformatician known for leading the team that assembled the first human genome sequence in the race to complete the Human Genome Project, and subsequently for comparative genome analysis that deepens understanding of the molecular function and evolution of the genome. He is a Howard Hughes Medical Institute Investigator, professor of biomolecular engineering and director of the Center for Biomolecular Science and Engineering at the University of California, Santa Cruz, director of the California Institute for Quantitative Biosciences (QB3) on the UC Santa Cruz campus, and a consulting professor at the Stanford University School of Medicine and in the UC San Francisco Biopharmaceutical Sciences Department.