## Wednesday, August 30, 2006

### Mohammed AlQuraishi: On a theory of biology

The previous article about theoretical biology was an interview with Franziska Michor. Below you will find a completely unedited article by Mohammed. The links to at least three reactions to the article below are listed here.

Thanks to Lubos for the opportunity to write on this blog. Naturally, many potential topics presented themselves, but I ultimately chose to write about the subject that is dearest to my heart: the emerging hard science of biology. I say emerging because, while biology as a scientific discipline has certainly existed for many decades, and in some sense centuries, it has only recently started to acquire the status of a genuinely hard science. The next few paragraphs concern the challenges and opportunities that face us today as biologists, as we embark on formulating a quantitative theoretical foundation for biology.

I will enumerate three main points, each of which represents both a challenge and an opportunity. The first is a scientific challenge of a theoretical orientation, namely the lack of a theory for biology. The second concerns the sociological organization of biologists and biology departments at the leading research institutions. The third is part science, part sociology, and has to do with the focus of current experimental methods and programs on biomedical research as opposed to basic biological research. The challenges are ordered according to my own judgment of their importance.

First is the need to develop a unified theory and formalism for biology. No true theory of biology exists today. Instead, the field is characterized by a collection of disparate models, each concerned with a particular subset of the general biological space. Furthermore, each subfield employs its own set of formalisms, which are in some cases fundamentally incompatible with the formalisms used in other subfields.

One example that has received considerable attention in the past several years is the field of genetic regulatory networks, or GRNs. This class of problems deals with the regulatory relationships that exist between genes. The products of certain genes, known as transcription factors, are able to bind to the DNA encoding other genes and regulate their expression. Transcription factors that ‘turn off’ genes are known as inhibitors, and ones that ‘turn on’ genes are known as activators. Such relationships are typically encoded as graphs in the computer science sense, with nodes representing genes and edges representing different types of relationships, for example inhibition or activation. Initial work in this field, such as that pioneered by Harley McAdams and Adam Arkin at Stanford and UC Berkeley, respectively, employed electrical circuits, modeled by ordinary differential equations, as the underlying formalism. More recent work, such as that by Eric Davidson at Caltech and Hamid Bolouri at the Institute for Systems Biology in Washington, makes extensive use of the sequence properties of the genes. Specifically, it uses the presence and arrangement of DNA motifs, short sequences of nucleotides such as “AGGTA”, to predict whether a given transcription factor will bind, and to derive the resulting logic of a given binding combination: for example, if X and Y bind then activate, if X or Z binds then repress, and so on.

Yet GRN models typically lack any grounding in the kinetics of the underlying enzymes, or in the structural basis that determines how transcription factors bind to the DNA molecule. Thus they are fundamentally ad hoc, and so cannot be used to predict the behavior of previously uncharacterized genes and transcription factors. Many other examples exist. Biochemical network models make extensive use of kinetic data, also typically formulated as ODEs, but encode no spatial information. Protein folding models do make extensive use of spatial information, and are typically simulated using quantum mechanics-based molecular dynamics simulations or classical rigid-body dynamics. Yet the timescales at which such simulations operate make it entirely impractical, for the foreseeable future, to combine structural data with the other types of models mentioned. Other structural models, such as those used for the morphology of cells or even organisms, forgo continuous models completely and instead employ discrete formalisms, such as cellular automata. Integration in this instance becomes difficult due not only to the vastly different physical scales at which the phenomena occur, but also to differences in the very formalisms that represent them.

The opportunity is clear: develop a theoretical framework that can coherently combine all the aforementioned disparate models, along with an underlying mathematical formalism that can capture it quantitatively and elegantly. To address this problem one cannot rely on better ‘motif-finding’ algorithms for DNA sequences, or on more accurate clustering methods for microarrays. The challenge is not to build software that processes biological data; it is to construct a theory of what biology is.
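For readers unfamiliar with the graph encoding of GRNs described above, the following is a minimal Python sketch of the idea: genes as nodes, typed edges for activation or inhibition, and a small combinatorial rule for one target gene. Everything in it is an assumption made for illustration. The gene names `X`, `Y`, `Z` and `G`, the helper names, and the specific rule (repression by `Z` overrides joint activation by `X` and `Y`) are hypothetical and are not taken from any published network or model.

```python
# Toy genetic regulatory network (GRN): all names and rules are hypothetical.
ACTIVATES = "activates"
INHIBITS = "inhibits"

# Edges of the regulatory graph as (regulator, target, relationship).
EDGES = [
    ("X", "G", ACTIVATES),
    ("Y", "G", ACTIVATES),
    ("Z", "G", INHIBITS),
]


def regulators_of(target, edges=EDGES):
    """Return {regulator: relationship} for every edge pointing at `target`."""
    return {src: kind for src, dst, kind in edges if dst == target}


def gene_g_state(bound):
    """Combinatorial logic for the hypothetical gene G.

    `bound` is the set of transcription factors currently bound near G.
    Assumed rule: if Z is bound, G is repressed; otherwise, if both X and Y
    are bound, G is activated; otherwise G stays at its basal level.
    """
    if "Z" in bound:
        return "repressed"
    if {"X", "Y"} <= bound:
        return "activated"
    return "basal"


if __name__ == "__main__":
    print(regulators_of("G"))
    for bound in [set(), {"X"}, {"X", "Y"}, {"X", "Y", "Z"}]:
        print(sorted(bound), "->", gene_g_state(bound))
```

Note that the rule table is written by hand for one gene. Nothing in it derives from binding kinetics or structure, which is exactly the ad hoc character of such models discussed above.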

The second challenge that I will address is a sociological one. It can be summarized as follows: experimentalists think theoreticians don’t do 'real work' because they’re sitting behind computers all day, and theoreticians think experimentalists are charming individuals whose work will ultimately be automated by robotics. Neither picture is true. What makes experimental work difficult is not performing the experiment, but the careful and thoughtful experimental design that is required to obtain the sought results. Nor does the difficulty of theoretical work lie merely in writing software or solving equations; it lies in the insight necessary to abstract from separate experiments a coherent quantitative picture of what is truly taking place. To address this problem, various institutions have attempted to bring biologists, computer scientists, physicists, and others to the same table to address the problem from their individual perspectives. At Stanford, where I am currently pursuing my graduate research, there is BioX. The University of California campuses at Berkeley, San Francisco, and Santa Cruz have QB3. And up in Washington there is the Institute for Systems Biology. On the east coast, Harvard recently opened a Systems Biology department, and MIT is now offering a Computational Systems Biology major. Alas, I think this approach alone will not solve the problem. Collaboration between different departments is a step in the right direction, but ultimately a new generation of scientists will need to be trained as quantitative biologists: scientists who care about biology as a science, and who are expertly trained in mathematics and computer science, much like their physicist counterparts. Stated differently, we need one brain with two things in it, as opposed to two brains with one thing in each.

Moving away from sociological factors and halfway back to science, the third challenge I would like to outline is an experimental one. Current experimental work, and biology as a whole, suffer from a serious problem: the perception that their existence is only justified as a tool for medicine. Biology is a science, with its own set of fundamental and basic objectives. To require biological research to forgo the foundations and focus on applications is bad for both. I will provide one example to illustrate this point. Significant resources are allocated to the model organisms closest to humans, namely mice and to some extent the fruit fly Drosophila melanogaster. Less is given to simpler eukaryotes like yeast, and less still to prokaryotes like Escherichia coli. But in view of the first challenge above, it is imperative that biologists work on as simple an organism as possible. Initial work in quantum mechanics solved the hydrogen atom, not ununoctium. Stated differently, it is as if someone learning a programming language for the first time decided to examine the source code of Microsoft Windows for tips. Unlikely. Instead, beginning computer scientists are usually asked to write a simple ‘Hello World’ program. I hate to break it to everyone, but we biologists are still at the ‘Hello World’ stage. One organism in particular, Mycoplasma genitalium, has the potential to serve as the hydrogen atom of biology. It is the simplest known organism, yet its status as a model organism for experimental research is very poor. If we are in the business of constructing a quantitative biology, simple organisms deserve a lot more attention, even if they lack immediate biomedical applications.

I will conclude with a parting thought, and an accompanying question. Biologists are rumored to suffer from 'physics envy', an affliction said to arise from the purportedly inferior mathematical skills of the working biologist with respect to her physicist colleague. It appears, however, that for at least parts of biology it is not clear that a physical model is the right one. Certainly for constructs such as a 'cell', one can think of objects existing in physical space (enzymes, nucleic acids, large protein complexes, etc.) and interacting according to the laws of physics. But other biological constructs don't fit the bill as neatly. Consider the 'genome'. What physical object does it map to? One may argue that the chromosome of any one organism is in fact such a physical object. But evolutionary biologists speak of turnover rates for specific nucleotide positions, and of 'ultraconserved' regions shared between extant organisms as far apart as humans and yeast! Clearly the term 'conserved region' means little in the context of one specific chromosome as an individual physical object. Perhaps one can escape this dilemma by arguing for the need to examine things on a longer timescale: to consider not one single chromosome as a physical object, but the physical life of many such chromosomes over many years and millennia. Or perhaps not. My question is:

Are we trying to construct a physical theory, albeit one on restricted temporal and spatial scales, like chemistry? Or are we coming up with an entirely new abstraction, much like computer science, where the underlying physics is ultimately inaccessible?

And that, in the words of one Lubos Motl, is the memo. [LM: Bill O'Reilly will certainly forgive us.]

Mohammed AlQuraishi, visitor #600,000