But the intentional and controllable engineering of biological systems has largely eluded us. Without a set of well-understood principles upon which to create novel systems, we are left with discovery. Almost all modern medicines have been discovered or systematically identified from an existing set of biological solutions, rather than intentionally generated for a given purpose. This is because biology, in contrast to physics, is governed by highly complex and often stochastic rules that do not easily yield a parsimonious set of principles that are readily understood. Maybe evolution has not equipped our brains to understand how millions of minute statistical patterns lead to the complex behavior of the cell. But a new kind of intelligence is emerging that excels at this kind of high-dimensional probabilistic reasoning: machine learning. It seems that machine learning can reveal the engineering principles of biology that have hitherto been hidden. These hidden principles, once uncovered, can be used directly for machine generation of novel and potent biological molecules.
What if machine learning can generate novel biology?
Historically, much of our biological knowledge was discovered through experiments designed to understand how a given perturbation affects the properties of a system. With each new observation, we updated our world view, formulated new hypotheses, and tested our predictions. This empirical process of discovery underlies the scientific method, which hasn’t evolved much since it was first formalized by Francis Bacon in the 1600s.
In biology, this process results in a systematic cataloging of evolutionary history and the solutions it has produced. In order to create novel biology, inaccessible to evolutionary processes, we need to know the generalizable principles by which biological systems function. However, we often cannot write down such equations or principles, and we even struggle to identify the basic unit of biology on which to operate. Since the first proposal for standard biological parts, we have attempted to define the elementary units that can be predictably combined to produce complex biological systems, analogous to the electron and semiconductor for a transistor. This extensive reductionist effort has revealed a breathtaking amount of complexity, and has ultimately stymied our ability to find generalizable principles and medicines of the future at scale.
If the generalizable principles required for engineering do exist, yet remain elusive due to the immense and stochastic complexity, machines are perfectly poised to master this challenge.
Machine learning can extract the hidden principles of biology
Machine learning has revolutionized fields including computer vision, natural language processing, speech recognition, medical imaging, and computational biology over the past decade. The advent of large data sets coupled with advances in processing has made it practical to train enormous neural networks. These neural networks, which are said to employ deep learning because of their many layers of nodes, are accomplishing tasks that were until recently thought to be too difficult for computers to tackle, such as winning the ancient Chinese game of Go. In a pivotal development in 2016, a deep learning system created by Google DeepMind, AlphaGo, defeated Go champion Lee Sedol in a five-game match in Seoul. Some of the system’s decisive moves were difficult for Go masters to understand, even with the benefit of hindsight. The following year, DeepMind developed AlphaGo Zero, a deep learning system that started out knowing nothing more than the rules of the game and was otherwise self-taught. The new system quickly learned and discarded strategies that humans had discovered and honed over millennia, and within 36 hours, its abilities surpassed those of the system that had beaten Sedol just a year earlier.