It’s popular these days to decry the superiority of machines over biology. Each time some AI-infused algorithm defeats a human at a board game, people gasp and wring their hands in despair. But these are really straw-man contests. Evolution has never felt pressed to optimize her creations to be fabulously good Go players. It’s really not so strange that an algorithm should outwit our human wetware at board games.
A more interesting challenge comes when we pit machines against evolution in an area she has been optimizing for a long time, like data storage and retrieval. This may seem like an odd domain to think of in evolutionary terms; we’ve never seen any other creature trying to store and retrieve data in quite the systematic way we do with computers. But a parallel process takes place in nature every second of the day: DNA storage and replication. It’s breathtakingly effective, and far beyond what we have achieved by mechanical means. Now Microsoft wants in on the action.
Microsoft announced a partnership with the San Francisco-based Twist Bioscience, which will provide the long oligonucleotides used for synthetic DNA storage. As part of the deal, Microsoft will purchase 10 million strands of such DNA, in what looks to be the first phase of its DNA storage ambitions. With its HealthVault platform, Microsoft has good reason to think it will require an enormous amount of storage space: medical data is one of the largest and fastest-growing fronts in the data tsunami.
A few facts about DNA help shed light on its magnificence as a storage medium. Consider the error rate of DNA polymerase, the enzyme responsible for copying strands of DNA in the replisome. For every 10 billion base pairs copied, it makes an average of a single mistake, and that in the very “noisy” conditions of a human body, exposed to a myriad of biological threats like polluted water, viruses, and McDonald’s takeout.
Not only is DNA remarkably effective at retrieving and copying data, it’s extremely efficient in scale. It’s estimated that a human diploid cell contains about 1.5 gigabytes of information, which it can store and retrieve with frightening accuracy. At 1.5GB per cell, the cells in your hand could provide a storage medium bigger than the largest mechanical hard drive in existence. And it’s easy to see why this should be the case. Storing and retrieving genetic information is fundamental to evolution, and evolution has had a long time to perfect the process, much longer than we have been making hard drives. As a result, it’s understandable that Microsoft is betting on DNA becoming the ultimate storage solution.
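For the curious, the 1.5GB figure can be roughly reproduced with a little arithmetic. The sketch below is only a back-of-the-envelope check, not how any real DNA storage system encodes data. It assumes a round 3 billion base pairs per genome copy, two copies per diploid cell, and 2 bits per base (since there are four bases). The 10-terabyte drive is a hypothetical number chosen purely for comparison.

```python
# Back-of-the-envelope check of the "about 1.5 GB per cell" estimate.
# All constants below are illustrative assumptions, not measured values.

BITS_PER_BASE = 2                    # four bases (A, C, G, T) -> log2(4) = 2 bits
HAPLOID_BASE_PAIRS = 3_000_000_000   # assumption: round figure for one genome copy
COPIES_PER_DIPLOID_CELL = 2          # a diploid cell carries two copies


def bytes_per_diploid_cell(haploid_bp=HAPLOID_BASE_PAIRS):
    """Raw information content of one diploid cell's DNA, in bytes."""
    total_bits = haploid_bp * COPIES_PER_DIPLOID_CELL * BITS_PER_BASE
    return total_bits // 8


def cells_to_match_drive(drive_bytes, cell_bytes=None):
    """Number of cells whose combined raw capacity reaches drive_bytes."""
    if cell_bytes is None:
        cell_bytes = bytes_per_diploid_cell()
    return -(-drive_bytes // cell_bytes)  # ceiling division on integers


if __name__ == "__main__":
    per_cell = bytes_per_diploid_cell()
    print(f"Per cell: {per_cell / 1e9:.1f} GB")

    hypothetical_drive = 10 * 10**12  # assumption: a 10 TB drive, for comparison
    print(f"Cells to match it: {cells_to_match_drive(hypothetical_drive)}")
```

Under these assumptions, a few thousand cells already match the hypothetical drive in raw capacity. Keep in mind that this counts capacity only: every cell in your hand stores essentially the same genome, so the comparison is about density, not about how much distinct data your hand actually holds.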
In a poetic twist, some of the storage problems for which DNA is offered as a solution stem from the decoding of genomes themselves. The size of such genetic data silos has skyrocketed thanks to the diminishing cost of sequencing equipment. DNA being used to store decoded DNA — round and round we go, the proverbial snake eating its tail.