People of a certain age will remember the 3.5-inch floppy disk, the ubiquitous computer storage medium of the 1990s that held 1.44 megabytes of data – not even enough for a single MP3 file. Times have changed. A 90-minute high-definition film would use up nearly 6,000 floppies, while a palm-sized, two-terabyte hard drive has more than a million times the capacity of its humble forerunner. We live in a data-hungry era, filling devices with gigabytes of music, video and other files – but more significantly, we also generate a huge amount of data ourselves, much of which lives in the cloud, in colossal data centres dotted around the globe.
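For readers who want to check the comparisons above, here is a rough back-of-envelope sketch. It assumes decimal units and takes the floppy's nominal capacity as 1,440,000 bytes; the function name and constants are illustrative only.

```python
# Back-of-envelope check of the floppy comparisons.
# Assumptions: decimal units, nominal floppy capacity of 1.44 MB.
FLOPPY_BYTES = 1_440_000          # one 3.5-inch floppy (assumed nominal size)
DRIVE_BYTES = 2 * 10**12          # a two-terabyte hard drive


def floppies_needed(size_bytes: int) -> int:
    """Number of floppies required to hold size_bytes (rounded up)."""
    return -(-size_bytes // FLOPPY_BYTES)  # integer ceiling division


# How many floppies' worth of data fits on the hard drive?
drive_ratio = DRIVE_BYTES / FLOPPY_BYTES

# The film size implied by "nearly 6,000 floppies".
film_bytes = 6_000 * FLOPPY_BYTES

print(f"Drive holds about {drive_ratio:,.0f} floppies' worth of data")
print(f"6,000 floppies hold {film_bytes / 10**9:.2f} GB")
```

Under these assumptions the drive holds roughly 1.4 million floppies' worth of data, and 6,000 floppies correspond to a film of about 8.6 gigabytes, or around 12.8 megabits per second over 90 minutes, which is a plausible rate for high-definition video.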
Every minute of every day, we collectively send 16 million text messages, half a million tweets and 170 million emails, upload 70,000 photos to Instagram and 300 hours of video to YouTube. But we also generate masses of data unwittingly, under the radar, whether it's by parking an internet-equipped car, wearing a fitness tracker, choosing what to watch on Netflix or any number of other activities. This has all led to a glut of ones and zeroes. More data was generated globally in 2015 than in all previous years of civilisation put together, according to Professor Carlo Ratti of MIT, and we're creating more and more of it every year. Storage firm Seagate estimates that by 2025, we will be generating 163 zettabytes of data annually. (A zettabyte is a trillion gigabytes – and if you're finding that hard to imagine, well, you're not the only one.)
Why is this a problem? After all, storage media are cheap, our phones, tablets and computers are becoming more capacious, and companies such as Amazon, Microsoft and Dropbox offer cheap deals to store our data in the cloud. No one is telling us to change our data habits – quite the opposite. But recent studies show that this trend isn’t sustainable in the long term. “Our ability to store data will taper off,” says Devin Leake, chief scientific officer at US archival firm Catalog. “There’s a growing concern that the silicon that we use as storage is limited. It’s a finite resource.”
While we have got better at cramming data into ever-smaller pieces of silicon, there are real-world limits. And it’s not even clear whether we will be able to produce the amount of power needed to keep all the data centres of the future running. In the next few decades, we will either have to become happier with the idea of deleting data or find new ways of storing it.
Scientists at Catalog are at the forefront of this hunt, using synthetic DNA as a storage medium. Given that every cell in the human body holds a piece of genetic code some six billion letters long, it's clear that nature is adept at squeezing information into a tiny space, and that's what scientists are now seeking to emulate. A group at the European Bioinformatics Institute first came up with the notion in 2011, and two of them, Ewan Birney and Nick Goldman, went on to successfully encode five files, including Shakespeare's sonnets, into synthetic DNA. It was a clever scientific trick, and other artistic experiments followed: British trip-hop group Massive Attack recently had their 1998 album Mezzanine encoded into DNA strands, while Catalog did the same to Douglas Adams' sci-fi novel The Hitchhiker's Guide to the Galaxy.
But this process is far from straightforward. “Right now, we’re synthesising DNA using traditional chemistry that has been around for 30 years or more,” Leake says. “To build a piece of DNA that’s 100 base pairs [the building blocks of DNA’s double helix] long would take 10 hours. When I attended some early workshops on DNA data storage, it was clear that we were some orders of magnitude away from where we needed to be, in terms of speed, to make data storage happen.”
Hopes are, nevertheless, being pinned on DNA to solve the storage crisis of the future. DNA would allow us to fit every movie ever made into a space the size of a sugar cube; all the information on the internet into a shoebox; the entire store of the world’s data into a wardrobe.
Suddenly, the idea of building a data centre the size of 30 football pitches looks positively archaic. But can DNA really deliver? After all, not only is it time-consuming to construct the DNA, but getting the data back out using a DNA sequencer is also a long-winded process. In a world where our access to information tends to be measured in milliseconds, what use is a system that takes hours to deliver the data we need?
Catalog believes it has made some important advances in this regard. Rather than encoding data piece by piece in DNA – the equivalent of making a hard drive with your data already stored on it – the company envisages a system where smaller quantities of pre-made DNA are arranged into combinations that represent the data.
“Current DNA synthesis is like transcribing a book from start to finish, letter by letter,” Leake says. “But we’re creating a kind of printing press that allows us to speed up that process.”
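The printing-press idea can be illustrated with a toy sketch. Catalog's actual method is not described here, so everything below is a hypothetical illustration: a small library of pre-made fragments, identified by made-up labels such as "P0-C1", with one fragment chosen for each position. The point is only that a handful of reusable pieces can be combined to represent many different values.

```python
# Toy model of combinatorial encoding with pre-made pieces.
# Hypothetical setup: `positions` slots, each filled by one of `choices`
# pre-made fragments. Labels like "P0-C1" stand in for real DNA fragments.


def library_size(positions: int, choices: int) -> int:
    """Number of distinct pre-made fragments that must exist."""
    return positions * choices


def capacity(positions: int, choices: int) -> int:
    """Number of distinct values the combinations can represent."""
    return choices ** positions


def encode_value(value: int, positions: int, choices: int) -> list[str]:
    """Pick one fragment per position to represent `value`."""
    if not 0 <= value < capacity(positions, choices):
        raise ValueError("value does not fit in this library")
    picks = []
    for pos in range(positions):
        value, digit = divmod(value, choices)
        picks.append(f"P{pos}-C{digit}")
    return picks


def decode_value(picks: list[str], choices: int) -> int:
    """Recover the value from the chosen fragments."""
    value = 0
    for pos, label in enumerate(picks):
        digit = int(label.split("-C")[1])
        value += digit * choices ** pos
    return value


picks = encode_value(37, positions=3, choices=4)
print(picks)                          # ['P0-C1', 'P1-C1', 'P2-C2']
print(decode_value(picks, choices=4)) # 37
```

In this toy setup, 12 pre-made fragments are enough to represent 64 different values, and the gap widens quickly as positions are added: nothing new has to be synthesised letter by letter, only assembled.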
It’s unlikely that DNA will ever become the rapid-access equivalent of the flash drive or USB stick, but it may well fulfil our need to archive large quantities of material. The ease of storing data has led us to develop something of a hoarding mindset, where we feel the need to preserve every piece of digital data we generate just because we can. It’s possible that in the future we will become more selective about what we keep and what we throw away, but Leake believes that the storage possibilities offered by DNA will broaden humanity’s perspective on digital information. “We are inquisitive creatures,” he says, “and we aspire to be more than we are. And because of that, we’re storing valuable information that we won’t fully appreciate the relevance of until 1,000 years from now.”
There’s certainly a beautiful symmetry in using the coding system that defines who we are to preserve our digital lives, for eternity, in a way that won’t falter or decay.