Deoxyribonucleic acid (DNA) is the genetic backbone of living things. But it’s also an architecture for data storage, as a team of researchers demonstrated by storing and retrieving photos encoded in the molecule.
The team stored an image of a Chinese rubbing and a photo of a panda (16,833 bits and 252,504 bits, respectively) in DNA, and then retrieved both without error. In their findings, published today in Nature, the researchers posit that their new approach could provide a scalable solution to the demand for high-density data storage.
DNA is the molecule sealed within the nuclei of our cells; it contains the biological instructions for all living things and is made up of chemical building blocks called nucleotides. The billions of nitrogenous bases in those nucleotides (adenine, thymine, guanine and cytosine) define everything from the rate of our toenail growth to the color of our hair. But the patterns of those bases can also encode data, meaning you could use DNA to store anything from a list of passwords to high-resolution video.
Synthetic DNA has been used for data storage before; in 2018, a different team of researchers encoded and recovered 35 files (constituting over 200 megabytes of data) in over 13 million DNA oligonucleotides, demonstrating such a storage system was possible.
But a key difference between that older work and the new study is that the latter managed the storage feat without de novo DNA synthesis, which remains an uneconomical way of storing data in terms of both time and cost. The team encoded nearly 300,000 bits, with 350 bits written per reaction, as described in the study. The storage work was carried out by 60 volunteers with no professional experience in biological laboratory settings, showcasing the method’s accessibility to non-experts.
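For a rough sense of scale, the figures quoted in this article can be combined into a back-of-the-envelope estimate. The sketch below adds up the two image sizes and divides by the 350-bits-per-reaction figure. It assumes every reaction carries a full 350 bits of payload with no addressing or error-correction overhead, which the study may not bear out, so treat the result as a lower bound rather than a reported number.

```python
import math

# Bit counts reported for the two stored images.
RUBBING_BITS = 16_833
PANDA_BITS = 252_504
BITS_PER_REACTION = 350  # write rate quoted from the study

total_bits = RUBBING_BITS + PANDA_BITS

# Assumption: each reaction writes a full 350 bits of payload, with no
# addressing or error-correction overhead. The real count may differ.
min_reactions = math.ceil(total_bits / BITS_PER_REACTION)

print(f"total payload: {total_bits:,} bits")
print(f"lower bound on reactions: {min_reactions}")
```

Under those assumptions, the two images come to 269,337 bits, or at least 770 reactions.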
To avoid the reliance on DNA synthesis seen in previous attempts at DNA data storage, the team encoded the molecule using methylation, a process by which enzymes add a methyl group (a chemical group made up of one carbon atom and three hydrogen atoms) to certain sites on the DNA strand. This allowed the team to assemble chunks of DNA that bind to specific parts of a DNA strand, so that each of those parts can be read as a 0 or a 1, just as bits function in a nuts-and-bolts computer.
“In our scheme, the DNA sequences serve as addresses while the modification status of the letters now represent the data,” explained Long Qian, a researcher at Peking University and co-author of the paper, in an email to Gizmodo. “To write a specific information, one can simply select 0/1 states for each address and the states will align automatically to DNA, a process we call ‘typesetting.’ After typesetting, the data is copied to a DNA strand simultaneously, a process called ‘printing.’”
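The address-and-state idea Qian describes can be pictured with a toy model. The sketch below is only an illustration, not the team’s actual chemistry or software: the four address sequences are made up, the function names `typeset`, `print_to_strand` and `read_strand` are hypothetical, and a Python dictionary stands in for the DNA strand.

```python
# Hypothetical model: each address is a short DNA sequence, and the bit
# stored there is whether that site carries a methyl group (1) or not (0).
ADDRESSES = ["ACGT", "TGCA", "GATC", "CTAG"]  # made-up example sequences


def typeset(bits):
    """Pair each address with its chosen 0/1 state ("typesetting")."""
    if len(bits) != len(ADDRESSES):
        raise ValueError("need exactly one bit per address")
    return [(addr, bit) for addr, bit in zip(ADDRESSES, bits)]


def print_to_strand(typeset_blocks):
    """Copy all states onto one strand in a single step ("printing").

    The strand is modeled as a dict from address to methylation status.
    """
    return {addr: bool(bit) for addr, bit in typeset_blocks}


def read_strand(strand):
    """Recover the bits by checking the status at each address in order."""
    return [1 if strand[addr] else 0 for addr in ADDRESSES]


message = [1, 0, 1, 1]
strand = print_to_strand(typeset(message))
recovered = read_strand(strand)
print(recovered)
```

The point of the model is that the sequences themselves never change. Only the on/off status at each address carries the data, which is why no new DNA has to be synthesized for each message.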
“Our strategy has the potential to be orders of magnitudes cheaper and faster than the mainstream method,” Qian added. “This could make DNA storage commercially viable.”
Carina Imburgia and Jeff Nivala, researchers at the University of Washington, authored a News & Views article accompanying the new study that outlined the team’s novel methodology. Imburgia and Nivala pointed out that DNA holds promise as a data storage medium: A single gram of the molecule can store a whopping 215,000 terabytes. However, the two researchers noted, the methyl groups essential to the team’s approach cannot be copied using polymerase chain reaction (PCR), the standard way to copy large amounts of DNA.
“Another challenge is that many applications require random access memory (RAM), which enables subsets of data to be retrieved and read from a database,” Imburgia and Nivala wrote. “However, in the epi-bit system, the entire database would need to be sequenced to access any subset of the files, which would be inefficient using nanopore sequencing.”
In other words, you can think of the DNA data store as a single zip file: to access any of the data, you have to sequence the entire database.
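The cost of that limitation is easy to see in a toy example. The sketch below is a simplified illustration with a made-up three-file database and a hypothetical `fetch_with_full_read` function. It models only the idea that every record must be read to retrieve one, not how nanopore sequencing actually works.

```python
# Hypothetical database: file name -> list of bits. All names are made up.
DATABASE = {
    "rubbing": [1, 0, 1],
    "panda": [0, 1, 1, 0],
    "notes": [1, 1],
}


def fetch_with_full_read(database, wanted):
    """Return one file, counting how many bits had to be sequenced.

    Models the limitation described above: with no random access, every
    record is read even though only one is wanted.
    """
    bits_sequenced = 0
    result = None
    for name, bits in database.items():
        bits_sequenced += len(bits)  # every record gets read
        if name == wanted:
            result = bits
    return result, bits_sequenced


panda, cost = fetch_with_full_read(DATABASE, "panda")
print(panda, cost)
```

Here, retrieving the four-bit "panda" entry means reading all nine bits in the database, and the gap only widens as the database grows.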
DNA as a venue for data storage is a tantalizing prospect. It’s amazing that such a small molecule could contain so much information about ourselves and also lead a double life as a repository for contact information, photo libraries, full highlights of the 1974 FA Cup final, you name it. It’s an approach to data storage that holds a lot of promise, but it will require more research if it is to be viable at scale.