Researchers at Sandia National Laboratories in New Mexico are experimenting with encrypted DNA storage for archival applications.
Husband-and-wife team George and Marlene Bachand are biological engineers with a remarkable vision of the future.
The researchers, who work at the Center for Integrated Nanotechnologies at Sandia National Laboratories, foresee a time when a millimeter-sized speck of DNA on a piece of paper could securely store the complete works of Shakespeare.
George Bachand says the first practical applications of DNA-based storage would be long-term archiving. Such a product could, for example, securely store records for the National Archives, government personnel records, research findings from the national labs, or other sensitive or classified information.
“Historically, the national laboratories and the US government have a lot of highly secure information that they need to store long-term,” Bachand explains. “I see this as a potentially robust way of storing classified information in the future to preserve it for multiple generations.”
Crypto, Synthetic DNA, and The Bard
The Bachands’ project, Synthetic DNA for Highly Secure Information Storage and Transmission, was inspired by work at the European Bioinformatics Institute, where researchers encoded all of Shakespeare’s sonnets in 2.5 million base pairs of DNA, about half the genome of the tiny E. coli bacterium. Bachand says that, using this method, one gram of DNA could theoretically store 2.2 petabytes of information, about 200 times the printed material in the Library of Congress.
Bachand adds that unlike digital forms of storage, DNA never becomes obsolete.
“Hard drives fail and very often the data can’t be recovered,” explains Bachand. “With DNA, it’s possible to recover strands that are 10,000 to 20,000 years old.”
There’s another reason why DNA is more secure. DNA consists of four chemically different building blocks, or bases, commonly referred to by their one-letter abbreviations: A, C, G, and T. Living cells read the genetic information stored in DNA in groups of three bases, called triplet codons. With four possible bases at each of the three positions, there are 4 × 4 × 4 = 4³ = 64 possible codons.
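The codon count is easy to check by brute force. This short Python sketch simply enumerates every ordered triple of the four bases; the names BASES and CODONS are illustrative, not from any Sandia software.

```python
from itertools import product

BASES = "ACGT"

# A codon is an ordered triple of bases: 4 choices at each of 3 positions.
CODONS = ["".join(triple) for triple in product(BASES, repeat=3)]

print(len(CODONS))  # 64, i.e. 4 ** 3
print(CODONS[:4])   # ['AAA', 'AAC', 'AAG', 'AAT']
```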
Spaces make up, on average, 15 to 20 percent of the characters in a text document. So rather than always encoding a space as the same codon (AAA, say), an encryption key could specify that any of TAG, TAA, or TGA codes for a space, while either GAA or CTC could code for the letter E. Spreading common characters across several codons reduces repetition (fewer long runs such as AAA), which makes DNA synthesis run more smoothly. As an added bonus, less repetition also makes brute-force hacking much more difficult.
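Letting several codons stand for one character resembles a classic homophonic substitution cipher. The Python sketch below is a minimal illustration under stated assumptions, not the Bachands’ actual algorithm: the space and E codons come from the example above, while the S and M entries, the key names, and the choice to rotate through alternatives (rather than pick them at random) are hypothetical. It compares the longest run of a single repeated base against a naive one-codon-per-character key.

```python
from itertools import cycle

# Hypothetical translation key. The space and "E" codons follow the example
# in the text; the "S" and "M" entries are invented for illustration.
HOMOPHONIC_KEY = {
    " ": ["TAG", "TAA", "TGA"],
    "E": ["GAA", "CTC"],
    "S": ["ACG"],
    "M": ["CAT"],
}

# For comparison: a naive key with one codon per character (AAA for spaces).
SINGLE_KEY = {" ": ["AAA"], "E": ["GAA"], "S": ["ACG"], "M": ["CAT"]}


def encode(text, key):
    """Encode text, rotating through each character's codons in turn.

    Characters missing from the key raise KeyError.
    """
    rotations = {ch: cycle(codons) for ch, codons in key.items()}
    return "".join(next(rotations[ch]) for ch in text.upper())


def longest_run(dna):
    """Length of the longest stretch of a single repeated base."""
    if not dna:
        return 0
    best = run = 1
    for prev, cur in zip(dna, dna[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


naive = encode("See me", SINGLE_KEY)
mixed = encode("See me", HOMOPHONIC_KEY)
print(naive, longest_run(naive))  # ACGGAAGAAAAACATGAA 5
print(mixed, longest_run(mixed))  # ACGGAACTCTAGCATGAA 2
```

Rotating through the alternatives keeps the output reproducible for demonstration; a real scheme might instead choose among them randomly, which would also hide character frequencies from an attacker.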
The team’s first test came about 18 months ago with a 180-word tweet. The goal was to turn text to DNA, encrypt it using a unique translation key, and then turn the DNA back to text.
Here’s how it’s done: using a computer algorithm, the team encrypts a message into a DNA sequence and then chemically synthesizes that DNA. To read the message back, they sequence the DNA and decode it with the same algorithm.
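The round trip can be sketched in a few lines of Python, assuming a simple key that assigns one distinct, randomly chosen codon to each character. Everything here is hypothetical: the character set, the seed-based key generation, and the function names make_key, encrypt, synthesize_and_sequence, and decrypt. The chemical steps are stood in for by a function that returns its input unchanged, i.e. an error-free read.

```python
import random
import string

# Hypothetical character set: capital letters, space, and a little punctuation.
ALPHABET = string.ascii_uppercase + " .,'"
# All 64 codons, AAA through TTT.
CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def make_key(seed):
    """Derive a translation key: a distinct random codon for each character."""
    rng = random.Random(seed)
    shuffled = rng.sample(CODONS, len(CODONS))
    return dict(zip(ALPHABET, shuffled))


def encrypt(message, key):
    """Step 1: turn text into a DNA sequence, one codon per character."""
    return "".join(key[ch] for ch in message.upper())


def synthesize_and_sequence(dna):
    """Steps 2 and 3 happen in the lab: synthesize, store, then sequence.

    Modeled here as a perfect read; real sequencing can introduce errors.
    """
    return dna


def decrypt(dna, key):
    """Step 4: split the read into triplets and look each one up."""
    reverse = {codon: ch for ch, codon in key.items()}
    return "".join(reverse[dna[i:i + 3]] for i in range(0, len(dna), 3))


key = make_key(seed=42)
strand = encrypt("Hello from Sandia.", key)
print(len(strand))                                    # 54
print(decrypt(synthesize_and_sequence(strand), key))  # HELLO FROM SANDIA.
```

Because every codon maps back to exactly one character, decoding is a plain reverse lookup; that still holds if a key assigns several codons to the same character. Anyone without the key (here, the seed) sees only an opaque string of bases.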
Having succeeded with the tweet, the team last fall encoded an abridged version of a letter written by former President Harry Truman into DNA. They then spotted the DNA onto Sandia Labs letterhead and mailed it, along with a conventional letter, around the country. After the letter’s cross-country trip, the Bachands extracted the DNA from the paper, sequenced it, and decoded the message in about 24 hours at a cost of $45.