Massive long-term data storage


News release in EurekAlert, Optica (10/28/21):

"High-speed laser writing method could pack 500 terabytes of data into CD-sized glass disc:  Advances make high-density, 5D optical storage practical for long-term data archiving"


Researchers developed a new fast and energy-efficient laser-writing method for producing nanostructures in silica glass. They used the method to record 6 GB of data in a one-inch silica glass sample. The four squares pictured each measure just 8.8 × 8.8 mm. They also used the laser-writing method to write the university logo and mark on the glass.


Yuhao Lei and Peter G. Kazansky, University of Southampton


Researchers have developed a fast and energy-efficient laser-writing method for producing high-density nanostructures in silica glass. These tiny structures can be used for long-term five-dimensional (5D) optical data storage that is more than 10,000 times denser than Blu-ray optical disc storage technology.

“Individuals and organizations are generating ever-larger datasets, creating the desperate need for more efficient forms of data storage with a high capacity, low energy consumption and long lifetime,” said doctoral researcher Yuhao Lei from the University of Southampton in the UK. “While cloud-based systems are designed more for temporary data, we believe that 5D data storage in glass could be useful for longer-term data storage for national archives, museums, libraries or private organizations.”

In Optica, Optica Publishing Group’s journal for high-impact research, Lei and colleagues describe their new method for writing data that encompasses two optical dimensions plus three spatial dimensions. The new approach can write at speeds of 1,000,000 voxels per second, which is equivalent to recording about 230 kilobytes of data (more than 100 pages of text) per second.
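Taken together, the two quoted rates imply how much information each voxel carries. A quick sanity check (assuming decimal units, 1 KB = 1,000 bytes):

```python
# Back-of-the-envelope check of the figures quoted above.
voxels_per_second = 1_000_000
bytes_per_second = 230 * 1_000  # "about 230 kilobytes of data per second"

# Bits of payload encoded per written voxel.
bits_per_voxel = bytes_per_second * 8 / voxels_per_second
print(f"{bits_per_voxel:.2f} bits encoded per voxel")  # → 1.84
```

Roughly two bits per voxel is consistent with each nanostructure carrying information in more than one optical property, which is where the two extra "dimensions" come from.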

“The physical mechanism we use is generic,” said Lei. “Thus, we anticipate that this energy-efficient writing method could also be used for fast nanostructuring in transparent materials for applications in 3D integrated optics and microfluidics.”…

Paper: Y. Lei, M. Sakakura, L. Wang, Y. Yu, H. Wang, G. Shayeganrad, and P. G. Kazansky, “High speed ultrafast laser anisotropic nanostructuring by energy deposition control via near-field enhancement,” Optica 8(11), 1365–1371 (2021).

Ever larger storage has been a topic of recent Language Log posts, and is always a concern for scholars and scientists working with large corpora of data.  The types of storage devices continue to proliferate and expand in capacity, enabling researchers to undertake ever more daring projects.

Does anyone remember the Bernoulli Box?  That was what made it possible for my colleague, Robert M. Hartwell (1932-1996), to store vast amounts of biographical and geographical data for his China Biographical Database.  CBDB was, and still is, an incredibly rich tool for studying all sorts of fascinating problems, including issues concerning literacy in premodern times.  Yet the storage capacity of the Bernoulli Box was minuscule in comparison with that of modern personal computers, and the technologies are completely different.

Selected readings

[Thanks to John Tkacik]


  1. Duncan said,

    October 29, 2021 @ 11:09 pm

    While the reading/writing technique may be optical, these are in practice far more comparable to hard-disk-drive technology, particularly the modern HAMR variants, which already incorporate lasers, with a current areal density of 2 Tbit/in² (from Wikipedia), 5 Tbit/in² by 2023, and a target of ~10 Tbit/in² by 2030, by which point they're projecting 100 TB hard drives, not /that/ far from the 500 TB given here.

    Meanwhile, current hard drive transfer rates are ~250 MByte/sec single-actuator and 480 MByte/sec dual-actuator, so even the single-actuator speed is more than three orders of magnitude (MByte vs KByte) faster than the article's so-called "fast" speed of 230 KByte/sec.

    At 230 KByte/sec the given transfer speed is between that of 1x and 2x CD (150 and 300 KByte/sec respectively). Out of curiosity, I did the math, and unless I screwed up the TB/GB/MB/KB conversion factors, at the given transfer rate of 230 KByte/sec we're talking nearly 70 *years* to write the full 500 TB capacity! Talk about transfers taking a lifetime!

    Finally, at that data capacity a speck of dust could easily wipe out several tens of GB worth of data, say a week's worth of transfer capacity, so they'd definitely need shielding, again more comparable to today's metal-encased hard drives than to a "bare" CD/DVD/Blu-ray disc, for exactly the same reason hard drives are so encased.

    So from the public/practical side of things these really aren't that different in storage scale from a reasonably near-term hard drive, perhaps a factor of 5 higher in capacity, while being *much* slower, by a factor of 2000 or so.
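Duncan's arithmetic can be verified in a few lines (decimal units assumed: 1 TB = 10^12 bytes, 1 MB = 10^6 bytes, 1 KB = 10^3 bytes):

```python
# Sanity-checking the figures in the comment above.
capacity_bytes = 500e12   # 500 TB claimed disc capacity
write_rate = 230e3        # 230 KByte/sec quoted write speed
hdd_single = 250e6        # ~250 MByte/sec single-actuator HDD
hdd_dual = 480e6          # ~480 MByte/sec dual-actuator HDD

# Time to fill the whole disc at the quoted write rate.
years = capacity_bytes / write_rate / (365.25 * 24 * 3600)
print(f"Time to fill the disc: ~{years:.0f} years")        # ≈ 69

# How much faster today's hard drives write.
print(f"Single-actuator HDD: ~{hdd_single / write_rate:.0f}x faster")  # ≈ 1087
print(f"Dual-actuator HDD:   ~{hdd_dual / write_rate:.0f}x faster")    # ≈ 2087
```

The "nearly 70 years" and "factor of 2000 or so" figures both hold up.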

  2. Chester Draws said,

    October 29, 2021 @ 11:50 pm

    Hard drives are faster and store a lot. They are also prone to rot over time, as well as mechanical issues.

    There is the need out there for storage that is not going to degrade over time.

  3. bks said,

    October 30, 2021 @ 7:31 am

    What's the theoretical maximum storage of scribing a clay tablet? Those seem to have staying power (and fire only improves it).

  4. Philip Taylor said,

    October 30, 2021 @ 8:18 am

    For fellow 3-D humans whose minds (like mine) were boggled by the thought of five-dimensional optical data storage, this Wikipedia article may shed some light.

  5. Stephen Hart said,

    October 30, 2021 @ 10:31 am

    Maybe I'm being math-challenged, but how do they go from 6 GB on one square inch to 500 TB on 17.5 square inches (CD size)?

  6. RP said,

    October 30, 2021 @ 2:48 pm

    Thank you @Philip Taylor for some help with what on earth 5D might be. I know of 3 spatial ones, but what is an 'optical dimension'? I read quite a lot before it was actually explained.

    TL;DR: Each storage nanostructure has a 3D position, plus an orientation and the strength of the light, to capture information. Thus one disc has several different images depending on the angle from which one views it, and on the magnification of the microscope used to view it.

    I think this might also address @Stephen Hart's question: the amount of data in a small area cannot simply be multiplied up to find how much data fits in a bigger structure. It's more complicated than that.
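RP's point can be made concrete with the numbers already quoted in the post and comments: scaling the demonstrated 6 GB per square inch by area alone falls thousands of times short of 500 TB, so the claimed capacity must also rely on the third spatial dimension (many layers in depth) and denser writing. A rough sketch, using only the figures given above (decimal units assumed):

```python
# Naive area-only extrapolation from the demonstration to a CD-sized disc.
gb_per_square_inch = 6    # demonstrated: 6 GB in a one-inch sample
cd_area_in2 = 17.5        # CD area figure quoted in the comment
claimed_tb = 500          # headline capacity claim

naive_gb = gb_per_square_inch * cd_area_in2          # GB by area scaling alone
gap = claimed_tb * 1000 / naive_gb                   # how far short that falls
print(f"Area scaling gives {naive_gb:.0f} GB; "
      f"the 500 TB claim is ~{gap:.0f}x beyond that")
```

Area scaling alone yields only about 105 GB, a factor of roughly 4,800 below the claim, which is the gap the extra depth layers and density improvements would have to close.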

  7. Bloix said,

    October 30, 2021 @ 7:49 pm

    True story. Forty years ago I was using the new state-of-the-art computer center at my little liberal arts & sciences college to type up my senior thesis in history. (I was doing it in July, when there was availability at the terminals hooked to the brand-new mainframe.) There were lots of people in there playing Adventure, and I got started on it. I never got hooked, but I did make it to the waterfall with the mountain of jewels behind it, and I got to thinking.

    Next day I said to the guy at the check-in desk, "one of these days that Adventure game will have pictures and you'll be able to move around on the screen." The guy sneered and said in a voice of utter contempt, "Do you have any idea how much memory that would take?"

  8. ~flow said,

    November 1, 2021 @ 5:07 am

    There's an adage among IT folks that goes "no one wants backups, everyone wants restores," meaning that while systems people love to philosophize about how best to store copies of user data, users couldn't care less, until something goes belly-up, and then they want their data back, as fast, convenient and up-to-date as conceivable.

    I think the takeaway is this: storing data is just one part of the equation; other factors such as durability, discoverability and readability are crucial too, in the sense that if any of these (and more) factors fails, the data will likely be lost. Take the vinyl record as an example. By today's standards it is neither a lossless medium nor one of great storage density: a plastic sheet of the same area, used as a floppy disk, could easily store thousands of times as much data. On the other hand, vinyl records are cherished and reasonably durable cultural artifacts, and ones that can be read out by very simple means: a rotating disk and a stylus (if need be, a toothpick would suffice for a crude preview). The sum of these properties means that vinyl disks, rather than CDs or HDs, might be the essential media to carry our musical heritage into the decades and maybe centuries to come.

    A great but also sad data-archiving story is the loss of the original Apollo tapes. At the time, NASA went out on a limb to make sure mankind's landing on the Moon could be received live, in moving pictures, by any household on the planet with a TV set. Specifically, they had to invent a new, efficient video format, complete with recording machines and storage media, so data could be streamed from the Moon via radio. They were so far ahead of their time that what everyone got to see in 1969 was actually filmed by television cameras pointed at the screens on display in Australia, which received the data via the Honeysuckle Creek antenna. The cameras had to film the custom-format monitors because no other data conversion was in place. As a result, the pictures that were broadcast were of considerably worse quality than the pictures that were actually received and stored on bespoke-format magnetic tape. And, as incredible as it may sound, those very tapes, the original records of mankind's first step into the great big void, got shuffled from NASA to the Smithsonian and onward, until they were lost at some unknown point in time and space. If, just IF, they should ever turn up again, our only hope of retrieving the data from those tapes rests on the shoulders of a few elderly retired men who are, with NASA's support, dedicated to keeping the single machine (!) that could play the tapes in working order. The chances are slight.

    Because of these considerations I put little hope in innovations that manage to burn patterns worth terabytes of data into teensy shards of glass. The glass is not at fault; it is maybe the most durable material we have. But the technology even just to read out the data is at present limited to a single experimental rig in a single laboratory, and hence perishable (i.e. almost sure not to exist a decade from now). Future generations, be they post-apocalyptic or even more advanced than we are now, will likely not have the immediate means at their disposal to decipher yet another hyper-dense storage format, of which there are already so many. One archivist of moving pictures once put it this way: we have no shortage of storage formats; our problem is the plethora of them.

    This is not to belittle the achievement, but just to sober enthusiasts a bit. There is the Arctic Vault project, a long-term archival effort; interestingly, they apparently use silver halide film on which QR codes are pictured, and they also include instructional material detailing how to decode those images; these are photographic pictures that can be read with nothing more than optical magnification.

    The more I come to think of it, microforms ('fiches') look more and more like a viable storage format that could help preserve knowledge into the future. This technology, like so many others, has perhaps been too readily abandoned in favor of 'digitalization' (i.e. throwing our entire society into the maw of the computer). I just had to live without electricity for a day and a half over here. You know what? You pull the plug and everything goes dark. There are one and a half gazillion components on a computer motherboard; should any one of them fail, your data is unreachable. The circuits have to dance in a finely choreographed pattern (called an OS) or your data is inaccessible. Software, like DNA, only works under very specific environmental conditions. In that sense, it is less a blueprint and more a crib sheet: a seed that needs a jungle to flower.

  9. Philip Taylor said,

    November 1, 2021 @ 7:50 am

    True poesy, ~flow. I am not normally a fan of blank verse, but I willingly make an exception in your case.
