Opinion: The Transient Nature of Digital Design (Part 2)

In the last column, we looked at the issues that time-compress and obsolete digital devices, and the ramifications for all aspects of software development. Now we need to look at the transition to actual working hardware, issues with fielded systems, and where the system problems and solutions lie.

Making Hardware

In the aviation world, avionics products have to be created and tested in a certification environment that has many interlocking layers. First came RTCA/DO-160E for environmental conditions, and now there are RTCA/DO-178B/C for software and RTCA/DO-254 for programmable hardware. The product itself also has to conform to a specific TSO category, with functionality testing in accordance with the matching RTCA/DO standards. This is a serious certification and design burden, and has to be balanced against the time pressure the digital elements of a design create. It is not unheard of for new digital parts to vanish before certification is completed. Today, IC viability and volume are no longer driven by desktop computers, but by very short-lived cell phone iterations.

One factor that has to be understood is that software is by its nature temporary. To date, no technology exists that allows permanent software installation or even digital data storage other than hard-wired connections. Every technique has a time window (and error rate) attached, and while it may be getting longer, it is always transient. Even IF we had some magical storage concept that did not deteriorate, the surrounding environment does not stay stable enough to make use of it effectively long term. Keep these factors in mind in all your digital planning.

Let’s assume everything falls into place to deliver a workable and fully functional digital design. After all, this happens every day, although rarely on time or budget. How will we then get the software into the digital product? Many end products use a mechanical hard drive or equivalent solid state drive in the target machine. But what is the lifespan of this drive, and thus the lifespan of the software and the system itself?

Storage Methods

Today, it is quite difficult to buy a hard drive with more than a three-year warranty, and the overwhelming majority have only a one-year guaranteed lifespan. Using this as a yardstick, let’s say the operating range is 1 to 3 years. Five years is about average for low-use items using high quality drives, but life can be frighteningly short for expensive systems, especially those which are critical and always-on. Even looking at the newest high density Samsung 4TB solid state drives just released, the warranty is still only 3 years, and that is for the highest grade server drives only. Let’s look at Air Traffic Control, for example, or ADS-B ground stations, national power grids, hospital records and nuclear power plants. What’s the acceptable life span there, and what happens when the drive inevitably fails?

Chip Based Storage & UV Technology

The software might also reside in some kind of embedded memory or firmware, which can be EPROMs, mask programmed ROMs, EEPROMs/NVROMs, Flash memory, or be in programmed logic devices such as PALs, GALs or gate arrays of some kind, often OTP (One Time Programmable). Here we have a real problem with hard (guaranteed) storage times: almost everything is “typical”, usually 25 years for UV (ultra-violet light) erasable EPROMs. Anecdotal evidence sometimes reports lifespans of 5 years or less, but my own personal experience with many high end test equipment systems is that the 25-year UV EPROM threshold has proven largely accurate, as systems are now regularly failing in the 25+ year window. I have an HP 8573A network analyzer on my bench right now with that very problem.

Firmware

This firmware volatility problem has even created individuals and user groups dedicated to preserving critical firmware repositories to keep valuable test gear alive when the problem inevitably occurs.

In addition, vendors like Atmel (now Microchip) have extensively documented that the specific programming methodology and algorithm has a significant effect on programmed device lifespan. This part and algorithm dependence means that the $50 Chinese universal USB programmer you bought off eBay may not really do a very good job making high quality, long life production parts for you on every device type and vendor.

Importantly, device programming of this type is based on stored, trapped charge on a microscopic piece of semiconductor material. Physics dictates that this means high temperatures, radiation, transients, static discharge and leakage effects all actively conspire to shorten or corrupt stored code life once parts have been programmed.

The Post-UV Era

Once chip-based memory storage technology did not require UV for erasure, it allowed EEPROMs (Electrically Erasable Programmable Read Only Memories) to appear. This finally let designers avoid the horrors of failure-prone battery backed up RAM (Random Access Memory) to store and change important long term variable data (like calibration settings and the computer setups in almost every desktop computer on earth). These electrically erasable memory parts allowed data and programs to be quickly updated in place, on the fly, which was a huge benefit to hardware designers as well as careless software coders.

These parts had some bad traits initially, and had low erase/write cycle limits (10,000) and short storage (10 years), and if not well designed in circuit, could accidentally erase data during power up/down cycles or through bad operating and programming techniques. While non-volatile parts have matured to longer storage and cycle counts, keep in mind existing parts already in place over the years have not magically self-improved. Current life claims for some EEPROM devices are up to 100 years and a million cycles, but to some degree, those numbers have to be simulated, and are perhaps optimistic.
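To put the cycle limits above in perspective, a rough back-of-envelope estimate simply divides the rated erase/write cycles by how often a given location is rewritten. The sketch below is only illustrative: the function name and the once-an-hour write rate are assumptions made for the example, and it ignores wear leveling, temperature and retention effects.

```python
def years_until_wearout(rated_cycles: int, writes_per_day: float) -> float:
    """Estimate years before one memory location reaches its rated erase/write limit."""
    if writes_per_day <= 0:
        raise ValueError("writes_per_day must be positive")
    days = rated_cycles / writes_per_day
    return days / 365.25

# Assumed workload: the same location rewritten once an hour (24 writes per day).
early_part = years_until_wearout(10_000, 24)       # early EEPROM, 10,000 cycles
modern_part = years_until_wearout(1_000_000, 24)   # current claim, one million cycles

print(f"10,000-cycle part:    {early_part:.1f} years")
print(f"1,000,000-cycle part: {modern_part:.1f} years")
```

Under that assumed workload, the early part reaches its rated limit in a little over a year. A design that logs data frequently to a single location can therefore wear out long before the storage retention figure ever matters.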

Optional Memory

This non-volatile storage was eventually married to a microprocessor, to create turnkey system parts where a single chip could contain the program and the means to execute it, as well as system I/O capability. Many Intel 8080 based parts existed using this concept, from several vendors. Microchip was also a pioneer in this embedded processor area, and offered very long guaranteed data retention times (even early parts claimed 40+ years), low failure rates, long term availability and a standardized development environment for embedded systems. They remain a top pick for small embedded controllers, with very long manufacturing availability. I have used them to solve some very difficult design problems with excellent results.

When Bits Go Bad

The key issue here becomes code volatility in place. How long will it last, and what will occur when it becomes corrupted? In the case of programmable logic, the issue will be that the very nature of the logic will change, with significant results of totally unknown impact. Program data storage loss can range from a single unimportant display pixel to catastrophic code failure. The key takeaway here is that programmed code corruption in devices is a statistical certainty; the only argument is over the elapsed time it will take. No technology currently exists for permanent, incorruptible code storage.

This may seem like a foolish consideration, but applications from spaceflight to critical civil infrastructure can be put at serious risk because of this problem.

I have heard many people quickly respond that this is not really a problem (presumably they don’t see themselves in-flight somewhere when this happens). After all, it’s going to last a long time, and even if it doesn’t, the code can simply be re-programmed. Well, is this actually so in practical terms? Where will this code repository be? How will it be stored, and does the platform to use and load it even exist down the road as a service technique? To be candid, if you are designing a consumer cell phone with an 18-month service life, maybe these issues really don’t apply, but in aviation and many other areas, they are fundamental.

Archival and Enterprise Storage

One aspect that has overwhelmed many companies is the shift in the 1980s to CAD (Computer Aided Design) as the primary design tool. This software is used for schematics, interconnect drawings, circuit boards, physical parts and cases, drawings and 2D and 3D views. In addition, almost all documentation is done via word processing and graphical design tools. In essence, all design is now digital data, and thus highly volatile and transient as a result.

Every program change or update (with its compatibility problems), and every operating system and hardware upgrade brings with it a host of problems and a loss of user productivity to the existing tools. I know of no company that has escaped these problems, so you can be certain it will visit you. As a final bonus, it also makes the entire data storage process highly volatile and vulnerable to loss, just like any other software. Thus, the issue of robust data storage now affects every aspect of a company’s activities.

Today, data file storage is often done via methods like recordable DVD or CD-ROM media, sometimes hot back-up in secondary systems, or thumb drives (USB flash keys). The use of floppy disks, tape drives or cartridges, magneto-optical drives, punched cards or tape, and removable disk cartridges has pretty much fallen out of favor, although large format tape still remains a trusted archival standby at banks and NASA. You might very well already have old critical IP in a now obsolete format.

Many people with programmable physical parts make up a set of master chips, carefully packed in anti-static foam, and secured safely (and well identified) with a hash total sticker for checking. This allows fresh parts to be easily copied in a gang programmer without any interim computer step, and this avoids many awkward problems of file creation, checking and transfer. This is a solid technique until the hash total does not work (probably due to static handling issues), or the master parts are lost, foolishly consumed or carelessly thrown out.
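As a rough illustration of the hash total check described above, the sketch below sums every byte of an image and keeps the low 16 bits, which is the kind of simple additive checksum many device programmers display. The exact algorithm behind any particular sticker is an assumption here, and the function names and the test image are hypothetical.

```python
import hashlib

def additive_checksum(image: bytes, bits: int = 16) -> int:
    """Sum all bytes and keep only the low `bits` bits."""
    return sum(image) & ((1 << bits) - 1)

def matches_sticker(image: bytes, sticker_hex: str) -> bool:
    """Compare a freshly read image against the hex value written on the label."""
    return additive_checksum(image) == int(sticker_hex, 16)

# Hypothetical 8 KB master image; real data would be read back from the master part.
master = bytes(range(256)) * 32
sticker = f"{additive_checksum(master):04X}"

# Simulate a single corrupted bit in a copy of the master.
damaged = bytearray(master)
damaged[100] ^= 0x01

print("sticker value:", sticker)
print("master ok:    ", matches_sticker(master, sticker))
print("damaged ok:   ", matches_sticker(bytes(damaged), sticker))
# A stronger digest can be recorded alongside the short checksum.
print("sha256 prefix:", hashlib.sha256(master).hexdigest()[:16])
```

A plain byte sum is easy to write on a label, but it cannot detect every fault (two swapped bytes give the same total, for example), so keeping a stronger digest such as SHA-256 in the paper records next to the sticker value is a cheap extra safeguard.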

At that moment, re-creating the programmable parts to serve as masters can pose an incredible physical hurdle to overcome. The tool chain to make (or modify) programmable code is formidable, especially once the host machine, its software configuration, and interface ports to the programmer are no longer handily available. I have seen this problem endlessly repeated in every company, so you clearly need to plan for it. Keep in mind, the supply of the programmable parts you intend to use is also not infinite, and once gone, it makes no difference if the files still exist.

Storing back-ups on removable media or a server can be a viable technique, but among the problems that can crop up are accidental deletion, a loss of the ability to transfer code to a suitable programmer, deterioration of stored files, forgotten encryption keys and media that can no longer be read. Files may also require programs or environments that are no longer available, making them essentially useless. This is the inevitable problem of time in a digital environment.

How long does good quality removable back-up media (and thus your company’s entire IP investment) last? Sadly, you may own files that are already unrecoverable, because they were made on early CD-recordable media or players, which were fairly problematic. There were many reports of storage life as short as 3-5 years, and many disks damaged by heat or humidity, thanks to the light-sensitive organic dyes used in the media. Read-Write or re-writable (-RW, +RW) media has proven especially problematic, and is not a good choice for archival storage. Archival recordings should also always be made at low speed, for the best possible media retention, and checked annually.
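One way to make that annual check more than a visual inspection is to record a digest for every archived file when the media is written, then recompute and compare at each review. The following is a minimal sketch using SHA-256 from the Python standard library; the function names, the file names and the temporary directory standing in for the archive are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large archives do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's relative path to its digest at archive time."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return a list of files that are missing or no longer match the manifest."""
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {name}")
    return problems

# Demonstration on a throwaway directory standing in for the archive media.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "firmware.hex").write_bytes(b":00000001FF\n")
    (archive / "schematic.txt").write_text("rev A\n")
    manifest = build_manifest(archive)
    manifest_json = json.dumps(manifest, indent=2)  # text form, to be kept separately

    clean = verify(archive, manifest)
    (archive / "schematic.txt").write_text("rev B\n")  # simulate a silent change
    after_damage = verify(archive, manifest)

print(clean)
print(after_damage)
```

The manifest only detects damage; it cannot repair it, so it is most useful when a second copy of the archive exists to restore from.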

We do have a fairly good study made by the Library of Congress, which shows that reliable simulated CD-R and DVD-R storage of at least 30 years (using current high quality media and drives) was achieved at 25°C and 50% RH if safely stored. See: https://www.loc.gov/preservation/scientists/projects/cd-r_dvd-r_rw_longevity.html

They noted a high degree of variability, and that CD storage was more robust than DVD storage, but also had to add that there is no guarantee that compatible DRIVES will still be working 30 years later, which is a very realistic concern.

CD-ROM

A company called M-disc (http://www.mdisc.com/) has now created what they claim is 1,000 year recordable DVD and Blu-Ray media. I have not used them, and I think the estimated time may be a bit optimistic, but it is a practical alternative to consider. Oddly, none of the info or technology links currently work on their website. You can see a 2015 review here, which includes some very useful DoD test data: https://www.pcworld.com/article/2933478/storage/m-disc-optical-media-reviewed-your-data-good-for-a-thousand-years.html

Drive compatibility for writing M-discs can be a problem, but readability is generally regarded as universal.

USB keys or thumb-drives have seen widespread use as temporary data storage and as a transfer medium for large files, as 16/32GB sizes are now cheap and common. These flash-memory based devices can give good performance, but can be made unreadable by incorrect insertion or removal, and can be damaged by static and high heat.

I have recovered files from up to 5 years ago with no problem, but because I have had so many unexplained drive failures, I am reluctant to recommend them as a long term storage technique. These devices may also contain malware which is not visible but launches on insertion, and can cause considerable system damage when deployed.

What is the undisputed longest lasting archival medium in use today? As it turns out, it’s paper. Lifespan of 100-500 years is easily achieved, and no special device (other than a desk lamp) or software is required to read it. I always make sure I have paper back up as well as any electronic storage, and it has saved me many times.

Work Environment Techniques

One technique you can use to make this digital design and production process less painful is to make your workstation more flexible by installing VMware and adding any physical ports you need (serial and/or parallel) for device programming. You can then add the operating systems and other code you need to maintain your earlier development work. You can also use dual boot techniques to keep multiple operating systems and their code bases alive, but be aware that damage to the bootloader can cause loss of all environments. Using a high density removable hard drive (like a WD Passport) to do periodic back up can be very effective, and volumes of 1TB or larger are inexpensive at roughly $100.

Today, I use Linux as my base platform (Ubuntu), and keep Windows XP and 7 alive in their own VMware spaces. I don’t do this for Windows 10; instead, I have a drive switch to toggle to another clean bootable drive for 10, as it has had way too many problems to make it any part of my daily work flow. This strategy allows me to live safely on the web in Linux for mail, browsing, word processing and design, and keep my vulnerable Windows sections and their apps in safe containers. I can still reach far back to look at older work when needed, but run on a much safer platform for daily work.

While we used to think of sudden computer and software loss as some kind of a wildly improbable situation like a nuclear EMP event, it is important to recognize that in today’s world, it can happen through a single mouse click. Just plugging in an unknown USB device, use of an infected device or drive from home, visiting the wrong web site or clicking on a malicious email attachment can trigger a network event that can result in total system-wide data loss.

Taiwan Semiconductor (TSMC, one of the world’s biggest silicon foundries, and a supplier to AMD, Apple and NVIDIA) was shut down at the beginning of August 2018 by an unintentional virus infection during an update. The same thing can happen anywhere, to anyone, and on any scale. It is the volatile nature of ubiquitous digital data in a connected environment that now makes this significant risk possible.
