Grab a drink and sit back. It’s time for my favorite bug story.
It was my first IT-related job: a summer internship in software development at a major manufacturer of medical devices. The devices were mostly anesthesia delivery systems and patient monitors, the boxes that beep next to bedridden patients and graphically display their pulse, blood pressure, breathing and so on. If the ECG turns into a flat line, a nurse is summoned immediately. The office was filled with 2-meter canisters of nitrous oxide, embedded-systems gurus with epic beards, and a room full of people preparing the documents needed to get the various devices approved by the FDA. From time to time there were whispered references to a bug that had slipped through testing a decade earlier and caused an anesthesia machine to restart in the middle of a procedure. Needless to say, for a teenage novice like me, all the production systems were off limits.
(Bole Online note: nitrous oxide, also known as laughing gas, is a colorless gas with a slightly sweet taste and the chemical formula N₂O. It is an oxidizer: under certain conditions it supports combustion, because at high temperatures it decomposes into nitrogen and oxygen, but it is stable at room temperature. It has a mild anesthetic effect and can make people laugh.)
They did, however, give me an enviable job: testing a prototype project that sounded very fashionable in 1997. A server written in C++ listened to the serial port of a patient monitor, forwarded the events that needed attention to a SQL Server database, and then sent the data to a Java applet via CORBA, so that doctors or other staff could see the patient's status over the Internet. You could watch real-time data as well as browse historical records. Cool! The only problem was that I didn't know any of these languages or systems at the time.
The next few weeks were a bit of a slog, with most of the time spent reading the troublesome Visibroker ORB manual and chasing mundane conversion bugs, but I finally got my “Simpsons” system stumbling along, with “Homer” (the Simpsons dad) recording and serving the data and “Bart” displaying it. By then I had concluded that CORBA was ridiculously complex, AWT drove me out of my mind (GridBagLayout, for example), and applets ran at a snail's pace, but Java still looked like a decent language. There was one snafu: the C++ server would crash every now and then, and I set out to figure out why.
Because the monitor I was listening to was in another room, I did most of my development and testing with its “demo” mode, which simulates a cardiac arrest in a loop, and as far as I could tell my server never went down in that mode. It did crash when I or someone else operated the controls by hand, especially on a real machine, but I couldn't find a way to reproduce it reliably, no matter how hard I tried. I logged every event to disk to find out exactly what happened before a crash, and then carefully repeated each event by hand in exactly the same sequence (e.g., set the filter to X, turn the control knob three ticks to the right, click the button…). I ran back and forth between the two rooms (because I couldn't see the logs on my computer while I was fiddling with the patient monitor) but never managed to recreate the crash. Whatever the “ghost event” was (as I came to call it), it had to be causing the crash while evading all the logs. Was some serial I/O or hardware problem interrupting the events? Were cosmic rays flipping data bits on my PC?
I spent entire days trying to recreate the error, with no results. After a few weeks of frustration, I finally resorted to sprinkling printf statements over every step between receiving an event from the serial port and writing it to the database. In the process I reviewed every line of code, and at last the light began to dawn.
When I created the database schema, I had made a common rookie mistake in the name of saving space: I used the timestamp as the primary key. So if two events occurred within the same millisecond, the database would throw a primary key uniqueness violation. I had noticed this before, but I thought it was very rare and would only happen in non-critical situations (such as fiddling with a monitor's internal configuration), so I simply added a catch statement, wrote a warning message to the log, and carried on.
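To make the collision concrete, here is a minimal sketch of what goes wrong with a timestamp key. It is not the original code: `EventTable`, `same_millisecond_collides`, the event names, and the timestamp value are all made up for illustration, and a `std::map` stands in for the real database table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a table whose primary key is a millisecond timestamp.
class EventTable {
public:
    // Throws if the key already exists, mimicking a uniqueness violation.
    void insert(std::int64_t timestamp_ms, const std::string& event) {
        auto result = rows_.emplace(timestamp_ms, event);
        if (!result.second) {
            throw std::runtime_error("primary key violation");
        }
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::map<std::int64_t, std::string> rows_;
};

// Returns true if storing two events with the same millisecond timestamp
// triggers the violation. The timestamp value is arbitrary.
bool same_millisecond_collides() {
    EventTable table;
    table.insert(869135927123LL, "filter set to X");
    try {
        table.insert(869135927123LL, "knob turned");
    } catch (const std::runtime_error&) {
        return true;  // the second event was rejected, not stored
    }
    return false;
}
```

Note that the second event is lost entirely in this sketch; catching the exception and logging a warning only hides that.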
But! This was old-school code, and the logging was written C-style: it formatted each log line into an 80-character buffer. The uniqueness exception message itself was a constant, and the log timestamp was formatted with the full English name of the weekday (%A), so the output looked something like “Monday, July 17, 1997, 10:38:47.123.” And the English names of the days of the week have an interesting property:
Day | Word length |
---|---|
Sunday | 6 |
Monday | 6 |
Friday | 6 |
Tuesday | 7 |
Thursday | 8 |
Saturday | 8 |
Wednesday | 9 |
See? On Wednesday, and only on Wednesday, if someone performed a particular manual operation in the monitor's configuration and two events arrived within the same millisecond, the database threw its exception, and the resulting log message, including the terminator at the end of the string, came to exactly 81 characters. That overflowed the 80-character buffer and crashed the program!
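The arithmetic is easy to check with a small sketch. The real warning text is not known, so `kWarning` below is a made-up 41-character message chosen so that the numbers match the story, and the date and time are held constant so that only the weekday length varies. `bytes_needed` and `overflows` are hypothetical helper names; the sketch only measures the line with `snprintf` and never performs the unsafe write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical constant warning text; the real message is not given in the story.
static const char kWarning[] = "WARN: duplicate primary key, row skipped.";

// Size of the fixed log buffer described in the story.
static const std::size_t kLogBufferSize = 80;

// Bytes needed to hold the formatted line, including the terminating '\0'.
// Calling snprintf with a null buffer and size 0 only measures the output.
std::size_t bytes_needed(const char* weekday) {
    int chars = std::snprintf(nullptr, 0, "%s, July 17, 1997, 10:38:47.123 %s",
                              weekday, kWarning);
    return static_cast<std::size_t>(chars) + 1;
}

// True if an unchecked sprintf into the 80-byte buffer would write past its end.
bool overflows(const char* weekday) {
    return bytes_needed(weekday) > kLogBufferSize;
}
```

With these assumptions, the eight-letter days need exactly 80 bytes and just fit, while Wednesday needs 81, one byte too many.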
After that, I made sure to use a dedicated, auto-incrementing integer ID as the primary key for every database table I needed, and to write all log dates in ISO format (YYYY-MM-DD) instead of spelling out the day of the week. Over the years, I've learned that no matter how random and unpredictable a bug may seem, there's always a logical explanation if you dig deep enough, and that the seemingly “random” errors are almost always your own fucking fault.
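Both fixes can be sketched in a few lines. Again, this is an illustration under assumptions rather than the code I actually wrote: `EventLog` and `format_log_line` are hypothetical names, the time-of-day layout after the ISO date is my own choice, and using `snprintf` with the buffer size is an extra safeguard the story does not mention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <ctime>
#include <map>
#include <string>

// Hypothetical table keyed by an incrementing integer instead of the timestamp.
class EventLog {
public:
    // Returns the newly assigned ID; equal timestamps no longer collide.
    std::int64_t insert(std::int64_t timestamp_ms, const std::string& event) {
        std::int64_t id = next_id_++;
        rows_[id] = Row{timestamp_ms, event};
        return id;
    }

    std::size_t size() const { return rows_.size(); }

private:
    struct Row {
        std::int64_t timestamp_ms;
        std::string event;
    };
    std::int64_t next_id_ = 1;
    std::map<std::int64_t, Row> rows_;
};

// Writes "YYYY-MM-DD hh:mm:ss.mmm message" into buf. The timestamp is always
// 23 characters wide, whatever the day. snprintf never writes more than
// `size` bytes; the function returns false if the line had to be truncated.
bool format_log_line(char* buf, std::size_t size, const std::tm& t,
                     int millis, const char* message) {
    int n = std::snprintf(buf, size, "%04d-%02d-%02d %02d:%02d:%02d.%03d %s",
                          t.tm_year + 1900, t.tm_mon + 1, t.tm_mday,
                          t.tm_hour, t.tm_min, t.tm_sec, millis, message);
    return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

The point of the design is that the line length no longer depends on the calendar, and even if a message is too long, the worst case is a truncated log line rather than a crash.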
At one point in his programming career, however, Dave Baggett was tripped up by quantum mechanics.