The cost of software failure can be felt in different ways: at a public company it shows up in the stock price; at a small company it can mean bankruptcy.
I often see organizations playing Russian roulette with the way they release software, gambling with customer safety, private data, and security, not to mention reliability. They are also gambling with their companies’ reputations and bottom lines. IEEE published an excellent list of common software failures a few years ago, and you can be sure that software is still failing.
What I like about this somewhat scary comparison is that I often hear people say, “That software has been out for a long time without problems” or “We do it this way all the time and it works.” That’s still a bad way to plan. A company focused on software engineering looks for ways to build and deliver better software with fewer failures. That means planning proactively and achieving success by doing the right things, even if the wrong things have worked so far.
Harvard researchers found that as many as one in two IT software projects fail. Estimates vary, and this one isn’t even the highest, but let’s run with it. That’s like playing Russian roulette with three bullets in the cylinder: a 50-50 chance of failure. I don’t like those odds, and I certainly wouldn’t bet my company’s future on them.
Let’s take a look at some of the nasty gambles people make every day when releasing software: the rounds in the revolver, if you will.
1. Known old bugs
We all know that the software we release will have bugs, because truly bug-free software would take a lifetime to build. But that’s no excuse for never fixing what we know is wrong. A lot has been said about technical debt in very abstract terms, but your list of known, unfixed bugs is a genuinely useful yardstick for measuring the debt in your software. If there’s a bug you’re not going to fix, you’d better have a good reason why it doesn’t matter. Plan time in each release not just to add new features, but to make things generally better. Take the time to polish your software.
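As a minimal sketch of what that yardstick could look like (the weighting scheme here is invented for illustration, not a standard), score each known, unfixed bug by severity and by how many releases it has survived, then track the total. If the number only ever grows, you are accumulating debt, not polishing:

```c
#include <stdio.h>

/* Illustrative only: a "bug debt" score that weights each known,
 * unfixed bug by severity and by how many releases it has survived. */
struct bug { const char *id; int severity /* 1..3 */; int releases_open; };

static const struct bug known_bugs[] = {
    {"BUG-101", 3, 4},   /* severe, open for four releases */
    {"BUG-212", 1, 9},   /* minor, but ancient             */
    {"BUG-305", 2, 1},   /* found last release             */
};

int main(void)
{
    int debt = 0;
    for (size_t i = 0; i < sizeof known_bugs / sizeof known_bugs[0]; i++) {
        const struct bug *b = &known_bugs[i];
        debt += b->severity * b->releases_open;   /* invented weighting */
    }
    /* Track this number per release; if it only grows, you never polish. */
    printf("bug debt score: %d\n", debt);
    return 0;
}
```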
2. New errors in old code
Old code is tricky. I’ve seen companies with a policy of “clean it up whenever you touch it” and others with a rule of “touch only what you must, and only if there are bugs reported from the field.” Both policies have merit, but the most important thing is to understand the risks involved when you find a new bug in old code. I worked with a hardware vendor that was struggling with how to handle the output of a new analysis tool on some old code. In their case, the tool had flagged a genuinely ambiguous scoping issue, one that made me wonder how their compiler had ever accepted such madness. And so they ran into a conflict: the new tool said the old code was broken, but their policy said not to touch old code unless there were bug reports from the field.
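Their exact code isn’t public, so here is a hypothetical reconstruction of the kind of finding involved: an inner declaration that shadows an outer one is perfectly legal C, which is why the compiler never complained, but it is exactly the sort of ambiguity a good analyzer flags (MISRA C, for one, forbids an inner-scope identifier from hiding an outer one).

```c
#include <stdio.h>

/* The inner `limit` silently shadows the file-scope one. The compiler
 * accepts this without a warning at default settings, but it is exactly
 * the kind of ambiguity a static analyzer flags: a reader, or a later
 * edit, can easily pick up the wrong `limit`. */
static int limit = 100;                 /* file-scope threshold */

int clamp(int value)
{
    if (value > 50) {
        int limit = 50;                 /* shadows the outer `limit` */
        if (value > limit) {
            value = limit;
        }
    }
    return (value > limit) ? limit : value;   /* uses the OUTER limit */
}

int main(void)
{
    printf("%d\n", clamp(75));   /* 50: clamped by the inner limit */
    printf("%d\n", clamp(30));   /* 30: untouched by either limit  */
    return 0;
}
```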
It is important to decide what you plan to do with your legacy code, while fully understanding the risks to your organization. If the code is critical, its age may not matter as much as you think. If the code is being scrapped, you may be wasting your time testing something you’re never going to fix.
3. Treating security as a testing problem, not a development one
It’s frustrating to watch organizations ignore security. In some cases, they think they can test security into their applications (they can’t), and in other cases, they think security issues won’t apply to their code (they will). To get out of this mess of constant security failures, organizations must harden their code with solid AppSec best practices, backed by static analysis tools that do more than simple flow analysis. If you don’t know where to start, honestly, you can’t go wrong by simply pulling out the MISRA rules and following them in whatever code you write, starting today.
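To give a concrete taste of what following such rules looks like (the snippet below is illustrative, not from any particular codebase), MISRA C requires, among many other things, that every switch statement end in a default clause, so unexpected values are handled deliberately rather than falling through in silence:

```c
#include <stdio.h>

enum mode { MODE_IDLE, MODE_RUN, MODE_STOP };

/* Non-compliant: if a new enumerator is added later, or a corrupted
 * value arrives, this switch falls through and does nothing, silently. */
static void handle_risky(enum mode m)
{
    switch (m) {
    case MODE_IDLE: puts("idle"); break;
    case MODE_RUN:  puts("run");  break;
    case MODE_STOP: puts("stop"); break;
    }
}

/* MISRA-style: every switch ends in a default that handles the
 * unexpected case deliberately, even if "handling" is just logging. */
static void handle_safe(enum mode m)
{
    switch (m) {
    case MODE_IDLE: puts("idle"); break;
    case MODE_RUN:  puts("run");  break;
    case MODE_STOP: puts("stop"); break;
    default:        puts("unexpected mode"); break;
    }
}

int main(void)
{
    handle_risky(MODE_RUN);
    handle_safe((enum mode)42);   /* the default clause catches this */
    return 0;
}
```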
4. Test suites that always fail and always pass
One of the most common and dangerous practices I’ve seen is having a huge suite of tests and relying on a simple pass-rate metric. Say you usually have an 80% pass rate, so you assume the release is fine. The problem is that there is no way to know whether the 80 percent passing today is the same 80 percent that passed yesterday. It’s easy for a real new failure to hide in that number, because something else got fixed and the totals even out. Keep your test suite clean, or it won’t tell you anything. I would seriously question the value of a test whose failure you think can be ignored. Why not just remove or skip that test? It’s a more honest and useful approach.
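Here is a minimal sketch, with made-up test names, of why the headline number lies: two runs can both sit at exactly 80% while failing different tests, so the useful comparison is between the sets of failures, not the percentages.

```c
#include <stdio.h>
#include <string.h>

/* A made-up result record: real runs would come from your test runner. */
struct result { const char *name; int passed; };

/* Yesterday: 4 of 5 pass (80%); "checkout" is the known failure. */
static const struct result yesterday[] = {
    {"login", 1}, {"search", 1}, {"checkout", 0}, {"refund", 1}, {"export", 1},
};

/* Today: still 4 of 5 pass (80%), but the failure has MOVED:
 * "checkout" was fixed and "refund" regressed. The headline number
 * is identical, so the regression hides. */
static const struct result today[] = {
    {"login", 1}, {"search", 1}, {"checkout", 1}, {"refund", 0}, {"export", 1},
};

#define N (sizeof today / sizeof today[0])

static int passed_in(const struct result *run, const char *name)
{
    for (size_t i = 0; i < N; i++)
        if (strcmp(run[i].name, name) == 0)
            return run[i].passed;
    return 0;
}

int main(void)
{
    int pass = 0;
    for (size_t i = 0; i < N; i++)
        pass += today[i].passed;
    printf("pass rate today: %d%%\n", (int)(100 * pass / N));

    /* The useful question: what fails today that passed yesterday? */
    for (size_t i = 0; i < N; i++)
        if (!today[i].passed && passed_in(yesterday, today[i].name))
            printf("NEW regression: %s\n", today[i].name);
    return 0;
}
```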
5. Releasing because the calendar says so
Probably the most common release criterion of all is the calendar. People chose a date, and now they’re going to release, because that date has arrived. Sure, there are external constraints that can affect your release schedule, but just because a date is up doesn’t mean it’s okay to dump bad software on your unsuspecting, soon-to-be-former customers. Release it when it’s safe/stable/ready. If the calendar is a fixed constraint, make sure your process gets you there on time.
In summary…
How many times can you pull that trigger before you pay the price? To use our Russian roulette analogy, six times at most, and maybe only once. Let’s do our best to make sure we’re delivering the best software, with the best chance of success.