Building Quality Software, Part 1
Good enough is better than perfect
Have you read the previous blog yet? Writing Quality Code, Part 2
“I hear what you’re saying,” the project manager says to the team. “But let’s see what Alex thinks.”
It’s been a few days since Alex joined the Public API team. Splitting work between two different teams is difficult, but it’s clear that both teams need help. The Public API team is not doing well. Besides being short on team members, the developers and the project manager are in constant disagreement. The developers feel the team is pushed to focus too much on new features and other user requests, while the project manager feels the team is working too slowly.
Alex thinks a bit before responding to the project manager’s prompt. “If I have heard the arguments correctly, there is a well-filled backlog of potential new user features and customer requests. But you, as project manager, don’t think the team is working fast enough and we’re losing money because of the high lead time. And you, as developers, are saying that increasing the speed will decrease the quality of the software we build. The application health will suffer if we build new features in a rush.”
“Correct,” one of the developers says. “However, I would stress that we’ve already done quite a lot of damage to the application’s health with quick and shoddy jobs. It is far – FAR – from perfect. We need to refactor a lot before we should even be thinking about adding new features.”
The project manager sighs. “You always say that. But management wants our education partner to connect to our e-courses. That would be a huge source of revenue. Surely that can’t be so hard?”
Alex politely coughs. “What I am hearing here is people trying to communicate in two different languages. There’s no way to find agreement if we’re not all sure what we’re even talking about.”
“I think we understand each other fine,” the project manager says.
“Yeah, we just don’t agree,” one of the developers adds.
Alex shrugs. “Humor me. Hm… Do we have a flip chart in here?”
Why I value quality software over quality code
In the last two entries in this series, I argued that code styles, code standards, and even code quality don’t matter as much as others claim. I may also have led you to believe that all I care about is processes, documentation, and automated tests. I got a few responses that claimed I sound more like a business analyst or a test engineer than a software engineer. And that’s not an unfair characterization, nor does it seem to have been intended as an insult.
But here’s the thing: as a software engineer, I care about quality software. I only care about quality code insofar as it helps create quality software. And in my experience, code quality isn’t a hard prerequisite for software quality. Sometimes it helps, sometimes it makes no difference, and sometimes, blind focus on code quality will be detrimental to overall software quality.
So, in this entry, I will try to explain why software engineers have so many more important things to worry about than stuff like the “one dot per line” rule, or the never-ending discussion regarding “naked ifs”. With code quality, the only people in the discussion are those who develop or test the code. With software quality, the goal is a discussion in which everyone involved in the software development process understands each other. And to that end, we need two important things: a shared language and shared goals. Those goals need to be measurable, and they need to be measured. We need software quality metrics.
Correctness in Software Engineering
The first software quality metric is correctness. If you’ll excuse the generalization, software engineers tend to be perfectionists. We like things to be just right. If people will let us, we spend hours, days, and even weeks tweaking and refining our code until it's as close to perfect as we can make it. I find this is true for my recreational personal software projects, where most of the fun is in getting stuff just right.
This willingness to tweak helps software engineers create software with a high degree of correctness. Correctness refers to the extent to which software behaves as intended. In simple terms, a correct system performs its functions without bugs or errors. Correctness is one of the first measures of quality developers learn, whether formally or intuitively.
The unit of measure for correctness is defect density (DD).
DD = (number of defects / number of lines of code) * 1000
A lower DD indicates higher correctness. By tracking the number of defects over time, it is even possible to estimate the number of undiscovered defects still present in a piece of software. Note that refactoring, especially the kind of refactoring where an engineer decreases the amount of code, might increase the DD. Conversely, you could decrease the DD by padding your code with useless lines or refactoring it into more lines. Firstly, don’t do that. Secondly, unless your system is really small or the number of defects is absurdly high, that approach won’t help all that much.
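To make the formula concrete, here is a minimal sketch of the DD calculation. All numbers are hypothetical, purely for illustration:

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Defect density (DD): defects per 1000 lines of code (KLOC)."""
    return defects / lines_of_code * 1000

# A hypothetical app with 120 known defects across 40,000 lines of code:
dd = defect_density(120, 40_000)
print(dd)  # 3.0 defects per KLOC
```

Note how the denominator illustrates the caveat above: deleting 10,000 dead lines from this codebase without touching a single defect would push the DD from 3.0 to 4.0.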
Now, it is very appealing to declare that correctness should be absolute and that the defect density should be 0. However, that is not only nearly impossible, it’s also an utter waste of time. Why? Because sometimes, lower quality means more value. That might sound counterintuitive, so let’s look at an example. Say we're building an app that currently has a DD of 5 (meaning 5 defects per 1000 lines of code). We've got two options: spend a lot of time refactoring and fixing the app until we reach a DD of 0, or spend much less time bringing the DD down to a level we find acceptable (for example, 2, or 4) and go live with some errors still present. Which one do you think will be more valuable? Earlier access for the users will usually make option 2 the correct choice, unless the defect density is so astronomical that users reject the software out of frustration or mistrust.
There are two reasons for this. Firstly, the task of decreasing DD gets exponentially harder the closer you get to 0. Secondly, there is another important metric at play: time to market (TTM). The sooner your app, or a new feature in your app, is available, the more value it creates.
So, if 0 is not the target, what should be the target DD for an application? You’ll find a lot of sources claiming 1 defect per 1000 lines of code is a “good” DD, but I’d like to counter that number with an age-old software engineering idiom. It depends. There is no such thing as a universally accepted target number for DD. Sometimes, high quality is essential. For example, if you're building software for the medical industry, you can't afford to have bugs or glitches. Other times, for example, when you’re building a game, optimizing for a very low defect density may prove fatal as your increased development time means newer engines hit the market and your game is outdated even before you’ve finished it.
In the end, you need to discuss the target correctness with your team. Luckily, correctness is something everyone – including project managers, product owners, and even users – can understand. And for many industries, a basic, functional product is all that's needed to turn a profit.
Reliability in Software Engineering
Next up, we have the second software quality metric: reliability. Reliability is a critical factor in software engineering. It can be defined as the ability of a system to perform its intended function under specific conditions for a specific period of time. In other words, reliability is the measure of the system’s ability to consistently perform without failing, whereas correctness is the system’s relative number of defects.
Reliability is particularly important in applications that handle sensitive or mission-critical data, such as financial transactions, healthcare records, or military operations. Like with correctness, you’ll need to understand the required level of reliability for a given application. This can vary depending on the industry, the application's purpose, and the consequences of failure. In some cases, the business may require high reliability, such as in medical equipment or aviation systems. In other cases, the business may be willing to compromise on reliability to save costs, such as in a consumer mobile app. Regardless, it is the software engineer's responsibility to ensure that the application meets the required level of reliability.
One way to increase reliability is to adopt a DevOps mindset. DevOps is a set of practices that combines software development (Dev) and IT operations (Ops) to improve the quality and speed of software delivery. By implementing DevOps practices, software engineers can automate the deployment process, which reduces the chance of human error and ensures that the software is deployed consistently.
In addition, software engineers can use monitoring tools to detect failures and help with repairs. Monitoring involves tracking various performance metrics, such as response time, CPU usage, and memory usage, to detect anomalies that could indicate a failure. By detecting failures early, engineers can take action to fix the issue before it becomes a major problem.
When it comes to measuring reliability, two important metrics are Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR). MTBF is the average time between failures of a system, while MTTR is the average time it takes to repair a system after a failure. A high MTBF and a low MTTR are preferred. In my eyes, it is a core task for a software engineer to discuss the reliability metrics with their team and the business. By doing so, everyone can understand the level of reliability that is expected and work together to achieve it.
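As a rough sketch of how these two metrics fall out of an incident log, consider the following. The log entries are made up for illustration; a real system would pull them from monitoring or incident-management tooling:

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (failure detected, service restored) pairs.
incidents = [
    (datetime(2023, 1, 3, 14, 0), datetime(2023, 1, 3, 14, 30)),
    (datetime(2023, 2, 10, 9, 0), datetime(2023, 2, 10, 10, 0)),
    (datetime(2023, 3, 21, 22, 0), datetime(2023, 3, 21, 22, 45)),
]

def mttr(log) -> timedelta:
    """Mean Time To Repair: average duration from failure to restoration."""
    repairs = [restored - failed for failed, restored in log]
    return sum(repairs, timedelta()) / len(repairs)

def mtbf(log) -> timedelta:
    """Mean Time Between Failures: average uptime between incidents."""
    gaps = [log[i + 1][0] - log[i][1] for i in range(len(log) - 1)]
    return sum(gaps, timedelta()) / len(gaps)

print(mttr(incidents))  # 0:45:00 – three repairs of 30, 60, and 45 minutes
print(mtbf(incidents))  # average uptime between restorations and next failures
```

With this log, the MTBF is measured in weeks while the MTTR is measured in minutes, which is exactly the shape you want: a high MTBF and a low MTTR.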
By now, I hope I’ve explained why you should be setting targets for correctness and reliability. One of the most critical reasons why those targets cannot be set at perfection – besides of course those that I’ve mentioned earlier – is costs. Everything costs money, and most of us are writing software intended to make money. Even if you happen to be working for a non-profit organization or building an application for fun, it is still a goal to minimize the costs involved.
Now, we expect the business to understand correctness and reliability, right? So, I think it is only fair that we expect ourselves, as software engineers, to understand the financial side of things. That means a software engineer doesn’t tune out when people are discussing budgets, costs, and return on investment.
ROI = profit earned on an investment / cost of the investment
Note that during the software development phase, the ROI of a feature is, of course, still an estimate.
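The calculation itself is trivial, which is part of the point: the hard part is the estimating, not the math. A minimal sketch, with made-up numbers loosely inspired by the education-partner feature from the story:

```python
def roi(profit: float, cost: float) -> float:
    """Return on investment as a ratio; multiply by 100 for a percentage."""
    return profit / cost

# Hypothetical estimate: 150,000 in expected profit from the new
# integration, against 60,000 in development cost.
print(roi(150_000, 60_000))  # 2.5, i.e. 250%
```

A ratio above 1 means the investment is expected to pay for itself; the real discussion with the business is about how confident everyone is in the two inputs.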
Software engineering is a business, and it’s essential to consider ROI when developing software. That consideration is usually the responsibility of other people in the company, but as a software engineer, you need to be able to connect with those people, understand where they’re coming from, and translate the software concerns into terms of costs and profitability.
Taking ROI into account helps to determine the level of quality your software needs to achieve. Lower-quality software can sometimes be more profitable than high-quality software, depending on the market and customer expectations. For example, if a company is developing a software product for a niche market with very few competitors, it may be able to get away with lower-quality software because customers have few alternatives. On the other hand, if a company is developing software for a highly competitive market, it may need to invest in higher-quality software to differentiate itself from the competition. However, even in this case, there is a limit to how much investment is worth it.
It’s essential to find the sweet spot between quality and cost that maximizes profits. My suggestion for a way to strike that perfect balance is to focus on the lowest possible cost while still meeting the minimum requirements for correctness and reliability. “Minimum” sounds like an ugly word here, but let’s rephrase it: we’re optimizing ROI while maintaining the required correctness and reliability.
Taking on debt
Sometimes, focusing on ROI means we choose to take on technical debt. Technical debt is not (necessarily) something that the business forces on us. Like any form of debt, it is a way to spend what we don’t have right now. In this case, we’re exchanging software quality for an expected higher ROI. When done well, we expect technical debt to help us find the sweet spot. When done badly, the interest on the debt will only serve to increase costs. In other words, accruing technical debt is a strategy that you can utilize to reach a target. A complex strategy, sure. A strategy you, as a software engineer, should understand. Far too often, software engineers will say that “all technical debt is bad”, and to that, I say: technical debt is a necessity in the real world. Yes, you’ll probably need a repayment plan. But if you take on debt, you can increase the investment, and when done properly, that will increase the return as well.
Some metrics that indirectly measure technical debt are Cycle Time (the amount of time it takes from a commit to its deployment) and Bug Ratio (the number of open bugs divided by the total number of open and closed bugs). For both, lower numbers are better. But the best metric for technical debt, in my eyes, is the Technical Debt Ratio (TDR). You can measure TDR in either time or monetary value, which allows easy comparison to other metrics that are expressed in either time or money.
TDR = (Remediation Cost / Development Cost) × 100
Here, "Remediation Cost" refers to the cost required to fix or address the technical debt in a software system. This may include the cost of identifying and prioritizing technical debt, refactoring code, or updating dependencies. "Development Cost" refers to the cost required to develop or maintain the software system without considering technical debt. This may include the cost of developing new features, fixing bugs, or performing routine maintenance tasks.
The technical debt ratio formula expresses the percentage of remediation cost relative to the development cost, indicating the level of technical debt within a software system. A higher ratio means that a larger proportion of the development cost is required to address technical debt, which may indicate a higher level of technical debt in the system. Manual estimates can be inconsistent and subjective, so they are less trustworthy from a business perspective. Luckily, you can also automate this estimation with certain tooling, like SonarQube.
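As a quick sketch of the two ratios defined above, with all figures hypothetical (hours of estimated work for TDR, bug counts from a tracker for the Bug Ratio):

```python
def technical_debt_ratio(remediation_cost: float, development_cost: float) -> float:
    """TDR = (remediation cost / development cost) * 100, as a percentage."""
    return remediation_cost / development_cost * 100

def bug_ratio(open_bugs: int, closed_bugs: int) -> float:
    """Open bugs as a fraction of all bugs (open + closed). Lower is better."""
    return open_bugs / (open_bugs + closed_bugs)

# Hypothetical figures: 40 hours of estimated remediation work against
# 500 hours spent on development so far.
print(technical_debt_ratio(40, 500))  # 8.0 (percent)

# 15 open bugs against 85 closed ones:
print(bug_ratio(15, 85))  # 0.15
```

Whether 8% is acceptable is, once again, a target to agree on with the team and the business rather than a universal threshold.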
Now, I skipped this earlier, but you can also accrue technical debt not as a strategic decision but because of an error in technical leadership. Problems such as poor technological design or flawed review processes, tools, documentation, and test suites can arise without any pressure from the business. This is called unforced debt, and software engineers are responsible for avoiding this kind of debt (if you’ll pardon the pun) at all costs.
Finally, besides ROI and technical debt, there is one other important way a software engineer should be aware of costs. And that is software development efficiency. It is up to the software engineer to make sure that development time is spent efficiently, and on the right things, to prevent unnecessary costs. Simply put, you’re the expert and you should act like one.
Depending on the length of a software project, that expertise can mean a few different things, but usually, there is a lot of money to be saved by making sure the following processes are all part of your software development strategy: ticketing and/or user stories, version control, code reviews, a definition of done, continuous integration, automated tests, automated (potentially even continuous) deployments with infrastructure-as-code, monitoring, alerting, and user interaction during development.
Basically, by automating repetitive tasks, documenting work, and increasing interaction between the development team, the system, and its users, engineers can reduce overall development costs. Minimizing costs requires a pragmatic approach that considers the market, customer expectations, and the company's financial goals.
In other words: it is your job to aim for “just good enough” when “perfect” or even “very good” is more expensive and unnecessary. And that holds true regardless of the warm fuzzy feeling we all get when imagining all the different ways we could improve a system.
Speaking the same language
After grabbing a flip chart and a marker, Alex begins to map out the various metrics that the team needs to consider. “Let’s start with Return on Investment,” Alex says. “What is the potential value of connecting our education partner to our e-courses, and how much will it cost us to build?” The team starts throwing out numbers and ideas, and Alex jots them down on the flip chart. A few more metrics and a lot more discussion follow, and soon, the flip chart is filled with numbers and concrete discussion points.
After a lengthy discussion, Alex summarizes the metrics and their implications for the Public API team. “Based on our discussion, it seems like connecting the education partner to our e-courses would provide a high enough ROI to take on some more debt. However, we need to be aware of the reliability of the application. It is dangerously close to being unacceptable. Next time, we need to create a plan to pay off some technical debt as soon as we are done with the education API. That seems to be the best way to keep to all the targets we set.”
The team nods in agreement. One of the developers even suggests adding those targets to the team’s definition of done, after which more agreeable nodding follows. The project manager thanks Alex for helping to bring clarity to the conversation. Over the next few weeks, the team still disagrees at times, but at least they can now easily express what they are disagreeing about.
It is then that the product owner of the Etrain Public Website team contacts Alex. “We heard what you’ve done for the Public API team, and we’d like you to help us out as well.”
“You’re having problems?” Alex asks.
“No, actually! We’re up for e-learning website of the year,” the product owner responds. “But we think you can really increase our chances of winning.”
What’s still to come, an overview
Here is a full overview of what has already been posted and what is still to come:
- Don't Skip, Hot Take
- Writing Quality Code (part 1)
- Writing Quality Code (part 2)
- Building Quality Software (part 2)
- Creating Ethical Software (part 1): 03-07-2023
- Creating Ethical Software (part 2): 01-08-2023
- The Development Process (part 1): 02-09-2023
- The Development Process (part 2): 01-10-2023
- The Software Engineer Oath: 01-11-2023
As always, I ask you to contact me. Send me your corrections, provide suggestions, ask your questions, deliver some hot takes of your own, or share a personal story. I promise to read it, and if possible, will include it in the series.