Accelerate’s ideas are intriguing because measuring the output of a software engineering organization is extremely difficult. Anyone who has given this issue serious thought has most likely encountered the ‘measuring the unmeasurable’ objection.
In this article, I am going to focus on measuring the productivity of software development teams, how to measure software delivery performance effectively, and examples of how these measures are implemented.
4 Measures of software delivery from Accelerate
Many attempts have been made over the years to measure the performance of software teams, with little success. The issue is that most models have two major flaws: they focus on outputs rather than outcomes, and they are more concerned with individual performance than with team or company performance.
The performance of your dedicated software engineering team can be measured with the four measures of software delivery from Accelerate. The authors of Accelerate adapted the four measures from the principles of lean manufacturing. It is important to note that Accelerate's measures of software delivery performance are the result of four years of innovative research and rigorous statistical methods.
First and foremost, Accelerate grew out of a multi-year research project that examined the productivity of a diverse group of software-producing organizations.
Authors Nicole Forsgren, Jez Humble, and Gene Kim compiled over 23,000 survey responses from over 2000 different organizations, ranging from startups to non-profits, manufacturing firms to financial services offices, and agriculture conglomerates to Fortune 500 corporations.
The book, Accelerate, is a summary of those four years of research, which was originally published as a series of papers in peer-reviewed journals around the world.
It is on this note that we introduce the four measures of software delivery from Accelerate. Each of the metrics can be used to balance or correct the others. Let's see how the four metrics resolve the two pitfalls we identified earlier. As identified by Forsgren, Humble, and Kim, the four metrics are:
Lead time
- The lead time is the amount of time that elapses between the receipt of an order or the perception of a need for an item and the item arriving at the customer’s location and being made available for use.
- Lead time is the amount of time a task takes from the moment it is started until it delivers value to the customer.
- We can also say it is the average time it takes for code to be checked into a version control system and then deployed to production.
The lead time is the time between a customer making a request and that request being fulfilled. This metric is clearly aimed at improving software delivery and, in the end, at ensuring the product serves the needs of customers.
According to research, software development is roughly divided into two domains: a highly uncertain, highly variable product design phase at the start of the cycle, and a low-uncertainty, low-variability product delivery phase at the end. Because the design phase is so hard to pin down, you should focus your measurement on the feature's delivery rather than its design. Lead time measures how long it takes to create and deliver software ideas, and reducing it improves software developers' ability to react to customers. If you want your software development team to fulfill its objectives, you'll need to make sure you've set some, and you'll need software engineering KPIs to monitor progress against them. Used correctly, these KPIs will help your team become more efficient and produce a higher-quality product.
KPI EXAMPLES
- Time from when a client requests a revision to when the revision is completed
- Time from when a software module is ordered to when it is delivered
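To make this concrete, here is a minimal Python sketch of how lead time could be computed from version-control and deployment timestamps. The records, field names, and timestamps are hypothetical, not taken from any particular tool:

```python
from datetime import datetime
from statistics import mean

# Hypothetical change records: when each change was committed and when it
# reached production (illustrative timestamps).
changes = [
    {"committed": datetime(2023, 5, 1, 9, 0),  "deployed": datetime(2023, 5, 2, 14, 0)},
    {"committed": datetime(2023, 5, 3, 11, 0), "deployed": datetime(2023, 5, 3, 16, 30)},
    {"committed": datetime(2023, 5, 4, 10, 0), "deployed": datetime(2023, 5, 6, 9, 0)},
]

# Lead time per change: commit-to-production, expressed in hours.
lead_times_h = [(c["deployed"] - c["committed"]).total_seconds() / 3600 for c in changes]

print(f"Average lead time: {mean(lead_times_h):.1f} hours")
```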
Deployment frequency
Reducing batch sizes reduces cycle times and variability in flow, accelerates feedback, reduces risk and overhead, improves efficiency, increases motivation and urgency, and reduces costs and schedule growth.
Deployment frequency tells you how many times software has been deployed to production in a given time period, typically a sprint. In other words, it is the number of deploys, or releases to production, in that period.
The frequency with which code is deployed is referred to as the deployment frequency. Bug fixes, enhanced capabilities, and new features are among the possibilities. Organizations may deploy code bi-annually, monthly, weekly, or multiple times per day, depending on their needs.
You can use deployment frequency as a proxy for batch size. The assumption is that software development teams that release on a fixed, infrequent schedule accumulate larger 'batches,' in the sense that each release bundles more code changes and commits to test. Software teams that practice continuous, on-demand deployment, on the other hand, typically work with very small batches.
Measuring the frequency of deployments can also reveal the broader effects of changes to your organization's structure, personnel, or processes over time. If a senior engineer leaves without being replaced, deployment frequency may drop, indicating the need to hire more experienced team members. When a team works on other parts of its workflow, the frequency of deployments may change as well: a concerted effort to reduce technical debt and eliminate bugs, for example, may lead to more deployments later. In an effective organization, analyzing and responding to these broader priorities and practices will support faster deployments.
KPI EXAMPLES
- The number of times a bug is fixed in the software
- The number of times a development system is unblocked
- The number of times a client requests a revision
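As a rough illustration, the sketch below counts production deploys per ISO week from a hypothetical deployment log; a real pipeline would feed these dates from its own CI/CD records:

```python
from collections import Counter
from datetime import date

# Hypothetical deployment log: one entry per production deploy.
deploys = [
    date(2023, 5, 1), date(2023, 5, 1), date(2023, 5, 3),
    date(2023, 5, 8), date(2023, 5, 10), date(2023, 5, 10),
]

# Deployment frequency, expressed as deploys per ISO calendar week.
per_week = Counter(d.isocalendar()[1] for d in deploys)
for week, count in sorted(per_week.items()):
    print(f"Week {week}: {count} deploys")
```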
Mean time to restore (MTTR)
MTTR is a very simple metric to grasp. In the past, software teams measured software reliability by the time between failures. However, with the rise of complex cloud-based systems, software failure is now considered unavoidable. What matters is how quickly teams recover from software failure. How long will it be before services are restored? What is the average amount of downtime?
Most of us should not be surprised by this; after all, uptime is built into most enterprise software SLAs. The authors of Accelerate simply take that and use it as a core measure of stability.
KPI EXAMPLES
- Percentage of cancelled projects
- Missed milestones
- Payment error rate
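As a minimal sketch, MTTR can be computed as the average time between the start of an outage and the restoration of service. The incident timestamps below are purely illustrative:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incidents: (service degraded, service restored).
incidents = [
    (datetime(2023, 5, 2, 14, 0),  datetime(2023, 5, 2, 14, 45)),
    (datetime(2023, 5, 9, 3, 10),  datetime(2023, 5, 9, 6, 40)),
    (datetime(2023, 5, 20, 11, 5), datetime(2023, 5, 20, 11, 25)),
]

# Time to restore per incident, in minutes.
restore_minutes = [(end - start).total_seconds() / 60 for start, end in incidents]

print(f"MTTR: {mean(restore_minutes):.0f} minutes")
```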
Change fail percentage
Accelerate’s final metric is change fail percentage, which is the proportion of changes to production that result in a hotfix, rollback, or period of degraded performance. This includes, of course, software releases and configuration changes, as both are common causes of software failure.
This metric is the Lean equivalent of ‘percent complete and accurate,’ commonly used in a generic product delivery process. This is common sense, and it is consistent with our understanding of manufacturing companies: you don’t want to increase factory production throughput at the expense of quality control.
With high pressure on teams to perform, the failure percentage is another vital software engineering KPI. Your defect escape rate is the ratio of defects found in pre-production testing to those that make it to production. This lets you regularly assess the overall quality of your team's software releases. If you're seeing a lot of problems in production, you'll need to improve in areas like automated and manual testing, as well as quality assurance. Once your testing in these areas improves, you'll be able to move much more quickly and confidently.
In short, the change fail percentage is the share of production changes that fail and require remediation: hotfixes, rollbacks, fix-forwards, and so on.
KPI EXAMPLES
- Percentage of overdue project tasks
- Cost Performance Index
- Schedule Performance Index
- Number of deploys at the end of the week
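To tie the two ideas together, here is a minimal sketch computing both the change fail percentage and a defect escape rate from hypothetical outcome data. The variable names and the escape-rate formulation (defects reaching production as a share of all defects found) are assumptions for illustration:

```python
# Hypothetical deployment outcomes: True means the change needed a hotfix,
# rollback, or fix-forward, or caused degraded performance.
deploy_failed = [False, False, True, False, False,
                 False, True, False, False, False]

change_fail_pct = sum(deploy_failed) / len(deploy_failed) * 100
print(f"Change fail percentage: {change_fail_pct:.0f}%")  # 20%

# One common formulation of defect escape rate: the share of all defects
# that reached production rather than being caught in pre-production testing.
defects_preprod, defects_prod = 18, 2
escape_rate = defects_prod / (defects_preprod + defects_prod) * 100
print(f"Defect escape rate: {escape_rate:.0f}%")  # 10%
```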
Accelerate's key finding: speed and stability go together
The most counter-intuitive result of the research is that companies that performed well on speed also performed well on stability. From this we conclude that there is no tradeoff between the two!
This contradicts the common belief that optimizing for tempo trades off against optimizing for software reliability. More specifically, good software organizations appear to perform well on both metrics, bad software organizations consistently perform poorly on both, and mediocre organizations hover in the middle, on both metrics, at every level of quality!
They found no companies that performed significantly better in one metric at the expense of another in their data set.
Spotify’s squad health check
The Squad Health Check Model from Spotify Labs is an exercise that any agile coach or team should have in their toolkit. It is based on careful experimentation over many years. Simple traffic-light indicators make it easy for each member to share their point of view.
It helps the squad develop awareness and a balanced viewpoint. Scrum Masters can figure out what will help the team the most.
The first step in resolving a problem is to recognize it. This type of visualization also makes it more difficult for everyone to ignore the problem.
When it comes to assessing the health of a squad (our term for a small, cross-functional, self-organizing development team), there are only two stakeholders to consider:
The squad on its own. The squad gains self-awareness about what works and what doesn’t while discussing the various health indicators. The variety of questions allows them to broaden their horizons. Perhaps they were aware of the code quality issues but hadn’t considered the customer value perspective or how quickly they learn. It also provides a balanced perspective, highlighting both the positive and negative aspects.
People who are rooting for the squad. Managers and coaches who work outside (or partially outside) the squad receive a high-level summary of what is and isn’t working. They can also spot patterns across squads. If you have dozens of teams and can’t talk to everyone about everything, a visual summary like this can help you decide how to spend your time and who to talk to about what.
We primarily do three things:
- Run workshops in which members of a squad discuss and evaluate their current situation from a variety of perspectives (quality, fun, value, etc.).
- Make a graphical representation of the outcome.
- Use the data to assist the squads in improving.
How to get measurable KPIs:
- organize workshops with the squads, facilitating a face-to-face conversation around the different health indicators (1-2 hours is usually enough), using a printed card deck
- use an online tool like Team Retro that includes a Squad Health Check
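For the visual summary step, a simple aggregation is often enough. The sketch below tallies hypothetical traffic-light votes per health indicator and reports the majority colour; the indicator names follow the Spotify model, while the votes and the majority rule are illustrative assumptions:

```python
from collections import Counter

# Hypothetical workshop results: each squad member votes green/yellow/red
# on each health indicator.
votes = {
    "Easy to release":    ["green", "green", "yellow", "green"],
    "Health of codebase": ["yellow", "red", "yellow", "yellow"],
    "Fun":                ["green", "green", "green", "yellow"],
}

# Summarize each indicator by its most common vote.
for indicator, colours in votes.items():
    majority, _ = Counter(colours).most_common(1)[0]
    print(f"{indicator:18} -> {majority} ({Counter(colours)})")
```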
Agile earned value management (AgileEVM)
Agile software development methods have been shown to deliver software faster, with higher quality, and with a better fit to changing business priorities and market conditions. The conventional wisdom was that EVM techniques were too difficult to implement effectively on an Agile project, and that EVM couldn't easily cope with changing requirements. In practice, however, EVM techniques have been applied to Agile projects with good results. AgileEVM is a lightweight, simple-to-use adaptation of traditional EVM techniques that brings their benefits to Agile projects.
Three data points establish the initial baseline:
- number of planned iterations in a release
- total number of planned story points in a release
- planned budget for the release
Four measurements are needed to calculate the AgileEVM metrics:
- total story points completed
- number of iterations completed
- total actual cost
- total story points added to or removed from the release plan
| CPI > 1 | CPI = 1 | CPI < 1 |
| --- | --- | --- |
| Under budget | On budget | Over budget |
| EV > AC | EV = AC | EV < AC |

| SPI > 1 | SPI = 1 | SPI < 1 |
| --- | --- | --- |
| Ahead of schedule | On schedule | Behind schedule |
| EV > PV | EV = PV | EV < PV |
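As a minimal sketch of how these inputs combine, the code below applies the standard AgileEVM definitions: planned value is derived from iterations completed, earned value from story points completed, and CPI/SPI follow from those. All numbers are illustrative:

```python
# Baseline (illustrative numbers).
planned_iterations = 10       # planned iterations in the release
planned_points     = 200      # total planned story points in the release
budget             = 100_000  # planned budget for the release (BAC)

# Measurements to date (illustrative numbers). If story points are added
# to or removed from the release plan, planned_points must be updated.
iterations_done = 4           # iterations completed
points_done     = 90          # story points completed
actual_cost     = 42_000      # total actual cost (AC)

planned_pct = iterations_done / planned_iterations  # expected % complete
actual_pct  = points_done / planned_points          # actual % complete

pv = planned_pct * budget  # Planned Value
ev = actual_pct * budget   # Earned Value

cpi = ev / actual_cost  # Cost Performance Index: > 1 means under budget
spi = ev / pv           # Schedule Performance Index: > 1 means ahead of schedule

print(f"CPI = {cpi:.2f}, SPI = {spi:.2f}")  # CPI = 1.07, SPI = 1.12
```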
7 most important UX KPIs
UX KPIs are divided into behavioral and attitudinal KPIs.
KPIs make it easier to communicate your UX issues and the strategic goals that go with them to the right people in your company. You don't have to woo your bosses with equivocal and qualitative arguments any longer. Instead, you can rely on cold, hard facts and figures to back up your claims. It's nearly impossible to prove whether your UX team's actions were worthwhile and effective before, during, or after a project without these metrics. When it comes to determining the annual UX budget, having reliable data makes it much easier to make arguments.
UX KPIs dramatically reduce the complexity of large amounts of data and provide quick, accurate information about your product's "health status." Taken together, UX KPIs act like a physician's readings of pulse, temperature, and blood pressure, allowing them to quickly determine whether something is wrong with the organism as a whole and whether intervention is required.
Measuring the KPIs that are truly relevant is critical to the success of your UX activities. In a metaphorical sense, measuring a patient's hip circumference to heal a sprained arm is useless. It's best to start by concentrating on the two or three most important UX KPIs for your company or project. This will assist you in staying on top of things and avoiding any unnecessary confusion from the start. Different goals and projects necessitate different UX KPIs that must be tracked. Here are a couple of real-life examples:
- Increasing the number of registrations on the website
- Time spent on task (of the login flow)
- Error rate among users
- Increased sales
- % of tasks completed successfully
- Number of clicks per purchase
Behavioural UX KPIs (how users behave)
Behavioural KPIs are numerical representations of what a user is doing and how they interact with a product or website. Without the assistance of an interviewer or observer, this information can now usually be collected completely automatically. As a result, this is a relatively simple and low-cost way to begin collecting UX KPIs.
Tasks completed successfully
The task success rate (TSR) is a commonly used metric that counts how many tasks were completed correctly. You may calculate the TSR if a task has a specified endpoint, such as filling out a form or purchasing a product. However, before you begin gathering data, you must be clear about what targets you consider a success in a specific case.
Although the TSR does not explain why a user cannot complete a task, it is a first and crucial indicator. Example: in a flower delivery company's online shop, ten respondents are given the task of ordering ten red, ten yellow, and ten white roses using express delivery and credit card payment. Only eight of the testers complete the task successfully. Two of the testers fail:
User 1 is having issues with credit card payments.
On the website, User 2 is unable to locate the yellow roses.
You can calculate the task success rate as follows: 8/10 = 0.8, or 80%.
Expert tip:
Measure the TSR of users performing a task for the first time. This lets you see whether and how the metric changes as users become more familiar with the service or product. In general, the better the user experience, the higher the success rate.
Time spent on task
This KPI measures how long it takes a user to complete a task successfully (in minutes and seconds). The average time-on-task is typically reported as the final UX KPI, with a shorter processing time indicating a better user experience. For instance, seven respondents are tasked with finding the customer service phone number on a particular website. These are the times they took to accomplish this:
| User number | 1 | 7 | 5 | 2 | 4 | 6 | 3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Time (seconds) | 23 | 17 | 59 | 22 | 20 | 30 | 16 |

In this scenario, the time-on-task is calculated as follows: (23+17+59+22+20+30+16)/7 = 26.71 seconds.
Search vs. navigation
Navigation bars are a vital instrument in a website's 'orchestra.' If a user fails to reach their destination through the given navigation, the search function is the next logical step. In most scenarios, the less the search function is used, the better the customer experience. However, it is often advisable to decide case by case which of the two metrics is more desirable. Consider a website with only ten subpages: it usually does not have a search feature, and does not need one, given its simplicity.
For example, you assign nine testers to place an online order for three sunflowers from a florist. Then you look at how many people use the search and navigation areas.
| | Users (of 9) | Share |
| --- | --- | --- |
| Search | 3 | 33% |
| Navigation | 6 | 67% |

The search/navigation ratio is calculated as follows: search 3/9 ≈ 0.33, or 33%; navigation 6/9 ≈ 0.67, or 67%.
User error rate
The number of times a user makes a mistake is referred to as the user error rate (UER). Consider a user attempting to enter their date of birth in the address field, which usually fails. The UER gives you a sense of how user-friendly and straightforward your website is. The higher the UER, the more usability issues there are. It's critical to establish ahead of time which behaviors count as errors. The user error rate can be determined in a variety of ways; the two most popular forms of measurement are the error incidence rate and the error rate.
Error incidence rate: this is the metric to use if a task has only one possible error (or if there are many and you only want to calculate one of them).
In the 'Repeat e-mail address' area, five out of every hundred users mistype their e-mail addresses. The error incidence rate is estimated as follows: 5/100 = 0.05 multiplied by 100 equals 5% Level of error:
You may use the error rate if several errors are possible per task (or if you want to calculate multiple errors).
For instance, six testers are tasked with completing an international bank transfer through a bank's online portal. The role has five potential errors, with the following distribution of user error rates:
| User number | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Total number of errors | 3 | 1 | 2 | 3 | 2 | 1 |
The error rate is calculated using the formula below:
(3+1+2+3+2+1)/(6×5) = 0.4, or 40%
Attitudinal UX KPIs (what users say)
Attitudinal UX KPIs assess how customers feel or what they say before, during, and after buying a product. I'll give three prominent examples of this kind in this section:
System Usability Scale (SUS)
According to its creator, John Brooke, the System Usability Scale (SUS) is a "quick and dirty" method for evaluating a product's usability. The scale consists of a 10-item questionnaire, each item with five response options ranging from strongly agree to strongly disagree.
For instance, suppose you want to assess the usability of your website. A SUS score (0 to 100) can be calculated from the questionnaire data; the average score is 68. If your website scores 68 or lower, it likely has significant usability problems and will need to be optimized.
Net Promoter Score (NPS)
The Net Promoter Score captures consumer satisfaction and loyalty in one fundamental statistic. Several studies have also shown that the NPS correlates positively and significantly with a company's growth. To calculate the NPS, the user is asked only one question: how likely are you to recommend (the brand, website, service, etc.) to a friend or colleague? The user answers on a scale from one (not likely) to ten (very likely). The responses are then divided into three classes: detractors, passives, and promoters, with the passives ignored in the calculation:
(Number of promoters – number of detractors) / (number of respondents) x 100 = Net Promoter Score
For example, with 30 promoters, 10 detractors, and 20 passives, the Net Promoter Score is calculated as follows: (30 − 10) / 60 ≈ 0.33, or 33%.
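A minimal sketch of the calculation, assuming the conventional buckets (promoters score 9-10, detractors 0-6, everyone else is passive); the responses are made up:

```python
# Hypothetical answers to "How likely are you to recommend us?" (0-10 scale).
responses = [10, 9, 9, 8, 7, 10, 6, 3, 9, 10]

promoters  = sum(1 for r in responses if r >= 9)   # scores 9-10
detractors = sum(1 for r in responses if r <= 6)   # scores 0-6

nps = (promoters - detractors) / len(responses) * 100
print(f"NPS: {nps:.0f}")  # 40
```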
Customer satisfaction (CSAT)
Another attitudinal UX KPI that expresses consumer satisfaction in a useful metric is CSAT. Users/testers are asked the following question: how happy are you with (the website, product, service, etc.)? The result is a number from 0 to 100, with 100 representing the highest level of customer satisfaction. The scale usually has five options, ranging from extremely dissatisfied to extremely satisfied. Since the CSAT score can be calculated quickly and accurately, it can be measured at several points during a customer's interaction (such as in the TOFU, MOFU, and BOFU phases), making it possible to determine where in the funnel the consumer is still stuck. Customer satisfaction is calculated as:
(Number of happy customers) / (Number of respondents) x 100 = percent of happy customers.
Following that, the survey results are categorized and evaluated as follows:
- Extremely dissatisfied
- Dissatisfied
- Undecided
- Satisfied
- Extremely satisfied
Only the responses of satisfied users, i.e., those who answered 'satisfied' or 'extremely satisfied,' are used to calculate the CSAT score.
Conclusion
What gets measured gets improved. Methodical measurement provides a roadmap for identifying areas in need of enhancement and directing targeted efforts to refine processes, ultimately leading to better outcomes.
Software Development KPIs presented in this article can help you not only boost productivity and streamline development cycles but also elevate the overall user experience, solidifying the crucial link between measurement and continuous improvement.