How to Improve Quality?
What to focus on?
Product engineers' main purpose is to deliver value to customers in the form of features (code). Quality metrics should reflect this purpose.
There are 3 types of quality issues
Incidents: When a user cannot use the system.
Production Bugs: When a user fails to use a feature.
Staging & Development Bugs: Errors or issues identified pre-production. These have no impact on the user.

There is a major distinction between the top 2 parts of the pyramid and the bottom layer.
The top 2 parts reflect “user pain”. The bottom part reflects “development pain”. As a business we should mainly care about users, and engineering teams should align with business goals.
Product Engineering teams should track customer-facing bugs, AKA
Incidents
Production Bugs
Incidents & Production Bugs
reflect “user pain”
are reviewed periodically (weekly/monthly)
are reported regularly (see First Principles of Engineering Metrics)
Staging & Development Bugs
reflect “development pain”
are used for optimization efforts (like implementing CI/CD, TDD, PR review processes etc.)
should NOT be reported
How to use metrics for non-customer facing bugs
There are a lot of bugs created from
Logs (Sentry etc.)
Staging testing
Code Review
Unit/integration/E2E tests
QA
…
The bugs generated from these channels might be as important as customer-facing ones. The goal, however, is to ensure that customers face fewer bugs. To accomplish this goal, we should ensure that bugs from staging & development do not reach customers.
Just as a product team tracks a main product metric like DAU but also records every single button click, product engineering teams should track customer-facing bugs as their main metric and track non-customer-facing bugs the way a product team tracks button clicks.
A product manager doesn’t look at button-click analytics regularly. The same should apply to non-customer-facing bugs: we should only look at them when we think our internal processes are broken and need to be improved.
What metrics to track?
When using metrics, we should use dual metrics that balance efficiency and effectiveness.
Bug Resolution Time: How fast do we solve bugs for our users?
Bugs Created: How many bugs do our users encounter?
If we don’t track dual metrics, we’ll end up with weird scenarios like “We resolve a bug in 1 day on average, but have 1000 bugs created per month.” Dual metrics help us avoid scenarios like these.
In this case
Efficiency is Bug Resolution Time
Effectiveness is Bugs Created
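As a rough illustration of the dual-metric idea, the sketch below computes both numbers per ISO week from a list of bug records. The record shape (a pair of created/resolved timestamps), the function name `weekly_quality_metrics`, and the choice to attribute resolution time to the week a bug was created are assumptions made for this example, not the behaviour of any particular tool.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def weekly_quality_metrics(bugs):
    """Return Bugs Created and average Bug Resolution Time per ISO week.

    `bugs` is an iterable of (created, resolved) datetimes; `resolved` is
    None for bugs that are still open. This record shape is hypothetical.
    """
    created_count = defaultdict(int)
    resolution_days = defaultdict(list)
    for created, resolved in bugs:
        iso = created.isocalendar()
        week = f"{iso[0]}-W{iso[1]:02d}"  # e.g. "2024-W01"
        created_count[week] += 1
        if resolved is not None:
            days = (resolved - created).total_seconds() / 86400
            resolution_days[week].append(days)
    return {
        week: {
            "bugs_created": created_count[week],
            "avg_resolution_days": (
                mean(resolution_days[week]) if resolution_days[week] else None
            ),
        }
        for week in sorted(created_count)
    }


# Illustrative data only.
sample_bugs = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 9)),  # resolved in 1 day
    (datetime(2024, 1, 3, 9), datetime(2024, 1, 6, 9)),  # resolved in 3 days
    (datetime(2024, 1, 8, 9), None),                     # still open
]
metrics = weekly_quality_metrics(sample_bugs)
```

Reading the two values side by side for each week is the point: a low average resolution time means little if the created count for the same week is climbing.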
Advanced tips:
Use a week or a month as the timeframe when tracking these metrics
If you are constantly hiring, you may want to normalize the effectiveness metric as Bugs Created per Member
The definition of a bug & incident should be clear. You can find more info in the How should bugs be tracked? section
Use Issue Lead Time (time from issue creation to issue completion) when calculating Bug Resolution Time.
Some organizations use Bugs Resolved rather than Bugs Created. We suggest keeping it as “Bugs Created”.
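Two of the tips above reduce to simple arithmetic. The following is a minimal sketch; the function names and the sample numbers are hypothetical and only meant to show the calculation.

```python
from datetime import datetime


def issue_lead_time_days(created, completed):
    """Issue Lead Time: time from issue creation to issue completion, in days."""
    return (completed - created).total_seconds() / 86400


def bugs_created_per_member(bugs_created, team_size):
    """Normalize Bugs Created by headcount so a growing team stays comparable."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    return bugs_created / team_size


# A bug created Monday 10:00 and completed Wednesday 22:00 took 2.5 days.
lead_time = issue_lead_time_days(datetime(2024, 3, 4, 10), datetime(2024, 3, 6, 22))

# Illustrative numbers only: the raw count rises while the per-member rate falls.
january = bugs_created_per_member(40, 10)  # 4.0 bugs per member
june = bugs_created_per_member(60, 20)     # 3.0 bugs per member
```

In this made-up example the team doubled, so 60 bugs in June is a better result per member than 40 bugs in January, which the raw count alone would hide.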
How should bugs be tracked?
Incidents & Production Bugs can be tracked in multiple ways. I’ll go over the most common ways major organizations track them.
Note: Each organization has different needs and requirements. Feel free to adapt the options to fit your needs.
Note2: The options below are based on Jira users. It’s possible to do something similar in other issue management platforms.
Most teams track bugs incorrectly.
Tracking bugs correctly typically requires process changes.
Option #1: For Small Organizations
Each issue with type: bug is considered a quality fault
Incidents are tracked as priority: highest bugs.
Production bugs are tracked as priority: normal/high bugs.
Staging & Development bugs are tracked as priority: low bugs.
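The Option #1 rules can be written down as a small lookup. This is only a sketch: the function name `classify_option1` and the "Unclassified" fallback are assumptions for illustration, and the priority names in your own Jira configuration may differ from the ones listed above.

```python
def classify_option1(issue_type, priority):
    """Map an issue's type and priority to a quality category (Option #1).

    Returns None for issues that are not bugs, and "Unclassified" for bug
    priorities the rules above do not cover (a hypothetical fallback).
    """
    if issue_type.strip().lower() != "bug":
        return None
    priority_to_category = {
        "highest": "Incident",
        "high": "Production Bug",
        "normal": "Production Bug",
        "low": "Staging & Development Bug",
    }
    return priority_to_category.get(priority.strip().lower(), "Unclassified")


category = classify_option1("Bug", "Highest")  # "Incident"
```

Making the fallback explicit is a deliberate choice here: bugs with an unexpected priority show up as "Unclassified" instead of silently landing in one of the three buckets.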
Benefits
It’s simple
Better for small organizations
Cons
Requires alignment & education across the whole team
Option #2: For Big Organizations
All incidents SHOULD be created by Incident Management tooling (PagerDuty, Opsgenie etc.), which creates an issue with type: Incident
All customer issues SHOULD be created by your Support Suite tooling (Zendesk, Intercom etc.). The customer success team creates issues with type: Customer Bugs
Any internally caught bugs should be created as type: bug
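With Option #2 the issue type alone separates user pain from development pain, so splitting a list of issues becomes a simple count. A minimal sketch follows, assuming the three type names above; the function name `split_by_pain` and the "other" bucket are hypothetical.

```python
from collections import Counter

# Type names taken from Option #2 above, compared case-insensitively.
USER_PAIN_TYPES = {"incident", "customer bugs"}
DEVELOPMENT_PAIN_TYPES = {"bug"}


def split_by_pain(issue_types):
    """Count issues as user pain, development pain, or other by issue type."""
    counts = Counter()
    for issue_type in issue_types:
        key = issue_type.strip().lower()
        if key in USER_PAIN_TYPES:
            counts["user_pain"] += 1
        elif key in DEVELOPMENT_PAIN_TYPES:
            counts["development_pain"] += 1
        else:
            counts["other"] += 1
    return dict(counts)


# Illustrative data only.
pain = split_by_pain(["Incident", "Customer Bugs", "bug", "bug", "Story"])
```

Only the `user_pain` count would feed the regularly reported metrics; the `development_pain` count stays available for process optimization.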
Benefits
The system enforces correct tagging, which removes the need for alignment & education
Better for large organizations
Cons
It takes change management across DevOps, Product, Engineering, and CS teams
Tactical advice on improving quality
Improving anything goes through the same process
Check metrics
Identify where to improve
Brainstorm on bets
Implement the bet
Check metrics to see if the bet succeeded
Repeat
Every action we take should either
Decrease the time it takes us to fix a bug (Bug Resolution Time)
Decrease how many bugs we release to customers (Bugs Created)
1. Check Metrics
You should have a board like the following where you track Quality as part of your operational metrics.

2. Identify where to improve
At this point you have 2 options
Debug high level patterns
Click on the graph and group by the field you’d like to understand the patterns of
Most common group by options are
Team
Priority
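If your tool lacks a group-by view, the same breakdown can be approximated on exported data. This is a sketch under the assumption that each bug is a dict with `team` and `priority` keys; those field names and the `group_bug_counts` helper are hypothetical.

```python
from collections import Counter


def group_bug_counts(bugs, field):
    """Count bugs per value of `field`, largest group first."""
    return Counter(bug.get(field, "Unknown") for bug in bugs).most_common()


# Illustrative data only.
bugs = [
    {"team": "Payments", "priority": "high"},
    {"team": "Payments", "priority": "highest"},
    {"team": "Search", "priority": "high"},
    {"team": "Payments", "priority": "high"},
]
by_team = group_bug_counts(bugs, "team")
by_priority = group_bug_counts(bugs, "priority")
```

The first entry of each result is the group to look at first, which in this made-up data is the Payments team and the `high` priority.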
Once you have identified where to focus at a high level, the next step is to find the pattern.
Click on the identified team/priority/group. You’ll see all the related issues.

From this view, you’ll need to check multiple metadata fields and look for a pattern. In this image, it seems like a lot of issues get stuck in the QA status (teal column) on the Jira board.
Now that we have identified where to improve, we can go to the next step.
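A "stuck in one status" pattern like the QA example can also be checked numerically. The sketch below assumes you can export each issue's status history as a chronologically sorted list of (timestamp, status) pairs; that shape and the `days_per_status` helper are assumptions for illustration, not a Jira API.

```python
from collections import defaultdict
from datetime import datetime


def days_per_status(transitions, end):
    """Sum the days an issue spent in each status.

    `transitions` is a chronologically sorted list of (entered_at, status);
    `end` closes the last status (for example, the current time).
    """
    totals = defaultdict(float)
    boundaries = transitions[1:] + [(end, None)]
    for (entered_at, status), (left_at, _) in zip(transitions, boundaries):
        totals[status] += (left_at - entered_at).total_seconds() / 86400
    return dict(totals)


# Illustrative history: one day of work, then five days waiting in QA.
history = [
    (datetime(2024, 1, 1), "In Progress"),
    (datetime(2024, 1, 2), "QA"),
    (datetime(2024, 1, 7), "Done"),
]
time_in_status = days_per_status(history, end=datetime(2024, 1, 7))
slowest_status = max(time_in_status, key=time_in_status.get)
```

Running this across all issues in the identified group and averaging per status would show whether QA is consistently the bottleneck or whether a few outliers dominate.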
3. Brainstorm on bets
Once you understand what the problem is, the next step is to brainstorm potential fixes for it.
Always look for the root cause of the problem. Use the following template to find it:
What is the root cause?
What is the customer impact?
What action can we take to prevent this from happening again?
4. Implement the bet
Execute the idea we bet on to fix the problem.
5. Check metrics to see if the bet succeeded
Once we have implemented the fix, we should
Check the metrics to see if it improved either Bug Resolution Time or Bugs Created
Check whether the same issue happens again.
To improve, we need to ensure no bug occurs twice.
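Both checks above can be sketched in a few lines. The metric dict keys, the `root_cause` labels, and the helper names are hypothetical; the sketch assumes you record a root cause for each bug using the template from step 3.

```python
from collections import Counter


def improved_metrics(before, after):
    """Return the names of metrics that went down (lower is better for both)."""
    names = ("bugs_created", "avg_resolution_days")
    return [name for name in names if after[name] < before[name]]


def recurring_root_causes(root_causes):
    """Return root causes recorded more than once, i.e. bugs that came twice."""
    counts = Counter(root_causes)
    return sorted(cause for cause, n in counts.items() if n > 1)


# Illustrative numbers only.
before = {"bugs_created": 120, "avg_resolution_days": 3.0}
after = {"bugs_created": 90, "avg_resolution_days": 3.5}
improved = improved_metrics(before, after)

repeats = recurring_root_causes(
    ["missing null check", "flaky deploy", "missing null check"]
)
```

In this made-up example the bet reduced Bugs Created but not Bug Resolution Time, and one root cause appears twice, so that cause would be the next candidate for a bet.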
6. Repeat
If we repeat this cycle a few dozen times (typically over 2-6 weeks), we’ll see drastic improvements across our engineering quality.
