How to Improve Quality?
A product engineer’s main purpose is to deliver value to customers in the form of features (code). Quality metrics should reflect this purpose.
There are three types of quality issues:
Incidents: When a user cannot use the system.
Production Bugs: When a user fails to use a feature.
Staging & Development Bugs: Errors or issues identified before production. These have no impact on the user.
There is a major distinction between the top two layers of the pyramid and the bottom layer.
The top two layers reflect “user pain”; the bottom layer reflects “development pain”. As a business we should care mainly about users, and engineering teams should align with business goals.
Product engineering teams should track customer-facing bugs, i.e.:
Incidents
Production Bugs
Incidents & Production Bugs
reflect “user pain”
are reviewed periodically (weekly/monthly)
are reported regularly (see First Principles of Engineering Metrics)
Staging & Development Bugs
reflect “development pain”
are used for optimization efforts (like implementing CI/CD, TDD, PR review processes, etc.)
should NOT be reported
When using metrics, we should use dual metrics that balance efficiency and effectiveness:
Bug Resolution Time: How fast do we resolve bugs for our users?
Bugs Created: How many bugs do our users encounter?
If we don’t track dual metrics, we get odd scenarios like “We resolve a bug in 1 day on average, but 1,000 bugs are created per month.” Dual metrics prevent scenarios like these.
In this case
Efficiency is Bug Resolution Time
Effectiveness is Bugs Created
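A minimal sketch of computing both metrics for a single timeframe, in Python. It assumes a hypothetical `Bug` record holding only creation and resolution timestamps, and it averages resolution time over bugs created in the period that are already resolved; your tracker may expose these fields differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Bug:
    """Hypothetical minimal bug record; real trackers expose many more fields."""
    created: datetime
    resolved: Optional[datetime] = None  # None while the bug is still open


def dual_metrics(
    bugs: List[Bug], start: datetime, end: datetime
) -> Tuple[Optional[float], int]:
    """Return (Bug Resolution Time in days, Bugs Created) for bugs created in [start, end)."""
    in_period = [b for b in bugs if start <= b.created < end]
    # Effectiveness: how many bugs were created in this timeframe.
    bugs_created = len(in_period)
    # Efficiency: average days from creation to resolution, over resolved bugs only.
    durations = [
        (b.resolved - b.created).total_seconds() / 86400
        for b in in_period
        if b.resolved is not None
    ]
    resolution_time = mean(durations) if durations else None
    return resolution_time, bugs_created
```

With one bug resolved after 1 day, one after 3 days, and one still open, `dual_metrics` returns `(2.0, 3)`. Reporting the pair together keeps a fast average fix time from hiding a high bug count.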
Advanced tips:
Use a week or a month as the timeframe when tracking these metrics.
If you are constantly hiring, you may want to normalize the effectiveness metric as Bugs Created per Member.
The definitions of a bug and an incident should be clear. You can find more information in the “How should bugs be tracked?” section.
Use Issue Lead Time (time from issue creation to issue completion) when calculating Bug Resolution Time.
Some organizations use Bugs Resolved rather than Bugs Created. We suggest keeping it as Bugs Created.
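The weekly timeframe and per-member normalization can be sketched as follows. This assumes you can export each bug’s creation date and a hypothetical per-week team headcount; bugs are bucketed by ISO (year, week), weeks with no bugs report 0.0, and bugs falling outside the given weeks are ignored.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

Week = Tuple[int, int]  # ISO (year, week number)


def bugs_created_per_member(
    created_dates: List[date], team_size_by_week: Dict[Week, int]
) -> Dict[Week, float]:
    """Bugs Created per Member for each ISO week in team_size_by_week."""
    # Bucket each bug by the ISO year and week it was created in.
    counts = Counter(tuple(d.isocalendar()[:2]) for d in created_dates)
    # Weeks without bugs yield 0.0; bugs outside the given weeks are not counted.
    return {
        week: counts[week] / size
        for week, size in sorted(team_size_by_week.items())
    }
```

Normalizing this way means a team that grows from 4 to 5 members is not penalized simply for having more people writing code.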
Incidents & Production Bugs can be tracked in multiple ways. I’ll go over the most common ways major organizations track them.
Note: Each organization has different needs and requirements. Feel free to adapt the options to fit your needs.
Note 2: The options below are based on Jira. It’s possible to do something similar in any issue management platform.
Most teams track bugs incorrectly.
Tracking bugs correctly typically requires process changes.
Each issue with type: bug is considered a quality fault.
Incidents are tracked as priority: highest bugs.
Production bugs are tracked as priority: normal/high bugs.
Staging & Development bugs are tracked as priority: low bugs.
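Under this priority convention, classification reduces to a small lookup. The priority names below (highest, high/normal, low) are taken from the convention above and are an assumption about your setup: Jira’s default scheme (Highest, High, Medium, Low, Lowest) may not match your instance.

```python
from typing import Optional

# Assumed priority names, following the convention above.
PRIORITY_TO_CATEGORY = {
    "highest": "incident",
    "high": "production_bug",
    "normal": "production_bug",
    "low": "staging_dev_bug",
}

# Only incidents and production bugs reflect user pain and get reported.
REPORTED_CATEGORIES = {"incident", "production_bug"}


def classify(issue_type: str, priority: str) -> Optional[str]:
    """Return the quality category of an issue, or None if it is not a quality fault."""
    if issue_type.lower() != "bug":
        return None  # only issues with type: bug count as quality faults
    # Unknown priorities also return None so they can be flagged for cleanup.
    return PRIORITY_TO_CATEGORY.get(priority.lower())


def is_reported(issue_type: str, priority: str) -> bool:
    """True if the issue should count toward the reported quality metrics."""
    return classify(issue_type, priority) in REPORTED_CATEGORIES
```

The lookup only works if everyone sets priorities consistently, which is exactly the alignment cost listed below.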
Benefits
It’s simple
Better for small organizations
Cons
Requires alignment & education across the whole team
All incidents SHOULD be created by incident management tooling (PagerDuty, Opsgenie, etc.), which creates an issue with type: Incident.
All customer issues SHOULD be created by your support suite tooling (Zendesk, Intercom, etc.). The customer success team creates issues with type: Customer Bugs.
Any internally caught bugs should be created as type: bug.
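With tooling-created issue types, the reported metrics reduce to counting the user-facing types. This sketch assumes the type names above (Incident, Customer Bugs) and, as an assumption, leaves internally caught bug issues and any other types out of the reported counts.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed issue-type names from the convention above: incident tooling creates
# "Incident" issues and the support suite creates "Customer Bugs" issues.
USER_FACING_TYPES = ("Customer Bugs", "Incident")


def reported_quality_counts(issue_types: Iterable[str]) -> Dict[str, int]:
    """Count the user-facing issue types that feed the reported quality metrics."""
    counts = Counter(issue_types)
    # Internally caught "bug" issues and other types are left out of the report;
    # user-facing types with no issues still appear with a count of 0.
    return {t: counts[t] for t in USER_FACING_TYPES}
```

Because the type is set by the tool that created the issue, no one has to remember a tagging rule for the count to be correct.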
Benefits
The system enforces correct tagging, which removes the need for alignment & education
Better for large organizations
Cons
It requires change management across DevOps, Product, Engineering, and CS teams
Improving anything follows the same process:
Check metrics
Identify where to improve
Brainstorm on bets
Implement the bet
Check metrics to see whether the bet succeeded
Repeat
Any action we take should either:
Decrease time it takes us to fix a bug (Bug Resolution Time)
Decrease how many bugs we release to customers (Bugs Created)
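Checking whether a bet paid off can be sketched as a comparison of two snapshots of the metrics, one taken before the change and one after. The `QualitySnapshot` type is hypothetical; following the rule above, a bet counts as successful if it lowers either metric.

```python
from typing import NamedTuple


class QualitySnapshot(NamedTuple):
    """Hypothetical snapshot of both quality metrics over one timeframe."""
    bug_resolution_time_days: float  # efficiency
    bugs_created: int  # effectiveness


def bet_succeeded(before: QualitySnapshot, after: QualitySnapshot) -> bool:
    """A bet succeeds if it lowers Bug Resolution Time or Bugs Created."""
    faster = after.bug_resolution_time_days < before.bug_resolution_time_days
    fewer = after.bugs_created < before.bugs_created
    return faster or fewer
```

Since either metric alone is enough to pass, it is still worth glancing at the other one, in keeping with the dual-metric argument: a faster fix time should not hide a rise in Bugs Created.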
You should have a board like the following, where you track quality as part of your operational metrics.
At this point you have 2 options
Once you understand the problem, the next step is to brainstorm potential fixes.
Always look for the root cause of the problem. Use the following template to find it:
What is the root cause?
What is the customer impact?
What action can we take to prevent this from happening again?
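The template can be captured as a small record so that every review answers all three questions before a bet is chosen. The `RootCauseReview` class and its `bug_id` field are hypothetical illustrations, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class RootCauseReview:
    """One filled-in root cause template (hypothetical structure)."""
    bug_id: str  # hypothetical issue key, e.g. "BUG-123"
    root_cause: str  # What is the root cause?
    customer_impact: str  # What is the customer impact?
    prevention_action: str  # What action can we take to prevent this from happening again?

    def is_complete(self) -> bool:
        """True only when all three template questions have a non-blank answer."""
        answers = (self.root_cause, self.customer_impact, self.prevention_action)
        return all(answer.strip() for answer in answers)
```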
Implement the bet: execute on the idea we bet would fix the problem.
Once we have implemented the fix, we should:
Check whether the metrics show an improvement in either Bug Resolution Time or Bugs Created.
Check whether the same issue happens again.
To improve, we need to ensure no bug happens twice.
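One way to check that no bug happens twice is to record a root cause label on each fixed bug and look for labels that repeat. This sketch assumes such a free-text label exists (a hypothetical field in your tracker) and normalizes case and surrounding whitespace before grouping.

```python
from collections import Counter
from typing import Iterable, List


def recurring_root_causes(root_causes: Iterable[str]) -> List[str]:
    """Return root causes recorded on more than one bug, most frequent first."""
    # Normalize case and surrounding whitespace so near-identical labels group together.
    counts = Counter(cause.strip().lower() for cause in root_causes)
    return [cause for cause, n in counts.most_common() if n > 1]
```

An empty result for a timeframe suggests the prevention actions are holding; any returned label points to a fix that did not address the real root cause.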
If we repeat this cycle a few dozen times, typically over 2-6 weeks, we’ll see drastic improvements in our engineering quality.