1. Average response time. 2. Standard deviation. 3. Percentiles 90, 95, and 99. Discover key performance testing metrics and improve results with Abstracta.


Unlocking the full potential of performance testing means delving into the performance metrics that matter. It’s not enough to run tests and gather data; the real power lies in accurate analysis, empowering you to make informed decisions and boost your system’s performance.
Whether you are testing for concurrent users or measuring key performance indicators, having the right approach is crucial.
Three metrics are central when analyzing performance test results: Average, Standard Deviation, and Percentiles. Each offers a different insight, and together they give a fuller picture of system performance.
Through a thoughtful analysis of these metrics, we lay the foundation for enhanced system responsiveness.
Want to learn all about performance testing? Don’t miss our Continuous Performance Testing Comprehensive Guide
Making Sense of The Average, Standard Deviation, and Percentiles in Performance Testing Reports


Certain performance testing metrics are essential to understand properly in order to draw the right conclusions from your test results. These software performance metrics require some basic understanding of math and statistics, but nothing too complicated.
If you don’t understand what each one means or represents, you can reach some very wrong conclusions.
In this post, we focus on average response time, standard deviation, and percentiles. Without going into a lot of math, we’ll discuss test metrics and their usefulness when analyzing performance results.
The Importance of Analyzing Data as a Graph
The first time we thought about this subject was during a course that Scott Barber gave in 2008 (when we were just starting up Abstracta), on his visit to Uruguay. He showed us a table with values like this:


He asked us which data set we thought had the best performance, which is not quite as easy to discern as when you display the data in a graph:


In Set A, you can tell there was a peak, but then it recovers.


In Set B, it seems the system started out with very poor response times and, probably about 20 seconds into the test, collapsed and began responding with an error page, which it returned in about a second.


Finally, in Set C, it’s clear that as time passed, the system performance continued to degrade.
Barber’s aim with this exercise was to show that it’s much easier to analyze information when it’s presented in a graph. In addition, in the table, the information is summarized, but in the graphs, you can see all the points. Thus, with more data points, we can gain a clearer picture of what is going on.
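To make the point concrete, here is a minimal sketch in Python. The two data sets are hypothetical (they are not the values from Barber's table), and `summary` and `half_shift` are illustrative helper names. Both sets contain exactly the same values in a different order, so any summary table is identical for both, yet one degrades steadily while the other peaks and recovers.

```python
from statistics import mean, pstdev

# Hypothetical response times in seconds, listed in the order they were measured.
degrading = [1, 2, 3, 4, 5, 6, 7, 8]   # gets slower as the test runs
peak = [1, 3, 6, 8, 7, 5, 4, 2]        # climbs to a peak, then recovers

def summary(samples):
    """Return the order-independent figures a summary table would show."""
    return (min(samples), max(samples), mean(samples), pstdev(samples))

def half_shift(samples):
    """Mean of the second half minus mean of the first half (a crude trend)."""
    middle = len(samples) // 2
    return mean(samples[middle:]) - mean(samples[:middle])

# The summaries match exactly, so a table cannot tell the two sets apart.
print(summary(degrading) == summary(peak))

# The crude trend differs: only the first set ends slower than it started.
print(half_shift(degrading), half_shift(peak))
```

Plotting the samples against time would show the same difference at a glance, which is the reason to look at the graph before the summary.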
Interested in data analysis? We invite you to read this article: Data Observability: What It Is and Why It Matters.
Understanding Key Performance Testing Metrics


Okay, now let’s see what each of the metrics for performance testing means, as a key part of your performance testing process. Let’s do it one by one, checking their importance for analysis purposes.
Additionally, evaluating key performance indicators helps confirm that your tests align with business objectives.
Average Response Time
To calculate the average, simply add up all the values of the samples and then divide that number by the number of samples.
Let’s say we do this and our resulting average response time is 3 seconds. The problem with this is that, at face value, it gives you a false sense that all response times are about three seconds, some a little more and some a little less, but that might not be the case.
Imagine we had three samples, the first two with a response time of one second, the third with a response time of seven:
1 + 1 + 7 = 9
9/3 = 3
This is a very simple example that shows that three very different values could result in an average of three, yet the individual values may not be anywhere close to 3.
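The same arithmetic can be checked with a short Python sketch using the standard `statistics` module. The one-second tolerance used to count samples "near" the average is an arbitrary choice for illustration.

```python
from statistics import mean, median

samples = [1, 1, 7]  # response times in seconds, from the example above

average = mean(samples)

# Count how many samples fall within one second of the average.
# The one-second tolerance is an arbitrary choice for illustration.
near_average = sum(1 for s in samples if abs(s - average) <= 1)

print("average:", average)
print("median:", median(samples))
print("max:", max(samples))
print("samples within 1 s of the average:", near_average)
```

The average is 3, but no sample is within a second of it: the typical request took 1 second and the slowest took 7.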
Fabian Baptista, co-founder and member of Abstracta’s board, made a funny comment related to this:
“If I were to put one hand in a bucket of water at -100 degrees Fahrenheit and another hand in a bucket of burning lava, on average, my hand temperature would be fine, but I’d lose both of my hands.”
So, when analyzing average response time, it’s possible to have a result that’s within the acceptable level, but be careful with the conclusions you reach.
That’s why it is not recommended to define service level agreements (SLAs) using averages; instead, have something like “The service must respond in less than 1 second for 99% of cases.” We’ll see more about this later with the percentile metric.
Don’t miss this Quality Sense Podcast episode about why observability is so relevant in software testing, with Federico Toledo and Lisa Crispin.
Standard Deviation
Standard deviation is a measure of dispersion around the average: how much the values vary from their average, or how far apart they are.
If the value of the standard deviation is small, this indicates that all the values of the samples are close to the average, but if it’s large, then they are far apart and have a greater range.
To understand how to interpret this value, let’s look at a couple of examples.
If all the values are equal, then the standard deviation is 0. If there are very scattered values, for example, consider 9 samples with values from 1 to 9 (1, 2, 3, 4, 5, 6, 7, 8, 9), the standard deviation is ~ 2.6 (you can use this online calculator to calculate it).
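Both cases can be reproduced with Python's standard `statistics` module, as a quick sketch. The figure of about 2.6 corresponds to the population standard deviation (`pstdev`); the sample standard deviation (`stdev`, which divides by n - 1) is slightly larger for the same data, so it is worth knowing which one your tool reports.

```python
from statistics import pstdev, stdev

identical = [3, 3, 3, 3]                  # all values equal
scattered = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # the nine samples from the text

# Population standard deviation: divides by the number of samples.
print(pstdev(identical))            # 0.0
print(round(pstdev(scattered), 2))  # 2.58

# Sample standard deviation: divides by (number of samples - 1).
print(round(stdev(scattered), 2))   # 2.74
```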
Although the average becomes much more useful as a metric when you also include the standard deviation, percentile values are more useful still.
Percentiles: p90, p95, and p99


Understanding percentiles is crucial for accurate system performance analysis.
Let’s break down what percentiles like the 90th percentile (p90), p95, and p99 mean and how they can be used effectively in performance tests.
What Are Percentiles?
A percentile is a valuable performance testing metric: it is the value below which a given percentage of the sample is found. This helps in understanding the distribution of response times and other performance metrics. The related percentile rank works the other way around: given a value, it tells you what percentage of the sample falls below it.
The 90th Percentile (p90)
The 90th percentile (p90) indicates that 90% of the sample values are below this threshold, while the remaining 10% are above it. This is useful for describing what the majority of users experience and for checking that most users get acceptable response times.
The 95th Percentile (p95)
The 95th percentile (p95) shows that 95% of the sample values fall below this threshold, with the remaining 5% above it. This provides a more stringent measure of performance, checking that nearly all users have a good experience.
The 99th Percentile (p99)
The 99th percentile (p99) represents the value below which 99% of the sample falls, leaving only 1% above it. This is particularly useful for identifying outliers and verifying that even near-worst-case response times are within acceptable limits.
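As an illustration, the following Python sketch computes p90, p95, and p99 with the nearest-rank method, which is one common definition among several. The `percentile` helper and the sample data are hypothetical; this is not how any particular tool implements it.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need a non-empty sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

# Hypothetical response times in milliseconds: 10, 20, ..., 1000 (100 samples).
samples = [10 * i for i in range(1, 101)]

for p in (90, 95, 99):
    print(f"p{p}: {percentile(samples, p)} ms")
```

With these 100 evenly spaced samples, p90 is 900 ms, p95 is 950 ms, and p99 is 990 ms.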
Why Use Multiple Percentiles?
Analyzing multiple percentile values, such as p90, p95, and p99, provides a more detailed view of system performance. Tools like JMeter and Gatling include these in their reports. Keep in mind that percentiles can be calculated using different methods, so the values may differ slightly from one tool to another. This comprehensive approach helps in identifying performance bottlenecks and understanding how the system behaves under various conditions.
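To see how much the calculation method can matter, this sketch compares three ways of computing p90 on the same small, hypothetical sample: a hand-written nearest-rank helper (`nearest_rank` is an illustrative name) and the two interpolation methods offered by `statistics.quantiles` in the Python standard library. It does not claim to reproduce what JMeter or Gatling do internally.

```python
import math
from statistics import quantiles

def nearest_rank(samples, p):
    """Nearest-rank percentile: always returns one of the observed samples."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical response times in milliseconds, with one large outlier.
samples = [100, 120, 130, 150, 170, 200, 250, 400, 900, 3000]

# quantiles(..., n=100) returns 99 cut points; index 89 is the 90th percentile.
p90_nearest = nearest_rank(samples, 90)
p90_inclusive = quantiles(samples, n=100, method="inclusive")[89]
p90_exclusive = quantiles(samples, n=100, method="exclusive")[89]

print("nearest rank:", p90_nearest)
print("inclusive interpolation:", p90_inclusive)
print("exclusive interpolation:", p90_exclusive)
```

On a small sample with an outlier the three results are far apart. With thousands of samples the differences usually shrink, but it is still worth comparing percentiles only when they come from the same tool and method.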
Complementing Percentiles with Other Metrics
To get a complete picture, teams should complement percentiles with other metrics like minimum, maximum, and average values. For example:
- p100: Represents the maximum value (100% of the data is below this value).
- p50: Known as the median (50% of the data is below and 50% is above).
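A quick check of both statements, as a sketch using the nearest-rank definition and a hypothetical sample with an odd number of values (the `percentile` helper is illustrative):

```python
import math
from statistics import mean, median

def percentile(samples, p):
    """Nearest-rank percentile (one common definition among several)."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical response times in milliseconds (seven samples, one outlier).
samples = [120, 95, 300, 110, 2500, 130, 105]

print(percentile(samples, 100) == max(samples))    # p100 is the maximum
print(percentile(samples, 50) == median(samples))  # p50 is the median
print(mean(samples), median(samples))              # the outlier pulls the average up
```

With an even number of samples, nearest-rank p50 returns the lower of the two middle values, while the conventional median averages them, so the two can differ slightly.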
Establishing Acceptance Criteria
Teams often use percentiles to establish acceptance criteria. For instance, requiring that 90% of the sample be below a certain value rules out outliers while still checking for consistent system performance. The same approach can help surface issues related to memory utilization and other critical performance aspects.
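A criterion of this kind can be checked directly against the raw samples. The sketch below is one possible way to do it; `meets_criterion`, its parameters, and the data are hypothetical names and values chosen for illustration.

```python
def meets_criterion(samples, limit, required_percent):
    """Return True when at least required_percent of the samples are
    strictly below limit (same unit as the samples)."""
    if not samples:
        raise ValueError("no samples to evaluate")
    below = sum(1 for s in samples if s < limit)
    # Integer comparison avoids floating-point rounding at the boundary.
    return below * 100 >= required_percent * len(samples)

# Hypothetical response times in seconds: nine fast requests and one slow one.
samples = [0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 0.7, 0.9, 4.0]

print(meets_criterion(samples, limit=1.0, required_percent=90))  # True
print(meets_criterion(samples, limit=1.0, required_percent=99))  # False
```

The same run passes a 90% criterion and fails a 99% one, which shows how much the chosen percentage decides what counts as acceptable.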
By focusing on the percentile score, teams can make more informed decisions and optimize their performance tests to achieve better results.
Need help with percentiles? Explore our Performance Testing Services! Our global client reviews on Clutch speak for themselves.
Careful with Performance Testing Metrics


Before you go analyzing your next software performance testing results, make sure to remember these key considerations:
1. Avoid Averages
Never consider the average as “the” value to pay attention to, since it can be deceiving, as it often hides important information.
2. Check Standard Deviation
Consider the standard deviation to know just how useful the average is: the higher the standard deviation, the less meaningful the average.
3. Use Percentile Values
Observe the percentile values and define acceptance criteria based on that, keeping in mind that if you select the 90th percentile, you’re basically saying, “I don’t care if 10% of my users experience bad response times”.
If you are interested in learning about the best continuous performance testing practices for improving your system’s performance, we invite you to read this article.
4. Overall System Health
Metrics like server CPU usage and, in certain cases, memory usage can provide insight into how efficiently the system is processing requests.
What other considerations and performance issues do you have when analyzing performance testing metrics? Let us know!
Looking for a free performance load-testing tool? Get to know JMeter DSL, one of the leading open-source performance testing tools for .NET developers.
The Importance of Web Page Performance in Testing
When you conduct performance testing, it’s essential to evaluate not only backend infrastructure but also how a web page handles load conditions. Metrics like page load time, time to first byte, and rendering speed are critical key performance indicators that directly affect user experience.
Overlooking these factors can lead to misinterpretation of test results and hidden performance bottlenecks that impact real users.
FAQs About Performance Testing Metrics


What Are Performance Metrics in Performance Testing?
Performance metrics in performance testing are measurements used to assess how a system performs under different conditions. They help identify bottlenecks, optimize resource usage, and improve overall system reliability. These metrics provide insights into system responsiveness, stability, and scalability.
What Are the Three Types of Performance Metrics?
In performance testing, three essential types of metrics provide a comprehensive view of system performance:
- Average Response Time – Measures how long, on average, the system takes to respond to requests.
- Standard Deviation – Indicates how much response times fluctuate from the average, revealing performance variability.
- Percentiles (p90, p95, p99) – Show the response time thresholds for most users, helping identify outliers and ensure a good user experience.
These three metrics work together to analyze performance trends, detect bottlenecks, and improve system efficiency.
What Are the 3 Key Criteria for Performance Testing?
The three key criteria for performance testing, based on essential metrics, are:
- Response Consistency – Evaluated through standard deviation, which shows how stable response times are.
- User Experience – Measured using percentiles (p90, p95, p99) to validate if most users get acceptable response times.
- System Stability – Assessed by analyzing trends in average response time over time.
How We Can Help You
With over 16 years of experience and a global presence, Abstracta is a leading technology solutions company with offices in the United States, Chile, Colombia, and Uruguay. We specialize in software development, AI-driven innovations & copilots, and end-to-end software testing services.
We believe that actively building ties propels us further. That’s why we’ve forged robust partnerships with industry leaders like Microsoft, Datadog, Tricentis, Perforce, and Saucelabs, empowering us to incorporate cutting-edge technologies.
By helping organizations like BBVA, Santander, Bantotal, Shutterfly, EsSalud, Heartflow, GeneXus, CA Technologies, and Singularity University, we have created an agile partnership model for seamlessly insourcing, outsourcing, or augmenting pre-existing teams.
Our holistic approach enables us to support you across the entire software development life cycle.
Visit our Performance Testing Services page! Contact us to improve your system’s performance.


Follow us on Linkedin & X to be part of our community!
Federico Toledo