{"id":10081,"date":"2018-07-30T19:54:18","date_gmt":"2018-07-30T19:54:18","guid":{"rendered":"http:\/\/abstracta.us\/blog\/?p=10081"},"modified":"2025-05-05T21:23:42","modified_gmt":"2025-05-05T21:23:42","slug":"3-challenges-effective-performance-testing-continuous-integration","status":"publish","type":"post","link":"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/","title":{"rendered":"3 Challenges to Effective Performance Testing in Continuous Integration"},"content":{"rendered":"<h1><span style=\"font-weight: 400;\">Performance testing in CI is a must. Here&#8217;s what to take into account from day one.<\/span><\/h1>\n<p><span style=\"font-weight: 400;\">Recently I gave a talk at Agile Testing Days USA in Boston, my first time attending this testing conference and I was extremely pleased with the event, the things I learned, and the people I had the opportunity to meet. For example, I got to know some of my Agile testing role models: Lisa Crispin, Janet Gregory, and Rob Sabourin, among others. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this post, I\u2019m going to share what I presented in my talk, <\/span><a href=\"https:\/\/agiletestingdays.com\/\"><span style=\"font-weight: 400;\">Challenges to Effective Performance Testing in Continuous Integration<\/span><\/a><span style=\"font-weight: 400;\">. 
I\u2019ll address three main challenges you may face and my recommendations for how to tackle them in order to implement a successful performance testing strategy in CI.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Lets_Cover_the_Basics\"><\/span><strong><span style=\"color: #00b674;\">Let&#8217;s Cover the Basics<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3><span class=\"ez-toc-section\" id=\"First_Off_What_is_Performance_Testing\"><\/span><strong><span style=\"color: #3056a2;\">First Off, What is Performance Testing?<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">(If you\u2019re already familiar with performance testing and the concept of continuous integration, go ahead and skip this part!)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Computer performance is characterized by the amount of useful work accomplished by a computer system, considering the response times and resources it uses. We cannot look only at how fast it is, because a system that is very fast but uses 100% of CPU is not performant. 
Therefore, we need to check both the front and back end; the user experience (the load speed I perceive, the velocity) and the servers\u2019 \u201cfeelings\u201d (How stressed do they become under load?).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Also, if we only pay attention to response times, we would only see the symptoms of poor performance, but what <\/span><b>we want to find are the root causes in order to identify bottlenecks and then ways to eliminate them.<\/b><\/p>\n<p><span style=\"font-weight: 400;\">We perform load tests that simulate load (virtual users) in order to detect:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Bottlenecks (What\u2019s the \u201cthinner\u201d part of the system that causes the holdup in traffic?)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The breaking point (After what amount of load does the system degrade severely?)<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-17-at-3.23.03-PM-min.png\"><img decoding=\"async\" class=\"aligncenter wp-image-10082\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-17-at-3.23.03-PM-min.png\" alt=\"performance bottlenecks\" width=\"500\" height=\"261\" \/><\/a><\/p>\n<p><b>So, to put it simply, performance tests consist of load simulation and measurement to detect bottlenecks and the point at which a system crashes under load.<\/b><\/p>\n<p><span style=\"font-weight: 400;\">You can read about the different types of performance tests <\/span><a href=\"https:\/\/abstracta.us\/blog\/performance-testing\/types-of-performance-tests\/\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"What_is_Continuous_Integration_CI\"><\/span><strong><span style=\"color: #3056a2;\">What is Continuous Integration 
(CI)?<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Continuous integration (CI) is a practice wherein each developer\u2019s code is merged at least once per day. A stable code repository is maintained from which anyone can start working on a change. The build is automated with various automatic checks, such as code quality reviews, unit tests, etc. In this case, we will be analyzing a good way to include performance tests in the mix. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are several advantages to CI, including the ability to ship code more frequently and faster to users with less risk. You can read more about how software testing looks in CI (aka testing \u201cshifts left\u201d) <\/span><a href=\"https:\/\/abstracta.us\/blog\/agile-testing\/not-convinced-yet-shift-left-testing\/\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Now that we understand performance testing and CI, let\u2019s dive into the three challenges that you will face when getting started and my recommendations for each, based on my actual experiences in the field.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Challenge_1_Picking_the_Right_Tools\"><\/span><strong><span style=\"color: #00b674;\">Challenge #1: Picking the Right Tools<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">There are several tools for load simulation that you can pick from. 
The ones that I have used the most, and are perhaps my favorite, include <\/span><a href=\"http:\/\/jmeter.apache.org\"><span style=\"font-weight: 400;\">JMeter<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"http:\/\/gettaurus.org\"><span style=\"font-weight: 400;\">Taurus<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"http:\/\/gatling.io\"><span style=\"font-weight: 400;\">Gatling<\/span><\/a><span style=\"font-weight: 400;\">, and <\/span><a href=\"http:\/\/blazemeter.com\"><span style=\"font-weight: 400;\">BlazeMeter<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-17-at-3.29.43-PM-min.png\"><img decoding=\"async\" class=\"aligncenter wp-image-10083\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-17-at-3.29.43-PM-min.png\" alt=\"load testing tools\" width=\"500\" height=\"212\" \/><\/a><\/p>\n<h3><\/h3>\n<h3><span class=\"ez-toc-section\" id=\"How_Load_Simulation_Tools_Work\"><\/span><strong><span style=\"color: #3056a2;\">How Load Simulation Tools Work<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Load testing tools execute hundreds of threads simulating the actions that real users would execute, and for that reason they are called &#8220;virtual users.&#8221; We could think of them as a robot that executes a test case.<\/span><\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/robotizar-300x192-min.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-10084\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/robotizar-300x192-min.png\" alt=\"load simulator\" width=\"300\" height=\"192\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">These tools run from machines dedicated to the test. 
The tools generally let you use several machines in a master-slave scheme to distribute the load, executing, for example, 500 users from each machine. The main objective of this load distribution system is to avoid overloading these machines. If they were overloaded, the test would become invalid, since there would be problems with simulating the load or collecting the response time data.<\/span><\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/simulacio\u0301n-768x536-min.png\"><img decoding=\"async\" class=\"aligncenter wp-image-10085\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/simulacio\u0301n-768x536-min.png\" alt=\"load simulation\" width=\"500\" height=\"349\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">The graphic above shows how, from a few machines, you can run a large amount of load (virtual users) on a system. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">What about the test scripts? Performance test scripts use the<\/span><i><span style=\"font-weight: 400;\"> record and playback <\/span><\/i><span style=\"font-weight: 400;\">approach, but the recording is not done at the graphical user interface level (as for functional tests), but rather at the communication protocol level. In a performance test, multiple users will be simulated from the same machine, so it\u2019s not feasible to open a large number of browsers and simulate the actions on them. 
Doing this at the protocol level can be said to &#8220;save resources,&#8221; since in the case of the HTTP protocol, what we will have are multiple threads that send and receive text over a network connection and do not have to render graphic elements or anything else that requires further processing.<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">To prepare a script we proceed in a similar way to functional test scripts, but this time the tool, instead of capturing the interactions between the user and the browser, captures the traffic that flows between the client and the server (HTTP or whichever protocol is to be simulated; the client can be the browser, a native app, or whatever you want to simulate). Therefore, to automate, you\u2019ll need knowledge of automation tools and communication protocols (HTTP, SIP, SOAP, ISO8583, etc.).<\/span><\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/recording-scripts-768x400-min.png\"><img decoding=\"async\" class=\"aligncenter wp-image-10086\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/recording-scripts-768x400-min.png\" alt=\"recording scripts\" width=\"500\" height=\"260\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">The image above shows what happens when the test is recorded with tools like JMeter. Basically, a proxy is opened that captures the traffic between the client and the server. The resulting script will be a sequence of commands in a language provided by the tool used, in which requests and responses are handled according to the communication protocol.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Once the script is recorded, it is then necessary to make a series of adjustments to these elements so that it is reproducible. 
These scripts will be executed by concurrent users (virtual users, remember?), so it does not make sense, for example, for all of them to connect with the same username and password, or to run the same search. In that case, the application would perform better than it would with varied values, because caches at both the database level and the web application server level would affect the response times.<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">The effort associated with making this type of adjustment will depend on the tool used and the application under test. Sometimes it\u2019s necessary to adjust cookies or variables, because those obtained when recording are no longer valid and must be unique per user. Parameters must be set, for both the header and the body of the message, etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Note that with Taurus, we can specify a test with a very simple YAML file and it will generate the code to run the test with JMeter, Gatling or around 10 other tools. You can combine different existing tests that you have in different tools, and aggregate the results. For this reason, I find it really innovative and useful.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So, which tool is going to be best for CI?<\/span><\/p>\n<p><b>TAKEAWAY: <\/b><span style=\"font-weight: 400;\">Make sure to choose a tool that is <\/span><i><span style=\"font-weight: 400;\">CI friendly<\/span><\/i><span style=\"font-weight: 400;\">, which allows you to easily compare versions and detect differences using your Git repository manager (or whichever version control system you use). Gatling and Taurus are ideal options. Normally, I\u2019m a proponent of JMeter, but its tests are stored as XML files. For CI, I prefer something based on code or simple text, making it all the easier to compare and detect differences. 
<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Challenge_2_Testing_Strategy\"><\/span><strong><span style=\"color: #00b674;\">Challenge #2:\u00a0Testing Strategy<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Defining a strategy can be a very broad task, since there are many aspects to consider. I like the definition below: <\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">\u201cThe design of a testing strategy is primarily a process of identifying and prioritizing project risks and deciding what actions to take to mitigate them.\u201d &#8211;\u00a0<\/span><\/i><span style=\"font-weight: 400;\">Continuous Delivery (Jez Humble &amp; David Farley)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">I\u2019m going to focus on just some aspects of a performance test strategy, <\/span><b>particularly, what to run, when and where.<\/b><span style=\"font-weight: 400;\"> What I want to show you is a model, to be used as just that: a reference. It has been useful to me in some cases, so I hope it\u2019s useful for you, or at least gives you some ideas for creating your own model that fits your needs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This model is based on the idea of continuous testing, where we want to run tests early and frequently. But we cannot test everything early and all the time. So, that\u2019s when a model becomes useful. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You may have heard of the <\/span><a href=\"https:\/\/abstracta.us\/blog\/test-automation\/best-testing-practices-for-agile-teams-the-automation-pyramid\/\"><span style=\"font-weight: 400;\">test automation pyramid<\/span><\/a><span style=\"font-weight: 400;\">. Well, I decided to create a pyramid for exploratory performance tests:<\/span><\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-30-at-3.51.36-PM.png\"><img decoding=\"async\" class=\"aligncenter wp-image-10096 size-full\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-30-at-3.51.36-PM.png\" alt=\"performance testing pyramid\" width=\"451\" height=\"373\" \/><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"The_What\"><\/span><strong><span style=\"color: #3056a2;\">The What<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Let\u2019s take a look at the layers:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><b>End-to-end (E2E): <\/b><span style=\"font-weight: 400;\">This involves typical load testing, simulating real users, as I explained at the beginning of this post.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Integration: <\/b><span style=\"font-weight: 400;\">We also want to test the services (assuming that we are talking about a very typical architecture where you have an API, REST, etc.) because we want to know how the services impact one another.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Unit: <\/b><span style=\"font-weight: 400;\">We also want to test everything separately. If an integration test fails (because it detects a degradation), how can we know if the problem is that one service is impacting another, or if one has problems of its own? 
That\u2019s why we test each service individually first.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The pyramid represents graphically not only the number of tests that you will have at each level, but also how often you should run them, considering a CI approach.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In any layer, we could have an <\/span><b>exploratory testing approach<\/b><span style=\"font-weight: 400;\">. That means deciding what to test based on the previous test result: we try different test configurations, analyze the results, and, based on what we get, decide how to continue. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">If we think of the <\/span><a href=\"https:\/\/lisacrispin.com\/2011\/11\/08\/using-the-agile-testing-quadrants\/\"><span style=\"font-weight: 400;\">agile testing quadrants<\/span><\/a><span style=\"font-weight: 400;\"> (shown below), we are covering different quadrants here.<\/span><\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Agile-Testing-Quadrants-min.png\"><img decoding=\"async\" class=\"aligncenter wp-image-10088\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Agile-Testing-Quadrants-min.png\" alt=\"agile testing quadrants\" width=\"500\" height=\"366\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">The end-to-end tests have a focus on the business, but the others are supporting the team, with some regression testing automated. 
The exploration critiques the product, and of course, everything is technology facing, because performance testing is highly technical.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From here on, I want to put the focus on <\/span><b>regression testing<\/b><span style=\"font-weight: 400;\">, because this is what you have in the CI pipeline.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To compare and contrast the top of the pyramid with the bottom, at the bottom you have tests with the following characteristics: <\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">They run at the unit or API level, so they are less costly<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">You can run them more frequently (every day) because they need less time to run, analyze and debug <\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">You can run them earlier: you don\u2019t need to wait until all the layers are done and can start as soon as you have some endpoints ready<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The problem is that their response times do not correspond to those that real users will experience<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">On the other hand, as you move up the pyramid, you have tests that:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Allow you to validate performance for real users, since you model user behavior and involve infrastructure similar to that of production<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Provide more realistic results as a consequence<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Are more costly to prepare, maintain and analyze<\/span><\/li>\n<\/ul>\n<p><a 
href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-23-at-2.33.25-PM-min.png\"><img decoding=\"async\" class=\"aligncenter wp-image-10089\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-23-at-2.33.25-PM-min.png\" alt=\"top vs bottom of performance testing pyramid\" width=\"500\" height=\"357\" \/><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"The_When\"><\/span><strong><span style=\"color: #3056a2;\">The When<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">So that covers what to run. Next is the when: how often should we run each kind of test?<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In my opinion, it\u2019s a good idea to run the end-to-end tests every couple of weeks (depending on how hard they are to maintain), the integration tests once a week, and the unit tests daily. This is just an example that represents the relationship between the frequencies at each layer.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"The_Where\"><\/span><strong><span style=\"color: #3056a2;\">The Where<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Next, what type of test environment do we need for each? <\/span><\/p>\n<p><span style=\"font-weight: 400;\">For end-to-end testing, we need an environment similar to production, to reduce risks (the more differences between the testing environment and the production environment, the more performance-related risks remain). To test the services individually, we could and should use a scaled-down infrastructure. That way, we can test each endpoint close to its limit without using so many machines for the load simulation. 
It\u2019s also going to be easier to analyze the results and debug.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In both cases, it is essential to have an exclusive environment so that the results are more or less predictable. They won\u2019t risk being affected by someone else running something at the same time, causing the response times to soar, generating false positives, and wasting a lot of time.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Last but not least, I must admit that I have less experience with the integration tests, so I cannot recommend a frequency for those. Please fill in the blanks by leaving a comment and tell me about your experience! <\/span><\/p>\n<p><b>TAKEAWAY: <\/b><span style=\"font-weight: 400;\">This model represented by the pyramid is useful for thinking about the different aspects of your testing strategy. There are more aspects to consider when defining a strategy, but try to see if the model helps you to think about them. One example is the challenge that follows: scenarios and assertions (acceptance criteria).<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Challenge_3_Model_Scenario_and_Assertions\"><\/span><strong><span style=\"color: #00b674;\">Challenge #3:\u00a0Model Scenario and Assertions<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">This challenge is knowing which type of load tests we want to run every day, how to define the load, and which assertions to add in order to reach our goal: <\/span><b>detect a degradation as soon as possible (early feedback).<\/b><\/p>\n<p><span style=\"font-weight: 400;\">When we talk about end-to-end tests, our load scenario and the assertions in the load simulation are based on the business needs (e.g., how many users will be buying products during the next Black Friday on our servers, and what kind of user experience do we want for them?). 
There is a great series of articles by Scott Barber, \u201cUser experience, not metrics\u201d, that explains how to design a performance test in order to achieve this; I learned most of what I do today from it (the articles are more than 10 years old, but still relevant). <\/span><\/p>\n<p><span style=\"font-weight: 400;\">A different set of questions arises when talking about the bottom layer of the performance testing pyramid: How many threads (or virtual users) do we simulate when we run tests at the API level in a scaled-down environment? What is the \u201cexpected performance\u201d to validate? <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let\u2019s dig into both considerations.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Detect_Performance_Degradations_When_They_Happen\"><\/span><strong><span style=\"color: #3056a2;\">Detect Performance Degradations When They Happen<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">As these tests will not be verifying the user experience, we need a different focus. Our strategy is to define a benchmark for a specific version of the system, and then run tests continuously in order to detect a degradation. In a way, this assumes that the performance you have today in your infrastructure is okay, and that you do not want to miss any change that negatively affects it. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">For that, the tests should have acceptance criteria (assertions) as tight as possible, so that at the slightest system regression, before any negative impact occurs, some validation fails and flags the problem. 
This should be done in terms of response times and throughput.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To visualize the problem we are solving, see the following graph:<\/span><\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/trend-degradation-not-discovered-768x311-min-1.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-10090\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/trend-degradation-not-discovered-768x311-min-1.png\" alt=\"load test\" width=\"768\" height=\"311\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">The graph shows a degradation in the requests per second, but the test keeps passing and raises no alert about it, because the acceptance criterion (the green line) is too lenient. It only verifies that the throughput is greater than 45 req\/sec, so when the throughput dropped from 250 to 150 req\/sec, no one was likely to notice.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"The_Load_Pushing_the_System_to_its_Capacity\"><\/span><strong><span style=\"color: #3056a2;\">The Load Pushing the System to its Capacity<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><b>Here\u2019s a way to define the load and the assertions:\u00a0<\/b><\/p>\n<p><a href=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-23-at-2.34.08-PM-min.png\"><img decoding=\"async\" class=\"aligncenter wp-image-10091\" src=\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-23-at-2.34.08-PM-min.png\" alt=\"response times and throughput vs virtual users\" width=\"500\" height=\"318\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Let\u2019s look at the story behind the graph above. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Say we run a first test with 100 virtual users (VU): there are zero crashes, the response times are below 100 ms (at least at the 95th percentile), and the throughput is 50 TPS (transactions per second). Then we run the test with 200 virtual users and, again, there are no crashes; response times are at 115 ms and the throughput at 75 TPS. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Great, it\u2019s scaling.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If we continue on this path of testing, we will at some point reach a load at which the throughput no longer increases. We will also start getting errors (exceeding 1%, for example), which would indicate that we are saturating the server. From then on, response times are likely to increase significantly, because some process, connection or other resource somewhere in the system\u2019s architecture begins to get stuck.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Following this scenario, imagine we get to 350 concurrent users and we have a throughput of 150 TPS, with 130 ms response times and 0% errors. If we move to 400 virtual users, the throughput is still about 150 TPS, and with 450 users, the throughput is even lower than 150 TPS.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There is a concept called \u201cthe knee\u201d that we encounter with this type of testing, as illustrated in this graph. 
We expect the TPS to increase when we increase the number of concurrent users. If it doesn\u2019t, it\u2019s because we are exceeding the system\u2019s capacity.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So, at the end of this experiment, we arrived at this scenario and these assertions:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Load: 350 threads<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Assertions<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">&lt; 1% error<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">P95 response time &lt; 130 ms + 10% (143 ms)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Throughput &gt;= 150 TPS \u2013 10% (135 TPS)<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Then, the test that we will schedule to run frequently is the one that executes 350 users, with an expected error rate below 1% and expected response times below 130 * 1.1 = 143 ms (this way we give ourselves a margin of 10%, maybe 20%). Last but not least, we have to assert the throughput, verifying that we are reaching at least 150 * 0.9 = 135 TPS.<\/span><\/p>\n<p><b>By running these tests after each new change in the code repository, we can detect the exact moment at which something degrades the performance.<\/b><\/p>\n<p><b>TAKEAWAY:<\/b><span style=\"font-weight: 400;\"> The takeaway here is the model itself, to have it as a reference, but also: think about it. Design a mechanism for defining the load and the assertions that works for you. 
Do a retrospective and adjust the process.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Dont_Over-Engineer_your_CI\"><\/span><strong><span style=\"color: #00b674;\">Don&#8217;t Over-Engineer your CI<\/span><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">We just looked at the concept of performance testing in continuous integration and the three main challenges of getting started: choosing the right tool, defining the testing strategy, and defining the test scenarios and assertions. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">But, of course, that is not all! <\/span><span style=\"font-weight: 400;\">There are yet more questions to ask, for example:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Who will create the tests?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Which test cases do we need? How should we prioritize them?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Who will maintain them?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Who will analyze the results?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">What will we do when we find an issue?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Where and how can we find more information (monitoring, correlated data, logs, etc.)?<\/span><\/li>\n<\/ul>\n<p><strong>And if there is one more thing I want you to take away from this Agile Testing Days talk turned blog post, it is:\u00a0<\/strong><b>Don\u2019t over-engineer your CI! <\/b><\/p>\n<p><span style=\"font-weight: 400;\">We as engineers love to over-engineer, but don\u2019t try to turn this into rocket science. Keep it simple. The tester\u2019s focus is to provide value. 
And no, we will never reach a perfect product, but we can think of testing\u2019s goal as utopia, or a horizon we are trying to reach\u2026 we\u2019ll never get there, but it will keep us moving towards it! <\/span><\/p>\n<p><span style=\"font-weight: 400;\">I invite you to try this methodology for adding performance tests to your CI pipeline. Please, <\/span><a href=\"https:\/\/abstracta.us\/contact-us\"><span style=\"font-weight: 400;\">contact us<\/span><\/a><span style=\"font-weight: 400;\"> if you want to exchange ideas, if you have feedback, questions, or are looking for more help.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can also access the slideshare from my Agile Testing Days USA talk <\/span><a href=\"https:\/\/www.slideshare.net\/FedericoToledo\/challenges-to-effective-performance-testing-in-ci?qid=d7dffb6a-5b7e-47a8-a126-ed8d7879dc7b&amp;v=&amp;b=&amp;from_search=1\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><\/p>\n<hr \/>\n<h3><span class=\"ez-toc-section\" id=\"Recommended_for_You\"><\/span><strong>Recommended for You<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"http:\/\/abstracta.us\/blog\/performance-testing\/shutterfly-continuous-performance-testing\/\"><span style=\"font-weight: 400;\">How Shutterfly Masters Continuous Performance Testing<\/span><\/a><br \/>\n<a href=\"http:\/\/abstracta.us\/blog\/performance-testing\/gatling-tool-review-performance-tests-written-scala\/\"><span style=\"font-weight: 400;\">Gatling Tool Review for Performance Tests (Written in Scala)<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Performance testing in CI is a must. Here&#8217;s what to take into account from day one. 
Recently I gave a talk at Agile Testing Days USA in Boston, my first time attending this testing conference and I was extremely pleased with the event, the things&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[89,222,276,87,163],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v14.0.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>3 Challenges to Performance Testing in Continuous Integration | Abstracta Blog<\/title>\n<meta name=\"description\" content=\"Performance testing in continuous integration is a must. If you&#039;re going to invest the effort and money, you should take these things into account.\" \/>\n<meta name=\"robots\" content=\"index, follow\" \/>\n<meta name=\"googlebot\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta name=\"bingbot\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"3 Challenges to Performance Testing in Continuous Integration | Abstracta Blog\" \/>\n<meta property=\"og:description\" content=\"Performance testing in continuous integration is a must. 
If you&#039;re going to invest the effort and money, you should take these things into account.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/\" \/>\n<meta property=\"og:site_name\" content=\"Blog about AI-powered quality engineering for teams building complex software | Abstracta\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/AbstractaQA\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-07-30T19:54:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-05T21:23:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/challenges-performance-testing-ci-min.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"560\" \/>\n\t<meta property=\"og:image:height\" content=\"315\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@fltoledo\" \/>\n<meta name=\"twitter:site\" content=\"@AbstractaUS\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/abstracta.us\/blog\/#website\",\"url\":\"https:\/\/abstracta.us\/blog\/\",\"name\":\"Blog about AI-powered quality engineering for teams building complex software | Abstracta\",\"description\":\"AI-powered quality engineering\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/abstracta.us\/blog\/?s={search_term_string}\",\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"http:\/\/abstracta.us\/wp-content\/uploads\/2018\/07\/Screen-Shot-2018-07-17-at-3.23.03-PM-min.png\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/#webpage\",\"url\":\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/\",\"name\":\"3 Challenges to Performance Testing in Continuous Integration | Abstracta Blog\",\"isPartOf\":{\"@id\":\"https:\/\/abstracta.us\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/#primaryimage\"},\"datePublished\":\"2018-07-30T19:54:18+00:00\",\"dateModified\":\"2025-05-05T21:23:42+00:00\",\"author\":{\"@id\":\"https:\/\/abstracta.us\/blog\/#\/schema\/person\/7421e539de0357d3adb0c69ed469a1c2\"},\"description\":\"Performance testing in continuous integration is a must. 
If you're going to invest the effort and money, you should take these things into account.\",\"breadcrumb\":{\"@id\":\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/abstracta.us\/blog\/\",\"url\":\"https:\/\/abstracta.us\/blog\/\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/abstracta.us\/blog\/performance-testing\/\",\"url\":\"https:\/\/abstracta.us\/blog\/performance-testing\/\",\"name\":\"Performance Testing\"}},{\"@type\":\"ListItem\",\"position\":3,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/\",\"url\":\"https:\/\/abstracta.us\/blog\/performance-testing\/3-challenges-effective-performance-testing-continuous-integration\/\",\"name\":\"3 Challenges to Effective Performance Testing in Continuous Integration\"}}]},{\"@type\":[\"Person\"],\"@id\":\"https:\/\/abstracta.us\/blog\/#\/schema\/person\/7421e539de0357d3adb0c69ed469a1c2\",\"name\":\"Federico Toledo, Chief Quality Officer at Abstracta\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/abstracta.us\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/6de7ec6536c4028b5c02ad4ec1b9af0d?s=96&d=blank&r=g\",\"caption\":\"Federico Toledo, Chief Quality Officer at Abstracta\"},\"description\":\"Co-founder and COO of 
Abstracta\",\"sameAs\":[\"https:\/\/twitter.com\/fltoledo\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/posts\/10081"}],"collection":[{"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/comments?post=10081"}],"version-history":[{"count":12,"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/posts\/10081\/revisions"}],"predecessor-version":[{"id":12060,"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/posts\/10081\/revisions\/12060"}],"wp:attachment":[{"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/media?parent=10081"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/categories?post=10081"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/abstracta.us\/blog\/wp-json\/wp\/v2\/tags?post=10081"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}