Software Performance Testing Fallacies Part 2
Software Performance Testing Fallacies Continued
Continuing with the previous post about software performance testing fallacies, we will take another look at common ways in which many of us are mistaken about performance testing. We will discuss some that are very common in testing, technology, and infrastructure management.
What is one of the main advantages of modern languages like Java or C#? We don’t need explicit memory management, thanks to the garbage collection implemented by the runtimes on which applications are executed. So, Java and C# guarantee that there will be no memory leaks. I regret to say that this is FALSE. There are two main situations in which memory leaks can still occur. One is keeping references to structures we no longer use, such as a list to which we always add elements but from which we never remove any. The other is a bug in the .NET Framework or the JVM itself, for example in a string concatenation operation (this was specifically the case in one of our projects). This is why it is important to pay attention to memory management and to use tools that detect these situations, such as profilers for Java.
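The first situation, a collection that only ever grows, can be sketched as follows. This is a minimal illustration and not code from any real project: the `RequestLog` class, its methods, and the limit of 100 entries are all hypothetical names and values chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical class for illustration only.
class RequestLog {
    // Leaky variant: every entry stays reachable from this list,
    // so the garbage collector can never reclaim it.
    private final List<String> leaky = new ArrayList<>();

    // Bounded variant: the oldest entry is evicted once the limit is reached.
    private final Deque<String> bounded = new ArrayDeque<>();
    private final int limit;

    RequestLog(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    void record(String entry) {
        leaky.add(entry); // grows without bound
        if (bounded.size() == limit) {
            bounded.removeFirst(); // drop the oldest reference
        }
        bounded.addLast(entry);
    }

    int leakySize() { return leaky.size(); }

    int boundedSize() { return bounded.size(); }

    public static void main(String[] args) {
        RequestLog log = new RequestLog(100);
        for (int i = 0; i < 10_000; i++) {
            log.record("request-" + i);
        }
        System.out.println("leaky=" + log.leakySize() + " bounded=" + log.boundedSize());
    }
}
```

In a long-running server the leaky list would keep growing with every request, while the bounded one stays at its limit. A profiler would show the first pattern as a collection whose retained size increases steadily over time.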
Closely related to this is another fallacy regarding memory management on these platforms: the belief that by explicitly invoking the Garbage Collector (GC) from code, memory is freed earlier and the system performs better. But this is not so, mainly because an explicitly invoked collection typically blocks the execution of the application code while it cleans up the memory that is no longer in use. And that task takes time. Platforms such as the JVM and the .NET Framework include algorithms that optimize memory cleanup. It is not done at just any time, but rather in specific situations that, after much experimentation, have proven to give the most efficient results. There are ways of adjusting the behavior of these algorithms, since different configurations have been shown to yield different results depending on the type of application and the type of use. But this is done by adjusting configuration parameters, not by explicitly invoking the GC from code.
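Before adjusting any configuration parameter, it helps to measure what the collector is actually doing. The sketch below reads the counters that the JVM exposes through the standard `java.lang.management` API. The `GcStats` class and its method names are hypothetical, and the idea of comparing these numbers between runs with different settings is our suggestion, not a prescribed procedure.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only.
class GcStats {
    // Sums the collection counts reported by every collector in this JVM.
    // getCollectionCount() returns -1 when the value is undefined, so
    // negative values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Sums the approximate accumulated collection time in milliseconds.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("collections=" + totalCollections()
                + " timeMs=" + totalCollectionMillis());
    }
}
```

Running the same load test under two different configurations and comparing these totals gives a first indication of whether a change helped, without a single explicit GC call in the application code.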
We also saw that it’s commonplace to think that using a cache is an easy and quick way of optimizing an application. We mistakenly think we will improve our application simply by caching some SQL queries, without even evaluating other options first. A cache is something quite delicate, and if we are not careful, it can even add more points of failure. When the cache is lost, a non-optimized operation behind it could lead to instability. We must also carry out a functional verification of the application to confirm that configuring the cache in a certain way does not change the system’s expected behavior, because cached queries will not always return fully up-to-date data. And we must measure the cache hit/miss ratio, as well as the refresh and update costs, in order to determine which queries bring more benefits than complications.
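Measuring the hit/miss ratio can be as simple as counting lookups. The following is a minimal sketch, assuming a single-threaded caller and an in-memory map. `CountingCache` and its loader function are hypothetical names, and a real cache library would normally provide its own statistics.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper for illustration only; not thread-safe.
class CountingCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader; // stands in for the expensive query
    private long hits;
    private long misses;

    CountingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        if (store.containsKey(key)) {
            hits++;
            return store.get(key);
        }
        misses++;
        V value = loader.apply(key); // the cost we pay on every miss
        store.put(key, value);
        return value;
    }

    // Drops an entry, for example when the underlying data changes.
    void invalidate(K key) {
        store.remove(key);
    }

    long hits() { return hits; }

    long misses() { return misses; }

    // Fraction of lookups served from the cache; 0.0 before any lookup.
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A query whose hit ratio stays low under a realistic load, or whose entries must be invalidated almost as often as they are read, is a poor candidate for caching.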
Test Design Fallacies
When we test the parts, we are testing the whole. This fallacy has been examined by Jerry Weinberg, and it’s clear to us that the assertion is incorrect. In performance testing, we cannot build a simulation that ignores the overall processes or operations and focuses only on isolated tests, with no concurrency between different activities. We might test a “money withdrawal” operation with 1,000 users and, in another test, a “money deposit” operation with 1,000 users, and since we don’t expect to reach more than 500 concurrent users in total, we will be satisfied. With this method, we are not guaranteeing that the two transactions will run concurrently without problems. If they block each other in some way, then a total of 10 users might already cause serious problems in response times.
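The difference between testing each operation alone and testing them together can be sketched with a toy example. Everything here is hypothetical: the `Account` class, the fixed amount of 10, and the use of a single `synchronized` lock as the shared resource that both operations compete for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical classes for illustration only.
class MixedWorkload {
    static class Account {
        private long balance;

        Account(long initial) { balance = initial; }

        // Both operations take the same lock, so they can block each other.
        synchronized void deposit(long amount) { balance += amount; }

        synchronized boolean withdraw(long amount) {
            if (balance < amount) {
                return false;
            }
            balance -= amount;
            return true;
        }

        synchronized long balance() { return balance; }
    }

    // Submits `users` deposits and `users` withdrawals to the same pool so
    // that both operations run concurrently; returns elapsed milliseconds.
    static long runMixed(Account account, int users, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < users; i++) {
            tasks.add(() -> { account.deposit(10); return null; });
            tasks.add(() -> { account.withdraw(10); return null; });
        }
        long start = System.nanoTime();
        try {
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get(); // surfaces any failure inside a task
            }
        } catch (Exception e) {
            throw new IllegalStateException("mixed workload failed", e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

In a real system the shared resource is more likely to be a database row or table lock than a Java monitor, but the principle is the same: the contention only appears when both operations are in the same test.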
There are two almost opposite performance testing fallacies here, and what we consider “correct” or “more adequate” is to find the middle ground. There are those who believe that when we try hundreds of users doing “something,” probably all of them doing the same thing, we are implementing a good test. And there are those who consider it necessary to include every functionality that the system is capable of executing. Neither position is valid. The first is too simplistic and leaves aside numerous situations that might cause problems. The second has an associated cost that is too high, as it is focused on running a “perfect test.” We must aim to implement the best test possible within the time and resources available, so as to prevent as many problems as possible. This also includes (when time and resources allow) simulating cases that might occur in reality, such as a cleared cache, a disconnected server, or noise on the network. It’s clear that we cannot test every possible situation, but we cannot ignore these things either.
The Neighbor Fallacy
We tend to think that applications that others use without complications will not cause us any problems when we decide to use them ourselves. Why should we carry out performance tests when our neighbor has been using the same product and it works just fine for them? As mentioned in our previous post, we shouldn’t extrapolate results. Even when the system works with a given load of users, we must tune it, adjust the platform, ensure the correct configuration of the various components, and make sure it performs well under the use that our own users will make of it.
There is a belief that the systems where we will encounter problems are those developed by programmers who make mistakes or lack experience. Some managers believe that their engineers are all quite experienced and so there is no need to test performance, especially if they have developed large-scale systems before without any issues. Of course, it will work out fine. Right? No. We must not forget that programming is a complex activity, and regardless of how experienced we may be, it is common to make mistakes. This is even more so when we develop systems that are exposed to multiple concurrent users (which is the most common case), in which performance is affected by so many variables. In those cases, we must consider the environment, the platform, the virtual machine, the shared resources, hardware failures, and so on.
Another problem caused by excessive confidence appears during the execution of performance tests. In general, it is recommended that tests be carried out incrementally. We start by executing 20% of the total load that we want to simulate, in order to attack the most serious problems first, and then scale up the load as the incidents encountered are resolved. But there are those who prefer to work with the full load from the very start in order to find the problems faster. The trouble with that approach is that all the problems come up at once, making it harder to focus on each one of them and find efficient solutions.
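The incremental approach can be expressed as a simple schedule of load stages. This is a minimal sketch: the `LoadRamp` class and its `stages` method are hypothetical names, and the 20% step is just the starting point mentioned above, not a fixed rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class LoadRamp {
    // Returns the number of simulated users for each stage, increasing by
    // stepPercent of totalUsers each time and always ending at totalUsers.
    static List<Integer> stages(int totalUsers, int stepPercent) {
        if (totalUsers <= 0 || stepPercent <= 0 || stepPercent > 100) {
            throw new IllegalArgumentException("invalid load parameters");
        }
        List<Integer> result = new ArrayList<>();
        for (int percent = stepPercent; percent < 100; percent += stepPercent) {
            long users = (long) totalUsers * percent / 100;
            result.add((int) Math.max(1, users)); // never schedule an empty stage
        }
        result.add(totalUsers); // the final stage is always the full load
        return result;
    }

    public static void main(String[] args) {
        System.out.println(stages(1000, 20));
    }
}
```

For a target of 1,000 users and a 20% step, this gives stages of 200, 400, 600, 800, and 1,000 users. The idea is to move to the next stage only after the incidents found in the current one have been resolved.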
Since we are uncovering fallacies related to the tests themselves, another common fallacy that generates high costs is thinking that changes to the application under test that are not visible on screen will not affect the automation, meaning the scripts that we have prepared to simulate the system’s load. In general, when changes are introduced in the system, even when they don’t involve the graphical interface, we must verify that the test scripts we have prepared continue to correctly simulate what an actual user does. Otherwise, we could arrive at the wrong conclusions. When parameters change, for example the way in which certain data is processed or the order in which methods are invoked, the simulated behavior may no longer match the actions that a real user performs on the system being tested.
That wraps up our posts about software performance testing fallacies. Can you think of any others that you have come across? Let us know!
For more performance testing fallacies, read part one.