Quality Sense Podcast: Andy Knight – Test Automation at Scale

Welcome to another episode of the Quality Sense podcast! Today I bring you an interview I had with Andy Knight. Also known as the Automation Panda, he describes himself as a software enthusiast with a specialty in test automation and behavior-driven development, and he is a developer advocate at Applitools. In this episode, we discuss test automation at scale: how to keep up with changes in the system, how to maintain the quality of your test suite, and more.

Episode Highlights

  • How Andy got into software testing and, specifically, test automation.
  • Doing test automation at scale.
  • How expensive (and different overall) it can be to make a mistake on a 10-person team versus a 500-person team.
  • How to distribute tasks within a test automation team, and why it matters.

Episode Transcript

Federico:

Hello, Andrew. How are you doing today?

Andy:

Doing great. How about you, Fede?

Federico:

I’m fine. Thank you. And I’m really happy to have the opportunity to have this conversation with you and learn about your experience. The first question I would like to ask is to learn a little bit more about you: please introduce yourself, tell me how you ended up working in software testing, and what you are doing today.

Andy:

Sure. Sounds good. So I’m the Automation Panda, Andy Knight. You can follow me on Twitter @AutomationPanda. And like you said, I am a software enthusiast, you could say, with a specialty in test automation and behavior-driven development. Right now I’m a developer advocate at Applitools, where I show people all the awesome ways they can do testing and help them learn better things, and also do that on behalf of the company. So I learn from the community and take that back into Applitools to help them make better stuff.

I’ve been doing this for over a decade. Gosh, how did I get into software testing? Well, when I graduated college… Or I should back up. When I was in high school, I discovered programming and I knew I wanted to get into software development somehow. So I graduated high school, went to RIT, got degrees in computer science, and coming out of college, all they do is turn you out to be a developer, right? They don’t have any sort of nuance as to what kind of career path you could take; either you’re a developer or a professor, right?

So I landed in the industry and I ended up just picking up testing-oriented type responsibilities. In my first internship with IBM, my first summer there, I was doing software testing. I was doing automation before it was cool. Then later in 2011, when I joined NetApp, I joined a QA team where my whole job role was test automation and that kind of sealed the deal. So I’ve been doing that ever since. For roughly the first decade I was an individual contributor engineer, in the trenches doing the testing, doing the automation. It was only last November that I joined Applitools and became a developer advocate, which is a slightly different kind of role, but still in the same space.

Federico:

It’s amazing how your first experience marked a lot of your career, right?

Andy:

Mm-hmm (affirmative). Mm-hmm (affirmative). Indeed.

Federico:

Yesterday I read an article you shared, one you wrote some time ago about how to mentor testers. It’s a very good one.

Andy:

Oh, thank you.

Federico:

And also it’s really important, because this is something that we typically do in our team, and I found many perspectives very interesting, especially those typical software engineering techniques or practices that I believe most developers follow. But when we do testing, I’ve seen that not all people take those into account, like code reviews or even static code analysis, using SonarQube or something like this, in order to keep learning and improving your coding skills. Because if you are doing test automation, you are coding and you have to be good at it, right?

Andy:

Exactly. Mm-hmm (affirmative).

Federico:

And how you presented different ideas to mentor and to coach and help others to grow in test automation, I think it’s fantastic.

Andy:

Great. Thank you. Yeah. I mean, way back when I was starting at IBM and then later at NetApp, we didn’t have resources for teaching you how to do testing. We didn’t have a Test Automation University. I mean, I think at that time there was a small community and people were starting to give conference talks and all, but it wasn’t a big thing like it is now. And so I didn’t really have a mentor in automation. What I had was a broken hand-me-down project, and I had to learn it all the hard way. And that was painful, and I don’t want anyone to have to suffer through that again.

Federico:

Yeah, yeah, yeah. So the main topic we wanted to discuss today is related to doing test automation, but at a large scale. It’s not only automating the first couple of test cases; it’s what happens when you need to scale that solution. So maybe to start talking about it, what do you think are the main differences when doing test automation at scale?

Andy:

Hmm, that’s a great question. One of the hardest challenges of doing test automation at scale is keeping tests running reliably, that maintenance burden, because we all know test automation can be rather fragile, right? There are lots of race conditions, there are lots of opportunities for things to break, either because a locator wasn’t that great or the product underneath changed, or there was just some inappropriate waiting condition that hit an edge case. And the more tests you have, the greater the chances that something is going to go boom.

Federico:

Yeah.

Andy:

And while I’ve seen teams be successful on a small scale, like up to 100 different tests, they can maintain those: “Oh, if something breaks, it’s not a big deal.” But when you have 1,000 tests, 5,000 tests, or you’re running literally continuously around the clock in these high-scale environments, failures start popping out left and right, and teams don’t know how to keep up with that. And as soon as that happens, everybody devalues the test automation. They’re like, “Well, this is just noise. This isn’t helping, this isn’t protecting anything.” And so it becomes defeating, right? The whole point of having those tests is to be a layer of protection, but they get ignored or, even worse, people bypass them or remove them or even argue, “Why are we investing so much time and effort in this? We should stop.”

Federico:

Yeah.

Andy:

It becomes a maintenance nightmare to keep up with. That’s why, if you’re going to be doing high-scale test automation, you really, really need to double down on those good development practices. Like you mentioned, doing code reviews for every single change that goes into a test code repository. Or things like static code analysis, things like following design patterns. All those sorts of aspects help a team congeal around good practice, and that keeps the tests maintainable and reliable.

Federico:

Yesterday, I was talking with one of the leaders. We are trying to fix some automation they were working on. And the challenge they have now is that many test cases were designed in a way that if something fails and you want to see what was happening or try to review if it’s a flaky test or if there is really a bug, the test case takes so long because it’s a very large end-to-end flow. So the whole test suite is built like that. So it’s really hard to distinguish what is information and what is just noise, as you said, and that’s related to bad design decisions, right?

Andy:

Mm-hmm (affirmative). Mm-hmm (affirmative).

Federico:

Maybe when you have a couple of test cases you can manage to… you can afford the time, the extra time you have to spend in order to review those test cases. But there is an inflection point, I would say, where that problem becomes impossible to keep up with, right? You would spend more time analyzing problems than taking advantage of the information you have, right?

Andy:

Yeah, yeah, no, what you’ve said touches on something that I’ve seen so many times, and it’s so painful. And it’s a misunderstanding of the type of test case that is appropriate for automation, right? When we go back in time, when we look at when there was no automation, when everything had to be tested manually, and there would be teams and even organizations of manual testers just beating up on large applications, right? I’m not talking about tiny things. I’m talking about big things where you would have these large test phases, you would have test leaders and managers go in and schedule a subset of tests that the team would run manually for a week, right?

In that world, it made sense to have long, big end-to-end test cases that covered multiple behaviors, as what I call a grand tour, because manual testing is inherently slow. One person can do only one thing at a time, and so it was advantageous for a manual tester to hit as many things as they could on their way as they navigated through the app, because that was the most efficient use of their time. And so you ended up with very long test cases.

I remember looking in things like Quality Center or Application Lifecycle Management, whatever those tools were, these big test case repositories, and seeing test cases being like 57 steps long, or 118 steps long. And when I got assigned these back in the day, I was like, “Dang, oh, this is painful. I don’t want this. Here we go.” Right? But when teams started moving to automation, the first wave was this idea of, “Let’s just take our manual tests and convert them to automated tests,” with no thought of what makes an appropriate type of test for automation.

So they took that 57-step-long test case procedure, and said, “Let’s just automate all the steps.” And so now you have these organizations that have paid teams of automation engineers to do this for years without thinking, and now they’re stuck with these horribly long test cases that fail at step 17. “Oh, the whole test case blew up. Well, what behavior was it?” “I don’t know. It was in there somewhere.” Right? They didn’t think to break apart individual behaviors, to have shorter atomic tests where, when one fails, it’s one behavior, and based on the name of that test, you know exactly what the bug is. Right? So I feel that pain, I feel that pain.

Federico:

I never thought about that, why is this happening? Because you’re right, when you are doing manual testing it’s like, you are working with some context of the application and probably the most efficient way to do it is to continue working on the same context. But in automation, you can skip all the steps just with an API call and that’s much more efficient. Yeah. Right.

Andy:

Yep. Yep. Once you can start to identify individual behaviors, break them apart, like you said, jump in: “Hey. Instead of going to a form and uploading data in a webpage, hit the API,” right? Boom, it’s there. And then you can have that nice isolated thing, test that individually. And technically, it’s still end-to-end because you’re hitting the whole stack, right? You need the full application running and deployed. That’s what end-to-end means to me, not that you take 57 steps through this gigantic workflow.
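
[Editor’s note: a minimal sketch of the atomic pattern Andy describes, in Python. The endpoint, locator, and URLs are hypothetical; the point is the shape: seed state through the API, then verify one behavior through the UI.]

```python
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

BASE_URL = "https://app.example.com"  # hypothetical app under test

def test_new_opportunity_appears_in_list():
    # Arrange: create the record through the API instead of 57 UI steps.
    response = requests.post(
        f"{BASE_URL}/api/opportunities",
        json={"name": "Panda Loan", "amount": 1000},
        timeout=10,
    )
    assert response.status_code == 201
    opportunity_id = response.json()["id"]

    # Act and assert: exercise only the one behavior under test in the UI.
    driver = webdriver.Chrome()
    try:
        driver.get(f"{BASE_URL}/opportunities")
        row = driver.find_element(By.ID, f"opportunity-{opportunity_id}")
        assert "Panda Loan" in row.text
    finally:
        driver.quit()
```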

Federico:

It’s not end-to-end from the beginning of the journey to the end of the journey, it’s end-to-end considering the different layers, exactly. Yeah.

Andy:

Mm-hmm (affirmative). That’s my view.

Federico:

Yeah. Perfect. And another question, is there any particularity regarding the organization of the team?

Andy:

Hmm, that’s a great question too. Man, so I personally don’t have a preference for how teams are structured; whatever works, works. What I’ve seen is the pendulum swing between having separated QA or testing as a specific role, with people dedicated to that, who are siloed and do the stuff, to the pendulum swinging the other way, where we don’t even have testers, developers just pick up testing responsibilities, and it’s all purely agile and cross-functional and stuff. What I can say are the pros and cons I’ve seen of each. And so again, there’s no right answer, it’s just a matter of pick your poison.

So when you have that separation of role and of team structure between a developer and a tester, that’s what they do. The challenges I see with that are that cross-team communication can be a little difficult, right? Because if they’re reporting to different managers, it’s going to kind of go up and over and down sometimes. And so things can fall out of step. It can also foster a bit of an us-versus-them mentality, which is unhealthy. It can also foster this feeling, especially from the developer side, of saying, “Well, the testers are going to take care of it; therefore, I don’t need to be concerned about that. I’m just going to crunch down and code and not worry about checking my things or being as careful. It’ll just go through QA and that’ll catch it for me.”

It’s like, “No.” So those are some of the issues there. The advantages, though, are that you get that optimization within specialization, right? Because you’re going to have people who know how to develop tests, whether for manual testing, whether to go off and be exploratory, or whether to build up a test automation system, which are skills most developers don’t have out of the box, unless they’ve been in that type of role or situation before. I mean, most developers could probably pull down a framework like Cypress or something and start throwing some tests in there, but could they engineer a high-scale test automation system of 2,000 different tests running 5,000 to 10,000 iterations on a daily basis at 50 to 100 times parallel scale? Maybe not.

Maybe they could get there, but out of the box, that’s not what they’re set up to do, if that makes sense. So those are the pros and cons that way. With a fully integrated team where there are no dedicated roles and the developer does everything, the advantage is that developers have to be more mindful, and so they have to pick up these things and learn them. And that’s very much a shift-left thing, so problems get resolved a little earlier. But again, they may lack that type of fine-touch, specialized skill. And you could end up in a place where you have the blind leading the blind, or they might max out at a certain level before things start falling apart. Pros and cons.

Federico:

Yeah, of course. But is there anything particular to the way you distribute responsibilities within a… let’s imagine we are talking about this automation team, are different people responsible for implementing the test scripts?

Andy:

Okay. Okay. There are definitely different kinds of tasks to be done. Different kinds of work items. For me, the main split comes down to, is somebody going to be automating a test case or are they going to be working on something at a solution or framework level, right? And I would divvy those up based on what are people already working on? What are their interests? What are their skills? For example, if there’s this new feature coming up and somebody’s already been heads down working in that area, it’s like, “Okay, you’re going to get the test to automate for that when the time comes,” right?

Whereas I’ve found, for example, myself being in more of the builder and leader kind of position on these kinds of teams, I’m typically the one who’s doing those cross-cutting concerns of, “Hey, we need to improve logging in our framework,” right? Okay, let me go in there and tear things apart. Those would be the two kinds of coding tasks I would see as separate.

Federico:

Yeah. Because I’ve seen teams working in two different ways, I would say. On the one hand, you have those who have a test automation backlog, so-

Andy:

Oh, okay.

Federico:

And then you have others who are more embedded into the team and they participate. They are more like a tester, but maybe they also work on manual testing tasks, and I’m doing air quotes for “manual,” and they also automate, but they are involved even in the design of the features, right?

Andy:

Yeah. Yeah.

Federico:

It’s really different than if you have a separate team whose focus is like, okay, we need to automate this for the regression, for the smoke tests or something. And here is the backlog, and I, as a tester, as a “manual” tester again, can design and ask you to help me with automating these steps, right? So what do you think?

Andy:

So yeah, you touch on something big there, because what you’re describing there is what I would call in-sprint automation, where you would have that tester role on some sort of development team, whether they’re matrixed in or who they report to or whatever. But let’s just presume an agile framework where every two to three or four weeks they’re doing a sprint, right? And there are deliverables that come out of that sprint. And so that person on that team is responsible for the day-to-day and sprint-to-sprint activities of making sure it’s good. They’re in the trenches, they’re learning the features, they’re trying to automate the tests or manually run the tests while that’s happening.

Versus, okay, you have some sort of team that’s not embedded in that, that’s more kind of separate: “Oh gosh, we’re already behind the eight ball. We have this huge backlog. This big technical backlog that we’re just trying to burn down to get some sort of basic regression that we can run nightly,” right? Two very different ways of operating. I mean, I’ve been at both. In my previous company we had both, right? There was a giant backlog from years of not having automation that most of our team’s effort went into trying to chew down. But then we would have someone on our team: “Okay, you’re going to go on this front-end team and you’re going to sit there and listen. And when it’s appropriate, you’re going to chime in. You’re going to suggest we should automate this and that, and you’re going to be there.” So you can work both, but you’re right, it’s two very different modes of operation.

Federico:

Ah, probably in a situation where you decide to start automating and you have an existing system with a history of different features, maybe you need both: to get involved in the new features or changes, and also to automate the regression of the existing ones.

Andy:

Correct. Indeed.

Federico:

What about test environments? Any particularity, when we are talking about doing automation at scale?

Andy:

Yes. Yes. So the dream, the dream, if you can achieve it, would be that you run your tests in independent, isolated environments, right? That you don’t have any other fingers in the pie, you don’t have any interruptions coming in. And I think the easiest way to achieve that is if you have your application containerized, because you can just take the container from the most recent build and throw it out there, on your Kubernetes cluster or wherever you want to run it, whatever cloud service. And that’s the other thing: cloud services, wonderful. Put it out there on whatever cloud service, it deploys very quickly, and then you just target your tests against that.

And so when you want to run your tests, you grab the container, you deploy it, you wait a minute or two, you hit your barrage of test suites against it. You get the results, you look at the results, if everything’s good, you just tear down, clean up, no big loss. If there are failures, you can freeze it and then go and poke in and investigate. And when you’re done, okay, then you can destroy it. That is the ideal.
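
[Editor’s note: a sketch of that spin-up/test/tear-down flow using the Docker SDK for Python. The image name, port, and health endpoint are hypothetical.]

```python
import time

import docker
import requests

client = docker.from_env()
# Deploy the most recent build of the app as a throwaway container.
container = client.containers.run(
    "myapp:latest",
    detach=True,
    ports={"8080/tcp": 8080},
)
try:
    # Wait a minute or two for the app to come up before testing it.
    for _ in range(30):
        try:
            if requests.get("http://localhost:8080/health", timeout=1).ok:
                break
        except requests.ConnectionError:
            pass
        time.sleep(2)
    # ... run the test suite against http://localhost:8080 here ...
finally:
    # On failures you might skip this teardown and keep the environment
    # frozen for investigation, as Andy suggests.
    container.stop()
    container.remove()
```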

Unfortunately, many of us don’t live in a perfect world where we don’t have our app containerized. Some of us also live in a world where it is far beyond the capability of the team to just go containerize the app. And it’s not because of any deficiency of developers or testers, it’s sometimes that’s just the way things are. It could be bureaucratic, it could be historic, it could be whatever reason. So if you can’t get the containerized version of the app where you can have those spin of those independent environments and such, you’re probably going to be stuck in some sort of shared environment system.

And this is pretty common. I would say this is probably more common than the other case, especially the larger the application you’re testing, right? Because if you need a test suite of thousands of end-to-end tests, you’re probably not testing an application that fits in one container, right? It’s probably multiple things; it’s a huge front-end and multiple databases and caching and services out the wazoo, and it’s this big whole thing.

So in that case, what you’re stuck with is some sort of shared environment, but you still need to have some sense of isolation, right? If that means, okay, there’s a development environment and a staging environment and a production environment, maybe you want to come in and set up a test environment that’s basically a clone of dev, to say, “Okay, every time a code change is committed and it kicks off CI, not only are we going to deploy internally here, we’re also going to deploy to this test environment, and that’s going to kick off our end-to-end tests around the clock,” right?

So that way developers can still go muck around in their environment, staging is still somewhat preserved, but the tests are still running continuously in a non-interruptible way, and you restrict who has access to that environment so that, again, nobody comes and mucks it up. Because yeah, that’s a big frustration: somebody comes in and tweaks a setting, and all of a sudden all of your tests blow up because they were expecting A here instead of B. Right?

Federico:

Yeah. But not only configuration, also the test data. Because even in the ideal world that you were mentioning, are you talking about using mocks for the database, or using a database that maybe is mutating as you run the tests? Because this is a very common challenge, right?

Andy:

Yep. Yep. So honestly, for an end-to-end test, I don’t like mocking data, and when I say mocking data, I mean putting a fake service in there that just spits out dummy data. I like having data inside of a database that is actually being used, because it’s more real, whether that data is meaningful in any sense or not, right? If you have a tool that just pukes out fake generated data, but it’s in the right structure, okay, good enough, right? Another way you could do it is to kind of siphon data from production and scrub it and sanitize it and all that. That’s another way. But you’re right, because anytime you have isolated environments, you kind of have cloned data, and it’s like, “Okay, whatever happens there doesn’t matter,” right?

But when you have shared data, now you have issues: if you run your tests in parallel, they could collide on that shared data. You could have a collision. I mean, my previous company was in banking, so it’s like, okay, you create an opportunity, meaning a loan application; one test creates it, another test comes and deletes it. Ooh, that’s not good. If that’s the case, what I find is that at the test design level, you need to make sure that you’re avoiding data collisions, such that every test creates the opportunities that it uses, so that no other test comes in and wipes that out, right?

Or if you absolutely have to have a situation where you’re modifying shared data in a way that would affect other tests, those tests cannot run in parallel; they have to be separated. And then you raise the question of testing risk, “Is it worth it to maintain that separately?” Maybe not. Maybe we call that a testing risk and we keep moving on. And we have too much value to chase, let’s chase this value instead of that value, because this is easy and this is hard and they deliver the same level of value, right? You start making those trade-off decisions. Yeah, testing is a nightmare.
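
[Editor’s note: a minimal pytest sketch of the test-design rule Andy describes: every test creates the opportunity it uses, so parallel tests never collide on shared records. The api_client fixture and endpoints are hypothetical.]

```python
import uuid

import pytest

@pytest.fixture
def opportunity(api_client):
    # A unique name per test avoids clashes on a shared environment.
    payload = {"name": f"loan-app-{uuid.uuid4()}", "amount": 5000}
    record = api_client.post("/api/opportunities", json=payload).json()
    yield record
    # Tear down only what this test created; other tests are unaffected.
    api_client.delete(f"/api/opportunities/{record['id']}")

def test_approve_opportunity(api_client, opportunity):
    response = api_client.post(f"/api/opportunities/{opportunity['id']}/approve")
    assert response.status_code == 200
```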

Federico:

Yeah, maybe analyzing how stable they are and whether there is a real problem, because in some cases we have faced this: we run the same suite in the test environment, with the test data that we generated, okay? So we have more or less control over the test data that we are going to find. But then we used to run the same test suite against the staging environment, where they cloned the production database from the day before.

Andy:

Oh.

Federico:

We didn’t know what information was going to be there exactly, so we prepared some SQL queries to gather the data that would feed the test cases before they started to run, right? But again, if I need an opportunity in a certain state, maybe yesterday there wasn’t any new opportunity in that particular state where I want to test something, right?

Andy:

Yep.

Federico:

So there are a lot of things related to the test design. We also have to take into account how the data is going to be used and how that can have an impact on other test cases. It’s beautiful.

Andy:

Yeah. Another strategy that I’ve used that was helpful… Because you touched on the question of, “Do I pre-populate the database with explicit data that is in a format I know, and the tests are hard-coded to expect that type of thing, like by record name or by ID, or do you just peek inside the database and do a search to find one that matches your criteria?”

Federico:

Exactly.

Andy:

And that’s what I would call discovery. We had that issue because, between different environments, the database is filled with certain records that, based on the design of the app, we as users could not go in and modify; it was kind of pre-populated based on backend processes and stuff. And it’s like, well, we need to test behaviors that hinge on those types of things; we have no control over setting them ourselves, they’re just populated there from whatever.

So we had no choice but to have mechanisms that would go through the API, get a whole blob of data, and just go one record at a time, searching: “Does this match what I need? Does this match what I need?” And when we find it, “Okay, now we can use this. Here’s the ID for the right one,” right? And you hope and pray that it’s in the database to begin with. That it’s not like you go through the entire record set and there’s nothing that matches. Right? Oof. Things you got to do.
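
[Editor’s note: a sketch of that discovery mechanism: pull the records through the API and scan for one in the state the test needs, failing loudly if nothing matches. Endpoint and field names are hypothetical.]

```python
import requests

def find_opportunity(base_url, status, min_amount):
    """Return the first opportunity matching the criteria, or fail loudly."""
    records = requests.get(f"{base_url}/api/opportunities", timeout=10).json()
    for record in records:
        if record["status"] == status and record["amount"] >= min_amount:
            return record
    # If the record set has no match, a clear failure here beats a
    # confusing error later in the test.
    raise AssertionError(
        f"No opportunity found with status={status!r} and amount>={min_amount}"
    )
```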

Federico:

Yeah. Yeah. Otherwise, you can create your data for those situations where you don’t find anything, right?

Andy:

If you have the capability, if you have the back door.

Federico:

Of course. Yeah. You’re right. So moving on to some other aspects that I consider important in these situations. What do you do, or what considerations do we need to take into account, with the test results? Because we are probably talking about hundreds of tests; we need to summarize information, but we also need to be able to explore all the details of a particular test execution, right?

Andy:

Mm-hmm (affirmative). Yes.

Federico:

Any suggestions there?

Andy:

Hmm. The first suggestion is to know your audience, because what you’re going to tell your fellow teammates is going to be different from what you tell your manager, which is going to be different from what your manager is going to tell your VP, right? The VP doesn’t want to see logs. If they see logs, you’re fired, because you’re screwing up your job, right? Whereas if the team member doesn’t get logs, the results to them are useless because they can’t figure out what’s wrong. The way I like to handle or structure that is to have reports that can go from high-level to low-level, right? High-level is like percentage passed, names of test cases that failed, and that’s it, right? Done, right?

But from there, to be able to expand or link to, “Okay, failing test case; here are the steps of the whole test case, here’s exactly where it failed, here are links to the screenshots, the videos, the artifacts,” all that kind of stuff. If you’re using a cloud-based tool, for example, let’s say you’re doing WebDriver-based tests and you’re using Sauce Labs or LambdaTest or even Applitools’ Ultrafast Grid, right? Any of those things, those portals and dashboards that are online are going to have all that nice and ready for you. You go to the test and boom, it gives you everything. If you have to DIY locally, that’s something that you’re going to have to develop, just in the same sense that you develop your test cases, right? Depending on your framework, it could be good or bad. I hate to say it; most frameworks out of the box have pretty lousy reports. Some are better than others; SpecFlow, for example, is actually really, really good. They have their SpecFlow+ Runner report, which I think is great, and then their LivingDoc report, which is really, really nice for that high-level view of stuff.

But yeah, I mean, you may need to put a little bit of effort into drawing out the types of things that you find are useful to your team, right? Whether that’s embedding artifact links to things like screenshots or gathering… Like, if you’re doing API requests, capturing the request and response that was made automatically, and then putting that in, let’s say, a JSON file or something that you can link to from your report to see, “Well, why did I get a 400 Bad Request response?” Boom. That kind of stuff.
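
[Editor’s note: a sketch of capturing a request/response pair as a JSON artifact a report can link to, along the lines Andy describes. The wrapper and paths are hypothetical.]

```python
import json
from pathlib import Path

import requests

ARTIFACTS = Path("artifacts")

def request_with_artifact(method, url, name, **kwargs):
    """Make an HTTP call and save what was sent and received."""
    response = requests.request(method, url, **kwargs)
    ARTIFACTS.mkdir(exist_ok=True)
    # Lets "why did I get a 400?" be answered straight from the report.
    (ARTIFACTS / f"{name}.json").write_text(json.dumps({
        "request": {"method": method, "url": url, "body": kwargs.get("json")},
        "response": {"status": response.status_code, "body": response.text},
    }, indent=2))
    return response
```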

Federico:

Yeah. Of course. Top-down is the approach: first you have a general idea, and it also gives you the possibility to dig into what happened.

Andy:

Yep.

Federico:

Okay. There is another challenge that I typically find when working with a lot of test cases, which is how to keep control of the coverage because there is a moment when you have like hundreds of test cases and there is a new boundary case that you say, “Okay, it would be nice to have this case automated. Do we have it already or not?” Should I explore all the code? Maybe it’s not well organized, so it’s difficult to find where exactly it could be or… How do you manage that challenge?

Andy:

Gosh, yeah, that’s hard and there is no good answer. It’s not that I don’t know a good answer; it’s that a good answer does not yet exist.

Federico:

Okay.

Andy:

Because this is a thing where we need to be careful to separate tests that cover code from tests that cover features, right? Unit tests versus end-to-end tests. And if we’re not careful on this, then our managers and superiors are going to misunderstand and there can be all sorts of problems. When it comes to code coverage, that’s easy, right? Because unit tests are white-box, they directly interact with the source code. It’s tests that test the source code to be… You’re verifying, “Did you write it according to how you wanted,” right? And so those coverages, you check the lines, you check the branches, there’s tools to automate that, I think you mentioned SonarQube, right? Boom. And you’re done and you can pinpoint exactly where it is, and that then tells you exactly where to put it.

When it comes to feature testing, these integration end-to-end tests, that does not exist, right? I mean, some people I’ve seen have tried to say, “Okay, well, you can instrument your build and then run over your end-to-end test and you’ll see the modules that aren’t covered,” I call that bullshit because that again is testing code, it’s not testing features. Features are the behaviors that people use within an application. And so we’re asking for behavior level coverage. What are the things that we’ve covered as far as what users can and can’t, or should and shouldn’t do with our product under test? And it’s very much a heuristic kind of thing.

It’s like, you have to have somebody with that knowledge, that familiarity, that expertise of the product and of the test suites to be able to go in and say, “Oh no, we don’t have that area,” or to have it well organized enough to say, “Okay, well, this is the sub-folder where we keep all of these automated test cases, or even, manually, within a repository, where we keep the test procedures for this area. We don’t have it there.” So there’s an organizational sense that can kind of help you: when you look, if you don’t see it, you don’t have it, and then you either choose to add it or not.

If I can go on about that, that’s why good test organization is very important. I always recommend people organize by feature area or functionality area or behavior, versus organizing by release. I’ve seen teams where their test project is like, “This is 1.1, and this is 1.2, and this is 1.3,” and I’m like, “How do I look for things like the feature of adding new opportunities?” “Well, we got some here and some here and some here.” “No, no, no, no, no. That’s horrible. I can’t search for that.” Right?

But anyway, an alternative approach a team could take in terms of, “How can I judge feature coverage in terms of automation?” If you take the behavior-driven approach and you are shifting left, and you are defining your behaviors in a language like Gherkin and then going in to automate them, and you’re doing activities like example mapping to kind of map out what the things are, you could get a somewhat quantifiable measure of feature coverage like that.

And here’s how it goes. When someone has a story that they want to do, if you take that and you do example mapping on that story, what you’re doing is you’re taking that story, getting the rules defined, which then becomes your acceptance criteria. And you’re getting the examples defined, which then become your test cases, right? By the end of example-mapping, you have those as artifacts; they’re defined. At that point, it’s just a matter of implementation, both in terms of the feature on the developer and the test cases for automation on the tester.

Your form of feature coverage could be, how many of those example cards have automated test cases? Right? Okay, so for this particular story, there were a total of 11 examples that came out and based on a risk-based strategy, we have chosen to automate 7 out of those 11. So we have… Gosh, I should have done 10. Let’s say you had 10. Okay, let’s make it easy math, it’s too early for me. Let’s say you had 10 examples and you decided to automate 7; 7 out of 10, 70% automated coverage.
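
[Editor’s note: the measure Andy describes, spelled out in a tiny sketch.]

```python
# Feature coverage as the share of example cards (from example mapping)
# that have automated tests.
examples_defined = 10   # example cards produced for the story
examples_automated = 7  # examples chosen for automation (risk-based)

print(f"Automated coverage: {examples_automated / examples_defined:.0%}")  # 70%
```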

Federico:

Okay, yes.

Andy:

Based on known behavior. That’s the way that you can kind of quantify it, right? Now, the presupposition there is that you fully mapped it out and there’s nothing missing in your examples, but anyhow. Again, that’s why this is a hard problem. You can kind of judge, but I’ve had VPs ask me this and I’m just like, “Bro, you’re asking for something impossible.”

Federico:

Yeah, and the closest answer to that, what you just proposed, is not something that only depends on the tester. It’s related to how the team, the whole team, is working, right?

Andy:

Exactly. And if the whole team is not working on that process, you can’t do that, so it’s like-

Federico:

And you mentioned there were 10 examples, but how good are those examples? Only if features have enough examples can you consider that as 100%, because otherwise it’s like measuring, “Okay, we have 70% of something that we don’t know if it’s enough, or if it makes sense to have it as a goal,” right?

Andy:

Mm-hmm (affirmative). Mm-hmm (affirmative). Yeah.

Federico:

Okay, cool. So I already asked what you think of using a tool like SonarQube for your test automation. Do you typically use an internal tool, or…?

Andy:

So if I’m automating some sort of unit test or something, absolutely, I’m going to be using some sort of code coverage tool, whether it’s something like SonarQube or, in Python, Coverage.py; just something to have it, and then include that with CI so that you build, you test, you do your static analysis. Linters, yeah, I use them from time to time. I’m not super gung-ho about it. It’s more of, “Okay, I’m working in this IDE and it just slaps that on and it points out some things to me. Cool.” Right? That’s how I use linters.

But then again, I’m also very, very picky in how I style and write my code as well. So I haven’t found that I’ve needed linters as much as I think some other people have needed them, but nevertheless, they’re still useful because it’s… Especially when I’m using a language I’m less familiar with, like if I have to drop into JavaScript or something and say, “Oh, this thing pointed it out to me. Nice. Okay. I’ll fix it. Now I remember for the future.”
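
[Editor’s note: Coverage.py, which Andy mentions, can be driven from its Python API, though in CI it’s usually just `coverage run -m pytest`. A minimal sketch:]

```python
import coverage

cov = coverage.Coverage()
cov.start()

# ... import and run the unit tests here ...

cov.stop()
cov.save()
cov.report()  # prints per-module line coverage to the console
```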

Federico:

Yeah. Especially, I think they are very useful to continue improving your coding skills, right? Because in my experience, for example, I only know how to program in Ruby for test automation, and also some Scala because I used to work with Gatling. So I had to learn some Scala just to work with Gatling, but I cannot program a system with Scala. You know what I mean?

Andy:

Yeah. Yeah.

Federico:

It’s like, having these types of tools, and also an IDE that helps you with some basic stuff, is also a way to continue acquiring some good practices.

Andy:

Oh, yeah.

I mean, you mentioned Ruby and Scala. I had this similar kind of thing happen with C# and .NET, because in my previous two roles I was basically doing C# all day, every day. But before that, I hadn’t really done .NET development. Historically, I had known Java, and Java is very similar to C#, but there are these differences. And so when I first started doing C# day-to-day, I was like, “Okay, let me fumble around, figure this out.” And Visual Studio’s like, “Oh, you should try this. Oh, don’t do it like that, do it like this.” I’m like, “Oh, okay, cool. Will do.” And I learned as I went.

Federico:

Yeah. Yeah. Yeah. And of course, peer reviewing with someone else, with more experience or not, with another programmer or another test automator, I think is another excellent way to continue learning and improving your skills. Excellent.

Andy:

I can’t imagine developing any code, product code, test code, whatever, without code review.

Federico:

Yeah.

Andy:

Every single line, every change should go through a code review. I’ve talked with teams where I’ve asked them things like, “Hey, what do you think is the appropriate percentage of changes or pull requests that you make to undergo code review?” And I’ve had teams say, “Oh, I mean, I think 25% would be a good target.” I’m just, “Oh my God.” Face-palming. “No, no, no, 100%, no questions asked.” Yeah.

Federico:

It has a lot of advantages. It’s not only making sure that your code gets better with another couple of eyes, it’s also sharing knowledge across the team, right? Because, I don’t know, tomorrow I am taking some days off and someone else needs to improve or correct or review something in my code, so there’s a better chance for them to do a good job if they have reviewed it, right?

Andy:

I want to rephrase what you said there. It’s not just about making your code better, it’s about making you better.

Federico:

Exactly. Exactly. Yeah. So I have another big question.

Andy:

Sure.

Federico:

Which is related to the performance of the test suite, because, you know… I mean, when we have hundreds of test cases, they are going to take longer and longer to give you the results. Can you tell us about different strategies to make the test suite run faster?

Andy:

Yes. Oh, yes, I can. So there’s macro and micro. In the big, big picture, if you are running large end-to-end test suites, they will take a long time. I like to call attention to the rule of ones. If we look at the classic test automation pyramid, love it or hate it, the layers still apply: unit tests, integration tests, end-to-end tests. And if we’re talking in a web app context: unit tests, API tests, web UI tests. That’s how they kind of break down.

How long does a typical unit test take? About a millisecond. Order of magnitude, right? Boom. Done. Very fast. How long does an API test take? About one second: you make the request, you wait for the response, you check some stuff; maybe there’s a lot of data, maybe there’s not. The order of magnitude is one second. How long does a typical web UI test take? One minute. Why? Because you have to get the web driver up, you’ve got to wait for the page to load, and you click and you log in and you wait, and then you click on these things and you wait. Right?

Now, sometimes web tests might be shorter, sometimes they might be longer. On average, it’s one millisecond for unit, one second for API, one minute for web UI: the rule of ones. Boom, boom, boom. So how long does it take to run a suite of a thousand unit tests? One second. Who cares? Right? How long does it take to run a suite of a thousand API tests? Not a couple of minutes; more like 16 minutes or so. Okay. How long is it going to take to run a thousand web UI tests? Almost 17 hours.
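
[Editor’s note: the rule-of-ones arithmetic, spelled out. These are order-of-magnitude averages, as Andy stresses, not measurements.]

```python
TESTS = 1000
UNIT_SECONDS = 0.001   # ~1 ms per unit test
API_SECONDS = 1        # ~1 s per API test
WEB_UI_SECONDS = 60    # ~1 min per web UI test

print(f"Unit suite:   {TESTS * UNIT_SECONDS:.0f} second(s)")       # ~1 second
print(f"API suite:    {TESTS * API_SECONDS / 60:.0f} minute(s)")   # ~17 minutes
print(f"Web UI suite: {TESTS * WEB_UI_SECONDS / 3600:.1f} hours")  # ~16.7 hours
```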

This is if you’re running serially, one after the other. 17 hours? That’s two-thirds of the day, right? I mean, even if you wanted to stuff that in as an overnight run, mm-mm (negative), you’re bleeding into the morning. So in the macro sense, one of the takeaways we can get from the rule of ones is that you need to parallelize. You can’t not parallelize at scale. It has to be done, right? Because otherwise you’re literally making trade-offs of, “Well, maybe I split my test suite up into five portions and run a different subset every night of the week.” And that’s not acceptable.

So parallelization is absolutely necessary. And the larger your suite, the higher the scale you need for parallel testing. How do you do that? Within your code, you have to make sure that every single test case is truly independent, right? That means that it’s not colliding on any shared data. That means that it can run by itself, apart from any other test cases. You don’t need to have one test case run first to set up something for the second one. Oh my gosh, how many times have I seen this problem? It’s painful.

Things like, don’t use global variables. Oh my goodness. So many times I’ve seen people in Java or C# write static variables that are not constant, that one test will change and store data in. As soon as you run in parallel, that collides right there. It’s not even in the application under test; that is a programming data collision. Oh my gosh, that prevents you from effectively running parallel tests. That’s why you use dependency injection instead of global variables, every time. Right?
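
[Editor’s note: the collision Andy describes, and the dependency-injection fix, in pytest terms. The UserSession class is hypothetical.]

```python
import pytest

# Anti-pattern: module-level mutable state. Tests running in parallel
# that mutate this will step on each other.
current_user = None

class UserSession:
    """Hypothetical per-test state object."""
    def __init__(self, username: str):
        self.username = username

# Fix: inject fresh state into each test instead of sharing a global.
@pytest.fixture
def session():
    # Every test gets its own instance; nothing is shared between tests.
    return UserSession(username="test-user")

def test_username_is_set(session):
    assert session.username == "test-user"
```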

So you have those issues to resolve, but then there’s also, how do you build the infrastructure to scale? Because okay, on my local machine, you can run roughly one parallel thread per processing core. You’ve got a 4-core machine, you can run up to four tests. And in fact, I’ve found it’s more like three, because there’s other crap going on on your machine, right? And it’s like, okay, so I could run three tests on my machine in parallel. That’s cool, and that’ll cut it down significantly. But if you’re talking about having 2,000 tests and you want to run them in a somewhat continuous way, three ain’t going to cut it. We need like 30, 50, 100 in parallel.

How do we do this? It’s not about scaling up, it’s about scaling out. So you need tools like Selenium Grid that you can set up to scale out; for web app testing, you distribute your WebDriver sessions so that you can do something like 20 in parallel. In my previous company we did between 50 and 100 times parallel with our own in-house Selenium Grid. Everything was in Azure cloud, so we didn’t have any latencies or anything, which is really nice.
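
[Editor’s note: a sketch of scaling out by pointing tests at a Selenium Grid. The grid URL is hypothetical; the parallelism itself would come from the test runner, e.g. `pytest -n 30` with pytest-xdist.]

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
# Each parallel worker opens its own session against the grid, which
# farms it out to whichever node has capacity.
driver = webdriver.Remote(
    command_executor="http://selenium-grid.internal:4444/wd/hub",
    options=options,
)
try:
    driver.get("https://app.example.com")
    # ... one independent, atomic test here ...
finally:
    driver.quit()
```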

If you use cloud-based providers like the ones I mentioned before, typically you’re going to have a slowdown, unless you’re doing something like Applitools Ultrafast Cloud, which is a bit different, so you don’t get the slowdown. But with your traditional ones, you do get a slowdown of like two to four times, which is, “Ooh.” And you pay a lot for that, “Ooh.” Not trying to throw shade, just being real about it, because I’ve used them all and it’s like, “Yeah, that’s what happens.” If you build your own Selenium Grid, you have the opportunity of tuning for faster speed, which we did in my previous company; because it was all in Azure cloud, in the same region and everything, we did not have any slowdown between running locally versus running in that grid.

But then you can scale up to get these large numbers of parallel tests, and if your test code is developed properly, then all the tests are independent, you spread them out on however many sessions you want: 3, 30, 100, and it just goes and comes back and it’s great. That’s a necessity. You need to engineer that type of solution if you’re going to keep up. We were able to run about 1,000 tests in a 15-minute window.

Federico:

Wow.

Andy:

At my previous company. And that was 50 times parallel with Selenium Grid. What we would do is, every time there was a code change in the application code, it would go through unit testing, it would then get deployed to our test environment, and then we would kick off what we called the continuous tests. And that was a set of 1,000 tests, and it would complete in 15 minutes, start to finish. And that included powering on the VMs that ran Selenium Grid, waiting for that to get ready, hitting it with the battery of tests, shutting it down, and publishing reports. Full setup and teardown for the test suite. The test suite itself, just the execution of the tests, was about 8 to 10 minutes.

Federico:

With how many in parallel, you said?

Andy:

50 tests in parallel, a total of about 1,000 tests, and they ran in 8 to 10 minutes.

Federico:

That probably brought some other advantages to the testing, which is that you’re testing for performance at the same time?

Andy:

Oh my goodness. Oh my goodness. Absolutely right. Because it was, in effect, a de facto load test because of the sheer amount of weight we were putting on the system. I mean, I had developers questioning, “Why are we running tests at such a high scale? We’ve never hit that load in production.” And I had to tell people, “I’m not trying to make it a load test. That’s not the goal here. The goal is to get as much functional testing done as possible, and by the way, we’re getting a load test.”

So yeah, on a daily basis, my team and I pushed more load than had ever been seen in production. And we found performance issues. For a while, I thought we were seeing the app choke and basically freeze; at three-to-four-minute intervals it would just freeze, you’d see this wall of red in our tests, and two minutes later it would recover. And we were like, “What the heck?” And I thought there was something poorly tuned in our Selenium Grid, something poorly set up in our test infrastructure, and it turned out, no, no, no. The app couldn’t keep up. We uncovered performance issues at scale in the application because of the sheer magnitude of testing we were pushing. And when they fixed it, it just flew beautifully. I’m like, “This is amazing.” True story.

Federico:

I want to stress how important it is to first think that there is a problem in the test, right? This is always how you first think, it’s like, “Probably there is a problem with the test.” Research that, and if you can’t find a problem there, maybe there is a problem in the system, right?

Andy:

Yeah. Yeah. Because you don’t want to be the boy who cried wolf, right?

Federico:

Yeah. Exactly. Man, this is amazing. I’ve learned a lot talking with you.

Andy:

Cool.

Federico:

I also want to ask you another question. If you have to recommend a book, which one would you choose?

Andy:

Specifically on testing and automation or any book at all?

Federico:

Whatever you want to share.

Andy:

Okay. So I’m going to give two books, one that exists and one that does not yet. I’m writing a book on software testing. Hopefully it’ll come out in a year or two, because I’m slow to write. It’ll be with Manning Publications; the working title is The Way To Test Software, and in it I’ll be teaching how to do all the things we talked about today.

But in general, if there’s one book that I recommend to people who want to improve themselves at whatever they’re doing, whether it’s testing with automation, whether they want to be managers, whether they want to be more effective in their families or in the communities that they’re a part of, I recommend The 7 Habits of Highly Effective People®. It is an excellent book and it has helped me and has helped many other people I know.

Federico:

Excellent, from Covey. I will share the-

Andy:

Oh, yes. You know the author, you know who?

Federico:

Yeah. Yeah. Excellent. Is there anything else you would like to share, to invite the audience to reach out on social media or something else?

Andy:

Sure, sure. So I always love meeting new people. I love it when people slide into my DMs; that’s how we met.

Federico:

Yeah.

Andy:

It’s like, “You are Automation Panda. You want to chat?” “Sure. Why not?” So the best ways to reach out to me would be Twitter, @AutomationPanda, and my blog, AutomationPanda.com. Wow, I see a trend. You can also try to find me on LinkedIn, but there are a lot of Andrew Knights in the world, so that might be a bit difficult. But yeah, I love meeting with people. I love chatting. People send me questions about stuff all the time. I do my best to keep up and answer.

Federico:

Excellent. Excellent. Thank you so much, Andy. I really appreciate your time and your experience.

Andy:

Well, thank you for inviting me. This was fun.

Federico:

Have a nice day. Bye-bye.

Andy:

You too. Thanks. Bye.
