ChatGPT: Will AI Replace Testers?

Only a few weeks have passed since ChatGPT went live, but it is already changing the way many people work. This article explores its impact on software testing. We examine a few possible uses of ChatGPT in software testing and discuss whether it can replace testers or help them.

By Sofía Brun

AI-powered content generation tools, such as DALL·E 2 and GitHub Copilot, have been on the rise for the past few months, or years in some cases. These tools let users generate images, text, or, in Copilot’s case, code from natural-language input.

ChatGPT is arguably one of the most discussed examples of these kinds of developments since it recently became widely available. It is a chatbot powered by Artificial Intelligence (AI) that uses Natural Language Processing (NLP) to generate responses to your questions and prompts.

With training on a massive dataset of conversational text, ChatGPT can accurately imitate the way people speak and write in a variety of contexts and languages.

“We are living a historic milestone. It is the first time that AI is at everyone’s fingertips, without friction or difficulties; it is the tip of the iceberg of everything that is to come and all we will be able to do with it,” emphasized Federico Toledo, co-founder and COO at Abstracta.

“I think we must question and educate ourselves, to be prepared and be able to take it as a support tool instead of a threat,” he outlined.

Let’s get started!

Before we delve into ChatGPT, let’s touch on some related concepts that might come in handy to understand it.

What is Artificial Intelligence?

AI is a broad field that involves the development of intelligent machines and systems that can perform tasks that would typically require human intelligence, such as learning, problem-solving, decision-making, and perception.

What is Machine Learning?

Machine learning is a field of artificial intelligence that focuses on developing algorithms and models that can learn from data and improve their performance over time without being explicitly programmed. These algorithms and models can be used to make predictions or decisions based on data and adapt as they are exposed to new information.

What is Natural Language Processing (NLP)?

NLP is a subfield of Artificial Intelligence that focuses on the interaction between computers and humans through natural languages, such as speech and text.

NLP includes a wide range of tasks: text and speech recognition, language translation, and text summarization. It also involves the development of algorithms and models that can understand, interpret, and generate human language.

NLP has many applications, including chatbots, voice assistants, and text analysis. It has the potential to transform many industries and has already had a significant impact on the way we communicate and interact with computers.

What is Generative AI?

Generative AI is Artificial Intelligence that can generate new content, having learned from existing text, audio files, or images. It uses machine learning models to produce that content, just like ChatGPT does with text.

Can ChatGPT Test Better Than Me?

When we see these kinds of developments, our initial concern is often whether they will be able to replace us or be a helpful tool for our job.

As part of our effort to better understand the impact of an AI like ChatGPT, we tried it on some of the activities that testers do as part of their work: designing test cases, test ideas, and test data; automating scripts; combining test data; reporting errors; and writing SQL queries to generate test data or verify results.

1. Designing test cases and test data

At first sight, we thought we had gotten what we were looking for. On closer inspection, though, the test is written at a very high level and omits important information, which could mislead a tester.

Still, for an AI, it’s a pretty impressive output, despite some missing steps such as clicking the “Shopping Cart” button in the right corner.

Interestingly, when we repeated the same question, ChatGPT seemed to improve its response. On a second round a few minutes later, the steps that had been missing were added. When we repeated the request a few days later, it clarified that it had returned general steps because it could not browse the internet.

We don’t think this use case can be helpful, but let’s explore other ideas. Imagine we have to test a system with specific inputs, and we are running out of ideas. We could brainstorm ideas with the chat! We played around a lot with requests and questions like: 

  • Give me test data for a login form.
  • Can you give me test ideas for a bank transaction?
  • Can you help me with test data for a date picker including edge cases?

The answers were very interesting and precise, in some cases even explaining why we should try each case.
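To make the date picker request more concrete, here is a minimal sketch of the kind of edge cases we would expect such an answer to include. It is our own illustration, not ChatGPT’s output; the function name `date_picker_edge_cases` and the labels it returns are hypothetical, and it assumes a picker that accepts dates between a minimum and a maximum.

```python
import calendar
from datetime import date, timedelta


def date_picker_edge_cases(min_date, max_date):
    """Return labeled boundary dates for a picker that accepts min_date..max_date."""
    one_day = timedelta(days=1)
    cases = {
        "min": min_date,
        "max": max_date,
        "below_min": min_date - one_day,  # expected to be rejected
        "above_max": max_date + one_day,  # expected to be rejected
    }
    for year in range(min_date.year, max_date.year + 1):
        # Leap days are a classic source of date bugs.
        if calendar.isleap(year) and min_date <= date(year, 2, 29) <= max_date:
            cases[f"leap_day_{year}"] = date(year, 2, 29)
        # Year rollover: December 31 inside the accepted range.
        if min_date <= date(year, 12, 31) <= max_date:
            cases[f"year_end_{year}"] = date(year, 12, 31)
    return cases


if __name__ == "__main__":
    for label, value in date_picker_edge_cases(date(2023, 1, 1), date(2024, 12, 31)).items():
        print(f"{label}: {value.isoformat()}")
```

A list like this is only a starting point; the tester still has to decide which of these dates matter for the system under test.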

Another way we used the chat to our advantage was to ask for specific test data that would take us some time to find or write:

  • Give me 10 city names with special characters.
  • Give me last names with more than 15 characters. 
  • Give me a 500-word paragraph using all the characters.
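Since generated data does not always meet the constraints we asked for, it is worth checking it before using it. The sketch below shows one possible way to do that. The helper names are hypothetical, the sample values are ours rather than ChatGPT’s, and we assume that “special characters” means non-ASCII characters.

```python
def too_short(names, min_length=15):
    """Return the names that do NOT have more than min_length characters."""
    return [name for name in names if len(name) <= min_length]


def lacks_special_characters(names):
    """Return the names that contain only ASCII characters (assumed meaning of 'no special characters')."""
    return [name for name in names if name.isascii()]


def word_count(paragraph):
    """Count whitespace-separated words in a paragraph."""
    return len(paragraph.split())


if __name__ == "__main__":
    # Illustrative sample data, typed by hand for this example.
    cities = ["São Paulo", "Zürich", "Malmö", "Boston"]
    last_names = ["Featherstonehaugh", "Smith"]
    print("Cities without special characters:", lacks_special_characters(cities))
    print("Last names that are too short:", too_short(last_names))
```

Any value reported by these checks is a sign that the answer needs a second request or a manual fix.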

2. Automating test cases

Once we had a test case, we tried to automate it, so we gave ChatGPT the following instruction:

“Give me the code to automate that test case using Java, Selenium, and JUnit.”

This is the output we got:

Followed by this explanation:

This code uses the Selenium library to automate a web browser and interact with the OpenCart website, and the JUnit library to add assertions to the test case. The ‘@Before’ and ‘@After’ annotations are used to set up and tear down the Chrome driver before and after the test case is run. The ‘@Test’ annotation indicates that the ‘testAddToCart’ method is a test case. The ‘Assert.assertTrue’ method is used to assert that the product was successfully added to the cart. If the assertion is true, the test case is considered to have passed. Otherwise, the test case has failed.

Before talking about the generated code, we would like to highlight one aspect that caught our eye: ChatGPT’s ability to preserve the context of the conversation. We didn’t paste the test case as direct input; we just referenced an earlier part of the conversation. That is one of the features that streamlines the creative process.

Back to the automation: at first sight, the code may give the impression of complete success. On closer inspection, however, it contains considerable mistakes, such as missing assertions, wrong selectors, and unnecessary logic, all things we avoid when developing automated tests.

So far we have just an isolated test, which is not how we usually work. What if we wanted to use this test as the basis of an automation framework?

Let’s try the Page Object Model to see if it can improve our test:

“Add Page Object Model”

As you can see, it failed to complete the page objects, even when we asked again.

Even though the output is quite smelly code, it’s a great start. We could take this piece of code, refactor it, and end up with a “decent” framework to start working with.

Let’s add another layer: what if we want to set it up as a Maven project? We asked ChatGPT to generate our pom.xml.

“Generate pom.xml maven project for the last test”

Even though it failed to complete the task (we noticed that returning incomplete code is a typical ChatGPT error), it gave us a starting point that we can refactor into a framework to start with.

3. Combining test data

Pairwise (all-pairs) testing is a very useful technique, but the combinations are hard to calculate by hand; you need a tool. So we gave ChatGPT a set of variables and their values and asked it to calculate the combinations for us.

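While ChatGPT interpreted the request correctly (which is not an easy task), it got the answer wrong: it applied the Cartesian product (another data combination technique) instead of all-pairs. The Cartesian product lists every possible combination of all the values, whereas all-pairs only requires that each pair of values from any two variables appears together in at least one test, which usually yields far fewer cases.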

This showed us that we have to be careful with the tool. It can help us with many things, but we have to pay attention, not trust it fully, and look critically at the results.
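To illustrate the difference between the two techniques, here is a minimal sketch that builds both suites and checks that the all-pairs one really covers every pair. The variables and values are hypothetical (they are not the ones we gave ChatGPT), and the greedy selection is just one simple way to approximate all-pairs; dedicated tools generally produce smaller suites.

```python
from itertools import combinations, product


def cartesian_tests(parameters):
    """Every combination of every value: the Cartesian product."""
    return [dict(zip(parameters, combo)) for combo in product(*parameters.values())]


def all_value_pairs(parameters):
    """Every pair of values, taken from two different parameters, that must be covered."""
    pairs = set()
    for (name_a, values_a), (name_b, values_b) in combinations(parameters.items(), 2):
        for a, b in product(values_a, values_b):
            pairs.add(((name_a, a), (name_b, b)))
    return pairs


def pairs_in(test):
    """Pairs covered by a single test, given as a dict {parameter: value}."""
    return set(combinations(test.items(), 2))


def pairwise_tests(parameters):
    """Greedy all-pairs: keep picking the test that covers the most uncovered pairs."""
    candidates = cartesian_tests(parameters)
    uncovered = all_value_pairs(parameters)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda t: len(pairs_in(t) & uncovered))
        chosen.append(best)
        uncovered -= pairs_in(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical variables and values, for illustration only.
    parameters = {
        "browser": ["Chrome", "Firefox", "Safari"],
        "os": ["Windows", "macOS", "Linux"],
        "language": ["en", "es", "pt"],
    }
    print("Cartesian product:", len(cartesian_tests(parameters)), "tests")
    print("All-pairs (greedy):", len(pairwise_tests(parameters)), "tests")
```

The same `all_value_pairs` check can be run against a suite returned by ChatGPT: if any pair is left uncovered, or the suite is as large as the Cartesian product, the answer is not a valid all-pairs result.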

4. Reporting errors

Reporting errors is not simple: we need to find good ways to communicate certain aspects without hurting anyone’s feelings. ChatGPT can help us improve our writing, convey information more clearly, and even do so in a friendlier and more effective way, especially if we are reporting in a language other than our native one.

Can you improve this error report to be more polite and not hurt the developers’ feelings?

System not working as expected. I needed the login button to be red and very visible, on the top of everything else, instead, it is not legible, orange, and I need to scroll to find it.


Sure, here is a revised version:

“Hello, I noticed that the login button is not quite meeting my expectations. Specifically, I was hoping for it to be more visible and easy to locate by being red and at the top of the page. However, it is currently orange and I have to scroll to find it. Is there any way this could be adjusted to better suit my needs? I understand that the developers have put a lot of effort into creating this system, and I apologize if my feedback comes across as critical. Thank you for your hard work and for considering my request.”

5. Assembling SQLs

Here we made a very concrete request for an SQL query to retrieve certain data from tables that we had not fully described. We liked that ChatGPT provided both the code and a good explanation.

Of course, it is necessary to review its answer in detail to determine whether it is optimal. Even so, sometimes it is easier to start from something already built than to start from scratch.
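One way to review a generated query is to run it against a tiny dataset whose expected result we already know. The sketch below does this with an in-memory SQLite database. The `customers` and `orders` tables, their rows, and the query are all hypothetical examples of ours, not the request we made to ChatGPT or its answer.

```python
import sqlite3

# Hypothetical query under review: orders and total spent per customer,
# including customers who have no orders.
QUERY_UNDER_REVIEW = """
    SELECT c.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.total), 0) AS spent
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
"""


def build_sample_db():
    """Create an in-memory database with a small, hand-checked dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Luis'), (3, 'Mei');
        INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 10.0);
    """)
    return conn


def run_query(query):
    """Run the query against the sample data and return all rows."""
    conn = build_sample_db()
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(QUERY_UNDER_REVIEW):
        print(row)
```

Including a customer without orders in the sample data is deliberate: it is the kind of case where a query that uses an inner join instead of a left join would silently drop a row.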

Wrapping Up

ChatGPT is a promising tool, but our job as testers requires an analytical and logical mindset and an empathetic view of the user’s reality; it is an intellectually challenging activity. In short, ChatGPT shouldn’t be taken as a “replacement” tool, but it can help in different situations when used with care.

We can use it in creative ways to improve our work or develop valuable ideas, for instance:

✔️ Generating test cases or test data, and helping with test ideas.
✔️ Assisting with drafting and brainstorming.
✔️ Avoiding blank page syndrome when creating SQL queries or trying to come up with test ideas for a particular flow.
✔️ Improving the way we communicate errors or results.
✔️ Refactoring code or generating a base for what we are trying to implement.

If we rely on these tools without engaging in critical thinking, we risk producing low-quality results and perpetuating any inherent shortcomings of the tools. 

So, can ChatGPT test better than us? For now, it cannot, as the CEO of OpenAI mentioned in a tweet.

“It is an initial path in which we must pay close attention to the biases that can be generated. ChatGPT is a very powerful tool, and it’s crucial that we can use it with critical thinking,” highlighted Federico Toledo. 

“The biggest risk is people believing everything without checking. In my opinion, the human being will always need to supervise and validate what is done, which is a great opportunity for software testers,” said Fabián Baptista, co-founder and CTO of Apptim.

“That is why it is increasingly important for testers to learn to program and understand how things work, as well as to understand these intelligent assistants so that they can help us do a better job,” he continued.

“The tool has many blind spots. Human and mature testing will continue to be necessary in order to consider accessibility, cybersecurity, different risk situations, and much more,” stressed Vera Babat, Chief Culture Officer of Abstracta.

In short, human curiosity and critical thinking will continue to make a difference. Still, ChatGPT is one of the most exciting technological breakthroughs, with significant potential and great impact. This area will continue to be a topic of discussion and exploration.

Federico asked: “Is what we have today state of the art, or is it just a sneak peek of what is out there?” What do you think about this? Will this technology help testers? Tell us in the comments!

