Season three of Quality Sense is here, and it kicks off with a brand new topic and featured guest: Laveena Ramchandani, a rising speaker and blogger in the testing community. Seven years ago she fell into testing, and she has since worked in industries from oil and gas to finance to consumer goods. Currently, she is exploring a new area: data science model testing. Today, 80% of UK businesses are looking to hire a data scientist or seek data consultancy, so someone needs to go in and validate those models! Listen to the episode to learn how she’s testing a data science model, the testing process her team follows, advice for anyone getting started in this area, and more!
Episode Highlights:
- What a data science model is and the benefits it can bring to a business
- Risks to be aware of when testing data science models
- How she plans her testing activities
- Skills needed to test on a data science project, plus her advice
Relevant Links:
- Follow Laveena on Twitter – https://twitter.com/Laveena_18
- Watch her MegaTesting Week talk
Listen Here:
- Listen on Soundcloud
- Listen on Spotify
- Listen on Apple Podcasts
Episode Transcript:
Federico:
Hello, Laveena. It’s a pleasure for me to have you here on the show. Welcome. How are you doing today?
Laveena:
Hi, Federico. Thank you very much for having me on your show. I’m doing very well, actually. How are you?
Federico:
Fine. Enjoying my birthday, actually. That’s when we’re recording this, so.
Laveena:
Excellent. Yeah. Happy birthday once again.
Federico:
Thank you. Something curious I wanted to mention here is that the first time I read your name, I thought that you were from India. Then you told me that you speak Spanish, that you were from the Canary Islands, and that you also live in the UK, right? Can you tell me a little bit about your amazing multicultural background?
Laveena:
Yeah. Sure. Definitely. Basically, I was born in London and I was only three months old when I was brought to the Canary Islands, specifically Las Palmas. I stayed there all my life up until I was 19 years old. Then I went to university. Because I had studied at a British school, I thought it was best to go to university in England. I did my degree in London, I found a job in London, and I decided to stay in London forever.
My parents are Indian, so yes, I do speak Hindi and I do speak fluent Spanish. Yeah. I’m a bit of a salad let’s say.
Federico:
Yeah. Totally.
Laveena:
Yeah.
Federico:
It’s a very useful skill to be fluent in different languages, especially in this world where we are connected to anyone and could be collaborating with people from around the world. That’s amazing.
Laveena:
Yes. Definitely. Yes.
Federico:
One of the first questions I want to ask you is how you ended up working in software testing.
Laveena:
I think the answer I’m going to give is something that you might have heard a lot. I fell into testing actually.
Federico:
Yes. A lot.
Laveena:
I had no idea what testing was about. I think at university, we just touched the topic of black box and white box, but we never actually tested something. I was clueless. I got my first graduate role and in my graduate role, they said, “We have a testing project happening with one of the top oil companies.” I was like, “Okay. Let me try it.” Then it wasn’t even a graduate role. I actually learned testing on the job. I was put in as a tester and I learned with my mentor.
At that point I was very confused as to what I was doing, because I was very young. I had just left university, but then slowly, slowly, when I understood what I was analyzing (trying to find issues in BI reports and things like that), I thought this area was quite interesting. Obviously there would be more projects in the future, more areas that I could expand my testing into.
I think I had a great fall, definitely. Yeah. That’s how I ended up in software testing. It’s been over seven years now actually, that I’ve been testing.
Federico:
I guess you like it so far, right?
Laveena:
Yeah. Definitely, because I’ve experienced so many different projects, different areas, different markets that you test in. It’s been interesting and it’s been interesting learning from other people as well, like peers you work with. I quite enjoy it and especially the testing community, shout out to all of them. It’s excellent. It’s so easy to talk to everyone. Any question you have, easy to get an answer to and vice versa.
Federico:
Yeah. Totally. I fully agree with that.
Laveena:
Yes.
Federico:
You are working right now in testing, but specifically in the data science area.
Laveena:
Yes.
Federico:
How can we get introduced to what it is?
Laveena:
Okay. Basically, again, when I was interviewing for my role, I had no idea it was going to be a data science model that I’d be testing. It’s a completely new area and I appreciate many testers are not actually involved in it. Shout out to every tester: if you do get a chance, try and get involved. It’s super interesting, because I’m testing a model that actually provides some sort of optimized results for companies and how to improve the way they are working.
Let me explain what data science is to you. It’s a comprehensive process that involves pre-processing, analysis, visualization, and prediction making. It also comprises statistical techniques. It’s basically about data-driven decisions. You use a lot of data: you push in data, the model does the statistical magic, and then it spits out data. Then you need to understand whether this data is accurate or not. That’s data science.
A data science model is basically a mechanism to provide optimized results: how to improve your business, how to look at the historic data you have to understand what your future could look like, though not necessarily telling you this is exactly what’s going to happen in your future. It’s actually helping a company be more optimized.
Federico:
How do you deal with that? Because for me, it sounds really challenging. I am really interested in how you deal with this.
Laveena:
Yes. Basically, with every customer comes different data. We are not using a golden dataset as of now, obviously, but we are planning to make one, to pick all the top features to actually make that dataset work. Currently, we’ve got so many clients and we’ve got so many demands. What we do is ask the client to give us a dataset according to our schema. We provide them a schema that says our model accepts your data in this way.
If you give us your data in this schema, then our model will ingest it. It will run and then it will show you what your business could look like in an optimized manner. Now, obviously there’s new data coming in. Maybe a production wheel for one customer might look different versus another customer, but it’s according to the data that they have provided. From customer to customer, we have seen loads of ups and downs.
It could be a certain type of column they have, or a new file that they have which we didn’t have previously. The developers obviously have to make a code change so that we can accept that new file, and we make sure that when we ingest it, we can actually see it in the output when we download. Or, because we’ve got a front end, I can at least see that things are happening accurately on the dashboard: the analysis looks good, the graphs look okay, nothing looks distorted, for example.
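To make that schema contract concrete, here is a minimal sketch in Python of the kind of pre-ingestion check a tester could run. The column names and types are hypothetical stand-ins, not the team’s actual schema:

```python
import pandas as pd

# Hypothetical schema; the real contract would come from the team's
# published data specification, not this illustration.
EXPECTED_SCHEMA = {
    "product_id": "int64",
    "week": "int64",
    "units_sold": "int64",
    "unit_price": "float64",
}

def validate_client_file(path: str) -> list[str]:
    """Return a list of schema problems found in a client CSV."""
    df = pd.read_csv(path)
    problems = []
    missing = set(EXPECTED_SCHEMA) - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    extra = set(df.columns) - set(EXPECTED_SCHEMA)
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")
    for col, dtype in EXPECTED_SCHEMA.items():
        if col in df.columns and str(df[col].dtype) != dtype:
            problems.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    return problems
```

Anything a check like this flags goes back to the client or the consultants before the model ever runs, which is exactly the “bad data” risk discussed later in the episode.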
One thing our model actually does that’s really different to other models is randomness, stochasticity. What I mean by that is that every time I ingest client files, some of the rows, for example, will always be different to what they were before. When I saw it for the first time, I thought, “Something’s broken. We found a bug.” When I sat down with the data scientist, he said, “The only difference between the first result and the second result is only about 1%. It’s not a major change.”
What I said was, “Okay. This is how you understand it, but how would a tester understand this?” So we introduced thresholds. Because of the type of clients we’re working with (we are not working with any bank, for example, or a financial system where we need to look into fraud, or anything very, very critical), we thought that thresholds would be excellent. We allowed a change of between 1 and 5%.
If we see a change within that band, we’ve passed. If we see something above it, we know something’s not right. We might need to rerun the model or optimize our parameters so that we can get decent enough results. That’s something special, part of the genetic algorithms that we use in the model, but not all data science models have to be stochastic. That’s another thing as well: what you put in, you might see at the end, at the output level, as well.
It’s just a matter of how, as a team, you bring in strategies, how you understand the parameters, and how you can play around with them. Sometimes what I’ve also done is move the parameters to their maximum or minimum, just to see how my results change. At the same time, I’m actually testing the performance: if I max it out, it takes longer; if I leave it at the minimum, it’s very fast.
If I understand what I’m trying to provide to the client, I would obviously understand testing it better, and when I see the result, I will know for sure that this looks very realistic. Nothing looks distorted here.
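The threshold check itself can be tiny. Here is a minimal sketch, assuming the model output is a flat list of numbers and using the 5% upper bound mentioned above; the actual band is whatever the team agrees on:

```python
def within_threshold(run_a, run_b, max_rel_change=0.05):
    """Compare two runs of the stochastic model: pass only if every
    value changed by at most max_rel_change (here the 5% upper bound)."""
    for a, b in zip(run_a, run_b):
        if a == b:
            continue  # identical values (including 0 and 0) trivially pass
        rel_change = abs(a - b) / max(abs(a), abs(b))
        if rel_change > max_rel_change:
            return False
    return True

# Laveena's later analogy: 8 vs. 8.0000012 sits far inside the band.
assert within_threshold([8.0], [8.0000012])
```

A failure here would not automatically be a bug; per the process above, it would trigger a rerun or a parameter review first.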
Federico:
Yeah. As I understand from what you’re saying, it’s like you’re applying a lot of heuristics, right?
Laveena:
Yeah.
Federico:
It’s not deterministic maybe, but it gives you a hint when something could be wrong so someone should review it.
Laveena:
Yes. Yes. It’s more of a simulation-type model. As I’ve suggested to many testers in the past when I’ve had chats with them, if you bring your testing kit with you, your testing skills, the strategies you’ve learned, pairing with everyone, collaborating, making sure you test edge cases and negative scenarios, all of that will actually be so, so handy, because data science looks like it’s very complex. But once you’re actually in the project, working alongside data scientists has been very, very useful.
Just questioning them: why am I doing this? Why did you make this feature different? Why did you introduce this new feature? How is it helping the client? When they explain it to you, you understand why you’re testing it as well. I think it’s been super, super useful to bring my own testing skill set, because I’ve actually used it throughout. There’s nothing new apart from the threshold side of things that I’ve learned, plus the data science model side of things, because there’s a bit of statistics involved.
We used to have databases; now we use a flat file system instead. When we had databases, I used SQL and I used to query them. That was a bit of the statistical side, as well as SQL. Apart from that, I think it’s been super interesting. There’s an abundance of knowledge and loads to learn there as well.
Federico:
Yeah. For sure. At the end of the day, I guess it’s always about paying attention to whether we are solving the problems of our users, right?
Laveena:
Yeah.
Federico:
This is always what testing is about: checking and reviewing whether there are specific risks in the functionality or in the software we are providing to our users. It doesn’t matter if behind the scenes we have a data science model or an AI or something like that, or if it’s typical old-style software, let’s say. Talking about different risks: in these types of systems, are there specific risks that we should be aware of or that we should be trying to find?
Can you help us imagine what types of risks there are, especially in this type of software?
Laveena:
Yes. Definitely. I’ve actually written them down because there’s quite a few actually. I might as well share all of them.
Federico:
Cool.
Laveena:
The first one is bad data. Now, if we have bad data (data that’s out of date, irrelevant, or erroneous), the model will still work as it is, but it will throw out results that look really unrealistic. In the past, when we didn’t have schemas for the clients to send their data in that manner, we would get tons of files. The consultants would have to clean it all up and make it fit the right columns and rows.
Then the data scientists or the data analysts would try to use it, and it would always fail, because new columns had been added, or data was irrelevant but the client didn’t tell us they don’t need it. It’s super, super important to sit down with your data analyst as well and be sure about what data is coming in. Are they removing anything that we don’t need? If they’re removing anything, is this creating any bias? That is super, super important.
Bias on its own is a massive topic, but you need to understand what’s being added and what’s being removed, because that’s going to impact your results.
The next one that I would like to share is bad analytics: misinterpreting the patterns shown in your data. With data science, most likely you’re going to have some graphs.
You’re going to have algorithms that need fine-tuning to pick up anomalies. You need to understand: this is the graph; do I understand it, or am I misinterpreting it? In my kind of dataset and data science model, some anomalies looked worrying, but because we had added our threshold and they fell within it, we could say, “Okay, it’s not worrying.”
Sometimes when we do see an anomaly, I have to rerun my model and change some parameters, because maybe it’s my mistake that I didn’t set the right parameter. It’s super important that you don’t misinterpret a graph, because then you’re misinterpreting it for the team as well as for the consultants and then for the client. The client might be happy, like, “Oh wow, it’s working well,” but really, the model didn’t do what it was supposed to do.
Just be careful with those details as well. Then cost is a risk too. Obviously data collection, the aggregation side of things, storage of data, analysis and reporting all cost quite a lot of money. Make sure to plan well and keep the data at the right granularity to avoid any spiraling costs. Make sure that you’re not overdoing things. If you’ve got the data, what I would say is just make sure you store it accurately.
Keep the right data; anything you don’t need, just remove it, so that it’s the right amount and not bad data as well. Two more risks that I would like to mention are privacy and security, which I would put in one section. You have to keep data safe, of course, as there may be sensitive information. Clients that send us details might be sending us details of each and every product. It’s super, super risky to have all of that in your system, first of all.
It’s important you make sure any sensitive data you’re looking at is wiped off your system, or anonymize it. The way we do it is I’ve anonymized all the datasets, so now we have no risk of thinking, “Oh, someone’s going to read this data,” because it’s all gibberish. No one will understand what it is. Then security as well is super important, because you have to be careful with data theft.
Therefore, as I just said, anonymize data, or have a golden dataset and make sure that you can understand it as a team but no one else can.
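As a sketch of what that anonymization could look like, assuming the sensitive values live in known columns; salted hashing is one common approach, not necessarily the team’s exact method:

```python
import hashlib

import pandas as pd

SALT = "example-salt"  # assumption: in practice, kept secret outside the code

def anonymize(df: pd.DataFrame, sensitive_cols: list[str]) -> pd.DataFrame:
    """Replace sensitive values with salted hashes: gibberish to a reader,
    but consistent across files, so joins between datasets still work."""
    out = df.copy()
    for col in sensitive_cols:
        out[col] = out[col].astype(str).map(
            lambda v: hashlib.sha256((SALT + v).encode()).hexdigest()[:12]
        )
    return out
```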
One last one, actually: the selection of the wrong algorithm. This one is for the data scientists rather than for testers. If you select the wrong algorithm, it might not give you the best results; you’ll have an incorrect model selection and bad model validation as well.
So be careful about which kind of algorithm you’re selecting, and do your research. As a tester, I looked into genetic algorithms; that’s the kind that creates stochasticity. Let me explain stochasticity with a simple analogy: five plus three for me and you is eight, but for the model, it might be 8.0000012. Now, we would think that is wrong, but actually it’s not. It’s within the 1 or 2% threshold that we’ve introduced, so it’s perfectly fine. Not to worry. Then-
Federico:
It’s again about threshold. Yeah.
Laveena:
Yes. Yes. Actually, the threshold helps. If you’re working for a credit card company or a finance company, you might want to keep your threshold even lower, maybe 0.5 to 1%, because of card fraud, for example. You want to keep issues to a minimum, any anomalies to a minimum. You can change it according to what your team agrees on.
Federico:
Yeah. This is where you need to understand the business, right?
Laveena:
Yes.
Federico:
To understand which threshold makes sense and which doesn’t, right?
Laveena:
Exactly. Yeah. Yeah. That’s right. Then some of the other algorithms you can also look into are Monte Carlo simulation and Brent’s method. The other kinds of rules that we use are just business logic rules that we implement in Python. That’s the risks in a nutshell; you can keep them as a checklist in your team.
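For anyone new to Monte Carlo simulation, the classic toy example is estimating π from random points. Note how two runs give slightly different answers; that is the same stochasticity discussed above, and why a tolerance band matters:

```python
import random

def estimate_pi(samples: int = 100_000) -> float:
    """Monte Carlo estimate: the fraction of random points in the unit
    square that land inside the quarter circle approaches pi / 4."""
    inside = sum(
        random.random() ** 2 + random.random() ** 2 <= 1.0
        for _ in range(samples)
    )
    return 4 * inside / samples

print(estimate_pi(), estimate_pi())  # two runs, two slightly different values
```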
Federico:
This sounds really amazing and useful for testers starting with this type of challenge. Thank you so much.
Laveena:
All right.
Federico:
What about the testing process? Is there any special thing to take into account when you are planning your testing activities?
Laveena:
Yes. In my case, we’ve got a backend and we’ve got a front end. What I’ve noticed a lot is dependencies. That’s a big topic for me. If a new feature is coming in and the backend is getting implemented, is that going to impact my front end? If so, I should make sure I have a ticket or some sort of acceptance criteria that includes the front-end developers as well, because if there’s a change coming in the backend, I want to be able to test that, plus see it in the front end.
If not, I’m going to be stuck in the middle and then wait for the front-end developers to do that. It’s super, super important to make sure you have your dependencies ticked off. Again, follow your testing strategies.
We follow a pyramid. We have our unit tests, heavy unit tests. We’ve got the integration-level testing, API endpoint testing, and then we’ve got the UI side with Cypress. We follow the normal strategies and we obviously raise bugs or any defects that we find.
We triage them as a team, and then we make sure those are fixed and retested. Then we also do release processes. Actually, testers also do the release process, which I find super, super interesting; it’s not just the developers. It’s quite interesting. It’s good to be able to test your product in QA, pre-prod, and prod, so you know what the clients will look at when you release the latest features.
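As an illustration of the API-endpoint rung of that pyramid, here is a pytest-style sketch in Python. The route, payload, and URL are hypothetical stand-ins, not the team’s actual API:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumption: service under test runs locally

def test_model_run_returns_results():
    """Integration-level check: a valid payload returns HTTP 200 and a
    non-empty result set that the threshold rules can then be applied to."""
    payload = {"client_id": "demo", "dataset": "anonymized_sample.csv"}
    resp = requests.post(f"{BASE_URL}/model/run", json=payload, timeout=30)
    assert resp.status_code == 200
    assert resp.json().get("results")
```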
Federico:
It sounds very similar to any typical testing process.
Laveena:
Yeah.
Federico:
I have another question related to this. Do you have any interaction or collaboration with a data scientist in terms of data generation for testing purposes or something like that?
Laveena:
Yes. There have been quite a few features where we don’t have the data and we have to create it there and then, just to test it. Definitely, I’ve interacted a lot with data scientists. I know I definitely eat their heads, for sure. I have so many questions, and until I’m satisfied, I don’t stop. But it’s good, actually, because I’m understanding how I need to test things. So basically we tend to create datasets there and then, or we edit what we have, to be able to satisfy whatever new feature the data scientist has created.
It’s super interesting because that’s when I ask them, why did we change this column and what are these numbers actually going to give us now? When we actually run the model and see the results, the new feature looks perfectly fine. It’s just a matter of waiting for the clients to give us that kind of data so we are ready for that beforehand.
Federico:
Cool. Another aspect to consider is the skills you need. You mentioned that the typical skills of a tester, like paying attention to detail and edge cases, are very useful. Is there any other skill? Well, you also mentioned having some knowledge of statistics; is there anything else we should consider?
Laveena:
I think just reading up on data science, making sure that you collaborate with your data scientists, and asking as many questions as you can. Also, pairing up is super, super useful. Apart from that, just understanding basic statistics. You don’t have to be a mathematics pro, but having a basic understanding would be super useful. You don’t have to know everything. That’s one thing that I always say to everyone.
We are not perfect and we are not meant to be perfect, but we can try to understand things. Yeah. I mean, just bring your testing kit with you and aim to learn as much as you can and ask everything you can. It’s okay not to know everything. A hundred percent fine with that. There are so many algorithms, how can you actually know each and every algorithm? It’s not possible. Never get worried. Actually be a warrior and test like a warrior. That’s what I say.
Federico:
Amazing. Amazing pieces of advice. Well, we covered skills, you mentioned the challenges, you mentioned some specifics about the process, the testing process. Is there anything else we should take into account in this specific context of data science model testing?
Laveena:
I think if you get a chance to be involved in this kind of project, please do, because I’ve heard of loads of testers being in a team where there are data scientists but not actually collaborating or working with them. I would suggest you try and make it so that you can work with them, because you understand the product better, rather than just testing it at the last moment and trying to release it.
I’m sure there are loads of data science projects going around. See if you’re interested. If anyone wants to have a coffee catch-up with me just to get more knowledge, understand the data, or just understand how a data science model works, I’d be happy to do so.
Federico:
That’s amazing, Laveena. Two final questions that are not specifically related to testing. One is related to productivity, we could call it: do you have any habit or something that you want to suggest people form?
Laveena:
A habit, I think, is to ask loads of questions. Raise your hand all the time. I would say communicate a lot. If you have ideas that could help your team, make sure you mention them, because you know what you’re doing. Even if it’s not a priority at that moment, you might get pushback, which happens most of the time, but at least you’ve sown a seed in people’s heads. If, for example, accessibility was not part of your team’s work, just by setting up a small meeting and mentioning it, you’ve actually shown your team that there’s something we can look into.
Maybe not right now, but in the future. At least they’ll remember that, “Yes. Oh, Laveena mentioned this. We should maybe look into this. Maybe not the current sprint but maybe three sprints down we should bring it in.” Don’t be worried. Just mention whatever you can, because at the end of the day, we are all responsible for quality and whatever you can bring in would add value. I would suggest, try and do that. Don’t be put off if they say no to you because of the priorities in the team, just try and push for it.
Federico:
Yeah. Developing curiosity, right?
Laveena:
Yes. Yes. Definitely.
Federico:
It’s really important. Also, this is the way to avoid biases: to work on different areas where maybe there are known unknowns.
Laveena:
Yes. Yes.
Federico:
The things that we don’t know that could be hiding a risk or a problem, right?
Laveena:
Exactly. Yes.
Federico:
Cool. Amazing. The last one, do you like to read? Do you have any books to suggest?
Laveena:
I’ve not read any recently, but I do know that Perfecto released a book with Eran around AI. I would suggest that; it would be quite useful. Quite a few testers have actually been part of it: Raj Subrameyer has written something there, and even Jonathon Wright has written something there. It’s mostly about the future of testing and how machine learning and AI could help with testing. I would definitely suggest trying to read that. I think it’s called AI-Driven Testing.
Federico:
Laveena, thank you so much for all the recommendations and all your knowledge. Is there anything you’d like to invite our listeners to do, or a way for them to follow you or reach out?
Laveena:
Yes. Definitely. I’ve just joined Twitter. Please add me, I’m Laveena_18. I’m also on LinkedIn. It would be nice to catch up with any of you who are interested in data science model testing and share some knowledge around that area.
Federico:
It was not only a pleasure, it was amazing to listen to you and learn from you, so thank you so much. Thank you.
Laveena:
Thank you, Federico. Really, really glad to be on your podcast.
Federico:
Enjoy the rest of the day. Bye-bye.
Laveena:
Thank you. Bye. Bye.
Did you enjoy this episode of Quality Sense? Explore similar episodes here!