I have been pondering our QA at work and how that must work out for game development. There is never enough time to do as much as we would like, and there are always more ways things could potentially go wrong. Let’s narrow that discussion to the multiplicity of test environments and the length of regression testing.
Let’s assume you can enumerate all the changes your latest update is going to have, all the systems that are affected or reasonably could be, and all the things you should check to see if it performs as expected (and does not do new, unexpected things when the player/user does not behave as expected). Now that you have all of those test cases, go make sure the update works in every environment your game/program officially supports.
12 settings for screen resolution? Go make sure the icons look all right on every size. Now do that for every graphics card you support, and multiply that by a range of hardware and settings configurations. Make sure someone colorblind looks at them, remembering that colorblindness comes in different kinds and degrees. For extra credit, have people with other vision problems give them a once-over; I know how much my strong prescription affects the way light refracts. Great, that covers checking the icons, let’s start on some gameplay…
Regression testing is critically important and inadequately done. The previous paragraphs were about testing whether the new stuff works. Regression testing is making sure the new stuff did not break the old stuff: you take a test case that worked last week and make sure it still works this week. And here is the hard part: you really should have comprehensive test cases, because programs are subject to complex and subtle interactions. It is as if changing a light bulb in your house could cause the chimney to collapse; it shouldn’t happen, but occasionally the interactions are weird.
In my old system, we had a reasonable regression test plan. The system had fixed start and end points for our core processes, so we had test cases we fed through the system from start to finish. If their results changed unexpectedly, something had gone wrong. I am setting that up on my new system: taking something from start to finish to make sure it still works.
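That start-to-finish approach is often called golden-master (or snapshot) testing. Below is a minimal Python sketch of the idea, assuming a hypothetical order-discount process in place of the real system: run each saved case once, store its output, then re-run the same cases after a change and flag any output that differs. The function names, the cases, and the discount rule are all invented for illustration.

```python
def record_golden(process, cases):
    """Run each case start to finish once and save its output as the expected result."""
    return {case["id"]: process(case) for case in cases}


def find_regressions(process, cases, golden):
    """Re-run the saved cases; return the ids whose output differs from the saved one."""
    return [case["id"] for case in cases if process(case) != golden[case["id"]]]


def last_weeks_process(order):
    # Hypothetical core process: 10% off orders with a subtotal of 100 or more.
    subtotal = sum(price * qty for price, qty in order["items"])
    return round(subtotal * 0.9 if subtotal >= 100 else subtotal, 2)


def this_weeks_process(order):
    # The same hypothetical process after an "unrelated" edit turned >= into >.
    subtotal = sum(price * qty for price, qty in order["items"])
    return round(subtotal * 0.9 if subtotal > 100 else subtotal, 2)


cases = [
    {"id": "small", "items": [(10, 2)]},     # subtotal 20
    {"id": "boundary", "items": [(50, 2)]},  # subtotal 100
    {"id": "large", "items": [(75, 2)]},     # subtotal 150
]

golden = record_golden(last_weeks_process, cases)
print(find_regressions(this_weeks_process, cases, golden))  # ['boundary']
```

Nobody meant to change the boundary case, yet its output moved, and a saved start-to-finish case is what catches it. The check only flags differences; deciding whether a difference is a bug still takes a person.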
You cannot consistently and comprehensively regression test an MMO. If it takes 400 hours to go from level 0 to the level cap, plus 400 hours of content at the cap, one run-through is 800 hours: more than a month of doing nothing else 24/7. And that run-through certainly does not exhaust all the different ways one might level to the cap. And your game sometimes releases four updates on patch day. You have one hour to QA the change to a dungeon that takes more than an hour to complete. Go, and don’t forget to document! So you take shortcuts: you test individual pieces, you use godmode test commands, and you generally focus on what should have been affected, because you cannot play through the whole game even once. Veteran QA testers who have developed a sense of where things could break are extremely valuable, because they can efficiently find the things that would embarrass you in front of your customers.
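The run-through estimate is simple arithmetic. Here is a small sketch using the 400 + 400 hour figures above, with hours played per day as the only knob; the 8-hour day is an added comparison of mine, not a figure from the post.

```python
def run_through_days(hours_to_cap, hours_at_cap, hours_per_day):
    """Days for one tester to play from level 0 through all the cap content once."""
    return (hours_to_cap + hours_at_cap) / hours_per_day


# 400 hours to the cap plus 400 hours at the cap.
print(round(run_through_days(400, 400, 24), 1))  # 33.3 days, playing 24/7
print(run_through_days(400, 400, 8))             # 100.0 days at 8 hours a day
```

At five working days a week, the 8-hour version is 20 weeks for a single pass down a single leveling path.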
Your game has a bug in how something accumulates over time, or only when you hit particular levels, or when you combine two pieces of equipment with a specific enchantment along with a particular class ability? Nope, there is no reasonable way to test all those combinations. This is why you frequently see bugs that happen with normal leveling; testers do not have time to re-test normal leveling.
Recently, League of Legends had a bug that appeared when you combined one type of boots with one type of enchantment on the boots, and for all I know it only happened in one game mode, with one champion, etc. Let’s start multiplying through: there are seven types of boots you can enchant times five enchantments you can put on them, so 35 things to check, plus the base boots. Then there is some tiny chance the boots will function differently depending on whether you buy the upgraded boots all at once, buy all the pieces separately and combine them, or buy the base boots and then jump to the final boots. And that potentially incomplete list only covers whether the boots themselves acquired a bug during the latest update, before you check whether they interact badly with changes to champions, abilities, maps, timers, other items, and more. Then you should really test whether you run faster, how much faster, and whether the boots still work properly with slows and other speed buffs, including ones that wear off over time. Boots can also have other stats like increased attack speed or magic penetration, and to be really comprehensive you should check all the stats on all the boots.
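To see how fast the boot cases multiply, here is a sketch that enumerates them with `itertools.product`. It assumes seven enchantable boots and five enchantments (the 35 in the text), the base boots, and the three purchase paths described above. The item names are placeholders rather than the game’s real items, and the count ignores every interaction with champions, maps, and the rest of the game.

```python
from itertools import product

# Placeholder names; only the counts are taken from the text.
BOOTS = [f"boots_{i}" for i in range(1, 8)]           # 7 enchantable boots
ENCHANTMENTS = [f"enchant_{i}" for i in range(1, 6)]  # 5 enchantments
PURCHASE_PATHS = ["all at once", "pieces then combine", "base then final"]

# Item configurations: every boot/enchantment pair, plus the base boots.
configurations = len(BOOTS) * len(ENCHANTMENTS) + 1

# Purchase scenarios: the base boots bought directly, plus every
# enchanted boot bought each of the three ways.
purchase_cases = [("base_boots", None, "buy directly")]
purchase_cases += list(product(BOOTS, ENCHANTMENTS, PURCHASE_PATHS))

print(configurations)       # 36
print(len(purchase_cases))  # 106
```

Every further dimension, such as game mode, champion, or a speed buff that wears off, multiplies that 106 again.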
It’s absurd to try to be comprehensive, but my programmers and testers used to have the following conversation at least once a month:
Programmer: Why are you testing that? We didn’t change that.
Tester: But it changed.
So when I say a game is a buggy mess, I am not necessarily disparaging the testers. They have a lot to do: being truly comprehensive would take years for each build, and they may have only a few days to test it.
And remember: after you test the build and find the bugs, you should start regression testing over to make sure the bug fixes did not introduce new bugs. Have another month handy for testing?
One thing you really hate as a tester? You report a bug, the management team sends the build live anyway, and then people complain about the testing on the forums. No, testing went just fine…