Get a Feedback Loop and Listen to It
Imagine writing a novel like this: every two weeks, you gather thirty people into a room and ask them to read your draft. When they’re done, they fill out a series of questions. “Did you understand what was going on at all times? Did you understand the protagonist’s motivations? Did you feel compelled to read more? On a scale of one to five, would you recommend this novel to your friends?”
Imagine it doesn’t stop there. While the readers are reading, you watch them the entire time. How long does the average page take to finish? When does their pace slow? When do they skim? Imagine a camera trained on these readers’ faces, constantly tracking their eye movements across each page, with the data then aggregated and mined for trends. Imagine their brains wired, too, looking for activity related to rational reasoning, emotional response, excitement, imagination.
The data you gather feeds back into the revisions, and, two weeks later, you are testing your novel again.
In 2007, Wired magazine ran a cover story on the then-upcoming Halo 3. The magazine could have picked from any number of interesting angles for the story (how it was the Xbox 360’s biggest punch yet in the war with the PlayStation 3, or how features like Saved Films and the Forge editor mode would bring Web 2.0-style user-generated content into the console space), but its editors chose instead to focus on the playtesting lab at Microsoft and its unprecedented data-mining capabilities. The deck to the article didn’t mention Bungie at all. It read, simply, “How Microsoft Labs Invented a New Science of Play.”
When the article came out, I pointed a friend towards it, proud to be part of something deemed a big deal by the cover of a Condé Nast publication. But my friend came away from the article upset, even disturbed. Though the writer had almost flippantly tossed off the notion that video game development also “involves artistry, obviously,” he clearly saved his real swooning for the lab (the thousands of hours of recorded play, the gigabytes of log files, the propellerhead science) rather than for the game as an experience.
“Where has the creativity gone?” my friend asked.
Nobody really disputes that playtesting in some form is indispensable to making good games, of course. Understanding how an audience may react is tricky even in a linear medium such as the novel. At least there, we can assume readers will start at page one and continue to the end. In games, though, our agency enables our habits, and our habits become our blinders. You don’t consciously know that you always strafe to the right to avoid grenades; you just do, and your design ends up reflecting that. The choice then is to ship it that way, or to show the game to a lot of people, some of whom will instinctively dodge to the left or backpedal, or who might try to bunny-hop over the damage radius (something you’ve never thought to do).
It’s no surprise, then, that the playtest’s leading proponents, including developers like Valve and Bungie, are often known for the high quality of their titles. The feedback is valuable because games are systems with many moving parts, players included, and predicting how it will all interact is impossible for a single person or even a group of people. Psychologists tell us that once something is learned it is very difficult to imagine having not known it. The playtest allows us to see what it is like to not know our game.
If design by playtest creates the best possible games, however, how do we reconcile this with games as an art form? It is easy to accept that in mass-market entertainment there may very well be an optimal action film trailer, or an optimal casino layout, but to speak of art in its artiest sense as something that can be optimized is heading down a tricky path. If game design is actually a series of tests, and we employ scientists to perform empirical research to determine the best path, then where, as my friend asked, has the creativity gone? Given a certain set of goals, is there a “correct” game design, an optimal design beyond which any further innovation is unnecessary?
Put this way, it is tempting to reject design by playtest as a practice that gets in the way of an auteur’s personal expression: if people find a part of my highly idiosyncratic art game frustrating, well, that’s just part of the point. (I am a frustrated artist, after all, and my art is to make you just as frustrated as I am.) If we want games to travel into this realm of meaningfulness, the argument continues, we will just have to accept unorthodox controls, impossible puzzles, and general obtuseness as part of their repertoire.
Whether or not this is true, I think the uneasiness we feel about relying on playtesting most often stems from the sense that the tests are telling us what to do: that as creators we have relinquished control to the mob of the focus group, or to some impersonally codified rules of human behavior. Over-dependence on playtesting, we fear, may lead us to a middle-of-the-road game that is the average of all games, unremarkable in every way.
These are legitimate concerns. But it is also important to remember that as personal as they can be, art, entertainment and video games are all transactions: things that occur, somehow, in the space between the creator and the audience. The playtest is a tool, one that has evolved to help us grapple with that single most important quality of games, the reason they are so beguiling and why they are so problematic: their interactivity.
Used properly, playtesting does not tell you what to do so much as show you what you have in front of you. It shines a light into the possibility space of the game, a light of a different color or from a different angle than you are used to, one that makes possible a better understanding of its true shape. In this sense, a game designer’s artistry is not thwarted by the playtest; it is present in how he or she reacts to the results. Data is just data, and it is up to us to decide what we will do about it.
Commentary
April 20, 2010 
Reader Comments (13)
Your book analogy at the beginning is interesting but doesn't entirely apply, does it? Most novels are used in exactly the same way, and the usability of books as a medium is well understood. I'm sure book publishers do, in fact, test different paper consistencies and typefaces on prospective audiences to deduce the optimal reading experience, an exercise that's less about the artistry of the prose and more about the practical experience of reading.
So it is with playtesting games. The idea for GLaDOS in Portal didn't come from playtesting. Instead, playtesting was used to dull the hard edges of the game, reduce the difficulty spikes, and make sure players were responding to the design team's intentions roughly as the designers hoped or expected. This way, more players tolerated the experience of the game and were able to appreciate its artistry through to the end.
In other words, I agree with you. On the surface there seems to be a conflict between the overuse of playtesting in game development versus the idea of artistry and authorial voice coming through in the end result. But I think highly playtested games like Portal prove that the two halves can coexist just fine.
But like you say, playtesting could simply reveal that the people you aren't making your game for don't get your game. And that's okay. Just don't make 'frustration' a pejorative and we're on the same page.
The thing about games, unlike nearly all other types of art/media/whatever, is that they are artifacts people use. There's no question of functionality in reading a book, watching a movie or admiring a sculpture. Any failing with these things is due to their content, not their function.
With games, functionality and artistry are related (perhaps even interdependent), but they are not the same thing. Playtesting addresses the former well and, if you're lucky, may give some insight into the latter.
Thusly, playtesting is great and everyone (aside from maybe Bungie and Valve) should do more of it.
Playtesting should function in the same manner: you want concrete, well-defined ideas behind all your gameplay mechanics and content, so that you can use playtesting to verify those ideas and spot places where players aren't having a good time. Based on that, you can figure out what (if anything) needs to change. Data, particularly from experiments, is easy to reinterpret to fit a particular worldview, especially when you're dealing with highly opinionated programmers or designers, so having theories established beforehand helps keep people honest.
Hypothesis: Players are dying a lot here because this hallway is too small. If we widen the hallway, players will die less.
Run a playtest that compares the old version of the map with a new version that has a wider hallway.
Did players die less in the hallway? Good, your theory remains viable. Did they keep dying just as much? Did they die more? That indicates that you're missing some important piece of information to understand how players are interacting with your map, so you need to step back and take another look.
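As a rough illustration of the "did players die less?" step, here is a minimal Python sketch. It assumes each playtest yields one hallway death count per player for each map version, and it uses an exact one-sided permutation test to ask how often a reduction this large would appear if the hallway width made no difference at all. The choice of test, the helper name hallway_test and the numbers are all hypothetical, not something the comment prescribes.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def hallway_test(old_deaths, new_deaths):
    """Exact one-sided permutation test (hypothetical helper).

    old_deaths / new_deaths: per-player hallway death counts for the original
    map and the widened-hallway map. Returns (observed_reduction, p_value),
    where p_value is the share of all relabelings of the pooled players that
    show a reduction at least as large as the observed one.
    """
    observed = mean(old_deaths) - mean(new_deaths)
    pooled = list(old_deaths) + list(new_deaths)
    n_old = len(old_deaths)
    extreme = total = 0
    # Enumerate every way to split the pooled players into groups of the
    # original sizes; only feasible for small playtest groups.
    for idx in combinations(range(len(pooled)), n_old):
        chosen = set(idx)
        old = [pooled[i] for i in idx]
        new = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so float rounding doesn't drop splits that tie the observed one.
        if mean(old) - mean(new) >= observed - 1e-9:
            extreme += 1
    return observed, extreme / total


# Hypothetical data: five players per map version.
reduction, p = hallway_test([5, 6, 4, 7, 5], [2, 3, 1, 2, 3])
print(f"deaths fell by {reduction:.1f} per player, p = {p:.4f}")
```

A small p-value is consistent with the wider-hallway hypothesis; a large one, or deaths that stay flat or rise, is the cue to step back and take another look. Exact enumeration grows quickly (20 players split 10/10 already means 184,756 relabelings), so larger tests would usually sample random permutations instead.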
A blog post by an ex-Google designer once remarked on how frustrating it was that they ran experiments to select the best shade of blue for a particular UI element. That style of experimentation seems to be what people fear most about playtesting, A/B tests, etc.: an almost perfectly mechanical design process that eliminates any room for creative expression. But I think that as long as bureaucracy doesn't get in the way, a well-informed designer is always going to make the final decision.
Sure, a mechanic may seem completely impenetrable, and maybe even a little counter-intuitive at first, but is this a worthwhile price to pay for a feature that ends up being more effective in the hands of a player who spends the full 10 hours with a title?
I'll freely admit, however, that I have no experience in the field, so I have next to no idea how long an individual playtester will spend on one product.
Is it the case that new players have to be brought in with every build to more closely mimic the experience of first-time players? Or will an employee spend months with a single product, allowing them to fully get to grips with the game?
If anyone could give me an answer I'd be very grateful.
Half of the fun in games is discovering the method to overcome each challenge and developing one's skill. I have been playing a game called Rage of The Gladiator, where the point is to learn the timing and patterns of each enemy and exploit them to win. It is challenging, yet extremely fun. Enemies who moved in convenient patterns, fitting my instincts and movements, would be much easier and less fun to beat.
These experiments set a very dangerous precedent in an industry whose remaining imagination seems to be concentrated, almost entirely, in one country, and I do not mean America!
The reason this is probably difficult with an action game is that there is often very little in the way of an intended message. It would be like a producer criticizing a Rambo movie by asking a writer, "Why does he use the machete? Why not the AK?" It's hard to defend any decision when you had no reason for making it in the first place, other than that it seemed like a novel idea at the time.
There's no reason for play testing to have to become focus grouping. But if it does become that throughout the industry, then it won't be the first time a creative endeavor was stunted or watered down and simplified to be more appealing to the masses, right? Just look at the music biz. Once there's enough money and enough momentum, no label is going to take a chance of confounding the dumber members of the audience.
If you do a focus test for a game before pre-production even begins, in effect asking a room full of average folks "Hey, we're making a first-person shooter, what kind of first-person shooter would you like to play?", then you are going to get the demon straw-man of this discussion, the Soulless Designed-By-Committee WW2 Shooter.
I'm sure this happens but I've been fortunate enough to have never worked at a company that engaged in this practice.
On the flip side, you bring in testers late in development, after you've already decided that you're making an FPS set in an Underwater Objectivist Failed Utopia - after you've already pretty much made that game - and you just need to find out from testers where the levels are too dark, whether the analog stick tuning works, whether guys deal too much damage to you.
In the middle, there's the contested ground. The testers say Helpful Radio Man's southern accent is "kinda corny", so you recast and make him Irish for the next test. Was the southern accent part of the grand beautiful vision, or a usability variable? Usually: both, kinda.
For the time being I think testing in AAA games, as commonly practiced, serves as a source of useful constraints for an immature medium desperately in need of them.
For major publishers, most books will have gone through a structural edit (where the editor tries to resolve any problems with pacing, plot, etc.), a copy edit (sentence structure and grammatical errors) and a proofread (making sure no errors have snuck through to the print copy).
There's also book design redrafting, but that seems beside the point.
Anyway, each will often be performed by a different editor, and will rarely be completed in one draft. And that's not counting everyone who's read the book in the process of writing or selling it to the publisher.
So, yeah. Book editing is very much an iterative process.
Did I have a point... ah, yes.
The difference is that the formal iterative process begins much later in the life of the work. It's easier to shape something once it's fully made. And it's more worthwhile solving copy problems once structural problems are sorted out. From what I know, it'd be pretty tough to do that with games.
Also, much of the iterative testing is done by professionals, not amateurs. Most of the observational stuff in terms of reading has been done - an experienced editor will know whether someone is going to bunny-hop to avoid a grenade.
From what I've read as regards Portal, Valve do the same thing - they test internally with staff who presumably know a fair bit about games.
Also, editors have more power in a publishing relationship than QA and testing seem to in game design.
Of course, the other value of iteration is that it gives you structural cohesion and prevents you rabbiting on around an uncertain point.
I'm sorry.
*slinks away*
This is a legitimate concern and fear, and one I bring up with the developers I work with; I have talked about it as a reason (or excuse...) folks give for not doing user testing.
[slides from a recent talk where I call out this "excuse" to not do user testing: http://tinyurl.com/25q7tjo ]
The example I often give is that the game creator gets to decide whether up is down, left is right, the sky is green, and the grass is blue. I just use user testing to help the creators and vision holders understand how their target gamer will experience their world. If we observe that people struggle but want to keep playing, that's one thing. If there is frustration that leads to abandonment, that's another.
Most importantly, though, is helping creators understand discoverability and usability issues. Do players even *notice* the cool ways there are to interact with the world? When players notice the interactive component, can they figure out how to use it in a way that promotes engaging and rewarding game play?
My approach (that I detail in other talks) is collaborative in nature. I want the game creators and vision holders to understand deeply what the current player experience is and then we work together to figure out how to solve for the gap between design intention and player experience.
This is done in real time (the observation room is a social place) and is most effective when the key visionaries and implementers can iterate on possible solutions quickly so we can re-test and validate while the issues are still fresh in our minds.
Of course, this is all stated as an abstract aspiration... Once we start collecting and quantifying user experience data, then other folks want to start using it to judge marketability -- or even worse, to stack rank titles that are vastly different and are tested at different stages of development. Part of my job is to manage that flow of information, too, and ensure that it is not misused by well-meaning but misguided folks.
Ian Fischer (Robot Entertainment, ex-Ensemble) also wrote a good piece on playtesting, including its problems, in January: