A/B Testing, a Love/Hate Relationship

A/B testing heavily affects our design choices. We let it statistically identify the difference between good and bad practice, an approach also known as evidence-based design. It is a powerful tool, but more often than not I see A/B tests hindering real progress. A/B testing can easily drive you in the wrong design direction. We allow ourselves to be influenced by the numbers rather than by the ideas behind those numbers. As a result, statistics can crush other ideas, ideas that could take us further and deliver better results in the long run.

Companies like Google and Facebook make the exact same mistakes. It's worrisome. This isn't the only reason why big players with huge budgets continue to make serious design mistakes, but more on that another time. As a result, everyone copies these wrong design choices without any hesitation. But if we want to innovate, we need to look critically at the results and at the research itself. I have already written about how this applies to eye tracking and personas. Now it is time for A/B testing.

First, a short introduction. A/B testing is a way to compare two situations, for example different colors for a button. Some of your visitors use 'situation A' and others 'situation B'. The results are clear and easy to measure. You can measure the impact on both the micro conversion and the final conversion. One could argue that you can scientifically prove one situation is better than the other. And therein lies a problem.

Even scientists draw false conclusions based on significant differences. In science, two factors make it less likely that we draw false conclusions, and make it possible to correct them afterwards. First of all, a scientist starts with a solid theoretical framework. I remember my university supervisor kept hammering on the fact that the theoretical framework should be airtight. As long as you have enough data, you can 'prove' the most bizarre untruths. A theoretical framework protects you from that. A framework is a combination of previous research and practical hints. The second factor is that scientists are very critical of each other and verify the results with their own research. Although repeating research is not sexy, this additional research uncovers false conclusions. That may sound like nitpicking, but it is the only way to create knowledge we can build on.
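One way to read the remark about 'proving' untruths with enough data: the more comparisons you run on the same data, the more likely it is that one of them looks significant by pure chance. The sketch below is my own illustration, not something taken from the tests discussed here. The function name is hypothetical, and it assumes the comparisons are independent and that there is no real effect at all.

```python
def chance_of_false_winner(alpha: float, comparisons: int) -> float:
    """Probability that at least one of `comparisons` independent tests
    comes out 'significant' at level `alpha` when no real effect exists."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if comparisons < 0:
        raise ValueError("comparisons must be non-negative")
    return 1.0 - (1.0 - alpha) ** comparisons


# Hypothetical numbers of comparisons, using the conventional 5% threshold.
for k in (1, 5, 20):
    print(k, round(chance_of_false_winner(0.05, k), 3))
```

The chance of a false winner grows quickly with the number of comparisons, which is exactly why a framework that limits what you test up front is so valuable.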

Back to A/B testing. I will show you a clear example: the 'hamburger icon' that hides the navigation menu on mobile.

First result: use an icon without the label 'MENU'

James from Exis Web performs an A/B test with three different situations in response to a discussion on Twitter.

Situations for the first test.
  1. The hamburger icon without any addition.
  2. The hamburger icon with a small label "MENU" underneath it.
  3. The hamburger icon with a border.

Why he compares these exact situations is not completely clear. From past research we know that icons are less clear to users than labels.

In every study that considered the question, icons were demonstrated to be more difficult to understand than were labels, especially at first viewing, which contradicts one of the most frequently cited reasons for using icons, namely, comprehensibility for beginners.

– Jef Raskin, 2004, The Humane Interface, p.170.

You would expect situation 2 to win. On the other hand, the label in situation 2 is written so small that it is questionable whether it contributes anything. Also, the affordance of situations 1 and 2 is far from perfect. By adding a border, you create the idea of a button, and you increase the affordance. In the end situation 3 wins, probably because of the affordance. By not building a theoretical framework before the test, and consequently not performing a proper test, you can easily draw the wrong conclusions: the word "MENU" is not needed, and adding a border around an icon proves to be the right solution in this case.

Although the results are not significant, there is a winner. A misleading winner. I think James from Exis Web realised this and performed an additional test.
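To make 'not significant' concrete, here is a minimal sketch of a two-proportion z-test, one common way to check whether a difference in conversion rate could be chance. I do not know which method was used in the original test, so treat this as an assumption. The function name and the visitor counts are made up for illustration; they are not the numbers from James' test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between the
    conversion rates conv_a / n_a and conv_b / n_b."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data, the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: B has more taps than A, so B looks like the winner.
z, p = two_proportion_z_test(conv_a=20, n_a=1000, conv_b=26, n_b=1000)
print(f"z = {z:.2f}, p = {p:.2f}")
print("significant at 5%" if p < 0.05 else "not significant at 5%")
```

With counts like these, B has the higher rate and still fails the 5% threshold. A 'winner' without significance tells you very little.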

Follow-up: use 'MENU' as label

James compares four situations in his follow-up test. Later on, he partially repeated his test.

Situations in the second test.
  1. The winner of the first test: the hamburger icon with a border.
  2. The word 'MENU' with a border.
  3. The hamburger icon next to the word "MENU" with a border.
  4. The word 'MENU' without a border.

Situation 2, with the label 'MENU' and a border, beats situation 1, the original winner. This was easy to predict if we had used knowledge from the past: affordance is essential, and text labels are clearer than icons.

The result is better, but the research still remains incomplete. Two relevant aspects are missing: the location of the button, and the way the label is written. The location of the button is far from ideal. The button is positioned in the upper left corner of the screen, the most difficult place to reach. Top right would be much better. Also, uppercase text is harder to read. My advice would be to spell the word with one initial capital as 'Menu' and not as 'MENU'.

The alternative.

Although it would be very interesting to add these variables to the test, I want to propose a completely different approach. Forget the whole menu. You should not hide important navigation behind a bland menu button. And when navigation is not important at all, the menu button is just as irrelevant.

Forget the whole menu

Because of the A/B tests, we forget that a 'menu' is a bad solution anyway. It hides important navigation. Anthony Rose from zeebox shows with his A/B test that hiding a menu behind a button cuts the engagement of your users in half!

Around Sep 2013 Facebook switched to a new side menu design. Surely if Facebook was doing this, then it had to be good… right? [...] But when we looked at our analytics, it was a disaster! Engagement time was halved! It looked like “out of sight, out of mind” really was the case.

Anthony Rose, 2014, The Next Web

Surprisingly, they chose the worst version of the menu button: a hamburger icon without affordance, very small, light and located at the top left of the screen. Again a poorly executed A/B test. Although there is a lot to say about the setup, it doesn't change the fact that a menu button hides important information, and that neither the label 'Menu' nor the hamburger icon says anything about its content. A no-no in information architecture.

The problem reveals a lack of integration between content and user interface. We should not design user interface and content as different parts of a webpage. I wrote about the 'menu' button earlier. The menu button represents a transitional solution to make 'old desktop menus' available on mobile. Instead, we should design the homepage as a menu itself, and for underlying pages we should move the 'old-fashioned menu' to the footer. A new pattern without a menu button at all. This is not only a good design choice for now, for mobile. As early as 14 years ago, Jakob Nielsen wrote about users having problems with the way we design navigation for old-fashioned desktop computers.

For almost seven years, my studies have shown the same user behaviour: users look straight at the content and ignore the navigation areas when they scan a new page.

Jakob Nielsen, 2000, Is Navigation Useful?

Again, the winning situation from Anthony Rose's A/B test is still an old-fashioned menu and is far from ideal.

A/B testing, a love/hate relationship

To be clear: I appreciate James' and Anthony Rose's additions to the discussion about mobile menus. My aim is to use these tests as illustrations of the pitfalls of A/B testing.

A/B testing is a very powerful tool. The same is true for eye tracking. Although I remain critical of these methods, they are also tools that help us to innovate. Remember that you should always create a strong theoretical framework before you start, remain critical of other studies, and stay open to new ideas that might initially contradict the results of research.

Bart van de Biezen

Design Lead at Incision. My background: Industrial Design and Psychology at the University of Twente, graduated at Philips on midair pointing for next-generation TVs, Apple Design Award for CSSEdit, usability researcher at MetrixLab and blogger.

You can contact me via email, Twitter, GitHub, or Flickr.