Verbatim analysis in the real world

Introduction

Last week, we had the great pleasure of presenting at the Quirks Virtual AI event.

The event itself was really interesting, with a wide variety of speakers who all had one thing in common: leveraging AI's capabilities to help marketing research become more efficient, data-driven and customer-centric.

During our presentation we touched on the relative strengths and weaknesses of various techniques for verbatim data analysis.

We also thought it would be interesting to find out from the attendees which methods they use most commonly to analyze open-ended survey responses. So, we did what all good researchers should do - we polled the attendees. The results were quite interesting, so we thought we would share them with you.

Disclaimer

Clearly, we need to take these findings with a pinch of salt, mainly because our webinar attendees are a self-selected sample. As our webinar focused on using AI to code verbatim data, it’s likely that the attendees skew heavily towards people with a vested interest in this area. Attendees were also skewed towards the US, Canada and UK. So, we can’t claim that the results are scientifically representative of the industry as a whole, but we do think they tell us a few interesting things about what’s currently going on out there in the real world.

The Results

Throughout our webinar, we had the following poll question open for attendees to answer:

“Which of the following methods do you use most often to analyze verbatim survey responses?”

The results were as follows:

Coding	52%
Read / skim read to get main gist	18%
Text analytics	11%
Word clouds	11%
Generative AI	5%
Verbatims are ignored / not analyzed	2%
Other	2%
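
As a quick sanity check on the figures above, here is a minimal Python sketch that tallies the published shares and prints them as a simple text bar chart. The percentages are copied straight from the table; the variable names are just for illustration. One thing it shows is that the shares add up to 101% rather than 100%, which we assume is simply the effect of rounding each figure to a whole percent.

```python
# Poll shares as published in the table above (whole percentages).
poll = {
    "Coding": 52,
    "Read / skim read to get main gist": 18,
    "Text analytics": 11,
    "Word clouds": 11,
    "Generative AI": 5,
    "Verbatims are ignored / not analyzed": 2,
    "Other": 2,
}

# Sum the published shares; rounding can push this slightly off 100.
total = sum(poll.values())
print(f"Total of published shares: {total}%")  # prints 101%

# Simple text bar chart, largest share first (one '#' per percentage point).
for method, share in sorted(poll.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{share:>3}% {'#' * share} {method}")
```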

So, what do these results tell us about how our industry really approaches the tricky problem of verbatim data analysis? I think we can draw the following broad conclusions:

Coding is still alive and well. Despite the cost and time pressures we often hear about, there’s still a lot of traditional coding going on out there. And for good reason. Coding is still generally regarded as the gold standard in terms of getting meaning out of verbatim responses. Those cost and time pressures certainly are a real factor, so traditional coding methods could benefit from a technology boost. Luckily, we have an app for that, one that keeps the “gold standard” intact through human control and oversight.
Almost a fifth of respondents simply skim read their verbatims to get a general feel for the content within. Of course, for very small surveys this is a totally feasible and valid approach. For anything larger, say, more than 100 responses, a casual and informal analysis will be unreliable and very subjective. I suspect a fair proportion of the 18% fall into this larger category - namely people simply skimming through verbatims trying to get an informal qualitative read of the data due to time and cost pressures.
With the emergence of AI, it makes less and less sense to analyze verbatims in this way. Leaning on the tech requires less effort and yields a higher quality and more reliable result. As it happens, we also have an app for that!
Text analytics is clearly playing a role in this world. Again, this is likely to be driven by cost and time pressures. The figure of 11% shows that this is still a minority sport. I suspect this is largely because text analytics can seem quite technical to most people, and often yields fairly superficial results (especially compared to manually coded data).
Word clouds are also a tool that some people are still reaching for. Word clouds definitely have a role to play, but they are quite a superficial analysis technique. Again, time, cost and technical factors make this an understandable tool for people to use.
As with the “skim read” approach, the emergence of AI is starting to give us tools that offer much more informative results for the same amount of effort. The themeit tool built into codeit is one such approach.
Generative AI is the interesting new entrant on this list but, despite the amount of hype and expectation around it, it still only features at a low level. Away from the hype and noise, perhaps this isn’t too surprising. Firstly, generative AI is still quite a technical area, which is off-putting to most. It also does not yield complete and reliable results on its own. In fact, if not implemented correctly, results can be misleading and incorrect. So, making effective use of it requires generative AI and human oversight working in collaboration. This is exactly the aim of our themeit tool.
In our results, there’s a small percentage (2%) of respondents whose verbatims are simply ignored and not analyzed. This does seem like a real waste - to collect valuable data and then do nothing with it. I also suspect that across the wider industry this number is actually a lot higher. Again, time and cost pressures play a big part in this. Our hope is that if technology can make it quicker and cheaper for people to analyze verbatim data, then it will become economically viable for this untapped data resource to be used to its fullest.
Last but not least, there’s another 2% of people who use some other technique, not in the list. During our webinar, there wasn’t time to ask the attendees about this, but I’d really like to know what “other” techniques are being used out there. If you’re one of these “others” and you use a method we’ve not mentioned, do get in touch and let us know what you use.
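
On the word cloud point above, one way to see why the technique can be superficial is to remember that a word cloud is essentially a word frequency count. The short Python sketch below is only an illustration under stated assumptions: the three responses are invented, and the stop word list (which drops “not”, as many stop word lists do) is a hypothetical choice rather than the behaviour of any particular tool.

```python
import re
from collections import Counter

# Hypothetical verbatims, invented purely for illustration.
responses = [
    "The delivery was good",
    "Delivery was not good at all",
    "Not good, the delivery was late",
]

# Assumed stop word list; note that it removes the negation "not".
STOPWORDS = {"the", "was", "at", "all", "not"}

# Lowercase each response, split it into words, and drop stop words.
words = []
for response in responses:
    for word in re.findall(r"[a-z']+", response.lower()):
        if word not in STOPWORDS:
            words.append(word)

# These counts are what a word cloud would use to size each word.
counts = Counter(words)
print(counts.most_common(3))
```

In this toy example “good” is counted three times, so it would appear as prominently as “delivery”, even though two of the three responses are negative. A coder reading the full responses would not make that mistake.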

Can’t believe you missed the webinar? You can watch it on-demand on the Quirks website. If you aren’t registered, click here to register today. Or get in touch and we’ll catch you up on what you missed.