Top Considerations for Text Analytics

Experts estimate that 80% to 90% of the data in any organization is unstructured. Text can be a large part of unstructured data, and there is a wide variety of techniques that can be used for text analytics. Here are the top five things to consider when evaluating the proper text analytics approach:

What are you hoping to learn about the text?

This question is key, as you do not want to over-engineer a solution. Despite the availability of advanced techniques, a well-crafted word cloud split by promoters and detractors of your business can tell you a lot. That being said, if you are looking to extract named entities or noun phrases, you will need to bring natural language processing to the table.
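As a rough illustration of how little is needed for the simple case, the sketch below counts the words that would feed a promoter word cloud and a detractor word cloud. The responses, the stop-word list, and the function names are all hypothetical, and the score cut-offs assume the usual Net Promoter convention (9 to 10 for promoters, 0 to 6 for detractors).

```python
import re
from collections import Counter

# Hypothetical survey responses: (likelihood-to-recommend score 0-10, verbatim comment).
RESPONSES = [
    (10, "Fast shipping and friendly support"),
    (9, "Friendly staff, fast checkout"),
    (3, "Slow shipping and rude support"),
    (2, "Slow refund, support never replied"),
    (7, "It was fine"),
]

# Tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"and", "the", "it", "was", "a", "never"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def word_counts_by_segment(responses):
    """Count words separately for promoters (9-10) and detractors (0-6)."""
    counts = {"promoter": Counter(), "detractor": Counter()}
    for score, comment in responses:
        if score >= 9:
            counts["promoter"].update(tokenize(comment))
        elif score <= 6:
            counts["detractor"].update(tokenize(comment))
        # Scores of 7-8 (passives) are left out of both segments.
    return counts


counts = word_counts_by_segment(RESPONSES)
print("Promoters: ", counts["promoter"].most_common(3))
print("Detractors:", counts["detractor"].most_common(3))
```

The two counters are exactly what a word cloud tool would size its words by. Note that nothing here recognizes entities or phrases; that is the point at which natural language processing becomes necessary.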

Should you use ready-made text analytics software, APIs, custom builds, or hybrids?

The answer, of course, is that it depends. There is a wide variety of potential solutions available to help meet your business needs. Software packages can be expensive and difficult to implement, while open-source solutions in R and Python can take a long time to learn. Many times, a hybrid approach is best. With this approach, we leverage technology to do most of the heavy lifting and then add a human curation step. This gives you the efficiency of technology solutions together with the context that only a human can provide.

What about machine learning?

There are two broad types of machine learning for text: unsupervised and supervised. Unsupervised machine learning can use techniques like word vectors or LDA topic modeling to group similar text into ‘themes’. The words contained within a text string and their distance from one another determine the themes in a multidimensional space. Unsupervised machine learning can be a good first step in identifying common topics in your text but often requires some human curation to make it useful. Another advantage of unsupervised machine learning is that, unlike supervised methods, it does not need pre-labeled data.
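To make the idea of distance between texts concrete, here is a minimal sketch that groups comments without any labels. It is deliberately not word vectors or LDA: it swaps in plain bag-of-words counts with cosine similarity and a greedy single-pass grouping, which is a much cruder technique but shows the same principle. The comments, the 0.3 threshold, and the function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent a comment as a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 when either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_comments(comments, threshold=0.3):
    """Greedy grouping: each comment joins the first group whose seed
    comment is similar enough; otherwise it starts a new group."""
    groups = []  # list of (seed_vector, member_comments)
    for comment in comments:
        vec = vectorize(comment)
        for seed, members in groups:
            if cosine(vec, seed) >= threshold:
                members.append(comment)
                break
        else:
            groups.append((vec, [comment]))
    return [members for _, members in groups]


# Hypothetical verbatim comments with no labels attached.
COMMENTS = [
    "delivery was late",
    "late delivery again",
    "great price and value",
    "price is great",
]

groups = group_comments(COMMENTS)
for i, members in enumerate(groups, start=1):
    print(f"Group {i}: {members}")
```

The groups come back unnamed, which is where the human curation step comes in: someone still has to look at each group and decide what theme it represents, or whether it is a meaningful theme at all.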

Supervised machine learning applies a variety of techniques to pre-labeled data to make predictions about unseen text. With this approach, you will need a good deal of data that have already been classified to train potential models. Supervised machine learning techniques don’t always get it right and can have a prediction accuracy of around 70% to 80%. This trade-off in classification accuracy may be acceptable for your use case if speed or a high volume of data is a consideration. Supervised machine learning is also only as good as the data used to train it. If new themes or phrases emerge, then these approaches may not be able to pick up on them.
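The sketch below illustrates both points with a small multinomial Naive Bayes classifier written from scratch. Naive Bayes is only one of many supervised techniques and is chosen here for brevity. The training comments, theme labels, and held-out examples are hypothetical, and with so few examples the accuracy figure is purely illustrative.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labeled):
    """Build per-label word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical pre-labeled training data.
TRAIN = [
    ("shipping was slow", "delivery"),
    ("package arrived late", "delivery"),
    ("late shipping again", "delivery"),
    ("price is too high", "pricing"),
    ("great price and value", "pricing"),
    ("too expensive for the value", "pricing"),
]

# Hypothetical held-out comments; the last one belongs to a theme
# ("app") that never appeared in the training data.
HELD_OUT = [
    ("shipping arrived late", "delivery"),
    ("the price is high", "pricing"),
    ("app keeps crashing", "app"),
]

model = train(TRAIN)
correct = sum(predict(model, text) == label for text, label in HELD_OUT)
accuracy = correct / len(HELD_OUT)
print(f"Held-out accuracy: {accuracy:.0%}")
```

The third held-out comment can never be classified correctly, because the model can only choose among the labels it was trained on. That is the emerging-theme limitation in miniature.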

What is wrong with traditional coding?

Absolutely nothing! Although human coders can make mistakes or introduce subjective bias, if you need to put structure around a few hundred verbatim comments, this may be the way to go.

If you have tasks that need repeating on a periodic basis, we can automate theming using a combination of NLP and Boolean structures that are semi-curated and checked by human coders. This hybrid approach can provide the best of all worlds on speed, accuracy, and repeatability.

In certain circumstances, these approaches may be your first step to getting high-quality training data for future machine learning efforts.
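A minimal sketch of what Boolean theming can look like is shown here. The rule format (any / all / none term sets), the theme names, and the comments are hypothetical; real rule sets are typically larger and would sit on top of NLP steps such as stemming or lemmatization, which are omitted here.

```python
import re

# Hypothetical theme rules. A comment matches a theme when it contains
# at least one "any" term, every "all" term, and no "none" term.
THEME_RULES = {
    "late delivery": {
        "any": {"late", "delayed", "slow"},
        "all": {"delivery"},
        "none": set(),
    },
    "pricing": {
        "any": {"price", "expensive", "cost"},
        "all": set(),
        "none": {"shipping"},
    },
}


def assign_themes(comment, rules=THEME_RULES):
    """Return every theme whose Boolean rule the comment satisfies."""
    tokens = set(re.findall(r"[a-z]+", comment.lower()))
    themes = []
    for theme, rule in rules.items():
        has_any = bool(tokens & rule["any"])
        has_all = rule["all"] <= tokens
        has_none = not (tokens & rule["none"])
        if has_any and has_all and has_none:
            themes.append(theme)
    return themes


COMMENTS = [
    "Delivery was delayed and the price is too high",
    "Shipping cost was fine",
]

for comment in COMMENTS:
    themes = assign_themes(comment)
    # Comments that match no rule are the ones to route to a human coder.
    print(comment, "->", themes or "needs human review")
```

Because the rules are explicit, the same comment gets the same themes on every run, and a human coder can inspect or adjust a rule whenever the results look off. The themed output can also double as labeled training data later.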

Is theming text the end of the story?

Not at all. Once you’ve put structure around text, say into themes, we can run analyses to understand how certain themes relate to other business key performance indicators. This could mean understanding what drives Overall Satisfaction or Likelihood to Recommend in a CX study, or understanding which features of a product or service are associated with higher product ratings. With a quantitative structure in place, we can apply a wide variety of analytical techniques to derive insights.
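One of the simplest analyses of this kind is to compare a KPI between respondents who mentioned a theme and those who did not. The sketch below does that for Overall Satisfaction. The responses, the 1 to 10 scale, and the function name are hypothetical, and a real study would use far more data and more rigorous methods such as regression or key driver analysis.

```python
from statistics import mean

# Hypothetical themed responses: overall satisfaction (1-10) plus the
# themes already assigned to each comment.
RESPONSES = [
    {"satisfaction": 9, "themes": {"friendly staff"}},
    {"satisfaction": 8, "themes": {"friendly staff", "pricing"}},
    {"satisfaction": 3, "themes": {"late delivery"}},
    {"satisfaction": 4, "themes": {"late delivery", "pricing"}},
    {"satisfaction": 7, "themes": set()},
]


def theme_gaps(responses, kpi="satisfaction"):
    """For each theme, return mean KPI with the theme minus mean KPI without it."""
    all_themes = set().union(*(r["themes"] for r in responses))
    gaps = {}
    for theme in all_themes:
        with_theme = [r[kpi] for r in responses if theme in r["themes"]]
        without = [r[kpi] for r in responses if theme not in r["themes"]]
        if with_theme and without:  # need both groups to compare
            gaps[theme] = mean(with_theme) - mean(without)
    return gaps


gaps = theme_gaps(RESPONSES)
# List themes from most negative to most positive gap.
for theme, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{theme}: {gap:+.2f}")
```

A gap like this shows association only. It does not by itself establish that a theme causes higher or lower satisfaction, so it is best treated as a pointer to where deeper analysis is worthwhile.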

Through compelling visuals, analytics, and sourcing third-party reviews or other unstructured data, ENGINE can provide a holistic story. Understanding more of the brand conversation through text analytics can provide insight into otherwise overlooked or still-emerging themes. Whether those come from niche segments of consumers or from trending topics, analyzing text can provide early indicators of issues or differentiators that the business can act on.

ENGINE has the expertise to strategically guide your business through the complicated world of unstructured data by applying the proper technique to existing first-party data, conducting primary research, or seeking third-party data to fill in the gaps of the brand conversation.

Written by Mike Miller, Research Director, and Kyle Swan, Senior Research Director, at ENGINE Insights.