From CRM databases, transactional systems, questionnaires and customer contact logs, pop-up surveys, online consumer reviews and forum postings, to chats, blogs, tweets and other threaded conversations and online babble: Hearing the voice of the customer has never been so easy.

Or so difficult. Thanks to Web 2.0 and Big Data, we now have too much information (or, as the Twitter generation would say, TMI).

Since much of this content is unstructured, the challenge is to chisel out the “gold” nuggets trapped inside this bedrock of customer feedback and use the refined raw material to make decisions aimed at specific business goals.

Relevant to this task are two intertwined disciplines: text analytics and data mining. During the text analytics process, unstructured content and underlying sentiment are transformed into data (concept and sentiment variables) that can be quantified and combined within a structured framework. The variables derived from text analytics are then combined with other structured variables within a traditional data mining model.
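
To make the hand-off concrete, here is a minimal sketch in Python (pandas assumed; the respondent records, concept names and keyword lists are hypothetical stand-ins for what a real text analytics engine would produce):

```python
import pandas as pd

# Hypothetical survey records: one structured rating plus one free-form comment each.
records = pd.DataFrame({
    "respondent_id": [101, 102, 103],
    "loyalty_rating": [9, 4, 7],
    "comment": [
        "Great technical expertise, the team solved our issue fast.",
        "The system is slow and support never calls back.",
        "Pricing is fair but the feature set feels limited.",
    ],
})

# Toy stand-ins for the concept and sentiment variables a text analytics
# engine would derive; a real NLP engine does far more than keyword matching.
CONCEPTS = {"technical expertise": "mentions_expertise",
            "feature set": "mentions_feature_set",
            "slow": "mentions_slowness"}
NEGATIVE = {"slow", "never", "limited"}

def derive_text_variables(comment: str) -> dict:
    text = comment.lower()
    variables = {flag: int(phrase in text) for phrase, flag in CONCEPTS.items()}
    variables["negative_sentiment"] = int(any(word in text for word in NEGATIVE))
    return variables

# Append the derived variables to the structured record, ready for data mining.
text_vars = records["comment"].apply(derive_text_variables).apply(pd.Series)
combined = pd.concat([records.drop(columns="comment"), text_vars], axis=1)
print(combined)
```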

Text Analytics Extracts and Structures
In the past, business decisions relied primarily on structured data (checkboxes and rating scales on surveys, transactional details from point-of-sale and CRM systems) that could be easily entered into relational databases and spreadsheets. Free-form answers from open-ended questions on survey questionnaires, though widely used, were often, understandably, ignored. Anyone involved in survey research knows how difficult (and time-consuming) it is to read and code hundreds of free-form comments. The complexity, nuances and contradictions of the English language often prevent easy, reliable categorization.

Modern text analytics tools have changed all that. Rooted in technology first used in the 1950s, these text analytics engines enable unstructured content of every type to be represented and analyzed in the same way that structured information is represented and analyzed.

The earliest text analytics methods involved keyword or statistical analysis. These have largely been replaced by more comprehensive and reliable natural language processing (NLP) solutions that analyze text as spoken or written by human beings. The algorithms and rules behind NLP represent basic grammar usage and word forms (verbs, nouns, prepositions, adjectives, etc.) and are used to discern facts and entities (such as persons, places and things) as well as attitudes and opinions (sentiment).

For example, in analyzing a call center note such as “The system is slow,” NLP analysis would identify a thing (system) and an associated qualifier (slow). Creating a pattern such as “system + qualifier” would ensure that similar statements, though worded differently, are identified and categorized.
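
One way to approximate this kind of extraction is with an off-the-shelf NLP library. The sketch below assumes spaCy and its small English model (the article does not name a specific tool); it simply pairs each adjectival complement with the subject it describes, which is only a fragment of what a commercial engine does:

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_thing_qualifier(text: str):
    """Pair each adjectival complement with the subject it describes,
    e.g. 'The system is slow.' -> ('system', 'slow')."""
    pairs = []
    for token in nlp(text):
        if token.dep_ == "acomp":                       # the qualifier, e.g. 'slow'
            subjects = [t for t in token.head.lefts if t.dep_ == "nsubj"]
            for subj in subjects:
                pairs.append((subj.text.lower(), token.text.lower()))
    return pairs

# Exact results depend on the model version.
print(extract_thing_qualifier("The system is slow."))
print(extract_thing_qualifier("Support was slow and unhelpful."))
```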

Different Flavors of NLP
Even among NLP-based text analytics engines, there are crucial differences. Named Entity Recognition (NER) engines identify entities and assign them to predefined groups or classes. The drawback of NER is that the entities and classes must be determined in advance, so it can miss novel problems, issues or topics that were not anticipated.
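
A brief illustration of NER using a pre-trained model (spaCy assumed here as an example library; note that the entity classes are fixed in advance, which is exactly the limitation described above):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

note = ("Jane Smith from Acme Corp called on March 3rd to report "
        "that the London office cannot access the billing portal.")

# A pre-trained NER model tags spans against a fixed set of classes
# (PERSON, ORG, GPE, DATE, ...); anything outside those classes is missed.
for ent in nlp(note).ents:
    print(ent.text, "->", ent.label_)
# Typical output (results depend on the model):
#   Jane Smith -> PERSON, Acme Corp -> ORG, March 3rd -> DATE, London -> GPE
```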

Targeted Event Extraction is a type of NLP that uses trigger words, each of which is associated with rules that define common attributes in relation to that word. For example, the word “cancellation” might trigger a rule that identifies the reason, customer location, product and date of all cancelled orders mentioned in the source documents.
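
A rule of this kind can be approximated with simple trigger-word patterns. The sketch below is hypothetical and deliberately crude; real event-extraction engines use far richer linguistic rules:

```python
import re

# Hypothetical trigger-word rule: if "cancellation"/"cancelled" appears in a note,
# the rule fires and tries to pull out the attributes it cares about.
CANCELLATION_RULE = {
    "trigger": re.compile(r"\bcancel(?:lation|led)\b", re.IGNORECASE),
    "attributes": {
        "product":  re.compile(r"cancelled (?:the |their )?([\w\- ]+?) order", re.IGNORECASE),
        "reason":   re.compile(r"because (.+?)(?:\.|$)", re.IGNORECASE),
        "location": re.compile(r"[Cc]ustomer in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    },
}

def extract_cancellation(note: str):
    if not CANCELLATION_RULE["trigger"].search(note):
        return None                      # trigger word absent, so the rule does not fire
    event = {"event": "cancellation"}
    for name, pattern in CANCELLATION_RULE["attributes"].items():
        match = pattern.search(note)
        event[name] = match.group(1).strip() if match else None
    return event

note = ("Customer in Denver cancelled their annual support order "
        "because the renewal price doubled.")
print(extract_cancellation(note))
# -> {'event': 'cancellation', 'product': 'annual support',
#     'reason': 'the renewal price doubled', 'location': 'Denver'}
```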

Exhaustive Fact Extraction is a third NLP methodology. It uses linguistic heuristics and patterns to discern an exhaustive list of key facts and concepts contained within the entire universe of text (e.g., all the free-form answers in a given survey, plus call center notes, plus feedback from the corporate blog, etc.). This results in a database that can be queried and analyzed just like any traditional database to, for example, report on the most frequently occurring topics and identify trends in the data. The advantage of Exhaustive Fact Extraction is that nothing is defined in advance, so there are no preconceived notions or risk of missing customer insights or emerging issues. (As a wise person once said, “How can you know what you don’t know?”)
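
Continuing the earlier spaCy-based sketch (an assumption for illustration, not the method any particular vendor uses), the extracted facts can be loaded into a table and queried for the most frequent topics:

```python
import pandas as pd
import spacy

nlp = spacy.load("en_core_web_sm")

comments = [
    "The system is slow.",
    "Support was slow to respond.",
    "The new interface is confusing.",
    "Checkout is slow on mobile.",
]

# Pull every (subject, qualifier) fact the parser can find, with no
# predefined list of entities or topics, and load them into a table.
facts = []
for i, text in enumerate(comments):
    for token in nlp(text):
        if token.dep_ == "acomp":
            for subj in (t for t in token.head.lefts if t.dep_ == "nsubj"):
                facts.append({"doc_id": i,
                              "topic": subj.lemma_.lower(),
                              "qualifier": token.lemma_.lower()})

fact_table = pd.DataFrame(facts)
# The table can now be queried like any traditional database, e.g. the most
# frequently reported topic/qualifier combinations:
print(fact_table.groupby(["topic", "qualifier"]).size().sort_values(ascending=False))
```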

TMI into Business Intelligence (BI): Combining and Mining Data to Obtain Actionable Knowledge
The next step towards transforming TMI into BI involves bringing the concept variables derived from text analytics into a data mining model.

Ideally, the variables culled from text analytics are used alongside structured and transactional data from many other databases such as customer satisfaction scores, geographic data, demographics, purchase and usage histories, or product-feature data.
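
A minimal sketch of that merge step, assuming pandas and entirely hypothetical column names, with the customer identifier as the join key:

```python
import pandas as pd

# Concept/sentiment variables derived from free-form comments (respondent level).
text_vars = pd.DataFrame({
    "respondent_id": [101, 102, 103],
    "mentions_expertise": [1, 0, 0],
    "negative_sentiment": [0, 1, 0],
})

# Structured data from other systems, keyed on the same customer identifier.
crm = pd.DataFrame({
    "respondent_id": [101, 102, 103],
    "satisfaction_score": [9, 4, 7],
    "region": ["West", "East", "West"],
    "purchases_last_year": [12, 3, 8],
})

# One joined table feeds the data mining model.
model_input = text_vars.merge(crm, on="respondent_id", how="inner")
print(model_input)
```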

Through data mining, we can uncover the hidden value of information and identify and refine patterns and trends among hundreds or thousands of variables. We can then make predictions based on information obtained from analyzing and exploring this data.

For example, in the area of customer loyalty, running text analytics ahead of data mining can reveal which of the extracted variables have the biggest impact on loyalty scores or satisfaction ratings, and can help answer questions such as:

  • Are there patterns in the data that differentiate non-promoters from promoters? Which combination of variables best predicts whether a customer will be one or the other?
  • What factors are associated with customer attrition, and are there certain groups that have a greater propensity for “churn”?
  • Which groups exhibit readily identifiable patterns that are more predictive of future behavior (as compared to other groups or the total population)?
  • How much “lift” does one group have over another? (Lift measures the degree to which a given subgroup deviates from the general population in response patterns.)

The following example shows results from a customer loyalty survey that contained structured and unstructured items. Though only 17 percent of all respondents indicated they would highly recommend the company, text analytics followed by data mining uncovered a “hidden,” highly loyal subgroup. Within this subgroup, 56 percent gave top loyalty ratings. What differentiated the subgroup from the overall respondent group was that every member mentioned “technical expertise” in answer to an open-ended question about the company’s strengths. This represents a lift of 3.2 in engagement ratings for this group (i.e., the group is 3.2 times more likely to be engaged than the overall population).
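
For readers who want the arithmetic behind the lift figure, a minimal sketch (the percentages above are rounded, so the computed value is approximate):

```python
def lift(subgroup_rate: float, population_rate: float) -> float:
    """Lift = how many times more likely the subgroup is to respond
    (here, to give a top loyalty rating) than the population overall."""
    return subgroup_rate / population_rate

# Rounded figures from the example: 56% of the "technical expertise" subgroup
# gave top ratings versus 17% of all respondents.
print(round(lift(0.56, 0.17), 1))   # ~3.3, in line with the reported lift of about 3.2
```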

Prioritizing Actionable Insights to Achieve Results
In short, text analytics/data mining solutions help us connect data to business practices and obtain insights that will make a difference. The results of an effective text analytics/data mining paradigm are insights that can be acted upon to achieve specific business goals in many areas, including operational efficiency, customer engagement and product innovation.

For example, a successful software company gleaned some surprising insights after analyzing text and data from multiple sources, including net promoter scores (NPS), structured survey data, demographics, customer experience data and free-form answers to open-ended survey questions. After the text analytics phase was complete, all the data were merged into a data mining decision tree model. In addition to identifying highly concentrated subgroups of customers, the company discovered:

  • Those customers who mentioned “feature set” within their free-form answers were six times more likely to be a promoter of the product and company.
  • Those customers who made a negative comment about “reliability” were most likely to be non-promoters.
  • Of the variables found to be most predictive of the NPS, the least important predictor was the number of times a customer had used the software in the previous three months.
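
A minimal sketch of how such a merged decision tree model might be set up, assuming pandas and scikit-learn and entirely hypothetical data; the article does not describe the company’s actual tooling:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical merged table: text-derived flags plus structured variables,
# with NPS promoter status (1 = promoter, 0 = not) as the target.
data = pd.DataFrame({
    "mentions_feature_set": [1, 0, 1, 0, 0, 1, 0, 1],
    "negative_reliability": [0, 1, 0, 1, 1, 0, 0, 0],
    "uses_last_3_months":   [14, 2, 9, 5, 1, 20, 7, 11],
    "satisfaction_score":   [9, 3, 8, 4, 2, 10, 6, 9],
    "promoter":             [1, 0, 1, 0, 0, 1, 0, 1],
})

X = data.drop(columns="promoter")
y = data["promoter"]

# A shallow tree keeps the splits (and therefore the "rules") easy to read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Relative importance of each predictor, analogous to the findings above.
for name, importance in zip(X.columns, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")
```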

Using the results as a blueprint for improvement, the company pulled the individual surveys of promoters who had mentioned the feature set in order to understand what it was about the feature set that triggered their comments. The company then devised a strategy centered on this variable, with the goal of taking relevant action to raise the overall NPS among the group of non-promoters.

Text analytics capabilities also save time and money. HireVue, a provider of digital recruiting and interviewing technology, conducts 90,000 surveys a year and achieves a 30 percent response rate. Because feedback is critical to the widespread adoption of the company’s innovative interview platform, HireVue felt it was important to act on the verbatim comments in those surveys. Initially, the company relied on personnel to manually scan all the comments and categorize them in spreadsheets. Today, that is no longer necessary, thanks to an automated text analytics tool that “reads” verbatim comments and channels them to the appropriate person or department for immediate action.
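
As a purely hypothetical illustration of that kind of automated triage (not HireVue’s actual system), a keyword-to-owner routing table might look like this:

```python
# Hypothetical routing rules; a commercial tool would classify comments with NLP
# rather than bare keywords, but the triage idea is the same.
ROUTING_RULES = {
    "billing": "finance team",
    "invoice": "finance team",
    "slow": "engineering",
    "crash": "engineering",
    "rude": "customer success",
    "confusing": "product / UX",
}

def route_comment(comment: str) -> str:
    text = comment.lower()
    for keyword, owner in ROUTING_RULES.items():
        if keyword in text:
            return owner
    return "general review queue"

print(route_comment("The interview player kept crashing on my laptop."))  # -> engineering
```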

Text Rules!
As the preceding examples illustrate, companies that limit data analysis to structured data are missing critical nuggets of information that only text analytics can unearth. Data derived from text has proven time and again to add predictive power and value beyond what structured data alone can deliver. Additionally, combining data from many sources and applying data mining techniques provides lift – the ability to see more than can be discerned with any one source or method.

Getting Started with Text Analytics and Data Mining

  • Choose a text analytics engine that suits your needs and the type of content you are analyzing. Know its strengths and weaknesses. Even among NLP engines, there are important differences: For example, can the solution handle idioms, slang, variations in sentiment, tone and voice, pronoun resolution, etc.?
  • Start with verbatim responses from surveys. Applying text analytics to public sources like forums, product reviews and social media is more vulnerable to inconsistent and inaccurate results because of the noise inherent in publicly available data: out-of-context replies, repeated forwarding and re-posting of the same comments, and online slang and abbreviations.
  • Tackle basic sentiments (like, dislike; happy, angry) before more subtle sentiments.
  • For text in other languages, determine how the results will be analyzed. If they will be analyzed in English, translations can be utilized. If they are to be reviewed in the native language, use a text analytics engine tuned to that language.
  • Know what native data sources are supported by your overall solution. Are you restricted to certain data sources? How easy is the process of merging data from multiple sources?
  • Don’t assume you can automate every step of the text analytics process. Humans will always be needed to train and guide the tools and to interpret the findings.
  • Choose a method that matches your resources. Many text analytics solutions require technical expertise, whereas others have out-of-the-box features that jumpstart the process. Unless you have unlimited resources, favor solutions that are easy to deploy, understand and share.