Big Data allows “for more robust research and correlation” than in days past, and that has the Federal Trade Commission (FTC) concerned that it could be put to unfair or deceptive uses. “Finding a representative data sample sufficient to produce statistically significant results” used to be “difficult and expensive,” but now the “scope and scale of data collection enables cost-effective, substantial research of even obscure or mundane topics.” While survey, opinion and marketing researchers probably thought this was a useful and positive development, the FTC is not so sure.

The regulatory agency admitted up front that Big Data analytics can be “often valuable to companies and to consumers, as it can guide the development of new products and services, predict the preferences of individuals, help tailor services and opportunities, and guide individualized marketing,” but also warned that some people “have raised concerns about whether certain uses of big data analytics may harm consumers, particularly low income and underserved populations.”

The FTC, the chief U.S. regulator of the survey, opinion and marketing research profession, released a report yesterday, “Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues,” building on an FTC workshop in September 2014 focused on the potential discriminatory impact of Big Data analytics. It followed the FTC's 2014 report on data brokers and the White House's 2014 report on Big Data, which touched on similar issues.

MRA responded to the FTC’s Big Data workshop in October 2014. We asked the agency not to turn Big Data into some kind of pejorative term, based on hypothetical scenarios of unmeasurable consumer harm.

The lifecycle of Big Data

The FTC report divides Big Data’s lifecycle into four phases: “(1) collection; (2) compilation and consolidation; (3) analysis; and (4) use.” While the FTC emphasizes that the report is concerned solely with the fourth phase, use, regulatory guidance and enforcement actions often focus on the first three phases as much or more. Indeed, the FTC report references “research that demonstrates that there is a potential for incorporating errors and biases at every stage—from choosing the data set used to make predictions, to defining the problem to be addressed through big data, to making decisions based on the results of big data analysis—which could lead to potential discriminatory harms.” That means that, protests aside, the regulatory focus could encompass the entirety of the research process.

Big Data analytics and existing laws

Discrimination in decision-making about credit and other kinds of eligibility is already covered in many circumstances by existing laws:

  • The Fair Credit Reporting Act (FCRA), which restricts companies (consumer reporting agencies) “that compile and sell consumer reports, which contain consumer information that is used or expected to be used for credit, employment, insurance, housing, or other similar decisions about consumers’ eligibility for certain benefits and transactions.”
  • Federal equal opportunity laws, like the Equal Credit Opportunity Act (ECOA), the 1964 Civil Rights Act, the Americans with Disabilities Act (ADA), the Age Discrimination in Employment Act, the Fair Housing Act, and the Genetic Information Nondiscrimination Act. “These laws prohibit discrimination based on protected characteristics such as race, color, sex or gender, religion, age, disability status, national origin, marital status, and genetic information.”
  • Section 5 of the FTC Act, which empowers the agency to punish and prevent unfair or deceptive acts or practices.

The most relevant of these for researchers is clearly Section 5. The FTC issued familiar warnings in the report that people working with Big Data analytics “should consider whether they are violating any material promises to consumers—whether that promise is to refrain from sharing data with third parties, to provide consumers with choices about sharing, or to safeguard consumers’ personal information—or whether they have failed to disclose material information to consumers.”

FTC warning: Carefully consider to whom you are selling your data services

A further FTC concern with direct implications for researchers is the “sale of data to customers that a company knows or has reason to know will use the data for fraudulent purposes.” In the Sequoia One case, a company allegedly sold the personal information of financially distressed payday loan applicants—including sensitive data—to third parties, one of which used that data to “withdraw millions of dollars from consumers’ accounts without their authorization.” In ChoicePoint, a company allegedly ignored “obvious red flags” when it “sold the personal information of more than 163,000 consumers to identity thieves posing as legitimate subscribers.” The FTC’s guidance is to, “at a minimum,” ensure that companies are not selling “big data analytics products to customers if they know or have reason to know that those customers will use the products for fraudulent purposes.”

Other questions from the FTC for legal compliance

  • “If you compile big data for others who will use it for eligibility decisions (such as credit, employment, insurance, housing, government benefits, and the like), are you complying with the accuracy and privacy provisions of the FCRA?”
  • “Do your policies, practices, or decisions have an adverse effect or impact on a member of a protected class, and if they do, are they justified by a legitimate business need that cannot reasonably be achieved by means that are less disparate in their impact?”
  • “Are you honoring promises you make to consumers and providing consumers material information about your data practices?”
  • “Are you maintaining reasonable security over consumer data?”
  • “Are you undertaking reasonable measures to know the purposes for which your customers are using your data? If you know that your customer will use your big data products to commit fraud, do not sell your products to that customer. If you have reason to believe that your data will be used to commit fraud, ask more specific questions about how your data will be used.”
  • “If you know that your customer will use your big data products for discriminatory purposes, do not sell your products to that customer. If you have reason to believe that your data will be used for discriminatory purposes, ask more specific questions about how your data will be used.”

FTC recommendations for research

The FTC laid out specific questions in this report for the conduct of research and analytics in order to “maximize the benefits and limit the harms” and avoid “pitfalls that may violate consumer protection or equal opportunity laws, or detract from core values of inclusion and fairness”:

  • How representative is your data set? Companies should consider whether their data sets are missing information about certain populations, and take steps to address issues of underrepresentation and overrepresentation. For example, if a company targets services to consumers who communicate through an application or social media, it may be neglecting populations that are not as tech-savvy.
  • Does your data model account for biases? Companies should consider whether biases are being incorporated at both the collection and analytics stages of big data’s life cycle, and develop strategies to overcome them.
  • How accurate are your predictions based on big data? Companies should remember that while big data is very good at detecting correlations, it does not explain which correlations are meaningful. A prime example of the limitations of big data analytics is Google Flu Trends, a machine-learning algorithm for predicting the number of flu cases based on Google search terms. While the algorithm at first appeared to produce accurate predictions of where the flu was most prevalent, its estimates grew highly inaccurate over time. This could be because the algorithm failed to take certain variables into account. For example, it may not have accounted for the fact that people would be more likely to search for flu-related terms if the local news ran a story on a flu outbreak, even if the outbreak occurred halfway around the world.
  • Does your reliance on big data raise ethical or fairness concerns? Companies should assess the factors that go into an analytics model and balance the predictive value of the model with fairness considerations. For example, one company determined that employees who live closer to their jobs stay at those jobs longer than those who live farther away. However, another company decided to exclude this factor from its hiring algorithm because of concerns about racial discrimination, particularly since different neighborhoods can have different racial compositions. (A minimal sketch of one such fairness check appears after this list.)
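To make the fairness point concrete, here is a minimal sketch in Python of one common screening test: the informal “four-fifths rule” that the EEOC has long used as a rough indicator of disparate impact. The rule of thumb, the group labels, and the outcome counts below are illustrative assumptions, not anything the FTC report prescribes; a real compliance review would involve legal counsel and far more rigorous statistics.

```python
from collections import Counter

def selection_rates(records):
    """Per-group selection rates from (group, selected) pairs."""
    totals, picked = Counter(), Counter()
    for group, selected in records:
        totals[group] += 1
        if selected:
            picked[group] += 1
    return {g: picked[g] / totals[g] for g in totals}

def disparate_impact_ratios(records, reference_group):
    """Each group's selection rate relative to a reference group.

    Under the informal "four-fifths rule," a ratio below 0.8 is often
    treated as a preliminary signal of adverse impact worth reviewing.
    """
    rates = selection_rates(records)
    ref = rates[reference_group]
    return {g: r / ref for g, r in rates.items()}

if __name__ == "__main__":
    # Hypothetical outcomes from a hiring model: (group label, selected?)
    outcomes = (
        [("A", True)] * 40 + [("A", False)] * 60    # group A: 40% selected
        + [("B", True)] * 25 + [("B", False)] * 75  # group B: 25% selected
    )
    for group, ratio in disparate_impact_ratios(outcomes, "A").items():
        flag = "  <-- below 0.8, review" if ratio < 0.8 else ""
        print(f"group {group}: ratio {ratio:.2f}{flag}")
```

The example mirrors the report’s point: a single summary ratio can flag a potential problem, but deciding whether a flagged factor should be dropped, as in the commute-distance example above, is a judgment call rather than arithmetic.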

Ultimately, the FTC said that companies engaged in Big Data analytics should take the following steps:

  • “Consider whether your data sets are missing information from particular populations and, if they are, take appropriate steps to address this problem.” (One minimal version of such a check is sketched after this list.)
  • “Review your data sets and algorithms to ensure that hidden biases are not having an unintended impact on certain populations.”
  • “Remember that just because big data found a correlation, it does not necessarily mean that the correlation is meaningful. As such, you should balance the risks of using those results, especially where your policies could negatively affect certain populations. It may be worthwhile to have human oversight of data and algorithms when big data tools are used to make important decisions, such as those implicating health, credit, and employment.”
  • “Consider whether fairness and ethical considerations advise against using big data in certain circumstances. Consider further whether you can use big data in ways that advance opportunities for previously underrepresented populations.”
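As a companion to the first recommendation, the sketch below shows one minimal, assumption-laden way to check a data set for missing or skewed populations: compare each group’s share of the sample against an external benchmark such as census figures. The group labels, counts, benchmark shares, and the five-point tolerance are all hypothetical; the FTC report does not prescribe any particular method.

```python
def representation_gaps(sample_counts, benchmark_shares, tolerance=0.05):
    """Compare each group's share of the sample against a benchmark share.

    Returns groups whose sample share deviates from the benchmark by
    more than `tolerance` (absolute difference in proportion).
    """
    total = sum(sample_counts.values())
    gaps = {}
    for group, benchmark in benchmark_shares.items():
        share = sample_counts.get(group, 0) / total
        if abs(share - benchmark) > tolerance:
            gaps[group] = (share, benchmark)
    return gaps

if __name__ == "__main__":
    # Hypothetical: respondents recruited via a mobile app, checked
    # against made-up census-style age shares for the target market.
    sample = {"18-34": 620, "35-54": 290, "55+": 90}
    benchmark = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
    for group, (share, bench) in representation_gaps(sample, benchmark).items():
        print(f"{group}: sample {share:.0%} vs benchmark {bench:.0%} -- "
              f"{'over' if share > bench else 'under'}represented")
```

Run against these hypothetical numbers, the check flags the 18-34 group as heavily overrepresented and the 55+ group as heavily underrepresented, exactly the kind of skew the report warns about when services are targeted through an app or social media.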

A disunited FTC: Commissioner Ohlhausen defends research and analytics

FTC Commissioner Maureen Ohlhausen filed a dissenting statement with the Big Data discrimination report, focusing primarily on the lack of economic analysis behind the report’s conclusions. For instance, while she felt that the impact on consumers of inaccurate data was certainly a concern, “market and economic forces” can help mitigate it: “Businesses have strong incentives to seek accurate information about consumers, whatever the tool. Indeed, businesses use big data specifically to increase accuracy. Our competition expertise tells us that if one company draws incorrect conclusions and misses opportunities, competitors with better analysis will strive to fill the gap.”

Ohlhausen’s example of competitive advantage in research and analytics was “Moneyball,” the story of a baseball team that out-competed richer rivals through better data analysis.

“To the extent that companies today misunderstand members of low-income, disadvantaged, or vulnerable populations, big data analytics combined with a competitive market may well resolve these misunderstandings rather than perpetuate them.”

The Commissioner concluded that considering the “powerful forces of economics” and the free market was necessary in order to fully understand the costs and benefits of Big Data analytics. “If we give undue credence to hypothetical harms, we risk distracting ourselves from genuine harms and discouraging the development of the very tools that promise new benefits to low income, disadvantaged, and vulnerable individuals.”