Friday, March 17, 2017

Aggregate random coefficients logit—a generative approach

This post illustrates how to fit aggregate random coefficients logit models (often called BLP) in Stan, using Bayesian techniques. The approach is far easier to learn and implement than the BLP algorithm, is robust to mis-measurement of market shares, and gives finite-sample posterior uncertainty for all parameters (and demand shocks). This comes at the cost of an additional assumption: we employ a parametric model of how unobserved product-market demand shocks affect prices. That explicitly models the endogeneity between prices and observed consumer choices, the source of bias that classical BLP resolves with instrumental variables. Because we specify the full generative model, instrumental variables are not strictly needed (though they can certainly still be incorporated).


A common problem in applied economics, especially in industrial organization or marketing, is that a decision maker wants a good model of how individual customers make purchase decisions, but has data only at the aggregate level.
Let’s give examples of the sort of problem we’re trying to solve.

Example 1: Regulating a merger

First is the classic merger problem: a regulator is interested in whether a merger will hurt customers, as might be the case if
  1. The merging firms offer similar products that appear to be in close competition, and few other options exist; and
  2. Marginal economies of scale from the merger are small.
Let’s focus on the first condition. The regulator really needs to know whether the customers of the two firms’ products perceive the products as being similar, that is, in genuine competition with one another. This is a harder task than it might appear on the surface, as many goods look similar to an outside regulator but are really quite different. A mid-range Mercedes-Benz might have similar specifications to a Toyota, but customers perceive it as a different product.
Ideally the regulator would have data on each customer’s purchase decisions and second choices. But this might be impossible to get. By contrast, it is quite straightforward to purchase aggregate sales data at the product-market level from market research firms. So the regulator has to make do with that.

Example 2: A manager considering a new product or a new market

Managers ideally want to create products for which there is strong latent demand but little competition. To do this, they need to understand the distribution of customers’ preferences over product characteristics, illustrated in the figure below by the blue contours. The manager should also understand the distribution of competitors’ (and their own) products on those same characteristics. These are illustrated as points in the figure below. A manager might then decide to enter a market by offering a product with characteristics that customers value but where few competing products exist.
Such analysis might be quite straightforward when the manager has access to customer-transaction-level data of their competitors, but this is not feasible in most cases. Instead, they need to make do with the sorts of aggregate data available from the same research houses in the example above.

A generative model of consumer choice

The examples above share an objective: we want to perform analysis that requires knowing the distribution of customer preferences. And both problems face the same major constraint: we don’t observe transaction-level data for all products; we only observe aggregate sales in each market. Moreover, the most valuable aspects of consumer preferences are relative, not absolute. For example, a manager would be interested not just in whether a particular product would sell, but in how well it would sell at different price points. How much would consumers be willing to pay for a new product, and how much would that affect the prices and sales of existing products?
Even with very fine data and a few frequently purchased products, this is a difficult question to answer. It requires inferring how consumers would substitute one good for another across a range of prices for each good, and the number of potentially relevant combinations is too large to enumerate, let alone observe in data. Of course, we don’t actually have to observe every possible combination of products and prices to have a predictive model of substitution patterns that performs pretty well. We just need a tractable, flexible generative model that can smooth out the space of possibilities in a sensible way.
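To make "generative model" a little more concrete, here is a minimal Python sketch of how heterogeneous consumers can be aggregated into market shares. Everything specific in it is an assumption made for illustration rather than something taken from this post: the utility form (linear in log price, one product characteristic, and a product-level demand shock), the distributions of consumer tastes, the parameter values, and the names `choice_probabilities` and `simulate_shares`.

```python
import math
import random


def choice_probabilities(utilities):
    """Logit choice probabilities over the products.

    An outside option (buying nothing) with utility 0 is included in the
    denominator, so the returned probabilities sum to less than one.
    """
    m = max(0.0, max(utilities))  # subtract the max for numerical stability
    exp_u = [math.exp(u - m) for u in utilities]
    denom = math.exp(-m) + sum(exp_u)
    return [e / denom for e in exp_u]


def simulate_shares(prices, characteristic, shocks, n_consumers=5000, seed=1):
    """Average individual choice probabilities into aggregate market shares.

    Hypothetical setup: each simulated consumer draws their own price
    sensitivity (always negative) and their own taste for the characteristic.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(prices)
    for _ in range(n_consumers):
        alpha_i = -math.exp(rng.gauss(0.3, 0.3))  # assumed price sensitivity
        beta_i = rng.gauss(1.0, 0.5)              # assumed taste for the characteristic
        utilities = [
            alpha_i * math.log(p) + beta_i * x + xi
            for p, x, xi in zip(prices, characteristic, shocks)
        ]
        for j, prob in enumerate(choice_probabilities(utilities)):
            totals[j] += prob
    return [t / n_consumers for t in totals]


# Three made-up products in one market.
prices = [1.0, 1.2, 2.5]
characteristic = [0.5, 0.6, 2.0]
shocks = [0.0, 0.1, -0.2]

base = simulate_shares(prices, characteristic, shocks)
# Raise the first product's price by 10% and see where its customers go.
raised = simulate_shares([1.1, 1.2, 2.5], characteristic, shocks)

print("baseline shares:      ", [round(s, 4) for s in base])
print("after 10% price rise: ", [round(s, 4) for s in raised])
```

Because the same seed reproduces the same simulated consumers, the difference between the two share vectors reflects only the price change. How much each of the other products gains is a simple picture of the substitution patterns that the regulator and the manager in the examples above care about.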
One extremely flexible model is the aggregate random coefficients logit model. In this model, customer i in market t has preferences over product j such that their total utility is

This says that each consumer receives utility, , from good