
Where Machine Learning Fits into Economics


Two titans with much in common. Photo credit: Tim Stief/Unsplash

Where do artificial intelligence and machine learning fit into an economist’s toolkit? The field of economics prizes intuition and model interpretability, which AI and ML often lack, even though the math under the hood is quite similar to that of the models economists typically use. This blog post will explore how economists can better leverage techniques from the world of machine learning – particularly the recent immense advancements in artificial intelligence – to support econometrically sound analysis.

______


Around 2018, I remember being in a conference room full of economists who were quite unimpressed by a demonstration of a simple machine learning algorithm for forming groups in data. The two chief complaints in the room were as follows:


1) The algorithm did not produce parameters that the economists could compare to economic theory

2) The algorithm produced “good enough” groupings of data points, rather than the true optimum


In this post, I’ll walk through what their complaints mean - taking some unfriendly-looking equations and explaining them intuitively - and will then go on to give an example of how newer techniques can be leveraged in economics anyway. (If you're already comfortable with regression analysis, you might consider just skipping down to the takeaways.)


Economists want parameters and data scientists want predictions

When economists do data work, it is virtually always some flavor of a linear regression. A linear regression takes the form

y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_k x_{k,i} + \varepsilon_i

Let's break each part of this down in the following table.

| Piece of the equation | What it means |
| --- | --- |
| x₁ through xₖ | All of the data we have observed. Each numbered x refers to a different measure. For example, if the data is about houses, the first x might be the number of beds, the second the living area, and so on. |
| i | An identifier for each observation. Again, if we're talking about houses, i just says which house we're talking about. |
| y | What we're trying to predict. If we're looking at houses, we are trying to predict house price (y) based on all of the x's, using all of the i's. |
| β₀ | A "starting point" for the model to make a guess at y, before any x have been accounted for. |
| β₁ through βₖ | How much to multiply each corresponding x by to make a prediction of y. (There must be one β per x.) |
| ε | The error term; that is, the amount by which the model's prediction is incorrect. (By adding more variables, we are seeking to minimize this.) |


When we actually do the predictions, we end up with

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1,i} + \hat{\beta}_2 x_{2,i} + \dots + \hat{\beta}_k x_{k,i}

This looks similar to the first equation, but we have lost the error term (because we cannot predict those), and everything now has "hats," meaning they are estimated values. y-hat is the predicted value of y given all of the x and their estimated weights, which are given by the beta-hats. Depending on the size of the dataset, one might need a fairly powerful computer to estimate the beta-hats.
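
As a concrete illustration, here is a minimal sketch in Python that simulates a small housing dataset and estimates the beta-hats with ordinary least squares via statsmodels. The variable names and the numbers used to generate the data are invented purely for illustration.

```python
# A minimal sketch of estimating beta-hats with ordinary least squares.
# The variable names (beds, baths, sqft) and the numbers used to simulate
# prices are invented purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
beds = rng.integers(1, 6, n)
baths = rng.integers(1, 4, n)
sqft = rng.normal(1800, 400, n)

# Simulated "true" prices, plus some noise playing the role of the error term.
price = 50_000 + 15_000 * beds + 10_000 * baths + 120 * sqft + rng.normal(0, 20_000, n)

X = sm.add_constant(np.column_stack([beds, baths, sqft]))  # column of ones = the intercept
results = sm.OLS(price, X).fit()

print(results.params)          # the beta-hats: intercept, beds, baths, sqft
print(results.predict(X)[:5])  # the y-hats: predicted prices for the first five houses
```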


The key here is that economists really care a lot about the beta-hats, and data scientists really care a lot about y-hat, which is to say: economists care that the fundamentals of the model comport with economic theory, and data scientists care more about the model’s predictiveness.


To make it more concrete, if I’m an economist trying to predict the price of a house (y-hat), I’ll need to know how many bedrooms it has (x1), how many bathrooms (x2), its area (x3), and so on. When I get the results of my regression, I want to be certain that all three corresponding beta-hats – the respective multipliers for the number of bedrooms, bathrooms, and square feet that lead me to a predicted house price – indicate that more beds, baths, and square feet mean a higher price. On the other hand, if I’m a data scientist, all I want to know is that I can accurately predict the price of a house with my model, no matter what the beta-hats are.


The data scientist’s relative lack of concern for beta-hats results in something of a Cambrian explosion of modeling techniques, in which the data scientist can use models that generate seemingly illogical beta-hats, or that don’t have meaningful beta-hats at all (like gradient boosting, neural networks, random forests, and so on). Meanwhile, the economist insists that beta-hats that make intuitive sense are imperative for the model’s validity, and so the economist is limited to the techniques and model specifications that produce them.
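
To make the contrast tangible, here is a small sketch (again on made-up data) that fits both an ordinary linear regression and a random forest from scikit-learn: the first hands back beta-hats an economist can sign-check, the second hands back only predictions.

```python
# A sketch contrasting the two mindsets on the same made-up data.
# A random forest delivers predictions, but has no beta-hats to inspect.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # stand-ins for beds, baths, sqft
y = X @ np.array([3.0, 2.0, 5.0]) + rng.normal(0, 0.5, 200)

ols = LinearRegression().fit(X, y)
forest = RandomForestRegressor(random_state=0).fit(X, y)

print(ols.coef_)              # beta-hats the economist can sign-check against theory
print(forest.predict(X[:5]))  # y-hats only; there are no coefficients to interpret
```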


Good enough isn’t good enough

One feature of linear regression that economists and data scientists both love is that it gives a global optimum. In human-speak, that essentially means that if you run the model on the same data twice, you’ll get exactly the same set of beta-hats every single time, and you can prove mathematically that that set of beta-hats is the only one that makes the average squared error as small as it can possibly be (given, of course, the x's you chose).
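
For readers who want the math behind that guarantee: the least-squares problem is convex and has a closed-form solution. In matrix notation, with X stacking all of the observed x's (plus a column of ones for the intercept) and y stacking the outcomes,

\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y

As long as X^{\top} X is invertible, this is the one and only set of beta-hats that minimizes the sum of squared errors.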


When you enter the data scientist’s zoo of techniques that are largely frowned upon by economists, you encounter many with odd properties, like “local optimization.” Local optimization means that the solution is probably good enough, but that if you run the model again, you could get a different answer – maybe better, maybe worse. In the eyes of data scientists, this doesn’t really matter, because data scientists don’t (usually) need to defend the model in an academic setting; rather, they're trying to get you to do things like spend more time watching YouTube or scrolling on Instagram, no matter the model or data. But economists will have to defend their modeling choices to their peers, and a model that’s “good enough” for predictions generally isn’t good enough to get published.
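
Here is a quick sketch of the difference on made-up data. k-means is used as a stand-in for the grouping algorithm from the conference-room anecdote: a single-start run can settle on different "good enough" answers depending on where it starts, while ordinary least squares returns identical beta-hats every time.

```python
# A sketch of local vs. global optimization on made-up data. k-means stands in
# for the grouping algorithm from the anecdote; a single-start run can settle
# on different "good enough" answers, while OLS is the same every time.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.1, 300)

# Two single-start k-means runs from different seeds may reach different local optima.
for seed in (1, 2):
    km = KMeans(n_clusters=5, n_init=1, random_state=seed).fit(X)
    print("k-means objective:", round(km.inertia_, 2))

# OLS is deterministic: refitting yields identical beta-hats.
print(LinearRegression().fit(X, y).coef_)
print(LinearRegression().fit(X, y).coef_)
```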


How to bridge the gap

At first glance, the two camps seem hard to reconcile, but there is room for coexistence and even collaboration. Economists are limited to linear regression because it’s the only thing they can compare to theory and defend to their peers. Data scientists want a model that makes awesome predictions because that’s (usually) their only objective. It isn’t all bad news, though; here are two ways that economists can leverage techniques from the world of machine learning.


Cross-validation

The example of regression analysis applied to housing was something I did frequently early in my career. As I was creating these models with economists, the primary consideration was the validity of the model’s qualitative output; e.g., is a smaller house cheaper than a big house on average? Do more bathrooms mean a higher price? While these answers serve as important checks for an academic paper, the models could also be used for policymaking if their predictions (and not just their beta-hats) were known to be a reasonable approximation of reality.


One technique not often used in research-oriented economics is cross-validation, in which the data is split into two parts and the model parameters (the beta-hats) are estimated using only one part. The model is then applied to the second part of the data, and it’s easy enough from there to compare the predictions against the real-world values of the outcome variable. The purpose of this split-and-predict regime is to ensure that the model makes good predictions not only on the data from which it was created, but also on data it has not yet seen. Using cross-validation more frequently in econometrics would allow forecasters – who very much rely upon cross-validation – to work more fluidly with the academics who are building and testing the theories. This would mean that both forecasters and academics could more easily improve their models using each others' work.
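
Here is a minimal sketch of that regime with scikit-learn, on simulated data standing in for real observations: estimate the beta-hats on a training split, then score the predictions on a held-out split the model has never seen. Repeating the exercise over several different splits (k-fold cross-validation) gives a more stable read on out-of-sample accuracy.

```python
# A minimal sketch of the split-and-predict regime with scikit-learn.
# The simulated data and coefficients are placeholders for real observations.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                          # stand-ins for beds, baths, sqft
y = X @ np.array([3.0, 2.0, 5.0]) + rng.normal(0, 1.0, 500)

# Estimate the beta-hats using only the training portion of the data...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# ...then compare predictions with real outcomes on the held-out portion.
print("held-out MSE:", mean_squared_error(y_test, model.predict(X_test)))

# k-fold cross-validation repeats the split-and-score exercise several times.
print(cross_val_score(LinearRegression(), X, y, cv=5, scoring="neg_mean_squared_error"))
```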


Data extraction

Imagine a project in which you are developing an econometrically sound regression model and you plan to present it to other academics, but you know that there is information valuable to your model that is stored in text and images. Of course, you cannot put text and images into a regression model that is looking for columns of numbers. What you can do, however, is use machine learning to create covariates - the very x's themselves - for your regression model. 


What would this look like in practice? Suppose we are again trying to predict house prices, and have images of the most recent listing as well as the real estate agent’s blurb. An algorithm could be trained to identify hard-to-quantify features from a listing’s photos – such as the approximate age of the kitchen appliances – and the real estate agent’s blurb could be given a sentiment index (where blurbs with phrases like “as-is” or “investment opportunity” would score poorly and “dream home” would score well). Essentially, any unstructured data that can be turned into a numerical column with reasonable accuracy can potentially become fair game for an econometric model.
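
As a toy illustration of the blurb-scoring idea, here is a sketch that assigns each listing a crude keyword-based sentiment score and attaches it as a new numeric column. The phrase lists and scoring rule are invented for illustration; a real project would use a trained sentiment model rather than hand-picked keywords.

```python
# A toy sketch of turning listing blurbs into a numeric covariate. The phrase
# lists and scoring rule are invented for illustration; a real project would
# use a trained sentiment model rather than hand-picked keywords.
import pandas as pd

POSITIVE = ("dream home", "move-in ready", "updated kitchen", "stunning")
NEGATIVE = ("as-is", "investment opportunity", "needs work", "handyman")

def blurb_sentiment(blurb: str) -> int:
    """Count positive phrases minus negative phrases in a listing blurb."""
    text = blurb.lower()
    return sum(p in text for p in POSITIVE) - sum(n in text for n in NEGATIVE)

listings = pd.DataFrame({
    "blurb": [
        "Stunning, move-in ready dream home with an updated kitchen.",
        "Sold as-is. Great investment opportunity, needs work.",
    ],
    "price": [450_000, 210_000],
})

# The resulting numeric column can enter the regression as another x.
listings["sentiment"] = listings["blurb"].apply(blurb_sentiment)
print(listings)
```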


Conclusion

By adopting techniques from machine learning and artificial intelligence, the field of quantitative economics can drastically expand its analytical reach, drawing on data distilled from previously untouchable sources and feeding it into highly interpretable econometric models. Furthermore, econometric models can (and should) be benchmarked more often against their own data using cross-validation. Many great ideas in economics remain trapped (and unchallenged) in the realm of theory because economists cannot test them with data. Uses of machine learning like the examples above can help generate the data the field of economics needs to answer questions both new and old.

 
 
 