Kelly betting and Bayesian inference

Epistemic status: Most of this is just restating known results, except for the speculation at the end.

Short version: In Bayesian reasoning, you start out by spreading your belief across different hypotheses, and as you receive evidence you shift your belief towards the hypotheses that more confidently predicted the evidence you saw. In a prediction market made up of many bettors, you start out with wealth distributed between bettors, and as events happen the bettors who made good predictions get richer at the expense of bettors who made bad predictions. There’s a sense in which these processes work exactly equivalently.

I’ve recently been excited by the analogy between Bayesian belief updating and prediction markets. I think that this analogy is pretty well known among people who like math and enjoy thinking about betting, but in my experience other people enjoy it too. The only resources I know on this are Daniel Filan’s blog post, and this paper. The goal of this post is to be a more accessible explanation.

Suppose you’re interested in predicting a sequence of events. As a running example, let’s say that I have a biased coin and I’m repeatedly flipping it and observing whether it comes up heads or not.

The Bayesian way of thinking about this kind of question is to say that I have some set of hypotheses, and I start out with some degree of belief in each. In this case, my hypotheses are all of the form “This coin is biased such that it comes up heads with probability p” for all different values of p, and I start out thinking that all of these are equally likely. As I observe coin flips, I use Bayes’ rule to update my beliefs towards the hypotheses that predicted my observations.

(Normally when I pick up a coin, I suspect that it’s probably approximately fair. In that case, I’d start out with much more belief in the hypotheses that say that the coin’s probability of coming up heads is close to 50%, and it would take a lot of tails in a row before I started suspecting that the coin was significantly biased.)

If you’re asked for your probability that the coin will come up heads next time, then you answer by taking the average answer of all your hypotheses, where you weight each hypothesis by your current degree of belief in it.

Suppose we flip the coin a bunch of times, and get the sequence HHHTHHTHHH, with 8 heads and 2 tails. Hypotheses that assigned very low probability to this outcome (such as the “10% heads” and “99% heads” hypotheses) will have lost most of the weight that they initially had. When we compute our probability that the next flip will come up heads, our answer is most affected by the hypotheses which assigned high likelihood to this sequence.

If we started out with a uniform prior over the bias of the coin, then we end up believing that the probability of the coin coming up heads next time is 75%.

Here’s a totally different way you could think about sequence prediction. We start out with a bunch of people who all have different beliefs about how biased the coin is, and all want to make money from betting about it. You can imagine lining them up according to how likely they think the coin is to come up heads. Let’s say they all start out with the same amount of money.

Now, we set up a betting market for them to bet on the outcome of the coin flip that’s going to happen. The market price is going to start out at 50%, because the bettors with beliefs on both sides of 50% cancel out.

If the bettors all wanted to maximize their expected money at the end of the first round, they’d all bet all their money. In real life, people have diminishing marginal returns with respect to money. Let’s model that by saying that our bettors have utility equal to the logarithm of their wealth. The result of this is that people bet more of their money when they think that the market is more wrong.

The rule for how much someone with logarithmic utility should bet is called the Kelly criterion. Someone who follows this criterion is known as a Kelly bettor. For example, Warren Buffet is allegedly a Kelly bettor.

The actual details of the Kelly criterion don’t matter for the main point of this post, but if you’re interested, read on: The Kelly criterion says that if you think an event has probability $p$ of happening, and it costs $q$ to buy a share that pays out $1 if the event happens, and if $p > q$, the fraction of your wealth that you should bet is given by $\frac{p-q}{1 - q}$. (If $p < q$, then you instead buy the shares in the event not happening.) Another way of putting this is that the fraction of your wealth that you gamble is the amount that you think the asset is underpriced divided by the distance between the market price and 1. So if you think a share is worth 60c and it’s being sold for 40c, you should spend ⅓ of your wealth on it, because 60c is ⅓ of the way from 40c to $1.

Anyway, so people spend some of their money on shares in their preferred outcome:

So the guy who thinks the coin is 100% likely to come up heads has spent all his money on Heads tickets, which are blue, and the guy who thinks it’s 100% likely to come up tails has gone all in on Tails tickets, which are red. People with less extreme beliefs are betting less of their money–the person who thinks the coin is fair isn’t betting anything.

Then the coin is flipped: it came up heads. So everyone who bought shares in heads gets paid $1 per ticket. The new distribution of wealth looks like this:

So the person who said that heads was 0% likely lost all their money. The person who said heads was 100% likely spent all their money on heads shares priced at 50% and so doubled their money.

Now, we bet again. Let’s say that all of our betters are totally stubborn—none of them learn anything from the previous result. But the ones who happened to be right about the previous bet now have more money. And because people want to spend a proportion of their wealth on shares, the wealthy people end up having disproportionate influence on the market price:

The price for heads has now risen, so the 50% guy now thinks it’s worthwhile to buy some tails tickets. Note that the 10% guy is still betting a larger proportion of his wealth on tails than the 20% guy.

We do this nine more times, and the sequence turns out to be HHHTHHTHHH, with 8 heads and 2 tails. Now the wealth distribution looks like this:

and the market price for the next flip is 0.75. So our market price has adjusted to think heads is more likely, despite none of the bettors updating their beliefs at all.

As you’ve probably guessed, these two processes are equivalent. Let’s review the connections:

Set of hypotheses <-> set of Kelly bettors
Prior probabilities <-> starting wealth for different bettors
Making predictions by taking the weighted sum over your hypotheses <-> making predictions by setting up a market, where bettors with more money affect the market price more.
Updating with Bayes’ rule <-> distributing the winnings

There are some more nice analogies between statistics concepts and the market picture. For example, maximum a posteriori (MAP) estimation is what you get if you decide that instead of making predictions by setting up a market, you make predictions by finding the richest bettor and asking them to make all the predictions for you; this works pretty well because the richest bettor generally has a pretty good idea of what’s going on.

For a proof of this equivalence, see this paper.

One nice property of this analogy is that you can combine these two pictures. For example, instead of having a market totally comprised of pigheaded investors who never learn, you can have a market made up of Bayesian reasoners who have different priors. As time goes by, bettors with priors that are closer to the real world gain money, and also every individual bettor updates towards having more accurate beliefs. It doesn’t matter how you combine these pictures. (Except for some fiddly practical details about how markets work. For example, it doesn’t really make sense to talk about a prediction market with exactly one participant.)

This analogy is on my mind because I’ve been thinking about logical induction, which I think is a super neat result that uses a much more general version of this analogy–it imagines a market made up of traders who have a wide variety of beliefs and aren’t necessarily Kelly bettors, and then demonstrates that this market has some nice properties. The market analogy allows you to clearly imagine strategies like arbitrage between related shares, or buying shares that you think will go up in price, that are harder to think of as actions that beliefs can do.

Here’s one final bonus argument that I haven’t heard anyone else make. It’s strongly inspired by logical induction. The details are pretty sketchy; let me know if you have ideas for how to formalize this more.

I think the sketchiest part of this prediction market/Bayesian reasoning analogy is that it requires everyone have log utility with respect to money. This seems kind of artificial and might make you think that the analogy is just a trick that doesn’t mean anything. Luckily, I think you can weaken this requirement in a fairly natural way.

When I introduced Kelly betting, I said that it’s the policy that you undertake if you want to maximize your expected log wealth. But there’s another reason to like Kelly betting: Suppose that you have some large number of opportunities to engage in a bet where the market price is $q$ and you think it has probability $p$ of paying off, with $p > q$. If the number of opportunities is $n$, then the bet will almost surely pay off approximately $pn$ times. If you want to maximize how much money you have in this situation where you win the number of times you expect to win, it turns out you want to use the Kelly criterion. (See Daniel’s post, Ctrl-F “another appealing fact”.)

Over the course of predicting a lot of events, I postulate that all the bettors have to make a lot of bets at a wide range of credences. So I suspect that if we consider a bunch of different traders who have the same beliefs but different utility functions, after a while the richest traders are almost certainly going to be the ones with log utility, because they’re going to be using the Kelly criterion–everyone with a faster-growing utility function ended up losing all their money, and everyone with a slower-growing utility function was too skittish to get as rich as the Kelly bettors with good priors.

There’s another argument you could make, though, which is that the sublogarithmic utility people are also going to be too skittish to affect market prices much, even if they have as much money as the Kelly bettors. This argument might also matter; I’m not sure.

I would love to see this formalized–let me know if you have ideas about how to do this.

I am pretty confident that some argument like this works in non-adversarial situations. I think it’s plausible that your environment can be inhospitable in such a way that things go badly, but I haven’t thought of exactly what that would entail yet.