*Epistemic status: mad speculation, mixed with math that’s so much fun that I find it hard to believe it’s useful. I’m serious about thinking these questions are important, I’m much less confident in any of my answers.*

It’s kind of interesting that living creatures seem to feel both pain and pleasure, which seem like distinctly negative and positive experiences. When you construct a theory of reinforcement learning, you don’t see anything like this distinction between positive and negative sensations: reward is just a real-valued function, and the behavior of agents isn’t affected by any transformation of their reward function which is either a shift (eg you feel +3 happy about every outcome) or a positive scaling (eg your reward function is multiplied by two). Obviously humans aren’t literally reinforcement learners, but it’s surprising that our sense of reward is *so* different.

So why do humans experience reward as a function centered around zero? Why does it feel meaningful to think about what it would be like for all your sensations to be twice as powerful? And given that we seem to have experiences that work this way, what factors determine where the zero point for an organism is, and how powerful its sensations are?

Suppose I’m trying to train a little robot to pick up trash for me. Every day, it goes out and picks up some trash and brings it home for inspection. I need to give it a reward signal so that it knows how well it’s done. The reward signal that works for it is slightly costly though: I need to either give it small chocolates as a reward, or small pieces of coal as punishment. The cost of a piece of chocolate or coal is $1, and the robot views one piece of coal as exactly as bad as a piece of chocolate is good.

There’s some distribution of robot success which I’m expecting. Let’s say the robot always brings home 10, 11, or 13 pieces of trash. I need to choose how to reward the robot. That is, I need to choose a function from the success of the robot to the reward experienced by the robot.

One really simple option is to just give the robot one chocolate per piece of trash that it brings home. But this is pointlessly inefficient: you’re always going to be giving the robot a bunch of chocolate regardless of its actual success level.

Or you could give the robot a number of chocolates equal to the amount of trash minus 10. This is more efficient—you give out 0, 1, or 3 chocolates.

More efficient still, you could give the robot a punishment of 1 when it brings home 10 pieces of trash, give it nothing for 11 pieces, and give it a reward of 2 for bringing home 13 pieces. This is the best option.

So given a distribution, we need to choose a zero point for the reward function of the robot. What’s the optimal choice? Why, it’s the point which minimizes the expected cost, so it’s the point with minimum expected absolute distance to a randomly chosen point in the distribution. The median of a distribution is the number with this property.

(Why? Suppose that you want to choose a point to minimize the sum of absolute distance to the set [10, 11, 13, 16, 20, 23, 28]. Which point is better out of 20 and 23? Well, if you go from 23 to 20 you’re only making the distance longer for the two points 23 and 28, and you’re making it shorter for all 5 smaller points. And all these distance changes are the same magnitude, so all that matters is how many points you’re getting closer to. So 20 is better than 23. So the sum of absolute distances is minimized if you choose the point in the middle of the list.)

Now, I don’t care about my little robot. But suppose you do care about my little robot. You might care about how it likes its reward scheme. You care about its mean reward, which is (-1 + 0 + 2)/3 = 0.33. This is positive, which is great—the robot has a life worth living!

So in this situation, the robot-master will set the zero point to the median amount of trash, and the utilitarian onlooker cares about the mean value of this reward. The mean value of the reward is going to be positive iff the mean of the trash distribution is greater than the median of the trash distribution.

The median of a distribution is greater than the mean if the distribution is skewed to the left:

So left-skewed trash distributions will lead to overall sad robots.

I’ve been thinking a lot recently about cases like this where one optimization function is choosing the reward function for another system to maximize some function of its own. For example, evolution tries to pick reward functions for animals to maximize the animal’s reproductive fitness. But evolution has a bunch of constraints on this choice of reward function, and it’s also a really stupid optimization process, so the reward function is only a pretty rough approximation of the fitness benefit of various actions.

For example, humans have a tendency to consume an unhealthy amount of sugar: this is because evolution gave me a reward function that tries to motivate me to get sugar, which made sense in the ancestral environment but doesn’t make as much sense now. Other examples are when people are scared of the dark, or when they watch porn and masturbate, or do drugs.

The way that optimization functions choose reward functions is crucially important for utilitarians. Understanding the factors which affect the shape of reward functions is essential to predicting the welfare of beings like: ems in a competitive em world, subroutines in a paperclip maximizer, subroutines used by a galactic human civilization, factory farms if they exist a thousand years in the future, and wild animals.

I have lots of speculations about all this. I’ve been having trouble writing it all up, because there are a lot of different angles to approach these issues from. But with this post, at least I’ve made a start. Let me know if you’re interested in hearing my less well-formed crazy ideas on this topic.

The only academic discussion of this general question I’ve seen is in the 1995 paper “Towards Welfare Biology”, which was one of the first academic papers about the ethical importance of wild animal suffering, where Yew-Kwang Ng writes:

Our common sense recognition of the suffering of a typical non-surviving individual in most species may be supported by a simple argument based on evolution. We start by asking, why do we enjoy eating but suffer in starvation? The answer is that this genetic program provides us with the right incentives to do things favourable to survival. But why suffering? Why not just less enjoyment when starving and more enjoyment when eating? If the difference in the degrees of enjoyment between the two is big enough, we will still do the “right things”. However, the existence of suffering may be explained below.

First, both enjoyment and suffering are costly in terms of energy requirement, tissue maintenance, etc. This is why we feel neutral most of the time when we are not starving, eating, having sex, etc. (It would be nice if we could be programmed to feel ecstatic most of the time.) Secondly, it is likely that the extra (or marginal) costs involved in having an extra unit of enjoyment (or suffering) increases with the amount of enjoyment (suffering). Viewed differently, we have diminishing marginal returns in both enjoyment and suffering per unit of cost. Thirdly, it is likely that the costs (generalized resource costs, not subjective welfare costs) of suffering are unlikely to be significantly less, and maybe actually more, than those of enjoyment.

I used the first and third these claims in my robot story, and ignored the second.

If we also assume that the metabolic cost of rewards increases superlinearly with their absolute value, then some other things might happen. For example, if the cost of a reward is the square of its magnitude, then the reward distribution will have mean zero. More generally, if the cost of a reward is its magnitude to the power of p, then:

- if p == 1, then the zero point is the median, and the creature has a life worth living iff the distribution is skewed to the right.
- if 1 < p < 2, again the creature has a life worth living iff the distribution is skewed to the right.
- if p == 2, the reward distribution has mean zero.
- if p > 2, the creature has a life worth living iff the distribution is skewed to the left.

view Facebook comments here