I recently became interested in thermal physics and spent a while studying it. When I was in college, I thought of thermal physics as a boring applied subject that only matters if you want to know how to build engines or refrigerators. I was wrong to dismiss it. After studying it for a few weeks, I think that it’s risen to be one of my favorite areas of physics. In particular, I really like statistical mechanics, a major subfield.

My interest in this subject was piqued by realizing that it offered answers to many of the different physics questions that I’d been wondering about for the last few months. For example:

  • Oxygen molecules are heavier than nitrogen molecules. Why isn’t the atmosphere stratified with all the heavy molecules at the bottom and all the light ones on top?
  • For that matter, why does the atmosphere have pressure that changes with altitude the way that it does? Why don’t the atoms high up in the atmosphere fall down? The average molecule in the atmosphere has way more gravitational potential energy than kinetic energy; what’s with that?
  • When I have a glass of water sitting on the counter, how much are the molecules jiggling around in different ways? How much do the bonds between the oxygen atom and the hydrogen atoms bend or vibrate? What’s the standard deviation in bond angle at room temperature? I asked this question on Chemistry StackExchange but no-one gave me a satisfactory answer.

These are all instances of the same type of question. In each, I have a system which I understand on the micro level (eg I know how the potential energy and kinetic energy of a single molecule works) and I know some facts about the overall system (its temperature and volume and so on), and I want to use the facts about the overall system to infer the distribution of various properties of the individual particles.

Statistical mechanics provides a satisfying answer to this type of question.


Here is the short version of that answer.

Every particle in our system is in a particular state. For particles in a gas, this includes a position and a momentum, because they can move around freely. For molecules, it also includes things like the rate and axis of rotation, and the bond lengths and angles needed to specify the shape of the molecule. We have some set \(S\) of allowed states for a particular particle.

The obvious objection is “What are you talking about with these discrete states? Position and velocity are both vectors of real numbers; surely there are uncountably infinitely many of them?” This objection is correct in classical mechanics, but in quantum mechanics the distinguishable states of a particle are discrete, and there are only finitely many available for any finite energy. This nicely justifies the discretization of the state space. In the case of position and velocity in a gas, at normal volumes and pressures there are so many allowed positions and velocities that the discretization doesn’t actually matter. But for other types of energy, the discretization does matter. For example, at room temperature particles usually don’t have enough energy to get out of their vibrational ground state. (This discretization can be noticed by looking at how the heat capacity of various materials changes with temperature; accurate predictions of heat capacity were a large part of why physicists accepted quantum mechanics when it was first invented.)

The probability of a randomly chosen particle being in a particular state \(s\) is given by the following formula, known as the Boltzmann distribution:

\[ P(s) \propto \exp\left(\frac{-E(s)}{kT} \right) \]

where \(E(s)\) is the energy of state \(s\), \(k\) is the Boltzmann constant, and \(T\) is the temperature of the system. This equation can be satisfyingly proven from first principles; I sketch the proof in my notes at the bottom of this post.

Let’s look at the functional form of this equation. As the energy of a state increases, the probability of observing a particle in it decreases exponentially. Temperature controls how much energy affects the probability of a state. When the temperature is extremely high, the energy of a state barely affects its probability, because the exponent in the formula is always very close to zero. When the temperature is extremely low, differences in energy get magnified, and higher-energy states become far less likely relative to lower-energy ones.
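To make this concrete, here’s a small numerical sketch (my own toy numbers, not taken from any real system): the relative probability of a state sitting 0.1 eV above another state, at a few different temperatures.

```python
# Relative probability of a state 0.1 eV above another state, P(high)/P(low),
# at several temperatures. The 0.1 eV gap is an arbitrary illustrative choice.
from math import exp

k_eV = 8.617333262e-5  # Boltzmann constant in eV per kelvin
dE = 0.1               # energy gap between the two states, in eV

for T in (30, 300, 3000, 30000):
    ratio = exp(-dE / (k_eV * T))
    print(f"T = {T:>5} K: P(high)/P(low) = {ratio:.3g}")
```

At 30 K the higher state is essentially never occupied; at 30,000 K the gap barely matters. That’s exactly the behavior described above.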

At this point, we’re ready to try to tackle questions like the ones I posed earlier. For example, let’s look at how pressure should change with altitude in the atmosphere.

Let’s define our states by their height \(h\). The energy of a state is \(mgh\). So the relative probability of a particular height is given by \(P(h) \propto \exp\left(\frac{-mgh}{kT} \right)\).

If mass, gravity, and temperature are fixed, then this just looks like an exponentially decaying pressure with height. This prediction is accurate and is known as the barometric formula. According to this formula, the atmosphere is more concentrated near the ground if the molecules are heavier, if gravity is stronger, or if temperature is lower. All of that seems reasonable.
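Here’s a quick numerical sketch of that prediction (my own rough numbers, not a real atmospheric model). The Boltzmann factor \(\exp(-mgh/kT)\) implies a characteristic scale height \(kT/mg\) over which the density drops by a factor of \(e\).

```python
# Scale height and relative density vs altitude for an isothermal atmosphere
# of nitrogen. The temperature is a rough sea-level value; the real atmosphere
# is not isothermal, so this is only a ballpark check.
from math import exp

k_B = 1.380649e-23       # Boltzmann constant, J/K
g = 9.81                 # gravitational acceleration, m/s^2
T = 288.0                # rough surface temperature, K
m_N2 = 28 * 1.6605e-27   # mass of a nitrogen molecule, kg

scale_height = k_B * T / (m_N2 * g)
print(f"scale height: {scale_height / 1000:.1f} km")   # roughly 8-9 km

for h in (0, 2000, 5000, 10000):
    rel = exp(-m_N2 * g * h / (k_B * T))
    print(f"h = {h:>6} m: relative density {rel:.2f}")
```

The scale height comes out around 8 or 9 km, which is the right ballpark for the real atmosphere.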

You might object that I neglected the other types of energy in the molecules in that calculation: their kinetic energies and internal energies. I did neglect those, and it’s allowed, because the total energy separates into a sum of the gravitational potential energy and those other stores of energy. The Boltzmann factor of a sum is a product of Boltzmann factors, so the other terms factor out and don’t affect the distribution over heights. This follows quite easily from the formula I gave.

How about if we want to know the distribution of speeds in this gas? The allowed velocity states are spread essentially uniformly through velocity space. For every possible speed there are many possible velocity vectors, because you also get to choose a direction: a given speed defines a sphere in velocity space, and the sphere’s surface area is proportional to the number of velocity vectors with that speed. So at a speed \(v\), there are something like \(v^2\) many velocity vectors, and each has a probability proportional to \(\exp\left(\frac{-mv^2}{2kT}\right)\). So the overall probability for a particular speed is given by

\[ P(v) \propto v^2 \exp\left(\frac{-mv^2}{2kT} \right) \]

which is indeed the Maxwell-Boltzmann distribution of speeds in a gas.
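As a sanity check, here’s a short sketch (my own example). Each velocity component has Boltzmann factor \(\exp(-mv_i^2/2kT)\), which is a Gaussian, so we can sample components independently, compute speeds, and compare the mean speed for nitrogen at 300 K against the Maxwell-Boltzmann prediction \(\sqrt{8kT/\pi m}\).

```python
# Sample molecular speeds by drawing each velocity component from a Gaussian
# (the Boltzmann factor for one component is exp(-m v_i^2 / 2kT)), then compare
# the sampled mean speed with the Maxwell-Boltzmann prediction sqrt(8kT / pi m).
import random
from math import sqrt, pi

k_B = 1.380649e-23       # Boltzmann constant, J/K
T = 300.0                # temperature, K
m = 28 * 1.6605e-27      # mass of a nitrogen molecule, kg

sigma = sqrt(k_B * T / m)   # standard deviation of each velocity component, m/s

speeds = []
for _ in range(100_000):
    vx, vy, vz = (random.gauss(0.0, sigma) for _ in range(3))
    speeds.append(sqrt(vx * vx + vy * vy + vz * vz))

print(f"sampled mean speed:   {sum(speeds) / len(speeds):.0f} m/s")
print(f"predicted mean speed: {sqrt(8 * k_B * T / (pi * m)):.0f} m/s")  # ~475 m/s
```

The two numbers agree to within sampling noise.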

To figure out the distribution of bond angles in the water molecules in a glass of water, I’d have to know the energy levels of this mode of vibration in water molecules. I am looking into this but haven’t found it explained yet. I understand that vibrational modes are fairly high energy and so I suspect that very few water molecules are vibrating at room temperature.


If you want to learn this material properly, I recommend the excellent textbook “An Introduction to Thermal Physics” by Daniel Schroeder. It’s an extremely easy-to-read book; I think it’s the easiest-to-read physics textbook I’ve ever come across.

I am personally more interested in the statistical mechanics side of all this than the thermodynamics side. If you feel similarly (you just want to understand how to make predictions about the distributions of molecular properties), I recommend the following reading guide for Schroeder. After you read all this, you’ll understand everything I wrote above, and you’ll know how it follows from extremely general postulates about the behavior of systems with many moving parts.

  • 1.1, 1.2, 1.3, 1.4, 1.6
  • All of chapter 2
  • All of chapter 3 except 3.5
  • Skip chapter 4 entirely
  • 5.1, 5.2
  • All of chapter 6

I wrote up some notes on thermal physics that cover what I think are the important steps for understanding the Boltzmann distribution, which seems to me to be the most interesting result in thermal physics. These are very rough, but they might be helpful for people whose interest in this subject is very similar to mine. These follow Schroeder pretty closely. If you don’t know quantum mechanics, let your eyes glaze over the QM references and you should be fine.

  • Microstates are points in the configuration space for your whole system.
    • These are quantized because QM.
    • Particular cases:
      • In an ideal gas, velocity and position are quantized the same way a particle in a box is quantized, but the energy levels are so closely spaced that the states look pretty much continuous.
      • In an ideal gas of molecules, you have things like rotations and vibrations that are quantized according to the harmonic oscillator.
      • In a solid, the jiggles around the static structure look like a 3D harmonic oscillator.
  • Macrostates are a partition of your microstates into classes that you can’t distinguish between. For example, the macrostates of a gas in a container might be defined by their total energy. For the following treatment, we’re going to want every microstate in a macrostate to have the same internal energy.
    • One fundamental postulate of statistical mechanics is that if you know you’re in a particular macrostate (eg you know that the temperature of your gas is 15 degrees), then every microstate in that macrostate is equally plausible.
    • We can guess properties of our system by looking at the proportion of microstates in the macrostate with a particular property. For example, very few microstates of a gas in a box have all the gas on the right hand side of the box.
  • Entropy
    • Entropy is a property of macrostates. It is closely related to the Shannon entropy. It is a measure of the number of different microstates corresponding to the macrostate. This number of microstates is known as the multiplicity and represented with the symbol \(\Omega\).
    • All microstates are equally likely, so the probability of being in a particular macrostate just depends on the number of microstates it has. (This argument kind of mixes up probabilities and multiplicities in a bad way.)
    • Sometimes you hear entropy described as “the amount of information required to specify what state you’re in”. This definition is accurate, but I think it’s a little confusing. Why should physics care how many bits it requires to specify your state? I think it’s much more intuitive to say that entropy is a measure of how many microstates there are in this macrostate, because that makes it more obvious why we’re probably in a high entropy state: indifference over microstates leads to expecting to be in a high entropy macrostate.
    • Entropy \(S\) is defined as \(S = k \ln(\Omega)\), where \(k\) is the Boltzmann constant.
    • We can very nicely make the approximation “At equilibrium, your macrostate is a maximum-entropy macrostate”.
    • Often we want to talk about what the most likely macrostate is, given a particular set of facts. When we talk about this, the context is often that we have a few different interacting objects and we want to know what their equilibrium is. For example, we might have two different blocks of metal in contact. Energy is conserved, so the internal energy of block A + the internal energy of block B is fixed. What is the most likely split of energy between them? Why, it’s the split of energy that maximises the sum of the entropies of the blocks.
  • Now we need to look at how the entropy of a system changes with its internal energy.
    • It usually goes up: your states have discrete energy levels, and more energy means more accessible ways to distribute it. The number of states usually grows exponentially with the amount of energy (over small ranges of energy).
    • Two models of this:
      • Einstein solids. An Einstein solid is a bunch of extremely loosely coupled harmonic oscillators. By “loosely coupled”, I mean that the potential function for each harmonic oscillator is barely affected by the state of the others, so they can be considered independently. But the different oscillators need to have some amount of influence on each other, so that all the possible microstates are accessible. You can model solids as three harmonic oscillators per atom (three because each atom can oscillate in three dimensions).
        • In quantum mechanics, the energy levels of a harmonic oscillator are evenly spaced. So if you have ten oscillators and a thousand units of energy to spread between them, you have more microstates than if you only had 999 units of energy to spread between them. If you have \(N\) oscillators and \(q\) units of energy, your multiplicity is approximately (see Schroeder section 3.3):
        • \[ \left(\frac{q+N}{q}\right)^q \left(\frac{q+N}{N}\right)^N \]
      • Ideal gases. You model this one as a particle in a 3D box. As energy increases you’re able to access higher-momentum wavefunctions.
    • (This is not universally true, though; there are systems with different entropy-energy relations, eg ones where the entropy decreases as you increase energy. For example, paramagnets are systems made up of particles that have two states, up and down. The energy of the system is some constant times the number of up states, so the minimum-energy state is “everything is down” and the maximum-energy state is “everything is up”. If the energy in the system is such that more than half of the particles are spin up, then increasing energy decreases the entropy.)
  • Define temperature
    • What is temperature? Temperature is the quantity that objects have in common with each other iff they are in thermal equilibrium.
      • This definition only makes sense up to an invertible function though. Like, we could always double it, or cube it, and the definition would work just as well.
    • To learn more about this, let’s look at the thermal equilibrium of two objects, A and B.
      • The system has some total energy \(U\), and the objects have energy \(U_A\) and \(U_B\) such that \(U = U_A + U_B\).
      • As discussed above, we’re almost certainly going to be in an almost entropy-maximizing state.
      • Both materials gain entropy as they gain energy. So if I move a bit of energy from A to B, entropy will increase in B and decrease in A. By assumption, we’re at maximum entropy, so that move can’t increase the total entropy. The rate of change of the system’s entropy per unit of energy moved from A to B is \(\frac{dS_B}{dU_B} - \frac{dS_A}{dU_A}\). If this is zero, then the two rates of change of entropy wrt energy must be equal.
        • You should meditate on this. The connection between thermal equilibrium and the rate of change of entropy wrt energy is very interesting and deserves reflection.
      • This suggests that temperature is related to dS/dU.
      • Turns out that it’s inverted. So \(1/T = \frac{dS}{dU}\).
    • What’s nice about the temperature scale we chose, then?
      • It’s actually surprisingly artificial. The reason it’s nice is that most materials have heat capacities that are pretty close to piecewise constant functions. So if you mix together two containers of water at different temperatures, the overall temperature is pretty close to the weighted average of those temperatures. And if you put some mercury in a tube, it expands close to linearly with its temperature.
      • But there are ways that this definition seems pretty unnatural. For example, there exist systems where to increase the entropy, you need to decrease the energy. These systems are defined to have negative temperature. They transfer heat to any positive-temperature system they’re in contact with, suggesting that their temperature should be very high rather than very low.
    • BTW, by this definition of temperature it’s possible for things to have negative temperature.
  • Now, let’s figure out what a single particle is going to do.
    • We’re looking at a single particle in a big system. The system has a total energy of U. This is a big energy, much much bigger than the differences between plausible states of our single particle.
    • Suppose our single particle has two possible states, \(s_1\) and \(s_2\), with energies that I’ll write as \(E(s_1)\) and \(E(s_2)\).
    • A priori these two states are equally likely. But we know that the system has a total energy of \(U\). So if the particle is in \(s_1\), then the rest of the system has to be in a state with energy \(U - E(s_1)\). And if it’s in \(s_2\), the rest of the system has energy \(U - E(s_2)\).
    • The probability comes from the multiplicities of the rest of the system: \(P(s_1) = \Omega_1 / (\Omega_1 + \Omega_2)\), where \(\Omega_i\) is the number of states available to the rest of the system when the particle is in state \(s_i\). (There’s a numerical sketch of this reservoir argument just after this list of notes.)
    • We need to calculate the difference between these multiplicities. How will we do that?
    • Ah, we can do it with the temperature! \(1/T = dS/dU\), and the change in the rest of the system’s energy here is very small, so it’s correct to say that \(S(s_1) - S(s_2) = \frac{1}{T}\left(E(s_2) - E(s_1)\right)\), where \(S(s_i)\) is the entropy of the rest of the system when the particle is in state \(s_i\).
    • To turn this back into a multiplicity, we divide by \(k\) and exponentiate, and voila:
      • \[\Omega(s_1)/\Omega(s_2) = \frac{\exp(-E(s_1)/kT)}{\exp(-E(s_2)/kT)}\]
      • \[P(s_1) = \frac{\exp(-E(s_1)/kT)}{\exp(-E(s_1)/kT) + \exp(-E(s_2)/kT)}\]
    • But actually there are many different states. If this is true of every different state, we learn that the overall probability distribution is:
      • \(P(s) = \frac{1}{Z} \exp(-E(s)/kT)\)
        • where Z is a normalization constant, \(Z = \sum_s \exp(-E(s)/kT)\)
    • In a system where each degree of freedom has equally spaced energy levels, you’re done.
      • For example, this is a way of deriving the barometric formula, if you’re okay with neglecting the kinetic energy of the gas. That’s fine, because the kinetic energy separates out from the potential energy (see the note on uncorrelated degrees of freedom below), so it doesn’t change the distribution over heights.
    • Let’s look at what this function does.
      • When T is very small, then differences in energy level are extremely important to the probability. You are very likely to get minimum energy states.
      • When T is very large, then the energy of a state barely matters at all.
      • So T is obviously controlling how much we should be biased towards the lower energy states.
      • As a computer scientist, I’m very familiar with thinking of temperature this way: different states have different energies, you’re more likely to pick low-energy states, and temperature controls how strongly you weight that preference. This metaphor shows up in a lot of topics; one I’m particularly familiar with is undirected graphical models.
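Here’s the numerical sketch promised above (my own toy example; it’s not code from Schroeder). A particle with two states, at energies 0 and \(dE\) quanta, is weakly coupled to an Einstein-solid reservoir. The probability of each particle state is proportional to the multiplicity left to the reservoir, and the ratio of those multiplicities should match the Boltzmann factor \(\exp(-dE/kT)\), with \(1/kT\) read off from the reservoir’s change in log-multiplicity per quantum of energy.

```python
# Check the reservoir argument numerically: the ratio of reservoir
# multiplicities for the particle's two states should match exp(-dE/kT),
# where 1/kT is the reservoir's change in log-multiplicity per quantum.
from math import comb, exp, log

N = 300    # oscillators in the Einstein-solid reservoir
q = 1000   # quanta in the reservoir when the particle is in its ground state
dE = 5     # extra quanta the particle's excited state costs

def multiplicity(n_osc, quanta):
    """Number of ways to distribute `quanta` energy quanta among `n_osc` oscillators."""
    return comb(quanta + n_osc - 1, quanta)

# Exact ratio of reservoir multiplicities for the two particle states.
ratio_exact = multiplicity(N, q - dE) / multiplicity(N, q)

# "Temperature" of the reservoir: 1/kT is the slope of ln(multiplicity) with
# respect to energy, estimated as a one-quantum finite difference.
beta = log(multiplicity(N, q)) - log(multiplicity(N, q - 1))

print(f"exact multiplicity ratio:       {ratio_exact:.4f}")
print(f"Boltzmann factor exp(-beta*dE): {exp(-beta * dE):.4f}")
```

The two numbers agree closely, and the agreement improves as the reservoir gets larger relative to \(dE\), which is exactly the approximation the derivation leans on.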

And some more scattered notes:

  • Pressure
    • Above, we defined temperature as “the quantity that two systems have in common when they’re in thermal equilibrium”. We could do the same thing for pressure.
    • Pressure is “the quantity that two systems have in common when they’re in mechanical equilibrium”. That is, if you have a partition between the two systems, they have the same pressure if the partition doesn’t move.
    • Remember that we say two systems have the same temperature if \(\frac{dS}{dU}\) is the same for both. What should the analogous definition for pressure be?
    • Well, two things have the same pressure if their entropies increase at the same rate when volume changes. So pressure is a function of \(\frac{dS}{dV}\).
    • It turns out to be \(T \frac{dS}{dV}\), holding internal energy and the number of particles constant.
    • Where did the \(T\) come from? I am not totally sure, but I think it’s related to the fact that the partial derivatives we had to do involved holding internal energy constant, which implies that the systems are in thermal equilibrium.
  • The probability of any particular microstate is proportional to the Boltzmann factor above. But to actually get results out of this model, we need to be able to count up the different microstates.
    • The proper way of doing this is to consider your system quantum mechanically and look at all the definite energy states. However, various sketchy hacks work out alright too.
      • One example sketchy hack is the uncertainty-principle one: we discretize position and momentum space into cells just large enough that the Heisenberg uncertainty principle is obeyed. I think this is a trick and I am suspicious of it.
    • Situations:
      • Ideal gas
        • Velocities and positions
      • Einstein solid
      • Molecules
        • nicely summarized here.
        • Re water: “Water absorbs a wide range of electromagnetic radiation with rotational transitions and intermolecular vibrations responsible for absorption in the microwave (~1 mm - 10 cm wavelength) and far-infrared (~10 µm - 1 mm), intramolecular vibrational transitions in the infrared (~1 µm - 10 µm) and electronic transitions occurring in the ultraviolet region (< 200 nm).” (from here)
        • You need to consider rotational energies, vibrational energies, and excitations of electrons.
          • Vibrational energy is considered using the harmonic oscillator model, under which the energies are \((n + \tfrac{1}{2})\hbar\omega\) for all \(n\) in [0, 1, 2…]. The angular frequency \(\omega\) is \(\sqrt{k/m}\), where \(k\) here is the bond’s spring constant rather than the Boltzmann constant.
            • In practice vibrational energy isn’t that important in molecules at everyday temperatures (there’s a rough numerical sketch of this at the end of these notes).
            • The energy of excited states increases with the square root of the spring constant. So stronger bonds require higher temperatures to start vibrating.
          • Rotational energy:
            • If linear, use rigid rotor model.
              • This has energies of \(B\cdot n\cdot(n+1)\) for n in [0, 1, 2…], where B is \(\frac{\hbar^2}{2I}\), where \(I\) is the moment of inertia. See chapter 6 of McQuarrie. The moment of inertia is on the bottom of that fraction, so it’s easier to get molecules to spin around axes that have higher moments of inertia.
          • Excited electron states
            • For hydrogen-like atoms, these have energies proportional to \(1-\frac{1}{n^2}\). These transition energies are in the ultraviolet region.
    • Here are some different ways degrees of freedom can play out:
      • They might not affect the energy level. For example the x position of an atom in a box.
      • They might affect the energy, but only a tiny tiny bit. Eg the height of a molecule in a small box—the PE is minuscule compared to everything else.
      • A degree of freedom might have spacings between energy levels such that at the temperature you care about, the ground state is pretty much the only accessible state.
      • Many states might be accessible, and the energy might be quadratic in the coordinate. This is how kinetic energy in a gas works (quadratic in velocity), and it’s also how potential energy in a solid works (quadratic in displacement).
      • Many states might be accessible, and the energy might be linear in the coordinate, eg height in the barometric formula case. In this case you get an exponential dropoff.
    • By the way, here’s an interesting result that helps you. If two different degrees of freedom are uncorrelated (that is, the energy of the overall state can be separated out into a sum of energies from each degree of freedom) then you can calculate distributions over each degree of freedom without considering the other.
  • Interesting facts:
    • Equipartition theorem: every degree of freedom whose energy function is quadratic (eg rotational energy, kinetic energy, potential energy in a spring) has an average energy linear in the temperature, specifically \(kT/2\) per quadratic degree of freedom.
    • You can use statistical mechanics to calculate the heat capacity of various objects. The heat capacity of an object is determined by the number of degrees of freedom that are available to be excited at the current temperature. So it’s proportional to the number of atoms. As you cool things down, the quantization of the available energies of different degrees of freedom starts to matter, and various degrees of freedom “freeze out”. So heat capacity usually increases as things get hotter.
    • The Boltzmann constant has units of joules per kelvin, so \(kT\) is an energy; this leads directly to the Landauer limit on the energy cost of computation: erasing one bit of information costs at least \(kT \ln 2\) of energy.
  • I elided many cool details. In particular:
    • Quantum mechanics of low temperatures.
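Finally, here’s the rough vibrational sketch referenced in the notes above (my own illustration; the wavenumbers below are approximate literature values for water’s vibrational modes, not something worked out in this post). It estimates what fraction of molecules sit in the first excited vibrational level at room temperature.

```python
# Thermal population of the first excited vibrational level relative to the
# ground state, P(n=1)/P(n=0) = exp(-h*c*wavenumber / kT), for two of water's
# vibrational modes. Wavenumbers are approximate literature values.
from math import exp

k_B = 1.380649e-23   # Boltzmann constant, J/K
h = 6.62607015e-34   # Planck constant, J*s
c = 2.99792458e10    # speed of light in cm/s, so we can work in wavenumbers
T = 298.0            # room temperature, K

modes = {
    "H-O-H bend  (~1595 cm^-1)": 1595.0,
    "O-H stretch (~3650 cm^-1)": 3650.0,
}

for name, wavenumber in modes.items():
    gap = h * c * wavenumber   # energy spacing between adjacent levels, J
    print(f"{name}: P(n=1)/P(n=0) ~ {exp(-gap / (k_B * T)):.1e}")
```

By this estimate only a few molecules in ten thousand are in the first excited bending state at room temperature, and essentially none are in an excited stretching state, which is consistent with the guess earlier in the post that very few water molecules are vibrating.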