12 Rigid Bodies and the Inertia Tensor

12.1 Translation and rotation of rigid bodies

In most of our discussion of mechanics this semester, we’ve been treating everything like a point particle. Point particles can only translate - move around in space - and unconstrained, have 3 degrees of freedom, i.e. we need 3 numbers to uniquely specify each particle’s location.

Now we’ll extend our attention to rigid bodies: collections of many particles held together in a fixed shape (ignoring bending, stretching, etc.) For a system of N particles (label: \alpha = 1, ..., N), we know that we can treat collective motion as if all mass and forces were concentrated at the center of mass, \vec{F}_{\textrm{ext}} = M \ddot{\vec{R}}, where M = \sum_{\alpha=1}^{N} m_\alpha, and the CM position is \vec{R} = \frac{1}{M} \sum_{\alpha=1}^{N} m_\alpha \vec{r}_\alpha. In principle, we need 3N different coordinates to describe such a collection of particles - and N is incredibly large for macroscopic objects, if we treat them as collections of atoms! However, the requirement that we have a rigid body provides almost as many constraints. There are only two kinds of motion that a rigid body can undergo: translation of the center of mass, and rotation about the center of mass. (This is hopefully an intuitive statement, but we’ll demonstrate it shortly!)

Although I’m going to derive a lot of results using sums like this, keep in mind that an everyday object is made up of such an enormous number of individual atoms that we can, to good approximation, just treat the object as a continuous distribution of matter. In that case, we can write integrals instead of sums, \vec{R} = \frac{1}{M} \int \vec{r}\ dm. So again, although we’ll derive a lot of facts using sums, in practice we’ll mostly be working with integrals in this subject.

There was one example of a system we didn’t treat as simply a point particle with Lagrangian mechanics: a rolling object.

As we saw, the kinetic energy of the rolling disk has two components: T = \frac{1}{2} m \dot{x}^2 + \frac{1}{2} I \dot{\phi}^2. The first term is the usual translational kinetic energy, while the second term is the kinetic energy associated with the rotation of the disk. Because the disk is an extended object, it is able to store energy internally, even if it isn’t moving in space. For example, if we mounted the disk on a rod and just spun it about its axis without letting it go anywhere (this is a “flywheel”), it still has kinetic energy - energy associated with its motion.

Let’s generalize beyond the disk. For any system of N particles, the total kinetic energy is just the sum of the individual particle kinetic energies, T = \sum_{\alpha = 1}^N \frac{1}{2} m_\alpha \dot{\vec{r}}_\alpha^2. We can split each position vector \vec{r}_\alpha into two components: the CM position vector \vec{R}, and the position of the particle relative to the CM, \vec{r}'_\alpha. \vec{r}_\alpha = \vec{R} + \vec{r}'_\alpha. CM of a collection of point masses.

For the kinetic energy, then, we have T = \frac{1}{2} \sum_\alpha m_\alpha (\dot{\vec{R}}^2 + \dot{\vec{r}}'_\alpha{}^2 + 2 \dot{\vec{R}} \cdot \dot{\vec{r}}'_\alpha). We can rewrite the last term as T = (...) + \dot{\vec{R}} \sum_\alpha m_\alpha \dot{\vec{r}}'_\alpha \\ = \dot{\vec{R}} \frac{d}{dt} \left( \sum_\alpha m_\alpha \vec{r}'_\alpha \right). This sum looks exactly like the definition of the CM, except now it’s in coordinates relative to the CM itself! So it’s just equal to zero. We can show this mathematically: going back to the definition of the CM, notice that M \vec{R} = \sum_\alpha m_\alpha \vec{r}_\alpha \\ \Rightarrow 0 = \sum_\alpha (m_\alpha \vec{r}_\alpha - M \vec{R}) \\ = \sum_\alpha (m_\alpha (\vec{R} + \vec{r}'_\alpha) - M\vec{R}) \\ = \sum_\alpha m_\alpha \vec{r}'_\alpha since \sum_\alpha m_\alpha = M. So crossing the last term out, we have for the total kinetic energy T = \frac{1}{2} M \dot{\vec{R}}^2 + \frac{1}{2} \sum_\alpha m_\alpha \dot{\vec{r}}'_\alpha{}^2 \\ = T_{CM} + T_{\textrm{rel}}. For any collection of particles, total T splits into T associated with motion of the center of mass, plus T relative to center of mass.

If only central, conservative forces are involved, U can be divided up similarly, U = U_{ext} + U_{int} = U_{ext} + \sum_\alpha \sum_\beta U_{\alpha \beta} (|\vec{r}_\alpha - \vec{r}_\beta|). (Non-central internal forces can mess with total angular momentum, a complication we want to avoid for now.) For a rigid body in particular, the inter-particle distances |\vec{r}_\alpha - \vec{r}_\beta| are all fixed, so U_{\rm int} is just a constant that we can drop, leaving U = U_{\rm ext}.

So, extended objects can store energy internally, and it can be treated more or less separately from CM motion (which acts like a point particle.) A flywheel stores kinetic energy internally; a spring stores potential energy internally. (This is mechanics and not E&M, but of course there’s a close analogy here to a battery, which stores electric potential energy internally.)

12.2 Angular momentum of rigid objects

Since we’re interested in rotation, we should also look at how angular momentum splits up for a collection of particles. Recall that for a single point particle, \vec{L} = \vec{r} \times \vec{p} = m \vec{r} \times \dot{\vec{r}}. The angular momentum of a collection divides up just like the kinetic energy did, between the motion of the CM and motion relative to the CM: \vec{L} = M \vec{R} \times \dot{\vec{R}} + \sum_\alpha m_\alpha \vec{r}'_\alpha \times \dot{\vec{r}}'_\alpha = \vec{L}_{CM} + \vec{L}_{\rm rel}. (The derivation is in §10.1 of Taylor; it’s very similar to our kinetic energy derivation above.)

The angular equivalent of force is the torque; for a point particle, \vec{\Gamma} = \vec{r} \times \vec{F}. (Some people prefer \vec{\tau} for torque, but I’ll follow Taylor and use \vec{\Gamma}.) Torque and angular momentum are related just like force and linear momentum: \vec{\Gamma} = \dot{\vec{L}}. Torques can act on either \vec{L}_{CM} or \vec{L}_{\rm rel}, or on both at once. In particular, if there are no torques, the splitting above means both angular momenta are conserved separately.

The canonical example of this division is the Earth itself. We can identify two different types of rotational motion: motion of the Earth around the Sun (“orbital” angular momentum, \vec{L}_{\rm orb}), and rotation of the Earth around its own axis (“spin” angular momentum, \vec{L}_{\rm spin}). The separation noted above means we don’t have to worry about the spin when considering the orbit, and vice-versa.

Spin vs. orbital angular momentum in the Earth-Sun system.

In fact there is a small net torque on the Earth’s spin from the Sun and Moon, mainly because the Earth isn’t perfectly spherical. This causes the precession of the equinoxes, a slow change in the Earth’s orientation relative to the stars. Currently the North pole is roughly pointed at Polaris (the “North star”), but due to the precession it will shift away; in about a thousand years, Gamma Cephei will become the closest star to the North pole.

(When you study quantum mechanics, you’ll see that the motion of an electron around a proton divides similarly into orbital and spin components which are approximately conserved in the absence of external influence. This division turns out to be crucial for an accurate description of the observed energy levels of the hydrogen atom.)

We now have all of the pieces to look at the details of how rotational motion of a rigid object gives rise to (internal) kinetic energy and angular momentum. Let’s start simple, with rotation about a fixed axis: imagine a steel rod as a physical axis, onto which we thread some arbitrary solid object.

A strange-looking rigid object, forced to rotate around a fixed axis.

The rotation axis is set to be \vec{\omega} = \omega \hat{z}. For any point within the object, we write its position as \vec{r}_\alpha, with origin at the CM. Since the object is rigid, in a frame rotating with the object every point is at rest: the velocity of a point due to the rotation is thus \dot{\vec{r}}_\alpha = \vec{\omega} \times \vec{r}_\alpha \\ = -\omega y_\alpha \hat{x} + \omega x_\alpha \hat{y}. The angular momentum is then \vec{L}_\alpha = m_\alpha \vec{r}_\alpha \times \dot{\vec{r}}_\alpha \\ = m_\alpha \left| \begin{array}{ccc} \hat{x}&\hat{y}&\hat{z} \\ x_\alpha & y_\alpha & z_\alpha \\ -\omega y_\alpha & \omega x_\alpha & 0 \end{array} \right| \\ = m_\alpha \omega \left( -z_\alpha x_\alpha \hat{x} - z_\alpha y_\alpha \hat{y} + (x_\alpha^2 + y_\alpha^2) \hat{z} \right). To calculate the total angular momentum of the object, we sum over all of the component angular momenta. Let’s do the z component first: L_z = \sum_\alpha (L_{\alpha,z}) = \sum_\alpha m_\alpha \omega (x_\alpha^2 + y_\alpha^2) \\ = \left(\sum_\alpha m_\alpha r_\alpha^2\right) \omega \\ = I_z \omega In other words, the angular momentum about the axis of rotation is proportional to \omega, and the proportionality is given by the sum over (mass) x (distance from axis){}^2. This is exactly the moment of inertia about the z-axis, as you learned it in freshman physics!

Moreover, notice that the kinetic energy of the rotating object is equal to T = \frac{1}{2} \sum_\alpha m_\alpha \dot{\vec{r}}_\alpha^2 \\ = \frac{1}{2} \sum_\alpha m_\alpha (\omega^2 y_\alpha^2 + \omega^2 x_\alpha^2) \\ = \frac{1}{2} \sum_\alpha m_\alpha r_\alpha^2 \omega^2 = \frac{1}{2} I_z \omega^2. This is another formula you’ve seen before (and we’ve used in Lagrangian mechanics.) Here’s the part that you haven’t seen before: what about the x and y components of \vec{L}? We can calculate those too: L_x = \sum_\alpha m_\alpha (-z_\alpha x_\alpha \omega) = I_x \omega \\ L_y = \sum_\alpha m_\alpha (-z_\alpha y_\alpha \omega) = I_y \omega. For something simple like a uniform cylinder rotating around its symmetry axis, these components vanish identically: at any z, for every x_\alpha there is an equal mass at -x_\alpha to cancel it in the sum. But for an arbitrary object, these moments don’t have to be zero!

This brings us to a very, very important point: \vec{L} and \vec{\omega} do not necessarily point in the same direction. The equation \vec{L} = I \vec{\omega} (where I is a numerical constant), as you learned it in freshman physics, is obviously not true in all scenarios; in general, I must be a matrix, an operator which can transform one vector into another.

Let’s drive this home with an example. Consider a single mass attached to a massless, pivoting rod, at a fixed angle to the z-axis. The mass and rod pivot about the z-axis with constant angular speed \vec{\omega}.

A mass on a pivoting rod, rotating around the z-axis.

What we’ve seen is that even in a simple case like this, it’s often true that \vec{L} and \vec{\omega} are not aligned. In general, this doesn’t require the presence of torque, nor does it require that \vec{\omega} remain fixed. In fact, this scenario is somewhat unrealistic; it’s much more common for us to encounter motion with no torques in which \vec{L} is fixed, but \vec{\omega} can still change. If you throw your copy of Taylor into the air (note: do this at your own risk!) you will see it tumbling around in the absence of any forces, which is exactly this sort of motion.

Of course, books are expensive and don’t stay in the air very long, so how about we watch a video from an astronaut on the International Space Station instead?

Video: Spinning a deck of cards in space!

12.3 Frame dependence of angular momentum

Before we go back to our study of rigid bodies, let’s step back and just look at some basic features of how statements about angular momentum depend on coordinate choices. This is meant to challenge your intuition, which you shouldn’t always trust when dealing with rotational motion!

Consider a very simple example: a ball of mass m traveling in a straight line. We take our initial coordinate system so that the ball is moving along the \hat{y} axis with speed v, and starts at y=0 when t=0.

Setup for our simple straight-line example.

Now we’ll make some simple observations. First, there are no forces and therefore no torques: since \vec{\Gamma} = \dot{\vec{L}}, we conclude that \dot{\vec{L}} = 0. Moreover, the angular momentum at any moment is equal to \vec{L} = m \vec{r} \times \vec{v} which, since \vec{r} and \vec{v} always point along the y-axis, is always zero.

This probably seems very intuitive and trivial so far, but now what if we change coordinates?

Let’s call d the distance between the y and y' axes. We can evaluate \vec{L} at t=0. Then \vec{r} = d\hat{x}' and \vec{v} = v \hat{y}', so |\vec{L}| = mvd. Not zero! Is it changing with time? The vector \vec{r} is changing, but \vec{v} isn’t, so it looks like it might be…let’s evaluate later when y'=h.

Some geometry in the primed coordinates.

Now |\vec{L}| = mvr \sin \theta = mvd. Still conserved! (It should be, since there are still no torques.)

But wait, it gets better! What if I switch gravity on? In the original coordinates, we still have \vec{L} = 0 for the entire motion, because \vec{\Gamma} = \vec{r} \times \vec{F}, and gravity and \vec{r} are both in the y direction. But in the primed coordinates… \vec{\Gamma} = \vec{r} \times \vec{F} = mg \vec{r} \times (-\hat{y'}) The angular momentum is, meanwhile, \vec{L} = m\vec{r} \times \vec{v} = m\dot{y}' \vec{r} \times \hat{y'} Rate of change: \dot{\vec{L}} = m\ddot{y}' (\vec{r} \times \hat{y'}) + m \dot{y}' (\dot{\vec{r}} \times \hat{y'}) \\ = m \ddot{y}' (\vec{r} \times \hat{y'}) since \dot{\vec{r}} always points in the \hat{y}' direction, so the second cross-product is zero. But then, we see that \vec{\Gamma} = \dot{\vec{L}} \\ (-mg) (\vec{r} \times \hat{y'}) = m\ddot{y}' (\vec{r} \times \hat{y'}) which is just the usual equation of motion, \ddot{y}' = -g. So the motion isn’t really any more complicated, but the equations certainly look more complicated in terms of angular momentum.

The main point to take away from all of this is the importance of the choice of coordinates for working with angular momentum; even seemingly general statements like “angular momentum is conserved” are strongly coordinate-dependent! This is very counter-intuitive compared to the frame dependence of linear momentum, especially since two observers who are stationary with respect to each other can disagree about angular momentum. (But they will always agree on the actual, physical motion!)

12.4 The inertia tensor

Now let’s move on to rotation about an arbitrary axis, and see the general form of this inertia matrix. Take a general rotation vector \vec{\omega} = (\omega_x, \omega_y, \omega_z); the rotation is taken with respect to the center of mass and the body is otherwise stationary, so the speed of each particle is \dot{\vec{r}}_\alpha = \vec{\omega} \times \vec{r}_\alpha. Thus, the angular momentum carried by each particle is \vec{L}_\alpha = m_\alpha \vec{r}_\alpha \times (\vec{\omega} \times \vec{r}_\alpha). Expanding out both cross products in turn is tedious, so I’ll just skip to the result, which can be written in matrix form: \vec{L}_\alpha = m_\alpha \left( \begin{array}{ccc} y_\alpha^2 + z_\alpha^2 & -x_\alpha y_\alpha & -x_\alpha z_\alpha \\ -x_\alpha y_\alpha & x_\alpha^2 + z_\alpha^2 & -y_\alpha z_\alpha \\ -x_\alpha z_\alpha & -y_\alpha z_\alpha & x_\alpha^2 + y_\alpha^2 \end{array} \right) \left( \begin{array}{c} \omega_x \\ \omega_y \\ \omega_z \end{array} \right). This is a compact way of writing three equations, one for each component of \vec{L}_\alpha. Notice that setting \vec{\omega} = \omega \hat{z} gives us the fixed-axis result we found above. Once again, to get the total angular momentum, we sum over the individual particles \alpha, and we can pull \vec{\omega} out of the sum, so finally we have: \vec{L} = \overset{\leftrightarrow}{I} \vec{\omega} which defines the inertia matrix or inertia tensor:

Inertia tensor (full form)

\overset{\leftrightarrow}{I} = \sum_\alpha m_\alpha \left( \begin{array}{ccc} y_\alpha^2 + z_\alpha^2 & -x_\alpha y_\alpha & -x_\alpha z_\alpha \\ -x_\alpha y_\alpha & x_\alpha^2 + z_\alpha^2 & -y_\alpha z_\alpha \\ -x_\alpha z_\alpha & -y_\alpha z_\alpha & x_\alpha^2 + y_\alpha^2 \end{array} \right) \\ {\rm or:}\\ \overset{\leftrightarrow}{I} = \int dV\ \rho(x,y,z) \left( \begin{array}{ccc} y^2 + z^2 & -x y & -x z \\ -x y & x^2 + z^2 & -y z \\ -x z & -y z & x^2 + y^2 \end{array} \right).

The second version of this just replaces the sum with an integral, for an object described by some continuous density function \rho(x,y,z). Don’t be confused by the “integral of a matrix”; the components of the matrix are all separate, so you can think of writing nine separate integrals inside the matrix, I just wrote it like this to save space.

Notice that this is a symmetric matrix; that has some important implications, but one thing we can see immediately is that although it’s a 3x3 matrix, there are only six independent entries we have to find.

It’s very important to remember that this formula defines the moments of inertia only for rotational axes which pass through the origin of the coordinate system, x=y=z=0. This special point through which all rotational axes pass is called the pivot point. If we want to calculate rotation around an axis which doesn’t pass through the origin, we have to shift our coordinates to use the formula above. Remember, angular momentum is sensitive to where we put our coordinates! (The actual motion of the system that we find in the end will, of course, be independent of coordinate choices, but as we saw above, statements like “\vec{L} is constant” do depend on coordinates.)

By the way, it shouldn’t surprise you that the equation for kinetic energy is modified too! The derivation is a bit of a pain, so I’ll just tell you that the general formula is T = \frac{1}{2} \vec{\omega} \cdot \vec{L} = \frac{1}{2} \vec{\omega}^T \overset{\leftrightarrow}{I} \vec{\omega}. You can see that this reproduces the simplified formula T = \frac{1}{2} I \omega^2, which is valid only for rotation that keeps \vec{\omega} and \vec{L} parallel.

12.4.1 Tensors and index notation

It’s worth making a brief detour here into index notation, which will give us a much nicer-looking formula for the inertia tensor. We know that we can break any vector down into its components: \vec{a} = a_x \hat{x} + a_y \hat{y} + a_z \hat{z}. Here we’ve written the components as a_i, where i is a placeholder for any of x,y,z. (By convention, latin indices i,j,k... are used for spatial coordinates.) Index notation gives us a compact way of rewriting certain formulas in terms of components. For example, the dot product: \vec{a} \cdot \vec{b} = a_x b_x + a_y b_y + a_z b_z \\ = \sum_i a_i b_i. This notation is easily extended to larger objects. The inertia tensor itself is a matrix, so we need two indices ij to write out its 9 components. Which entry is which, in this notation? Well, we can write out matrix-vector products as sums over indices. These two equations have the same meaning: \vec{L} = \mathbf{I} \vec{\omega} \\ L_i = \sum_j I_{ij} \omega_j. So, for example, the x-component of \vec{L} is given by L_x = I_{xx} \omega_x + I_{xy} \omega_y + I_{xz} \omega_z \\ \left( \begin{array}{c} L_x \\ ... \\ ... \end{array} \right) = \left( \begin{array}{ccc} I_{xx} & I_{xy} & I_{xz} \\ &...& \\ &...& \end{array}\right) \left( \begin{array}{c} \omega_x \\ \omega_y \\ \omega_z \end{array} \right) This is the dot product of the column vector \vec{\omega} with the first row of \mathbf{I}, so we see that the first index labels the row, and the second index is the column.

With this notation set up, we can write a much more compact formula for the inertia tensor:

Inertia tensor (compact form)

I_{ij} = \int\ dV\ \rho(x,y,z) (r^2 \delta_{ij} - r_i r_j)

where r_i = (r_x, r_y, r_z) = (x, y, z), and r^2 = x^2 + y^2 + z^2 (this is all still in rectangular coordinates!) I have introduced a special symbol here known as the Kronecker delta symbol, which is equal to 1 if the two indices are equal and 0 otherwise: \delta_{ij} = \begin{cases} 1, & i = j;\\ 0, & i \neq j. \end{cases} In matrix form, the Kronecker delta is just the identity matrix. We could go much further with index notation; for example, I could define the cross product using a different special tensor. But for now, we only really wanted the compact version of \mathbf{I}, so let’s move on.

By the way, I haven’t really defined the term tensor. A tensor is a mathematical object that is sensitive to coordinate changes in a certain way. Consider the simple change of coordinates from rectangular to cylindrical: x = r \cos \theta \\ y = r \sin \theta \\ z = z If we’re just interested in a quantity like the kinetic energy, then we just substitute in one coordinate for another using these equations: \dot{x}^2 + \dot{y}^2 becomes \dot{r}^2 + r^2 \dot{\theta}^2, for example. But if we have a vector and we want to change coordinates; we have to go further, and account for the change of basis vectors from \hat{x}, \hat{y} to \hat{r}, \hat{\theta}.

A vector is an example of a tensor, because of this extra sensitivity to the unit vectors in our coordinate change. The “inertia tensor” will also depend on the unit vectors, but in a more complicated way: to fully change coordinates here, we have to identify what I_{rr} is in terms of the original x and y components, and so on. We’re not going to deal with coordinate changes of tensors in full generality here; just keep in mind that the inertia matrix isn’t just a fixed set of numbers, but that it depends intricately on the definition of our coordinates. I’ll continue to say “inertia tensor” since that’s standard jargon, but you can just think “inertia matrix”.

12.4.2 Example: inertia tensor of a cube

Consider a cube of fixed density \rho, side length b, rotating about one of its corners. What is the inertia tensor?

A solid cube, pivoting around one of its corners.

Before we begin, let’s think about whether we really need to evaluate all six integrals in this case. The cube has a lot of symmetry; in particular, we can see immediately that it looks exactly the same in the x, y, and z directions from our starting point. So exchanging the variables x,y,z will give us the same integral and thus the same answer. This tells us that the three off-diagonal components of \mathbf{I} are all equal, and so are the three diagonal components.

There’s no way to reduce further; the diagonal and off-diagonal integrals look different, and none of them are obviously equal to zero. So we have 2 integrals to evaluate.

Let’s start with the top left diagonal component. Since \rho is constant, we can pull it out of integral: I_{xx} = \rho \int_0^b dx \int_0^b dy \int_0^b dz (y^2 + z^2) \\ = b \rho \int_0^b dy \left.(y^2 + \frac{1}{3} z^3)\right|_0^b \\ = b \rho \int_0^b dy (by^2 + \frac{1}{3} b^3) \\ = b \rho (\frac{1}{3} b^4 + \frac{1}{3} b^4) = \frac{2}{3} \rho b^5 = \frac{2}{3} Mb^2. where at the last step I’ve identified that the density of the cube can be rewritten as M = \rho b^3. Now, the off-diagonal component: I_{xy} = \rho \int_0^b dx \int_0^b dy \int_0^b dz (-xy) \\ = -b \rho \int_0^b dx (\frac{1}{2} x b^2) \\ = -\frac{1}{4} \rho b^5 = -\frac{1}{4} Mb^2. All off-diagonal components are the same, and all diagonal components too, thanks to the symmetry of the cube! So the full tensor is: \overset\leftrightarrow I = Mb^2 \left( \begin{array}{ccc} \frac{2}{3} & -\frac{1}{4} & -\frac{1}{4} \\ -\frac{1}{4} & \frac{2}{3} & -\frac{1}{4} \\ -\frac{1}{4} & -\frac{1}{4} & \frac{2}{3} \end{array} \right). As we’ve just seen, it’s very helpful to identify the symmetries of our object before we start calculating integrals! Often you’ll be able to demonstrate using symmetry either that some components of the inertia tensor are equal to each other, or even that they must be zero (for the off-diagonal components.)

Once again, I’ll stress that the location of the pivot point is important! You know from intro physics that a cylinder has a different moment of inertia for rotation about its edge or about its center, for example. In our cube example above, we can only use the expression we found for rotations that pass through the corner of the cube; for other pivot points, \mathbf{I} will change!

Let’s see explicitly what the difference is, by recomputing the cube’s inertia tensor from its center of mass.

12.4.3 Example: inertia tensor of a cube about the CM

Much like the cube about its corner, the high degree of symmetry here means we only need to do two out of nine possible integrals. Let’s look at a diagonal element first: I_{xx} = \int\ dV \rho (y^2 + z^2) \\ = \rho \int_{-b/2}^{b/2} dx \int_{-b/2}^{b/2} dy \int_{-b/2}^{b/2} dz (y^2 + z^2) \\ = \rho b \int_{-b/2}^{b/2} dy \left. (zy^2 + \frac{1}{3} z^3) \right|_{-b/2}^{b/2} \\ = \rho b \int_{-b/2}^{b/2} dy \left( by^2 + \frac{1}{3} \frac{b^3}{4} \right) \\ = \rho b \left. \left( \frac{1}{3} by^3 + \frac{1}{12} b^3 y \right) \right|_{-b/2}^{b/2} \\ = \rho b \left( \frac{1}{12} b^4 + \frac{1}{12} b^4 \right) \\ = \frac{1}{6} \rho b^5 = \frac{1}{6} M b^2. On the other hand, all the off-diagonal moments are zero, for example I_{xy} = \int\ dV \rho (-xy). This is an odd function of x and y, and our integration is now symmetric about the origin in all directions, so it vanishes identically. So the inertia tensor of the cube about its center is \overset{\leftrightarrow}{I} = \frac{1}{6} Mb^2 \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right).

12.5 The parallel axis theorem

Let’s consider two coordinate systems: the first \vec{r} = (x,y,z) with origin at the center of mass of an arbitrary extended object, and the second \vec{R} = (X,Y,Z) offset by some distance. As described in \vec{R}, the object is displaced from the origin (but not rotated!) by some constant vector \vec{a}.

As the name implies, the axes of the two coordinate systems are parallel - \hat{x} and \hat{X} point in the same direction, and the same for y and z. We’re just displacing the origin. In vector form, the coordinates are related as \vec{R} = \vec{a} + \vec{r}. Note that \vec{a} points towards the center of mass - the direction is important! It’s a straightforward exercise in algebra to write down the inertia tensor in one coordinate system and translate it to the other coordinates: just write R_i = r_i + a_i and recognize that anything which looks like \sum_{\alpha} m_\alpha \vec{r}_\alpha is zero, since it’s the position of the CM in the CM coordinates.

I’ll skip the algebra here, and just give you the answer. If I_{ij} is the inertia tensor calculated in CM coordinates, and J_{ij} is the tensor in the displaced coordinates, then: J_{ij} = I_{ij} + M(a^2 \delta_{ij} - a_i a_j). Let’s try on the cube! Recall that the CM inertia tensor was \overset{\leftrightarrow}{I} = Mb^2 \left( \begin{array}{ccc} 1/6 & 0 & 0 \\ 0 & 1/6 & 0 \\ 0 & 0 & 1/6 \end{array} \right) If instead we want the tensor about one corner of the cube, the displacement vector is \vec{a} = (b/2, b/2, b/2), so a^2 = (3/4) b^2. We can construct the difference as a matrix: the on-diagonal components are M \left[\frac{3}{4} B^2 - \left(\frac{1}{2} b\right)\left(\frac{1}{2} b\right) \right] = \frac{1}{2} M b^2 and off-diagonal, M \left[-\left(\frac{1}{2} b\right)\left(\frac{1}{2} b\right) \right] = -\frac{1}{4} M b^2 so the shifted inertia tensor is \overset{\leftrightarrow}{J} = Mb^2 \left( \begin{array}{ccc} 1/6 & 0 & 0 \\ 0 & 1/6 & 0 \\ 0 & 0 & 1/6 \end{array} \right) + Mb^2 \left( \begin{array}{ccc} 1/2 & -1/4 & -1/4 \\ -1/4 & 1/2 & -1/4 \\ -1/4 & -1/4 & 1/2 \end{array} \right) \\ = Mb^2 \left( \begin{array}{ccc} 2/3 & -1/4 & -1/4 \\ -1/4 & 2/3 & -1/4 \\ -1/4 & -1/4 & 2/3 \end{array} \right) exactly matching the result from calculating it the direct way (we’ll do this as an example.)

12.6 Shortcuts for inertia tensors

Here we’ll add a couple of simple tricks to our toolkit for calculating and/or estimating moments of inertia for certain types of objects.

12.6.1 Compound objects

One other important aside to note: moments of inertia for a compound object add together. This is easy to see from the definition: suppose we have a compound object that we can write as the sum of two densities, \rho = \rho_a + \rho_b. Then the total moment of inertia is I_{ij} = \int dV [\rho_a + \rho_b] (r^2 \delta_{ij} - r_i r_j) = \int dV \rho_a (...) + \int dV \rho_b (...). So if you can split something into a small number of simple shapes, you can still easily compute its inertia tensor. The canonical example is the planet Saturn:

This trick works just as well with negative densities, too. For example, a hollow sphere can be treated by finding the inertia tensor for a large sphere, and subtracting the inertia tensor of a smaller sphere from it.

12.6.2 Exchange symmetry

Consider the general formula for the inertia tensor: I_{ij} = \int\ dV\ \rho(x,y,z) (r^2 \delta_{ij} - r_i r_j) For an object with uniform density, the various components are related by swapping the coordinate labels (x,y,z) with each other. For example, I_{xx} = r^2 - x^2 = y^2 + z^2, while I_{yy} = x^2 + z^2 - exchanging x with y takes one to the other.

The implication is that if an object looks the same along different directions, then some of its moments of inertia will be equal. The extreme example is a (constant density) sphere rotating about its geometric center:

Sphere rotating about its geometric center.

Clearly, for the sphere we can exchange any of the axis labels (x,y,z) with each other and the result will be the same. The result is that all of the sphere’s moments of inertia are equal: I_{xx} = I_{yy} = I_{zz}. (Thus, unsurprisingly, the sphere rotating about its center is a spherical top.)

Sphere rotating about a point on its edge.

What about the inertia tensor for rotation about the a point on the edge of the sphere, instead? Now there is a clear difference between the z-direction and the other two. However, the x and y directions are clearly symmetric - we can swap one for the other and get the same geometry back. Thus, we have I_{xx} = I_{yy}, but I_{zz} will be different. For this sort of rotation, a sphere is only a symmetric top.

We’ve only discussed diagonal moments of inertia; in this case the off-diagonal moments are all zero due to a different symmetry, reflection (which we’ll talk about next.)

Because we can think of exchanging two coordinate axes with each other can be done with a rotation by \pi/2, these equalities can be thought of as arising from rotational symmetry of our object. We assumed constant density, but as long as the density of the object is also rotationally symmetric in this way, all of the observations we’ve made here still apply. However, exchange symmetry is a little more general; there are plenty of objects that are not invariant under an arbitrary rotation, but are under a specific rotation by \pi/2. (A cube is a good example.)

12.6.3 Reflection symmetry

Even for relatively complicated objects, when we calculate the inertia tensor about certain points, we will find important simplifications. Consider the general formula for the inertia tensor: I_{ij} = \int\ dV\ \rho(x,y,z) (r^2 \delta_{ij} - r_i r_j) Let’s focus specifically on an off-diagonal component, say I_{xy}: I_{xy} = - \int\ dV\ \rho(x,y,z)\ xy. Suppose that our object has a reflection symmetry, which means that \rho(-x,y,z) = \rho(x,y,z) and our integral is symmetric from -L_x to +L_x. We can split into two equal and opposite integrals, from -L_x to 0 and 0 to +L_x - opposite because the x inside will change sign. Thus, reflection symmetry gives us I_{xy} = 0 - no matter how complicated the object’s geometry or density are!

Now, there are three off-diagonal moments; reflection symmetry in one direction will get rid of two of them (it should be obvious that I_{xz} = 0 as well in our example above.) If our object has reflection symmetry about any two axes, then all three off-diagonal moments will vanish.

Let’s try this with a relatively complicated object that you don’t want to directly calculate \mathbf{I} for: a tennis racket.

There is one obvious reflection symmetry from the diagram: reflection in the x-direction. This will eliminate I_{xy} and I_{xz} immediately. Although it’s less obvious from the diagram, the tennis racket has the same shape along the z-axis, which means it is also symmetric under reflection in z. This eliminates the remaining off-diagonal moment I_{yz}, leaving a diagonal inertia tensor:

\mathbf{I}_{\rm racket} = \left(\begin{array}{ccc} I_{1} & 0 & 0 \\ ...&I_{2}&0 \\ ...&...&I_{3} \end{array} \right)

The racket is definitely not symmetric along the y-axis, of course, but the point is that it doesn’t matter - we only need two out of three reflection symmetries to eliminate all off-diagonal moments (and skip having to calculate them explicitly, or work out eigenvectors.)

Once we know that the inertia tensor is diagonal, as we discussed last time, the size of the moments of inertia will be proportional to the cross-sectional area of the object along each axis:

Principal axes for a tennis racket, an example of an asymmetric top.

Clearly we have I_{zz} > I_{xx} > I_{yy}, so the racket is an asymmetric top as expected. Moreover, we can see that motion around the x-axis will be unstable and exhibit tumbling.

12.7 Laminar objects

An interesting class of objects are laminar objects, which are objects with almost no thickness in one direction, for example a flat and very thin sheet of metal. All of our formulas so far have been three-dimensional, so it’s not so obvious what to do when one dimension nearly vanishes.

Suppose we take the z-direction to be the one along which our object is very thin. Let’s think about the total mass of the object first: M = \int dV \rho(x,y,z) = \int_{-\epsilon}^{\epsilon} dz \int dA \rho(x,y,z), where dA = dx dy. Suppose that we do the area integral first, which leaves us with M = \int_{-\epsilon}^{\epsilon} dz\ f(z). Whatever the function f(z) looks like, we know that the total mass M has to be finite as \epsilon \rightarrow 0. But this means that as we make \epsilon smaller and smaller, more of the area under the function f(z) has to be accumulating in a tiny region around z=0:

The limit of this squeezing as \epsilon \rightarrow 0 actually gives us a special function known as the Dirac delta function, \delta(z).

Dirac delta function

The Dirac delta function \delta(z) is equal to zero everywhere except z=0, where it’s infinite, and obeys the sifting property \int_{-\infty}^\infty dz\ f(z) \delta(z-a) = f(a).

This may not seem perfectly well-defined to you (and technically, it’s not even a function but a “distribution”), but as long as it’s inside an integral, we will find sensible results. You can think of the delta function as an object which picks out a certain value of a function from underneath an integral. We have to be integrating over the region with the delta-function spike for this to work; if we integrate from -1 to 1 over \delta(z-2), for example, then we’ll just get zero.

We can thus write the density of our infinitesimally thin laminar object as \rho(x,y,z) = \sigma(x,y) \delta(z), assuming the pivot point is at z=0; this guarantees that we get a finite total mass, no matter what \sigma(x,y) looks like. Here \sigma is standard notation for an area mass density, with units of mass/length {}^2.

Notice, by the way, that the delta functions all have inverse units of whatever is inside them, in this case length. You can see this by noticing that \int dx\ \delta(x) = 1, where the units on the left-hand side are length (dx) and 1/length (\delta(x)).

There’s one immediate simplification that happens for a laminar object: any off-diagonal moments involving the direction perpendicular to the flat surface are zero. For example, in our example using the z-axis, we see that I_{xz} = \int\ dV\ \sigma(x,y) \delta(z)\ xz and the z-integral just gives us zero. Similarly, I_{yz} = 0. However, we’re not guaranteed that we have a symmetric top: as long as the area density \sigma(x,y) is not constant, in general we will find three different diagonal moments of inertia. The one general statement we can make is that the axis perpendicular to the plane of the lamina is always a principal axis, because of the vanishing off-diagonal components.

12.7.1 Example: inertia tensor of a triangular plate

Let’s go through the example of studying a simple laminar object. We take a flat, 45^{\circ} triangular plate of uniform density and total mass M. The density is thus \sigma = M/A = \frac{2M}{a^2}.

We take the z-axis to be perpendicular to the triangle, so \rho = \sigma \delta(z). Since this is a laminar object, we know immediately that I_{xz} = I_{yz} = 0. We can further see that there is symmetry under the exchange x \leftrightarrow y, so we expect I_{xx} = I_{yy}. That leaves us with three integrals to do, using our general formula, I_{ij} = \int\ dV\ \rho(x,y,z) (r^2 \delta_{ij} - r_i r_j). Let’s go through them. To set them up, note that the hypotenuse of the triangle corresponds to the line y = a-x: I_{xx} = \int dV \rho(x,y,z) (y^2 + z^2) \\ = \int_0^a dx \int_0^{a-x} dy \int dz \frac{2M}{a^2} \delta(z) (y^2 + z^2) \\ = \frac{2M}{a^2} \int_0^a dx \left. \frac{1}{3} y^3 \right|_0^{a-x} \\ = \frac{2M}{3a^2} \int_0^a dx (a-x)^3 \\ = \frac{2M}{3a^2} \int_a^0 (-du) u^3 \\ = \frac{2M}{3a^2} \frac{1}{4} a^4 = \frac{1}{6} Ma^2, where I substituted u = a-x to make the last integral easy. Skipping some of the details, for the other two integrals we find I_{zz} = \frac{2M}{a^2} \int_0^a dx \int_0^{a-x} dy\ (x^2 + y^2) \\ = \frac{2M}{a^2} \int_0^a dx \left( x^2 (a-x) + \frac{1}{3} (a-x)^3 \right) = \frac{1}{3} Ma^2 and I_{xy} = \frac{2M}{a^2} \int_0^a dx \int_0^{a-x} dy\ (-xy) \\ = -\frac{2M}{a^2} \int_0^a dx \frac{1}{2} x(a-x)^2 = - \frac{1}{12} Ma^2. Putting everything together, the inertia tensor takes the form \mathbf{I} = \frac{1}{12} Ma^2 \left( \begin{array}{ccc} 2 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 4 \end{array}\right).

Let’s keep going and find the principal axes and principal moments. We know before we start that the z-axis is principal, since there are no off-diagonal z components. In terms of the eigenvalue problem, this means that the determinant will factorize nicely: \det (\mathbf{I} - \lambda \mathbf{1}) \propto \left| \begin{array}{ccc} 2-\lambda & -1 & 0 \\ -1 & 2-\lambda & 0 \\ 0 & 0 & 4-\lambda \end{array}\right| = 0 \\ (4-\lambda) ((2-\lambda)^2 - 1) = 0 (the \propto symbol means “proportional to”, and is a reminder to put back in the prefactor Ma^2 / 12 at the end.) So one solution is \lambda = 4, the other two come from the quadratic equation \lambda^2 - 4\lambda + 3 = 0, which gives \lambda = 1,3. Thus, putting the prefactor back in, the principal moments are \lambda_i = \left( \frac{Ma^2}{12}, \frac{Ma^2}{4}, \frac{Ma^2}{3} \right). Finally, to find the principal axes we solve (\mathbf{I} - \lambda_i \mathbf{1}) \vec{v}_i = 0. Let’s start with the smallest moment \lambda_1. Since we’re just looking for a direction, now we can ignore the prefactor entirely: \left( \begin{array}{ccc} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 3 \end{array} \right) \left( \begin{array}{c} v_x \\ v_y \\ v_z \end{array} \right) = 0 We can read off v_z = 0 and v_x = v_y. Thus, normalizing to get a unit vector, we find the first principal axis corresponding to \lambda_1 = 1 is \hat{e}_1 = \frac{1}{\sqrt{2}} \left( \begin{array}{c} 1 \\ 1 \\ 0 \end{array} \right). I leave it as an exercise for you to show that the second axis is \hat{e}_2 = \frac{1}{\sqrt{2}} \left( \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \right), and the third is just the \hat{z} axis, i.e. \hat{e}_3 = (0,0,1).

We could have guessed these principal axes from the sketch, in fact! Notice that there is a reflection symmetry about the \vec{v}_1 direction, which means that if we take \vec{v}_1 and \hat{z} as axes, we must have a diagonal inertia matrix. (The third direction \vec{v}_2 is then required by orthogonality of the three principal axes.)

12.8 Matrix algebra: a primer

Matrices are everywhere in physics, much like vectors. In fact, matrices show up so often exactly because vectors are so important in the description of physical laws. The product of a matrix with a vector is another vector, \vec{v'} = \overset{\leftrightarrow}{O} \vec{v}, so you can think of a matrix as an operator, acting on the space of vectors v: it’s a machine that takes a vector and input and gives you another one as output.

For \vec{v} and \vec{v'} in an n-dimensional vector space (for our purposes, usually the familiar three-dimensional space), the matrix \overset{\leftrightarrow}{O} is an n \times n square matrix.

Simple example to see how this works! Consider a plane with a mirror across it at a 45^\circ angle, i.e. sitting on the line y=-x. If we draw a vector in the top right, there will be a mirror image in the bottom left. The mirror image is a transformed version of the original vector.

What does the mirror image look like? The image vector should have the same length and make the same angle with the mirror. If we look at the unit vectors, then we see that \hat{x} \rightarrow -\hat{y}, \\ \hat{y} \rightarrow -\hat{x}. Mirror reflection of unit vectors.

What about an arbitrary vector \vec{v}? Well, it turns out that we’re already done, because \overset{\leftrightarrow}{M} is a special type of operator called a linear operator. Linear operators satisfy the following two conditions:

Applying the operator to a vector rescaled by a constant, c\vec{v}, gives us back the operator applied to the original vector, and then rescaled: \overset{\leftrightarrow}{O} (c \vec{v}) = c \left(\overset{\leftrightarrow}{O} \vec{v} \right);
Applying the operator to a sum of two vectors \vec{v_1} + \vec{v_2} is equivalent to applying it to both vectors and then adding them: \overset{\leftrightarrow}{O} (\vec{v_1} + \vec{v_2}) = \overset{\leftrightarrow}{O} \vec{v_1} + \overset{\leftrightarrow}{O} \vec{v_2}. In fact, any operator which is described by a matrix automatically has these properties. (Non-linear operators do exist, but are rare, and we certainly won’t see any this semester.)

Thanks to linearity, we thus know how \overset{\leftrightarrow}{M} acts on any vector: it takes (v_x, v_y) into (-v_y, -v_x). We can thus write the operation of reflecting about the line y=-x as, \overset{\leftrightarrow}{M} = \left( \begin{array}{cc} 0&-1\\-1&0\end{array} \right). For example, the mirror image of the vector (1,2) is \left( \begin{array}{cc} 0&-1\\-1&0\end{array} \right) \left( \begin{array}{c} 1\\2 \end{array} \right) = \left( \begin{array}{c} -2\\-1\end{array} \right) Matrices show up very often in physics, especially as operators. Another example is rotation: rotating a vector about the origin by an angle \alpha can be written as a rotation matrix. We can easily find the matrix form once again by looking at what happens to the unit vectors:

Rotation of unit vectors. \hat{x} \rightarrow \cos \alpha \hat{x} + \sin \alpha \hat{y} \\ \hat{y} \rightarrow -\sin \alpha \hat{x} + \cos \alpha \hat{y} so we can immediately write down \overset{\leftrightarrow}{R}(\alpha) = \left( \begin{array}{cc} \cos \alpha&\sin \alpha\\ -\sin \alpha & \cos \alpha \end{array} \right). Other examples involve more complicated relations, and can change units for example. For example, we know that for rotation of an object about a single axis, its angular momentum L is L = I \omega, where I is the moment of inertia about that axis, and \omega is the angular velocity. But what if we can’t just work with a single fixed axis? If you throw a book in the air, it will tumble in a complicated way even though no forces are acting on it in mid-air; since \vec{L} is fixed, but apparently \vec{\omega} is not, the only explanation is that I is really an operator, the inertia operator, and the more general equation is \vec{L} = \overset{\leftrightarrow}{I} \vec{\omega}. We will explore this equation in detail when we come to the rotation of rigid bodies shortly.

12.8.1 Some matrix basics

The components of a matrix are written out with respect to a basis, which is just a set of unit vectors which can be combined to reach any point in the n-dimensional space. Remember that if you change coordinates, any matrix quantities change too!

I will drop the arrow notation for now, since we’re going to be writing a lot of matrices down and it shouldn’t be confusing in context. In linear algebra, matrices are usually denoted by capital letters A, B, C, ..., while vectors and scalars are lower-case letters.

Some basic rules of (square) matrix manipulation, which I’ll go through fast because you should know them:

Matrix equality

Two matrices are equal if and only if they have the same dimensions, and every component is equal. So \left( \begin{array}{cc} x&r\\ y&s \end{array} \right) = \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) implies the four equations x = 2, y = 3, r = 1, s = -5. This is foreshadowing the other use of matrices which we will encounter, which is to compactly write a system of equations as a single matrix equation. For example, if we have two masses connected by springs to each other and to a wall on either side, the equations of motion are m\ddot{x_1} = -(k_1 + k_2) x_1 + k_2 x_2 \\ m\ddot{x_2} = k_2 x_1 - (k_2 + k_3) x_2 which can be written in in the much more compact form m \left( \begin{array}{c} \ddot{x_1} \\ \ddot{x_2} \end{array} \right) = - \left( \begin{array}{cc} k_1 + k_2 & -k_2 \\ -k_2 & k_2 + k_3 \end{array} \right) \left( \begin{array}{c} x_1 \\ x_2 \end{array} \right) or simply \ddot{\vec{x}} = -\frac{1}{m} \overset{\leftrightarrow}{K} \vec{x}.

The application of matrix algebra will allow us to quickly and generally find solutions to these systems.

Index notation

You know that we can write the components of a vector in terms of an index, which iterates over the n coordinates in our vector space: for a Cartesian vector, we write v_i, where i = 1,2,3 (or x,y,z.)

We can write the components of a matrix out in exactly the same way, but for a matrix we need a pair of indices: one to specify the row, one for the column. By convention, the first index is the row index. A = \left( \begin{array}{ccc} 2&1&6 \\ 3&-5&0 \\ 1&-4&9 \end{array} \right) \Rightarrow A_{21} = 3, A_{13} = 6, A_{33} = 9, ... 2. Multiplying by a number

To multiply a matrix by a number c, we just multiply every component by c. 2 \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) = \left( \begin{array}{cc} 4&2 \\ 6&-10\end{array} \right) 3. Addition of matrices

Matrices add (or subtract) component by component, just like vectors do. \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) + \left( \begin{array}{cc} 1&0 \\ -2&3\end{array} \right) = \left( \begin{array}{cc} 3&1 \\ 1&-2\end{array} \right). 4. Matrix-vector multiplication

When a vector is to the right of a matrix, we write it as a column and multiply by the rows of the matrix, \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) \left( \begin{array}{c} 2\\ -1\end{array} \right) = \left( \begin{array}{c} 2(2) + 1(-1) \\ 2(3) + (-1)(-5) \end{array} \right) = \left( \begin{array}{c} 3 \\ 11 \end{array} \right). On the other hand, a vector to the left of a matrix is written as a row and multiplies the columns: \left(2\ \ \ -1\right) \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) = \left(2(2) + (-1)(3)\ \ \ 2(1) + (-1)(-5) \right) \\ = \left(1\ \ \ 7\right). Notice that the answer depends on which side the vector is on! In index notation, we can expand out: if \vec{w} = A\vec{v}, then w_i = \sum_j A_{ij} v_j while for left multiplication \vec{w}'' = \vec{v} A, w_i' = \sum_j v_j A_{ji}. 5. Matrix-matrix multiplication

The rule for multiplying two matrices together is a generalization of the above: we multiply the columns of the right-hand matrix by the rows of the left-hand matrix, one by one. If C = AB, then the element C_{ij} in row i, column j of the product matrix C is given by row i of matrix A, times column j of matrix B, i.e. C_{ij} = \sum_{k} A_{ik} B_{kj}. If you take the columns of B from left to right and multiply by rows of A, you will build up C column by column, starting at the left. \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) \left( \begin{array}{cc} 3&-1 \\ -4&2\end{array} \right) = \left( \begin{array}{cc} 2&0 \\ 29&-13\end{array} \right). Once again, order of operations matters: AB \neq BA! In fact, we define an object called the commutator as the difference, [A,B] \equiv AB - BA. The opposite of this, which you’ll run into less frequently, is the anti-commutator \{A, B\} \equiv AB + BA. We can now also define a power of a matrix: A^n is just the matrix A multiplied by itself n times. \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right)^2 = \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) = \left( \begin{array}{cc} 7&-3 \\ -9&28\end{array} \right) 6. Identity and zero matrices

The identity matrix, 1, has all 1’s on the diagonal, and 0 elsewhere; the zero matrix, 0, has all zero entires. You’ll sometimes see the identity in n dimensions written as 1_n. 1_3 = \left( \begin{array}{ccc} 1&0&0\\ 0&1&0\\ 0&0&1\end{array} \right) Adding the zero matrix, or multiplying by the identity matrix, just gives you back the same matrix. A + 0 = A \\ 1A = A1 = A 7. Trace and determinant

Two operations which are special to matrices are the trace and the determinant. The trace of a matrix is just the sum of all the diagonal elements: \textrm{tr}(A) = \sum_i A_{ii}. An example: \textrm{tr} \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right) = 2-5 = -3. The trace is distributive over sums of matrices, i.e. \textrm{tr} (A+B) = \textrm{tr} (A) + \textrm{tr} (B), Trace is also invariant under cyclic permutation, i.e. \textrm{tr} (ABC) = \textrm{tr} (BCA) = \textrm{tr} (CAB). This is easy to see if you write everthing out in terms of indices, for instance \textrm{tr} (AB) = \sum_i \sum_j A_{ij} B_{ji} = \sum_i \sum_j B_{ji} A_{ij}. The determinant is a very useful function of a matrix, but is a little harder to define. For any two-by-two matrix, the determinant (which is written with straight lines) is equal to \det \left( \begin{array}{cc} a&b\\c&d \end{array} \right) = \left| \begin{array}{cc} a&b\\c&d \end{array} \right| = ad - bc. There are a few ways to calculate determinants of larger matrices, but the easiest to remember is the technique of expansion in minors. The algorithm is as follows:

Pick any single row or column of the matrix - usually the top row, but you will get the same answer no matter what!
For each element of that row or column, the corresponding minor is equal to the determinant of a smaller matrix, namely the original matrix with the row and column of our chosen element removed.
The determinant is equal to the sum of elements times minors, including an additional minus sign whenever the sum of row and column indices of the element are odd. (This last requirement means that we add or subtract element times minor according to a checkerboard pattern, e.g. for a 4x4 matrix:) \left| \begin{array}{cccc} +&-&+&- \\ -&+&-&+ \\ +&-&+&- \\ -&+&-&+ \end{array} \right|. Let’s do an example! \left| \begin{array}{ccc} -2&1&6 \\ 3&-5&0 \\ 1&-4&9 \end{array} \right| = -2 \left| \begin{array}{cc} -5&0\\ -4&9\end{array} \right| - 1 \left| \begin{array}{cc} 3&0\\ 1&9\end{array} \right| + 6 \left| \begin{array}{cc} 3&-5\\ 1&-4\end{array} \right| \\ = -2 (-45 - 0) -1 (27 - 0) + 6 (-12+5) = 21. This is easy to do by hand for 3x3 matrices, and reasonable for 4x4; employing something like Mathematica is probably a better option if you run into larger matrices than that.

One very useful fact: the determinant is distributive over products of matrices, \det (AB) = \det A \det B = \det(BA). 8. Matrix inverse

The inverse of a matrix A is the matrix which when multiplied by A gives the identity matrix, A A^{-1} = A^{-1} A = \mathbf{1}. The inverse of a product of matrices is equal to the product of the inverses, but in reverse order: (ABCD)^{-1} = D^{-1} C^{-1} B^{-1} A^{-1}. You can see this easily by applying those four matrices on the left or right in turn to the product ABCD.

Not every matrix has an inverse! If A^{-1} does exist, then A is said to be invertible; if there is no A^{-1}, then A is said to be singular. The determinant provides a very useful test for singular matrices! We can prove this quickly: take the determinant of both sides of the definition above, \det(A A^{-1}) = \det(\mathbf{1}) \\ \det(A) \det(A^{-1}) = 1 which tells us that \det(A^{-1}) = \frac{1}{\det(A)}. But this relation is only possible if \det(A) \neq 0! So any matrix with \det(A) = 0 must be singular. This doesn’t prove that every square matrix with non-zero determinant is invertible, of course, but that happens to be true as well.

Transpose and orthogonality

The transpose of a matrix (denoted A^T) is obtained by swapping the rows and columns, or equivalently, reflecting all of the entries across the diagonal. \left( \begin{array}{cc} 2&1 \\ 3&-5\end{array} \right)^T = \left( \begin{array}{cc} 2&3 \\ 1&-5\end{array} \right). Like the inverse, the transpose acts on a product of matrices by giving the same product in reverse order, (AB)^T = B^T A^T. If a matrix is equal to its own transpose, A = A^T, then it is said to be a symmetric matrix. On the other hand, if the transpose of a matrix is also its inverse, A^T = A^{-1}, then it is said to be an orthogonal matrix. Orthogonal matrices have a number of special properties, and they occur often in physics applications. In fact, you can check that the reflection matrix and the rotation matrix we considered above are both orthogonal matrices.

Orthogonal matrices have a number of special properties, but one of the most important is that they are length-preserving: an operator described by an orthogonal matrix doesn’t change the length of any vector.

12.8.2 Eigenvalues and eigenvectors

Let’s go back to the mirror operation from above. Are there any vectors which the reflection operation leaves unchanged?

The mirror reflection operation can be represented as the matrix \mathbf{M} = \left( \begin{array}{cc} 0&-1\\-1&0\end{array} \right). We can identify the two vectors \vec{v}_1 = \hat{x} - \hat{y} and \vec{v}_2 = \hat{x} + \hat{y} which are mapped to themselves times a constant, i.e. they represent solutions to the equation M \vec{v_i} = \lambda_i \vec{v_i}. The vectors solving this equation, the ones which the matrix doesn’t change the direction of, are called eigenvectors of M. The rescaling factor \lambda_i associated with each eigenvector is called its eigenvalue. So the reflection matrix has two eigenvalues: \lambda_1 = +1, \lambda_2 = -1.

What is the point of finding the directions which our reflection operation leaves unchanged? Notice that for any vector in the space, we could have split it into components along the two vectors \vec{v}_1, \vec{v}_2. The reflection operation is then particularly simple: we do nothing to the \vec{v}_1 component, and multiply the \vec{v}_2 component by -1. In other words, in a coordinate system defined by the eigenvectors, the matrix is diagonal, and its entries are the associated eigenvalues.

Often in physics, you have seen that picking a “clever” set of coordinates for a particular problem allows you to easily solve for the motion; for example, on the homework you solved for the motion of two masses connected by a spring, and found that changing coordinates to the center of mass and the separation distance of the two masses made the problem much simpler. The great power of the eigenvector and eigenvalue approach is that it doesn’t require a clever guess; we will now have an algorithm to decide what the physically important directions are, and to tell us how to change coordinates accordingly!

For the reflection matrix, it was easy to see what the eigenvectors were. But how do we solve for them more generally? Remember that we’re solving the equation A \vec{v} = \lambda \vec{v} for some matrix A. We can rewrite the right-hand side also as a matrix-vector product by using the identity: \lambda \vec{v} = \lambda I \vec{v}. Then subtracting, we have (A - \lambda I) \vec{v} = 0. This equation is always true if \vec{v} = 0, but this is the trivial solution - we’re interested in solutions involving non-zero vectors. It is a fundamental result of linear algebra (which I won’t prove) that the equation M \vec{v} = 0 has non-trivial solutions if, and only if, \det(M) = 0. This means that if \lambda is an eigenvalue of A, then it must satisfy the equation \det (A - \lambda I) = 0. If A is an nxn matrix, this equation will be an n-th order polynomial in \lambda, known as the characteristic polynomial; its solutions are the eigenvalues of A.

Let’s do an example! Suppose we’re given the matrix A = \left( \begin{array}{cc} 5&-2 \\ -2&2 \end{array} \right). The characteristic polynomial is then |A - \lambda I| = \left| \begin{array}{cc} 5-\lambda&-2 \\ -2&2-\lambda \end{array} \right| \\ = (5-\lambda) (2-\lambda) - 4. Solving for the roots: (5-\lambda) (2-\lambda) - 4 = 0 \\ \lambda^2 - 7 \lambda + 10 - 4 = 0 \\ \lambda = \frac{1}{2} (7 \pm \sqrt{49 - 24}) \\ = \frac{1}{2} (7 \pm 5) \\ = 1, 6. Once we have the eigenvalues, it’s straightforward to find the eigenvectors: just plug in to the equation A \vec{v} = \lambda \vec{v} and solve. Let’s try it! First, \lambda_1 = 1: \left( \begin{array}{cc} 5&-2 \\ -2&2\end{array} \right) \left( \begin{array}{c} x_1 \\ y_1 \end{array} \right) = 1 \left( \begin{array}{c} x_1 \\ y_1 \end{array} \right) \\ \left( \begin{array}{c} 5x_1- 2y_1 \\ -2x_1 + 2y_1 \end{array} \right) = \left( \begin{array}{c} x_1 \\ y_1 \end{array} \right) \\ \left( \begin{array}{c} 4x_1 - 2y_1 \\ -2x_1 + y_1 \end{array} \right) = \left( \begin{array}{c} 0 \\ 0 \end{array} \right) Both of these equations are satisfied so long as 2x_1 = y_1. This ambiguity shouldn’t surprise you; since the condition on an eigenvector is just that its direction doesn’t change when the matrix is applied, the length of the vector is arbitrary. It is usually convenient to normalize the eigenvectors as unit vectors, so let’s do that: in addition to 2x_1 = y_1, we have x_1^2 + y_1^2 = 5x_1^2 = 1, so the unit eigenvector is \hat{v_1} = \frac{1}{\sqrt{5}} \left( \begin{array}{c} 1 \\ 2 \end{array} \right). We can do the same exercise for the other eigenvalue \lambda_2 = 6, and we will find \hat{v_2} = \frac{1}{\sqrt{5}} \left( \begin{array}{c} -2 \\ 1 \end{array} \right). Notice that our eigenvectors are orthogonal to each other; that is, \hat{v_1} \cdot \hat{v_2} = 0. This is not a coincidence; it is a consequence of the fact that our original matrix A was symmetric. Any real, symmetric matrix has only real (non-complex) eigenvalues, and orthogonal eigenvectors. (You can use this as a trick to find the last eigenvector, if you’ve already found the first n-1 for an n \times n matrix.)

We haven’t talked very much about changes of basis and matrices, but I will just state without proof that for a symmetric matrix, as long as we have n independent eigenvectors, then we can always change coordinates to use the unit eigenvectors as our basis vectors, and in that basis the matrix becomes diagonal: A \rightarrow \left( \begin{array}{cccc} \lambda_1&&&\\ &\lambda_2&& \\ &&...& \\ &&&\lambda_n \end{array} \right) Note that the order of the eigenvalues matters, and corresponds to the order of the eigenvectors.

Some facts satisfied by the determinant and trace: the determinant of a matrix is equal to the product of its eigenvalues, and the trace is equal to the sum. \det A = \lambda_1 \lambda_2 ... \lambda_n \\ \textrm{tr} A = \lambda_1 + \lambda_2 + ... + \lambda_n. We haven’t talked very much about changes of basis, but these both follow from the fact that changing to the basis specified by the eigenvectors is a “similarity transformation”, which leaves the eigenvalues unchanged. It’s easy to see that in the basis of the eigenvectors, where A becomes diagonal, that the trace and determinant are exactly given by these formulas.

There are two special cases that can happen as we go through this procedure. The first is the occurrence of degenerate eigenvalues, that is, the same eigenvalue appearing more than once as a solution to the characteristic polynomial. For example, the matrix \left(\begin{array}{ccc} 0&1&1\\ 1&0&1\\ 1&1&0 \end{array} \right) has characteristic polynomial -\lambda^3 + 2 + 3\lambda = -(\lambda - 2) (\lambda+1)^2 so there is a pair of -1 eigenvalues, and a single 2. If we try to solve for the -1 eigenvector, we will find the equation \left(\begin{array}{ccc} 1&1&1\\ 1&1&1\\ 1&1&1 \end{array} \right) \left( \begin{array}{c} x \\ y \\ z \end{array} \right) = 0. This is one equation for three unknowns; the equation defines a plane in our three-dimensional space. So the eigenspace associated with the paired eigenvalue has dimension 2. We can still construct a set of orthogonal eigenvectors, but we have freedom to choose any pair that lies in the plane defined by the equation x+y+z = 0.

12.9 Principal axes and principal moments

So far, the only objects we have seen with diagonal inertia tensors have been highly symmetric — a cube about its center, for example. For a less symmetric object, or even a cube about its corner, \mathbf{I} has off-diagonal entries, which means \vec{L} and \vec{\omega} are not in general parallel. But we can ask: for any given object, is there a clever choice of coordinates in which \mathbf{I} becomes diagonal? The answer turns out to be yes, and the special directions that accomplish this are the principal axes of the body.

Although in general the inertia matrix \overset{\leftrightarrow}{I} can give \vec{L} oriented in a different direction from \vec{\omega}, in some cases we have seen that the two vectors do point in the same direction. If we can identify that \vec{L} = \lambda \vec{\omega}, for some constant \lambda and some choice of \vec{\omega}, then we say that \vec{\omega} points along a principal axis of the corresponding rigid body, and the factor \lambda is said to be the corresponding principal moment.

So the eigenvectors of \overset{\leftrightarrow}{I} for any object point in a direction along which \vec{L} is proportional to \vec{\omega}. In coordinates described by the eigenvectors, then, the inertia matrix is diagonal!

Because the inertia tensor is a real, symmetric matrix, there will always be three perpendicular principal axes that we can find. This is worth stating clearly: for any rigid body and any point O, there are three mutually perpendicular principal axes passing through O, such that the inertia tensor of the body along these axes is diagonal.

The principal axes automatically give us a “clever” choice of coordinates for the inertia tensor: when we use the principal axes as our coordinate directions, we always have a diagonal inertia tensor, and the equation of motion can be written in a simplified form.

12.9.1 Example: principal axes for a cube rotating about a corner

Let’s have a look at finding the principal moments and principal axes. We’ll go back to the cube rotating about its corner, which we found to have inertia tensor equal to \overset\leftrightarrow I = Mb^2 \left( \begin{array}{ccc} \frac{2}{3} & -\frac{1}{4} & -\frac{1}{4} \\ -\frac{1}{4} & \frac{2}{3} & -\frac{1}{4} \\ -\frac{1}{4} & -\frac{1}{4} & \frac{2}{3} \end{array} \right) = \mu \left(\begin{array}{ccc} 8 & -3 & -3 \\ -3 & 8 & -3 \\ -3 & -3 & 8 \end{array} \right) where I’ve introduced \mu \equiv Mb^2 / 12 to save some space. To find the principal axes, we must start by finding the eigenvalues of this matrix, which are solutions to the characteristic equation \det(\overset{\leftrightarrow}{I} - \lambda \mathbf{1}) = 0. where \mathbf{1} is the identity matrix (to avoid using the notation I twice.) In other words, \left| \begin{array}{ccc} 8\mu - \lambda & -3\mu & -3\mu \\ -3\mu & 8\mu - \lambda & -3\mu \\ -3\mu & -3\mu & 8\mu - \lambda \end{array} \right| = 0. The book says this determinant is “straightforward to evaluate”; that only happens to be true if you think solving cubic equations by hand is straightforward. We can do it by hand, or with Mathematica; either way, we will arrive at the factorized characteristic polynomial: (2\mu - \lambda) (11\mu - \lambda)^2 = 0.

Solving the cubic by hand

This isn’t a foolproof way to solve cubics, but it shows a nice way to guess at a solution. Expanding in minors, (8\mu - \lambda) [(8\mu - \lambda)^2 - 9\mu^2] + 3\mu [-3\mu (8\mu - \lambda) - 9\mu^2] \\ - 3\mu [9\mu^2 + 3\mu (8\mu - \lambda)] = 0 \\ Gathering terms, (8\mu - \lambda)^3 - 27\mu^2 (8\mu - \lambda) - 54 \mu^3 = 0 \\ (512 - 216 - 54) \mu^3 + 24 \mu \lambda^2 + (192-27) \mu^2 \lambda - \lambda^3 = 0 \\ 242 \mu^3 + 24 \mu \lambda^2 - 165 \mu^2 \lambda - \lambda^3 = 0. where as you will remember, (x+y)^3 = x^3 + 3x^2 y + 3xy^2 + y^3. We can factorize this by assuming the form (a-\lambda)(b-\lambda)(c-\lambda), which expands out to (a-\lambda)(bc - \lambda (b+c) + \lambda^2) = abc - a(b+c) \lambda \\ + a\lambda^2 -bc \lambda + \lambda^2 (b+c) - \lambda^3 \\ = -\lambda^3 + \lambda^2 (a+b+c) - \lambda (ab + ac + bc) + abc. so for our determinant, a+b+c = 24\mu \\ ab + ac + bc = 165\mu^2 \\ abc = 242 \mu^3. There’s not a really nice way to solve these equations in general, but we can notice from the last equation that the prime factors of 242 are 2, 11, and 11. If you try those values for a, b, c, you will see that they do in fact work: their sum is 24, and the sum of products in the middle is 22 + 22 + 121 = 165.

The eigenvalues are thus \lambda_1 = 2\mu and \lambda_2 = \lambda_3 = 11\mu. To find the eigenvectors, we go back to the eigenvalue equation and plug in each eigenvalue one by one. First, (\overset{\leftrightarrow}{I} - \lambda_1 \mathbf{1}) \vec{v}_1 = \mu \left( \begin{array}{ccc} 6&-3&-3 \\ -3&6&-3 \\ -3&-3&6 \end{array} \right) \left( \begin{array}{c} v_{1,x} \\ v_{1,y} \\ v_{1,z} \end{array} \right) = 0. We can use row reduction to manipulate these equations, but here we’ll just expand out in components. 2 v_{1,x} - v_{1,y} - v_{1,z} = 0 \\ -v_{1,x} + 2v_{1,y} - v_{1,z} = 0 \\ -v_{1,x} - v_{1,y} + 2v_{1,z} = 0. The first two equations combine to give us 3v_{1,x} - 3v_{1,y} = 0, so v_{1,x} = v_{1,y}, and then from the final equation -2v_{1,x} + 2v_{1,z} = 0 so v_{1,x} = v_{1,z} as well. Thus, the first eigenvector points in the direction (1,1,1); rescaling to a unit vector, we have \hat{e}_1 = \frac{1}{\sqrt{3}} (1,1,1). So the principal axis corresponding to the smallest eigenvalue \lambda = 2\mu points along the diagonal of the cube. Notice that we didn’t need the third equation; if we were doing row reduction, we would have found one row reduces to (0,0,0). This is completely as expected! Eigenvectors are only unique up to a constant - their length is undetermined - so we should only have two unique equations in three unknowns.

Doing the same thing for the next eigenvalue, we find the matrix equation: (\overset{\leftrightarrow}{I} - \lambda_1 \mathbf{1}) \vec{v}_2 = \mu \left( \begin{array}{ccc} -3&-3&-3 \\ -3&-3&-3 \\ -3&-3&-3 \end{array} \right) \left( \begin{array}{c} v_{2,x} \\ v_{2,y} \\ v_{2,z} \end{array} \right) = 0. Now we have two redundant equations; the one remaining equation can be rewritten in the form \vec{v}_2 \cdot (1,1,1) = 0, But this is just the vector geometric statement that \vec{v}_2 lies anywhere in the plane perpendicular to the first eigenvector, (1,1,1). Moreover, \vec{v}_3 gives exactly the same equation, since it has the same principal moment. (This is an example of a degeneracy, which is when two eigenvalues of a matrix are equal.)

Although \vec{v}_2 and \vec{v}_3 aren’t uniquely determined, they must both be orthogonal to \vec{v}_1, and they must be orthogonal to each other. Thus, we may choose any pair of vectors satisfying these conditions, for example \hat{e}_2 = \frac{1}{\sqrt{6}} (2,-1,-1) \\ \hat{e}_3 = \frac{1}{\sqrt{2}} (0,1,-1). Any combination of these vectors is also an eigenvector, so this is not a unique choice; physically, this means that it doesn’t matter how we rotate \hat{e}_2 and \hat{e}_3 with each other, our inertia tensor remains diagonal.

With our choice of coordinates complete, the inertia tensor in these (body-frame) coordinates provided by the eigenvectors is now diagonal: \overset{\leftrightarrow}{I} = \frac{1}{12} Mb^2 \left( \begin{array}{ccc} 2 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 11 \end{array} \right). You can think of the cube as standing on its point in these coordinates, with the \hat{e}_1 axis pointing through the diagonal of the cube vertically.

12.10 Categories of rigid bodies

Previously, we noted that for any rigid object we can always find its principal axes by solving for the eigenvectors of its inertia tensor. The corresponding eigenvalues are the principal moments, equal to the diagonal entries of \mathbf{I} in the coordinates of the eigenvectors: \mathbf{I} = \left( \begin{array}{ccc} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{array} \right) It’s useful to categorize objects based on the structure of their principal moments - we’ll be able to make general statements about rotational motion in these categories.

12.10.1 Spherical top

All moments equal, \lambda_1 = \lambda_2 = \lambda_3.

The cube about its CM is a spherical top, as is (obviously) the sphere. Because all three moments are equal, the eigenvectors can be chosen arbitrarily not from a plane, but from all of space; in other words, there are no unique principal axes - any direction is a principal axis.

For a spherical top, Euler’s equations simply reduce to \lambda_i \dot{\omega_i} = 0. This is consistent with the idea that the principal axes aren’t unique, since we already saw that for an object spinning exactly about a principal axis, the motion is absolutely stable; \vec{\omega} doesn’t change.

12.10.2 Symmetric top

Two moments equal, \lambda_1 = \lambda_2 \neq \lambda_3.

When I say the word “top”, this is the sort of object you probably think of. The single different principal moment singles out one special axis, while the other two principal axes can be chosen freely.

Notice that when we’re classifying objects like this, we must choose a pivot point as well. The cube, as we’ve just seen, is a spherical top when pivoting about its CM, but a symmetric top about its corner.

This is a good place to point out that two objects with the same principal moments will spin in exactly the same way! This is manifest in Euler’s equations; the evolution of \vec{\omega} only depends on the three principal moments. So an egg spinning on its tip and a box spinning on its corner could give identical motion (if their masses and proportions are matched.)

In fact, it’s sometimes useful to think about equivalent ellipsoids which will describe the same motion. In particular, the case \lambda_3 > \lambda_1 corresponds to an oblate ellipsoid, which looks squashed compared to the principal \hat{e}_3 axis. The alternative \lambda_3 < \lambda_1 is a prolate ellipsoid, which is stretched in the \hat{e}_3 direction instead.

12.10.3 Linear rotor

A special symmetric top where \lambda_3 \ll \lambda_1, \lambda_2.

This sort of symmetric top is common in describing the energy of molecules, especially certain ones like carbon monoxide or molecular oxygen. The motion essentially reduces to considering the two-dimensional rotation about the special axis picked out by the small eigenvalue.

For example, given a pair of equal masses connected by a rigid rod, the CM is exactly centered between them. If we set up coordinates so that both masses are on the x-axis, then they both have y=z=0, and so not only do all off-diagonal elements of the tensor vanish, but also I_{xx} = m \sum_i (y_i^2 + z_i^2) = 0. In other words, since the masses are point particles, rotation about the axis of the rod (about \hat{x}) isn’t measurable, and carries no momentum. You will encounter the linear rotor again in quantum mechanics, but we won’t say much more about it here.

12.10.4 Asymmetric top

All three moments unequal, \lambda_1 \neq \lambda_2 \neq \lambda_3.

This gives rise to the most complicated motion, but we can make some general statements like what we observed for stability under torque-free motion, namely that with three different principal moments \lambda_1 > \lambda_2 > \lambda_3, free rotation (about the CM, no torque) about axes \hat{e}_1 or \hat{e}_3 will be stable, while rotation about axis \hat{e}_2 is unstable.

Tennis racket theorem

For an asymmetric top with \lambda_1 > \lambda_2 > \lambda_3 undergoing torque-free rotation, the motion is stable about the axes of largest and smallest moment (\hat{e}_1 and \hat{e}_3), but unstable about the intermediate axis \hat{e}_2.

If you’ve ever played tennis, you’ve probably found this result out experimentally by tossing the racket up in the air and trying to catch it again. This was also the result we found for the deck of cards previously.