21  Degenerate-state perturbation theory

21.1 Basic setup and two-state example

Now that we’ve covered the basic formalism for time-independent perturbation theory, let’s tackle the problem of degeneracy. As we have seen, our formulas break down badly in the presence of degenerate unperturbed energy eigenstates. From the formalism we derived last time, we saw that our perturbation theory can be thought of as a series expansion in the ratio \frac{W_{nk}}{E_n^{(0)} - E_k^{(0)}}, where the labels n,k run over all distinct energy eigenstates and n \neq k. If we have an exact degeneracy, that is, if E_n^{(0)} = E_k^{(0)}, then this is obviously just nonsense. But even without an exact degeneracy, even just in the limit where the energies of two states become close together, this is clearly no longer small enough to expand in regardless of the size of our perturbative small parameter \lambda. However, this formula also suggests an obvious solution; we distinguished clearly in our derivation between the diagonal and off-diagonal matrix elements of W. If we can diagonalize W over the space of degenerate kets, then we will be able to proceed with our perturbative expansion as normal for the rest.

Let’s get a better feeling for what is happening by considering a simple two-state example. Suppose we have only two unperturbed states \ket{i^{(0)}} and \ket{j^{(0)}} which are degenerate, both with energy E_0. In the space spanned by these two states, we can write the full Hamiltonian (for these two states) as \hat{H} = \left(\begin{array}{cc} E_0 + \lambda W_{ii} & \lambda W_{ij} \\ \lambda W_{ji} & E_0 + \lambda W_{jj} \end{array}\right) We know nothing about the specific form of the perturbation; in particular, it may be the case that W_{ii} \neq W_{jj}. What matters is that the energy levels are degenerate at \lambda=0. Since this is just a two-state system, we know how to solve the full Hamiltonian: the energy eigenvalues are E_{\pm} = E_0 + \frac{\lambda}{2} (W_{ii} + W_{jj}) \pm \lambda \sqrt{\left( \frac{W_{ii} - W_{jj}}{2}\right)^2 + |W_{ij}|^2} No trouble here; the energies split apart smoothly as \lambda is turned on. However, the eigenstates are a different matter. Recalling our formula, the mixing angle is given by \tan 2\theta = \frac{2|W_{ij}|}{W_{ii} - W_{jj}} The dependence on \lambda cancels out entirely! So for any \lambda > 0, we have a discontinuous jump from the unperturbed eigenstates to the perturbed ones; there’s no way to write the state with the perturbation switched on as an order \lambda correction to the original state.
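As a quick numerical illustration of the two-state example, here is a minimal sketch using NumPy. The matrix W and the value of E_0 are made-up numbers chosen only for illustration. It checks the closed-form energies against direct diagonalization, and shows that the overlap of the lower eigenvector with the original ket \ket{i^{(0)}} does not approach 0 or 1 as \lambda shrinks.

```python
import numpy as np

def two_state_h(lam, E0, W):
    """Full Hamiltonian E0*1 + lam*W on the degenerate two-state subspace."""
    return E0 * np.eye(2) + lam * W

def closed_form_energies(lam, E0, W):
    """(E_-, E_+) from the closed-form two-state expression, for lam > 0."""
    avg = 0.5 * (W[0, 0] + W[1, 1]).real
    half_diff = 0.5 * (W[0, 0] - W[1, 1]).real
    root = np.sqrt(half_diff**2 + abs(W[0, 1])**2)
    return np.array([E0 + lam * (avg - root), E0 + lam * (avg + root)])

# Hypothetical Hermitian perturbation with W_ii != W_jj (illustrative values)
W = np.array([[0.7, 0.4],
              [0.4, -0.2]])
E0 = 1.0

overlaps = []
for lam in (1e-6, 1e-3, 1e-1):
    evals, evecs = np.linalg.eigh(two_state_h(lam, E0, W))  # ascending order
    assert np.allclose(evals, closed_form_energies(lam, E0, W))
    # |<i|E_->|: overlap of the lower eigenvector with the original ket |i>
    overlaps.append(abs(evecs[0, 0]))

print(overlaps)  # the same value for every lam: the mixing does not depend on lam
```

The eigenvectors of E_0 + \lambda W are just the eigenvectors of W, which is why the overlap is independent of \lambda in this sketch.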

Now we come to the point: this discontinuity is not physical in any way, and we can get rid of it if we’re careful! First, notice that with \lambda = 0, we can write the Hamiltonian of this two state system as just \hat{H}_0 = \left(\begin{array}{cc} E_0 & 0 \\ 0 & E_0 \end{array}\right) This matrix is proportional to the identity, due to the degeneracy of the states \ket{i^{(0)}} and \ket{j^{(0)}}. But this means that there is an ambiguity as to what we call the energy eigenstates; any linear combination a \ket{i^{(0)}} + b \ket{j^{(0)}} is also an eigenstate with energy E_0. The perturbation \hat{W} breaks the ambiguity, by selecting a particular direction in terms of the \lambda-independent mixing angle \theta above. If we apply that rotation, then there is no discontinuity in our state definitions. Moreover, applying this mixing angle will diagonalize the perturbation, so it fixes our perturbative formulas and we can proceed normally afterwards.

Since this is a critically important idea, let’s repeat it in a slightly different way. One of the keys to applying time-independent perturbation theory is that the effect of the perturbation is small, which means that all of our energy eigenstates after perturbation are equal to the \hat{H}_0 eigenstates plus a small correction, \ket{n} = \ket{n^{(0)}} + \lambda \ket{\delta_n}. In the degenerate case, the choice of \ket{n^{(0)}} itself becomes ambiguous until we switch \lambda on. To do our calculation correctly, we thus have to work backwards: we need to find the unique directions in the degenerate subspace which our perturbation \hat{W} picks out as eigenstates, and then find the \lambda \rightarrow 0 limit of those to identify the “correct” choices of unperturbed energy eigenstates.

In general, these states will be part of a larger Hamiltonian: \left( \begin{array}{cccc} E_0^{(0)} + \lambda W_{ii} & \lambda W_{ij} & \lambda W_{ik} & ... \\ \lambda W_{ji} & E_0^{(0)} + \lambda W_{jj} & \lambda W_{jk} & ... \\ \lambda W_{ki} & \lambda W_{kj} & E_k^{(0)} + \lambda W_{kk} & ... \\ ... & ... & ... & ... \end{array}\right) We can diagonalize by rotating \ket{i} and \ket{j} to \ket{i'} and \ket{j'}, which changes the matrix in the corresponding columns and rows: \left( \begin{array}{cccc} E_0^{(0)} + \lambda E_{i'}^{(1)} & 0 & \lambda W_{i'k} & ... \\ 0 & E_0^{(0)} + \lambda E_{j'}^{(1)} & \lambda W_{j'k} & ... \\ \lambda W_{ki'} & \lambda W_{kj'} & E_k^{(0)} + \lambda W_{kk} & ... \\ ... & ... &... & ... \\ \end{array} \right) With this rotation, the degeneracy is removed (as long as E_{i'}^{(1)} \neq E_{j'}^{(1)}); our perturbation-theory formulas will be valid. All we have to do is make sure we apply everything using the new, rotated basis.

“Degenerate PT” isn’t a very good name!

I should comment that “degenerate state perturbation theory” is actually something of an oxymoron. There’s no deep new formalism we had to derive here; the point is quite simply that perturbation theory doesn’t work on degenerate states. Remember that what we’re trying to do is find approximate solutions for the eigenvectors and eigenvalues of the full Hamiltonian \hat{H} = \hat{H}_0 + \lambda \hat{W}. All we’ve done here is recognized that this won’t work on a degenerate subspace, done the complete diagonalization within that subspace, and then used perturbation theory to study what happens to the rest of the states.

In particular, if we had a Hamiltonian which was completely degenerate, then we could only solve the full diagonalization problem, and then there would be nothing left to treat perturbatively! (Of course, high degrees of degeneracy usually indicate that there is a lot of symmetry in our system, in which case we have many other tools we can use to solve.)

21.2 Tips, tricks, and pitfalls in degenerate-state perturbation theory

The basic idea here is quite simple, but degenerate-state perturbation theory gets its own chapter in my notes for two reasons. First, there are some subtle problems that can show up in the context of degeneracy and perturbation theory if we’re not careful. Second, there are some useful tricks and methods that go beyond the simple idea of “diagonalize away the degeneracy” which are worth getting into a little bit.

21.2.1 Reminders about eigenstate and energy corrections

When we carry out our diagonalization procedure, we’re partly solving the problem we started with, namely finding the eigenstates and eigenvalues of the full \hat{H} = \hat{H}_0 + \lambda \hat{W}. This just means we have to be extra careful about keeping track of how the corrections appear.

First, we should always keep in mind that the “diagonalized” states are usually still corrected at first order in \lambda. Notice that in our matrix above, \ket{i'} and \ket{j'} are not eigenstates of the full Hamiltonian; although we’ve diagonalized \hat{H}_0 + \lambda \hat{W} in their subspace, they are still coupled by \hat{W} to the states outside the degenerate subspace. If we write out the equation for the corrected states at first order, we see for example \ket{i'^{(1)}} = \sum_{k \neq i', j'} \ket{k^{(0)}} \frac{W_{ki'}}{E_{i'}^{(0)} - E_k^{(0)}} Our diagonalization removes the pathological W_{i'j'} term from the sum, but everything else remains, so we do expect to find order-\lambda corrections in general.

Second, the “diagonalized” energies are still corrected at first order in \lambda. Let’s write out the formula for the perturbative energy to second order: E_{i'} = E_{0}^{(0)} + \lambda W_{i'i'} + \lambda^2 \sum_{k \neq i', j'} \frac{|W_{i'k}|^2}{E_{i'}^{(0)} - E_k^{(0)}} + ... It just so happens that the first-order correction \lambda W_{i'i'} is usually computed as a byproduct of the diagonalization of the subspace, as we saw in our explicit two-state example from last time; this is because the first-order correction only depends on the diagonal element W_{i'i'}. However, at second order, corrections from coupling of the perturbation to states outside the degenerate subspace show up again.
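The two reminders above can be checked numerically. The following is a minimal sketch on a made-up three-level model (two degenerate levels plus one distant level; all numbers are illustrative assumptions). It rotates the degenerate pair to diagonalize W in the subspace, applies the ordinary second-order energy formula in the rotated basis, and compares with exact diagonalization.

```python
import numpy as np

# Hypothetical 3-level model: two degenerate levels at E0, one level at E1
E0, E1, lam = 0.0, 1.0, 0.02
H0 = np.diag([E0, E0, E1])
W = np.array([[0.3, 0.5, 0.2],
              [0.5, -0.1, 0.4],
              [0.2, 0.4, 0.6]])

# Step 1: diagonalize W restricted to the degenerate subspace (first two kets)
w1, U2 = np.linalg.eigh(W[:2, :2])
U = np.eye(3)
U[:2, :2] = U2
Wrot = U.T @ W @ U  # W in the rotated basis |i'>, |j'>, |k>

# Step 2: ordinary perturbation theory in the rotated basis.
# Only the state outside the subspace (index 2) enters the second-order sum.
E_first = E0 + lam * w1
E_second = np.array([
    E0 + lam * Wrot[n, n] + lam**2 * Wrot[n, 2]**2 / (E0 - E1)
    for n in range(2)
])

# Exact energies of the two levels that start out degenerate
E_exact = np.linalg.eigvalsh(H0 + lam * W)[:2]

print("first order :", E_first)
print("second order:", E_second)
print("exact       :", E_exact)
```

In this sketch the first-order energies come directly out of the subspace diagonalization, while the second-order term comes entirely from the coupling to the third state.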

Warning: first-order energy corrections and degeneracy

There is one term in the perturbative expansion which will not obviously blow up in the presence of degeneracy, namely the first-order energy correction, proportional to W_{ii}. Make sure you’re not studying a system with degenerate states if you’re only calculating first-order energy corrections! If there is a degeneracy and you don’t deal with it, your calculation will give you a “first-order” result in an expansion which is infinity at the next order - not very useful!

Exercise: Tutorial 7

Here, you should complete Tutorial 7 on “Degenerate-state perturbation theory”. (Tutorials are not included with these lecture notes; if you’re in the class, you will find them on Canvas.)

21.2.2 Near-degenerate perturbation theory

The states don’t have to be exactly degenerate for us to apply this method; they also don’t have to be exactly degenerate to cause problems in perturbation theory, since as we previously observed, convergence of the perturbative expansion requires \frac{\lambda |W_{ij}|}{|E_i^{(0)} - E_j^{(0)}|} \ll 1. If any two energies are extremely close to one another, then even if \lambda and \hat{W} are small the expansion will be poor. The good news is that if we have a near-degeneracy, we can do an approximate diagonalization of the corresponding subspace to greatly improve our perturbative series.

Suppose that we have two states \ket{E_m^{(0)}} and \ket{E_n^{(0)}} whose corresponding energies are nearly degenerate. We can then write E_m^{(0)} = \bar{E} - \epsilon, \\ E_n^{(0)} = \bar{E} + \epsilon, defining the average energy \bar{E} and the (half) difference \epsilon between the two energy values. By assumption, if they are nearly degenerate then we have \epsilon \ll \bar{E}.

Now, if these two states were exactly degenerate, then the state \ket{\psi} = \cos \theta \ket{E_m^{(0)}} - \sin \theta \ket{E_n^{(0)}} would still be an eigenstate of \hat{H}_0 for any mixing angle \theta. This is not true in the present case, but since the levels are nearly degenerate, we can rewrite the above as \hat{H}_0 \ket{\psi} = \cos \theta E_m^{(0)} \ket{E_m^{(0)}} - \sin \theta E_n^{(0)} \ket{E_n^{(0)}} \\ = \bar{E} \left[ \cos \theta \ket{E_m^{(0)}} - \sin \theta \ket{E_n^{(0)}} \right] - \epsilon \left[ \cos \theta \ket{E_m^{(0)}} + \sin \theta \ket{E_n^{(0)}} \right] where, as above, \bar{E} is the average of the two energy values and \epsilon is half their difference. So the rotated state \ket{\psi} is almost an eigenstate of \hat{H}_0, up to a small correction proportional to \epsilon.

This suggests how to proceed: we rewrite our Hamiltonian as \hat{H} = \hat{H}_0 + \lambda \hat{W} = \hat{H}_0' + \lambda \hat{W}' where \hat{H}_0' = \hat{H}_0 + \sum_{k=m,n} (\bar{E} - E_k^{(0)}) \ket{E_k^{(0)}}\bra{E_k^{(0)}} \\ \lambda \hat{W}' = \lambda \hat{W} + \sum_{k=m,n} (E_k^{(0)} - \bar{E}) \ket{E_k^{(0)}}\bra{E_k^{(0)}}. We’re just adding and subtracting the same term, of course, so \hat{H} is unchanged. After redefinition, \hat{H}_0' contains exactly degenerate states with energy \bar{E}, and our perturbation has picked up two additional terms proportional to the small parameter \epsilon. From here, we can proceed with exactly-degenerate perturbation theory.
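To see how much the regrouping helps, here is a minimal numerical sketch on a made-up three-level model with two nearly degenerate levels (all numbers are illustrative assumptions). It compares naive non-degenerate second-order perturbation theory with the regrouped approach, in which the \mp\epsilon shifts are moved into the perturbation and the two-state block is diagonalized first.

```python
import numpy as np

# Hypothetical model: levels at Ebar - eps and Ebar + eps, plus a distant level Ek
Ebar, eps, Ek, lam = 0.0, 1e-3, 1.0, 0.02
H0 = np.diag([Ebar - eps, Ebar + eps, Ek])
W = np.array([[0.3, 0.5, 0.2],
              [0.5, -0.1, 0.4],
              [0.2, 0.4, 0.6]])
H = H0 + lam * W
exact = np.linalg.eigvalsh(H)[:2]

# Naive non-degenerate PT to second order: contains the tiny denominator 2*eps
E0s = np.diag(H0)
naive = np.array([
    E0s[n] + lam * W[n, n]
    + lam**2 * sum(W[n, k]**2 / (E0s[n] - E0s[k]) for k in range(3) if k != n)
    for n in range(2)
])

# Regrouped: H0' = diag(Ebar, Ebar, Ek), and lam*W' = H - H0'
H0p = np.diag([Ebar, Ebar, Ek])
Vp = H - H0p                      # lam*W', including the -eps and +eps terms
v1, U2 = np.linalg.eigh(Vp[:2, :2])
U = np.eye(3)
U[:2, :2] = U2
Vrot = U.T @ Vp @ U               # perturbation in the rotated basis
regrouped = np.array([
    Ebar + Vrot[n, n] + Vrot[n, 2]**2 / (Ebar - Ek)
    for n in range(2)
])

print("naive    :", naive)
print("regrouped:", regrouped)
print("exact    :", exact)
```

With these numbers the naive expansion is badly off, because \lambda |W_{mn}| is larger than the level spacing 2\epsilon, while the regrouped result tracks the exact energies closely.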

21.2.3 Higher-order degeneracies

If the perturbation is proportional to the identity within a degenerate subspace, so that the off-diagonal terms W_{ij} vanish and the diagonal terms are all equal, then our diagonalization procedure itself becomes ambiguous, and we are back to the problem we started with: a discontinuity will appear between the perturbed and unperturbed states. In this situation, the higher-order terms in \lambda arising from interaction with the other states have to break the degeneracy. Let’s work through a simple example to see what happens in practice in this case.

Going back briefly to the 3x3 example studied on the tutorial above, consider the following Hamiltonian: \hat{H} = \left( \begin{array}{ccc} E_0 & 0 & \lambda a \\ 0 & E_0 & \lambda a \\ \lambda a^\star & \lambda a^\star & E_1 \end{array}\right) We still find that we can rotate the first two states \ket{i}, \ket{j}, but now it’s not enough to just consider diagonalizing their subspace alone; here \hat{W} in the subspace will always be diagonal. However, we still expect the perturbation to lift the degeneracy and choose a preferred combination of these two kets when \lambda is switched on; once again, we need to rotate the states now to remove the possibility of a discontinuity in \lambda.

Determining how to rotate the initial states in this case now requires working to higher order in \lambda, and accounting for coupling to states outside of the degenerate subspace. Let’s see how this works in principle. Remember that in the non-degenerate case, we derived an operator equation for the full \hat{H} eigenstates \ket{n}: \ket{n} = \ket{n^{(0)}} + \frac{1}{E_n^{(0)} - \hat{H}_0} \hat{\phi}_n (\lambda \hat{W} - \Delta_n) \ket{n} Now suppose we have a set of states \ket{n_i^{(0)}}, with i = 1, ..., d, all degenerate with energy E_D^{(0)}. We once again run into the problem that the operator (E_n^{(0)} - \hat{H}_0)^{-1} is ill-defined when acting on the degenerate states. Our original solution was to use the projector \hat{\phi}_n to separate \ket{n^{(0)}} out. Now we just need a bigger projector: \hat{\phi}_{D} = 1 - \sum_{i=1}^d \ket{n_i^{(0)}}\bra{n_i^{(0)}} Then \ket{n_i} = \sum_{j=1}^d c_{ij}(\lambda) \ket{n_j^{(0)}} + \frac{1}{E_{D}^{(0)} - \hat{H}_0} \hat{\phi}_D (\lambda \hat{W} - \Delta_{i}) \ket{n_i}. At leading order in \lambda, the second term vanishes, and if we apply the operator (E_i - \hat{H}) on the left we will find that the c_{ij} at leading order are simply determined by an eigenvalue/eigenvector equation for \hat{W}; in other words, solving this at leading order just requires diagonalizing \hat{W} in the subspace, as we’ve been doing. In fact, if we’re able to diagonalize on the subspace at first order, then the formal construction of the second term is unnecessary since all of the badly-behaved terms vanish, and we can just use the perturbative formulas we derived before on the rotated basis.

On the other hand, in the case that the degeneracy isn’t fully broken at first order, then we have to keep the second term and solve for the c_{ij}(\lambda) at higher order. We can, in principle, solve for them order by order by plugging in the expansion of \ket{n_i} in terms of both the kets \ket{n_j^{(0)}} and the coefficients c_{ij}(\lambda) = c_{ij}^{(0)} + \lambda c_{ij}^{(1)} + ... Second-order degenerate perturbation theory is sufficiently rare that I won’t go through the derivation of the higher-order equations here, but at least I’ve given you the setup. Schiff’s book on quantum mechanics is an excellent resource if you ever find yourself confronted with higher-order degenerate perturbation theory in the wild.
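For the 3x3 example of this section, the preferred combinations can be found without the general machinery. Here is a minimal numerical sketch; the values of E_0, E_1, \lambda and a are made up for illustration. The antisymmetric combination (\ket{i} - \ket{j})/\sqrt{2} decouples from the third state and stays at E_0, while the symmetric combination is shifted at second order by 2 \lambda^2 |a|^2 / (E_0 - E_1).

```python
import numpy as np

# Hypothetical parameter values for the 3x3 example Hamiltonian
E0, E1, lam, a = 0.0, 1.0, 0.05, 0.8 + 0.3j
H = np.array([[E0, 0, lam * a],
              [0, E0, lam * a],
              [np.conj(lam * a), np.conj(lam * a), E1]])
evals, evecs = np.linalg.eigh(H)  # ascending eigenvalues

# Candidate "correct" zeroth-order kets inside the degenerate subspace
plus = np.array([1, 1, 0]) / np.sqrt(2)
minus = np.array([1, -1, 0]) / np.sqrt(2)

# |minus> is an exact eigenvector with energy E0 for any lam
residual = np.linalg.norm(H @ minus - E0 * minus)

# |plus> couples to the third state with strength sqrt(2)*lam*|a|,
# giving a second-order shift 2 lam^2 |a|^2 / (E0 - E1)
shift_pt = 2 * lam**2 * abs(a)**2 / (E0 - E1)

print("exact energies   :", evals)
print("E0 + second order:", E0 + shift_pt)
print("|<plus|lowest>|  :", abs(np.vdot(plus, evecs[:, 0])))
```

So in this example the degeneracy is lifted only at order \lambda^2, and the kets that evolve continuously as \lambda is switched on are the symmetric and antisymmetric combinations rather than \ket{i} and \ket{j} themselves.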

21.2.4 Symmetries and degeneracy

Sometimes, even going to second order in perturbation theory isn’t good enough, or third order; it is in fact possible for the perturbation to fail to lift the degeneracy between energy eigenstates to all orders in perturbation theory. This is actually good news and not bad; the vanishing of perturbative corrections to all orders typically signals the presence of an underlying symmetry.

Remember that our basic problem with degenerate states was that a discontinuous jump could appear when we switch on \lambda, since there are many ways to write our unperturbed basis in the degenerate subspace. But if there is another operator \hat{A} so that [\hat{A}, \hat{H}] = 0, then there is no ambiguity between perturbed and unperturbed states; we can label the perturbed energy eigenstates with the eigenvalues of \hat{A}.

This will save us a lot of trouble in the hydrogen atom; for example, if we consider any perturbation which commutes with \hat{L}_z of the electron, then we can use m_l to label the perturbed eigenstates, and the perturbation will already be diagonal and unambiguous.

Another way to state what is happening here is: in cases where all off-diagonal matrix-elements vanish due to a symmetry, we are protected from any perturbative corrections appearing, and simple diagonalization in our degenerate subspace ends up becoming an exact solution of that part of the system.

Now that we’re fully equipped to handle problems with degeneracy, let’s run through a couple more important example problems.

21.3 The linear Stark effect

Let’s return to complete our discussion of the effects of an external electric field on the hydrogen atom. We once again take as our perturbing potential \hat{W} = -e |\mathbf{E}| \hat{z}.

We previously considered the quadratic correction to the ground-state energy, for which there were no issues of degeneracy to deal with. Now let’s move on to the n=2 states. Here we find four states that are degenerate: the l=1 triplet (aka 2p) and the l=0 singlet (2s). Once again, parity simplifies the discussion: since \bra{nl'm'} \hat{z} \ket{nlm} \rightarrow -(-1)^{l'-l} \bra{nl'm'} \hat{z} \ket{nlm} under a parity transformation, the perturbation will only have non-vanishing matrix elements between l=0 and l=1 states. We also recognize \hat{z} as the q=0 component of the spherical position tensor \hat{r}_q^{(1)}, which means that we must also have m=m'. Based on these two selection rules, for the n=2 energy level there is only one non-vanishing matrix element: \hat{W} = \left( \begin{array}{cccc} 0 & \bra{200} \hat{W} \ket{210} & 0 & 0 \\ \bra{210} \hat{W} \ket{200} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) with the column vectors taken to be \ket{200}, \ket{210}, \ket{21(-1)}, \ket{211} in order. This is a degenerate-state perturbation theory problem - all four states have unperturbed energy E_2 - so we have to diagonalize in the m=0 states. Since the matrix is so simple, we see immediately that the correct eigenkets are \ket{\pm} = \frac{1}{\sqrt{2}} (\ket{200} \pm \ket{210}), and the first-order energy corrections are given by the resulting diagonal entries \Delta_{\pm}^{(1)} = \pm \bra{200} \hat{W} \ket{210} = \mp e |\mathbf{E}| \bra{200} \hat{z} \ket{210}. As we saw before, there’s no easy way to evaluate general matrix elements of \hat{z} between hydrogen levels; we’ll use the explicit formulas here. We can rewrite z = r \cos \theta, and as always the integral splits into an angular part and a radial part. For the angular integral, we have \int d\Omega (Y_1^0)^\star(\theta, \phi) \cos \theta Y_0^0(\theta, \phi). 
Inspecting the definitions of the harmonics, we see that Y_1^0 is proportional to \cos \theta; in fact, Y_1^0(\theta, \phi) = \sqrt{3} \cos \theta Y_0^0(\theta, \phi), which allows us to use the orthogonality of the spherical harmonics to evaluate the integral: \int (...) = \frac{1}{\sqrt{3}}. For the radial integral, you’ll have to go back and look up the form of the wavefunctions; I’ll just tell you that the result is -3 \sqrt{3} a_0. Combining, we have that \Delta_{\pm}^{(1)} = \pm 3 e a_0 |\mathbf{E}| . A diagram showing the splitting of the n=2 energy level due to the linear Stark effect as an electric field is applied is shown below.

By the way, this was really a problem in nearly degenerate perturbation theory; the \ket{200} and \ket{210} states don’t have the same energy in reality, due to other effects we’ve ignored. In fact, they’re not even energy eigenstates; we know that the spin-orbit coupling splits \ket{210} into two energy levels, {}^2P_{1/2} and {}^2P_{3/2}. Our calculation is only valid in the limit that the Stark splitting, of order e a_0 |\mathbf{E}|, is large compared to these effects (but still small enough compared to the orbital energy differences in hydrogen that we can use perturbation theory).
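The numbers quoted in the Stark calculation can be checked numerically. The sketch below works in units where a_0 = 1 and uses the standard hydrogen radial functions R_{20} and R_{21} and the harmonics Y_0^0 and Y_1^0, which are not written out in these notes. The integration grids and the helper function trapezoid are my own choices for illustration.

```python
import numpy as np

def trapezoid(f, dx):
    """Composite trapezoid rule on a uniform grid."""
    return dx * (np.sum(f) - 0.5 * (f[0] + f[-1]))

a0 = 1.0  # Bohr radius as the unit of length

# Radial integral: int_0^infinity r^3 R_20(r) R_21(r) dr
r = np.linspace(0.0, 60.0, 200001)
R20 = (1 / np.sqrt(2)) * a0**-1.5 * (1 - r / (2 * a0)) * np.exp(-r / (2 * a0))
R21 = (1 / np.sqrt(24)) * a0**-1.5 * (r / a0) * np.exp(-r / (2 * a0))
radial = trapezoid(r**3 * R20 * R21, r[1] - r[0])

# Angular integral: int dOmega (Y_1^0)^* cos(theta) Y_0^0
theta = np.linspace(0.0, np.pi, 20001)
Y00 = 1 / np.sqrt(4 * np.pi)
Y10 = np.sqrt(3 / (4 * np.pi)) * np.cos(theta)
angular = 2 * np.pi * trapezoid(Y10 * np.cos(theta) * Y00 * np.sin(theta),
                                theta[1] - theta[0])

z_elem = angular * radial  # <200| z |210> in units of a0

# W = -e|E| z in the basis |200>, |210>, |21(-1)>, |211>, in units of e |E| a0
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = -z_elem
shifts = np.linalg.eigvalsh(W)

print("radial integral / a0 :", radial)
print("angular integral     :", angular)
print("first-order shifts   :", shifts)
```

The two m = \pm 1 states are untouched at first order, which is why two of the four shifts vanish.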

21.4 Applied magnetic fields (the Zeeman and Paschen-Back effects)

Since we’ve covered the effects of an applied electric field on the hydrogen atom, let’s turn to an applied magnetic field, for which the perturbation is
\hat{W}_B = \frac{eB}{2m_e c} (\hat{L}_z + 2 \hat{S}_z) This time, we obviously need to include the spin of the electron as well, so we label our eigenstates by \ket{n l m_l m_s}. If this perturbation were the only interaction, then we would be able to stay in this basis and calculate; the perturbation above is already diagonal in the given basis. However, we know that the spin-orbit coupling \hat{\vec{S}} \cdot \hat{\vec{L}} splits the energy levels of hydrogen based on the eigenvalues of their total angular momentum \hat{\vec{J}} = \hat{\vec{L}} + \hat{\vec{S}}. Assuming that our magnetic field is truly small, even compared to the spin-orbit energy corrections, we must change basis: \ket{n l m_l m_s} \rightarrow \ket{n l j m} We’ve studied the matrix elements of the above operator in the \ket{n l j m} basis before, as an application of the projection theorem; we found that \Delta_{jm}^{(1)} = \frac{eB}{2m_e c} g_j \hbar m where g_j is a purely numerical factor known as the Landé g-factor. We found the full expression for g_j as a function of s,l,j before, which I won’t rewrite here, but I will point out that if we have s=1/2 then the expression for g_j happens to simplify quite a bit: g_j = \begin{cases} 1 + \frac{1}{2l+1}, & j = l + 1/2; \\ 1 - \frac{1}{2l+1}, & j = l - 1/2. \end{cases} This small splitting of the \ket{n l j m} energy eigenvalues is known as the Zeeman effect.
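As a quick check of the s=1/2 simplification, here is a minimal sketch using exact rational arithmetic. It assumes the standard general form of the Landé g-factor (with orbital and spin g-factors 1 and 2), which is not written out in these notes; the function names are my own.

```python
from fractions import Fraction

def lande_g(l, s, j):
    """Standard general Lande g-factor, assuming g_L = 1 and g_S = 2."""
    l, s, j = Fraction(l), Fraction(s), Fraction(j)
    return 1 + (j * (j + 1) + s * (s + 1) - l * (l + 1)) / (2 * j * (j + 1))

def lande_g_spin_half(l, j):
    """Simplified s = 1/2 form: 1 +/- 1/(2l+1) for j = l +/- 1/2."""
    l, j = Fraction(l), Fraction(j)
    sign = 1 if j == l + Fraction(1, 2) else -1
    return 1 + sign * Fraction(1) / (2 * l + 1)

half = Fraction(1, 2)

# The two forms agree for every allowed j at s = 1/2
for l in range(0, 5):
    for j in ([l + half] if l == 0 else [l - half, l + half]):
        assert lande_g(l, half, j) == lande_g_spin_half(l, j)

# Zeeman shifts of the 2p_{3/2} level, in units of e B hbar / (2 m_e c)
g = lande_g(1, half, Fraction(3, 2))
shifts = [g * m for m in (Fraction(-3, 2), -half, half, Fraction(3, 2))]
print(g, shifts)
```

For example, the 2p_{3/2} level has g_j = 4/3 and the 2p_{1/2} level has g_j = 2/3, so the two levels split at different rates as B increases.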

There is another limit we can study easily, which is to take B to be large compared to the size of the spin-orbit corrections, but small compared to the unperturbed energy splittings of hydrogen (this limit gives the Paschen-Back effect.) In this limit we reverse the orders of the perturbations; we can solve for the effects of the magnetic field first, and then add the spin-orbit term if we’re interested in higher precision.

With the spin-orbit term neglected, we’re free to work in the \ket{n l m_l m_s} basis. Our perturbation acts very simply on these states: we find quickly that \Delta E = \bra{nlm_l m_s} \hat{W}_B \ket{nlm_l m_s} = \frac{eB\hbar}{2m_e c} (m_l + 2m_s). Why can’t we use the \ket{nljm} states as our unperturbed basis? Remember that this is a degenerate-state perturbation theory problem, and we have to choose our starting basis to be oriented along the direction of the first-order perturbative corrections. Since [\hat{\vec{J}}^2, \hat{W}_B] \neq 0, we must use the \ket{nlm_l m_s} states; the quantum number j is “bad”, in the sense that it doesn’t diagonalize the interaction we’re now treating as a perturbation. (The combination m = m_l + m_s is still a good quantum number, since \hat{J}_z commutes with \hat{W}_B.) For the Zeeman effect, the opposite is true since \hat{L}_z and \hat{S}_z don’t commute with the spin-orbit term. With both interactions, neither set of states consists of eigenstates of the full Hamiltonian; we expect to find perturbative corrections to the states themselves.
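The commutator statements in this argument can be verified with explicit matrices. This is a minimal sketch for l=1 and s=1/2 with \hbar = 1, working in the product basis \ket{m_l m_s}. The matrices are the standard angular momentum matrices, and \hat{W}_B is expressed in units of eB/(2 m_e c).

```python
import numpy as np

# Orbital l = 1 matrices (basis m_l = +1, 0, -1), hbar = 1
c = 1 / np.sqrt(2)
Lx = c * np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex)
Ly = c * np.array([[0, -1j, 0], [1j, 0, -1j], [0, 1j, 0]])
Lz = np.diag([1, 0, -1]).astype(complex)

# Spin-1/2 matrices (basis m_s = +1/2, -1/2)
Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = 0.5 * np.array([[0, -1j], [1j, 0]])
Sz = 0.5 * np.diag([1, -1]).astype(complex)

I3, I2 = np.eye(3), np.eye(2)

def comm(A, B):
    """Commutator [A, B]."""
    return A @ B - B @ A

# Total angular momentum J = L + S on the 6-dimensional product space
Jx = np.kron(Lx, I2) + np.kron(I3, Sx)
Jy = np.kron(Ly, I2) + np.kron(I3, Sy)
Jz = np.kron(Lz, I2) + np.kron(I3, Sz)
J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz

WB = np.kron(Lz, I2) + 2 * np.kron(I3, Sz)                 # L_z + 2 S_z
LS = np.kron(Lx, Sx) + np.kron(Ly, Sy) + np.kron(Lz, Sz)   # spin-orbit L.S

print("|[J^2, W_B]| =", np.linalg.norm(comm(J2, WB)))               # nonzero
print("|[L.S, L_z]| =", np.linalg.norm(comm(LS, np.kron(Lz, I2))))  # nonzero
print("|[L.S, J^2]| =", np.linalg.norm(comm(LS, J2)))               # zero
print("W_B diagonal:", np.diag(WB).real)                            # m_l + 2 m_s
```

The diagonal of W_B in the product basis reproduces the m_l + 2 m_s pattern of the strong-field limit, while L.S is diagonal only in the coupled basis.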

A canonical example of both effects together occurs in the study of the hydrogen ground state, in particular the “hyperfine” structure, which involves including the spin of the proton \hat{\vec{I}} as well as the spin of the electron. We will see the details of this in the next chapter, but for now I’ll just make the simple observation that with no magnetic field applied, the only interaction in the Hamiltonian involving these spins is a spin-spin coupling of the form \hat{\vec{I}} \cdot \hat{\vec{S}}, which we deal with in the usual way: we write a total-spin operator \hat{\vec{F}} = \hat{\vec{I}} + \hat{\vec{S}}, which reveals that there are two energy levels with four states total, an f=1 triplet and an f=0 singlet.

While we’re not quite set up to find the size of this splitting yet, it’s much easier to consider the Paschen-Back effect on this state. Applying a strong magnetic field (but still weak enough that we can trust perturbation theory), we need to switch to the \ket{m_s m_i} product spin basis to resolve the degeneracy and align with this perturbation. The result is shown below:

We find, roughly, two pairs of states whose energy splitting is dominated by the electron spin m_s, with a smaller contribution from the proton spin that leads to two pairs of lines with slightly different slopes. Being careful about both effects at once allows us to match which state in the Paschen-Back limit maps back to which of the \hat{\vec{F}}^2 eigenstates at zero magnetic field; see the Feynman lectures, chapter 12 if you’re interested in seeing all the details. (You’ll note that he doesn’t call this effect Paschen-Back; I think that name is reserved specifically for the strong magnetic field regime with orbital angular momentum and spin, not two spins. I think they’re obviously similar things happening, but just know that you likely won’t find the hyperfine effect under the name “Paschen-Back” if you search.)
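The crossover between the two limits can be sketched in a toy model of two coupled spin-1/2 particles, \hat{H} = A \hat{\vec{S}} \cdot \hat{\vec{I}} + b_e \hat{S}_z + b_p \hat{I}_z, with \hbar = 1. The coupling constants A, b_e, b_p and the function name are hypothetical, and their signs and magnitudes are chosen only for illustration, not taken from the hydrogen calculation.

```python
import numpy as np

# Spin-1/2 matrices, hbar = 1
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]])
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def hyperfine_levels(A, b_e, b_p):
    """Sorted eigenvalues of the toy model A S.I + b_e S_z + b_p I_z."""
    S_dot_I = sum(np.kron(s, s) for s in (sx, sy, sz))
    H = A * S_dot_I + b_e * np.kron(sz, I2) + b_p * np.kron(I2, sz)
    return np.linalg.eigvalsh(H)

A = 1.0

# Zero field: f = 0 singlet at -3A/4 and f = 1 triplet at +A/4
zero_field = hyperfine_levels(A, 0.0, 0.0)

# Strong field (toy values): product states |m_s m_i> are nearly eigenstates
b_e, b_p = 50.0, -0.05
strong_field = hyperfine_levels(A, b_e, b_p)
product_estimate = np.sort([
    A * ms * mi + b_e * ms + b_p * mi
    for ms in (0.5, -0.5) for mi in (0.5, -0.5)
])

print("zero field      :", zero_field)
print("strong field    :", strong_field)
print("product estimate:", product_estimate)
```

In the strong-field case the levels pair up according to m_s, with a small splitting inside each pair set by A and b_p, which matches the qualitative picture described above.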