does the Earth really go around the Sun?

tl;dr: It is not fundamentally true that the Earth goes around the Sun; it is just easier to calculate things that way.

We like to say that the critical event that started the scientific revolution was the discovery that the Earth goes around the Sun, and not the other way around. This was incredibly important; the hypothesis by Copernicus led to the immensely important data-taking by Tycho Brahe and the quantitative, theoretical explanation of it by Kepler. Galileo's discovery of the moons of Jupiter bolstered the case in important ways, and Newton's quantitative description of it all in terms of the inverse-square law solidified it all into an edifice of great importance, one that is just as important and valuable today as it was then. It is also a great example of how a scientific discovery requires both observational and theoretical backing to become confidently adopted by the community.

In the 20th Century, Einstein brought us General Relativity, with the eponymous generality granting us immense coordinate freedom. That is, there are (infinitely) many ways we can make decisions about what is stationary and what is moving, and what we choose as reference points. In some choices, calculations are harder. In other choices, calculations are easier. In yet others, certain symmetries become more obvious or more valuable for making predictions. That is, GR delivers to us lots of choices about how to think about what's moving and how.

So the crazy thing is this: In GR, there is no answer to the question of whether the Earth goes around the Sun or whether the Sun goes around the Earth. There is literally no observational answer to the question, and no theoretical answer. All observations can be incorporated into an analysis from either perspective. The question of which goes around which is not a question you can ask in the theory.

That said, it really is far, far easier to do calculations in the Copernican frame. Indeed, absolutely all calculations of Solar System dynamics are done in this frame with post-Newtonian code. The way I see it (with modern eyes) is that Copernicus's hypothesis was based on parsimony or simplicity and was adopted for that reason. Brahe and Kepler confirmed that the data are consistent with Copernicus's simple model (though with the eccentricities added). After Brahe and Kepler it was still possible to understand the observations in an Earth-centered (or even stranger) coordinate system, but it was far easier to do calculations in the heliocentric frame.

Even today, now that GR is our model of gravity, we still calculate the Solar System with Newtonian codes (with adjustments to approximate GR corrections). And even today, now that we have this amazingly accurate model of the Solar System, we still often calculate the positions of celestial bodies by looking at paths on the celestial sphere, as did Ptolemy. How we calculate something is incredibly context-dependent, and doesn't always respect our most fundamental ideas. And the truth of Copernicus's hypothesis really just represents the pragmatism of the present-day mathematical tools. All these thoughts bolster my rejection of scientific realism and play into questions of social construction and so on. It also bolsters my view that Ockham's Razor should be thought of as a statement about calculation, not truth.

Sure the Earth goes around the Sun! But let's remember that this is a statement about calculation and pragmatism, not the fact of the matter.


where's the information?

It came up frequently in discussions this summer (and last): Where is the information (in, say, a spectrum of a star) about some parameter of interest (say, the potassium abundance of the star, or the radial velocity), and how much information is there? The answer is very simple! But the issues can be subtle, because information is only calculable within the context of some kind of model. And by “model” here, I mean a probability density function for the data, parameterized by the parameters of interest. That is, a likelihood function.

The fast answer is this: The information about parameter θ is set by how far you can move parameter θ and still get reasonable probability for the data; the information goes as the inverse square of that distance. The nice thing is that you can often compute this without doing a full inference. It is easiest in linear (or linearized) models with Gaussian noise! That's the case we will treat here.

When you have a linear or linearized model with Gaussian noise, there are derivatives of the expectation Y for the data with respect to the parameter of interest, dY/dθ. Here (for now) Y is an N-vector (N being the size of your data), and θ is a scalar parameter (let's call it the velocity!). So the derivative dY/dθ is also an N-vector. The information about θ in the data is related to the dot product of this vector with itself: The accuracy with which you can measure θ, given data with Gaussian noise with N×N covariance matrix C (possibly diagonal, if the N data points are independent), is:

σ_θ^{-2} = [dY/dθ]^T C^{-1} [dY/dθ]

where σ_θ is the uncertainty on θ. That is, the inverse variance on the θ parameter is the inner product of the derivative vectors, where that inner product uses the inverse variance tensor of the noise in the data as its metric! Here we have implicitly assumed that the vectors are column vectors. When the N data points are independent, the C matrix is diagonal, as is its inverse. Note the units too: The inverse variance tensor has inverse Y-squared units; the inner product uses the derivatives to convert this to inverse θ-squared units.
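As a concrete (and entirely hypothetical) example, here is a minimal numpy sketch of that formula for a toy Gaussian absorption line, where θ is a shift of the line center; all the numbers are made up for illustration:

```python
import numpy as np

# Toy setup (all numbers hypothetical): a spectrum that is a Gaussian
# absorption line, and theta shifts the line center (a stand-in for a
# velocity parameter).
x = np.linspace(-5.0, 5.0, 101)          # pixel grid
sigma_line = 1.0
model = lambda theta: 1.0 - 0.5 * np.exp(-0.5 * ((x - theta) / sigma_line) ** 2)

# Derivative dY/dtheta, here by finite differences for simplicity.
eps = 1e-5
dY_dtheta = (model(eps) - model(-eps)) / (2 * eps)

# Independent pixels with per-pixel noise sigma_y, so C is diagonal.
sigma_y = 0.01
Cinv = np.eye(x.size) / sigma_y ** 2

# sigma_theta^{-2} = [dY/dtheta]^T C^{-1} [dY/dtheta]
inv_var = dY_dtheta @ Cinv @ dY_dtheta
sigma_theta = 1.0 / np.sqrt(inv_var)
```

The uncertainty σ_θ comes out without ever sampling a posterior or fitting anything; the model derivative and the noise covariance are all you need.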

(When there are multiple parameters in θ—say K parameters—the inner product generalizes to making a K×K inverse covariance matrix for the parameter vector, and the expected variance on each parameter is obtained by inverting that inverse variance matrix and looking at the diagonals.)
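That K-parameter generalization is also a few lines of matrix algebra; here is a sketch with a random derivative matrix standing in for real model derivatives (the shapes, not the numbers, are the point):

```python
import numpy as np

# Hypothetical setup: N data points, K parameters, with the derivative
# vectors dY/dtheta_k stacked as the columns of an N x K matrix D.
rng = np.random.default_rng(17)
N, K = 50, 3
D = rng.normal(size=(N, K))
Cinv = np.eye(N) / 0.05 ** 2             # independent, homoscedastic noise

F = D.T @ Cinv @ D                       # K x K inverse covariance matrix
param_cov = np.linalg.inv(F)             # expected covariance of the parameters
param_sigmas = np.sqrt(np.diag(param_cov))
```

Note that you invert the full K×K matrix before reading off the diagonal; the diagonal of F itself would ignore the degeneracies between parameters.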

But we started with the question: Where is the information in the data? In this case, it means: Where in the spectrum is the information about the velocity? The answer is simple: It is where the data—or really the inverse variance tensor for the noise in the data—makes large contributions to the inverse variance computed above for θ. You can think of splitting the data into fine chunks, and asking this question about every chunk; the chunks or pixels or data subsets that contribute most to the scalar inverse variance are the subsets that contain the most information about θ.
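That chunk-by-chunk accounting can be sketched for a hypothetical Gaussian absorption line with independent pixels, where each pixel's contribution is just its squared derivative times its inverse noise variance:

```python
import numpy as np

# Toy Gaussian absorption line; all numbers are hypothetical.
x = np.linspace(-5.0, 5.0, 101)
theta, sigma_line, sigma_y = 0.0, 1.0, 0.01

# Analytic derivative of Y = 1 - 0.5 exp(-0.5 ((x - theta)/sigma_line)^2)
# with respect to theta.
dY_dtheta = -0.5 * np.exp(-0.5 * ((x - theta) / sigma_line) ** 2) \
            * (x - theta) / sigma_line ** 2

# With independent pixels, pixel n contributes (dY/dtheta)_n^2 / C_nn to the
# total inverse variance, and the contributions sum to the full scalar.
per_pixel = dY_dtheta ** 2 / sigma_y ** 2
total_inv_var = per_pixel.sum()
```

In this toy case the information peaks on the steep slopes of the line (here at x ≈ ±1, one line-width from center) and vanishes at the line center, where the model is insensitive to small shifts.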


walk or take the elevator?

I'm just generally excited about getting back into the classroom after a long sabbatical. I'm thinking about problem-set problems for the Physics Majors. Here's what's in my head right now:

NYC has had a hot summer, with most buildings running air conditioning on a thermostat continuously. To save energy, NYU (and other large entities in NYC) asked their employees to conserve energy in various ways, some of which we might take issue with. Here's an uncontroversial one: You should take the stairs, not the elevator.

But is that uncontroversial? What considerations are required to figure out whether this policy would reduce or increase energy consumption? Obviously—if you take the stairs—you use less elevator energy, but then you drop a metabolic load on the building air-conditioning. Which uses more power in the end? Use a combination of web research and simple physical arguments to make cases, and identify weaknesses in your argument as you change assumptions. Things that matter include: Neither humans nor elevators are 100-percent efficient delivery vehicles for potential energy (in fact, can you see a fundamental argument that elevators must spend more than 50 percent of their energy generating heat?). Elevators are heavy but counter-weighted. Some buildings have very busy elevators, so your contribution to the elevator load is only the marginal contribution; in other buildings you are typically the only person in the elevator. Air conditioning systems have efficiencies limited by fundamental ideas in thermodynamics, but are probably much less efficient than the limits. And so on!
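One way to set up the estimate is to put round numbers on every step. Everything below (mass, height, efficiencies, the air-conditioner coefficient of performance) is an assumed placeholder, and the point is the structure of the comparison, not the verdict:

```python
# Back-of-envelope comparison; every number here is an assumed round figure,
# not a measurement.
g = 9.8            # m/s^2
m_person = 80.0    # kg, assumed
h = 15.0           # m of climb, say five floors

# Stairs: the mechanical work against gravity is small, but muscles are
# only ~20 percent efficient, so the metabolic energy is much larger; all
# of it except the stored potential energy ends up as heat in the building.
work = m_person * g * h                          # J
muscle_efficiency = 0.20                         # assumed
metabolic_heat = work / muscle_efficiency - work

# The air conditioner must pump that heat out; assume a coefficient of
# performance (COP) of 3, well below the thermodynamic limit.
cop = 3.0
ac_electricity_stairs = metabolic_heat / cop

# Elevator: the counterweight roughly balances the empty car, so the motor
# lifts only about the passenger's weight; assume 50 percent end-to-end
# efficiency, with the waste heat also air-conditioned away.
elevator_efficiency = 0.50                       # assumed
elevator_electricity = work / elevator_efficiency
ac_electricity_elevator = (elevator_electricity - work) / cop

total_stairs = ac_electricity_stairs
total_elevator = elevator_electricity + ac_electricity_elevator
```

With these particular made-up numbers the two options come out within a factor of two of each other, which is exactly why the problem is interesting: swapping in different assumed efficiencies or elevator occupancies can flip the answer.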

Thanks to Andrei Gruzinov (NYU) for starting me thinking about this one.


Syllabus (the book)

I just read Syllabus by Lynda Barry. It is a set of syllabi and daily in-class instructions, along with some reflections, from a few years of teaching cartooning and writing at Wisconsin. It is a combination of hilarious and insightful, not only about learning to draw and cartoon but also about the practice of teaching.

A page from Lynda Barry's Syllabus.

The syllabi and daily exercises include many great practices. For example, when students are listening to something (being read or previously recorded), she has them either draw tight spirals in their notebooks or else color in a line drawing. She has the students sketch in non-photo blue pencil and ink in later; it reduces inhibitions. She has simple forms for students to write daily diary or journal entries so that they make close observations of the world, in words and pictures. She loves most the students who have never drawn (since childhood), and she makes exercises that capitalize on their newness (and defeat those who are more experienced at rendering), like forcing them to do drawings in increasingly short time intervals.

The book is hilarious in part because she shows lots of great student work (it is not clear she has proper permissions here, because she does not individually credit each student drawing), and because the whole book is a collage of drawings, writings, and found objects, in the Lynda Barry style. I doubt there is another book out there on teaching that makes you laugh out loud all the way through.

A great question for me is what of Barry's practices could carry over to a class in Physics?


Make it Stick

I just read Make it Stick, about research-based results in how people learn and the implications for education. I loved it; it is filled with simple, straightforward ideas that will be useful in the classroom. For example, it is better to do many low-stakes quizzes than a few high-stakes exams. For another, it isn't useful for students to re-read the textbook, and it is useful for the lectures and the textbook to be misaligned. For another, students' perception of their learning is often wrong and misguided. For another, it is useful to interleave topics and not just do “massed practice”. It points out that it is very adaptive for learners to believe that their brains are plastic and their abilities are not innately limited. Luckily this also appears to be true. All of these things will come into my next pre-health (or other big) class. And I will explicitly explain to the students why.

My quibbles with the book are few. One is that they slag off unschooling, and then immediately follow with a long profile of Bruce Hendry, who is a perfect example of the power of unschooling (he is entirely self-taught through self-directed projects of great importance to himself!). I also found the writing repetitive and a bit slow. But the book is filled with good ideas. Also, it is not just informative, it is responsible: The authors clearly differentiate between research findings and speculations or over-generalizations of them. This is a great contribution to the literature on teaching and learning.


emission lines from stars

At the end of Mike Blanton's brown-bag talk at NYU yesterday, Matt Kleban asked: Why don't stars produce emission lines; why only absorption lines? Maryam Modjaz said "because they are hotter on the inside and cooler on the outside". That's true! But it is slightly non-trivial to see why the consequence is always absorption-lines only. And does it mean that if the stars were cold, condensed objects bathed in a hotter radiation field, they would produce emission lines? (I think the answer here might be "yes"; think of a gas cloud bombarded with ionizing radiation.) Kleban also pointed out that the very outermost layers of the Sun are in fact hotter than the surface, which is true, but that material must be so optically thin it barely matters.

In some ways, the biggest paradox about stars is that they aren't all the same temperature: After all, the "surface temperature" of a star is the temperature around the place where the photosphere becomes optically thin; shouldn't this be around 10,000 K for all stars? That's roughly the temperature at which hydrogen atoms recombine (see, for example, the CMB). I don't know any simple answer to this paradoxical question; to my (outsider) perspective it seems like the answer is always all about detailed atomic physics.


oscillations and the metric

In class, I was solving the normal-mode problem for a solid object near equilibrium, using generalized coordinates, in the usual manner. This starts by orthogonalizing the coordinates to make (what I call) the "mass tensor" (the tensor that comes into the quadratic kinetic-energy term) proportional to (or identical to) the identity. This operation was annoying me: Why do we have to get explicit about the coordinates? The whole point is that the coordinates are general and we don't have to get specific about their form!

In my anger, I solved the problem without this orthogonalization. It turns out that this solution is easier! Of course it is: I can do everything with pure matrix operations.

I had two other in-class epiphanies about the problem. The first is that the solution you get when you don't do the orthogonalization is more analogous to the simple one-dimensional problem in every way. The second is that, in a D-dimensional problem with D generalized coordinates, the tensor that goes into the kinetic-energy term is some kind of spatial metric for a D-dimensional dynamical problem. (Or proportional to it, anyway.) That is simultaneously obvious and deep.
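The matrix-operations-only route can be sketched on the standard toy problem of two equal masses coupled by three identical springs (wall-mass-mass-wall); m and k are set to 1 purely for illustration. Without orthogonalizing, the normal modes come from the generalized eigenvalue problem V q = ω² T q, solved here with plain numpy (scipy.linalg.eigh(V, T) would also work and keeps the symmetry explicit):

```python
import numpy as np

# Two equal masses, three identical springs; units chosen so m = k = 1.
m, k = 1.0, 1.0
T = np.diag([m, m])                      # kinetic-energy ("mass") tensor
V = np.array([[2 * k, -k],
              [-k, 2 * k]])              # potential-energy tensor

# Generalized eigenvalue problem V q = omega^2 T q, via T^{-1} V; no
# coordinate orthogonalization needed.
omega2, modes = np.linalg.eig(np.linalg.solve(T, V))
```

For this system the squared frequencies come out as k/m (the in-phase mode, where the middle spring is never stretched) and 3k/m (the out-of-phase mode), as expected.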