Hogg's Teaching: 2016-08

2016-08-28

where's the information?

It came up frequently in discussions this summer (and last): Where is the information (in, say, a spectrum of a star) about some parameter of interest (say, the potassium abundance of the star, or the radial velocity), and how much information is there? The answer is very simple! But the issues can be subtle, because there is only calculable information within the context of some kind of model. And by “model” here, I mean a probability density function for the data, parameterized by the parameters of interest. That is, a likelihood function.

The fast answer is this: The information about parameter θ is related to the (inverse squared) amount you can move parameter θ and still get reasonable probability for the data. The nice thing is that you can compute this, often, without doing a full inference. It is easiest in linear (or linearized) models with Gaussian noise! That's the case we will work through here.

When you have a linear or linearized model with Gaussian noise, there are derivatives of the expectation Y for the data with respect to the parameter of interest, dY/dθ. Here (for now) Y is an N-vector, where N is the size of your data, and θ is a scalar parameter (let's call it the velocity!). So the derivative dY/dθ is an N-vector. The information about θ in the data is related to the dot product of this vector with itself: The precision with which you can measure θ given data with Gaussian noise with N×N covariance matrix C (possibly diagonal if the N data points are independent) is:

σ_θ^-2 = [dY/dθ]^T C^-1 [dY/dθ]

where σ_θ is the uncertainty on θ. That is, the inverse variance on the θ parameter is the inner product of the derivative vector with itself, where that inner product uses the inverse variance tensor of the noise in the data as its metric! Here we have implicitly assumed that the vectors are column vectors. When the N data points are independent, the C matrix is diagonal, as is its inverse. Note the units too: The inverse variance tensor has inverse Y-squared units, and the inner product uses the derivatives to change this to inverse θ-squared units.
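
Here is a minimal numerical sketch of that formula, under loudly stated assumptions: the toy spectrum (a single Gaussian absorption line near 5000 Angstroms), the pixel grid, the noise level, and the function names `expected_spectrum` and `inverse_variance` are all made up for illustration and are not from any real pipeline. The derivative dY/dθ is taken by centered finite difference.

```python
import numpy as np

C_LIGHT = 299792.458  # speed of light in km/s


def expected_spectrum(wavelength, velocity):
    """Hypothetical toy model: one Gaussian absorption line, Doppler shifted."""
    line_center = 5000.0 * (1.0 + velocity / C_LIGHT)  # assumed rest wavelength
    line_width = 0.1   # assumed Gaussian sigma of the line, in Angstroms
    line_depth = 0.5   # assumed fractional depth
    u = (wavelength - line_center) / line_width
    return 1.0 - line_depth * np.exp(-0.5 * u ** 2)


def inverse_variance(dY_dtheta, C):
    """Scalar [dY/dtheta]^T C^-1 [dY/dtheta]; solve() avoids forming C^-1."""
    return dY_dtheta @ np.linalg.solve(C, dY_dtheta)


# Assumed data layout: 201 pixels with independent noise of sigma = 0.01.
wavelength = np.linspace(4999.0, 5001.0, 201)
noise_sigma = 0.01
C = np.diag(np.full(wavelength.size, noise_sigma ** 2))

# Centered finite difference for dY/dv, evaluated at v = 0.
dv = 0.01  # km/s
dY_dv = (expected_spectrum(wavelength, dv)
         - expected_spectrum(wavelength, -dv)) / (2.0 * dv)

info = inverse_variance(dY_dv, C)   # units of (km/s)^-2
sigma_v = 1.0 / np.sqrt(info)       # units of km/s
print("expected velocity uncertainty (km/s):", sigma_v)
```

Because C is diagonal in this sketch, the same number comes out of the simple sum of (dY/dθ)² divided by the noise variance over pixels; the `solve` form is there so that a non-diagonal C could be dropped in.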

(When there are multiple parameters in θ—say K parameters—the inner product generalizes to making a K×K inverse covariance matrix for the parameter vector, and the expected variance on each parameter is obtained by inverting that inverse variance matrix and looking at the diagonals.)
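
A sketch of the multi-parameter case, assuming a made-up straight-line model Y = offset + slope × x with K = 2 parameters and independent noise. The grid `x`, the noise level, and the variable names are illustrative assumptions. The derivative vectors become the columns of an N×K matrix `A`.

```python
import numpy as np

# Assumed setup: 50 independent data points with noise sigma = 0.2.
x = np.linspace(-1.0, 3.0, 50)
noise_sigma = 0.2
C = np.diag(np.full(x.size, noise_sigma ** 2))

# Columns of A are dY/d(offset) and dY/d(slope) for Y = offset + slope * x.
A = np.stack([np.ones_like(x), x], axis=1)  # shape (N, K)

# K x K inverse covariance matrix for the parameters: A^T C^-1 A.
F = A.T @ np.linalg.solve(C, A)

# Invert and read off the diagonal to get the expected variances.
param_cov = np.linalg.inv(F)
sigma_marginal = np.sqrt(np.diag(param_cov))

# For comparison: the uncertainty on each parameter if the other were
# known exactly (no inversion, just the diagonal of F).
sigma_conditional = 1.0 / np.sqrt(np.diag(F))

print("marginal uncertainties:   ", sigma_marginal)
print("conditional uncertainties:", sigma_conditional)
```

The marginal uncertainties (from the inverted matrix) are never smaller than the conditional ones; the difference is the price of not knowing the other parameters.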

But we started with the question: Where is the information in the data? In this case, it means: Where in the spectrum is the information about the velocity? The answer is simple: It is where the data (or really the inverse variance tensor for the noise in the data) makes large contributions to the inverse variance computed above for θ. You can think of splitting the data into fine chunks, and asking this question about every chunk; the chunks or pixels or data subsets that contribute most to the scalar inverse variance are the subsets that contain the most information about θ.
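
A sketch of that pixel-by-pixel accounting, assuming the same kind of made-up toy as before: a single Gaussian absorption line with assumed rest wavelength, width, and depth, and independent noise. All names and numbers here are illustrative. The scalar inverse variance is written as a sum of one term per pixel, and the largest terms mark where the velocity information lives.

```python
import numpy as np

C_LIGHT = 299792.458  # speed of light in km/s
REST = 5000.0         # assumed rest wavelength, Angstroms
WIDTH = 0.1           # assumed Gaussian sigma of the line, Angstroms
DEPTH = 0.5           # assumed fractional line depth


def dspectrum_dvelocity(wavelength):
    """Analytic dY/dv at v = 0 for the toy line Y = 1 - DEPTH * exp(-u^2 / 2)."""
    u = (wavelength - REST) / WIDTH
    dY_dcenter = -DEPTH * np.exp(-0.5 * u ** 2) * u / WIDTH
    return dY_dcenter * REST / C_LIGHT  # chain rule: d(center)/dv = REST / c


wavelength = np.linspace(4999.0, 5001.0, 201)
C = np.diag(np.full(wavelength.size, 0.01 ** 2))  # assumed independent noise

dY_dv = dspectrum_dvelocity(wavelength)

# One term per pixel; the terms sum to [dY/dv]^T C^-1 [dY/dv].
per_pixel = dY_dv * np.linalg.solve(C, dY_dv)
total = per_pixel.sum()

best = wavelength[np.argmax(per_pixel)]
print("most informative pixel is at wavelength", best)
print("fraction of information within 2 line widths of line center:",
      per_pixel[np.abs(wavelength - REST) < 2 * WIDTH].sum() / total)
```

In this toy, the line center and the flat continuum contribute essentially nothing, because the spectrum does not change there when the velocity changes; the information sits on the steep flanks of the line. One caveat: this clean split into per-pixel terms is unambiguous only when C is diagonal. With correlated noise the individual terms still sum to the total, but they can be negative and are harder to interpret.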

2016-08-14

walk or take the elevator?

I'm just generally excited about getting back into the classroom after a long sabbatical. I'm thinking about problem-set problems for the Physics Majors. Here's what's in my head right now:

NYC has had a hot summer, with most buildings running air conditioning on a thermostat continuously. To save energy, NYU (and other large entities in NYC) asked their employees to conserve energy in various ways, some of which we might take issue with. Here's an uncontroversial one: You should take the stairs, not the elevator.

But is that uncontroversial? What considerations are required to figure out whether this policy would reduce or increase energy consumption? Obviously—if you take the stairs—you use less elevator energy, but then you drop a metabolic load on the building air-conditioning. Which uses more power in the end? Use a combination of web research and simple physical arguments to make cases, and identify weaknesses in your argument as you change assumptions. Things that matter include: Neither humans nor elevators are 100-percent efficient delivery vehicles for potential energy (in fact, can you see a fundamental argument that elevators must spend more than 50 percent of their energy generating heat?). Elevators are heavy but counter-weighted. Some buildings have very busy elevators, so your contribution to the elevator load is only the marginal contribution; in other buildings you are typically the only person in the elevator. Air conditioning systems have efficiencies limited by fundamental ideas in thermodynamics, but are probably much less efficient than the limits. And so on!

Thanks to Andrei Gruzinov (NYU) for starting me thinking about this one.