Complex Square Root

February 14, 2008 – 1:45 am

Proof that 1 = -1:

 \frac{1}{\sqrt{i}} = \sqrt{\frac{1}{i}} = \sqrt{-i} = \sqrt{-1} \sqrt{i} = i \sqrt{i}

 \Rightarrow 1 = i \sqrt{i} \sqrt{i} = i^2 = -1

This function should be taken outside and shot.

Legendre Transform

January 5, 2008 – 4:19 am

The Legendre transform is a simple and useful tool in some branches of physics such as Thermodynamics and Mechanics. In my experience though, this transform is usually explained in a very confusing way.

I want to give my own derivation, which I hope is clear and straightforward. If you’re already familiar with the transform, you can skip straight to Derivation of the Legendre Transform.

Some Background

Let’s start with the definition. If you have a function f(x), its Legendre transform is a function g(p), where p is thought of as f(x)’s derivative. g(p) is defined as:

g(p) = xp-f(x)

This definition is already confusing, because x is considered a function of p. This function is found by solving the following equation for x:

p = \frac{df}{dx}(x)

To clarify matters, let’s consider an example. Take f(x)=x^2, then:

p=\frac{df}{dx}(x)=2x

x=p/2

The Legendre transform is then:

g(p) = x(p)p-f(x(p))=\frac{p}{2}p-(\frac{p}{2})^2=\frac{p^2}{4}

Okay, what is it good for? The usual explanation starts with some geometric construction like the one you see on Wikipedia(*). Then, to explain why the transform works, you are shown the differential:

df=\frac{df}{dx}dx=pdx

dg=d(px)-df=xdp+pdx-pdx=xdp

And the conclusion is that g is indeed a function of p, which is f’s derivative. Mathematically speaking, this argument is quite unconvincing, because saying that g is a function of f’(x) has no meaning. g is a function of some real number, and this number has no intrinsic connection to any derivatives.

One possible relation could be that g and f agree if you give the corresponding arguments:

g(\frac{df}{dx}(x))=f(x)

but this is not the case! Some better explanation is obviously needed.

Derivation of the Legendre Transform

We define as before:

p(x)=\frac{df}{dx}(x)

And we are looking for a function g(p). The crucial point is this: We require that g’s derivative will correspond to x:

x=\frac{dg}{dp}(p)

Rigorously, this means:

x=\frac{dg}{dp}(p(x))

Where p(x) is the function defined by the first requirement.

Now we have something we can use. We look at the function g(p(x)) and we calculate:

\frac{d}{dx}g(p(x))=\frac{dg}{dp}(p(x)) \cdot \frac{dp}{dx}(x) = x \cdot \frac{d^2f}{dx^2}(x)

Both requirements were used in the last transition. We now integrate this equation by dx:

\int \frac{d}{dx}g(p(x)) dx=\int x \cdot \frac{d^2f}{dx^2}(x) dx

Integrating by parts on the right-hand side:

g(p(x))=x \cdot \frac{df}{dx}(x) - \int \frac{df}{dx}(x) dx = xp(x) - f(x) + C

Thus g(p) is defined up to a constant, and choosing C=0 we get the familiar Legendre transform:

g(p)=xp - f(x)

Motivation

Finally, I want to comment on this second requirement that x=g’(p), which is how I came up with this derivation. In Thermodynamics, the important parameters of a problem always come in pairs: Pressure and Volume, Temperature and Entropy, Number of particles and the Chemical constant, and so on. They are paired by the First Law, which is conservation of energy:

dU = TdS - PdV + \mu dN + ~ \cdots

So here, the internal energy U is a natural function of S,V,N. Its partial derivatives are T,-P, and mu. Of course it can depend on other variables and there are other derivatives, but these are the ones that are easiest to calculate.

The Legendre transform is used to get new forms of energy that are naturally dependent on other parameters. This is done by switching between pairs of variables. For instance, if you know the temperature T instead of the entropy S, you can transform like this:

F = U - TS

F is called the ‘Helmholtz Free Energy’, and its natural parameters are T,V,N:

dF = -SdT - PdV + \mu dN + ~ \cdots

So T, which was a derivative before, is now a parameter. But equally as important, S is now a derivative. This is very important because it allows us to calculate S very easily. If we had a new function F=F(T,V,N) that wouldn’t allow simple calculation of S, it would be useless. This is what makes the Legendre transform so useful.

Having said that, we now see that we can define other transforms by changing this requirement. We still get functions that can be thought of as ‘functions of the derivative of f’. For example, require:

ax+b=\frac{dg}{dp}(p)

Where a,b are some arbitrary parameters. Then we get a new transform:

g(p) = a(xp - f(x)) + bp

Perhaps this method can generate some other useful transforms.

(*) I just saw that Wikipedia has something that is related to my explanation under ‘Another definition’, although it goes through a different route and still uses the geometric requirements.

Relativistic Bohr Model

November 10, 2007 – 4:20 pm

I recently needed to find the energy levels of hydrogen-like atoms (i.e. atoms with a single electron), taking into account relativistic effects. These effects become important for atoms with a large number of protons.

I only had a couple of days to do this, and I figured the easiest thing would be to create a relativistic version of the Bohr model. This is a toy model that predicts the energy levels of a non-relativistic hydrogen atom.

The first thing I guessed was that the basic assumption from the Bohr model regarding the angular momentum still holds. That is:

L=n\hbar

where n is some integer and \hbar is hbar.

Also, the electron should still travel in a circular orbit, so that:

L=rp

where r is the orbit radius and p is the linear momentum. It is easy to show that this is still correct relativistically (*).

Next we have the energy, which for a free particle now obeys the dispersion relation:

E^2 = p^2 c^2 + m^2 c^4

Our electron is in the electric field created by the nucleus, so this should be (**) :

E = \sqrt{p^2 c^2 + m^2 c^4} - \frac{Z e^2}{r}

Where e is the electron charge and m is its rest mass.

Let’s find the possible orbit radii. In the classical Bohr model this is usually done by calculating the force acting on the particle. But here we already wrote down this nice expression for the energy so we might as well use it… To do that, we will venture one last guess.

We assume that, given angular momentum L=n\hbar, the system chooses the radius where it has minimal energy.

You can verify for example that when you make this assumption in the non-relativistic version (instead of using forces and such), you still get the same result.

To get the orbit radii we then need to solve:

\frac{dE}{dr} = 0

which comes to:

r = \frac{L}{mc} \sqrt{ \left( \frac{Lc}{Ze^2} \right)^2 - 1 }

Plugging this into the expression for the energy we get the atom’s energy levels:

E_n = mc^2 \cdot \sqrt{1 - \left(\frac{Ze^2}{n \hbar c}\right)^2}

And there you have it. I have reason to believe this model works, at least partially, because I used it on measurements of large-Z atoms and it fit the data splendidly (while the non-relativistic model failed miserably). In addition, taking Z \rightarrow 0 recovers the non-relativistic energies, as expected.

Having said that, you may have noticed a problem in the last expression. For large enough Z, it is imaginary, and energy isn’t imaginary. How large a Z? Well if you’re a physicist may have also noticed that the coefficient is the fine structure constant:

\alpha = \frac{e^2}{\hbar c} \approx \frac{1}{137}

So for the ground state n=1, the largest possible Z is about 137. Above that, according to the model, the system is unstable.

You might be thinking that a more plausible explanation is that I made some mistake in my series of, uhm, guesses. That could be, but a quick look in the periodic table shows that it goes all the way up to… 118.

Well! Could this little toy model tell us something deep about the stability of atoms? Apparently it can. I googled for ’stability of large z atoms’ and came up with this review, where the instability is discussed. In page 10 it says:

The moral to be drawn from this is that relativistic kinematics plus quan-
tum mechanics is a ‘critical’ theory (in the mathematical sense). This fact
will plague any relativistic theory of electrons and the electromagnetic field
– primitive or sophisticated.

… we are on the ‘primitive’ side. ;)


(*) This can be verified by taking L=p_\theta and making a change of coordinates:

L = p_\theta = \frac{\partial x_i}{\partial \theta} p_{x_i}

Place the electron on the intersection of the X axis and the circular orbit to obtain the answer.

(**) This guess can be justified by writing down the Lagrangian for a relativistic charged particle (c=1):

L = - m u_\mu u^\mu - A_\mu J^\mu

And converting to a Hamiltonian. Taking the Hamiltonian to be the energy, one arrives at this formula.

Twin Paradox in a Closed Universe

September 21, 2007 – 3:14 am

If you’re familiar with the twin paradox, you can skip straight to the Closed Universe section.

Suppose you’re sitting in a car heading east at 100 km/h, while another car passes you by heading west at 100 km/h. According to special relativity, if you compare your clock with the clock of the passenger in the other car (let’s call her Alice), you will notice that her clock runs slower than yours.

The closer your relative speed is to the speed of light, the more noticeable the difference will be. For example, if your relative speed is 90% the speed of light, you will see Alice’s clock running about 2.3 times slower than yours. Of course, it’s not just the clock that will run slower — everything will run slower, including, for example, biological processes. Alice will age 2.3 times slower than you.

But what makes you so special? Nothing, really. Because according to Alice, it’s you whose aging slowly. So whose right? Well, the short answer is that you’re both right, because time isn’t an absolute thing — it depends on who’s measuring it.

This line of reasoning leads to the famous ‘twin paradox’: Two twins, Alice and Bob, are born on earth (if you must know then yes, it’s the same Alice). At birth, Alice is put aboard a spaceship that can reach extremely high speeds and sent off into space for a long trip. She reaches the edge of the galaxy, makes a U-turn, and returns to earth. This round-trip takes her 40 years in earth-time.

Meanwhile, Bob remains on earth. When Alice gets back, Bob is 40 years old. Because Alice traveled at high speed, Bob saw her aging very slowly, so when she gets back she is much younger than Bob, say 20 years old. But according to Alice, it’s Bob who aged slowly, so actually Alice should be older than Bob.

That’s the twin paradox, but it isn’t really a paradox. The problem is that in order to make this round-trip, Alice has to accelerate — she can’t go away and back again with constant speed. Therefore, the problem is no longer symmetrical: Alice is accelerating, Bob isn’t. To calculate exactly what happens in this case you need to use general relativity, but the answer is that Bob is right and Alice is wrong: Alice will have aged more slowly than Bob.

Closed Universe

I’ve been wondering, what happens to the twin paradox if the universe is closed? What I mean by ‘closed’ is that if you travel in a straight line, you end up back where you started. So the universe looks like a closed loop, only in three dimensions. (*)

In such a universe, the Alice twin can leave earth, travel with constant speed, and return to earth by simply going straight (**). Meanwhile, the Bob twin remains on earth like before. No one is accelerating, so the usual explanation of ‘Alice is accelerating and that’s cheating’ doesn’t apply here. Apparently, the situation is symmetrical: Bob thinks Alice is moving and aging slowly, Alice thinks Bob is. Whose right?

Well the answer is rather interesting. In special relativity we are used to saying that there are no ‘preferred frames of reference’ — that all inertial observers are the same, and that there are no absolute speeds, only relative ones. It turns out that in a closed universe this is no longer the case.

Consider the following experiment. Bob, who is on non-moving earth, simultaneously fires two photons at opposite directions. The photons travel around the universe and return to Bob, both at exactly the same instant.

Now let Alice perform this experiment on her spaceship. She sends out two photons, traveling at the speed of light. Let’s look at this experiment from Bob’s frame of reference, where the photons are still traveling at the speed of light (***). Bob sees the photons going around the universe, but meanwhile Alice is moving. So one photon will reach Alice before the other. Alice must observe the same thing (although they will not agree on the timing).

The same experiment, conducted by Bob or by Alice, produces different results. This is because Bob’s frame is special: It has zero speed, an absolute speed. Thus the symmetry between Bob and Alice is broken. Further calculations are required, but it turns out that Bob is correct: Alice ages less than Bob.

If you want more details, I found an interesting paper that discusses this paradox, and also develops the Lorentz transformations for such a universe.


(*) The mathematical term for this is actually ‘compact’, but that’s because mathematicians enjoy complicating things.

(**) You may be wondering how it’s possible to go in a circle without accelerating. You can try looking at it this way: In circular motion with constant speed, the vector of acceleration points inward toward the center of the circle. If you live in a 2-dimensional world, you can measure this vector because it points in a direction you can move in. For instance, you can hang a ball from a piece of string and watch the string stretch.

But if you live in a closed, 1-dimensional world, and you’re going in a circle, where is the acceleration vector pointing? It can’t point inward, because for you there’s no ‘inward’. In your 1D world there’s only front and back. As a consequence, you can’t measure this acceleration experimentally. And what you can’t measure experimentally doesn’t exist.

By the way, in terms of differential geometry, this is what geodesic curvature is all about: It’s the part of the acceleration that’s due to the shape of the manifold.

(***) This is one of the premises of special relativity — that light travels at the same speed for all observers. If you’re unconvinced it’s still true in a closed universe, consider the following. The metric of a cylinder is the same as that of a plane — a cylinder isn’t curved. Similarly, the metric of a closed, 1-dimensional universe is the same as that of an open one:

ds^2 = - dt^2 + dx^2

The constancy of light speed follows from the metric, and from assuming that the correct transformations (boosts) preserve the metric.

It’s like deja vu all over again

September 20, 2007 – 8:21 pm

6th grader raises $6.5 million in round A funding. The bubble’s back, folks.

Fun With PDEs - Part 2

September 17, 2007 – 6:42 am

Last time I described the first pitfall I encountered when solving a PDE — an inherent instability in the partial derivatives. This time I’ll talk about the second pitfall, which is simpler conceptually, but has wider implications for programming in general.

In my first implementation of the solution I used a simple method to calculate the derivative:

\frac{\partial f}{\partial x}[j] = \frac{f[j+1] - f[j-1]}{2 \, dx}

-
This is just the average of the right and left derivatives. I wanted to improve the accuracy of the derivative calculation because of various reasons, so I turned to Savitsky-Golay filters, which were also described in the previous post. Using this method you get a digital filter that you apply to your data. If you’re not familiar with digital filters, you can think of a filter as an array of numbers c_n, \, n=-l,\cdots,l which are applied to your function f as follows:

F[j] = \sum_{n=-l}^{l} f[j-n] * c_n

-
(I’m probably mixing up some of the signs here, but the intention is clear I’m sure). To calculate a derivative using S-G, you first apply their filter to your data, and then divide by the spatial resolution dx — the distance between adjacent points on the grid. It is worth noting that there is a highly efficient way of doing this calculation using FFT. For more details, see Numerical Recipes.

I implemented S-G and tested it using a simple sin(x) function, and it worked flawlessly. But after I inserted it into my main program, the simulation started spewing out strange results. After some debugging I discovered something very strange: Using S-G, the derivative of a constant function wasn’t zero, and in one case reached 10^{44}. How odd! Debugging further, I found that non-zero derivatives only happened when the initial function f had very large (and constant) values, on the order of 10^{22}, and dx was very small, about 10^{-10}.

To continue debugging, I dropped the fancy FFT convolution and switched to straightforward calculation — the one that’s shown in the last equation. This finally revealed the problem: When multiplying each value of the function f by the factor c_n and summing, some of the least significant bits are lost, because the numbers don’t all have the same exponent. So even though the calculation is accurate, the limitation of the computer’s double precision causes a loss of data. When the convolution is done, you’re left with a small value that’s not zero. But then to get the derivative you divide by dx, a very small number, and this shoots that small error through the roof. The smaller dx, the larger the derivative of the constant function! So once again, decreasing dx actually caused a larger error.

What helped me solve this problem was noticing that the S-G coefficients are anti-symmetric, i.e. c_{-n} = -c_n. Specifically, anti-symmetric coefficients have the same exponent. Therefore I changed the summing order to sum pairs of anti-symmetric factors:

F[j] = \sum_{n=1}^{l} (f[j-n] \cdot c_n + f[j+n] \cdot c_{-n}) + f[j] \cdot c_0

-
This solved the problem: Constant functions now had a zero derivative, because each term in the sum was exactly zero, even in double precision. And the happy consequence was that this fix also solved the weird simulation results I started with.

No More Time Waste

August 30, 2007 – 3:57 am

Lately I’ve been spending way too much time on tech news sites like reddit. Whenever I had some free time and thought about doing something productive like reading a book, I would end up gravitating toward one of these time suckers instead. Hours would pass. From time to time I’d learn something new, but most of it was spent on mind-numbing entertainment like lolcat. (my personal favorite btw).
Read the rest of this entry »

August 19, 2007 – 8:12 pm

It’s not that I’m so smart, it’s just that I stay with problems longer.

– Albert Einstein

Prisoners’ Escape - The Solution

August 13, 2007 – 12:30 pm

(you can find the original riddle here)

Let’s begin with a draft solution, one that doesn’t work but will start us in the right direction. And to keep things simple, let’s say all the tasks are done the minute the prisoners leave the meeting, so all they have left is to notify someone that they’re done.

Prisoner 1 is responsible for telling the guard once all the tasks are complete. We’ll call him the Leader. He keeps a counter called ‘num_done‘ which starts at 1 (denoting that prisoner 1 himself is done). When the leader enters the room, this is what he does:

if light is on:
    num_done = num_done + 1
    turn light off
else:
    do nothing
-

When any other prisoner enters the room, this is what he does:

if light is off and not turned_light_on:
    turn light on
    turned_light_on = true
else:
    do nothing
-

And each such prisoner maintains a flag called turned_light_on, initialized to ‘false’.

At the end of this, num_done should show the number of prisoners whose task is complete. And when num_done = n, prisoner 1 will notify the guard.

The problem is that this algorithm has an off-by-one error. If the light was originally on before any prisoner entered the room, then when the leader enters the room he will count that on-light as one prisoner. Therefore he’ll reach a count of n while only (n-1) prisoners reported in.

To solve this problem we need to modify the algorithm a bit, to handle the possible initial states of the light switch. We want to give the leader a chance to see the initial state before anyone else touches it. So the other prisoners start by doing nothing, until they get some sort of signal from the leader. This signal will be ‘flicking the switch’: They first want to see the light turned on, and then, when the time is right, they will turn it on themselves.

So now, each prisoner other than the leader maintains another flag called saw_light_on, which starts as ‘false’. The algorithm is:

if light is on:
    saw_light_on = true
else if saw_light_on and not turned_light_on:
    // light is OFF
    turn light on
    turned_light_on = true
else:
    do nothing
-

Next, the leader maintains an extra flag: saw_initial_state initialized to ‘false’. Her algorithm is:

if saw_initial_state:
	if light is on:
	    turn light off
	    num_done = num_done + 1
	else
	    // flick the switch to signal the others
	    turn light on
	    num_done = num_done - 1
else:
	// This is the first time the leader enters the room
	turn light off (if it's on)
	saw_initial_state = true
-

Note that the leader needs to continue flicking the switch forever. Otherwise, latecomers will never turn the light on. Also note that the leader must discount for his own flicking by decrementing num_done. Finally, as before, when num_done = n, the leader notifies the guard and the prisoners escape. There you have it!

Prisoners’ Escape

August 1, 2007 – 9:53 pm

In an effort to make this blog more interactive, I’m switching the riddles to a new format. I’ll first post only the riddle, without the solution, so you guys can have a crack it. After a week or so, I’ll post the solution. Here goes…

n prisoners are planning an escape. They bribed a guard to let them all meet once. Each prisoner was given a task to complete, such as digging or stealing some key. Once all the tasks are complete, the prisoners will notify the corrupt guard who will turn a blind eye while they escape. The problem is, the prisoners can’t talk to each other because they are all in solitary confinement.

Once a day, a prisoner chosen at random enters a storage room that has a single light bulb in it. The prisoner can either flick the switch (turning the light bulb on or off), or leave it as it is. At the time of the initial meeting, as they plan their strategy, the state of the switch is unknown.

Using only switch flicking, how can the prisoners coordinate their escape? Specifically, one of the prisoners must know at some point that all the tasks are done, so he can notify the guard and lead them all to freedom. If the guard is notified before all tasks are complete, the plan will fail and the prisoners will be caught.

(ready for the solution? here it is)