Sunday, February 21, 2016

Remove your own LIGO noise

I have finally played with the raw data from the first LIGO gravitational wave, GW150914. Everyone who can code is invited to do the same. It's a lot of fun.

Although I had to learn some Python for the Higgs Kaggle contest, my preferred tool is Wolfram Mathematica – whose full online version was just completed a week ago.

What should you do to evaluate the LIGO raw data yourself? Download all the datasets and Python code from the official LIGO tutorial page. Or follow Nicholas Lincoln's blog – he is a data analyst who has no official physics background, as far as I understand.

Or you can continue with me and I will try to make the things really concise.

The sound of the merger I can hear sounds almost just like the heartbeat around 0:01 of this Lenka's song. BTW Lenka would celebrate name day today if she were still Czech. I am deeply aware of this fact because I had met a pretty canonical holder of that name on her name day, February 21st, but in 1995. She is German these days (much like the singer is Australian) but with the name like Lenka, it's hard to claim that you are not Czech.

Spoiler: I got a beautiful "heartbeat" sound from both detectors, pretty much shifted by 0.0073 seconds (30 sampling points, as I will explain), and with the nearly opposite phase. My estimate of the actual confidence level is way higher than 5 sigma. It would be interesting if you tried to give me your own opinion about the significance of the signal.

For your convenience, I created simplified copies of the raw data from both detectors:
The files are posted at Dropbox. I recommend you to create your own 2 GB Dropbox cloud if you don't have one.

Each one-megabyte text file contains \(32\times 4,096=131,072\) lines with simple integers comparable to 1 million (with either sign). They are the "strain" data describing the "length difference" of the two LIGO arms during a 32-second window, sampled 4,096 times a second (the sampling frequency is 4,096 Hz, we say), and it's in the units of \(10^{-24}\) meters. (So to create the files, I divided the original numbers by \(10^{-24}\) and rounded them.)

So far, it looks simple, right? Among the 131,072 entries in each file, the black hole merger is close to the middle, more precisely around 67,250. Let me tell you in advance that the phases are approximately opposite to each other, the Washington signal appears about 0.0073 seconds (about 30 sampling points) after the Louisiana signal, and the Washington signal is a bit stronger.

Now, if you play the raw "sound" directly, it will contain too much noise. You won't hear anything. You will Fourier transform it and apply a filter – to pick frequencies between 50 Hz and 250 Hz (but the filter should be sort of smooth as a function of the frequencies). You try to play it and it's still noise – well, you will hear some resonant whistling at preferred frequencies.

The filter wasn't enough! You need to remove this "predictable" whistling – which represents the part of the signal with a high amount of autocorrelation. How do you do that? Compute the Fourier transform of the signal \(|\hat S(\omega)|^2\), average this function of the frequency over nearby frequencies, take the square root again, divide the original \(\hat S(\omega)\) by this square root, and multiply it by \(1/\omega\) to get a realistic behavior at low enough \(\omega\) again.

This removal of the highly autocorrelated part of the data is known as whitening.

Or do whatever your common sense tells you to do! At the end, you should get rid of the noise – too high and too low frequencies; the highly autocorrelated whistling etc. And when you play this "clean" version of the strain data, you should clearly hear a heartbeat at the place that I indicated.

When you have it, you should try to combine the Louisiana and Washington data to maximize the strength of the signal and compare its intensity to the noise. Estimate the probability that such strong and highly similar heartbeats appear by chance. I haven't quite estimated my own value of the confidence level but I am confident it will be far higher than 5 sigma, even after the look-elsewhere correction is applied.

You know, the possible delay is between minus 10 milliseconds and plus 10 milliseconds – between –40 and +40 sampling points. Add a factor of two for the possible flip of the relative phase. The look-elsewhere effect is increasing the tiny \(p\) from the \(p\)-level at most by 160 which is negligible. But when I combined the two signals (with the shift and the inversion), and normalized it so that this function has the usual standard deviation of one, I got a signal that exceeded 6 sigma at least at three independent enough places. So I think that the local significance level is close to 6 times the square root of three which is over 10 sigma. The probability is some \(10^{-24}\) or whatever it is, or \(10^{-22}\) when you reduce it by the two orders of magnitude. So a signal of this magnitude appears once in \(10^{-22}\) cycles or \(10^{-17}\) seconds or 10 billion years or so.

Maybe my calculation is naive but it qualitatively reveals my certainty that this can't be a statistical noise – such "bad luck" has occurred once since the Big Bang. It's much more likely that it's a consequence of LIGO's hacked computers. Computers are hacked much more frequently than once per 10 billion years.

I haven't created a Mathematica notebook that would be representative but you may look at one linked to in this sentence. Fourier[] and ListPlay[] are the most nontrivial Mathematica functions for this kind of game. And I am sure I could get a much cleaner signal producing a higher significance level, too.

My notebook will only be useful to an intelligent user. A part of it has to be run twice, one has to remember the Louisiana and Washington results in two variables, and they're used later. I guess that Mathematica will be useless for a user who can't do these things herself, anyway. (Update Sunday night: I made the notebook runnable without any manual adjustments.) Maybe such users may prefer a PDF printout of the notebook, anyway. Funny, you need to learn Persian or Arabic to read it. Is Stephen Wolfram promoting multiculturalism? ;-) I honestly did nothing else than "save as PDF". (Update: It's actually a glitch in the Dropbox PDF viewer, the PDF looks fine e.g. in Chrome when downloaded.)

After various adjustments of the filters and whitening parameters and combinations of the two detectors, this is probably the best reconstructed signal I could get. It's included in the updated PDF file linked to above. The strain signal is normalized so that the standard deviation is "1" on the \(y\)-axis. Nevertheless, the signal locally reaches 8-10 sigma at three places! Also, you may see that the signal actually looks more (quasi-)periodic before the merger than after it. It actually looks like I can almost clearly see a whopping number of seven orbital periods before the peak of the merger! ;-)

No comments:

Post a Comment