Wednesday, July 06, 2011

I was wrong: DjVu may have created all Obama certificate anomalies

During the most recent 24 hours, I spent a few hours looking for additional anomalies in the long-form Obama certificate. And I spent a very similar amount of time looking for alternative explanations - i.e. methods to create a multi-layered PDF file with bizarre properties similar to those that have been observed in the official PDF file.

See my previous dramatic article about this issue.

And I think that both lines of "research" have been successful. While I have found many new anomalies in the PDF file, I have also found a possible answer - something that may actually be the only possible answer. It's called DjVu. Let me credit a Russian guy on YouTube with the original discovery of the DjVu explanation: he posted his video more than 2 months ago but it still has only 150 views - a very small number given the controversial character of the PDF file.




Another event that energized my search for non-forgery explanations was that I discovered - via an Internet forum that I forgot - a better scanned copy of the long form birth certificate:



The maximum resolution that you get if you click the JPG image above is 1335 x 1600 pixels and the file size is 311 kilobytes. Despite the apparently higher quality, it's actually a smaller file than the 376 kB greenish "compressed" White House PDF file.

The size of the JPG picture above is limited by the maximum dimension of pictures on Picasa Web, 1600 pixels, but I actually possess a 1669 x 2000 pixel file whose file size is 897 kB. I added the link so that you may download it, too.

Now, how did they create the silly green PDF file (they may have printed the monochromatic picture above on greenish paper and scanned it again) with all the layers that have different pixel sizes and different color depths, and with all the exact, pixel-by-pixel clones of the letters?

The pixel-by-pixel identical features of the image look like the result of a man-made copy-and-paste procedure. However, an alternative explanation was named by Tobias - it's called JBIG2. It's a compression standard for bitonal images whose encoders look for almost perfect copies of objects within the image and store them just once, which may save many bytes.

It wasn't used cleverly here. But if the PDF file is not a forgery, and the detailed JPG file above indicates that it isn't, the clones have a natural source: there exist programs that perform this lossy procedure automatically and represent similar objects in the image as identical clones of a single one.
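To make the idea concrete, here is a minimal Python sketch of such a "match and substitute" step. It is only a toy under stated assumptions, not the actual JBIG2 or DjVu encoder: glyphs are tiny hand-written bitmaps, and the names `encode`, `decode` and the threshold `max_diff` are hypothetical. It shows how a lossy encoder that tolerates a one-pixel difference turns two slightly different scans of a letter into exact clones.

```python
# Toy sketch of JBIG2-style pattern matching and substitution.
# Glyphs are tiny bitmaps given as tuples of equal-length strings ('#' = ink).
# All names and the threshold are hypothetical; this is not a real encoder.

def same_shape(a, b):
    """True if both bitmaps have identical dimensions."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))

def hamming(a, b):
    """Number of differing pixels between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def encode(glyphs, max_diff=1):
    """Return (dictionary, indices). Each glyph is replaced by the first
    dictionary entry that differs from it by at most max_diff pixels."""
    dictionary, indices = [], []
    for g in glyphs:
        for i, proto in enumerate(dictionary):
            if same_shape(g, proto) and hamming(g, proto) <= max_diff:
                indices.append(i)          # reuse the stored shape (lossy)
                break
        else:
            dictionary.append(g)           # genuinely new shape
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decode(dictionary, indices):
    """Rebuild the glyph sequence from the dictionary."""
    return [dictionary[i] for i in indices]

E1 = ("###", "#..", "###", "#..", "###")
E2 = ("###", "#..", "##.", "#..", "###")   # same letter, one pixel of noise
T1 = ("###", ".#.", ".#.", ".#.", ".#.")

dictionary, indices = encode([E1, T1, E2])
decoded = decode(dictionary, indices)
print(len(dictionary), indices)            # two stored shapes for three glyphs
print(decoded[0] == decoded[2])            # the two E's are now exact clones
```

After decoding, the second "E" has lost its noisy pixel and is pixel-identical to the first one, which is exactly the kind of clone that looks like copy-and-paste to a human inspector.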

DjVu is arguably the software most likely to have been used to create a multi-layered image of this kind. DjVu is a very efficient format - with its own file suffix, much like PDF - for storing scanned books etc. The files come in really small sizes and they are displayed incredibly quickly.

Wikipedia informs us that DjVu really uses JB2, a method analogous to JBIG2, to search for nearly identical objects in the document, like letters in the same font, in order to save space. Various documents on the Internet indicate that DjVu may try to work with many components of the picture even though the main separation is the background-foreground separation.

The page What is DjVu on the DjVu.org website dedicated to the format tells us many more details. Indeed, DjVu separates the document into the background, which is a low-resolution, JPG-like picture with the paper textures and images (DjVu actually compresses it with its own wavelet codec, IW44, rather than JPG), and the foreground, which is a high-resolution, typically monochromatic text. It wasn't done nicely in the White House PDF file but the description may agree.
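The separation itself can be illustrated with a short Python sketch. This is a toy under loud assumptions: the image is a small grid of grey levels, the threshold `96` and the tile size `block=2` are hypothetical illustration values, and the real DjVu segmenter is far more sophisticated. The point is only that the text ends up in a full-resolution 1-bit mask while the paper ends up in a coarser, averaged layer.

```python
# Toy foreground/background separation in the spirit of a layered format.
# The image is a list of rows of grey levels 0..255. The threshold and the
# block size are hypothetical; this is not DjVu's actual segmenter.

def separate(image, threshold=96, block=2):
    height, width = len(image), len(image[0])

    # Foreground mask: full resolution, 1 bit per pixel (dark ink = 1).
    mask = [[1 if px < threshold else 0 for px in row] for row in image]

    # Average of all paper pixels, used for tiles fully covered by ink.
    paper = [px for row, mrow in zip(image, mask)
             for px, m in zip(row, mrow) if not m]
    fallback = sum(paper) // len(paper) if paper else 255

    # Background: one averaged value per block x block tile, computed only
    # from non-ink pixels so that the text does not bleed into the paper.
    background = []
    for y in range(0, height, block):
        out_row = []
        for x in range(0, width, block):
            tile = [image[j][i]
                    for j in range(y, min(y + block, height))
                    for i in range(x, min(x + block, width))
                    if not mask[j][i]]
            out_row.append(sum(tile) // len(tile) if tile else fallback)
        background.append(out_row)
    return mask, background

image = [
    [200, 204,  10, 198],
    [202,  12,  14, 200],
    [196, 200, 204, 200],
    [198, 202, 200, 196],
]
mask, background = separate(image)
for row in mask:
    print(row)          # 4 x 4 bitonal layer
for row in background:
    print(row)          # 2 x 2 low-resolution layer
```

In this toy model, any stroke that is too faint to fall below the threshold is classified as paper and gets smeared into the low-resolution layer, which is one way pieces of text could end up in a background.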

If you want to try all these things with me, go to the DjVu download page and download WinDjView and/or the DjVu browser plugin. Both can display DjVu files.

You may try to download some free DjVu files via BitTorrent. Be careful to avoid copyrighted documents because they may be offered illegally and you may become a criminal by downloading them.

Now, you need some software to actually create DjVu files if you really want to check that it works. There is a collection of command line executables in DjVuLibre but it's a mess to figure out what they're doing and which one is the right one. I was actually able to convert some files with them but I am not sure they had the right properties. ImageMagick should also be able to work with DjVu files but I couldn't make it work.
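For readers who want to experiment with the DjVuLibre executables anyway, here is a hedged Python sketch that merely builds the command lines for two of them. The assumptions are stated loudly: `cjb2` is taken to be the bitonal JB2 encoder and `c44` the wavelet encoder for photos, the `-dpi` and `-lossy` options are written as I recall them from the DjVuLibre manual pages (check `man cjb2` and `man c44` before relying on them), and the file names are hypothetical. The layered "scanned" and "clean" modes need a segmenter and are not covered here.

```python
import shutil
import subprocess

def djvulibre_command(mode, src, dst, dpi=300):
    """Build an argv list for a DjVuLibre encoder (assumed option names).

    mode "bitonal": cjb2, which expects a bitonal PBM or TIFF input
    mode "photo":   c44, which expects a PNM or JPEG input
    """
    if mode == "bitonal":
        # -lossy lets the encoder substitute nearly identical shapes.
        return ["cjb2", "-dpi", str(dpi), "-lossy", src, dst]
    if mode == "photo":
        return ["c44", "-dpi", str(dpi), src, dst]
    raise ValueError(f"unknown mode: {mode}")

def run_if_available(argv):
    """Run the command only if its executable is on PATH."""
    if shutil.which(argv[0]) is None:
        return False
    subprocess.run(argv, check=True)
    return True

# Hypothetical file names; nothing is executed by building the list.
cmd = djvulibre_command("bitonal", "certificate.pbm", "certificate.djvu", dpi=600)
print(" ".join(cmd))
```

Building the argument list separately from running it makes it easy to print and inspect the command before anything touches the files.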

Instead, go to the DjVu download page and download DjVu Solo, a Windows program. It worked for me and for the first time in 24 hours, it convinced me that it's more likely that the PDF file is the result of an automatic procedure and not a carefully crafted forgery.

I started DjVu Solo and opened the 1669 x 2000 image. It appeared on the screen. Then I clicked File / Open as DjVu and increased the resolution from 300 dpi to 600 dpi. Then I could choose scanned/clean/photo/bitonal. Bitonal produces black-and-white images only while photo produces huge images (by their file size).

However, with scanned and clean, I got amazingly small files:

Compressed birth certificate
The page above offers you the two files, scanned and clean, and their sizes are 22 kB and 20 kB, respectively! That's quite a compression.

Download the files above and open them with DjVu Solo. They will look pretty, just like the original JPG file - but at only about 20 kilobytes. Note that the original JPG files had 311 kB (the 1335 x 1600 version) and 897 kB (the 1669 x 2000 version).

Also, DjVu Solo allows you to see the background and the foreground separately (under View/Display). The former is a fuzzy colorful JPG-like picture while the latter is a pixel-exact, monochromatic, black-and-white GIF-like bitmap, just like in the layers of the White House PDF file. More importantly, you may see that there are pixel-identical letters in it.

Look at the title:




You may see that the first three copies of the letter "I" are identical to each other. All three letters "E" in the title above are also identical to each other. Both "C" letters are identical, too. And all three "T"'s. And both "R". And both "F". So I have obtained an even higher concentration of clones - which is a part of the reason that my image is much more compressed than the White House image.
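Spotting such clones by eye is tedious, so here is a minimal Python sketch of how the check could be automated. It assumes the letters have already been cropped into small bitmaps, which is the hard part and is not shown; the function name `find_clones` and the sample bitmaps are hypothetical.

```python
from collections import defaultdict

def find_clones(glyphs):
    """Group (label, bitmap) pairs by exact bitmap and return every group
    with more than one member, as lists of (position, label)."""
    groups = defaultdict(list)
    for position, (label, bitmap) in enumerate(glyphs):
        groups[bitmap].append((position, label))
    return [members for members in groups.values() if len(members) > 1]

I_A = ("#", "#", "#")
I_B = ("#", "#", ".")        # a slightly different scan of the same letter
C_A = ("##", "#.", "##")

# Hypothetical sequence of cropped letters from a title.
title = [("C", C_A), ("I", I_A), ("I", I_A), ("C", C_A), ("I", I_B)]
clones = find_clones(title)
for group in clones:
    print(group)
```

In an untouched scan, even two letters from the same typewriter would almost never match pixel by pixel, so every group printed by such a check points either to a copy-and-paste or to a shape-sharing encoder like JB2.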

My background doesn't include any idiotic portions of the text because I am not a pathetic amateur resembling the incompetent folks in the White House. But I can still see that the effect could arise if the DjVu encoding procedure is not handled properly.

I don't want to waste your time and mine too much with high-resolution pictures but the background looks like this:



while the foreground looks like this:




It contains the nicely black-and-white, text-dominated content at a higher resolution. In the White House file, this foreground was arguably split into several GIF-like pieces because the DjVu program was running with more complicated arguments. I haven't been able to reproduce that but some sources indicate that DjVu may normally produce many foreground layers.

So it seems plausible to me right now that the seemingly bizarre features were created by a compression algorithm. A badly done conversion to DjVu seems to be the best candidate for what happened to the scanned image before they released the horrible-looking PDF file.


snail feedback (2) :


reader scottlocklin said...

Why don't you ask Yann and Leon? They wrote DjVu, and they answer their emails on a regular basis.


reader bm2 said...

The White House PDF has more anomalies. For example, in the signature of the mother, Dunham, in box 18a, the "D" is in greyscale while "unham" is black-and-white. Even the most stupid of Obama's lackeys could not have overlooked that, or rather would have had no reason to botch it in this way, so I lean toward the view that this anomaly, too, was created automatically by the software as it tried to increase the compression by means of object recognition.

Paradoxically, though, the lines of the boxes are greyscale while the handwritten text, or parts of it, is black-and-white.