Saturday, September 30, 2017

Private keys, RSA, digital signatures, blockchain: rudiments of cryptography

The Bitcoin cultists completely misunderstand economics – and the magic of the (fiat) money – but I've noticed that most of them completely misunderstand the point of the blockchain itself which is a pity because it's a clever idea, indeed (although not a terribly useful one).

There are some basics of cryptography that aren't really taught but people should understand them. In practice, most people don't even get why anything should be crypto-, why there should be any keys, and what all these things are good for. Do you have a relative who believes that it's fine for his or her Google account to be unprotected and for the password to be basically public? I do! ;-)

Passwords, encryption

But the reader is expected to understand what passwords are good for. You may need to be sure that only you – or people who know the password – access your files. They contain sensitive material that allows you to manipulate your wealth (you don't want everyone to read the message "dear son, I have digged the bricks of gold under the 34th tree next to the crossing at the Niggerville forest"), that stores the information about your private life, private parts of your body and soul, and other things.

Passwords may be employed so that the operating system prevents you from logging into the computer unless you type the correct password. However, the hard disk still contains the files and there could be a way to avoid the operating systems and get to the content of the files, anyway. So files may also be encrypted. It means that they're transformed in some way that depends on the password. It's important that the resulting encrypted file doesn't allow an attacker to guess the password or the unencrypted file, not even with a reasonably huge amount of CPU power and time.

The encryption must be "irreversible in practice" in the sense that it must be vastly easier to encrypt the file or message, assuming a given key (password), than to decrypt it. In the context of computational complexity and cryptography, people talk about one-way functions rather than "irreversible ones in practice".

OK, to encrypt a file with a password isn't too hard. Periodically add the password to the bytes and then process it by several more steps like that which look complicated and nonlinear but which can be reversed with the knowledge of the password. So the computer may decrypt the file when it's told the right password but it can't guess the right password. It's important that you don't end up with some simple linear methods of encryption. For example, if you just periodically added the letters of the password to the bytes of your file waiting to be encrypted, the encrypted file could contain "KONVALINKAKONVALINKA" instead of "@@@...@@@" for every sequence of zeroes in the file. So the password would directly show up in front of your eyes.

I chose the password "KONVALINKA" which means "LILY OF THE VALLEY" because it was the password of a public Sinclair ZX Spectrum text game "Robots' City" in Czechoslovakia of the mid 1980s. I displayed the file after it was loaded from the tape and there was a "KONVALINKA" close to the beginning. I tried that password and it worked! So I could have played the game 2 weeks before the official start when the password ("KONVALINKA", indeed) was officially announced in the media. Well, I was far from being the only smart kid at that time. ;-) At the end, the prizes were given to successful solvers of the game as a lottery, not according to the precise time when they completed the game, as originally promised.

But if you do something slightly more complicated, it's believed that there exists no feasibly fast way to determine the password from the encrypted file. There's no publicly known fast algorithm to do such things – and it's believed that there's no mathematically possible algorithm waiting to be discovered, either. But almost none of these crucial statements have been proven. Most of the belief that they don't exist is mainly due to the failure of all the researchers to find them so far.

OK, so the concept of "encryption by a password" is relatively straightforward. Things get a bit more complex when we start to talk about private keys and public keys. How do those things differ from the simple password and what are they good for?

Public keys, private keys, digital signatures

The purpose of the encryption using public keys and private keys is to enable some "limited, read-only access" to certain people. You want someone to be "in between" the actual owner who knows all the passwords and may do everything and create all the correctly encrypted files and messages; and those who don't know the password at all and who only see the encrypted gibberish that is useless to them.

In particular, you want the owner of the file to be able to encrypt everything as well as the special "read-only access" people to be able to decrypt a file or a message. But you don't want these people to be able to create the encrypted content by themselves. This is clearly important for digital signatures.

In the real world, the declaration of war against North Korea is simple. The U.S. president knows his Twitter password, logs into Twitter, and writes a message, e.g. "I will tear your aß to pieces tonight, you nutty little rocket man", and the nuclear war basically starts. But if you wanted a world where fake Trumps can't ignite the nuclear war ;-), you would have to introduce checks and balances that prevent others from pretending that they're Trump – but his messages must still be readable by others, otherwise they couldn't obey the orders.

So the digital signature is something that Trump may attach to the message and that plays the same role as regular signatures – something that no one is able to fake, assuming all the unproven no-go theorems, in a feasible amount of time.

A digital signature must have the "right content" – so when it's "decrypted" in a certain way, the result must be the right thing. But at the same way, the people who may verify that it's the right thing must be unable to write their correctly encrypted messages themselves – because you don't want them to be able to start the war. OK, how does it work?

You need a scheme in which Trump, the ultimate user, possesses the "public key" as well as "private key". The private key is necessary and sufficient for encrypted messages; the public key is sufficient for decrypting them. The ultimate user, Trump, has both keys. The generic "read-only users" know the public key and they may decrypt the messages and/or verify them come from the right person, Trump, but they can't create such valid messages. How do you achieve it?

Prime numbers, RSA

One always uses some magic of number theory – the part of mathematics about prime integers, divisibility, and various Fermat-like theorems. The most famous – and the simplest known example, in some sense – is the RSA cryptosystem invented by Clifford Cocks, a colleague of James Bond in the U.K., in 1973. However, that discovery was only declassified in 1997 so before that moment, three men with the surnames R,S,A could have patented basically the same thing in 1978. If you want clever algorithms to be named after you, you shouldn't work for the British intelligence.

How does this RSA system of public keys and private keys work? The basic observation is that "powers of an integer computed modulo \(n\), another integer" (modulo \(n\) means "replace every result by the remainder of its division by \(n\)") simply have to be periodic functions of the exponent. Why? Because in the \(\ZZ_n\) cyclic group, there are just a finite number of possible results of the exponentiation, namely at most \(n\). So after some finite time (number of iterations), you have to get to the same value. The powers \(x^y\) modulo \(n\) for a fixed \(x\) and increasing \(y\) remain "maximally diverse" if all numbers from \(1\) to \(n-1\) in \(\ZZ_n\) are alternating in some permutation. Note that \(0\) in \(\ZZ_n\) cannot alternate because once some power is \(0\) modulo \(n\), all the higher powers will be \(0\) modulo \(n\), too.

Let's look at powers of numbers modulo \(n=3233 = 61\times 53\), just for fun. If you choose a base \(x\) that isn't co-prime with \(3233\), only some numbers in \(\ZZ_{3233}\) will appear among the powers \(x^y\) – all of these powers will be multiples of the greatest common divisor of \(x\) and \(3233\). In particular, you won't be able to find the multiplicative inverse of \(x\) in \(Z_{3233}\).

However, if the base \(x\) is co-prime with \(n=3233\), i.e. the greatest common divisor of \(x\) and \(3233\) is \(1\), all the numbers \(1\dots 3233\) (the zero is excluded) will be appearing among the powers \(x^y\) and you will be able to find the multiplicative inverse of \(x\) in \(\ZZ_{3233}\). Now, what is the periodicity \(\Delta y\) of the powers \(x^y\) modulo \(n\) for a fixed \(x\) in that case?

It's simple to find a periodicity. All the numbers in \(\ZZ_n\) that are co-prime with \(n\) have to appear there. The non-co-primes have to be excluded because they would shrink the diversity but all the others are there. The number of such numbers is known as Euler's totient function (Eric Weinstein's review). How many numbers in \(\ZZ_{3233}\) are co-prime with \(3233\)? How much is the totient?

Well, for numbers of the form \(n=pq\) like \(3233=61\times 53\), it's simple to count them. It's all the \(3233\) numbers except for one zero, except for \(60\) nonzero multiples of \(53\), and vice versa, i.e. except for \(52\) nonzero multiples of \(61\). In total, it's \(3233-1-60-52=3120\) numbers. In Wolfram Mathematica, you get this result through EulerPhi[3233].

So we're guaranteed that the 3120th power of every integer modulo \(3233\) is equal to one! However, \(3120\) isn't the lowest such exponent. The lowest exponent is known as Carmichael's function or reduced totient and in this case, the actual minimum periodicity of the powers is \(\lambda=3120/4=780\). (It wouldn't really hurt if I were using the original, Euler's totient rather than Carmichael's reduced totient but I will use the reduced one.) In the relevant, somewhat general case of \(n=pq\), the Carmichael's totient function of \(n\) is the least common multiple of \(p-1\) and \(q-1\) (i.e. it equals Euler's totient divided by the greatest common divisor of \(p-1\) and \(q-1\) and the greatest common divisor of \(52\) and \(60\) happens to be \(4\) here).

Now, RSA is straightforward but very clever. Pick a number \(e\), the public key, that is co-prime with \(\lambda=780\), for example \(e=17\) (\(e\) may be chosen to be prime, in that case it's enough to demand that it's not a divisor of \(\lambda\)). Because it's co-prime, you may find its inverse, the private key, \(d\) in \(\ZZ_\lambda\). In our case, the inverse is \(d=413\) because\[

17\times 413 = 9 \times 780 + 1.

\] That's great. \(413\) is the private key and \(17\) is the public key. How do we encrypt messages? Take a message – a file – and interpret it as a large integer \(m\), for example by reading the file as a number written in the base-256 system. Donald Trump knows the private key \(413\) so he may compute the encrypted message as the "ciphertext"\[

c = m^d = m^{413}\,\,{\rm modulo}\, 3233

\] and send it to his employees. His employees may exponentiate the text modulo \(3233\) again, using the public key exponent \(e=17\), and they get\[

m = c^e = c^{17}\,\,{\rm modulo}\, 3233 = m.

\] Why did they get the original message? Because \(c^e = m^{de}\) but \(de=17\times 413\) is equal to a multiple of \(780\), the periodicity, plus one. So the power of \(m\) modulo \(n=3233\) is the same as the first power, namely \(m\). It just works! Everyone computes modulo \(n=3233\) so they must know \(n=3233\) – it's a part of the private key as well as a public key. In the text above, I could have calculated \(d,e\) from that \(n=3233\) because I knew the decomposition of \(n\) to primes, \(3233=61\times 53\). But if you don't know that, you probably can't determine (in practice) the private key \(d=413\) from \(n=3233\) and from the public key \(e=17\).

Note that the primes \(61,53\) were used to invent the public key and private key in the first place. Only Trump – or his computer and/or his secretary and IT staff – know these primes. But they don't have to be considered a part of the private key. Only \(n=3233\) and \(d=413\) are used to encrypt the message \(m\). You may consider the private key "413th power" of the message to be nothing else than some sort of 17th root within \(\ZZ_{3233}\) – the inverse operation to the 17th power. But it's hard to compute the "roots" in \(\ZZ_{3233}\) if you don't know the prime decomposition of \(3233\).

So you see that the mechanism involves two equally good primes \(\{p,q\}=\{61,53\}\) – they play a completely symmetric role in this scheme – and another pair of dual exponents \(d,e\), in our case \(413,17\), that are used to encrypt and decrypt the messages. The exponents \(d,e\) are the private key and the public key, respectively, and they're determined from \(p,q\) by the totient logic above.

In practice, the integers \(p,q\) and therefore \(n=pq\), \(\lambda\) and \(d,e\) have to be chosen larger. The procedures that are needed to encrypt and decrypt may be done using elementary algorithms rather quickly – in particular, the exponentiation modulo something isn't too hard, even if the base and the exponent are huge numbers. But the decoding of \(d\) from \(e,n\), just like the decoding of the prime factors \(p,q\) from \(e,n\), is probably impossible to do in practice (too time-consuming).

Excellent, it's over. Back to the signatures.

I guess that many of you have either known it or you have gotten lost, anyway. At any rate, there exists at least one pair of algorithms that allows the owner to write encrypted messages using his private key, that allows everyone to decrypt the messages using the public key, but the public key isn't sufficient for authoring the encrypted messages.

When you send the data to your partner, you want her to read what you sent, but not be able to author the encrypted messages herself. And you want everyone else to see just gibberish. So you may simply exponentiate your message to her public key, even though you're the sender. Modulo her \(n\). She exponentiates the encrypted message onto the power of her private key exponent and decrypts the message in this way. No one else can do it. So imagine that this is the kind of encryption that happens when one server sends the data to another using HTTPS even though the practice is a bit more complex because some of these operations would still be too slow in practice.

Signatures are simple, too: the hash code of a message (some shortened, one-way function of the message content) is exponentiated to the private key exponent of the sender. Others may exponentiate it to the public key exponent, compare with the hash code of the message, and see that the author of the signature is authentic and really authored the message. Great. These tricks are behind our ability to send the data through the Internet in such a way that all the people who are listening in between will only see gibberish. Think about it. It works, it seems safe, and it is very useful in many situations where you need to protect the information and/or someone's authority.


Now, how does the Bitcoin of Satoshi Nakamoto work? What does it bring? How do the payments work, what is the role of encryption or digital signatures, and what is the problem that the Bitcoin has solved?

When Tony sent me some fractional amount in the Bitcoin (which is nevertheless a lot of money due to the irrational Bitcoin singularity), he basically produced the following letter:
Hi, my name is [encrypted secret version of the name Tony],
by this message, I confirm that I donate A Bitcoins to [Motl's Bitcoin address].

I am really more generous than anyone else and to prove that this statement was made by me, [encrypted secret version of the name Tony], here is my signature:

[Signature obtained from the hash of the message above, by exponentiating it to some Tony's private key exponent]
The brackets make the letter a little bit abstract and mathematical but the human content of the letter isn't too different from a cheque. A cheque would say that "Tony pays some money to Luboš" and it would be certified by his signature, to make sure that not everyone who finds an empty cheque from his cheque book may rob his account in this way.

In the Bitcoin case, the need to "prove the payer's will" (you prove it by exponentiating the signature of [Tony]'s to his private key exponent) is the same as in the case of a cheque except that the people's names are scrambled – replaced by some Bitcoin addresses – and the signature is digital. I have explained that everybody may verify, using the public key for [the encrypted name of Tony's wallet], that he has indeed written the message himself – i.e. he made the decision to send the coins.

OK, would such a signed message be enough for people to send the Bitcoins to each other? Well, it's not enough yet. Why? While we can verify that [Tony] is willing to make the payment, we're not sure whether he actually had those A Bitcoins to start with! ;-) If he were sending actual gold coins, the signed letter could be enough for his employees to send the gold to me. And the gold would disappear so the same order to his employees wouldn't work again. But how do you emulate the "missing gold insurance" in the case when the currency is intrinsically worthless and there's really no gold (and no anything) there at all? To create a new system of payment, we must also verify that there have been previous transactions that transferred some money to Tony's wallet; and that Tony hasn't spent the money in another way yet (to avoid "double spending").

So whether you like it or not, you need to remember at least the result of the prehistory. In practice, the Bitcoin is designed so that it remembers the whole ledger – the blockchain. The blockchain is nothing else than a sequence of digitally signed letters confirming the transactions just like the letter above (I won't discuss some efficiency improvements – e.g. that the sequence isn't quite linear but rather a Merkle tree which is not a child of Angela Merkel). These messages are clumped to blocks. I will get to this point soon.

OK, if you have the whole history of the Bitcoin payments between all the Bitcoin addresses [anonymized IDs of the wallets or their owners], you can clearly verify whether someone has the money to donate something – whether he has the "right" to do it. You just go through the history and look at all the transactions (signed letters) that include that Bitcoin address, and you follow how much the balance of that account was changing as a function of time.

If the resources are available, it's possible to add the transaction (Tony's payment to me). You ask everyone who has the copy of the blockchain to add the new transaction. Everything would be almost possible with the ledger – just a sequence of digitally signed "I donate this and that" letters – except for one detail: the double spending problem.

You know, Tony may have some 15 Bitcoins and he could decide to be extremely generous and donate 150 Bitcoins – to his 300 favorite websites. At one moment, he would contact 300 different servers that hold a copy of the ledger. Each of these 300 servers would be asked by Tony to make a different payment to a favorite website of Tony's – at the same moment. So Tony could donate much more money than he has. None of the 300 servers would see a problem immediately – because the other simultaneous transactions/donations haven't been reflected in their copy of the ledger yet.

So how is this problem prevented? If you have PayPal or some Internet banking, there's simply one official ledger and it defines the only truth. You have to submit the payment instructions to that server in some order and only the first one will be approved if you no longer have funds for the other ones. But what if you want to decentralize things?

Nakamoto introduced the Nakomoto consensus. What does it mean? It means that each participating server picks the correct version of the ledger in a simple way: it's the longest one (among all the candidates that you're offered by other computers in the network), one with the highest number of blocks. The blocks contain the sequence of the digitally signed "transaction letters" as well as some codes that depend on the whole previous form of the ledger and some numbers that depend on the whole hypothesized or proposed ledger and that are hard to be calculated. And when these numbers are determined, the computer that determined them may add a "mining transaction" confirming that some new Bitcoins have just been mined and belong to the [Bitcoin address that the miner controls].

So the miners are motivated to do some difficult calculation because they can attach the message "I just successfully mined some new Bitcoins" to the blockchain. When many such miners co-exist, they simply pick the longest version of possible blockchains because they're financially motivated to do so. If they omitted recent blocks in which they had successfully mined some new Bitcoins, they would rob themselves of those previously mined Bitcoins.

A miner could be motivated to "unconfirm" some blocks that include his own payments – when he got poorer. But unless he controls a majority of the CPU/GPU power used for mining, the other miners will kill these efforts because they are still motivated to confirm the blocks in which they have found some new Bitcoins by mining.

So after some iterations, all the attempts to "hide" some blocks that prove the outgoing payments are eradicated by the greed – by the strengthening majority opinion of the miners. If there's no coordinated majority that would want to fake or hide some old transactions, the ledger with transactions is getting longer "fairly" despite the fact that no copy of the blockchain is official. As additional blocks are being added, the older transactions become more confirmed – they acquire an increasing number of confirmations – and the probability that something irregular was happening with those older transactions decreases with this age or number of confirmations.

In this way, no ledger is official – so no bank or supervisor of the ledger has to be trusted. The miracle of the Bitcoin is that the double spending is avoided without relying on any central authority.

All this justice evaporates when some entity controls most of the mining CPU power, however, like China may control if it nationalizes the miners on its territory (65% of the world's Bitcoin miners). In that way, China could "unconfirm" even rather old, 30 times confirmed transactions.

It could "overspend" in the way I sketched for Tony – with his non-existent 300 Bitcoins. China could buy something for 1 million Bitcoins, get the product, and then "undo" the payment – "unconfirm" the corresponding transaction in the blockchain. And buy another thing for the same money afterwards. Even if the transaction was in an older block that has 50 confirmations (a number surely considered sufficient) – because the currently mined block is separated by some 50 blocks from this past confirmed one – it could still produce a completely different, alternate history with 51 new blocks where the "outgoing payments" (transactions that reduced China's Bitcoin balance) are omitted simply because China is capable of increasing the length of the blockchain faster than the rest of the world's miners.

In that way, China would become able to double spend. It could also do all the things – like forking etc. – which are being done by a "political consensus of the miners and developers" in the Bitcoin community so far (but to do such things changing the software, you also need to announce them because you need the participants to adjust the software etc. – if they don't update their software, the participants will determine that something has broken about the Bitcoin ecosystem). Nakamoto speculates about the psychology of such rogue miners. You could unconfirm very old transactions and blocks and other miners would notice that something is anomalous. The Bitcoin system would obviously become untrustworthy as soon as you would get 2 candidate Blockchains that differ in 50-blocks-old blocks: it would mean that you can't consider 50 confirmations enough. Needless to say, the relevant fraud could be hiding in "just several recent" blocks, too. A rogue entity in control of a majority of the mining CPU/GPU power would have to decide whether the destruction of the Bitcoin's reliability and rules, as believed by others, is a better idea than some localized theft etc.

At any rate, a majority miner with bad intents would obviously be a lethal problem for the whole system. It would be exactly the same problem as a regular bank that is hacked by a hacker who can adjust the account balances, add, or remove transactions from the official ledger. That hacker also faces the dilemma whether he wants to steal big, or whether he prefers to do things that are almost invisible. The blockchain obviously doesn't make it completely impossible to cheat and steal for people who have a sufficient power and bad moral principles to cheat and steal. The blockchain only changes the character and location of such scammers – like the trust, the scammers are "delocalized" in the sense that their CPUs doing the mining must exist in a bigger part of the world (because the mining takes a lot of computers, and you need a majority).

So the difference is purely technical. There is no "moral progress" making it impossible for the very powerful people with a bad intent to do brutal things. The possibility always exists – just the "strategies of hacking" are different for the Bitcoin than they are for hackers working to get the control over a bank's computers.

I hope that at least some readers will catch some new ideas from the post above.

No comments:

Post a Comment