Sunday, May 16, 2010

Running out of IPv4 addresses

There exists one more possible "disaster" somewhat similar to Y2K that might actually be somewhat more threatening:
The world may be running out of IP addresses.
See Google News for some updates. The head of ICANN has warned that only 8-9 percent of the IPv4 addresses are left. On December 21st, 2012, the Mayan Long Count calendar ends, so it is expected that by the end of 2012, the IP addresses will be gone. ;-)

No one else will be able to connect to the Internet. You won't be able to sell another cell phone. Of course, this prediction is complete nonsense because if such a global problem were real, we would already be seeing it "locally" in the portions of the Internet that have already run out of address space. But there *is* a problem: at some point in the near future, it may become impossible for new large domains to get their own fresh chunk of the address space, and they may be forced to "share" it with the existing ones.




Recall that every computer on the Internet - which increasingly also means every powerful enough cell phone (and other computer-like devices that will appear in the future, like a gadget inside a bag of coffee in the supermarket that will tell the supermarket whether the bag has already been sold or stolen) - has its own identification. I am not talking about addresses like "motls.blogspot.com" that are readable by humans. The Internet traffic is directed according to numerical addresses that are not so easily readable by humans (unless you are able to read numbers, of course haha).

The routers and computers use the so-called IPv4 addresses these days, a particular standard that developed from simpler ways to identify the hosts in the past. The addresses are made out of 32 bits, which can be split into 4 bytes (a byte is eight bits), i.e. 4 numbers between 0 and 255 (note that 2^8 = 256). For example, arxiv.org may also be called 128.84.158.119. Now, 2^{32} is around 4.3 billion - less than the world's population.
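To make the 4-byte structure concrete, here is a small Python sketch that converts between the dotted-quad notation and the underlying 32-bit integer (128.84.158.119 is simply the arxiv.org address quoted above; whether it still points there is a separate question):

```python
# A 32-bit IPv4 address is just four bytes written as decimal numbers
# separated by dots. These helpers convert in both directions.
def quad_to_int(addr):
    a, b, c, d = (int(x) for x in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_quad(n):
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

n = quad_to_int("128.84.158.119")   # the arxiv.org address from the text
print(n)                            # 2153029239
print(int_to_quad(n))               # 128.84.158.119
print(2**32)                        # 4294967296, i.e. about 4.3 billion
```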

Clearly, this number is not quite enough. Many of us need several IP addresses for our computers and cell phones - and some addresses are also wasted.

At the beginning, the IPv4 addresses were divided into a few fixed-size classes of subnets. That was very wasteful because many subnets contained many fewer computers than the number of assigned IP addresses; the rest was wasted. Ultimately, the subnets were allowed to have an arbitrary size - at least the number of fixed bits defining the subnet, as seen in the "subnet mask", became arbitrary. In principle, a smart choice of the subnet size wastes at most 1/2 of its addresses.
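The difference between the fixed-size and arbitrary-size subnets can be illustrated with Python's standard ipaddress module; the 172.16.0.0 block below is just an example of a private range:

```python
import ipaddress

# Fixed-size (classful) allocation: a site with ~1000 hosts once had
# to take a whole class B block, i.e. a /16 with 65536 addresses.
class_b = ipaddress.ip_network("172.16.0.0/16")
print(class_b.num_addresses)   # 65536

# With an arbitrary subnet mask, the same site can take a /22: 1024
# addresses, so at most about half of the block sits idle.
cidr = ipaddress.ip_network("172.16.0.0/22")
print(cidr.num_addresses)      # 1024
print(cidr.netmask)            # 255.255.252.0
```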

The domains and clusters that run out of IP addresses use various methods to avoid the depletion, such as IP masquerading (a form of network address translation, NAT). A more modern method to circumvent the limitations of IPv4 is to simply increase the number of bits. You need to switch to a completely new system which is almost inevitably incompatible with the old one.
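A minimal sketch of how masquerading/NAT lets many private hosts share one public address: the router rewrites the source (address, port) pair and keeps a table to reverse the mapping for the replies. All the names and addresses below are invented for illustration (203.0.113.7 is from a range reserved for documentation), not any real router's API:

```python
import itertools

PUBLIC_IP = "203.0.113.7"            # the single shared public address
_next_port = itertools.count(40000)  # pool of public-side ports
outbound = {}                        # (private ip, private port) -> public port
inbound = {}                         # public port -> (private ip, private port)

def translate_out(priv_ip, priv_port):
    """Rewrite an outgoing packet's source to the shared public address."""
    key = (priv_ip, priv_port)
    if key not in outbound:
        pub_port = next(_next_port)
        outbound[key] = pub_port
        inbound[pub_port] = key
    return PUBLIC_IP, outbound[key]

def translate_in(pub_port):
    """Route a reply arriving at a public port back to the private host."""
    return inbound[pub_port]

print(translate_out("192.168.1.10", 5555))  # ('203.0.113.7', 40000)
print(translate_out("192.168.1.11", 5555))  # ('203.0.113.7', 40001)
print(translate_in(40000))                  # ('192.168.1.10', 5555)
```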

The IPv6 protocol was proposed by an official group of engineers in 1998. It's not rocket science: they just make the sequences of bits longer and simplify various rules, which is possible because of the big space. The IPv6 addresses have 128 bits and can distinguish 2^{128}, which is about 3.4 x 10^{38} hosts. That's such a large number that you can freely waste lots of IP addresses. So once again, the subnets may be defined by a fixed size - like at the beginning of IPv4 - namely by 64 bits in this case. Note that it's still a lot: 2^{64} is about 1.8 x 10^{19}. You may have 10^{19} organizations and each of them can have 10^{19} gadgets in it, so to say. That should be enough for a couple of years. ;-)
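Just to see these magnitudes explicitly (the 2001:db8:: prefix used below is the range reserved for documentation examples):

```python
import ipaddress

print(2**128)            # 340282366920938463463374607431768211456
print(f"{2**128:.2e}")   # 3.40e+38 possible IPv6 addresses
print(f"{2**64:.2e}")    # 1.84e+19 possible /64 subnets, and as many hosts in each

# An IPv6 address has eight 16-bit groups written in hexadecimal;
# runs of zero groups may be abbreviated with "::".
addr = ipaddress.ip_address("2001:db8::1")
print(addr.exploded)     # 2001:0db8:0000:0000:0000:0000:0000:0001
```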

More seriously, that's clearly enough for any foreseeable (and less foreseeable) future. But it doesn't seem easy to switch the whole world to IPv6. I think it's clear that whole regions of the world are not ready at this moment. Some countries don't have any software that can deal with IPv6 at all. The Internet could become dysfunctional. It could also work smoothly. But I think that before the world switches, it should try to "test the global switch", so that if it doesn't work well enough, the whole change will be reverted in a minute.

There will be a lot of slowing down if the world begins to switch to IPv6 gradually. As soon as pieces of the Internet switch to IPv6, they will free up lots of the old IPv4 addresses, which will reduce the motivation of the rest of the world to switch. But maybe this continuing co-existence of IPv4 and IPv6, slowly and gradually preferring the latter, is the best way to switch. It's not quite clear to me how easy it is for the networks to co-exist if the new software that understands everything is only available on one end of a communicating pair.

Eventually, all of us can be switched to IPv6 and things may become smoother. It sounds attractive but given the possible difficulties with the transition from IPv4, I am not sure whether IPv6 is the best solution to the problem.

Instead, one could also introduce some "official" ways to extend IPv4, without changing its structure radically. For example, the address space could be extended by "extensions" - supplementary identifiers similar to those you would add to a telephone number of a person in a large institution that only has one telephone number for everyone. In fact, there already exists a mechanism that could be reused for this trick: the port numbers. Some bits of the port number could be used to distinguish several hosts sharing the same 32-bit IPv4 address.
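A toy version of this "extension" idea: borrow a few high bits of the 16-bit port number to select one of several hosts sharing one IPv4 address. With 4 borrowed bits, one address covers 16 hosts, each keeping 2^12 = 4096 ports. This is purely illustrative and not any real protocol:

```python
HOST_BITS = 4                 # bits of the port borrowed for the host index
PORT_BITS = 16 - HOST_BITS    # bits left for the actual local port

def pack(host_index, local_port):
    """Combine a host index and a local port into one 16-bit wire port."""
    assert 0 <= host_index < 2**HOST_BITS
    assert 0 <= local_port < 2**PORT_BITS
    return (host_index << PORT_BITS) | local_port

def unpack(wire_port):
    """Recover (host index, local port) from the 16-bit wire port."""
    return wire_port >> PORT_BITS, wire_port & (2**PORT_BITS - 1)

p = pack(3, 80)      # host #3 behind the shared address, its port 80
print(p)             # 12368
print(unpack(p))     # (3, 80)
```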

Readable identifiers as the main ones

Also, and this may be my original idea, the world could try to switch to a system of identifiers that is actually based on the human-readable addresses of the hosts. The computers could be sending the packets directly to "www.amazon.com", without any translation to an unreadable format in the middle. Such names of hosts may occupy dozens of bytes but it's no problem these days.

Even if you add dozens of bytes representing the host's ID to each packet, the packet size won't increase much, relatively speaking. It's because the number of bytes needed to cover an address space only grows logarithmically with the size of the address space. But the capacity of the internet connectivity has grown "exponentially" rather than logarithmically. So we clearly have lots of space to waste for the transmission of long identifiers with each packet, and we will have much more of it in the future.
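The logarithmic scaling is easy to check numerically; the 1500-byte figure below is the typical Ethernet packet size, used here just for illustration:

```python
import math

# The number of bits needed to address N hosts grows only like log2(N).
for hosts in (10**6, 10**9, 10**12, 10**38):
    print(f"{hosts:.0e} hosts -> {math.ceil(math.log2(hosts))} bits")

# Even a textual address is a tiny fraction of a typical packet.
name = "www.amazon.com"
print(f"{len(name.encode()) / 1500:.1%} of a 1500-byte packet")  # 0.9%
```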

Once you appreciate this point, there is one more radical way to change the structure of the identifiers of hosts on the Internet.

The whole hierarchy with DNS servers translating readable addresses of the hosts to the unreadable IP addresses seems like a redundant intermediate messenger to me. It's one of the remnants of the old times when people had to save every bit because it was difficult to remember one bit and it was difficult to transmit it. So they had to think how to compress the information into as small a number of bits as possible. We don't have these limitations anymore.

I am convinced that if all the "real" identifiers of the hosts were readable, it would simplify all the routing tables - and it would make their management much more intuitive.
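As a hedged sketch of what such a routing table could look like, here is a toy longest-suffix match on readable names - the analogue of today's longest-prefix match on numeric addresses. The table entries and next-hop names are entirely invented:

```python
# A router for readable names matches dot-separated suffixes, longest
# first, the way IPv4 routers match the longest numeric prefix.
ROUTES = {
    "com": "gateway-intl",
    "amazon.com": "peer-link-2",
    "cz": "local-backbone",
}

def next_hop(name):
    labels = name.split(".")
    # Try the longest suffix first, then shorter and shorter ones.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in ROUTES:
            return ROUTES[suffix]
    return "default-gateway"

print(next_hop("www.amazon.com"))     # peer-link-2
print(next_hop("images.google.com"))  # gateway-intl
print(next_hop("www.seznam.cz"))      # local-backbone
```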

The transition to this new "readable IP" system could be smooth and, in some sense, compatible with the IPv4 system. The computers that would understand the new system would already be calling themselves by the "readable IP" addresses which can be arbitrarily long. But the important servers would still have both "readable IP" as well as IPv4 addresses, just like today.

It would be the "numerous clients and cell phones" inside the large (modern) networks that would be the first ones to lose their unique IPv4 identifiers. There would be no problem with that because they can't be directly called by anyone who hasn't switched to the new "readable IP" system anyway. On the other hand, their providers would obviously have to understand the new IP system.

By doing so, you could effectively reduce the size of all the IPv4 subnets - because only the "servers" that should be accessible by everyone would have their IPv4 addresses. The remaining clients etc. would only be identified by the "readable IP" or "IPv6" system (or any other new system that you want to introduce and that doesn't suffer from the size limitations).

So the "quickly expanding" parts of the Internet - with lots of newly attached cell phones etc. - would be required to use the new IP system for all the "clients" while the old, and probably underdeveloped, domains on the Internet could continue to use IPv4 for everyone, servers as well as regular clients, at least for some extra years.

What do you think?

3 comments:

  1. Sorry, this won't work. You have to be able to route these host addresses through the Internet, and with your proposal *every* router out there would have to contain the *entire* list of host addresses out there - which is obviously not (I hope you understand why) feasible.

    ReplyDelete
  2. Come on, nos, your criticism is complete nonsense.

    There's a simple way to show that no such "stopper" can exist. An obvious algorithm for the routers to work (less perfect than the algorithms that would exist later) is for them to translate all the readable IP addresses to "auxiliary" IP addresses, which could happen to be equal to the old ones, and thus reduce the routing problem to the old IPv4 routing problem.

    It's completely clear that with the readable IP addresses, things couldn't be more complicated than with IPv4.

    Another way to tell you what's wrong with your argument is that today, the routers don't remember separate routes to every IPv4 address, either. They identify the subnet/cluster and send the packets correspondingly to another host that also knows what to do with them.

    In reality, only a couple of bits decide where a router sends it, and so on.

    Exactly the same thing would still be true for readable IP addresses. In first approximation, you can imagine that the routers would remember the domains. For example, a client in the *.cz domain could have a provider that sends every *.com packet, except for some listed exceptions, to another host that deals with the foreign addresses, and so on.

    Obviously, at some point, they would need to divide all the *.com addresses into several comparably large pieces, to decide what the next step is for the transmission of the packet.

    But needless to say, the domain name servers are doing exactly the same task today, too.

    What I am proposing effectively unifies the work of the routers with the work of the domain name servers. These two jobs may still be *de facto* separated to two different hosts. But they don't have to be separated.

    Today, DNS servers may require some time to translate a readable address to the IP address. When this job is done, it's helpful for servers (and clients) to cache the information for future packets.

    An analogous claim would also be true for the "readable IP addresses". However, they could - but wouldn't have to - cache any auxiliary IPv4 address associated with the readable IP address: they could simply cache the best route(s) for sending a packet, addressed to a host given by its readable IP, to the right target. It's the same, except that you avoid the useless intermediate step of having IPv4 addresses.

    At any rate, it's obvious that algorithms - and very natural ones - may be written down that reduce my readable IP system to the existing system combining IPv4 and DNS's.

    Cheers
    LM

    ReplyDelete
  3. Consider:

    Hi-end internet backbone routers, which cost millions of dollars, use special memory (CAM, SRAM) to cache routing tables. These assume 32-bit addresses. Replacing these routers with routers that handle 128-bit addresses (or god help me, the variable length addresses you suggest) will be VERY expensive. I am sure that Cisco et al will continue to warn about address exhaustion as they will make lots of money fixing it. Follow the money.

    I don't have the stats at hand, but I have heard that even though we are running out of address ranges, the actual usage of addresses within the allocated ranges is moderate (20%?)

    IPv6 was developed when it was believed that each and every device needed to directly communicate with each and every other device. Today we know that not only is this unnecessary, it's actually a huge security problem, and walling off devices behind NATed firewalls is actually a huge advantage.

    There are large address ranges that were allocated to organizations for their private use that could be reclaimed. Why do Ford Motor Co, IBM, GE, HP (2), Apple, Merck, Xerox, and Prudential Insurance need Class A's? Why does the US DoD need 10 Class A's?! These addresses could all be redeployed for far less than the cost of converting all the backbone routers.

    Remember even when addresses are exhausted, the existing Internet will continue to run just fine.

    ReplyDelete