Tuesday, December 23, 2014

2015: arXiv identifiers get a new digit

Paul Ginsparg began to maintain xxx.lanl.gov – the server later renamed arXiv.org – in summer 1991. Since then, the number of papers submitted each month has kept growing.

You can see that despite the mild acceleration in the most recent 5 years, the growth was much closer to a simple linear increase from 0 in Fall 1991 to almost 9,000 in recent months (the latter number translates to 400+ papers on an average "live" day). Because 9,000 is rather close to 10,000, which is 10 to the fourth power, you may be worried about the identifiers of the papers.

Since April 2007, the users of the preprint repository have been using a system that only allows 10,000 papers a month, a threshold that is likely to be surpassed sometime in 2015 or 2016.

So instead of identifiers such as arXiv:1401.0001, the year 2015 will kickstart new identifiers of the type arXiv:1501.00001.

It's a minor change relative to the earlier switch away from the system that indicated the subject, such as hep-th/9701025. In both cases, the limit on the number of papers allowed by the identifier system was quoted as a motivation for the change.
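To make the three formats concrete, here is a minimal sketch that tells them apart. The function name `parse_arxiv_id` and the returned dictionary keys are my own illustrative choices, not anything arXiv provides, and the patterns are deliberately simplified: they only check the shape of the identifier, not whether the month is valid or whether the digit count is the right one for that date.

```python
import re

# Old subject-based scheme, e.g. hep-th/9701025: archive/YYMM + 3-digit number.
SUBJECT_STYLE = re.compile(r"(?:arXiv:)?(?P<archive>[a-z-]+)/(?P<yymm>\d{4})(?P<seq>\d{3})")
# Numeric scheme, e.g. 1401.0001 (4 digits) or 1501.00001 (5 digits).
NUMERIC_STYLE = re.compile(r"(?:arXiv:)?(?P<yymm>\d{4})\.(?P<seq>\d{4,5})")


def parse_arxiv_id(identifier):
    """Return a dict describing the identifier, or None if no pattern fits."""
    m = SUBJECT_STYLE.fullmatch(identifier)
    if m:
        return {"scheme": "subject", "archive": m["archive"],
                "yymm": m["yymm"], "digits": 3, "number": int(m["seq"])}
    m = NUMERIC_STYLE.fullmatch(identifier)
    if m:
        return {"scheme": "numeric", "archive": None,
                "yymm": m["yymm"], "digits": len(m["seq"]), "number": int(m["seq"])}
    return None


for example in ("hep-th/9701025", "arXiv:1401.0001", "arXiv:1501.00001"):
    print(example, "->", parse_arxiv_id(example))
```

The `digits` field is the part that changes in 2015: it goes from 4 to 5, while everything to the left of the dot stays the same.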

I think that in both cases, more conservative solutions would have been preferable.

With the disciplines included in the identifier, one would have a reasonable chance to keep the number of papers in each sub-archive below 1,000 or a fixed number that is just slightly higher. Much of the increase came from the addition of new fields.

And in both cases, one could add additional "space" by using letters. For example, after 1412.9999, you could have 1412.A000 – and then up to 1412.Z999. This would add 26*1,000 = 26,000 extra papers in principle. Additional papers could run from 1412.AA00 to 1412.ZZ99 – that is 26*26*100 = 67,600 extra papers. I am using this convention for folders of photographs from digital cameras – FinePix or CoolPix or Lumia 01 up to 99, then A0, and so on. AAA0-style identifiers would add 175,760 and AAAA-style ones would add 456,976. In total, even if you disallowed "digits before letters", you would have about 736,000 identifiers per month.
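The counting above is easy to check with a few lines of code. This is just a sketch of the arithmetic under the "letters first, then digits" convention described in the paragraph; the function name `letter_scheme_capacity` is made up for the illustration.

```python
def letter_scheme_capacity(width=4, letters=26, digits=10):
    """Count identifiers made of k leading letters followed by (width - k) digits.

    Returns a dict mapping k (number of leading letters) to the count.
    """
    return {k: letters**k * digits**(width - k) for k in range(width + 1)}


counts = letter_scheme_capacity()
for k, n in counts.items():
    pattern = "A" * k + "0" * (4 - k)
    print(f"{pattern}-style: {n:,}")

total = sum(counts.values())   # all identifiers, purely numeric ones included
extra = total - counts[0]      # what the letters add on top of 0000-9999
print(f"total: {total:,}  extra beyond digits only: {extra:,}")
```

With the default arguments, the total comes out to 736,336 identifiers per month, of which 726,336 are beyond the purely numeric 10,000.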

At least for papers written in the coming decade or so, the initial digit D in arXiv:YYMM.DNNNN will remain zero for most of these papers. And that's quite a lot of unnecessary zeroes that people will have to write or read while talking about the million or two million papers that the arXiv will devour in these years.

With those 100,000 papers a year that are being sent to the server these days, I believe that the repository has become comparable to the body of all scientific papers that are being written in the world. And it doesn't seem impossible to switch the "remaining" fields of natural and mathematical sciences to the arXiv culture – although there are many reasons why this is not always happening.

Incidentally, my estimate is that because the traffic to arXiv.org exceeds the traffic to WUWT (to pick a benchmark) just by a factor of 3 or so, the average arXiv paper's page is seen just by hundreds of visitors, significantly fewer than an average TRF blog post. Of course, there are papers that get many more hits and the arXiv visitors are much smarter than the visitors to an average blog (I am not sure about TRF, however).

I just wanted to make sure that if someone is going to submit papers in early 2015 and he or she sees arXiv:1501.fiveDIGITS, it's not because he or she is drunk.
