Calendrical endianness
Today's xkcd:
The mouseover title: "Neither group uses iso 8601 because the big-endian enthusiasts were all at the meeting 20 years ago."
As usual for comics, and especially for xkcd, some subcultural background information is needed.
In this case, you need to know at least that "localization" doesn't refer to finding things in geographical or spatial coordinates, but rather (as Wikipedia explains) to "the process of adapting internationalized software for a specific region or language by translating text and adding locale-specific components". ISO 8601
is an international standard covering the worldwide exchange and communication of date- and time-related data. It is maintained by the Geneva-based International Organization for Standardization (ISO) and was first published in 1988, with updates in 1991, 2000, 2004, and 2019.
"Big-endian enthusiasts" refers to one side of the (now mostly settled) Endianness debates, where
endianness is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest. A little-endian system, in contrast, stores the least-significant byte at the smallest address.
Endianness may also be used to describe the order in which the bits are transmitted over a communication channel, e.g., big-endian in a communications channel transmits the most significant bits first. […]
Big-endianness is the dominant ordering in networking protocols, such as in the internet protocol suite, where it is referred to as network order, transmitting the most significant byte first. Conversely, little-endianness is the dominant ordering for processor architectures (x86, most ARM implementations, base RISC-V implementations) and their associated memory. File formats can use either ordering; some formats use a mixture of both or contain an indicator of which ordering is used throughout the file.
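For readers who have not run into byte order before, here is a minimal Python sketch of the distinction described in the passage above. The value 0x12345678 is just an arbitrary example; the calls used (`int.to_bytes` and `struct.pack`) are from the standard library.

```python
import struct

value = 0x12345678  # arbitrary 32-bit example value

# Big-endian: most significant byte at the smallest address (first in the bytes object).
big = value.to_bytes(4, byteorder="big")
# Little-endian: least significant byte first.
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# struct spells the same choice as ">" (big), "<" (little) and "!" (network order,
# which is big-endian).
assert struct.pack(">I", value) == big
assert struct.pack("<I", value) == little
assert struct.pack("!I", value) == big
```

The same four bytes are present in both cases; only their order differs, which is the analogy the comic draws with day-first and month-first dates.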
And of course the comic's key point is that U.S. practice puts months before days, so that 2/3 means February 3, whereas European countries put days before months, so that 2/3 means March 2.
My most recent encounter with calendrical endianness was a couple of weeks ago, when I volunteered at a Booster Clinic held in one of Penn's gyms. My job was to greet the folks coming in, verify that each of them was eligible for a booster, and send them onwards to other stations, either to get the shot or to get a U.S.-type paper vaccine card for documenting the shot (if what they had was an image of a vaccine card or a vaccine certification from another country).
There were a few people who were actually looking for a first or second vaccine shot, rather than a booster, and we had to send them elsewhere since the dosages are different. But the most common eligibility issue was the date of their last Covid-19 vaccine shot, since the clinic was following the official U.S. guidelines that prescribe six months of waiting (though they allowed a five-day "grace period"). This was easy to check on U.S. vaccine cards, but quite a few people had documentation from other countries: Europe, China, Korea, Russia, India, Australia, Mexico, Argentina, and so on.
Of course the diverse vaccination-certification documents were in appropriately different languages; if I couldn't read one I took the applicant's word for the general nature of the document, as long as it looked plausible, since there were long lines and it didn't seem right to hold things up while I searched for some staffer who could read Thai or whatever. But I did check the dates, and this required a judgment in each case about whether N/M was the Nth day of the Mth month, or the Mth day of the Nth month.
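The N/M judgment can be sketched in a few lines of Python. This is only an illustration: the function names `interpretations` and `is_ambiguous` are made up for this example, and it assumes the year is already known.

```python
from datetime import date

def interpretations(a: int, b: int, year: int) -> dict[str, date]:
    """Return every valid calendar reading of the numeric date a/b in `year`."""
    readings = {}
    # Each tuple is (label, month, day) for one convention.
    for label, month, day in (("month/day", a, b), ("day/month", b, a)):
        try:
            readings[label] = date(year, month, day)
        except ValueError:
            pass  # not a real date under this convention, e.g. month 15
    return readings

def is_ambiguous(a: int, b: int, year: int) -> bool:
    """True when the two conventions yield two different real dates."""
    return len(set(interpretations(a, b, year).values())) > 1

print(interpretations(2, 3, 2021))  # two readings: February 3 and March 2
print(is_ambiguous(15, 2, 2021))    # False: only 15 February is a real date
print(is_ambiguous(1, 1, 2021))     # False: both readings are 1 January
```

Any pair in which one number is larger than 12 has at most one valid reading, so only dates with both numbers in the range 1 to 12 need a judgment call.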
RachelP said,
January 1, 2022 @ 9:56 am
I had a legitimate failure of comprehension recently, being informed someone had announced themselves as ‘Bigender’. I understood that this was referring to their being ‘big ender’ and tried to guess what that might mean (a Swift fan?).
I did not know the term from data science above. In fact, though, they are ‘bi-gender’, another term new to me and the lack of an article did not clue me in. How we live and learn!
Christian Weisgerber said,
January 1, 2022 @ 10:36 am
This raises a question: Which European countries even use the slash as a date separator?
Allan from Iowa said,
January 1, 2022 @ 10:38 am
Working in computer programming, I have been aware of endianness for decades. But I used to have trouble remembering which term was which. So I privately thought of them as "the sensible way" and "the Intel way".
David C. said,
January 1, 2022 @ 10:40 am
Large multinational companies are commonly adopting the ISO YYYY-MM-DD across the board as part of computer file naming convention systems, for exactly that reason. It can be quite the hassle when programmers and the like assume that dates in raw data always come in a specified format.
I once had the experience of having a European colleague "translate" a date. Using the example above, I was filling out a form that asked for dd/mm/yyyy. I duly filled in 03/02/2022 for February 3, 2022. Thinking an American would never use this convention, my colleague entered it into the system as March 2, 2022. To avoid that kind of situation happening again, I now write 03/FEB/2022 when filling out paper forms.
David C. said,
January 1, 2022 @ 10:48 am
@Christian W.
To my knowledge, in the UK at the very least. I have seen German speakers use the dot separator convention when writing in English (as in 03.02.22), which is not always immediately obvious to English speakers that it should be read as a date.
https://en.wikipedia.org/wiki/Date_format_by_country
david said,
January 1, 2022 @ 10:50 am
I suspect the big-endian reference is to the year specification whereby the date would be read as 2002-MAR-22.
Philip Taylor said,
January 1, 2022 @ 11:01 am
Being unable to rapidly convert a numeric month into an English month name, and having lived through the Y2K débâcle, I invariably express dates using the VAX/VMS convention — today, for example, would be expressed as 01-Jan-2022.
Norman Smith said,
January 1, 2022 @ 11:07 am
In Canada, governments used to use the "DD/MM/YY" format for dates until the Y2K problem forced a reconsideration. I believe most have switched to the ISO 8601 "YYYY-MM-DD" format. Nonetheless, a lot of forms that collect dates still seem to be in year-last format of some kind.
As for popular use, it seems to be a mixed bag. We receive a great deal of US media, which leaves a lot of people thinking that we are using the American format.
Brett said,
January 1, 2022 @ 11:31 am
@RachelP: The names "big endian" and "little endian" for the ordering of bits are, in fact, taken from the egg openers of Lilliput and Blefuscu. The implication is that having strong opinions about which ordering of significant bits is "better" is a waste of time.
Fritz said,
January 1, 2022 @ 12:20 pm
As Norman Smith implied, it's total chaos in Canada over which system is used. Many forms now specify DATE-DAY-YEAR and many others specify DAY-DATE-YEAR. I have no idea why they choose one over the other. As far as ordinary people writing messages are concerned, it's totally random whether you use one or the other. It could be age-related, but if so, that's not obvious.
Yvon Henel said,
January 1, 2022 @ 12:29 pm
@Christian W.
The d/m/yyyy format is still the usual one in France. The day and month numbers are not padded to 2 digits (today: 1/1/2022)
Not so long ago, it was possible to find day (arabic) month (Roman capital) year on 4 digits (today: 1 I 2022).
An old version for month provided some abbreviations: 7bre for septembre and so forth until 10bre for decembre.
Linda Seebach said,
January 1, 2022 @ 12:48 pm
I would understand DAY-MONTH-YEAR or DATE-MONTH-YEAR but would have no clue what order DAY-DATE or DATE-DAY intended.
unekdoud said,
January 1, 2022 @ 12:50 pm
I've seen that dot separator in Chinese, sometimes without the year (2nd Jan = 1.2)!
I do also get calendrical puzzlers when reading expiry dates on my snacks. Same difficulties, different consequences.
Yuval said,
January 1, 2022 @ 1:28 pm
My wife was denied a booster shot here in Israel for an unjustifiably long time due to clerks' refusal to read US date format on her vax card properly.
Terry K. said,
January 1, 2022 @ 2:12 pm
For those of us vaccinated after the 12th of the month, there's no need to guess which date format. I've had similar experience with library registration forms (where I work) and Mexican immigrants. 1/2/21 is ambiguous. 15/2/21 is not. (It could be ambiguous where the year is unknown and yy/m(m)/d(d) is a possibility, though, I admit.)
DMcCunney said,
January 1, 2022 @ 2:36 pm
@Allan from Iowa;
"So I privately thought of them as "the sensible way" and "the Intel way"."
Decades back, a chap on a BBS forum wrote drivers for DEC's VAX systems. The VAX used a flat 32 bit address space.
Intel used a segmented architecture in the x86 CPUs, so addressing was segment:offset.
My guy was giving a seminar on writing drivers, and described the audience confusion when he talked about Intel's segmented architecture. First, they had to wrap their minds around the concept. Then they displayed blank incomprehension about why Intel *did* that. ;-p
(People programming for Intel CPUs in the MS-DOS era had six different possible memory models to use. Fun, for suitable values of the term.)
——
Dennis
David Marjanović said,
January 1, 2022 @ 3:14 pm
What is a segmented architecture?
I think all of them – but some much more commonly than the others.
In the German-speaking places and east of there, days and numeric months are read as ordinals and consequently abbreviated with dots: March 15th is 15. 3., "fifteenth third". The spaces behind these dots are very often omitted (and the most professional typesetters apparently prefer hair spaces over normal ones).
Stephen Reeves said,
January 1, 2022 @ 5:32 pm
My birthdate is Dec 2; on some forms in Canada it comes out as February 12.
Duncan said,
January 1, 2022 @ 6:04 pm
For public usage I've used ISO-8601 dates wherever possible since the '90s, including in hand-written forms with just a simple "Date" column where everyone else is using the traditional US m/d/yy.
For private (mostly digital) usage I do similar, but with just a single dot after the year making it 2021.1231 (for yesterday), four-digit year due to Y2K, with the dot separator because four digits is visually instant while an unseparated 8 digits 20211231 is not.
Tho lately I've decided that being in my 50s my chances of actually needing more than a two-digit year any longer are minuscule, so I've gone back to 6-digit, still with the leading year followed by a dot separator, so 21.1231 .
Dara Connolly said,
January 1, 2022 @ 6:21 pm
Ireland: My EU Covid cert uses 2 different date formats in a single document.
1) Date of Birth YYYY-MM-DD
2) Vaccination Date DD-MMM-YYYY
Sergey said,
January 1, 2022 @ 6:45 pm
Segmented architecture has nothing to do with endianness: it describes the addresses, while endianness is a property of the data. Segmented addresses have two possible meanings: one in the original Intel architecture of the 8086 CPU, known as "real mode", and one in the revised version that appeared in the 80286 CPU ("protected mode"). A 16-bit address on the Intel 8086 could reach only 64 KB, while physically the CPU supported up to 1 MB. So OK, they came up with the idea that the program and data would be divided into chunks of up to 64 KB ("segments"). It's kind of like 64 KB pages with a manually managed TLB, except that they don't have to be aligned on the page size; they are aligned on a smaller 16-byte boundary.
On the dates, in Russia they also use the dot separator, and the convention of using a Roman numeral for the month was alive at least until the mid-1990s, but I don't think it has been used recently. All the official papers (passports, birth certificates, etc.) used to have the month written in Roman numerals (if not spelled out as words) but not any more.
Duncan said,
January 1, 2022 @ 7:27 pm
David Marjanović: What is a segmented architecture?
In general:
https://en.wikipedia.org/wiki/Memory_segmentation
Or more specifically for x86 (which is a bit different than the general case):
https://en.wikipedia.org/wiki/X86_memory_segmentation
*Severely* abbreviated form: In the 16-bit era Intel divided memory into (maximum) 64-KiB memory segments, with the 64-KiB arrived at due to the 16-bit offset address within each segment. Addresses were thus segment:offset format instead of a flat unsegmented address, with separate settable default segments for data/stack/instructions/etc, so it was possible to set that and then use just a native 16-bit "short" address offset instead of the full segment:offset format, *provided* the size you were working with was under 16 KiB.
(Being "of sufficient age", I well recall running into that 16 KiB limit in practical ways such as "simple" (read as commonly available for the slow dialup download time or on the low-cost shareware disks of the time) text editors being limited to a 16 KiB file size, for instance. They had programmed loading it into a single segment, and simply couldn't deal with anything bigger, as doing so would have significantly increased the complexity of the program.)
Duncan said,
January 1, 2022 @ 7:40 pm
> 16 KiB limit
Ugh. 64 KiB segment size, internally addressed via 16-bit offset. Sorry.
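For anyone who wants to see the segment:offset arithmetic from this thread spelled out, here is a rough Python sketch of 8086 real-mode address translation. The function name `physical_address` is invented for this illustration; the rule itself (segment times 16 plus offset, on a 20-bit address bus) is the standard real-mode one, and protected mode works differently.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: physical = segment * 16 + offset, wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 multiplies by 16, which is why segments start on
    # 16-byte boundaries. The mask models the 8086's 20 address lines (1 MB).
    return ((segment << 4) + offset) & 0xFFFFF

# A 16-bit offset can reach at most 64 KiB from the start of a segment.
span = physical_address(0x1000, 0xFFFF) - physical_address(0x1000, 0x0000) + 1
print(span)  # 65536

# Different segment:offset pairs can name the same physical byte.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350
```

Because segments overlap every 16 bytes, a single physical address has many segment:offset spellings, which is part of what made the scheme confusing to people used to a flat address space.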
Gideon said,
January 1, 2022 @ 11:52 pm
@stephen reeves, so we have what I call "inverse birthdays". I celebrate mine on 12/2, which is to say December 2 because I was born on Feb 12.
As an aside, happy palindrome birthday on your most recent! For one year only, I chose to write mine as DDMMYYYY so I could participate on "12/02/2021".
Of course only 144/366 dates have such an inverse, with 12 of them being self-inverses (i.e. Jan 1, Feb 2, etc)
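The 144/366 and 12 figures are easy to check by brute force. The following Python sketch enumerates every (month, day) pair in a leap year and counts those whose swapped form is also a real date; the variable names are just for this illustration.

```python
import calendar

LEAP_YEAR = 2020  # any leap year works; it makes Feb 29 count, giving 366 dates

all_dates = [
    (month, day)
    for month in range(1, 13)
    for day in range(1, calendar.monthrange(LEAP_YEAR, month)[1] + 1)
]
date_set = set(all_dates)

# A date has an "inverse" when swapping day and month gives another real date.
has_inverse = [(m, d) for (m, d) in all_dates if (d, m) in date_set]
# Self-inverses are the dates where day and month are the same number.
self_inverse = [(m, d) for (m, d) in has_inverse if m == d]

print(len(all_dates), len(has_inverse), len(self_inverse))  # 366 144 12
```

The count is 12 × 12 because every month has at least 12 days, so any date whose day number is 12 or less can be swapped.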
Jon said,
January 2, 2022 @ 1:57 am
I have used the ISO yyyy-mm-dd format in all contexts, unless prevented from doing so, for more than 25 years.
It is complete and unambiguous
It sorts properly in filing systems etc
It is language-neutral
Peter Grubtal said,
January 2, 2022 @ 2:33 am
Dara Connolly
My EU Covid cert. (issued in Germany) uses only DD/MM/YYYY
DCA said,
January 2, 2022 @ 3:04 am
In some corners of the sciences, the date system is yyyy:ddd where ddd is the day number, 1 to 365 or 366: sometimes (but improperly) called Julian day. If you think of this as degrees around a circle it is easy to visualize where in the year a day number falls. This will never be broadly adopted, though if it were I suspect the month/day system would soon be regarded rather as we now view Roman numerals: quaint but not very practical.
(And then there are the astronomers, who use actual Julian Dates: ideal for calendrical conversions but too unmoored from the year to be generally useful).
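Both systems mentioned here can be derived from an ordinary calendar date. The Python sketch below is only illustrative: the function names are invented, the zero-padding of ddd to three digits is an assumption about the yyyy:ddd format, and the Julian Date is computed for 00:00 UT using the fixed offset between Python's proleptic Gregorian ordinal and the Julian Date scale.

```python
from datetime import date

def ordinal_date(d: date) -> str:
    """Format as yyyy:ddd, where ddd is the day of the year (assumed zero-padded)."""
    return f"{d.year}:{d.timetuple().tm_yday:03d}"

def degrees_around_year(d: date) -> float:
    """Place the day on a 360-degree circle, with 1 January at 0 degrees."""
    days_in_year = date(d.year, 12, 31).timetuple().tm_yday  # 365 or 366
    return (d.timetuple().tm_yday - 1) * 360 / days_in_year

def julian_date_at_midnight(d: date) -> float:
    """Julian Date at 00:00 UT; date.toordinal() counts days from 0001-01-01."""
    return d.toordinal() + 1721424.5

today = date(2022, 1, 2)
print(ordinal_date(today))                        # 2022:002
print(julian_date_at_midnight(date(2000, 1, 1)))  # 2451544.5
```

The Julian Date runs on continuously without reference to years or months, which is what makes it convenient for differences between dates and awkward for everyday use.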
cM said,
January 2, 2022 @ 4:21 am
Dara Connolly, Peter Grubtal:
The actual EUDCC certificate (i.e., the digital data encoded in the QR code, not the stuff printed around it) uses unambiguous ISO 8601 for "date of birth", and unix timestamps for "time of vaccination" and "valid until".
This may be decoded/converted/displayed differently depending on program and locale used, coming back to this article's topic.
maidhc said,
January 2, 2022 @ 4:39 am
Intel's use of the word "segment" forever muddied the waters, because an Intel segment has a fixed and unchanging length (sort of a super-page). In previous architectures that were called "segmented" (from Burroughs and CDC, if memory serves), a segment had a variable length (possibly made up of an integer number of fixed-length pages).
The intent of the older segmented architecture was that a segment would contain something of a particular nature, for example a code segment or a data segment. This would simplify matters like access control by having it operate at the addressing level–having code segments be read-only, for example. But it would also allow things like giving read access to certain data segments only to certain code segments, operating in hardware so it could not be circumvented by malicious software.
Intel's hijacking of the term "segment" described a kludge added to the architecture to make up for their lack of foresight in choosing too few address bits. The exact same problem that DEC had faced with the PDP-11 just a few years earlier, and had handled with a similar kludge, though without attacking the meaning of "segment". DEC's architect Gordon Bell (co-author of the classic computer architecture text Bell & Newell) was fairly outspoken about the problem (What We Learned From the PDP-11).
It was somewhat comical to watch Intel plunging enthusiastically into the same mess that DEC had just finished extricating themselves from, but their attack on the meaning of "segment" was regrettable.
maidhc said,
January 2, 2022 @ 4:56 am
I don't know about the current situation for Canadian dates, but back in the day the armed forces used a three-letter month, so either 2/JAN/2022 or JAN/2/2022 would work. Was this to be consistent with the Americans? I suppose it would have to be either English or French, but luckily the month names are not too different.
I don't get much mail from Canada these days, but it used to be that the Canadian post office (as far as postmarks) used colons and Roman numerals, so 2:I:2022.
Alley Oop said,
January 2, 2022 @ 9:01 am
At first, capitalized "Bigender" would have made me believe there was someone adhering to the obscure cult of Hubertus Bigend. Now I wonder whether Gibson hinted at endianness – but I can't recall any suitable clue from the book, so it might have been a coincidence, after all.
https://en.wikipedia.org/wiki/Hubertus_Bigend
Rodger C said,
January 2, 2022 @ 10:52 am
The US Army, last I checked (a bit over half a century ago), uses all numbers with European order. I grew up in Murica writing, say, today's date as 1-2-22. In the Army I learned 2-1-22 and it seemed more logical, but because of the ambiguity (and because I detested the Army) I now write dates in my notebook as 2.I.22. (That doesn't look very clear in this font. A month ago was 2.XII.21.)
David Marjanović said,
January 2, 2022 @ 12:41 pm
Thanks for the explanations of "segmented architecture"!
Daniel Barkalow said,
January 4, 2022 @ 12:42 pm
Everyone eventually comes around to YYYY-MM-DD for dates in filenames, because that makes them sort correctly despite the computer not using special rules for them. (That is, ones that start with "1" are before ones that start with "2", ones that start with "2021-09" are before ones that start with "2021-10", and so forth.)
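The sorting claim can be demonstrated with a small Python sketch. The four dates are arbitrary examples; the point is only that plain string sorting, with no date-aware rules, gives chronological order for YYYY-MM-DD but not for a day-first format.

```python
from datetime import date

dates = [date(2021, 10, 5), date(2021, 9, 30), date(2022, 1, 2), date(2019, 12, 31)]

iso_names = [d.isoformat() for d in dates]           # YYYY-MM-DD
dmy_names = [d.strftime("%d/%m/%Y") for d in dates]  # DD/MM/YYYY

chronological = sorted(dates)  # sorted as real dates, for comparison

# Plain string sorting of ISO names reproduces chronological order...
assert sorted(iso_names) == [d.isoformat() for d in chronological]
# ...while plain string sorting of day-first names does not.
assert sorted(dmy_names) != [d.strftime("%d/%m/%Y") for d in chronological]

print(sorted(iso_names))
print(sorted(dmy_names))
```

This works because the ISO form is zero-padded and puts the most significant unit first, so comparing character by character is the same as comparing year, then month, then day.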
Anthony said,
January 5, 2022 @ 2:28 am
American calendrical style is sometimes referred to as "middle-endian".
Moa said,
January 7, 2022 @ 10:02 am
In Sweden it's common to only put the "/" between day and month: 7/1 -22 or 7/1 -2022. (seventh january year 2022). Otherwise:
2022-01-07 or 07.01.2022. With only two digits for the year (22-01-07) it's easy to read it wrong, but you see it anyway. It can be very confusing when there is nothing to mark the pause between the units (220107). Yet I often find myself writing like that, six digits, no marks between units.
Andy Stow said,
January 7, 2022 @ 2:05 pm
"This raises a question: Which European countries even use the slash as a date separator?"
It's acceptable to write dates in the UK as 25/12/2021, but not with slashes as separators. Those are strokes.
Philip Taylor said,
January 7, 2022 @ 8:02 pm
OK, I have to ask : what is the difference between a slash and a stroke ? The Unicode standard refers to U+002F ("/") as a solidus, a word which I personally reserve for the vertical slash "|", but in the GPO we referred to "/" as an oblique, or less formally as "bar" ("A bar E" = "A/E" = "Assistant Engineer").
Philip Anderson said,
January 8, 2022 @ 3:58 pm
I (British) usually use a (forward) slash in a date, unless I use the month name or abbreviation. I don’t call it a stroke, although I’ve heard the term “Stroke city” for Derry/Londonderry (being the Catholic and Protestant names respectively).
Philip Taylor, didn’t a half-crown have a solidus when written as 2/6, where it indicates the shillings, or solidi in LSD?
Philip Taylor said,
January 9, 2022 @ 5:16 pm
"Didn't … LSD". Yes, indeed it did — a fact that somehow failed to impinge on my stream of consciousness until you kindly drew it to my attention. In fact, the solidus could occur in many pre-decimal prices, "5/-" (five shillings) being just one example.
Frank Gibbons said,
January 10, 2022 @ 5:44 pm
Rodger C pointed out that the US Army uses the European order (DAY-MON-YEAR). Anyone who's renewed their US passport recently (or filled out immigration-related documents) might have noticed that they always request dates in the DAY-MON-YEAR order, and that your passport's dates are shown as ddd MMM yyyy where MMM is the abbreviated *name* of the month, while ddd and yyyy are the *numerical* days and year. I wonder how the middle-endian-ness of dates took hold in the US, especially given the overwhelming European-ness of immigration until the latter half of the twentieth century?