Last week, the internet in the US, in between the usual frenetic discussion about world events and memes, took a moment to gasp in shock that the US Senate had unanimously passed a bill that would make states that observe Daylight Savings time adopt it permanently, thereby shifting most of the US into permanent DST (the few areas that only observe Standard time would be exempt). The bill carries the ridiculous name “Sunshine Protection Act” (S. 623).
Everyone was abuzz about how we could finally stop changing clocks twice a year. Another group of people were angry, arguing that we should have made Standard time permanent instead, because in the winter kids would effectively be forced to wake up and go to school multiple hours before dawn. Some people pointed out that we literally tried this under Nixon and people hated it so much it was reverted. Sleep experts seem to agree that Standard time is more in tune with circadian rhythms.
Personally, I’d be overjoyed at not having to change clocks twice a year. But I’m a bit weirded out that we would permanently have the sun directly overhead at 1 P.M. instead of noon just on the principle of it all. Time’s a social construct and all, but it still feels weird.
Luckily, it seems the bill passed the US Senate through a procedural fluke in the rules around “Unanimous Consent”: any senator can ask for a bill to pass by unanimous consent, and if no one raises an objection, it passes. While everyone was busy with Ukraine and other matters, and without the customary advance notice going out to every senator, it slipped through.
It seems that the corresponding bill in the House, H.R. 69, has NOT been passed. TL;DR: there are probably some objections to it passing, so I wouldn’t get your hopes up about abolishing time changes just yet.
But all this raises a very interesting question… how do our computers know what time zone it is? Governments around the world change their time zones and daylight savings time rules haphazardly all the time. How do computers handle this chaos?
Why, there’s a database for that!
The tz database
Since the late 1980s, volunteers have maintained a public domain list of time zones and daylight savings time rules, including the myriad ways those rules have changed over time. It goes by various names: tz, zoneinfo, tzdata, etc.
As of 2011, IANA, the group under ICANN that, among other things, manages how IP addresses are allocated across the globe, took over responsibility for maintaining this time zone database after a copyright dispute, Astrolabe, Inc. v. Olson et al., caused the original FTP server where the files lived to shut down. A pretty comprehensive page with links to a lot of resources and basic descriptions of the database is here, and it is actually part of the official distribution.
The database is a set of data files and corresponding code that keeps track of time zone changes since the start of the Unix epoch, UTC 00:00:00 Jan 1, 1970. Note that the concept of Universal Time, and UTC were not adopted by all countries simultaneously, so 1970 was roughly in the range of when everyone had started using clocks that agreed with each other. Go too far before that and everything gets messy.
This package is widely referenced and relied upon in some form by practically everything. Your individual operating system or favorite programming language very likely relies on a timezone library that’s either directly derived from, or developed by closely referencing, this database.
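As one concrete illustration, Python (3.9 and later) ships a standard-library zoneinfo module that reads a compiled copy of the tz database from the operating system, or from the optional tzdata package. Here is a minimal sketch, assuming that data is installed. The helper name chicago_utc_offset_hours is made up for this example.

```python
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def chicago_utc_offset_hours(local: datetime) -> Optional[float]:
    """UTC offset (in hours) for a naive local wall-clock time in America/Chicago.

    Returns None when no tz database can be found on this machine.
    """
    try:
        tz = ZoneInfo("America/Chicago")
    except ZoneInfoNotFoundError:
        return None
    offset = local.replace(tzinfo=tz).utcoffset()
    return None if offset is None else offset.total_seconds() / 3600


# A summer date and a winter date: daylight time vs. standard time.
print(chicago_utc_offset_hours(datetime(2022, 7, 10, 12, 0)))
print(chicago_utc_offset_hours(datetime(2022, 1, 10, 12, 0)))
```

The point is only that the application code never states any DST rules itself. Everything comes from the database.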
I’ve known for a while that this database existed — you can’t really obsess over leap seconds and timekeeping as much as I do without coming across it. But I’ve never actually opened the files themselves to see what’s inside.
Reading the database manually
There’s a very nice and clear summary of the data format used in the tz database by Bill Seymour, which I’m cribbing from very heavily here. To summarize, the database has two schemas: Rules and Zones.
Rules describe the daylight savings time rules that are in effect for a location, for example Chicago:
#Rule NAME FROM TO - IN ON AT SAVE LETTER
Rule Chicago 1920 only - Jun 13 2:00 1:00 D
Rule Chicago 1920 1921 - Oct lastSun 2:00 0 S
Rule Chicago 1921 only - Mar lastSun 2:00 1:00 D
Rule Chicago 1922 1966 - Apr lastSun 2:00 1:00 D
Rule Chicago 1922 1954 - Sep lastSun 2:00 0 S
Rule Chicago 1955 1966 - Oct lastSun 2:00 0 S
The first line essentially reads “For Chicago, on June 13th, 1920, at 2AM, daylight savings time starts, saving 1 hour, and you use the letter ‘D’ to represent it” (aka CDT instead of CST for the Central time zone, except for the time when they were in the Eastern time zone). Then the next line describes how in 1920-1921, daylight savings ended on the last Sunday of October at 2AM. And so on.
What’s maddening is that the Rules for starting Daylight Savings and for returning to Standard time are independent. In 1922-1966, Daylight Savings started in April, but you can see in the last two lines that the switch back to Standard time was pushed a month later, from September to October, starting in 1955. You need to keep track of both ends to make sure your clock (and thus, datetime math) is set correctly.
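To make the column layout concrete, here is a rough Python sketch that splits Rule lines into named fields and asks which rules are in force in a given year. The names Rule, parse_rule and active_in are made up for illustration. The sketch only handles the spellings shown in the excerpts here ("only", "max", plain years), and the real format has more variants than this.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    name: str       # rule set name, e.g. "Chicago" or "US"
    from_year: int  # first year the rule applies
    to_year: int    # last year the rule applies (inclusive)
    month: str      # IN column, e.g. "Apr"
    on: str         # ON column, e.g. "13", "lastSun", "Sun>=8"
    at: str         # AT column, wall-clock time of the switch
    save: str       # SAVE column, amount added to standard time
    letter: str     # LETTER column, substituted into the zone abbreviation


MAX_YEAR = 9999  # stand-in for "max" (no end date); an assumption of this sketch


def parse_rule(line: str) -> Rule:
    """Parse one 'Rule ...' line, ignoring any trailing '#' comment."""
    fields = line.split("#", 1)[0].split()
    if len(fields) != 10 or fields[0] != "Rule":
        raise ValueError(f"not a Rule line: {line!r}")
    _, name, frm, to, _unused, month, on, at, save, letter = fields
    from_year = int(frm)
    if to == "only":
        to_year = from_year
    elif to == "max":
        to_year = MAX_YEAR
    else:
        to_year = int(to)
    return Rule(name, from_year, to_year, month, on, at, save, letter)


def active_in(rules: List[Rule], year: int) -> List[Rule]:
    """Rules whose FROM..TO range covers the given year."""
    return [r for r in rules if r.from_year <= year <= r.to_year]


CHICAGO_RULES = """\
Rule Chicago 1920 only - Jun 13 2:00 1:00 D
Rule Chicago 1920 1921 - Oct lastSun 2:00 0 S
Rule Chicago 1921 only - Mar lastSun 2:00 1:00 D
Rule Chicago 1922 1966 - Apr lastSun 2:00 1:00 D
Rule Chicago 1922 1954 - Sep lastSun 2:00 0 S
Rule Chicago 1955 1966 - Oct lastSun 2:00 0 S
"""

rules = [parse_rule(line) for line in CHICAGO_RULES.splitlines()]
for year in (1954, 1955):
    print(year, [(r.month, r.on, r.letter) for r in active_in(rules, year)])
```

Running it for 1954 and 1955 shows the start rule staying put in April while the end rule moves from September to October, which is exactly the "keep track of both ends" problem.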
But Chicago is part of the US, and sometimes Chicago followed the US-wide rules instead of doing their own thing. Those rules are reflected below. Observe how there’s data since 1918, because the tz database has been adding details about time keeping before 1970 over time. It includes things like the clock changes during World War 2 in an attempt to save energy for the war effort.
#Rule NAME FROM TO - IN ON AT SAVE LETTER/S
Rule US 1918 1919 - Mar lastSun 2:00 1:00 D
Rule US 1918 1919 - Oct lastSun 2:00 0 S
Rule US 1942 only - Feb 9 2:00 1:00 W # War
Rule US 1945 only - Aug 14 23:00u 1:00 P # Peace
Rule US 1945 only - Sep 30 2:00 0 S
Rule US 1967 2006 - Oct lastSun 2:00 0 S
Rule US 1967 1973 - Apr lastSun 2:00 1:00 D
Rule US 1974 only - Jan 6 2:00 1:00 D
Rule US 1975 only - Feb 23 2:00 1:00 D
Rule US 1976 1986 - Apr lastSun 2:00 1:00 D
Rule US 1987 2006 - Apr Sun>=1 2:00 1:00 D
Rule US 2007 max - Mar Sun>=8 2:00 1:00 D
Rule US 2007 max - Nov Sun>=1 2:00 0 S
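The ON column is the fiddly part: it can be a plain day number, "lastSun" (the last Sunday of the month), or something like "Sun>=8" (the first Sunday on or after the 8th). Here is a small sketch of turning those three spellings into a calendar date. The function name resolve_on is hypothetical, and other forms that may exist in the real files are not handled.

```python
from datetime import date, timedelta

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
WEEKDAYS = {"Mon": 0, "Tue": 1, "Wed": 2, "Thu": 3, "Fri": 4, "Sat": 5, "Sun": 6}


def resolve_on(year: int, month_abbr: str, on: str) -> date:
    """Turn a Rule's IN/ON columns into a concrete date for one year."""
    month = MONTHS.index(month_abbr) + 1
    if on.isdigit():
        # Plain day of month, e.g. "13".
        return date(year, month, int(on))
    if on.startswith("last"):
        # e.g. "lastSun": walk back from the last day of the month.
        target = WEEKDAYS[on[4:]]
        first_of_next = date(year + month // 12, month % 12 + 1, 1)
        last_day = first_of_next - timedelta(days=1)
        return last_day - timedelta(days=(last_day.weekday() - target) % 7)
    if ">=" in on:
        # e.g. "Sun>=8": walk forward from the given day of the month.
        day_name, first_day = on.split(">=")
        start = date(year, month, int(first_day))
        return start + timedelta(days=(WEEKDAYS[day_name] - start.weekday()) % 7)
    raise ValueError(f"unsupported ON value: {on!r}")


print(resolve_on(2007, "Mar", "Sun>=8"))   # 2007-03-11
print(resolve_on(2007, "Nov", "Sun>=1"))   # 2007-11-04
print(resolve_on(1968, "Apr", "lastSun"))  # 1968-04-28
```

Those first two dates are when the current US rules first took effect: clocks went forward on March 11, 2007 and back on November 4, 2007.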
But rules aren’t the end of it. “Chicago” in this example is just the major US city that the database uses to name the America/Chicago time zone. The convention is that every time zone is an area whose clocks are usually set together, and the zone is [usually] named after the largest or most recognizable place in it.
Here’s the Zone entry for Chicago:
#Zone NAME STDOFF RULES FORMAT [UNTIL]
Zone America/Chicago -5:50:36 - LMT 1883 Nov 18 12:09:24
-6:00 US C%sT 1920
-6:00 Chicago C%sT 1936 Mar 1 2:00
-5:00 - EST 1936 Nov 15 2:00
-6:00 Chicago C%sT 1942
-6:00 US C%sT 1946
-6:00 Chicago C%sT 1967
-6:00 US C%sT
Unlike the Rules from earlier, only one Zone line is active at a time. Here it starts with LMT (“Local Mean Time”), which applied until November 18, 1883, with an offset from UTC of -5:50:36. I have no idea how they determined that. From 1883 until 1920, they used the US rule set. The format string uses a “%s” to denote where the “Letter/S” field in the rule (S or D) goes, giving CST vs CDT. Then you can see over history how Chicago switched between US and Chicago rule sets. Notice how from March 1, 1936 until November 15, 1936, Chicago decided to be EST instead of adopting Daylight Savings time for Central time. The time change was effectively the same, but somehow they decided to implement it differently.
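A likely explanation for that -5:50:36 figure: local mean time is just solar time at the city's longitude, and the Earth turns 360 degrees in 24 hours, so each degree of longitude is worth four minutes of clock time. The sketch below assumes Chicago's reference point is at 87°39′ W. That coordinate is my assumption (I believe it matches what the database's own zone table lists, but I have not verified it here), and the function names are made up.

```python
def lmt_offset_seconds(degrees: int, minutes: int = 0, seconds: int = 0,
                       west: bool = False) -> int:
    """Local mean time offset from UTC, in seconds, for a given longitude."""
    arc_seconds = degrees * 3600 + minutes * 60 + seconds
    # 360 degrees of longitude = 24 hours, so 15 arcseconds = 1 second of time.
    time_seconds = round(arc_seconds / 15)
    return -time_seconds if west else time_seconds


def format_hms(total_seconds: int) -> str:
    """Format a signed number of seconds the way STDOFF is written, e.g. -5:50:36."""
    sign = "-" if total_seconds < 0 else ""
    hours, rem = divmod(abs(total_seconds), 3600)
    mins, secs = divmod(rem, 60)
    return f"{sign}{hours}:{mins:02d}:{secs:02d}"


# Assumed longitude for Chicago: 87 degrees 39 minutes west.
print(format_hms(lmt_offset_seconds(87, 39, west=True)))  # -5:50:36
```

Under that assumption the arithmetic lands exactly on the value in the Zone line, which at least suggests the number was computed from a map rather than dug out of an archive.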
Between the Rules and the Zones, a computer with a given date, say, July 10th, 1968 in Chicago could figure out what time zone it’s in and whether it should have daylight savings time on or not (answer: central time UTC-6, daylight savings of 1 hour so effectively UTC-5).
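Here is that July 10th, 1968 lookup worked through in Python, as a sketch rather than a real implementation. The Zone lines are transcribed from the entry above with the UNTIL times of day dropped, only the 1967-1973 slice of the US rules is encoded, and all of the names (Era, chicago_lookup, and so on) are made up for this example.

```python
from datetime import date, datetime, time, timedelta
from typing import NamedTuple, Optional, Tuple


class Era(NamedTuple):
    until: Optional[date]  # first day this line no longer applies (None = open-ended)
    stdoff: timedelta      # STDOFF: standard-time offset from UTC
    rules: Optional[str]   # RULES: name of the rule set, or None for "-"
    fmt: str               # FORMAT: abbreviation, with %s for the rule's letter


CST = timedelta(hours=-6)
CHICAGO = [
    Era(date(1883, 11, 18), -timedelta(hours=5, minutes=50, seconds=36), None, "LMT"),
    Era(date(1920, 1, 1), CST, "US", "C%sT"),
    Era(date(1936, 3, 1), CST, "Chicago", "C%sT"),
    Era(date(1936, 11, 15), timedelta(hours=-5), None, "EST"),
    Era(date(1942, 1, 1), CST, "Chicago", "C%sT"),
    Era(date(1946, 1, 1), CST, "US", "C%sT"),
    Era(date(1967, 1, 1), CST, "Chicago", "C%sT"),
    Era(None, CST, "US", "C%sT"),
]


def last_sunday(year: int, month: int) -> date:
    """Last Sunday of a month (only months 1-11 are needed in this sketch)."""
    last_day = date(year, month + 1, 1) - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)


def us_rules_1967_1973(local: datetime) -> Tuple[timedelta, str]:
    """(SAVE, LETTER) from 'US 1967 1973 Apr lastSun' and 'US 1967 2006 Oct lastSun'."""
    if not 1967 <= local.year <= 1973:
        raise NotImplementedError("this sketch only encodes the 1967-1973 US rules")
    start = datetime.combine(last_sunday(local.year, 4), time(2, 0))
    end = datetime.combine(last_sunday(local.year, 10), time(2, 0))
    # Wall-clock comparison; the hour around each switch is not handled carefully.
    if start <= local < end:
        return timedelta(hours=1), "D"
    return timedelta(0), "S"


def chicago_lookup(local: datetime) -> Tuple[str, timedelta]:
    """Abbreviation and total UTC offset for a local wall-clock time in Chicago."""
    era = next(e for e in CHICAGO if e.until is None or local.date() < e.until)
    if era.rules is None:
        return era.fmt, era.stdoff
    if era.rules != "US":
        raise NotImplementedError("only the US rule set is sketched here")
    save, letter = us_rules_1967_1973(local)
    return era.fmt.replace("%s", letter), era.stdoff + save


abbr, offset = chicago_lookup(datetime(1968, 7, 10, 12, 0))
print(abbr, offset.total_seconds() / 3600)  # CDT -5.0
```

The two-step shape is the important part: first pick the one Zone line that covers the date, then ask that line's rule set whether a SAVE amount applies.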
Now just scale this construct… across the globe for every time zone.
Also included in the database is C code that helps read the files, a set of aliases in the backward and backzone files to handle changes in the names of various things in the database (for example, America/Sao_Paulo was at some point labeled as Brazil/East).
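As far as I can tell, those aliases are written as Link lines of the form "Link TARGET LINK-NAME", where the target is the current name and the link name is the old one. Here is a small sketch of resolving an old name to its current one under that assumption. The sample text is a hand-written stand-in for the real file, and the function names are hypothetical.

```python
from typing import Dict


def parse_links(text: str) -> Dict[str, str]:
    """Map each old (link) name to its target from 'Link TARGET LINK-NAME' lines."""
    aliases: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) == 3 and fields[0] == "Link":
            _, target, link_name = fields
            aliases[link_name] = target
    return aliases


def canonical_name(name: str, aliases: Dict[str, str]) -> str:
    """Follow aliases until reaching a name that is not itself an alias."""
    seen = set()
    while name in aliases and name not in seen:
        seen.add(name)  # guard against accidental cycles
        name = aliases[name]
    return name


# Hand-written sample in the assumed format, not copied from the real file.
SAMPLE = """\
# old name kept for compatibility
Link America/Sao_Paulo Brazil/East
"""

aliases = parse_links(SAMPLE)
print(canonical_name("Brazil/East", aliases))        # America/Sao_Paulo
print(canonical_name("America/Sao_Paulo", aliases))  # America/Sao_Paulo
```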
But it’s the comments that are the most amazing bit
Originally, I wasn’t going to write about time zones this week. But out of curiosity, I downloaded the database just to see what was inside since I had never inspected them before.
After downloading and unpacking the latest release (2022a), I found, mixed in with the bits of code, files for every major continent: antarctica, northamerica, etc. Open one of those up, like antarctica, and you’re greeted with a huge wall of comments. In fact, comments dominate these rule and zone files. Once I realized what was in these comments, I knew I had to share. Here’s a breakdown of the comment volume for the non-code files.
# quick/lazy bash loop to count commented lines vs non-commented ones
# (the data files have no extension; output is: name, comment lines, data lines)
for f in *; do [[ -f $f && $f != *.* ]] && echo "$f $(grep -c '^#' "$f") $(grep -v '^#' "$f" | grep -c '[a-zA-Z0-9]')"; done
Inside, you’ll see comments from various contributors providing notes and research explaining why a given Zone has the rules that it does. It’s essentially a condensed history of modern timekeeping. (Note: the authors very clearly state that the database and notes aren’t intended to be perfect and complete, only useful. Corrections are always welcome.)
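For anyone who would rather not squint at shell quoting, here is the same counting logic sketched in Python: a line counts as a comment if it starts with "#", and as data if it is not a comment and contains at least one letter or digit. The function name and the sample text are made up for illustration.

```python
import re
from typing import Tuple


def count_lines(text: str) -> Tuple[int, int]:
    """Return (comment_lines, data_lines) for the contents of one tz data file."""
    comments = data = 0
    for line in text.splitlines():
        if line.startswith("#"):
            comments += 1
        elif re.search(r"[A-Za-z0-9]", line):
            data += 1  # blank and punctuation-only lines are skipped
    return comments, data


# Tiny hand-made sample in the style of the antarctica file.
SAMPLE = """\
# From Lee Hotz (2001-03-08):
# I queried the folks at Columbia who spent the summer at Vostok

Zone Antarctica/Vostok 0 - -00 1957 Dec 16
\t\t\t6:00 - +06
"""

comment_lines, data_lines = count_lines(SAMPLE)
print(comment_lines, data_lines)  # 2 2
```

To run it over a real file you would pass in the file's text, for example from pathlib.Path("antarctica").read_text() in the unpacked release directory.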
# tzdb data for Antarctica and environs
# This file is in the public domain, so clarified as of
# 2009-05-17 by Arthur David Olson.
# From Paul Eggert (1999-11-15):
# To keep things manageable, we list only locations occupied year-round; see
# COMNAP - Stations and Bases
# http://www.comnap.aq/comnap/comnap.nsf/P/Stations/
# and
# Summary of the Peri-Antarctic Islands (1998-07-23)
# http://www.spri.cam.ac.uk/bob/periant.htm
# for information.
# Unless otherwise specified, we have no time zone information.
# FORMAT is '-00' and STDOFF is 0 for locations while uninhabited.
# Argentina - year-round bases
# Belgrano II, Confin Coast, -770227-0343737, since 1972-02-05
# Carlini, Potter Cove, King George Island, -6414-0602320, since 1982-01
# Esperanza, Hope Bay, -6323-05659, since 1952-12-17
# Marambio, -6414-05637, since 1969-10-29
# Orcadas, Laurie I, -6016-04444, since 1904-02-22
# San Martín, Barry I, -6808-06706, since 1951-03-21
# (except 1960-03 / 1976-03-21)
...
# From Lee Hotz (2001-03-08):
# I queried the folks at Columbia who spent the summer at Vostok and this is
# what they had to say about time there:
# "in the US Camp (East Camp) we have been on New Zealand (McMurdo)
# time, which is 12 hours ahead of GMT. The Russian Station Vostok was
# 6 hours behind that (although only 2 miles away, i.e. 6 hours ahead
# of GMT). This is a time zone I think two hours east of Moscow. The
# natural time zone is in between the two: 8 hours ahead of GMT."
...
Zone Antarctica/Vostok 0 - -00 1957 Dec 16
6:00 - +06
There’s lots of fascinating stuff like how Turkey had multiple instances of shuffling their daylight savings changes on short notice, to the confusion and annoyance of many. One discussion is below:
# From Faruk Pasin (2014-02-14):
# The DST for Turkey has been changed for this year because of the
# Turkish Local election....
# http://www.sabah.com.tr/Ekonomi/2014/02/12/yaz-saatinde-onemli-degisiklik
# ... so Turkey will move clocks forward one hour on March 31 at 3:00 a.m.
# From Randal L. Schwartz (2014-04-15):
# Having landed on a flight from the states to Istanbul (via AMS) on March 31,
# I can tell you that NOBODY (even the airlines) respected this timezone DST
# change delay. Maybe the word just didn't get out in time.
# From Paul Eggert (2014-06-15):
# The press reported massive confusion, as election officials obeyed the rule
# change but cell phones (and airline baggage systems) did not. See:
# Kostidis M. Eventful elections in Turkey. Balkan News Agency
# http://www.balkaneu.com/eventful-elections-turkey/ 2014-03-30.
# I guess the best we can do is document the official time.
Or how Saudi Arabia prior to 1968 ran on quasi-solar time where people would reset their clocks to midnight at sundown in accordance with the Islamic calendar. Every day the clocks shifted relative to the rest of the world! The entry in the tz database cites a story about how a man named Higgins at a local power substation flipped out over the constantly shifting arrangement and declared that the station would run on “Higgins Time”.
Jon Udell, in 2009, wrote up an appreciative post about the comments in the tz database highlighting a lot of the quirks and research that went into maintaining this file. It’s well worth the short read. I largely echo that sentiment.
I haven’t had time to read through the many thousands of lines in all of these files, but the little that I’ve read was deeply fascinating. I highly recommend that we all take a look at this marvel that only a handful of people in the world ever bother to read, let alone contribute to.
Data DEMANDS context come with it
Since I’m a hopeless nerd, I absolutely adore the bits about historical changes to time zones across the world. But there’s a more practical lesson to be had here about documentation. At first blush, from a purely engineering perspective, you’d wonder why there is so much lengthy commentary that is not meant for machines to read. Why was the energy invested in this way?
But the answer isn’t too hard to imagine — imagine if these configuration files weren’t commented. You’d just have these lengthy tables of offsets and rule changes — it’d be like any number of config files lurking in your Linux /etc/ folder, full of mind-numbing symbols that you’d touch maybe once a decade.
What would it be like maintaining such a file? What happens when the current maintainer wants to retire their position and pass it on to someone else? How can you know what’s a potential mistake?
The comments, which very often make up more than 80% of a file, are critical notes for the maintenance of the database, and they’re baked right into it. They’re like margin notes for a big almanac. These notes let you quickly figure out why a certain change happened: there’s a link or at least a mention of whatever caused a change. That way someone can always quickly verify whether a mistake has been made without diving into a huge research project to figure out “what happened in Turkey in 2014 that caused daylight savings to shift?” Since many time zone changes are announced well in advance, someone going in blind would have to research what happened in Turkey in 2014, maybe 2013, and possibly even earlier.
I think that’s a very important lesson for us as we continue to build and maintain long-lived datasets that will be shared with others.
Resources
If you’re thinking “but how do I know on a map what time zone a place belongs to?” the tz database mentions that Timezone Boundary Builder does this, and it looks pretty accurate. In reality, there are geopolitical issues surrounding what time zone a place uses, so the borders can be much fuzzier than you would think.
Stuff to share w/ the Data community
Nothing on my end this week.
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With excursions into other fun topics.
Curated archive of evergreen posts can be found at randyau.com
All photos/drawings used are taken/created by Randy unless otherwise noted.