Synchronicity
This week I’ve been obsessed with time… again. Because somehow I still haven’t exhausted this topic. But despite my constant griping about many of the dizzying aspects of handling time, I also find it an irresistible draw because the depth of thinking and ingenuity needed to measure what feels so fundamental is just… awesome.
The exact concept I’ve latched onto this week is clock synchronization, specifically, coordinating the timestream that is my life with other remote people in the mess of a Zoom/Google Meet (I think that’s the current chat-app branding…) world that society has devolved into.
The main reason I care about time sync now is that, like missing a scheduled train, I’ll be writing a code snippet or turning around to give the kid a headpat or move a toy, and suddenly realize I’m 15 minutes late to a 30 minute 1:1 meeting. Given the craziness of the times, it happens and people have been very understanding, but it is obviously suboptimal behavior. Similarly, I’d glance at the analog clock above my computer to check the time for an important meeting, only to realize it’s been running 3 minutes fast (or worse, 2 minutes slow) and get overly frustrated that the quartz wall clock running on a draining AA battery isn’t keeping time as well as my NTP-synced computers and cell-tower-synced cell phone.
My new WWVB “atomic” radio clock… that still hasn’t synced (4 days and counting…)
In an attempt to fix the analog clock issue, I just bought a low-cost “Atomic” clock, which is actually a radio receiver tuned to WWVB in Fort Collins, Colorado. That’s a giant low-frequency (60 kHz) radio station that broadcasts the time slowly but strongly enough to cover the entire continental United States. Thus far, due to distance and probably EM interference, the darned thing still hasn’t synced up to the proper time… This stupid thing is what kicked off this whole piece here.
Since that thing isn’t syncing, I guess I can either return it, obsessively stalk the WWVB status page, or build a DIY micro WWVB transmitter (which is surprisingly complex and hard, and you’d risk breaking FCC transmission rules [a big no-no] unless you keep the power extremely low and handle probably some other technical details I’m missing).
Also, while I’ll get to this towards the end of this piece, time synchronization, especially super ridiculously accurate synchronization, is a bedrock requirement for a lot of modern data infrastructure and analysis.
But first, join me in some history. Because fun!
Syncing Clocks is Fundamental
Clocks are very interesting and boring things all at once.
At the core, they’re just devices that steadily mark regular intervals. Whether it’s pendulum swings, clicks of a mechanical clock escapement, crystal vibrations, the rhythm of your nephew repeatedly bouncing a ball in the living room, or the bouncing of supercooled cesium atoms in an atomic fountain, these could all be the basis for clocks. The only difference between the different methods is how steadily and consistently they beat. We just take those beats and count up how many happen within a second (defined, in the wording adopted in 2018, by fixing “the unperturbed ground state hyperfine transition frequency of the caesium 133 atom”, ΔνCs, at 9,192,631,770 Hz), and we have a device that can be used to tell human-consumable intervals of time: a clock.
The thing about clocks is that having just one clock, whether it’s an atomic clock or a bucket dripping water, is of limited utility. If instead of Russell’s Teapot, there was a cesium clock ticking away in Sol orbit, how much use would it be alone?
Clocks become much more valuable when used with another clock. It allows for coordination based on time. Multiple people separated from each other can reference their clocks to coordinate actions, or come to agreement about what events happened in what order (relativity notwithstanding). That’s why for most of human history, the sun and stars were used as the primary basis of clocks. Weather permitting, you could look up and be referencing the same “clock” as anyone else on the planet.
In fact, clock syncing is SO fundamental, clock accuracy is defined as the absolute value of the offset between a clock and a reference. A clock is perfectly accurate if its time exactly matches that of a reference clock. Asking whether a lone clock in an otherwise empty universe is accurate is meaningless because all you have is this one thing ticking away with nothing to compare it to.
During normal operation, a clock’s accuracy can drift away from the reference due to technological or physical differences (like gravity and temperature), and so synchronization is the process of minimizing the relative offset between the two clocks.
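To put rough numbers on that, here is a tiny sketch of accuracy-as-offset and of how drift piles up. The 20 parts-per-million drift rate is an assumed, illustrative figure for a cheap quartz oscillator, not a measurement of any particular clock, and the function names are made up for this example.

```python
def offset_after(elapsed_s: float, drift_ppm: float) -> float:
    """Seconds of offset a free-running clock accumulates against the reference."""
    return elapsed_s * drift_ppm / 1_000_000


def accuracy(clock_s: float, reference_s: float) -> float:
    """Accuracy as defined above: the absolute offset from the reference."""
    return abs(clock_s - reference_s)


DAY_S = 86_400
DRIFT_PPM = 20.0  # assumed drift rate, purely illustrative

reference = 30 * DAY_S                                   # 30 days on the reference
wall_clock = reference + offset_after(reference, DRIFT_PPM)

# About 52 seconds of error after a month without any synchronization.
print(round(accuracy(wall_clock, reference), 2))
```

Synchronizing is just resetting that offset toward zero before it grows large enough to matter.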
Since the sun provided a near-universal shared reference clock that was steady and predictable, local solar time was good enough for most of human history. Personal clocks and watches could be set against the big town clock (or a big bell would ring at important time points). The big central clocks were usually set to some form of solar time, and life would just function. We couldn’t travel or send messages fast enough for the concept of “minutes” to be of more than dubious utility in daily life, let alone the “second”. Morning, noon, afternoon, and phrases like “quarter past” and “half past” exist today because those are roughly the chunks of useful time people lived by.
That’s not to say that people weren’t interested in dividing time into smaller segments. Big mechanical clocks that showed minutes and seconds have been documented since the mid 1400s. The devices themselves were of varying accuracy. They’d lose or gain time relative to the sun pretty quickly, so they would need to be reset constantly. But they were the best available at the time and technological advances would slowly increase overall accuracy.
Accurate Time Starts Taking and Also Saving Lives
The quest for having increasingly accurate clocks only got serious during the Age of Exploration, where Europeans in search of more lands and peoples to plunder, enslave, conquer, and colonize put a lot of money and energy into developing the marine chronometer. Britain had rewards posted for the development of a chronometer, much like the modern day X Prize for space exploration (and other things).
A navigator can determine latitude by looking at the angles of various stars at night, but determining longitude required knowing the difference between your local time and a reference time (the Prime Meridian at Greenwich). This meant they had to (effectively) carry a copy of the clock at Greenwich with them, or otherwise risk getting lost at sea if they ever lost sight of land. Once you had that, you’d compare your Greenwich time to a local fixed time point, solar noon, and you’d see how many hours east/west of Greenwich you were. A bit of math or lookup tables and you’d have your longitude.
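The “bit of math” is mostly that the Earth turns 360 degrees in 24 hours, so every hour of difference from Greenwich is 15 degrees of longitude. Here is a minimal sketch of that arithmetic. It deliberately ignores the equation of time and other corrections a real navigator would apply, and the function names are mine.

```python
DEGREES_PER_HOUR = 360 / 24  # the Earth turns 15 degrees per hour


def longitude_from_noon(greenwich_hours_at_local_noon: float) -> float:
    """Longitude in degrees (east positive, west negative).

    The argument is what the Greenwich chronometer reads, in decimal hours,
    at the moment of local solar noon.
    """
    return (12.0 - greenwich_hours_at_local_noon) * DEGREES_PER_HOUR


def longitude_error_deg(clock_error_s: float) -> float:
    """Longitude error caused by a chronometer that is off by clock_error_s."""
    return clock_error_s / 3600 * DEGREES_PER_HOUR


# The chronometer reads 15:00 at local noon: we are 3 hours behind Greenwich.
print(longitude_from_noon(15.0))  # -45.0, i.e. 45 degrees west
```

Run the second function on a clock that is off by a single minute and you get a quarter of a degree of longitude, which is why a chronometer that gained or lost even a few seconds a day was a real problem on a voyage lasting months.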
This was probably the first big example where the accuracy of timekeeping devices had a direct effect on the lives of large groups of people around the globe.
But if you’re not a sailor or an exploited indigenous person somewhere, the effects of the marine chronometer were still remote to you. Accurate clocks only “got real” for everyday society with the development of railroads.
The problem with railroads was that there’s usually a single track running between two spots for cost reasons. Railroads built these things called passing loops which let trains pass one another on otherwise single rails. But otherwise, there’s plenty of potential for having two trains on the same track either going in opposite directions (definitely bad), or going in the same direction but at different speeds (still bad).
In order to make sure that a given section of track only had one train on it, there needed to be coordination of when trains set off at various stations. Efforts to do this started with the use of telegraphs to coordinate train traffic and synchronize clocks, but as rail traffic expanded and lines became more congested, the solution ultimately required accurate clocks and time to be distributed across the whole railway network.
Screenshot from Rails West discussing railroad watches and accidents.
Railroads were ultimately the force that created the modern standard time zones we know today in the US and UK. Since the railroads needed to have accurate clocks to ensure safety of service, and the railroads were a central part of transportation in the era, society did the expedient thing and adopted the way railroads set their clocks up.
As a side note, you can still buy “railroad approved” watches (and pocket watches like the Seiko SVBR003) that must meet a certain standard of accuracy and precision (and also design involving readability), similar to how the term chronometer also means a watch movement meets certain standards of timekeeping.
Timekeeping in the Atomic (and Internet) Age
Skipping through a bunch of timekeeping advances like quartz oscillators, we’re now in the atomic timekeeping age, where time itself (the second) is defined via atomic clock ticks. We finally have clocks that keep time with enough frequency stability that we can measure really cool effects like relativity (which must be accounted for to make GPS work), as well as measure how the rotation of the Earth is slowing down.
Nowadays in tech, we routinely use time units of milliseconds for things like page load speed. Sometimes data is stored at microsecond resolution and processors function in nanoseconds (1 GHz is 1 cycle per nanosecond).
The clocks in our computers “merely” use super cheapo quartz crystals to keep time, and they drift away from atomic clock standards quite quickly because their frequency isn’t exactly right to tick exact seconds. To fight drift, we use the Network Time Protocol (NTP) to regularly synchronize our computer clocks against reference servers. For highly time-critical applications, the more advanced Precision Time Protocol (PTP) is used instead because it can offer sub-microsecond accuracy, at the cost of requiring specialized hardware.
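For the curious, the core arithmetic behind one NTP request/response exchange is surprisingly small. This is a minimal sketch rather than a real client: it assumes the network delay is the same in both directions (the standard simplifying assumption), the names t0 through t3 are my own labels for the four timestamps, and the example numbers are invented.

```python
def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float):
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client clock when the request is sent
    t1: server clock when the request arrives
    t2: server clock when the reply is sent
    t3: client clock when the reply arrives
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # how far the client is behind the server
    delay = (t3 - t0) - (t2 - t1)          # time on the wire, excluding server work
    return offset, delay


# Invented example: the client is 0.5 s behind the server, each network leg
# takes 20 ms, and the server spends 1 ms handling the request.
offset, delay = ntp_offset_and_delay(100.000, 100.520, 100.521, 100.041)
print(round(offset, 3), round(delay, 3))
```

The client then nudges its clock by the estimated offset. Real implementations add filtering across many exchanges and multiple servers, which this sketch leaves out entirely.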
Why is all this so darn important? For one, it provides a shared, (mostly) monotonic number generation system, so timestamps see occasional use in forms of authentication (like in Kerberos) or with cryptographic nonces.
More interesting to us data folk, accurate time lets computers agree on the order that things happen (again, ignoring relativistic effects). If your computer clock was somehow set 1 year behind the rest of the world, everything you’d do or create with a timestamp would appear to have happened in the past to every other computer. That could lead to very weird software behavior.
This fact is getting even more important these days because in the Big Data age (wait, are we phasing the term out yet?) we’re doing more and more with giant distributed systems. Guess what you often need to do when data is generated and processed in sharded systems across the world? You’re gonna need to put them in order at some point.
We’re all super lucky: We can take time for granted
NTP v0 dates from 1985, and the latest (proposed) version, v4, dates from 2010. It’s good enough to hold most of the internet together and is ubiquitous. For most of us who aren’t obsessed enough to become horologists, it’s plenty good enough and Just Works without any thought on our part.
NTP can keep clocks within a few tens of milliseconds over WANs, and within a few milliseconds of each other on a LAN in lab settings. For a small amount of money, you can build a basic stratum 1 NTP server with an ESP8266 (or an Arduino) with a GPS module and get your clocks very close to that time over the LAN.
If your data work looks anything like mine, you’ll find that you’re simultaneously super reliant on making sure your data is time stamped correctly, but also very ambivalent as to what the exact value of the timestamp is (so long as it’s correct). I don’t usually care what it means for something to happen on 2020-08-29T18:44:23.413Z. I might care it happened on the 29th. I might care about the hour in local time for other work. But I don’t usually need to know things happened down to the millisecond.
Instead, I primarily care about how I could use the time stamp to order and group my data. It’s important to know that registration happened before credit card swipe. It’s important to know transaction #1 happened before refund #2. This kind of ordering is a big part of the consistency guarantees we expect from databases (the C in ACID, with help from the I, isolation), and we like things to be this way.
So when I think about how Cloud Spanner uses super high accuracy GPS and atomic clocks, and how CockroachDB gets by with ordinary NTP-synced clocks, each paired with complicated algorithms to offer consistency (well, TrueTime is said here to give linearizability) for a distributed database that can span the globe, I’m somewhat floored.
The properties of the distributed database algorithms in those systems essentially allow the database to know that event A came before B, not because transaction A explicitly ended before B, but because the timestamp where A happened is before B’s timestamp, and there’s no error overlap in the time measurements (this paper explains it pretty clearly and concisely). So, effectively, the global cluster is using clocks and algos to cheat a bit and reach a consistent state slightly faster than traditional multi-step transaction handshakes, which are bound by speed-of-light round trips, could manage.
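The “no error overlap” idea can be sketched in a few lines. To be clear, this is my own toy illustration of ordering by uncertainty intervals, not Spanner’s or CockroachDB’s actual API: the TimeInterval class, the function names, and the error bounds are all hypothetical, and timestamps are plain integer milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    """A timestamp with its uncertainty: the true time lies in [earliest, latest]."""
    earliest: int
    latest: int


def interval_for(timestamp_ms: int, max_error_ms: int) -> TimeInterval:
    """Widen a clock reading by the worst-case clock error on each side."""
    return TimeInterval(timestamp_ms - max_error_ms, timestamp_ms + max_error_ms)


def order(a: TimeInterval, b: TimeInterval) -> str:
    """Say which event came first, or admit the clocks cannot tell."""
    if a.latest < b.earliest:
        return "a_first"
    if b.latest < a.earliest:
        return "b_first"
    return "uncertain"  # intervals overlap, so timestamps alone prove nothing


# Events 10 ms apart: 4 ms of clock error keeps them orderable, 7 ms does not.
print(order(interval_for(10_000, 4), interval_for(10_010, 4)))
print(order(interval_for(10_000, 7), interval_for(10_010, 7)))
```

The tighter the clocks are synced, the smaller the intervals, and the less often the system lands in the “uncertain” case where it has to wait or coordinate the slow way.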
We’ve gone from making sure everyone runs on the same time (to within maybe 30 seconds) in order to prevent trains from smashing into each other, to using GPS and atomic clocks in datacenters to make sure the bank transaction for my pizza order doesn’t inadvertently collide with my simultaneous ordering of a vacuum cleaner online and break something.
So, bringing synchronization back to data
We’re living in an age where we’re literally swimming in accurate timekeeping. Thanks to GPS, any device that can view the sky can get access to a giant network of atomic clocks whizzing about overhead. We carry computers called cell phones that all sync their clocks to the cell network time, which is usually referenced against GPS. Even dirt cheap $18 wall clocks like the one I bought can sync to radio signals tied to atomic clocks around the world (assuming they’re able to catch the signal =\).
All this means that, with some effort, it’s possible to use timestamps to link events and data from all over the world across time, space and different systems. If you’ve ever had to do work with 2 distinct logging systems that don’t share any join keys before, you’ll quickly realize that you’re looking for the dataset equivalent of the clapperboard that’s used in film. Those iconic devices are used to create a clear syncing point between multiple recording cameras and audio recorders in filmmaking, by having a very clear audio spike that can be lined up to when the visual clapper slaps into the board.
If the thought of pairing events together via timestamps sounds really hacky and sketchy, worry not! It IS hacky and sketchy! Between the time resolutions that we’re used to working with (usually on the order of milliseconds), clocks on different systems being slightly offset from each other, various forms of network lag, and analysis stack processing times, the same event might show up in two systems with timestamps a couple of seconds apart. That’s usually why this sort of work can’t be done without extra information.
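For a flavor of what this hackiness looks like in practice, here is a minimal sketch of a tolerance-based match between two logs that share no join key. The function name, the two-second tolerance, and the sample timestamps (integer milliseconds) are all made up for illustration.

```python
from bisect import bisect_left


def match_events(left, right_sorted, tolerance_ms):
    """Pair each left timestamp with the nearest right timestamp within tolerance.

    right_sorted must be sorted ascending. Returns a list of
    (left_ts, right_ts) tuples, with None when nothing is close enough.
    """
    pairs = []
    for ts in left:
        i = bisect_left(right_sorted, ts)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = right_sorted[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda r: abs(r - ts), default=None)
        if best is not None and abs(best - ts) <= tolerance_ms:
            pairs.append((ts, best))
        else:
            pairs.append((ts, None))
    return pairs


app_log = [100_000, 105_200, 230_000]      # hypothetical system A timestamps
payment_log = [100_800, 104_900, 180_000]  # hypothetical system B timestamps
print(match_events(app_log, payment_log, tolerance_ms=2_000))
```

Note what this does not do: nothing stops two events on one side from claiming the same event on the other, and nothing tells you whether a match is real or a coincidence. That is exactly the “extra information” problem.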
But if ultra-accurate (microsecond or better) clocks get even more ubiquitous and they’re synced increasingly better, we might have reason to start adopting 64-bit time counters as “milliseconds since the epoch” like Java does (since at least Java 1.5.0 if not before) instead of the current “seconds since the epoch” that unixtime uses. With that sort of high-resolution timekeeping, and clocks accurate enough to support its usage, I fully expect it to become easier to separate and line up events from different systems.
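Here is a small sketch of why the resolution matters, using two made-up events that land within the same second. The helper names are mine, and the integer timedelta arithmetic is just one way to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(dt: datetime) -> int:
    """Whole seconds since the Unix epoch (classic unixtime resolution)."""
    return (dt - EPOCH) // timedelta(seconds=1)


def to_epoch_millis(dt: datetime) -> int:
    """Whole milliseconds since the Unix epoch (Java-style resolution)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


# Two hypothetical events 574 ms apart, inside the same second.
first = datetime(2020, 8, 29, 18, 44, 23, 413_000, tzinfo=timezone.utc)
second = datetime(2020, 8, 29, 18, 44, 23, 987_000, tzinfo=timezone.utc)

print(to_epoch_seconds(first) == to_epoch_seconds(second))  # True: indistinguishable
print(to_epoch_millis(second) - to_epoch_millis(first))     # 574: now orderable
```

Of course, the extra digits only mean something if the clocks that produced them agree to better than a millisecond, which is the whole point of this piece.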
That should be pretty darn cool for some limited applications, AND also have lots of scary implications for others.
All because we share the state of the ticks of little oscillators better.
Final aside, there are apparently apps for iPhone and Android that will emulate the major time radio signals, JJY, WWVB, DCF77, etc. Apparently the mechanism is similar to the one described on this page for JJY.
Since the WWVB signal is 60 kHz, the emulators play pulses of audio as loud as possible at 20 kHz, which is easily doable on a pair of headphones. That 20 kHz waveform running down the wire generates harmonics, and the 3rd harmonic would sit at 60 kHz. The headphone wire/speakers act as a very weak antenna, which would then transmit the time over to the clock/watch.
That’s all sorts of cool and clever.
Also, electromagnetism is magical stuff.