Companies on the scale of Google or even Facebook are venturing into uncharted territory. No one else has faced the problems a site as large as Google faces, so it has to solve them all by itself.
This means creating its own hardware, even its own networking equipment, not to mention innovative software platforms to keep it all in sync.
Multiple copies of data, across the world, need to be kept in sync
Google has many data centers scattered around the world. They're not all used for the same thing, but most run the bulk of Google's services: the ad network, Gmail, the search engine and so on.
As such, they hold much of the same data, since they all need it. For example, your emails may be stored at dozens of locations around the world, duplicated both for safety (if one copy is lost, the others survive) and for speed (fetching your data from a nearby location is significantly faster than fetching it from halfway across the world).
However, while all of the data centers are linked, they still work as separate and self-contained entities. In practice, this means that more data is replicated than necessary and that the data centers must constantly communicate to keep all of it in sync.
You wouldn't want one copy of your data to store one version of a draft email and another copy to have an updated version.
But keeping all of the data centers in sync is no easy task. What's more, not being able to run tasks across more than one data center means that Google uses more resources than it should, and that what it can do is capped by what a single data center is capable of.
Google Spanner is a massive unified database that spans continents
Google solved this with one of the most advanced and interesting technologies it has created to date. Essentially, all of Google's data centers around the world and all of the data it houses are part of the same massive database.
It's called Google Spanner, and it was only made possible once Google started keeping its own time, quite literally. The biggest problem with a massive, cross-data-center database is that it gets out of sync fast.
Changes to the same data sets could be made at two or more locations at roughly the same time and there would be no way of knowing which one was the most recent because "time," as far as computers are concerned, isn't all that reliable.
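A toy simulation makes the problem concrete. The servers, skew values and timestamps below are all made up for illustration; the point is that with even modest clock skew between data centers, "pick the write with the latest timestamp" can choose the wrong winner.

```python
# Illustrative only: why wall-clock timestamps can't reliably order writes
# made in different data centers. All values here are hypothetical.

def observed_timestamp(true_time_ms: int, clock_skew_ms: int) -> int:
    """The timestamp a server would record, given its clock's skew."""
    return true_time_ms + clock_skew_ms

# The write in data center A actually happens first...
write_a = observed_timestamp(true_time_ms=1_000_000, clock_skew_ms=0)

# ...and the write in data center B happens 50 ms later, but B's clock
# runs 120 ms slow, so B stamps its write with an *earlier* time.
write_b = observed_timestamp(true_time_ms=1_000_050, clock_skew_ms=-120)

# A "last write wins" rule based on these timestamps now discards the
# newer write: the genuinely later write looks older.
print(write_b < write_a)  # True -> the ordering is wrong
```

The write that should win loses, which is exactly the kind of conflict a globe-spanning database cannot tolerate.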
Most networks and servers rely on the venerable Network Time Protocol to sync their internal clocks to a unified source which itself relies on atomic clocks to keep time.
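At its core, NTP estimates how far off a machine's clock is by exchanging four timestamps with a time server and applying the offset and delay formulas from the protocol specification (RFC 5905). This is a sketch of that calculation with made-up example timestamps, not a working NTP client:

```python
# Sketch of NTP's core clock-offset math (RFC 5905), with example values.
# t1 and t4 are read on the client's clock; t2 and t3 on the server's.

def ntp_offset_ms(t1: float, t2: float, t3: float, t4: float) -> float:
    """Estimated offset of the client clock relative to the server."""
    return ((t2 - t1) + (t3 - t4)) / 2

def ntp_delay_ms(t1: float, t2: float, t3: float, t4: float) -> float:
    """Estimated round-trip network delay of the exchange."""
    return (t4 - t1) - (t3 - t2)

# Client sends at t1, server receives at t2, server replies at t3,
# client receives the reply at t4 (all in milliseconds, hypothetical).
t1, t2, t3, t4 = 100.0, 160.0, 161.0, 121.0

print(ntp_offset_ms(t1, t2, t3, t4))  # 50.0 -> client clock is ~50 ms behind
print(ntp_delay_ms(t1, t2, t3, t4))   # 20.0 -> ~20 ms round trip
```

Even after this correction, the residual error is typically on the order of milliseconds over the public internet, which is the accuracy ceiling the next section is about.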
Google built atomic clocks and GPS receivers for its data centers
But the protocol is not accurate to the degree it would need to be to work as the internal clock of a massive database, which is why Google built the TrueTime API that, as its name suggests, provides servers across the world with a very accurate measure of time.
It does this by using GPS, which, apart from location data, also provides very accurate and, more importantly, consistent time data. Every GPS satellite carries an internal atomic clock, and the satellites are kept in sync with one another, so two GPS receivers on different continents should read exactly the same time.
But that's not all: Google also built atomic clocks at all of its data centers to act as redundancies for GPS, making sure time stays in sync even if GPS is unavailable.
What this means is that the timestamp on any change on any piece of data in any Google data center is incredibly accurate, making it possible to determine which of two conflicting changes is the newest even if they were made virtually "at the same time."
Google Spanner is the only platform that can operate across data centers at this scale and no one is lining up to build something similar. Wired has a detailed and interesting account of how Google Spanner works and how it came to be.