Triple Play: GitHub's Code Now Lives in Three Places at Once

GitHub is now storing projects using a new system called DGit, short for distributed Git, to ensure projects sit in many places, not just one.

On the Internet, everything can be everywhere. And that's true in more ways than one. If your phone goes online---no matter where you are in the world---you can theoretically visit every last bit of information uploaded to the global network of machines we call the Internet. And by that same logic, all this information can also be stored in so many different places.

The Google search engine doesn't sit on one machine in one location. It resides on thousands of machines in computer data centers across the globe. The same goes for Facebook and Twitter and Dropbox. If these tech giants are doing their jobs right, each individual piece of data they store is sitting not just in one place but in many places, in case of emergency. If one of your Google spreadsheets is stored in a data center in Oregon and that data center goes dark, your spreadsheet should still be available, because it's also stored in a data center somewhere else.

Some companies do this kind of thing better than others. But among the biggest and best services, it's the norm. They even distribute data redundantly within individual data centers. Data and software are spread across many different machines so that, even as machines fail, one after another, the whole keeps going.

Today, the power of redundancy was reaffirmed by GitHub, the online service that has become the world's de facto repository for open source software, software freely available to the world at large. This morning, the eponymous San Francisco company that runs the service announced that it's now storing projects using a new system called DGit, short for Distributed Git, to ensure everything sits in many places, not just one.

Rule of Threes

GitHub is already a vastly distributed system. Based on software called Git, invented by open source granddaddy Linus Torvalds, GitHub operates in a wonderfully smooth way. Coders download a complete copy of an open source project onto their own machines and, as they make changes, they can easily merge these changes back into the central repository. The result is that myriad copies of each project are spread across the net, which makes for a great backup if GitHub ever goes belly up or otherwise disappears from the face of the Earth.

But with DGit, GitHub has gone a step further. The central repository is now stored not just on one machine but on three machines. If two go down, the project is still available to everyone, and the system then rebuilds additional replicas on other machines. "What DGit does is that it makes Git a lot more aware of the environment it's in and where it's being stored," says Sam Lambert, GitHub's director of systems. "We can tolerate failure more. Servers can go down---we can disconnect their power supplies---without interrupting production traffic."

Previously, if servers went down like this, the world would lose access to a huge number of repositories. Now, GitHub is, in essence, making itself look more like Google or Facebook. "This concept is now a requirement," says Robin Schumacher, vice president of products at DataStax, a company that offers database software that works in much the same distributed way.

Code Everywhere

There's a very practical result to all this redundancy: GitHub repositories are far less likely to be unreachable. According to GitHub senior systems engineer Patrick Reynolds, the company has rolled DGit out to about two-thirds of all GitHub projects, and the company has virtually eliminated downtime due to server outages for these projects.

All this is important because GitHub is the primary way that the world builds open source software. It's the way many businesses, including Google and Facebook, build private software as well. GitHub hosts more than 35 million software repositories. More than 14 million people are registered to use the service. And according to web monitoring service Alexa, it is now among the 100 most popular websites on earth---a coding site up among the news sites and social networks that typically top the web.

GitHub achieved such popularity in part because of the distributed nature of Git. The world's previous open source hub, SourceForge, was notoriously unreliable. Companies like Google started building their own open source repositories because they were worried that SourceForge couldn't deal with the load. But then GitHub came along and distributed code in new ways that gained the loyalty of coders everywhere. And like those coders---and the Internet itself---GitHub is everywhere, in more ways than one.