
Investors poured millions into a storage network that doesn’t exist yet

Filecoin is expected to raise millions in an initial coin offering.

A blockchain-based cloud storage technology called Filecoin has already raised $52 million from investors. The company is poised to raise millions more on Thursday when it begins selling units of its bitcoin-like cryptocurrency to a larger set of wealthy investors.

Filecoin aims to disrupt conventional cloud-based storage platforms from Amazon and others. If it succeeds, the technology could be worth billions of dollars. But the company will need to overcome some significant hurdles first.

First and foremost, Filecoin's technology doesn't actually exist yet. The Filecoin team has done extensive research and planning, producing a series of white papers describing the technology it's building. But an actual, working Filecoin network is still months away. When it launches, Filecoin will compete with rival blockchain storage networks, including Sia, which has been available to the public for two years.

"Filecoin currently is just a white paper," Sia co-founder David Vorick told us earlier this week.

The broader challenge for Filecoin and its more established competitors will be convincing customers that it's safe to entrust their data to a decentralized, blockchain-based storage network at all. In theory, blockchain-based storage could offer significant advantages, including lower costs and higher reliability. The technology is likely to be most appealing for people looking for low-cost, long-term data storage.

But the technology is going to need at least a few years to mature to the point where it's ready for mainstream use. The Sia network has relatively limited capacity, and, right now, using the technology involves significant hassles—including acquiring the Siacoin cryptocurrency on a digital currency exchange and configuring and running complex Sia client software.

How to use a blockchain to create decentralized storage

Blockchain storage networks aim to enable trustless markets for online storage, allowing customers to buy storage from relatively unknown vendors without having to worry about losing data.

The basic strategy is for a service provider to sign a contract promising to store the data and post collateral backing up the promise. If a service provider fails to keep its end of the bargain, it forfeits the collateral. The idea is reasonable in theory, but it's not really practical with conventional payment networks backed up by the conventional legal system. The system would easily get bogged down in costly disputes between service providers and their disgruntled customers.

But a blockchain provides an elegant solution. Here's how it works on the Sia network: when a storage contract begins, the service provider publishes the root of a data structure called a Merkle tree, which serves as a unique fingerprint for the customer's data.

This hierarchical data structure lets the service provider produce a succinct cryptographic proof that it holds any particular 64-byte chunk of the file. At regular intervals, the Sia network chooses one of the 64-byte chunks using a pseudorandom function based on the most recent block of the Sia blockchain. The service provider must respond by publishing the sequence of hashes that charts a path up the tree from that chunk to the already-published Merkle tree root. This constitutes a cryptographic proof that the service provider still has that chunk of data stored on its servers.

A dishonest service provider can't predict or control which chunk of data will be chosen in each round of the challenge, so the only way to consistently respond to the challenges is to store the entire file. Service providers who fail to supply a proof too many times lose their collateral. The network can enforce these rules without help from the client because you only need to know the Merkle tree's root hash to verify the correctness of a proof.
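The challenge-and-proof cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not Sia's actual implementation: SHA-256 stands in for whatever hash function the network uses, the function names are hypothetical, and both the rule for handling an odd number of nodes (pair the last node with itself) and the one-byte leaf/node prefixes are assumptions made for this sketch. It also assumes a non-empty file.

```python
import hashlib

CHUNK_SIZE = 64  # bytes, the chunk size described in the article


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(chunk: bytes) -> bytes:
    # Prefix bytes keep leaf hashes distinct from interior hashes (assumed convention).
    return sha256(b"\x00" + chunk)


def node_hash(left: bytes, right: bytes) -> bytes:
    return sha256(b"\x01" + left + right)


def split_chunks(data: bytes) -> list:
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def build_levels(chunks: list) -> list:
    """Return every level of the tree, leaves first, single-root level last."""
    level = [leaf_hash(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            left = level[i]
            # Assumption: an unpaired last node is hashed with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            nxt.append(node_hash(left, right))
        level = nxt
        levels.append(level)
    return levels


def prove(levels: list, index: int) -> list:
    """Host side: collect the sibling hashes on the path from a chunk to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return path


def verify(root: bytes, chunk: bytes, index: int, path: list) -> bool:
    """Network side: recompute the root from the chunk and the sibling hashes."""
    current = leaf_hash(chunk)
    for sibling in path:
        if index % 2 == 0:
            current = node_hash(current, sibling)
        else:
            current = node_hash(sibling, current)
        index //= 2
    return current == root


def challenge_index(block_hash: bytes, num_chunks: int) -> int:
    """Derive the challenged chunk from the latest block hash (illustrative only)."""
    return int.from_bytes(sha256(block_hash), "big") % num_chunks
```

Note that `verify` takes only the root, the challenged chunk, and the path of sibling hashes. That is why the network can check a proof without holding the file or consulting the client.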

Rewards, penalties, and redundancy

Rewards and penalties on the Sia network are denominated in Siacoins, the cryptocurrency that powers the Sia network. Users buy Siacoins on an exchange, then spend them to purchase storage from service providers. Service providers post collateral in Siacoins and automatically get them back if they fulfill their contracts and furnish the required cryptographic proof to the Sia blockchain. Like the Bitcoin network, the Sia network is run by miners who earn new Siacoins as a reward for participating in the network's transaction-clearing process to build the Sia blockchain.

Of course, some providers will default on their commitments anyway, but the customer can deal with this by storing redundant copies of the data with different providers. A naïve approach would be to store, say, five copies of each file with five different hosts. Clients can do much better than that with a technique called erasure coding, which splits a file into multiple chunks in such a way that any lost chunk can be reconstructed from the others.
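The simplest possible erasure code makes the idea concrete. The sketch below splits a file into two halves and adds a third piece that is the XOR of the two, so any two of the three pieces are enough to rebuild the file. It is a toy chosen for illustration, with hypothetical function names; the article does not say which code Sia uses, and production systems generally rely on more flexible schemes such as Reed-Solomon coding.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes) -> List[bytes]:
    """Split data into two halves plus one XOR parity piece (2-of-3 code)."""
    if len(data) % 2:
        raise ValueError("toy encoder expects an even-length input")
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return [first, second, xor_bytes(first, second)]


def decode(pieces: List[Optional[bytes]]) -> bytes:
    """Rebuild the data when at most one of the three pieces is missing (None)."""
    if len(pieces) != 3 or sum(p is None for p in pieces) > 1:
        raise ValueError("need at least two of the three pieces")
    first, second, parity = pieces
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    return first + second
```

Here three half-size pieces are stored for a file that is two halves long, so the storage overhead is 1.5 times the original rather than the 5 times required by the five-copies approach, though this toy only survives the loss of a single host.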

Sia co-founder David Vorick tells Ars that most Sia users currently use a redundancy factor of three, meaning that three bits are stored for each bit of the underlying data. But Vorick argues that customers will eventually be able to do much better, achieving very high reliability with a redundancy factor as low as 1.5. For example, a particular file might be split into 60 pieces, any 40 of which suffice to rebuild it, and stored with 60 different hosts. The customer would be able to recover the file so long as at least 40 of those 60 hosts remain online.
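The arithmetic behind that example is easy to check. The sketch below computes the redundancy factor for a 40-of-60 scheme and estimates how likely the file is to remain recoverable. It rests on a strong simplifying assumption that is not from the article: each host is online independently with the same probability, here the hypothetical parameter `host_uptime`. Real host failures may be correlated, so treat the result as a rough guide.

```python
from math import comb


def redundancy_factor(total_pieces: int, required_pieces: int) -> float:
    """Bits stored per bit of original data."""
    return total_pieces / required_pieces


def survival_probability(total_pieces: int, required_pieces: int, host_uptime: float) -> float:
    """Probability that at least `required_pieces` hosts are online.

    Assumes independent hosts with identical uptime (binomial model).
    """
    return sum(
        comb(total_pieces, online)
        * host_uptime ** online
        * (1 - host_uptime) ** (total_pieces - online)
        for online in range(required_pieces, total_pieces + 1)
    )


if __name__ == "__main__":
    print(redundancy_factor(60, 40))           # the 1.5 factor from the example
    print(survival_probability(60, 40, 0.90))  # 40-of-60 with assumed 90% host uptime
    print(survival_probability(3, 1, 0.90))    # three full copies, for comparison
```

Under this model, three full copies correspond to a 1-of-3 scheme, which makes it possible to compare the two approaches at whatever host uptime seems plausible.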

All these details have to be handled on the client side of the network, since the whole point of the system is to avoid having to trust any single service provider. If you want to store data on the Sia network, you'll need to acquire Siacoins from a digital exchange, most likely by buying the more widely traded bitcoins first and then trading those in for Siacoins. Then you'll need to download the Sia client software, which has options for creating storage contracts, uploading files, and so forth.

Filecoin aims to be better blockchain storage

Filecoin is based on the same basic idea as Sia, but it aims to make a few significant enhancements. One is a new algorithm for mining.

Mining is the collaborative process for building a blockchain. Sia uses an approach called proof-of-work that was pioneered by Bitcoin. Computers compete to solve a difficult mathematical problem, with the winner getting to add a new block to the blockchain and reward itself with new Siacoins. On Bitcoin, as on Sia, that computation isn't needed to actually process transactions; it's essentially make-work that prevents Sybil attacks, in which one party floods the network with fake identities, and so secures the network. The energy consumed by the Bitcoin network has grown steadily along with the price of bitcoins, and its annual consumption is now measured in terawatt-hours.
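To see why proof-of-work is pure make-work, consider this toy miner. It searches for a number (a nonce) that makes the hash of a block header fall below a target; the only way to find one is brute-force guessing, while checking a claimed answer takes a single hash. This is a simplified sketch, not the real Bitcoin or Sia block format: the header bytes, the single round of SHA-256, and the `difficulty_bits` parameter are all assumptions made for illustration.

```python
import hashlib


def pow_hash(header: bytes, nonce: int) -> bytes:
    return hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()


def is_valid(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """A nonce is valid if the hash, read as an integer, is below the target."""
    target = 1 << (256 - difficulty_bits)
    return int.from_bytes(pow_hash(header, nonce), "big") < target


def mine(header: bytes, difficulty_bits: int) -> int:
    """Try nonces in order until one satisfies the target."""
    nonce = 0
    while not is_valid(header, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    found = mine(b"toy block header", 16)
    print(found, is_valid(b"toy block header", found, 16))
```

Each extra bit of difficulty doubles the expected number of guesses, which is how a network can keep raising the cost of mining as more hardware joins.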

Filecoin aims to eliminate this waste by making storage, rather than computing power, the basis for influence on the Filecoin network. While Bitcoin and Sia miners stockpile ever more powerful computing hardware, Filecoin miners will amass more and more hard drives—hard drives that can actually be put to work storing user data.

Filecoin also aims to offer self-healing capabilities that Sia lacks. When a host drops off the Sia network and takes part of a client's data with it, it's a good practice for the client to reconstruct the missing data from other copies (using the erasure coding techniques mentioned above), contract with a new host, and upload the reconstructed data. That means a Sia client needs to log onto the network about once a week to check whether any of its data needs this kind of repair.

Filecoin aims to make this unnecessary by offering automatic self-healing capabilities in the network itself. Under the Filecoin protocol, if a host disappears from the network—or fails to prove that it's still storing data it has promised to store—the network will notice and post a contract for a new host to reconstruct and store the missing data.

That's possible because Filecoin uses an encoding scheme that allows anyone to reconstruct missing data. That's different from the Sia network, where the encryption and encoding of the data are done by the client, which means only the client can reconstruct missing data.
