Data and large tracts of it is what is defining business and revenue models of companies across domains and industries today. Humungous amounts of resources are being expended in mining this data to draw insights about the market, consumer behavior, demand and supply patterns, etc. in an effort to arrive at forecasts with razor-sharp precision. Websites and applications, that collect and store huge volumes of data every second, have inevitably becomes invaluable assets to businesses. With data of this nature and volume establishing itself as a gold mine, storing and preserving it becomes a commitment of paramount importance to enterprises. The scale at which data thefts are becoming commonplace demands proactive measures to be taken at the firm’s end. Any online interface, if witnessing persistent growth, should be prepared to encounter significant scaling in terms of exponentially growing online traffic. It is imperative that scaling to newer heights should be preceded by adequate steps being taken to ensure data security and integrity. Hence, agile business models today enterprisingly adopt such data architecture solutions that enable them to scale to mightier proportions with little worry about losing the sanctity of the data. One such solution is sharding. Its relevance becomes all the more obvious when employed alongside blockchain technologies.
Sharding, comes from the word ‘shard’ which means a small piece or portion of a whole. It is a data architecture technique by which data is partitioned into horizontal logical shards. Then these shards are distributed across separate nodes, commonly referred to as physical shards, which have the ability to store multiple logical shards. This doesn’t obstruct the data held within all the shards from collectively representing an entire logical dataset. Thus, it is said that database shards are based on the tenet of a shared-nothing architecture. Sharding is based on the premise that if heavy and big are databases are broken into smaller, more manageable portions, it would enhance performance and bring down the processing time.
In the context of blockchain, sharding facilitates managing and transacting at a speed almost as fast as a Visa or MasterCard. Bitcoin has the bandwidth to processes around 3-7 transactions per second, while Ethereum’s tally stands at 12-30, much lower than that of a bank card. This is only one of the problems. For blockchain to become ubiquitous and indefatigable, it needs to build the capability of unhindered scalability, overcome latency issues while achieving high throughput.
With blockchain sharding, each node of the block will have only a part of the data of that chain, and not the entire information. Nodes that maintain a shard maintain information only on that shard in a shared manner, so, within a shard, the property of decentralization is still upheld. Each node doesn’t load the information on the entire blockchain, thus helping in scalability and building a capacity to store large volumes of data in portions. It is being employed by new-age companies like Telegram, which is developing its Telegram Open Network (TON), to enable users of its cryptocurrency GRAM to send funds across borders within no time and without worrying about remittance fees.
A use-case particularly benefitting out of the reduced process time is in the context of the database query response. When a query is made in a database that has not been horizontally sharded, the system has to rummage over multiple line items of data to identify the data point in question. And so, with websites or applications with an inexplicably large and monolithic database, queries tend to become prohibitively slow. There’re other valuable uptakes, too. It is undoubtedly easy to have a database up and running on one machine and scaling up would mean to the only upgrade to newer computer or software resources. However, one can’t skip the fact that an undistributed database will always be constrained by limited storage and computing power. Thus, having a sharded data architecture offers scalability and flexibility on the same platter. Sharding also brings more value to the application by boosting its tenacity against outages. With an unsharded database, an outage would jeopardize the accessibility of the entire data, which in the case of a distributed data structure would hinder the availability of only a part of the data or a single shard.
LIMITATIONS OF SHARDING
Sharding, however, can also impose a certain amount of limitations. Primarily, building and deploying a sharded database structure is no mean task. A minute slip up can cause the loss of data or corrupt it. With sharding, the team workflows also get impacted. In an unsharded situation, one could access all related data at one single entry point. In this sense, sharding with its multiple shards and locations, brings in more complexity to its users. Another issue that could crop up in a sharded database is that over time it could potentially become unbalanced. Consider a sharded scenario wherein shards are created in an alphabetic order, by the way of example say the transaction data of customers’ names that start with A-F are on one shard, G-L on another and so on. Over time, in all probability, the shard with names starting with letters G-L could be heavier with more line items than its co-shards. This could potentially cause the application to slow down and stall out for a significant portion of users. Needless to say, there is also the constraint of database engines not being compatible with sharding. Lastly, a major drawback is that once sharded, it is close to impossible to return the database to its unsharded form.
It should also be mentioned here that this arena still remains under-explored and in a continuous work in progress. There are also burning unsolved questions about sharding like the one on the blockchain trilemma and cross-shard data transfer. The blockchain trilemma between scalability, decentralization and security is another one of those burning questions. All in all, while some experts look at sharding from a positive lens and celebrate its benefits that were unheard of until recent times, there are also those who don’t advocate its implementation due to its operational complexity. Conclusively, companies should answer the question of whether or not one should implement a sharded database architecture with utmost caution.