When to think about using Cloud for your service

Three years ago I wrote “The Cloud is overrated!”. Since then I have joined Google as an SRE, and I’ve been asking myself whether Cloud makes sense for me or not. Even before joining Google, GCP was my first choice for Cloud; it seems quite good and, of the three major providers (along with AWS and Azure), it is the cheapest option. And let’s be fair, my main complaint about Cloud is price. Vendor lock-in is my second concern, and Google again seems to be the fairest of the three. Anyway, this isn’t about which one is better but about whether, and when, Cloud is a good idea.

Proper Cloud deployments are pricey and also demand a lot of developer time; done right, Cloud is not about deploying WordPress on a VPS-style service that happens to run in the cloud.

What is Cloud about?

Cloud is about having easy-to-use tools to deploy scalable and reliable applications without having to worry about how to implement the details.

We need to think about scaling and zero downtime. These are the only two factors that will determine if you should pay the extra cost or not.

Everything else is an extra service that they provide for you, such as Machine Learning. If you want to use those services, you can always set up the minimum on the given Cloud to make them work and call them from the outside, no problem. So these are outside my analysis here.

Vertical Scaling

When you deploy an application on a server and later need more resources, you’ll have to migrate it to a different, beefier server once it no longer fits. With a VPS you usually also have the option to upgrade it with more compute resources.

In Cloud, the range of machines you can run code on is quite big, from tiny (1/2 CPU, 1 GiB RAM) to enormous (32 CPU, 512 GiB RAM). This gives you quite a lot of flexibility to keep growing any service as needed.

The other thing is that they allow fast upgrades and downgrades, and can also automate them. This can be used to reduce the cost overnight when there is less load. But be aware that, even with this, it’s highly unlikely that you’ll end up cheaper than a bare metal server.

Like a VPS, Cloud services usually guarantee that your data survives disk failures; there is no need to do maintenance or migrations because the disks fail. This is the downside of bare metal servers: you need to handle the maintenance yourself and migrate to a new server if the disks start to fail, with the risk of data loss that this carries.

Horizontal Scaling

This kind of scaling refers to splitting the service into several partial copies that work together, in parallel. It is especially needed when the service itself won’t fit on a single machine.

The problem here is that most of the time applications are stateful, and this means that the state needs to be split or replicated across the different instances.

Cloud helps here by offering database and file storage services that do this for you, so your service can be stateless and leave the complexity to the Cloud provider.

In Cloud, you can also dynamically spawn more instances of your service to handle the load.

Reducing downtime to zero

This is basically done by replicating data and services across different data centers. If one goes down, your service will still be up somewhere else.

This is the most important part I believe, so I’ll leave the details for later.

When should we think about using Cloud?

This is an important decision, as it’s hard to convert a typical service (a monolithic, single thing that does it all) into something that makes good use of the Cloud benefits. It’s better to do this in the design phase if possible.

Sharding

In recent years there has been a boom around “Big Data” and Cloud, and everyone talks about NoSQL, sharding (horizontal scaling), etc. But much of this has been just buzzwords, a way of looking cool. Is it really that cool for everyone?

All these things are meant for horizontal scaling (sharding), which means that we expect to use more than one machine for one of the services (e.g. the database).

It sounds really cool, but it’s not really worth it for the majority of cases. Unless you have a big project on hand, chances are that it fits in an average server.

Why not use sharding anyway? Well, it’s usually more expensive to run 5 machines than a single one with all that power combined. Sharding also imposes a lot of design restrictions that are quite hard to handle, so it substantially increases the time needed to develop the application. Unexpected requirements along the way will sometimes require a full redesign, because sharding depends on certain premises holding true (how the service is split), and those cannot be changed along the way without a lot of effort.

The other problem with sharding is that it’s always less efficient to use X machines than X threads, and X threads are less efficient than a single-threaded CPU that is X times more powerful. Parallelizing does not scale linearly; there’s always a trade-off, so keep it in mind.
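One common way to put numbers on this trade-off is Amdahl’s law. The post does not name it, but it models the same effect. Below is a minimal sketch, assuming a workload where a fraction `parallel_fraction` of the work can be spread across workers and the rest stays serial. Both names are illustrative, and real systems add network and coordination overhead on top of this ideal figure.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of worker count;
    # only the parallel part is divided among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work parallelizes cleanly.
    for workers in (1, 2, 5, 10, 100):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

In this hypothetical case, 5 workers give a speedup of about 3.6x rather than 5x, and no number of workers gets past 10x, because the serial 10% never shrinks.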

Cloud is not (only) sharding, and sharding is not Cloud. If your service will never need to span more than one computer, there’s no point in adding the complexity.

I would recommend plotting a growth forecast for your service over 5-10 years. Also plot the forecast for server growth; capacity usually doubles roughly every two years (see Moore’s law). If your growth seems to be close to that, you definitely need to consider sharding from the start. Also keep in mind that there are periods of stagnation, where certain areas see no improvement for years.
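The comparison can also be done numerically instead of on a plot. This is a rough sketch under simple assumptions: the service grows by a constant factor every year, the biggest available server doubles in capacity every two years, and demand is measured as a fraction of today’s biggest machine. The function and parameter names are made up for illustration.

```python
from typing import Optional


def first_year_outgrowing_one_machine(
    initial_fraction: float,
    yearly_growth: float,
    horizon_years: int = 10,
    capacity_doubling_years: float = 2.0,
) -> Optional[int]:
    """Return the first year in which the service no longer fits on the
    biggest machine available that year, or None if it fits all along.

    initial_fraction: demand today, as a fraction of today's biggest machine.
    yearly_growth: demand multiplier per year (2.0 means it doubles yearly).
    """
    for year in range(horizon_years + 1):
        demand = initial_fraction * yearly_growth ** year
        # Capacity of the biggest machine, relative to today's biggest one.
        capacity = 2.0 ** (year / capacity_doubling_years)
        if demand > capacity:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical services that use 10% of the biggest machine today.
    print(first_year_outgrowing_one_machine(0.10, 2.0))  # doubles every year
    print(first_year_outgrowing_one_machine(0.10, 1.3))  # grows 30% per year
```

A service that doubles every year outgrows a single machine within the forecast window, while one growing 30% per year never catches up with hardware under these assumptions. A period of hardware stagnation can be explored by raising `capacity_doubling_years`.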

If you go for sharding, the databases offered by the Cloud provider will make your life much easier, but they will be your vendor lock-in. Once the application is coded with a particular Cloud DB in mind, it will be quite hard to move away from that provider later. If this is a concern, look into how to make it generic enough; there are usually projects that let you change the DB or offer a plugin to connect to these DBs, so you can swap later with less effort.

If in doubt, go for sharding. If you already need >25% of the biggest machine available, go for sharding. Better safe than sorry.

Replication

For me, here lies what applies to most applications and companies: How much is your downtime worth? How much is your data worth?

A server can fail; an entire data center can be struck by lightning or engulfed in flames. Assuming you have your backups off-site, how much data is lost in this scenario? Hours, a day, or a week? How much time will be needed to get it back up and running on a new server?

For example, on a server I use for a personal project I do an on-site database backup every two days, and an off-site full disk backup every day. This means that I can lose one or two days of data. But if it happens, it will take me 5 days to get it up and running again (because it’s a weird setup and I can only use my spare time). In this case the downtime and the data are worth almost nothing, as the project generates no revenue for me while it costs money. Still, the amount of time that would be needed to set it back up is something that I need to fix.

To minimize these scenarios we use replication, and here I always mean off-site replication. Sharding must be on-site (same DC), while replication works best off-site.

If you use sharding and manage the database yourself, you can dedicate a fraction of the servers to redundancy. In this case, N+2 is always recommended: if you need 5 servers to handle the load, have 7 so that up to 2 servers can fail. When using RAID yourself, I would recommend RAID 6. In most cases this will not apply.
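The N+2 rule is simple arithmetic, sketched here for completeness. The load and per-server capacity figures are hypothetical, and the sketch assumes all servers are interchangeable.

```python
import math


def servers_to_provision(peak_load: float, capacity_per_server: float,
                         spares: int = 2) -> int:
    """Servers needed to carry the peak load (N) plus spare servers."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(peak_load / capacity_per_server)  # this is N
    return needed + spares


if __name__ == "__main__":
    # Hypothetical numbers: 5000 requests/s at peak, 1000 requests/s per server.
    print(servers_to_provision(5000, 1000))
```

With these numbers N is 5, so 7 servers are provisioned and any 2 of them can fail without dropping below the capacity needed at peak.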

Regardless, you need a full working copy elsewhere. Here you can go N+1 or N+2. Having another set of servers far away that are running the software in parallel avoids having an outage that can last weeks.

When using Cloud you can take advantage of the huge network between the different data centers. That is, they usually have a private network, separate from the public internet, that is blazing fast and has low ping times, and you can use it to communicate between data centers, making real-time replication across servers possible. Still, don’t go crazy and place the servers very far apart: as fast as those networks can be, they still have to obey physics and are bound by the speed of light (no kidding here: light travels at roughly two-thirds of c in fiber, and cable routes are never straight lines, so around 50% of c is a reasonable figure for estimating ping times).
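As a back-of-the-envelope sketch of that estimate, the function below turns a distance into a lower bound for the round-trip time. It assumes an effective signal speed of 50% of c, as used above, and ignores routers, queuing and processing time, so real pings will be higher. The function name and the example distances are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s


def min_round_trip_ms(distance_km: float, fraction_of_c: float = 0.5) -> float:
    """Lower bound on the ping time between two sites, in milliseconds.

    fraction_of_c is the assumed effective signal speed relative to c.
    """
    if not 0.0 < fraction_of_c <= 1.0:
        raise ValueError("fraction_of_c must be in (0, 1]")
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * fraction_of_c)
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    # Hypothetical distances between two data centers.
    for km in (100, 1000, 6000):
        print(f"{km} km: at least {min_round_trip_ms(km):.1f} ms round trip")
```

At 1000 km the bound is a bit over 13 ms per round trip. That matters if every write has to wait for a remote confirmation, because the delay is paid on each write.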

If you want to use a regular ISP with VPS services, check if they also have an internal network interconnecting the data centers; this is starting to become the norm lately.

The problem with replication is that the cost for running the service is now 2x or 3x, as you need way more space and servers than before.

If cost is a problem, I would recommend running only a primary plus a “warm” read-only secondary. This means that all writes go to the primary, and the secondary only replays those changes in real time. In an incident, you might lose the seconds of data that had not yet been written to the secondary. If this is a problem, check whether the database can wait until the secondary confirms the data is there. This comes with a huge penalty on write speed and latency.
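The difference between the two modes can be shown with a toy in-memory model. This is only a sketch of the idea, not how any particular database implements replication; the class and method names are invented for the example, and the network is replaced by direct method calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Secondary:
    records: List[str] = field(default_factory=list)

    def apply(self, record: str) -> None:
        self.records.append(record)


@dataclass
class Primary:
    secondary: Secondary
    wait_for_secondary: bool = False
    records: List[str] = field(default_factory=list)
    unshipped: List[str] = field(default_factory=list)

    def write(self, record: str) -> None:
        self.records.append(record)
        if self.wait_for_secondary:
            # Synchronous mode: the write returns only after the secondary
            # has the record (in reality, one network round trip per write).
            self.secondary.apply(record)
        else:
            # Asynchronous mode: queue the record and return immediately.
            self.unshipped.append(record)

    def ship(self) -> None:
        """Send queued records to the secondary (a background task in reality)."""
        for record in self.unshipped:
            self.secondary.apply(record)
        self.unshipped.clear()


def lost_on_primary_failure(primary: Primary) -> List[str]:
    """Records on the primary that never reached the secondary."""
    return [r for r in primary.records if r not in primary.secondary.records]


if __name__ == "__main__":
    async_primary = Primary(Secondary())
    async_primary.write("a")
    async_primary.write("b")
    async_primary.ship()
    async_primary.write("c")  # the primary dies before this one is shipped
    print(lost_on_primary_failure(async_primary))

    sync_primary = Primary(Secondary(), wait_for_secondary=True)
    for record in ("a", "b", "c"):
        sync_primary.write(record)
    print(lost_on_primary_failure(sync_primary))
```

In the asynchronous run the last write is lost when the primary fails, while the synchronous run loses nothing. The price of the synchronous mode is that every write waits for the secondary.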

The secondary could be smaller than the primary, or be used for other stuff. Just replaying writes uses a very small amount of resources (but the same amount of disk space). In this case, if the secondary needs to be promoted to primary, it may well suffocate under the load, and the application would be almost unavailable until a new server is brought up. So it’s best to avoid small secondaries if possible: this approach only serves to back up data with a resolution of seconds, but it will not be good enough for taking over.

On Cloud, the provider can also automate this replication for your database and files, and even automate the promotion of a secondary replica to primary when things fail. Sharded databases do this best.

My final thoughts

I find Cloud products prohibitively expensive for my personal projects, and adding proper replication puts them even further out of reach.

But on the other hand, I find it extremely difficult to properly set up automation for replication and takeover. These things are difficult to build, and difficult to test well enough to be sure they will help rather than hurt.

So it seems that either there is not much money involved and the risk of data loss or downtime is not a big deal, or there is, and then the cost of that risk offsets the price of Cloud, which starts to look quite justified.

In the end this is about whether you want to take the risks yourself or pay extra so someone else deals with them. Generally I would go with the second and rest easy.

What if cryptocurrencies were used to perform useful work?

With Bitcoin using more than 140 TWh per year, or about 15 GW, and growing, we must ask ourselves: is it really worth that much? Is it providing any useful work?

15 GW is not that much globally speaking, but to put this into perspective, a nuclear power plant produces about 1 GW on average; this means that we need 15 nuclear plants to keep mining Bitcoin.
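The conversion from yearly energy to average power is easy to check. In this small sketch, the 1 GW per plant is the rough average used above, and the helper name is made up.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def average_power_gw(energy_twh_per_year: float) -> float:
    """Average power in GW for a given yearly energy use in TWh."""
    # 1 TWh = 1000 GWh; dividing GWh per year by hours per year gives GW.
    return energy_twh_per_year * 1000 / HOURS_PER_YEAR


if __name__ == "__main__":
    power_gw = average_power_gw(140)
    plants = power_gw / 1.0  # assuming roughly 1 GW per nuclear plant
    print(f"{power_gw:.1f} GW, about {plants:.0f} plants")
```

For 140 TWh per year this gives slightly under 16 GW, in the same ballpark as the rounded 15 GW figure.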

I have never been a believer in Bitcoin and the like; per se they carry a lot of costs and don’t provide that much usefulness. The idea is surely interesting, and I really like the concept of decentralising money away from banks, entities and governments, but the cost is currently just too high.

Also we need to keep in mind that money is anything that we give a value to and are willing to exchange for goods, and with that, almost anything can be used as long as it is not perishable, easily obtained or easily duplicated.

“Almost anything” is certainly not 15 nuclear power plants in cost. Also, if people don’t switch to using the currency, it is of no use. The range of goods that can be purchased with cryptocurrencies is certainly slim.

A currency should also retain its value over time, and the volatility of the crypto market is so high that holding crypto can turn out extremely profitable or leave you with worthless paper from one day to the next. Having to change product prices every day or every hour is not something that anyone wants to do.

Chia is another cryptocurrency that has become famous in recent months. The idea of being way more “eco-friendly” by not wasting so much energy and requiring disk space instead is somewhat encouraging. This, of course, has led retailers to raise HDD prices as they saw a surge in demand. And it’s still not without its cost, as it still consumes a lot of power, just way less than Bitcoin or others using proof of work.

I feel that cryptocurrencies get most of their value from their features, such as smart contracts and similar. Ethereum is one of the most cited for these, and Chia also has its own set of features.

The energy and monetary cost of running crypto should be justified by the useful work it provides. Regular paper money provides useful work by removing the burden of bartering goods for goods; this is also true for crypto, but it falls short of justifying the cost by several orders of magnitude.

Some features could help with certain legal matters and reduce human effort in a lot of areas, but to be of any use they need to be adopted, or at least accepted, by governments. And governments are usually decades behind on tech stuff, so I don’t see this happening in the near term. Also, the fact that they would have to put their trust in something they don’t manage sounds like quite a blocker to me.

In short, the amount of money saved by doing something using crypto has to overcome the energy cost by a good margin. If not, it’s not a good solution. It’s that simple.

To give an example, computers for accounting purposes had to become cheaper than doing the same thing manually; if that weren’t the case we would still be doing it with pen and paper. It’s not because “it’s convenient” or “faster”, it’s because having a human do the same tasks costs a lot more than purchasing and owning a computer. As for speed, that is also because time is money, and you can translate one into the other. Having the right information faster and without errors has a value, and you can put a price on it.

So I think of blockchain systems as something still very cool but also very immature. They will get there, but unless something revolutionary happens in the meantime, it will still take many years to see wide usage. They came ahead of their time, and we’re probably not yet in a position to profit from them.

(At the time of writing, https://chiacalculator.com/ reports that 1 PiB of space would earn $62,000 per month for an investment of less than $20,000 in a server; this is so ridiculous that I expect it to be corrected by supply in the coming months. In fact, users in r/Chia already report no gains from it; the number of people entering because of the investment prospects is probably saturating the network and making it really hard to win anything.)

An idea came to my mind recently

…and most probably it is either stupid or unfeasible. I don’t have much background in blockchain, nor enough maths, to go for it. But in case it inspires someone, here it is.

The Chia network basically seems to make servers store “trash data” to prove they have actually allocated the space, hence the name proof of space (yes, I know it’s much more complicated, but I love oversimplifying).

I was thinking… what if instead of storing crap data they actually stored customer data?

Chia recently reached 1 exabyte of storage. Storing someone’s data has value, and selling that capacity can be worth millions, especially in Cloud scenarios.

Decentralised storage run by users already has a name: P2P, with BitTorrent being one implementation.

But those networks relied on the willingness of users to serve files for free, and nowadays they are mostly used to combine the bandwidth of several servers so that a download gets the fastest transfer possible.

Instead, what I’m talking about is more in line with this famous Linus Torvalds quote:

Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it 😉

Torvalds, Linus (1996-07-20). Message. linux-kernel mailing list. Source: Wikiquote

Can you upload your backups to BitTorrent and rely on others mirroring them? No. Heck, even FTP is no longer an option as hardly anyone uses it anymore.

Imagine a network where we could send data for retention and pay for that storage in cryptocurrency. Anybody could set up a server for it and join the pool to store anybody else’s data for money.

The most basic usage of this network would be backups. Upload a backup, put a price on it, and people would start replicating it to earn that money. The more money you offer, the more replicas will be worth keeping. You don’t need that old backup anymore? Stop paying for it. It will be gone in days, as hosts find more profitable data to store.

Of course, anything that you upload would be public. So if you don’t want it to be public, you need to encrypt all the data with a secret key. The particular encryption used would be the uploader’s choice; they might want to use symmetric or asymmetric encryption (although I consider asymmetric more risky because it inherently has more attack vectors).

The price of storage would fluctuate like the stock market does. As more people join the pool, the price would fall until joining no longer makes economic sense. And as more people upload more data, the price per replica would rise.

You don’t want to pay for the space you’re using elsewhere? That’s fine! Just join the network with your own server, offering the extra space you’ll need, and you’ll earn crypto for it at the current rate, which you can then spend to get your data saved elsewhere. If the price of storage goes up, so do your profits from storing other people’s data. Now you don’t need to pay for different servers in different regions to guarantee that the data will be recoverable if your only server fails. You could also use your home computer for this exchange if you like, or tell your computer to prioritize your own data.

This idea could be expanded to a lot of interesting use cases, but at first glance it has several problems:

  • You don’t know what you’re storing, or for whom. Your server might contain illegal material without you knowing it. But hopefully the payment is traceable.
    • ISPs and others also offer storage and can’t really check what’s on it, especially if it’s encrypted. I guess law enforcement could trace the payment and pursue the uploader, or cut off the payment, if that was a problem.
  • A single machine/location could claim to hold more than one copy of the same data, which in reality is pointless.
    • Filtering based on IP might not work, as a machine can have more than one IP.
    • Ping time analysis to check that replicas are far apart could be tricked by having lots of small servers that actually fetch the data from the same place.
    • Encrypting the same data with different keys could ensure that the data is effectively copied. But it’s burdensome, and anyway the network requires at least 2 plain copies to be able to verify that the other end actually contains the data.
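To make the last point concrete, here is a minimal sketch of how one holder of the data could check that another host still has it, using a random challenge and a keyed hash. This is an illustration of the general idea, not the mechanism of Chia or any existing network; the function names are invented and the `ask_host` callable is a stand-in for a network request.

```python
import hashlib
import hmac
import secrets


def answer_challenge(nonce: bytes, stored_data: bytes) -> bytes:
    """What a storage host computes: a keyed hash over the full data."""
    return hmac.new(nonce, stored_data, hashlib.sha256).digest()


def host_still_has_data(reference_copy: bytes, ask_host) -> bool:
    """Challenge a host with a fresh random nonce and compare the answers.

    ask_host is a hypothetical callable that takes the nonce and returns
    the host's answer; in a real network it would be a remote call.
    """
    nonce = secrets.token_bytes(32)
    expected = answer_challenge(nonce, reference_copy)
    return hmac.compare_digest(expected, ask_host(nonce))


if __name__ == "__main__":
    backup = b"encrypted backup bytes " * 1000

    def honest_host(nonce: bytes) -> bytes:
        return answer_challenge(nonce, backup)

    # A cheating host that threw the data away and kept only its hash.
    kept_hash = hashlib.sha256(backup).digest()

    def cheating_host(nonce: bytes) -> bytes:
        return answer_challenge(nonce, kept_hash)

    print(host_still_has_data(backup, honest_host))
    print(host_still_has_data(backup, cheating_host))
```

Because the nonce changes on every challenge, a host cannot precompute the answer and discard the data. The verifier needs its own identical copy to compute the expected answer, which is the two-copies constraint mentioned above. The sketch also does nothing to prove that two hosts are physically separate machines, so the single-location problem remains open.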

I guess that I’m missing other risks and problems. It’s also possible that they have some form of workaround; as I said, I’m not any kind of expert on this kind of system, so I can’t outline a solution myself.

Nonetheless this seems an idea worth exploring. It’s possible that the usage could be extended beyond backups, to data that is modified often.

If all the state of an application could be stored in a network like this, then everything that needs to be deployed is basically stateless and can leverage Cloud very easily and cheaply. This, of course, would require that a database can be run and modified quickly in this way, which is no easy feat.

But circling back to the beginning, a network like this would deliver actual work with actual value that would exceed the cost of running it. It would therefore give the coin a use, and create a market based on supply and demand, not on speculation.

So as I said, it’s just an idea that crossed my mind. What are your thoughts? Does it seem interesting to you?