Review: Amazon Web Services is eating the world

Amazon continues to define the cloud with an unrivaled set of services for developers, IT, and data crunchers


Is it possible to review Amazon Web Services in one article? Not a chance. What about a book? Perhaps a long one, preferably with several volumes. The reality is that Amazon’s cloud business is larger than ever and spawning new features, services, and options faster than any one person could begin to follow. The company is swallowing the Internet by delivering the simplest way to create complex, highly scalable, data-rich applications.

The scope of the project is amazing. There are, by my count, at least 10 different ways to store your data and four different ways to buy raw computation. If you need more than raw power, Amazon is moving up the stack by delivering cloud versions of many sophisticated tools for analyzing large data sets, like Hadoop, Spark, and Elasticsearch.

These tools are changing the game for programmers and data analysts, giving them fewer reasons to write fresh code and more reasons to link together different, high-end services from Amazon. While raw computing power is still the focus, the new tools and services are compelling because they can make good financial sense. Writing your own code gives you the freedom and the power to move elsewhere, but entrusting more and more of the stack to Amazon can be dramatically cheaper and faster. It’s a complex decision.

A sea of machines

The core of the Amazon cloud remains the collection of virtual servers known as the Elastic Compute Cloud (EC2). If you want a machine, you can go to the AWS website, click a few buttons, and have root. More and more people, though, are using the API. Did I say “people”? I also meant bots because the cloud is more and more automated. If you’re going to do more than start up a single instance for experimentation, you’re better off writing code to spin up your machines. There are SDKs for Java, .Net, PHP, Python, and even Google’s Go language.
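Here’s a minimal sketch of what that looks like with boto3, Amazon’s Python SDK; the AMI ID and key pair name are placeholders you’d replace with your own:

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # Launch a single on-demand instance. The AMI ID and key pair
    # name here are placeholders, not real values.
    response = ec2.run_instances(
        ImageId='ami-12345678',
        InstanceType='m3.medium',
        KeyName='my-key-pair',
        MinCount=1,
        MaxCount=1,
    )
    print('Launched', response['Instances'][0]['InstanceId'])

The same handful of lines, dropped into a script or a bot, is how whole fleets get started and stopped without a human clicking anything.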

The range of machines that Amazon rents is growing larger and more complex. There are at least nine general classes of instances available -- and that’s counting only the machines listed in the “current generation.” You can still rent instances from earlier hardware families if your software needs them for some reason. Each general class of machine comes in several models configured with different amounts of RAM and local disk storage.

You’ll find a burgeoning list of options available when configuring an instance in Amazon EC2.

There may be too many options in Amazon’s list of instance types for mere mortals to examine and debate. The i2.8xlarge, for instance, comes with 244GB of RAM and 32 virtual CPUs that pump out 104 elastic compute units (ECUs), the metric that Amazon uses to measure the power of its machines. The d2.8xlarge comes with 244GB of RAM and 36 virtual CPUs that pump out 116 ECUs.

These instances use different versions of Intel’s Xeon, and they’re only two among dozens of options. I found myself scratching my head and wondering how to choose. If you have a serious project, you’ll want to benchmark your code on a range of instance types and figure out how fast your application happens to run. If you’re renting only a few machines for occasional work, it may not make sense to think too much about it. But if you’re spinning up hundreds of machines for larger projects, benchmarking is the best solution. The ECU metric, after all, is merely an average using some standard benchmarks. As they say in the car business, your mileage may vary.
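If you do benchmark, the process is easy to automate. Here’s a rough boto3 sketch that launches the same image on each candidate type and has a boot script write the timing somewhere you can compare later; the AMI, the S3 bucket, and the benchmark command are all placeholders:

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # A user-data script that runs a benchmark at boot and copies
    # the result to S3, named after the instance type. The AMI,
    # bucket, and benchmark command are hypothetical.
    USER_DATA = """#!/bin/bash
    TYPE=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)
    { time ./my-benchmark ; } 2> /tmp/result.txt
    aws s3 cp /tmp/result.txt s3://my-bucket/bench/$TYPE.txt
    shutdown -h now
    """

    for itype in ['i2.8xlarge', 'd2.8xlarge', 'c4.8xlarge']:
        ec2.run_instances(
            ImageId='ami-12345678',
            InstanceType=itype,
            MinCount=1, MaxCount=1,
            UserData=USER_DATA,
            InstanceInitiatedShutdownBehavior='terminate',
        )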

Decisions, decisions

Choosing the hardware is only the beginning. Amazon has gradually increased the number of options as it learns more and more about what can go wrong, and the API reflects this experience. There’s now a click box to “enable termination protection,” a kind of safety switch to prevent you from deleting your instance by mistake when you’re pruning your list of running machines. Once you enable it, you have to explicitly disable it. I know, I know, you’ll probably write a script to automatically disable it, but if you do you’ll have only yourself to blame. Amazon tried. This is one of the little options that make life a bit easier -- but not simpler -- in the Amazon cloud.

The trickier questions come when you decide how to pay. You can rent outright and pay full price or start fishing for lower rates. Amazon offers a spot market that lets you bid for extra capacity, and prices fluctuate as demand ebbs and flows. The swings can be quite dramatic because some of the big video streaming services often take over the cloud on Friday and Saturday nights.

The savings can be surprisingly large, but you have no way of knowing what the final price will be. At the moment I’m writing this, an m3.medium instance is going for 0.91 cents per hour, much lower than the list price of 6.7 cents per hour. Everyone puts in bids to run their jobs, and if the market price rises above your maximum bid, your machine shuts down.
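A rough sketch of the workflow in boto3, checking the recent market price and then bidding (the AMI ID is a placeholder):

    import boto3
    from datetime import datetime, timedelta

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # Look at the last hour of spot prices before choosing a bid.
    history = ec2.describe_spot_price_history(
        InstanceTypes=['m3.medium'],
        ProductDescriptions=['Linux/UNIX'],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
    )
    for point in history['SpotPriceHistory']:
        print(point['AvailabilityZone'], point['SpotPrice'])

    # Bid two cents per hour; the instance runs only while the
    # market price stays at or below this maximum.
    ec2.request_spot_instances(
        SpotPrice='0.02',
        InstanceCount=1,
        LaunchSpecification={
            'ImageId': 'ami-12345678',   # placeholder AMI
            'InstanceType': 'm3.medium',
        },
    )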

Amazon gives software developers tools to automate the build, test, and deployment cycle. It also competes with GitHub by offering storage for the code on the way to deploying it.

You can pay a bit more for some guarantees. If you want your machine to run for at least an hour, the price (at the time I’m writing this) jumps to 3.7 cents per hour, still far lower than the regular price.
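This hour-long guarantee appears to map to the spot block option in the API; a hedged sketch, extending the bid call shown earlier with a BlockDurationMinutes parameter:

    # Same request_spot_instances call as above, but asking for a
    # guaranteed block of at least one hour (durations come in
    # 60-minute increments).
    ec2.request_spot_instances(
        SpotPrice='0.04',
        InstanceCount=1,
        BlockDurationMinutes=60,
        LaunchSpecification={
            'ImageId': 'ami-12345678',   # placeholder AMI
            'InstanceType': 'm3.medium',
        },
    )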

The pricing model isn’t built around a continual auction alone. If you want stability instead of flexibility, there are even more options in the pricing charts. If you’re willing to commit to one year or even three, you can save 30 percent, 40 percent, or as much as 60-plus percent by paying up front and locking in a price -- no auctions, only lower prices.
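The reserved offerings can be browsed through the same API; a small sketch that lists a few all-upfront deals for one instance type:

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # List a few all-upfront reserved offerings for m3.medium;
    # Duration is in seconds, FixedPrice in dollars.
    offerings = ec2.describe_reserved_instances_offerings(
        InstanceType='m3.medium',
        ProductDescription='Linux/UNIX',
        OfferingType='All Upfront',
        MaxResults=5,
    )
    for o in offerings['ReservedInstancesOfferings']:
        print(o['Duration'], o['FixedPrice'], o['UsagePrice'])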

All this means your accountants will have as big a part to play as your programmers. You’ll want to sit both of them down to design the architecture because the pricing model is meant to encourage efficient use of resources. The programmers need to ask themselves again and again whether they can push computation into a batch job that can be executed occasionally when the cost of computing drops on the spot market.

Inventing the cloud

It’s easy to find examples of cloud services that originated with Amazon. MapReduce, data warehousing, stream processing -- AWS had them first. Only now are competitors beginning to emulate Lambda, Amazon’s serverless, event-driven compute service. No cloud competitor has a match for Aurora, Amazon’s souped-up MySQL service, or the range of databases Amazon offers as a service.
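A Lambda function is, at its simplest, just a handler. Here’s a minimal Python sketch of the shape of one; the event fields are made up for illustration:

    # A minimal Lambda handler. Amazon invokes it in response to an
    # event (an S3 upload, an API call) and bills per invocation;
    # there is no server for you to manage.
    def handler(event, context):
        # 'event' carries the trigger's payload; these fields are
        # hypothetical examples.
        name = event.get('name', 'world')
        return {'message': 'Hello, ' + name}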

Amazon continues to innovate, as well as emulate its competitors. The Elastic Beanstalk is Amazon’s version of the Google App Engine, a collection of software packages that automates the process of building a cluster of machines that grows and shrinks with demand. It’s a more general system that supports a number of common server platforms for applications written in Java, .Net, Python, PHP, Node, Ruby, and Go. The scripts automagically configure the load balancer and machines, starting and stopping the basic EC2 instances as needed.
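Driving it from Python looks roughly like this; the application name is a placeholder, and valid platform names can be pulled from list_available_solution_stacks:

    import boto3

    eb = boto3.client('elasticbeanstalk', region_name='us-east-1')

    # See which platform stacks are currently offered.
    print(eb.list_available_solution_stacks()['SolutionStacks'][:3])

    # Create an application and a managed environment; Beanstalk
    # then provisions the load balancer and EC2 instances itself.
    eb.create_application(ApplicationName='my-app')   # placeholder name
    eb.create_environment(
        ApplicationName='my-app',
        EnvironmentName='my-app-env',
        SolutionStackName='64bit Amazon Linux running Python',  # pick an exact name from the list above
    )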

Amazon hosts dozens of public data sets, including climate data, geographic data, census data, Wikipedia traffic statistics, and even Enron emails. The Nexrad radar images shown above come from Amazon’s public data set captured from the U.S. government’s weather radar systems.

The Beanstalk is more like a concierge than a separate, all-encompassing framework like the Google App Engine. It runs your code on generic EC2 instances that will appear in your list of machines -- important to keep in mind if you notice new instances starting up. I once spent several hours killing zombie instances that kept coming back to life before I remembered the experimental Elastic Beanstalk app I had created. It kept discovering that something had killed its instance and dutifully brought it back to life.
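A small precaution, assuming Beanstalk’s usual tagging: list which of your instances belong to a Beanstalk environment before you reach for the kill switch.

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # Beanstalk tags the instances it manages with an
    # elasticbeanstalk:environment-name tag; filter on that tag
    # to see which machines it will resurrect if you kill them.
    reservations = ec2.describe_instances(
        Filters=[{'Name': 'tag-key',
                  'Values': ['elasticbeanstalk:environment-name']}]
    )
    for r in reservations['Reservations']:
        for i in r['Instances']:
            print(i['InstanceId'], i['State']['Name'])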

Elastic Beanstalk is nominally free. You don’t pay anything extra for its services, but you pay for the EC2 instances it spins up on your behalf.

Amazon’s EC2 Container Service takes the same approach with Docker containers. Amazon supplies its own little agent that runs on the underlying EC2 instances, registers them with your cluster, and launches the Docker containers on them. You don’t pay for the container service, only the underlying resources.
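A hedged sketch of the container workflow in boto3, assuming a cluster that already has EC2 container instances registered to it:

    import boto3

    ecs = boto3.client('ecs', region_name='us-east-1')

    ecs.create_cluster(clusterName='demo')

    # Describe a container to run: a stock nginx image with a
    # little memory and port 80 exposed.
    ecs.register_task_definition(
        family='web',
        containerDefinitions=[{
            'name': 'nginx',
            'image': 'nginx:latest',
            'memory': 128,
            'portMappings': [{'containerPort': 80, 'hostPort': 80}],
        }],
    )

    # Place one copy of the task on the cluster's instances.
    ecs.run_task(cluster='demo', taskDefinition='web', count=1)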

Higher and higher

Not all of the tools are given away free to entice you to rent more machines. Cluster instances for Elastic MapReduce, Amazon’s package of Hadoop-based tools, cost about 25 percent more than the underlying EC2 list prices. Amazon has created a fairly standard bundle of the major tools (Spark, Hadoop, Presto, Pig, Hive) and integrated them with Amazon’s S3 storage system. If you’re crunching logs or data in other parts of the Amazon cloud, analyzing them with an Elastic MapReduce cluster can make a great deal of sense. Getting the raw data out of AWS takes time and bandwidth.
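Starting a small cluster is one API call. A sketch with a placeholder log bucket; the IAM roles named here are the defaults Amazon’s setup wizard creates:

    import boto3

    emr = boto3.client('emr', region_name='us-east-1')

    # Spin up a three-node Spark/Hive cluster that terminates when
    # its work is done. The log bucket is a placeholder.
    emr.run_job_flow(
        Name='log-crunching',
        ReleaseLabel='emr-4.2.0',
        Applications=[{'Name': 'Spark'}, {'Name': 'Hive'}],
        Instances={
            'MasterInstanceType': 'm3.xlarge',
            'SlaveInstanceType': 'm3.xlarge',
            'InstanceCount': 3,
            'KeepJobFlowAliveWhenNoSteps': False,
        },
        LogUri='s3://my-bucket/emr-logs/',
        JobFlowRole='EMR_EC2_DefaultRole',
        ServiceRole='EMR_DefaultRole',
    )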

Amazon is also moving higher in the stack by providing answers instead of mere computing resources. While Microsoft’s Azure delivers a more complete set of machine learning tools for data scientists, Amazon makes machine learning more straightforward for developers and business analysts. Ultimately, Amazon will sell you the analytics too.
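Once a model is trained and given a real-time endpoint, querying it from Python is nearly a one-liner; the model ID and record fields below are placeholders:

    import boto3

    ml = boto3.client('machinelearning', region_name='us-east-1')

    # Ask a trained model for a prediction on one record. The model
    # ID and the record's fields are hypothetical.
    prediction = ml.predict(
        MLModelId='ml-XXXXXXXXXXXX',
        Record={'age': '42', 'plan': 'basic'},
        PredictEndpoint='https://realtime.machinelearning.us-east-1.amazonaws.com',
    )
    print(prediction['Prediction'])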

One of the most intriguing new additions to the cloud is the public data that Amazon is gathering for us. For instance, you can chew on radar images from a network of 160 high-resolution Doppler radar sites that provide data on precipitation and atmospheric movement in five-minute intervals. Amazon is putting up a number of big public data sets with the hope they’ll attract projects that rent EC2 instances to ask questions. If your business involves the weather -- hint, hint, commodity traders -- it’s much easier to play.
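The NEXRAD archive, for example, sits in a public S3 bucket that you can list without credentials; the prefix below assumes the bucket’s year/month/day/site layout:

    import boto3
    from botocore import UNSIGNED
    from botocore.client import Config

    # The Level II radar archive is public, so sign nothing.
    s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

    # One day of volume scans from the KTLX site near Oklahoma City.
    listing = s3.list_objects_v2(
        Bucket='noaa-nexrad-level2',
        Prefix='2015/06/01/KTLX/',
        MaxKeys=5,
    )
    for obj in listing.get('Contents', []):
        print(obj['Key'], obj['Size'])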

Services like these show how Amazon plans to build out the cloud in the future. The company has mastered the process of delivering a commodity product, and it continues to improve the service to address the needs of a broad base of users. There are now so many options that it can take hours of research before you’re ready to push the button and rent some machines -- or nonmachines.

The newer services like Lambda and Amazon Machine Learning are clean layers meant to abstract away the complexity of renting and configuring EC2 instances. After all, nobody really wants to run servers. We want to deploy code, collect data, and find answers. We can expect Amazon to continue pushing the automation up the stack to deliver higher-level services that sweep the actual servers under the rug. In the meantime, Amazon continues to offer the largest array of instance types, along with the richest set of services and options to make the most of them.
