Implementing autoscaling involves more than just adding virtual servers to your running application stack. In this article, I'll discuss the unified support for autoscaling on AWS. I'll also look at AWS Auto Scaling, a quick way to add autoscaling functionality to an entire application stack.
The challenge of autoscaling
Autoscaling is the process of adding or removing computing capacity for your application through software automation.
I won't rehash what autoscaling is in this article, as I've written about that before. From here on, I'll assume you know the basics of autoscaling but want more information about how to implement it specifically in an AWS deployment.
In my previous article, I talked about the factors that make autoscaling challenging. Figuring out the right metrics to scale on, scaling quickly to meet demand, and ensuring your application can work statelessly all make autoscaling harder to implement in practice than it seems on paper.
Another challenge is scaling your entire application stack. At a minimum, your application likely consists of front-end Web servers, a business logic/services tier, and data storage. You may also have backend workers processing queued asynchronous requests. Any one of these logical layers can become a choke point.
Finally, another issue is understanding how autoscaling on AWS even works. I used to write training courses at AWS. Time and again, autoscaling proved one of the most difficult topics to teach. We'd always spend multiple slides - complete with detailed examples - to help customers understand the model.
AWS Auto Scaling: your autoscaling assistant
Originally, AWS introduced autoscaling as a feature of Amazon EC2. Despite its complexity, it quickly became one of the major selling points of cloud application hosting.
AWS now offers autoscaling across several of its computing and data storage technologies. It also offers AWS Auto Scaling, a feature that detects all scalable resources in an application stack. It can also create scaling plans for you based on whether you want to optimize for cost, availability, or a balance of the two.
Autoscaling features on AWS
Before we look at AWS Auto Scaling, let's look at all the scalable resources on AWS.
Compute autoscaling
Elastic Compute Cloud (EC2). The original autoscaling implementation. With EC2 autoscaling, you define an Amazon Machine Image (AMI) that either contains your application or the bootstrapping code needed to install it. As demand on your running EC2 instances increases, autoscaling spins up new instances from that AMI and adds them to the available pool.
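As a sketch of what this looks like in CloudFormation, the fragment below defines an autoscaling group with a target-tracking policy. The launch template, subnet parameter, and capacity numbers are illustrative placeholders, not values from a real deployment:

```yaml
Resources:
  WebServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "1"
      MaxSize: "4"
      LaunchTemplate:
        LaunchTemplateId: !Ref WebServerLaunchTemplate   # placeholder launch template
        Version: !GetAtt WebServerLaunchTemplate.LatestVersionNumber
      VPCZoneIdentifier: !Ref WebServerSubnets           # placeholder subnet parameter

  CpuTargetTrackingPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref WebServerGroup
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: 40.0   # add/remove instances to hold ~40% average CPU
```

With target tracking, you don't write explicit scale-out and scale-in rules; the service adjusts capacity to keep the chosen metric near the target value.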
Elastic Container Service (ECS). Docker containers are a convenient way to package and run code. Using ECS, you can define autoscaling rules as you would for EC2 instances and increase or decrease the number of running instances of your container. You can use autoscaling with containers hosted on your own ECS cluster or on AWS Fargate.
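ECS services scale through Application Auto Scaling, which uses a "scalable target" plus a scaling policy. A minimal sketch, with placeholder cluster, service, and role names:

```yaml
ServiceScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ecs
    ScalableDimension: ecs:service:DesiredCount
    ResourceId: service/my-cluster/my-service   # placeholder cluster/service names
    MinCapacity: 2
    MaxCapacity: 20
    RoleARN: !GetAtt AutoScalingRole.Arn        # placeholder IAM role

ServiceScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: ecs-cpu-tracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ServiceScalingTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      TargetValue: 60.0
```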
Spot Fleet. The spot market is AWS's system for purchasing spare virtual machine capacity at steeply discounted rates. Since the availability of spot instances is variable and can change from minute to minute, it's best used for interruptible workloads, such as background processing. You can create Spot Fleets and increase or decrease their running capacity with autoscaling rules.
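Spot Fleet capacity is also driven through Application Auto Scaling, using the fleet request's target capacity as the scalable dimension. A sketch, with a placeholder fleet resource:

```yaml
FleetScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ec2
    ScalableDimension: ec2:spot-fleet-request:TargetCapacity
    ResourceId: !Sub spot-fleet-request/${WorkerSpotFleet}   # placeholder fleet
    MinCapacity: 0
    MaxCapacity: 50
    RoleARN: !GetAtt AutoScalingRole.Arn                     # placeholder IAM role
```

A scaling policy then attaches to this target exactly as it would for an ECS service.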
Data autoscaling
Amazon DynamoDB tables. DynamoDB, AWS's NoSQL data storage solution, is near-infinitely scalable. However, you still need to specify how you're going to access your data and how often.
DynamoDB offers two different capacity modes for accessing data. I'll summarize them here but watch this video from AWS for a deep dive.
With provisioned capacity (which is the default), you set the maximum number of read and write operations you expect to handle in a given time period. You pay for provisioned capacity whether you use it or not.
With on-demand capacity, you don't set any limits. DynamoDB automatically ensures it can handle whatever number of read/write operations you throw its way. You pay only for the read/writes you actually make.
So why not just always use on-demand capacity? If you run predictable workloads with predictable consumption patterns, you can save money with provisioned capacity. You can save even more by purchasing reserved capacity in one- or three-year terms.
However, if you choose provisioned capacity, you should enable autoscaling of provisioned read/write throughputs as well. If you don't and you experience a sudden spike, your read/writes could fail, resulting in application downtime.
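Provisioned-capacity autoscaling also goes through Application Auto Scaling. The sketch below scales a table's read capacity; write capacity works the same way with the `WriteCapacityUnits` dimension. Table and role names are placeholders:

```yaml
TableReadScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: dynamodb
    ScalableDimension: dynamodb:table:ReadCapacityUnits
    ResourceId: table/MyTable                # placeholder table name
    MinCapacity: 5
    MaxCapacity: 500
    RoleARN: !GetAtt AutoScalingRole.Arn     # placeholder IAM role

TableReadScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: read-capacity-tracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref TableReadScalingTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: DynamoDBReadCapacityUtilization
      TargetValue: 70.0   # keep consumed reads at ~70% of provisioned reads
```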
Amazon Aurora read replicas. A read replica is a copy of your database that serves read-only requests. Many applications tend to be read-heavy - i.e., customers retrieve and view data far more often than they make changes to it. Replicating your data across multiple database instances distributes heavy read volumes, reducing the risk of read/write failures on your primary database.
You can configure read replica autoscaling on Aurora, assuming your Aurora cluster already has one primary DB instance and one existing replica. A scale-out event will create a new read replica in response to increased traffic. When the spike subsides, autoscaling will shut down the replica so you don't incur unnecessary charges.
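Aurora replica autoscaling follows the same Application Auto Scaling pattern, with the replica count as the scalable dimension. Cluster identifier and role name below are placeholders:

```yaml
ReplicaScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: rds
    ScalableDimension: rds:cluster:ReadReplicaCount
    ResourceId: cluster:my-aurora-cluster    # placeholder cluster identifier
    MinCapacity: 1
    MaxCapacity: 5
    RoleARN: !GetAtt AutoScalingRole.Arn     # placeholder IAM role

ReplicaScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: reader-cpu-tracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ReplicaScalingTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: RDSReaderAverageCPUUtilization
      TargetValue: 60.0   # add a replica when reader CPU stays above ~60%
```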
"Automatically" scaled features (serverless)
Some AWS features are "serverless" and scale automatically with usage. Incorporating these into your architecture is like getting autoscaling for free. The most obvious is Amazon S3 for static data file storage, which can handle millions of requests per second without blinking.
Other serverless AWS features will scale but with potential soft limits. For example, Amazon API Gateway can handle up to 10,000 requests per second (RPS) per region by default. To handle more, you'll need to request a quota increase from AWS for the hosting region.
Similarly, AWS Lambda can handle 1,000 concurrent requests by default. You can raise this limit by request. You can also request provisioned concurrency with Lambda to reduce the service's notorious cold start issues, where execution of your code is delayed as Lambda starts up a new environment.
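Provisioned concurrency is configured on a function version or alias. A minimal sketch, with placeholder function and version resources:

```yaml
LiveAlias:
  Type: AWS::Lambda::Alias
  Properties:
    FunctionName: !Ref ApiFunction           # placeholder function
    FunctionVersion: !GetAtt ApiVersion.Version
    Name: live
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 50    # execution environments kept warm
```

Note that you pay for provisioned concurrency whether or not requests arrive, so size it to your expected baseline traffic rather than your peak.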
Serverless technologies can play a critical role in building a scalable application. However, don't assume that just because you've "gone serverless" that you can scale instantly to infinity. Understand the services you are using in your architecture and their default limitations and plan accordingly.
Implementing autoscaling on AWS
Autoscaling on AWS requires defining rules for scale-out and scale-in events. A scale-out event occurs when an Amazon CloudWatch metric exceeds a certain threshold over a certain time period. A scale-in event occurs when the metric falls below the threshold for a specified period.
Defining scale-out and scale-in events accurately is critical to successful autoscaling. Since adding capacity takes time, you need to define your scale-out event to occur early enough so that you can meet increased demand. Similarly, your scale-in event should ensure you're not paying for capacity you no longer need.
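This pairing of a CloudWatch alarm with a scaling action is most explicit in a step scaling policy. The sketch below adds one instance after three consecutive minutes above 70% average CPU; the scale-in side (not shown) mirrors it with a lower threshold, `LessThanThreshold`, and `ScalingAdjustment: -1`. The group name is a placeholder:

```yaml
ScaleOutPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref WebServerGroup   # placeholder autoscaling group
    PolicyType: StepScaling
    AdjustmentType: ChangeInCapacity
    StepAdjustments:
      - MetricIntervalLowerBound: 0
        ScalingAdjustment: 1                    # add one instance

HighCpuAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/EC2
    MetricName: CPUUtilization
    Statistic: Average
    Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref WebServerGroup
    Period: 60
    EvaluationPeriods: 3       # three consecutive minutes above the threshold
    Threshold: 70
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref ScaleOutPolicy
```

Choosing `EvaluationPeriods` is where the timing trade-off lives: too few periods and you scale on noise; too many and you react too late to meet demand.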
Before setting up autoscaling, you should decide which scaling strategy works best for you. This will depend on your use case. The three major strategies are:
- Optimize for availability. Always provide enough capacity to service incoming requests, even if it means overpaying slightly. This is a good choice for production workloads that require maximum availability.
- Optimize for cost. Minimize account spend, even if this means dropping a few requests. This strategy is best for dev/test and for background processing (asynchronous) workloads.
- Balance availability and cost. Spend a little more than you would in a pure cost optimization strategy to ensure availability.
Using AWS Auto Scaling
It can take considerable time to implement an autoscaling strategy directly across all application components using Infrastructure as Code. Until then, you can use AWS Auto Scaling to apply a consistent autoscaling strategy across your stack.
With AWS Auto Scaling, you can find scalable resources by tag or by the CloudFormation stack that created them. This simplifies applying consistent properties to different stacks - e.g., creating separate strategies for dev, test, and production deployments.
To get started, you'll need a set of resources supported by AWS Auto Scaling. I cobbled together a small CloudFormation template to demonstrate this. The template launches two resources:
- An autoscaling group containing a Web server. (AWS Auto Scaling requires an existing autoscaling group to work its magic; you can't just point it at a standalone EC2 instance.)
- A DynamoDB table. The IAM role for the Web server instances enables read/write access to the table.
First, download and launch this stack in your AWS account. Then, to create a new autoscaling plan, navigate to AWS Auto Scaling in the Management Console. Select the CloudFormation stack you created.
On the next page, you can select your scaling strategy for each scalable resource that your stack supports. You should see two entries - one for your EC2 autoscaling group and one for your DynamoDB table.
For each of your scalable resources, you can select one of the strategies I discussed above - optimize for availability, for cost, or for a balance of both. You can also create a custom policy using your own settings.
In the example above, our scaling plan will observe average CPU utilization across all EC2 instances in our autoscaling group. If average utilization exceeds 40%, the scaling plan will add a new instance.
AWS Auto Scaling will also offer to enable two other intelligent options for you. Dynamic scaling adjusts how quickly you scale out based on current traffic patterns. Predictive scaling uses machine learning to monitor your stack over time and scale out ahead of an anticipated traffic surge.
Creating reusable autoscaling strategies
Using the console is, at best, a short-term solution. Ideally, you should encode all of your autoscaling solutions into your automated DevOps deployments using Infrastructure as Code (IaC).
Assuming you're driving your IaC deployments with CloudFormation, there are two ways to do this. The first is to create autoscaling CloudFormation settings for every service you need to scale. AWS provides some starter snippets for all of its services that support autoscaling.
You can also write your AWS Auto Scaling plans in AWS CloudFormation. This brings all of the benefits of AWS Auto Scaling in the console into your DevOps pipeline.
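A scaling plan in CloudFormation ties an application source (a stack or a set of tags) to scaling instructions for each resource. The sketch below covers just the autoscaling group from the earlier demo stack, using the current stack's ARN as the source; the group name and capacity values are placeholders:

```yaml
StackScalingPlan:
  Type: AWS::AutoScalingPlans::ScalingPlan
  Properties:
    ApplicationSource:
      CloudFormationStackARN: !Ref AWS::StackId   # scale resources in this stack
    ScalingInstructions:
      - ServiceNamespace: autoscaling
        ResourceId: !Sub autoScalingGroup/${WebServerGroup}   # placeholder group
        ScalableDimension: autoscaling:autoScalingGroup:DesiredCapacity
        MinCapacity: 1
        MaxCapacity: 4
        TargetTrackingConfigurations:
          - PredefinedScalingMetricSpecification:
              PredefinedScalingMetricType: ASGAverageCPUUtilization
            TargetValue: 40.0
```

Additional `ScalingInstructions` entries would cover the DynamoDB table and any other scalable resources in the stack.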
Conclusion
Autoscaling on AWS can be challenging to get right. AWS Auto Scaling provides a quick way to add scaling behavior to your dev, testing, and production stacks while you craft a reusable IaC solution.