Cloud Squeeze – AWS Cost Optimization & Right Sizing with AI

AWS Trusted Advisor, Alternatives vs. Cloud cost optimization approaches

Published on March 22, 2018

Cloud cost calculators computed the savings from moving on-premises infrastructure to the cloud. Based on IDC research, traditional on-premises infrastructure spend is currently shrinking at 55%, with public and private clouds increasing at 25% and 8% respectively. The savings from moving CAPEX to OPEX, and from on-premises to the cloud, are real.

“What one didn’t anticipate is the ease of spawning new resources and the shock of the easy visibility to aggregate and monitor utilization in the cloud would bring.”

According to a Business Insider report, companies waste $62 billion in the cloud by paying for capacity they don’t need. This vast excess spend has spawned various approaches to cost optimization. The methods of optimization listed in this article are shared across the big three cloud providers: AWS, Azure and Google. In this article, we will look specifically at AWS Trusted Advisor, and at the different categories of cost savings tools and approaches.

The reason for picking AWS Trusted Advisor is that the way it works is publicly disclosed and referenceable. In the cost optimization category, the checks fall into these nine categories:

  • Amazon EC2 Reserved Instances Optimization
  • Low Utilization Amazon EC2 instances
  • Idle Load Balancers
  • Underutilized Amazon EBS Volumes
  • Unassociated Elastic IP Addresses
  • Amazon RDS Idle DB Instances
  • Amazon Route 53 Latency Resource Record Sets
  • Amazon EC2 Reserved Instance Lease Expiration
  • Underutilized Amazon Redshift Clusters

Using a more standardized approach, we can group cost-saving elements into one of these five general areas (The purpose of this is to understand the benefits of the sequence and order in which these steps are taken):

  1. Idle capacity
  2. Reduce under-utilized capacity
  3. Misappropriated allocation
  4. Life-cycle management
  5. Reservation

If we regroup the core Trusted Advisor checks against these five categories, we get:

  1. Idle capacity
    • Idle load balancers
    • Amazon RDS Idle DB Instances
  2. Reduce under-utilized capacity
    • Low utilization Amazon EC2 instances
    • Underutilized Amazon EBS volumes
    • Underutilized Amazon Redshift clusters
  3. Misappropriated allocation
    • Unassociated Elastic IP Addresses
    • Amazon Route 53 Latency Resource Record Sets
  4. Life-cycle management
    • (none)
  5. Reservation
    • Amazon EC2 Reserved Instances Optimization
    • Amazon EC2 Reserved Instance Lease Expiration

Now, let’s look at Trusted Advisor’s definition of an underutilized resource.

aws definition of under utilized trusted advisor

“Trusted advisor checks the Amazon Elastic Compute Cloud (Amazon EC2) instances that were running at any time during the last 14 days and alerts you if the daily CPU utilization was 10% or less and network I/O was 5MB or less on 4 or more days.”

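It takes at least four days within the 14-day window, not necessarily consecutive, each with daily CPU utilization of 10% or less and network I/O of 5 MB or less, to trigger this alert of an underutilized resource.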
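The quoted rule can be sketched as a small check over daily metrics. This is only an illustration of the rule as worded in the quote; the `DailyUsage` type, the function name, and the idea of feeding it one record per day are assumptions made for the example, not part of any AWS API.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the quoted Trusted Advisor description.
CPU_THRESHOLD_PCT = 10.0
NETWORK_THRESHOLD_MB = 5.0
MIN_LOW_DAYS = 4
WINDOW_DAYS = 14


@dataclass
class DailyUsage:
    """One day of metrics for an instance (hypothetical record type)."""
    cpu_pct: float     # daily CPU utilization, in percent
    network_mb: float  # daily network I/O, in megabytes


def is_low_utilization(days: List[DailyUsage]) -> bool:
    """Return True if the instance would be flagged under the quoted rule."""
    window = days[-WINDOW_DAYS:]  # only the most recent 14 days count
    low_days = sum(
        1
        for d in window
        if d.cpu_pct <= CPU_THRESHOLD_PCT and d.network_mb <= NETWORK_THRESHOLD_MB
    )
    # The low days only need to add up to four; they need not be adjacent.
    return low_days >= MIN_LOW_DAYS
```

Note that a day counts as low only when both conditions hold, so an instance with idle CPU but heavy network traffic is not flagged by this sketch.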

Here is a picture of what a resource with 90% spare capacity in the cloud looks like:

The cost of measuring underutilization at 10%

cost of under utilization in the cloud

One of the easiest ways to arrive at cost savings is through “Reservations,” a focus area of “Trusted Advisor” and the fastest way to see cost savings. For companies that have come to the cloud from CAPEX purchasing patterns, this is easy to understand: lock in a footprint of instance types in a region and reduce costs by up to 30-60% compared to ad hoc (on-demand) costs. If a resource is well above the underutilization threshold (10% or less on 4 or more of the last 14 days), then reservations become the method for realizing measurable cost savings.

The contract terms are either one year or three years, with options to pay nothing up front, partially up front (for a lower hourly cost) or fully up front for no hourly computing cost. AWS enforces this committed purchase. If you purchase a one-year, no-upfront term and then realize you do not need that resource type, you continue paying the lower rate even if you are not using it. A newer breed of convertible reservations allows more flexibility to change instance families and types within a region, in exchange for smaller cost savings than standard contracts. The contract costs vary based on the operating system and the add-on software licensing involved.
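Because a reservation is paid for whether or not the resource is used, it helps to know how busy an instance must be before the commitment pays off. The following is a minimal sketch of that arithmetic; the hourly rates are made-up illustrative numbers (a 40% discount, inside the 30-60% range mentioned above), and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def break_even_utilization(on_demand_hourly: float, reserved_hourly: float) -> float:
    """Fraction of the term the instance must run for the reservation to pay off."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_hourly / on_demand_hourly


def reservation_saving(
    on_demand_hourly: float,
    reserved_hourly: float,
    hours_used: float,
    hours_in_term: float = HOURS_PER_YEAR,
) -> float:
    """Saving (positive) or loss (negative) of reserving versus paying on demand."""
    on_demand_cost = on_demand_hourly * hours_used
    # The reserved rate is owed for every hour of the term, used or not.
    reserved_cost = reserved_hourly * hours_in_term
    return on_demand_cost - reserved_cost


# Illustrative rates only (assumptions, not real AWS prices).
ON_DEMAND = 0.10
RESERVED = 0.06

full_year = reservation_saving(ON_DEMAND, RESERVED, hours_used=HOURS_PER_YEAR)
half_year = reservation_saving(ON_DEMAND, RESERVED, hours_used=HOURS_PER_YEAR / 2)
```

With these assumed rates, the instance has to run for about 60% of the term to break even; at full use the reservation saves money, and at half use it costs more than on-demand would have.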

The capacity planning and forecasting models for applications rely on a progressive growth of utilization, with peak capacity reached over a three-year period. Whether these capacity planning and forecasting studies really arrive at the predicted usage levels three years out has to be challenged, and needs new thinking!

There are various types of virtualized containers in the cloud. At every 10% variation of workload, there is an instance type out there for it. The instance variations differ in computing power, memory, networking ability and storage access. Deploying on such containers does not mean the software can leverage these advances in virtualization! Many people fail to understand that scaling a resource type down or up is a 2-minute process.

If you look at the various alternative cost optimization approaches in the market, each tends to have a core strength in one of these five categories, with some mix of the others:

1) Cost analysis: The core focus of these tools is to help you measure, manage and analyze your costs, and allocate charges back to the right cost-center. Some of these perform cost savings, and some include arbitrage strategies to lower cost through reservations. Large companies and some MSPs use these to allocate costs to their clients. Drill-down visual analytics and measurement by tags and types are the focus.

cost analysis template

(referenced from smartsheet)

Cost savings in this type of ‘cost savings tool‘ category come from your ability to organize your resources into tags. In-depth knowledge of consumption pattern trends, together with future planning insights and cost arbitrage through reservations, produces the cost savings. Typically this category of tools is used by accounting and finance teams.

2) Reserved Instance (RI) planning: The vast majority of cost optimization tools fall into this category. They are glorified extensions of Trusted Advisor with a different threshold mechanism from AWS (the underutilization threshold isn’t four days of 10% or less). What differentiates these from Trusted Advisor is the types of misconfiguration and misappropriation analysis they perform, but the savings primarily come from the OPEX to CAPEX switch through reservations, irrespective of any analysis of underutilization or resource starvation. Traditional thresholds and business intelligence style thought processes dominate the recommendations.

Capex or Opex

Cost savings in ‘Reserved Instance (RI) planning tools‘ come from your ability to put up capital up front, or to commit for one- to three-year periods to instance types and families in specific regions, based on knowledge of future trends. These tools leverage business intelligence type analysis to drive decision making for cost savings. A strong understanding of Net Present Value (NPV), the analysis of the value of future cash flow streams, produces the justification necessary to arrive at savings.
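The NPV comparison behind an upfront reservation can be sketched in a few lines. This is an illustration only: the monthly amounts, the 8% discount rate, and the simple "annual rate divided by twelve" monthly discounting are all assumptions chosen for the example.

```python
from typing import Sequence


def present_value_of_costs(
    monthly_costs: Sequence[float], annual_rate: float
) -> float:
    """Discount a stream of monthly payments back to today.

    The first payment is treated as due now (month 0), and the monthly
    rate is approximated as annual_rate / 12 (a simplifying assumption).
    """
    monthly_rate = annual_rate / 12
    return sum(
        cost / (1 + monthly_rate) ** month
        for month, cost in enumerate(monthly_costs)
    )


# Hypothetical figures: $100 per month on demand versus $800 paid up front.
DISCOUNT_RATE = 0.08
on_demand_pv = present_value_of_costs([100.0] * 12, DISCOUNT_RATE)
upfront_pv = present_value_of_costs([800.0], DISCOUNT_RATE)

# The reservation is justified on NPV grounds only if its present value is lower.
reservation_is_cheaper = upfront_pv < on_demand_pv
```

Discounting makes the on-demand stream slightly cheaper than its nominal $1,200, which narrows the gap to the upfront payment; the comparison also assumes the resource is actually needed for the whole term.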

3) Shutdown-restart time-based automation: If your development team works from 9-5, then some of these cloud resources can be shut down when not in use. These tools orchestrate powering resources on and off on set schedules. Cost savings are realized by orchestrating fleets of virtualized containers through shutdowns and restarts. This stop-start nature led to a new class called spot instances, where prices are dramatically lower, usually after hours or when instance types become available in the cloud.

Shutting down a computing class eliminates the compute costs, but you may still incur storage and network charges for associated resources. These have to be compared against certain classes of instances, like the T2 level, that earn credits during low-usage periods and burst at other times.

start stop automation of cloud resources

Cost savings from ‘Shutdown-restart time-based automation tools‘ come from the planning of shutdowns and restarts, the coordination of cloud resource availability, and the successful implementation of these schedules and routines.
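The arithmetic behind a 9-5 schedule is easy to work out. The sketch below assumes a five-day working week and treats storage as a flat weekly charge that continues while the instance is stopped; the function names and the cost inputs are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def scheduled_hours_per_week(start_hour: int, end_hour: int, days_per_week: int) -> int:
    """Hours per week the resource stays powered on under the schedule."""
    return (end_hour - start_hour) * days_per_week


def compute_saving_fraction(start_hour: int, end_hour: int, days_per_week: int) -> float:
    """Share of always-on compute hours avoided by the schedule."""
    on_hours = scheduled_hours_per_week(start_hour, end_hour, days_per_week)
    return 1 - on_hours / HOURS_PER_WEEK


def weekly_cost(compute_hourly: float, storage_weekly: float, on_hours: float) -> float:
    """Compute is billed only while running; storage is billed regardless."""
    return compute_hourly * on_hours + storage_weekly


# A 9-to-5, Monday-to-Friday schedule keeps the instance on 40 of 168 hours.
on_hours = scheduled_hours_per_week(start_hour=9, end_hour=17, days_per_week=5)
saving = compute_saving_fraction(start_hour=9, end_hour=17, days_per_week=5)
```

Under these assumptions roughly three quarters of the compute hours disappear, while the storage line in `weekly_cost` stays the same whether the instance runs or not.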

4) Spot instance arbitration tools: These are a newer breed of tools that discover low spot instance pricing as it becomes available and move specialized virtualized containers between spot and standard instance types in near real time. They require running workloads in another virtual container layer that abstracts the movement of the workload to different container types. The easiest analogy is a 24×7 heart-bypass operating theatre: the tool keeps looking for the lower-cost container type, resuscitation may be needed, but the workload will live.

heart bypass

(referenced from wikipedia – heart bypass surgery, Rome)

Cost savings from ‘Spot instance arbitration tools‘ come from determining a price point below which your computing workload can do all the processing it can find. They suit a massive queue of work (‘batch type’) that requires computing, a price threshold below which running it makes business sense (for example, mining of bitcoins), and the ability to use various compute classes to produce the cost savings.
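The price-threshold idea can be sketched as a loop over hourly spot prices that drains a batch queue only while the price is acceptable. This is a toy model under stated assumptions: the price list, the fixed `jobs_per_hour` throughput, and the function name are invented for illustration, and no real spot market API is involved.

```python
from collections import deque
from typing import List, Sequence, Tuple


def run_batch_below_threshold(
    jobs: Sequence[str],
    hourly_spot_prices: Sequence[float],
    max_price: float,
    jobs_per_hour: int,
) -> Tuple[List[str], List[str], float]:
    """Process queued jobs only in hours where the spot price is acceptable.

    Returns the completed jobs, the jobs still waiting, and the total spend
    for a single instance billed one hour at a time.
    """
    queue = deque(jobs)
    completed: List[str] = []
    spend = 0.0
    for price in hourly_spot_prices:
        if not queue:
            break  # nothing left to do, stop paying for capacity
        if price > max_price:
            continue  # too expensive this hour, leave the work queued
        for _ in range(min(jobs_per_hour, len(queue))):
            completed.append(queue.popleft())
        spend += price
    return completed, list(queue), spend


# Hypothetical prices: the first hour is above the threshold and is skipped.
done, waiting, cost = run_batch_below_threshold(
    jobs=["a", "b", "c", "d", "e"],
    hourly_spot_prices=[0.05, 0.02, 0.03, 0.01],
    max_price=0.03,
    jobs_per_hour=2,
)
```

The design choice here is that work waits rather than runs at a bad price, which is why this approach fits batch queues and not latency-sensitive services.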

5) Right-sizing with ML or AI: A newer category that specializes in matching the compute workload to the right-sized cloud virtual container type. These tools apply analysis and machine learning or deep-learning models to the data available in the cloud on CPU, multi-threading, memory, network, disk and more, to detect utilization across these dimensions and leave a small amount of room (20% or less) for growth. Resizing up or down can be done in minutes when the workload demands it. The same approach applies to storage, where analysis of usage patterns and storage family classes helps guide better lifecycle management policies. As these computations are intense and there can be tens of thousands of EC2 instances in an account, they can take time to perform. These tools are built on machine learning or artificial intelligence because a business intelligence approach cannot arrive at these results.

Cost savings from ‘Right-sizing with ML or AI‘ tools come from the tools’ ability to draw on a lot of learned or unlearned data collected over the years about virtualized container types, limits, capabilities, and your workloads. Data collected from the cloud providers is assessed against container types to determine right-sizing, which requires in-depth analysis that is not suitable for business intelligence type analysis. Cost savings come from resizing and configuration changes, with reservations adding to the overall cost savings achieved.
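The final matching step, picking the smallest container type that covers observed peak usage plus a small growth margin, can be sketched without any learning at all. To be clear, this is a plain threshold rule and not a machine learning model; the instance catalog, its prices, and the function name are hypothetical, and a real tool would derive the peak figures from far richer data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    """A made-up catalog entry, not a real AWS instance type."""
    name: str
    vcpus: int
    memory_gib: float
    hourly_cost: float


def right_size(
    peak_vcpus: float,
    peak_memory_gib: float,
    catalog: List[InstanceType],
    headroom: float = 0.20,
) -> Optional[InstanceType]:
    """Cheapest type that fits the observed peak plus growth headroom."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    fits = [
        t for t in catalog
        if t.vcpus >= need_cpu and t.memory_gib >= need_mem
    ]
    # No type is large enough: signal that with None instead of guessing.
    return min(fits, key=lambda t: t.hourly_cost) if fits else None


CATALOG = [
    InstanceType("small", vcpus=2, memory_gib=4.0, hourly_cost=0.05),
    InstanceType("medium", vcpus=4, memory_gib=8.0, hourly_cost=0.10),
    InstanceType("large", vcpus=8, memory_gib=16.0, hourly_cost=0.20),
]

choice = right_size(peak_vcpus=3.0, peak_memory_gib=5.0, catalog=CATALOG)
```

The 20% default mirrors the growth margin mentioned above; the hard part that ML-based tools address is estimating the peak inputs reliably across many dimensions, which this sketch takes as given.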


As it relates to costs, Trusted Advisor is free if you have a support agreement, which is 10% of your overall cloud spend. Other tools vary from 2-10% of your cloud spend, depending on current market share and the type of approach used.

Managing cloud resources can be easy when approached in the right sequence. If reservation is the first tool we reach for in cost optimization, then we are adding to the $62 billion in unused cloud capacity, while still realizing some cost savings. At Software WORX we believe reservation should be one of the last steps in the cost optimization sequence.

This enables us to answer the question:

What action can I take today to reduce my share of the cloud overspend, currently at $62 billion?

Yesterday’s spend isn’t something you can change, but you can change the cost you accrue from tomorrow onwards!

In summary, cloud cost optimization is a large and growing space, with $62 billion in overspend and growing. Understanding the types of approaches in the market for arriving at cost reduction helps when reviewing tools like Trusted Advisor, cost analysis tools, RI (reservation) planning tools, fleet restart automation tools, spot arbitrage tools, or AI-based right-sizing approaches.
