Instances

Instance recommendations aim to save money while optimizing CPU and memory utilization. Even if you think that your instances are sufficiently sized to meet your workload, you may be missing out on savings that you could achieve by downsizing, moving to newer instance families, or moving to better-optimized instance families.

Note that in order for Cloud Optimizer to obtain memory utilization information, you’ll have to install the AWS CloudWatch Agent on your instances.

Scope

There are two recommendation strategies:

  1. Holistic
  2. Non-holistic

The default strategy is non-holistic, but you may change this value on the Policies screen.

Holistic

With this strategy, instance recommendations are made assuming that they will all be done together. This results in maximal cost savings, assuming that each instance has the proper policies set.

When using this strategy, you may see counter-intuitive recommendations that appear to cost you more money than you’ll be saving. This happens when we recommend moving instances to a different family in order to free up reserved instances for other instances to consume, which ultimately results in greater overall savings.

If the majority of your instances are properly sized and your reserved instances generally match your running instances, your recommendations will consist of simple resizing operations. It is only when your reserved instances are unbalanced and many of your instances are over- or under-provisioned that you’ll notice a significant improvement with the holistic strategy.

In general, the holistic strategy is beneficial because it tries to maximize your savings, making use of every reserved instance and savings plan that you’ve already paid for.

Non-holistic

With this strategy, instance recommendations are made assuming that they will be done one at a time. If there are many recommendations, doing them all at once may not result in the optimal result when using reserved instances. This is because each recommendation assumes that it will be the only recommendation that is performed.

For example, if you have an unallocated t2 reserved instance, we may recommend that you change 20 t3 instances to t2 instances. Performing any one of those recommendations would save money, and this would be reflected the next day once everything has been re-scanned. But if you performed all of them at once, the unallocated t2 reservation would be consumed by a single instance, and the rest would simply move from cheaper t3 pricing to more expensive t2 on-demand pricing.
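To make the trade-off concrete, here is a minimal sketch with hypothetical hourly prices. The rates, the single-reservation count, and the function name are illustrative assumptions, not actual AWS pricing or Cloud Optimizer behavior.

```python
# Hypothetical hourly prices for illustration only -- not actual AWS rates.
T3_ON_DEMAND = 0.0416                 # assumed t3 on-demand price
T2_ON_DEMAND = 0.0464                 # assumed t2 on-demand price
UNALLOCATED_T2_RESERVATIONS = 1       # already paid for, so marginal cost is ~0

def hourly_cost_after_moving(num_moved: int) -> float:
    """Cost of the moved instances once `num_moved` t3s become t2s."""
    covered = min(num_moved, UNALLOCATED_T2_RESERVATIONS)  # absorbed by the RI
    return (num_moved - covered) * T2_ON_DEMAND            # the rest pay on demand

print(hourly_cost_after_moving(1))    # 0.0     vs. 0.0416 staying on t3: cheaper
print(hourly_cost_after_moving(20))   # ~0.88   vs. 0.832 staying on t3: more expensive
```

Any single move is covered by the idle reservation, but moving all 20 leaves 19 instances on the pricier t2 on-demand rate.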

How we find the best savings

The first step is to figure out the minimum number of CPUs and the minimum amount of memory needed for each instance. The instance’s policy sets upper bounds on the CPU and memory utilization that is tolerable (for example, you may want to keep CPU and memory utilization each below 80%).

For CPUs, we take the CPU utilization data from the time span, apply it to a range of candidate sizes, and use a binary search to find the smallest size that satisfies the policy. Burstable and non-burstable instance types are handled differently, since burstable instances allow CPU usage to burst above the instance type’s baseline performance for limited periods. Once this is taken into account, burstable instance types typically require considerably fewer CPUs than non-burstable ones, though this depends on the instance’s workload.

For memory, we take the maximum memory used during the time span and find the smallest memory size at which that maximum sits at the utilization threshold.

For example, we might determine that the instance will require 0.68 burstable CPUs (or 1.96 non-burstable CPUs) and 1.2 GB of memory.
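As a rough illustration of this sizing step, here is a simplified sketch. The sample format, the integer CPU counts, the way the two CPU policies are combined, and all function names are assumptions made for the example; they are not Cloud Optimizer internals, and burst-credit accounting is omitted.

```python
import math

def satisfies_policy(cpus, demand, max_util=0.80, max_pegged_minutes=1):
    """True if an instance with `cpus` CPUs keeps average utilization at or
    below the threshold and is never pegged at 100% for more consecutive
    minutes than allowed. `demand` is absolute CPU demand, one sample per minute."""
    if sum(demand) / len(demand) / cpus > max_util:
        return False
    pegged_run = 0
    for d in demand:
        pegged_run = pegged_run + 1 if d >= cpus else 0
        if pegged_run > max_pegged_minutes:
            return False
    return True

def min_cpus(cpu_util_pct, current_cpus, **policy):
    """Binary-search the smallest CPU count that satisfies the policy.
    `cpu_util_pct` are utilization percentages of the current instance,
    so rescaling them gives absolute demand."""
    demand = [u / 100.0 * current_cpus for u in cpu_util_pct]
    lo = 1
    hi = max(1, math.ceil(max(demand) / policy.get("max_util", 0.80)))
    while lo < hi:                        # both checks are monotone in `cpus`
        mid = (lo + hi) // 2
        if satisfies_policy(mid, demand, **policy):
            hi = mid
        else:
            lo = mid + 1
    return lo

def min_memory_gb(mem_used_gb, max_util=0.80, percentile=0.95):
    """Smallest memory size whose utilization threshold sits at the chosen
    percentile of observed usage (see the memory policies below)."""
    ordered = sorted(mem_used_gb)
    idx = max(0, math.ceil(percentile * len(ordered)) - 1)
    return ordered[idx] / max_util
```

With these simplifications the result comes out as a whole number of CPUs, rather than the fractional requirements (such as 1.96 CPUs) that the real analysis can produce.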

In order to find the minimal set of instance types to consider, we find the smallest current-generation instance type in each family that meets the requirements. Anything larger, within each family, would also work, but increasing the size of the instance types more than necessary will not save any money.

Sometimes, the current instance type does not meet the policy requirements of the instance. In this case, the instance is under-provisioned and will need to be resized to an instance type with more capacity.

In addition to the minimal set of instance types, we consider everything between the minimum requirements and the current instance type (or, if the current one is under-provisioned, the smallest type within its family that meets the requirements). Any of these instance types that cost less than or equal to that reference type are included. This gives you a series of incremental recommendations that all lie along the path toward the greatest optimization.
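Here is a sketch of this candidate-selection step, using a made-up catalog format; the InstanceType fields and the candidates function are assumptions for illustration, not Cloud Optimizer’s data model.

```python
from collections import namedtuple

# Assumed catalog entry; assume `catalog` contains only current-generation types.
InstanceType = namedtuple("InstanceType", "name family vcpus memory_gb price")

def candidates(catalog, need_cpus, need_mem_gb, reference):
    """Types that meet the minimum requirements and cost no more than the
    reference type, plus the smallest adequate type in each family.
    `reference` is the current type, or the smallest adequate type in its
    family if the current one is under-provisioned."""
    adequate = [t for t in catalog
                if t.vcpus >= need_cpus and t.memory_gb >= need_mem_gb]
    # Smallest adequate type per family: anything larger in the same family
    # also fits, but costs more without saving anything.
    smallest = {}
    for t in sorted(adequate, key=lambda t: (t.vcpus, t.memory_gb, t.price)):
        smallest.setdefault(t.family, t)
    # Incremental options that cost no more than the reference type.
    cheaper = [t for t in adequate if t.price <= reference.price]
    return sorted(set(smallest.values()) | set(cheaper), key=lambda t: t.price)
```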

Policies

CPU: Maximum utilization

Default: 80%

This is the maximum CPU utilization allowed. We’ll recommend growing instances to keep the CPU utilization below this value, and we’ll recommend shrinking instances if their CPU utilization is much lower than this value.

In order to maximize savings, you should be running your instances close to the highest CPU utilization that you can tolerate, since instance types are priced by CPU.

Note that for burstable instance types, we treat this as the allowed utilization of the baseline CPU amount. For example, a t2.medium has 2 CPUs, but you are only allowed to use 40% of 1 CPU on average, so our threshold is based on the baseline performance of the CPUs, not the actual number of them.
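Combining the 80% threshold with the t2.medium baseline above gives a small worked example. This assumes CloudWatch reports CPU utilization as a percentage of all vCPUs; the variable names are illustrative.

```python
# t2.medium figures from the text above: 2 vCPUs, baseline of 0.4 CPUs total.
VCPUS = 2
BASELINE_CPUS = 0.4                      # 40% of one CPU on average
MAX_UTIL = 0.80                          # "CPU: Maximum utilization" policy

budget_cpus = MAX_UTIL * BASELINE_CPUS   # 0.32 CPUs of average demand allowed
budget_pct = budget_cpus / VCPUS * 100   # as a percentage of the 2 vCPUs
print(budget_cpus, budget_pct)           # 0.32 CPUs, i.e. 16%
```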

CPU: Maximum total over-provisioned time

Default: 1 minute

This is the maximum amount of time, in minutes, that the CPU can be continuously pegged at 100%.

CPU: Maximum total over-provisioned time (burstable)

Default: 1 minute

For burstable instances (such as the t2 and t3 families), this is the number of minutes the CPU can be continuously pegged at 100%. The maximum meaningful value is one day, per Amazon’s credit usage policies.

CPU: Maximum over-provisioned time

Default: 1%

This is the percentage of the total time that the CPU can be pegged at 100%. Every minute that the CPU is at 100% is added to a grand total, and that grand total is compared to the total amount of time in the analysis period.
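A minimal sketch of how the two over-provisioned-time checks differ, assuming one utilization sample per minute; the function names are ours.

```python
def longest_pegged_run(util_pct):
    """Longest stretch of consecutive minutes at 100% CPU
    (checked against "Maximum total over-provisioned time")."""
    longest = run = 0
    for u in util_pct:
        run = run + 1 if u >= 100 else 0
        longest = max(longest, run)
    return longest

def pegged_fraction(util_pct):
    """Share of the analysis period spent at 100% CPU
    (checked against "Maximum over-provisioned time")."""
    return sum(1 for u in util_pct if u >= 100) / len(util_pct)
```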

Memory: Cutoff percentile

Default: 95th percentile

Throughout the time span, memory usage may occasionally rise above the utilization threshold. This number is the percentile to use when calculating the maximum memory detected throughout the time span.

When using the 95th percentile, the top 5% of the peak memory measurements are thrown out when calculating the overall maximum.

You may set this to the 100th percentile to use the actual maximum.
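As an illustration, consider an assumed trace with 95 steady samples and 5 short spikes; the 95th-percentile cutoff ignores the spikes, while the 100th percentile keeps them.

```python
import math

# Assumed memory trace: 95 steady samples and 5 short spikes to 4 GB.
samples_gb = [1.1] * 95 + [4.0] * 5

def cutoff(samples, pct):
    """Peak memory after discarding the top (1 - pct) share of samples."""
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(pct * len(ordered)) - 1)]

print(cutoff(samples_gb, 0.95))   # 1.1 GB -- the spikes are ignored
print(cutoff(samples_gb, 1.00))   # 4.0 GB -- the true maximum
```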

Memory: Maximum utilization

Default: 80%

This is the maximum memory utilization allowed during normal operation. As with CPU, we’ll recommend instance types that keep memory utilization below this value.

Time window

Default: 7 days

This is how large the data window needs to be for making a recommendation. Setting this to a smaller value means more frequent recommendations, but they are likely to change day by day. Setting this to a larger value means that new recommendations will take longer to appear after making a configuration change, but they are likely to be more stable.

Longer time spans may provide a better understanding of the performance, assuming that the workload does not change rapidly, but this also means that you have to wait longer to receive a new recommendation after making any changes. Shorter time spans provide for faster recommendations and are good for dynamic workloads.