Rightsizing EC2 instances for cost optimization with respect to RAM/CPU and price

I'm using several t3.small instances as worker nodes for AWS EKS. According to the documentation, the On-Demand hourly cost of the t3 family in my region is:

Instance name   On-Demand hourly rate   vCPU   Memory   Storage    Network performance
t3.nano         $0.0054                 2      0.5 GiB  EBS Only   Up to 5 Gigabit
t3.micro        $0.0108                 2      1 GiB    EBS Only   Up to 5 Gigabit
t3.small        $0.0216                 2      2 GiB    EBS Only   Up to 5 Gigabit
t3.medium       $0.0432                 2      4 GiB    EBS Only   Up to 5 Gigabit
t3.large        $0.0864                 2      8 GiB    EBS Only   Up to 5 Gigabit
t3.xlarge       $0.1728                 4      16 GiB   EBS Only   Up to 5 Gigabit
t3.2xlarge      $0.3456                 8      32 GiB   EBS Only   Up to 5 Gigabit

When deciding between a t3.small and a t3.medium: the t3.medium costs twice as much as a t3.small and has twice the memory, but the same number of vCPUs.

Money-wise, why would I use a t3.medium if I can spawn 2 x t3.small for the same price and get the same amount of RAM and twice as many vCPUs?
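To make the comparison in the question concrete, here is a minimal sketch that totals price, vCPUs and memory for a fleet of one instance type. The prices and sizes are copied from the table above; the dictionary and function names are made up for illustration. On paper, 2 x t3.small and 1 x t3.medium cost the same and have the same memory, with twice the vCPUs on the small side, before any per-node overhead is taken into account.

```python
# Prices (On-Demand, USD/hour) and sizes copied from the t3 table in the question.
T3 = {
    "t3.small": {"price": 0.0216, "vcpu": 2, "mem_gib": 2},
    "t3.medium": {"price": 0.0432, "vcpu": 2, "mem_gib": 4},
}


def fleet_totals(instance_type: str, count: int) -> dict:
    """Total hourly price, vCPUs and memory for `count` instances of one type."""
    spec = T3[instance_type]
    return {
        "price": round(spec["price"] * count, 4),  # round away float noise
        "vcpu": spec["vcpu"] * count,
        "mem_gib": spec["mem_gib"] * count,
    }


if __name__ == "__main__":
    print("2 x t3.small :", fleet_totals("t3.small", 2))
    print("1 x t3.medium:", fleet_totals("t3.medium", 1))
```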

Answer 1:

Well, if you use bigger machines, you get the benefit of running fewer kubelets: instead of one kubelet on each of 3 small instances, you only need 1 on a single larger instance. Every extra instance also needs its own copy of the per-node daemons (CloudWatch agent, SSM Agent) and adds human or automation effort for maintenance (security, OS upgrades, patching, etc.).
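The per-node overhead point can be sketched with a little arithmetic. Every node pays a roughly fixed memory cost for the kubelet, system reservations and per-node agents, so splitting the same total RAM across more nodes leaves less for your pods. The 0.5 GiB overhead below is a hypothetical placeholder, not an EKS figure; on a real cluster, `kubectl describe node` shows each node's Capacity and Allocatable, and the difference is what you are paying for but cannot schedule onto.

```python
# HYPOTHETICAL per-node overhead (kubelet/system reservation plus per-node agents
# such as the CloudWatch agent or SSM Agent). Measure your own value.
PER_NODE_OVERHEAD_GIB = 0.5


def usable_memory_gib(mem_per_node_gib: float, nodes: int,
                      overhead_gib: float = PER_NODE_OVERHEAD_GIB) -> float:
    """Memory left for application pods once every node pays its fixed overhead."""
    return nodes * (mem_per_node_gib - overhead_gib)


if __name__ == "__main__":
    print("2 x t3.small :", usable_memory_gib(2, 2), "GiB")
    print("1 x t3.medium:", usable_memory_gib(4, 1), "GiB")
```

With this assumed overhead, the two t3.smalls leave 3.0 GiB for pods while the single t3.medium leaves 3.5 GiB, even though both fleets have 4 GiB in total.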

I would also recommend moving to t4g.small instances (they are Arm-based AWS Graviton2 instances, so your container images must support arm64), as they offer the following benefits:

Better CPU performance
Lower energy consumption
Lower cost

If you'd like to know more about the benefits of upgrading to a newer generation, you can read this article.

If you move to t4g.medium instead, it costs roughly $0.013 per hour more than a t3.small but roughly $0.012 per hour less than a t3.medium (exact rates vary by region). In exchange, you get better CPU performance and twice the RAM of a t3.small and, compared with running two t3.smalls, fewer kubelets too!

We should also consider the blast radius of failures! Having one big instance is cheaper in terms of resources (hardware and manpower), but a single failure takes out a bigger share of your capacity. It's a known trade-off: smaller nodes offer more resiliency against failures but require more manpower. If you ask me what works best, it really depends on the context of each project, and only you can tell which matters more for your team's goals: cheaper compute or higher fault tolerance.
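The blast-radius trade-off can be put in numbers with a simple model: assuming equal-sized nodes and workloads spread evenly across them, losing one node out of N removes 1/N of the cluster's capacity. The function name and the fleets below are illustrative only (the fleets all cost the same per the question's table).

```python
def capacity_lost_on_one_failure(nodes: int) -> float:
    """Fraction of total capacity lost when one of `nodes` equal-sized nodes fails.

    Assumes workloads are spread evenly across nodes (a simplification).
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return 1 / nodes


if __name__ == "__main__":
    # Same hourly price per the question's table: 1 x medium = 2 x small = 4 x micro.
    for label, nodes in [("1 x t3.medium", 1), ("2 x t3.small", 2), ("4 x t3.micro", 4)]:
        print(f"{label}: lose {capacity_lost_on_one_failure(nodes):.0%} of capacity")
```

With a single node, one failure means losing everything until a replacement joins the cluster; with two nodes, the survivor still carries half the capacity, provided it has room for the rescheduled pods.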

Answer 2:

There could be various reasons for this.

Two instances mean double the maintenance
Cost is a concern, but it's not the ONLY concern
Many people run small "t" type instances as Spot Instances, so at a given moment t3.medium Spot capacity might be available while t3.small is not
What if you don't need more CPU and only need more RAM?

Much of the reasoning above applies if, say, you have to run a Docker container or an application that you can't split across 2 instances.
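The "can't split it into 2" case comes down to the fact that Kubernetes places each pod on a single node, so what matters is how much memory one node can offer, not the cluster total. A minimal sketch, assuming a hypothetical container that needs 3 GiB and the same hypothetical 0.5 GiB per-node reserve as a stand-in for system overhead: two t3.smalls have 4 GiB between them, yet neither can host it, while one t3.medium can.

```python
# Memory per node from the question's table (GiB).
NODE_MEM_GIB = {"t3.small": 2, "t3.medium": 4}


def fits_on_node(request_gib: float, node_type: str, overhead_gib: float = 0.5) -> bool:
    """True if one unsplittable workload fits on a single node of `node_type`.

    A pod cannot span nodes, so compare against one node's memory, not the
    cluster total. `overhead_gib` is a HYPOTHETICAL per-node reserve.
    """
    return request_gib <= NODE_MEM_GIB[node_type] - overhead_gib


if __name__ == "__main__":
    request = 3.0  # hypothetical container needing 3 GiB that cannot be split
    for node_type in NODE_MEM_GIB:
        print(node_type, "fits" if fits_on_node(request, node_type) else "does not fit")
```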