Perfect Spot Instance's Imperfections | part-I

In this blog I am going to share my opinion on spot instances and why we should go for them. While I was going through the purchasing options (on-demand, reserved, and spot) that AWS provides for launching our instances, I found spot instances very fascinating and a little challenging.

What I found about spot instances is that they are normal EC2 instances. But what makes them different from the other two (on-demand and reserved)? What strategy does AWS use to make spot instances cheaper than the other two, and why? Let’s look at these first.

With AWS continuously adding regions and Availability Zones within those regions, it is left with a huge amount of unused capacity. How does AWS take advantage of this unused capacity? AWS floats its spare capacity on a market at a very low base price and allows us to bid on instances. The highest bidder is provided with the instance, but the price that person pays is only the market price. For example, if the market price for a t2.micro instance is $1 and you place a bid of $2, you will get that instance but pay only the market price of $1. Interesting? Let’s make things even more fascinating by comparing the prices of all three.
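The bidding rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post; the function name spot_price_paid and the single fixed market price are my own simplifications, not an AWS API.

```python
from typing import Optional


def spot_price_paid(bid: float, market_price: float) -> Optional[float]:
    """Return the hourly price paid, or None if the bid does not win capacity.

    Hypothetical helper: a bid below the market price gets no instance;
    a winning bid pays the market price, not the bid itself.
    """
    if bid < market_price:
        return None
    return market_price


# The example from the text: market price $1, bid $2.
print(spot_price_paid(bid=2.0, market_price=1.0))  # 1.0
# A bid below the market price does not get an instance.
print(spot_price_paid(bid=0.5, market_price=1.0))  # None
```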

Discount	Type	Details
0%	On-demand Instances	No commitment from your side. You pay the most. Fixed price per hour.
40%-60%	Reserved Instances	1-year or 3-year commitment from your side. You save money in return for that commitment. Cost depends on the plan.
60%-90%	Spot Instances	No commitment from AWS's side. Ridiculously inexpensive. Cost depends on availability.

With this information you must be thinking of trying out spot instances at least once. Since every interesting thing comes with a price, spot instances too have a downside: “AWS can take back spot instances from you anytime”. Upset? Don’t be, because this blog was written with the sole purpose of overcoming that downside. After all, you won’t mind spending 5-10 minutes to save some dollars.

Let’s start then…

Now you know that AWS is ready to give us its huge spare capacity at prices of our choice, but with a promise to take the capacity back whenever it wants, giving us a warning two minutes before the interruption. We can manage the interruption wisely, and the proof is that some organizations are already taking full advantage of spot instances.

Before we go to the core concept, let’s build up a few required concepts that will help us understand it with ease.

Spot Instance v/s Spot Fleet

With a normal spot instance request, you place a bid for a specific instance type in any Availability Zone or in a specific one, and hope you get it.

With spot fleets, you can request a number of different instance types that meet your requirements. Additionally, you can spread your spot fleet bid across multiple Availability Zones to increase the likelihood of getting your capacity fulfilled.

Interruption Notice

When AWS takes our spot instance back, it provides an interruption notice 2 minutes beforehand so that we can perform some actions.
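To make the notice a little more concrete, here is a minimal sketch of what it looks like when it arrives as an event. The "source" and "detail-type" values are the ones AWS uses for spot interruption warnings; the sample event (account number, region, instance ID) is made up for illustration, and interrupted_instance_id is a hypothetical helper of my own.

```python
import json

# Event pattern a rule could use to match spot interruption warnings.
INTERRUPTION_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def interrupted_instance_id(event):
    """Return the instance ID if the event is an interruption warning, else None."""
    if event.get("source") not in INTERRUPTION_PATTERN["source"]:
        return None
    if event.get("detail-type") not in INTERRUPTION_PATTERN["detail-type"]:
        return None
    return event.get("detail", {}).get("instance-id")


# Made-up sample event; all identifiers are placeholders.
sample_event = {
    "version": "0",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "source": "aws.ec2",
    "account": "123456789012",
    "region": "us-east-2",
    "resources": [
        "arn:aws:ec2:us-east-2:123456789012:instance/i-1234567890abcdef0"
    ],
    "detail": {
        "instance-id": "i-1234567890abcdef0",
        "instance-action": "terminate",
    },
}

print(json.dumps(INTERRUPTION_PATTERN, indent=2))
print(interrupted_instance_id(sample_event))  # i-1234567890abcdef0
```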

Next, it is also necessary to know about CloudWatch Rules. AWS provides event types on the basis of which you can perform actions like triggering a Lambda function, sending a notification over email or SMS, etc.

One of the event types that you can monitor is an EC2 instance's state change to running.
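As a rough sketch, a rule for this event type can be described by the pattern below. The "detail-type" string and the "state" field are the ones EC2 emits for state changes; the is_running_event helper and the sample event are assumptions of mine, and the check is a simplified stand-in for the matching that CloudWatch Events (now Amazon EventBridge) does for you.

```python
import json

# Event pattern matching EC2 instances that have entered the "running" state.
RUNNING_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}


def is_running_event(event):
    """Simplified local check that mirrors what the rule would match."""
    return (
        event.get("source") in RUNNING_PATTERN["source"]
        and event.get("detail-type") in RUNNING_PATTERN["detail-type"]
        and event.get("detail", {}).get("state")
        in RUNNING_PATTERN["detail"]["state"]
    )


# Made-up sample event; the instance ID is a placeholder.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-1234567890abcdef0", "state": "running"},
}

print(json.dumps(RUNNING_PATTERN, indent=2))
print(is_running_event(sample_event))  # True
```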

Now with this much knowledge we are good to go. I will show you how you can automate the handling of interruptions to avoid the risk of downtime.

This is the main diagram showing all the components that are used to automate the handling of an interruption. Now suppose one of the spot instances is about to be interrupted and AWS is going to take it back. Let’s see what happens then.

When AWS is about to take a spot instance back, it issues an interruption notice. A CloudWatch rule is in place to monitor for the interruption notice, and then:

A Lambda function is triggered. And then:

The Lambda function increases the desired capacity of the Auto Scaling group to 1, due to which an on-demand instance gets launched into the Target Group while the interrupted spot instance gets terminated.
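A minimal sketch of such a Lambda function is shown below. It is not the code from the repo; the environment variable ASG_NAME and its default value are hypothetical, and the optional autoscaling argument exists only so the handler can be tried locally with a stand-in client. set_desired_capacity is the boto3 Auto Scaling call for changing the desired capacity.

```python
import os

# Hypothetical configuration: the name of the on-demand Auto Scaling group.
ASG_NAME = os.environ.get("ASG_NAME", "on-demand-fallback-asg")


def handler(event, context, autoscaling=None):
    """Triggered by the spot interruption notice; launches one on-demand instance."""
    if autoscaling is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        autoscaling = boto3.client("autoscaling")

    instance_id = event.get("detail", {}).get("instance-id")

    # Raise the desired capacity from 0 to 1 so the ASG launches a replacement.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=1,
        HonorCooldown=False,
    )
    return {"interrupted_instance": instance_id, "desired_capacity": 1}
```

Passing the client in as an argument is just a convenience for testing; inside Lambda the handler is called with only event and context, so the real boto3 client is used.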

Now, when the on-demand instance is launched and its state changes to running,

Another CloudWatch rule picks up that change, and

CloudWatch triggers another Lambda function, and then:

The Lambda function modifies the target capacity of the spot fleet request to 2, which was previously 1. This launches a new spot instance in the same Target Group.
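Here is a hedged sketch of what this second Lambda function could look like. The SPOT_FLEET_REQUEST_ID variable and its default are placeholders, and the is_spot check is my own addition: since a "running" event fires for every instance, the function has to ignore events caused by spot instances. As a simplification, it does not verify that the on-demand instance belongs to our Auto Scaling group. describe_instances and modify_spot_fleet_request are existing boto3 EC2 calls.

```python
import os

# Hypothetical configuration: ID of the spot fleet request to resize.
FLEET_ID = os.environ.get("SPOT_FLEET_REQUEST_ID", "sfr-EXAMPLE")


def is_spot(ec2, instance_id):
    """Spot instances report InstanceLifecycle == 'spot'; on-demand ones omit it."""
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    instance = resp["Reservations"][0]["Instances"][0]
    return instance.get("InstanceLifecycle") == "spot"


def handler(event, context, ec2=None):
    """Triggered when an instance enters 'running'; tops the spot fleet back up."""
    if ec2 is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        ec2 = boto3.client("ec2")

    instance_id = event["detail"]["instance-id"]
    if is_spot(ec2, instance_id):
        # A spot instance started; that case belongs to the next step.
        return {"action": "ignored", "instance": instance_id}

    # The on-demand fallback is up, so ask the fleet for 2 spot instances again.
    ec2.modify_spot_fleet_request(SpotFleetRequestId=FLEET_ID, TargetCapacity=2)
    return {"action": "fleet_capacity_raised", "target_capacity": 2}
```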

One more spot instance is now being launched, and when its state changes to running,

A CloudWatch rule again comes into action on the state change to running of the just-launched spot instance. Then

This again triggers the associated Lambda function, and

The Lambda function sets the desired capacity of the ASG back to 0, due to which the on-demand instance under the Target Group gets terminated. And finally we are left with the following:

We are back where we started, i.e. 2 spot instances are always maintained under the Target Group.
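To see the whole cycle at a glance, here is a tiny pure-Python simulation of the instance counts at each step. It does not call AWS at all; the Pool class and the step names are made up for illustration, and it ignores the short gap between the interruption notice and the on-demand instance actually becoming ready.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Instance counts in the Target Group (demo values from this post)."""

    spot: int = 2
    on_demand: int = 0


def run_cycle(pool):
    """Walk through one interruption and record (step, spot, on_demand)."""
    history = [("start", pool.spot, pool.on_demand)]

    # Interruption notice: ASG desired capacity 0 -> 1, one spot instance is reclaimed.
    pool.on_demand = 1
    pool.spot -= 1
    history.append(("interrupted", pool.spot, pool.on_demand))

    # On-demand instance is running: spot fleet target capacity 1 -> 2.
    pool.spot += 1
    history.append(("spot_replaced", pool.spot, pool.on_demand))

    # New spot instance is running: ASG desired capacity 1 -> 0.
    pool.on_demand = 0
    history.append(("restored", pool.spot, pool.on_demand))
    return history


for step, spot, on_demand in run_cycle(Pool()):
    print(f"{step:14} spot={spot} on_demand={on_demand}")
```

In this simplified model the total number of instances never drops below 2, which is the whole point of bringing up the on-demand instance before the spot replacement arrives.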

Note: For the purpose of demonstration, I have taken two instances initially; however, you can have any number of instances. You can customize this according to your needs and constraints. The whole infra is automated with Terraform, which will create and link everything presented above. The link to clone the repo is provided in the second part of this article.

Are you excited to implement this concept? I am equally excited to share the real implementation with you. With the next part coming very soon, I want you to try the implementation by yourself. In the second part I will help you to implement the whole concept. See you soon…