Issue
I'm looking for a procedure I can use to replace a specific instance in an AWS Auto Scaling group while maintaining AZ "balance" and without reducing capacity while a new instance provisions.
Occasionally, we may have reason to terminate a specific EC2 instance in an Auto Scaling group, and we have struggled to find an efficient procedure for doing this. I know that I can terminate the instance directly and it will be replaced, but that temporarily reduces the group's overall capacity while a new instance provisions. In our case this takes tens of minutes, because we must set up and deploy our software before the ALB can send it requests.
If we increase the desired_capacity by 1, we can prepare a new instance in advance, but there is no guarantee that it will be created in the same AZ as the instance we wish to terminate. In addition, if I terminate the offending instance and immediately reduce the desired_capacity, will the Auto Scaling group terminate another instance?
So what is the best way to manage this procedure?
Solution
You can temporarily suspend and resume specific scaling processes. With this feature you can achieve the desired result in multiple ways, two of which I've described below:
A: Use the Auto Scaling Group's rebalance feature
- Increase the Auto Scaling Group's desired instance count by 1 and wait for the new instance to be available
- Temporarily suspend the Launch scaling process (this prevents an automatic launch of a new instance during the next step)
- Terminate the faulty instance
- Decrease the Auto Scaling Group's desired instance count by 1 (the number of desired instances and the actual number of instances should now be in sync again)
- Resume the Launch scaling process. If the remaining instances are unbalanced, the Auto Scaling Group's AZRebalance process will pick this up and gradually rebalance them across the AZs.
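The steps in option A can be sketched with boto3's Auto Scaling client. This is a minimal sketch under stated assumptions, not a tested tool: it treats an instance reaching the InService lifecycle state as "available" (if your deployment finishes later than that, you would wait on your own readiness signal instead), and it takes the client as a parameter so it can be exercised without AWS access. The names replace_via_rebalance and in_service_count are hypothetical helpers.

```python
import time
from typing import Any, Callable


def in_service_count(asg: Any, group_name: str) -> int:
    """Number of group members whose lifecycle state is InService."""
    resp = asg.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
    group = resp["AutoScalingGroups"][0]
    return sum(1 for inst in group["Instances"] if inst["LifecycleState"] == "InService")


def replace_via_rebalance(
    asg: Any,                 # a boto3 "autoscaling" client (or a test double)
    group_name: str,
    faulty_instance_id: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    resp = asg.describe_auto_scaling_groups(AutoScalingGroupNames=[group_name])
    desired = resp["AutoScalingGroups"][0]["DesiredCapacity"]

    # Step 1: raise desired capacity by 1 and wait for the extra instance.
    # Assumption: "available" means InService; swap in your own readiness
    # check (e.g. ALB target health) if your software deploys after that.
    asg.set_desired_capacity(
        AutoScalingGroupName=group_name, DesiredCapacity=desired + 1, HonorCooldown=False
    )
    for _ in range(max_polls):
        if in_service_count(asg, group_name) >= desired + 1:
            break
        sleep(poll_seconds)
    else:
        # Desired capacity is left at desired + 1 for manual follow-up.
        raise TimeoutError(f"no new InService instance in {group_name}")

    # Step 2: suspend Launch so the termination below is not replaced.
    asg.suspend_processes(AutoScalingGroupName=group_name, ScalingProcesses=["Launch"])
    try:
        # Step 3: terminate the faulty instance, leaving desired capacity alone.
        asg.terminate_instance_in_auto_scaling_group(
            InstanceId=faulty_instance_id, ShouldDecrementDesiredCapacity=False
        )
        # Step 4: drop desired capacity back to its original value.
        asg.set_desired_capacity(
            AutoScalingGroupName=group_name, DesiredCapacity=desired, HonorCooldown=False
        )
    finally:
        # Step 5: always resume Launch, even if a call above failed;
        # AZRebalance can then even out the AZs.
        asg.resume_processes(AutoScalingGroupName=group_name, ScalingProcesses=["Launch"])
```

With AWS credentials configured you would call replace_via_rebalance(boto3.client("autoscaling"), "my-asg", "i-0123456789abcdef0"), where the group name and instance ID are placeholders. Passing ShouldDecrementDesiredCapacity=True to terminate_instance_in_auto_scaling_group would combine the terminate and decrease steps into a single call.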
B: Explicitly start a new instance in the desired AZ:
- Start a separate instance in the desired AZ
- Temporarily suspend the Terminate scaling process (this prevents an automatic termination of the additional instance during the next step)
- Attach the instance from step 1 to the Auto Scaling Group
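- Terminate the original instance and decrement the desired capacity by 1, because attaching an instance raises the desired capacity by 1 (the number of desired instances and the actual number of instances should now be in sync again)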
- Resume the Terminate scaling process
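Option B can be sketched in the same style. The following are assumptions not stated in the answer: the standalone instance is launched from a launch template (launch_template_id is a hypothetical parameter) into a subnet in the target AZ, since run_instances selects the AZ through SubnetId; the sketch only waits for the EC2 running state, which attach_instances requires, so you may want to wait for your own deployment to finish before attaching; and terminate_instance_in_auto_scaling_group is assumed to be accepted while Terminate is suspended. Because attaching raises the group's desired capacity by 1, the termination passes ShouldDecrementDesiredCapacity=True to bring it back down.

```python
from typing import Any


def replace_via_attach(
    ec2: Any,                 # a boto3 "ec2" client (or a test double)
    asg: Any,                 # a boto3 "autoscaling" client (or a test double)
    group_name: str,
    old_instance_id: str,
    launch_template_id: str,  # hypothetical: template matching the group's config
    subnet_id: str,           # a subnet in the AZ of old_instance_id
) -> str:
    # Step 1: start a standalone instance; the subnet pins its AZ.
    resp = ec2.run_instances(
        LaunchTemplate={"LaunchTemplateId": launch_template_id},
        SubnetId=subnet_id,
        MinCount=1,
        MaxCount=1,
    )
    new_id = resp["Instances"][0]["InstanceId"]
    # attach_instances only accepts instances in the running state.
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

    # Step 2: suspend Terminate before changing the group's membership.
    asg.suspend_processes(AutoScalingGroupName=group_name, ScalingProcesses=["Terminate"])
    try:
        # Step 3: attach; this also raises desired capacity by 1.
        asg.attach_instances(InstanceIds=[new_id], AutoScalingGroupName=group_name)
        # Step 4: terminate the original and lower desired capacity by 1,
        # so the desired and actual counts match again.
        asg.terminate_instance_in_auto_scaling_group(
            InstanceId=old_instance_id, ShouldDecrementDesiredCapacity=True
        )
    finally:
        # Step 5: always resume Terminate, even if a call above failed.
        asg.resume_processes(AutoScalingGroupName=group_name, ScalingProcesses=["Terminate"])
    return new_id
```

Note that attach_instances fails if the attached instance would push the desired capacity above the group's maximum size, so a group already at its maximum may need MaxSize raised temporarily.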
Answered By - Dennis Traub