23 June 2022
How we use custom resources to deal with unsupported resources, circular dependency and more.
CloudFormation (Cfn) is a commonly used IaC tool for deploying resources into AWS. It has been around for a long time and is a popular choice among engineers for setting up AWS infrastructure. However, Cfn has certain limitations, listed below.
This is not a complete list; it covers only the limitations relevant to this article.
1. Not all AWS resources are covered by Cfn. A few AWS services are not supported at all (e.g. AWS S3 batch), and sometimes only part of a service is missing (e.g. tagging of an AWS event rule). In such situations, you use the AWS APIs to fill the gap.
2. There are scenarios where you end up in a chicken-and-egg situation, i.e. a circular dependency in Cfn terms. A very common example is setting up an S3 bucket with a Lambda function as the target for event notifications. Such cases can be handled with a two-step deployment or by calling APIs outside of the Cfn stack. However, neither is desirable, because both require human intervention.
3. Cfn can only provision AWS resources. If your application depends on an external non-AWS service, you cannot make it part of the Cfn lifecycle. In such cases, you might use Terraform, which supports multiple clouds.
Cfn custom resources address all of these limitations. AWS has also released a newer feature named “CloudFormation Registry”, which is the successor of custom resources and has more advanced capabilities. In this blog, we will focus only on custom resources. In the next section we take a deeper look at what they are and how we can use them to extend CloudFormation beyond its regular offerings.
what are Cfn custom resources?
Custom resources allow you to execute custom scripts or logic within the Cfn lifecycle. When you create, update or delete a Cfn stack, the custom script runs as part of the template deployment. This way you can embed any programmable functionality into your stack, such as calling an AWS or non-AWS API, seeding a DynamoDB table, or uploading a file to an S3 bucket. There are certain rules to remember while developing a custom resource. It is very important to adhere to them; otherwise your stack operation can get stuck for hours.
how to get started?
There are 3 simple steps for using a custom resource:
- Write an AWS Lambda function with the logic you want to execute during Cfn stack deployment.
- Deploy the Lambda function separately or make it part of the stack. The latter is advisable, as all the IaC code can then reside in the same stack and follow the same lifecycle.
- Invoke the Lambda function by referring to it in the template with the resource type “AWS::CloudFormation::CustomResource”. Here is a complete example:
Resources:
  MyCustomResource:
    Type: 'AWS::CloudFormation::CustomResource'
    Properties:
      ServiceToken: arn:aws:lambda:us-east-1:12345:function:fnName
      key_1: value_1
“ServiceToken” is the most important part: this is how you refer to the Lambda function. You can also pass contextual data to the Lambda function as a JSON payload ({"key_1": "value_1"} in this example).
Note: You can also use an SNS topic instead of a Lambda function. To keep it simple, we will only use a Lambda function in the rest of this blog.
under the hood
Before implementing one, let’s see how a custom resource works under the hood. This lays the foundation before we dive into the programming model.
The custom resource Lambda function acts as a webhook for CloudFormation’s lifecycle events, i.e. create, update and delete. When you deploy the stack for the first time, Cfn triggers the Lambda function with the “Create” event, and all your custom logic runs inside the function. The function is also expected to write its result to an AWS-managed S3 bucket in a specific format that Cfn understands. While the Lambda function runs, Cfn keeps polling the S3 bucket for the result. Once it receives the result, the stack deployment continues with the subsequent steps.
programming the custom resource
Now comes the tricky part: coding the Lambda function. This is where most first-time users of custom resources make mistakes. The custom resource Lambda function follows a strict programming model, and we need to stick to certain guidelines. Below are a few points to keep in mind:
1) When Cfn triggers the Lambda function, the function receives a payload in this format:
{
  "RequestType": "Create",
  "ResponseURL": "http://pre-signed-S3-url-for-response",
  "StackId": "arn:aws:cloudformation:region:1:stack/stack-name/guid",
  "RequestId": "unique id for this create request",
  "ResourceType": "Custom::MyCustomResource",
  "LogicalResourceId": "MyCustomResource",
  "ResourceProperties": {
    "key_1": "value_1"
  }
}
The important attributes are:
RequestType: This is either Create, Update, or Delete. When you deploy the Cfn stack for the first time, the Lambda function receives the RequestType “Create”; when you update the stack it receives “Update”, and when you delete the stack it receives “Delete”.
ResponseURL: This is a pre-signed S3 URL to which the Lambda function writes its output, i.e. SUCCESS or FAILED, along with some other metadata for Cfn.
ResourceProperties — This is the payload you define in the Cfn stack’s custom resource
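As a minimal sketch of how a function might branch on RequestType, the snippet below routes an incoming event to one of three handlers. The handler names (on_create, on_update, on_delete) and their return values are hypothetical placeholders, not part of any AWS API; only the event keys come from the payload shown above.

```python
from typing import Any, Callable, Dict, Tuple

Event = Dict[str, Any]
Props = Dict[str, Any]


# Hypothetical placeholders: real handlers would create, change or
# remove whatever the custom resource manages.
def on_create(props: Props) -> Tuple[str, Props]:
    return ("created", props)


def on_update(props: Props) -> Tuple[str, Props]:
    return ("updated", props)


def on_delete(props: Props) -> Tuple[str, Props]:
    return ("deleted", props)


# Map each lifecycle event name sent by Cfn to its handler.
HANDLERS: Dict[str, Callable[[Props], Tuple[str, Props]]] = {
    "Create": on_create,
    "Update": on_update,
    "Delete": on_delete,
}


def dispatch(event: Event) -> Tuple[str, Props]:
    """Pick the handler matching the event's RequestType and run it."""
    request_type = event["RequestType"]
    handler = HANDLERS.get(request_type)
    if handler is None:
        raise ValueError(f"Unsupported RequestType: {request_type!r}")
    # ResourceProperties carries the values defined in the template.
    return handler(event.get("ResourceProperties", {}))
```

Keeping the routing in a lookup table makes it obvious that all three lifecycle events are covered, including “Delete”, which is easy to forget.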
2) After the logic in your Lambda function has run, the function should write its output to the pre-signed S3 URL with this payload:
{
  "Status": "SUCCESS",
  "RequestId": "unique id for this create request",
  "LogicalResourceId": "MyCustomResource",
  "StackId": "arn:aws:cloudformation:region:1:stack/stack-name/guid",
  "PhysicalResourceId": "MyCustomResourcePhysicalId",
  "Data": {
    "output_key_1": "output_value_1"
  }
}
The two most important attributes are:
Status: SUCCESS or FAILED. This tells Cfn whether your custom resource was successfully created, updated or deleted. It also decides the overall status of the Cfn stack. If the Lambda function sends a failure signal, the entire stack rolls back, which is the default behaviour of Cfn.
Data: With this key, you can send useful information back to Cfn and use it in other parts of the stack through the Fn::GetAtt intrinsic function.
3) The key difference between a regular Lambda function and a “custom resource” Lambda function is that the latter must send a response to the pre-signed URL before it terminates (and may optionally return an output as well).
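The callback described in the points above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the official helper: build_response and send_response are hypothetical names, and the empty Content-Type header mirrors what AWS's own cfn-response module does so that the request stays compatible with the pre-signed URL.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_response(
    event: Dict[str, Any],
    status: str,
    physical_id: str,
    data: Optional[Dict[str, Any]] = None,
    reason: str = "",
) -> Dict[str, Any]:
    """Assemble the response body Cfn expects (hypothetical helper)."""
    return {
        "Status": status,  # must be "SUCCESS" or "FAILED"
        "Reason": reason,  # Cfn requires a reason when Status is FAILED
        "PhysicalResourceId": physical_id,
        # These three values are echoed back from the incoming event.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(response_url: str, body: Dict[str, Any]) -> int:
    """PUT the JSON body to the pre-signed URL and return the HTTP status."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        response_url,
        data=payload,
        method="PUT",
        # An explicit empty content type stops urllib from adding its
        # default form-encoded header.
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

In a handler you would call send_response(event["ResponseURL"], build_response(event, "SUCCESS", "MyCustomResourcePhysicalId", {"output_key_1": "output_value_1"})) as the last step, whatever the return value of the function itself is.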
what happens if you forget to send the response to the signed url?
This is when things can get nasty. A custom resource follows an asynchronous programming model, i.e. Cfn expects a callback from the Lambda function. It does not rely on what the function returns in the invocation result; instead, it expects the function to make an HTTP request to the pre-signed URL and deliver the result there. If you forget to code this in the function, the Cfn stack will be stuck for hours until it times out.
can you delete the stack instead of waiting on a stuck stack?
The answer is no. Cfn will trigger the Lambda function again with the “Delete” event and wait for a callback. Because the function is not programmed to send one, the stack will be stuck in the “DELETE_IN_PROGRESS” state, which again can last for hours.
what if lambda errors out?
The answer is the same, i.e. the stack gets stuck. Let’s say you have implemented the callback logic, but the Lambda function failed with an uncaught exception or timed out before the callback logic could run. In this scenario too, Cfn keeps waiting for a callback from the function that never arrives.
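One way to guard against uncaught exceptions is to wrap the custom logic so that a response is always sent. The decorator below is a minimal sketch: always_respond and the send callable it takes are hypothetical names, and the sender is injected so the wrapper can be exercised without a network call. It does not cover Lambda timeouts, which need a separate timer.

```python
import functools
from typing import Any, Callable, Dict

Event = Dict[str, Any]
# Hypothetical sender signature: (event, status, reason) -> None.
Sender = Callable[[Event, str, str], None]
Logic = Callable[[Event, Any], None]


def always_respond(send: Sender) -> Callable[[Logic], Logic]:
    """Wrap custom logic so that a callback is sent on success or failure."""

    def decorator(logic: Logic) -> Logic:
        @functools.wraps(logic)
        def handler(event: Event, context: Any) -> None:
            try:
                logic(event, context)
            except Exception as exc:
                # Report the failure instead of leaving Cfn waiting.
                send(event, "FAILED", str(exc))
            else:
                send(event, "SUCCESS", "")

        return handler

    return decorator
```

In a real function, send would post the result to event["ResponseURL"]; in a unit test it can simply record the calls it receives.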
is there an easy way?
The good news is yes. The internal mechanics do not change, and all the principles above still apply. However, instead of doing all the heavy lifting yourself, you can use a helper library. The helper library makes sure the function calls back with an appropriate response and handles uncaught exceptions and edge cases. This leaves you with only the custom logic to handle, as in any other Lambda function. There are quite a few open-source utility libraries for most of the programming languages Lambda supports. If you are using Python, crhelper is a great choice. It is very simple to get started with and removes the complexity we discussed above. Additionally, it supports long-running processes that exceed Lambda’s 15-minute runtime limit by enabling a polling mechanism.
Here is an example code skeleton in which you only need to fill in the placeholders under the decorated functions.
from crhelper import CfnResource
import logging

logger = logging.getLogger(__name__)
helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL', sleep_on_delete=120, ssl_verify=None)

try:
    ## Init code goes here
    pass
except Exception as e:
    helper.init_failure(e)


@helper.create
def create(event, context):
    # Handle create event
    helper.Data.update({"test": "testdata"})
    return "MyCustomResourceId"


@helper.update
def update(event, context):
    # Handle update event
    pass


@helper.delete
def delete(event, context):
    # Handle delete event
    pass


def handler(event, context):
    helper(event, context)
conclusion
We covered several important mechanics of custom resources and built the foundation needed to get started with them. Most importantly, we covered the “gotchas”: use custom resources properly to avoid undesired behaviour. Custom resources are a great way to fill the gaps CloudFormation has at the moment. They empower you to do almost anything with Cfn.