Security September: Cataclysms in the Cloud Formations

22 September 2020

This is the fourth of a 5-part series on AWS exploits and similar findings discovered over the course of 2020. All findings discussed in this series have been disclosed to the AWS security team and had patches rolled out to all affected regions, where necessary. A big thanks to my friend and fellow Australian Aidan Steele for co-authoring this series with me. This post was written by Aidan.

Back in November 2019, AWS CloudFormation added support for “resource providers” - a new and improved way of extending CloudFormation with user-authored resources. Not just that, it is also the way that all new resources are added to CloudFormation by AWS themselves. This was a substantial improvement over “custom resources”, which hadn’t seen any change since at least 2013 other than Lambda support in 2015. This post won’t get into the differences or benefits of resource providers; for that, check out this AWS blog post.

On January 17th, Ben Kehoe, CloudFormation extraordinaire, sent me a Twitter DM with a link to a GitHub pull request that set off his spidey-senses. I was excited to receive this because a) Ben thought of me! and b) it combined two of my favourite things, CloudFormation and credentials in the cloud, in a novel way. This was the excuse I needed to finally dig into resource providers - albeit not until the 21st, when I had some time.

Pulling things apart

The first thing I noticed is that (unlike custom resources) the Lambda function you are authoring ends up not running in your AWS account - it runs in an AWS-managed account. I was immediately curious about what permissions my code had in their account - you’d hope it’s locked down! I was also curious about how the code I wrote appeared to have a role in my account - where was the role being assumed?

Rather than try to understand the framework provided by CloudFormation and how things are meant to work, I decided to drop down a level and log the raw input to the Lambda function, to see how things actually work. This is an excerpt of the input passed to the Lambda when a resource is being created:

So there are at least four different sets of credentials in play here:

The callerCredentials: These are the credentials that I am expected to use. They are for the role assumed in my account to actually create, update, delete, etc the resource in question.
The providerCredentials: These are the credentials for the LogAndMetricsDeliveryRole role in the CloudFormationManagedUploadInfrastructure stack, i.e. for writing logs that appear in my account.
The (not pictured) Lambda execution role’s credentials: this appears to have permission only to write logs within the AWS-managed account.
The platformCredentials: These are the interesting ones. They are for a role in the AWS-managed account. That role has permission to call a handful of APIs, but the ones that caught my eye were events:PutRule, events:PutTargets, events:RemoveTargets and events:DeleteRule.
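
To make the "log the raw input" step concrete, here is a minimal sketch of a handler that dumps the event and lists which credential sets it contains. The field names come from the list above; where they sit inside the event is not shown in this post, so the code searches the whole structure rather than assuming a path. The names `find_credential_paths` and `handler` are mine, and the return value is only for inspection, not a valid resource provider response.

```python
import json


def find_credential_paths(node, path=""):
    """Return the dotted path of every key whose name ends in 'Credentials'."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key.endswith("Credentials"):
                found.append(child)
            found.extend(find_credential_paths(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_credential_paths(value, f"{path}[{index}]"))
    return found


def handler(event, context):
    # Dump the raw input. Note that this writes live credentials to the logs,
    # which is acceptable for a throwaway experiment and nowhere else.
    print(json.dumps(event, default=str))
    paths = find_credential_paths(event)
    print("credential sets present:", paths)
    return paths
```

Running it against a made-up event with the three named fields nested under a hypothetical `requestData` key reports three paths; the Lambda execution role's credentials would not show up here because they arrive via the function's environment, not its input.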

Why does the Lambda function need permission to invoke those EventBridge APIs? Resource creation might take a long time and Lambda is limited to 15 minutes, so the service has support for reinvoking the Lambda periodically to ask “is it done yet?” and EventBridge cron jobs are a great fit for this. The way it works is by creating a one-off rule scheduled to run one minute in the future, with a target of the same Lambda. It specifies that the Lambda should be reinvoked with a hardcoded input string - namely the input that this invocation of the Lambda received.
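
The reinvocation mechanism can be sketched roughly as follows. This is my reconstruction of the idea, not the framework's actual code: it only builds the request parameters that would be passed to the EventBridge PutRule and PutTargets APIs (for example via boto3's `put_rule` and `put_targets`), and the rule name, target ID and function names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def one_off_cron(now):
    """EventBridge cron expression that fires once, one minute after `now` (UTC)."""
    when = now + timedelta(minutes=1)
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron({when.minute} {when.hour} {when.day} {when.month} ? {when.year})"


def build_reinvoke_requests(event, function_arn, now, rule_name="reinvoke-example"):
    """Build PutRule and PutTargets parameters that replay `event` to the same Lambda."""
    put_rule = {
        "Name": rule_name,
        "ScheduleExpression": one_off_cron(now),
        "State": "ENABLED",
    }
    put_targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "self",  # hypothetical target ID
                "Arn": function_arn,
                # The whole input of this invocation is stored as a constant string.
                "Input": json.dumps(event),
            }
        ],
    }
    return put_rule, put_targets
```

The detail worth noticing is the `Input` field: whatever the Lambda received, credentials included, is serialised verbatim into the PutTargets request.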

Curiosity intensifies

This piqued my curiosity. What if I used the API to trigger a different target, like a Lambda in my account? No luck, they must have used the events:TargetArn condition key to restrict the target to only this Lambda. So I tried something else: what if I created a different kind of rule? Instead of a schedule, I created a rule that matched events with {"detail-type": ["AWS API Call via CloudTrail"]} and invoked my Lambda in their account. I was immediately inundated with a firehose of events. I rushed to turn it off: I was curious, and had no intention of breaking anything.
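
To illustrate why that one pattern produced a firehose, here is a deliberately simplified matcher. It handles only exact-value matching on event fields, which is a small subset of what EventBridge supports, and the `matches` function and sample events are my own illustrations. In the real service the pattern would be passed as a JSON string in the EventPattern parameter of PutRule.

```python
def matches(pattern, event):
    """Simplified event-pattern check: every key in the pattern must be present
    in the event, and the event's value must be one of the listed values."""
    for key, allowed in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(allowed, dict):
            # Nested pattern: recurse into the corresponding nested event object.
            if not isinstance(value, dict) or not matches(allowed, value):
                return False
        elif value not in allowed:
            return False
    return True


PATTERN = {"detail-type": ["AWS API Call via CloudTrail"]}

api_call = {
    "source": "aws.events",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventName": "PutTargets"},
}
scheduled = {"source": "aws.events", "detail-type": "Scheduled Event"}

print(matches(PATTERN, api_call))
print(matches(PATTERN, scheduled))
```

Because the pattern constrains nothing but the detail-type, every API call delivered to EventBridge via CloudTrail in that account and region matches, regardless of which service or caller produced it.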

I looked at the events. Was this really a security issue or more a “that’s cute” kind of thing? Most events were uninteresting. But one caught my eye. It was a call to events:PutTargets. The CloudTrail event included the payload. I’ve highlighted the noteworthy part.

In order to reinvoke the function a minute later with the same payload, we established earlier that the target was created with a constant JSON input. That input includes credentials for a role in the customer’s account. Multiple (all?) customers are serviced by the AWS-managed account. So I could see credentials for other AWS customers using resource providers. They might be using it without realising: AWS are using this pattern for first-party types - I saw AWS::WAFv2::RuleGroup fly by in the logs.

Responsible disclosure and rapid remediation

That afternoon I reported the issue to the AWS security team. They got back to me within 9 hours. They asked for some clarification (fair, I was a bit frazzled when I wrote the first email!) and any reproduction steps I could provide. Given I’d slapped together a very manual proof-of-concept, I set about creating a more useful reproduction for them.

The CloudFormation service team also reached out to me. They indicated that they had actually already caught this internally and a fix was underway. A few days later, on January 25th, they told me that a fix had been deployed to a couple of regions and asked if I could try again. A few minutes later, I got a follow-up from the AWS security team with the same request - things move surprisingly quickly at AWS!

I tried again. After creating my rule I got a few events, but they quickly dried up without me even needing to turn the rule off. I looked through the events recorded. I didn’t see any of the problematic ones from before, no matter how many times I tried. I did see this interesting one though:

Clever! They had created an alarm that would trigger whenever a malicious rule was created. This was a solid tactical mitigation. Unfortunately the IAM policy conditions available on events:PutRule weren’t rich enough to block me outright, but this stopped the bleeding. I also noticed the unusual absence of any events:PutTargets API call events. I don’t know how they did this, as omitting those from CloudTrail/EventBridge isn’t possible in customers’ accounts. I guess AWS have some secret sauce.

Closing out

On January 31st I got a follow-up email from AWS Security letting me know that the issue had been resolved.

On June 30th the CloudFormation team released version 2 of the resource provider protocol. It came with a number of benefits, but the change most visible to me was that the platformCredentials field had disappeared. It is no longer necessary as CloudFormation itself handles the reinvocation of the Lambda function.

Finally, on August 26th AWS sent out the following email, gently encouraging people to migrate to the new protocol for resource providers. I wouldn’t be surprised if this was followed by an eventual deprecation of v1.

Thanks

I’d like to thank Ben Kehoe for pointing this out to me in the first place. Without his keen eye and intuition for sensible security, I would have never noticed this. I’d like to thank Ian for being a great sounding board throughout the process. His tweets last year about Lake Formation issues made me feel confident that AWS would respond positively to my email. I’d also like to thank the AWS Security team who made me feel very comfortable both reporting my first issue and that it was being addressed in a thorough and impressively timely fashion. And a second thanks to Ian for motivating me to write my first blog posts in years!