13 January 2024
AWS re:Invent 2023 is now behind us and one of my favourite announcements was the introduction of HTTPS Endpoints to AWS Step Functions. In this post, I explain the feature, test its limits and also show off some other tricks for data manipulation within your state machines.
For the impatient, here is the final result.
HTTPS endpoints rely on Amazon EventBridge API destination connections to determine the authentication mechanism. EventBridge in turn uses Secrets Manager to store the credentials that are included to authenticate requests.
Then within the state machine, you reference this connection and specify your own URL and HTTP method. You can also optionally include your own query parameters, headers and/or request body.
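As a rough sketch of what such a Task state could look like in Amazon States Language: the state name, URL, connection ARN, query parameter and header below are placeholders of mine, not values from the state machine in this post.

```json
{
  "Comment": "Sketch only: state name, URL, connection ARN, query and header values are placeholders",
  "StartAt": "CallEndpoint",
  "States": {
    "CallEndpoint": {
      "Type": "Task",
      "Resource": "arn:aws:states:::http:invoke",
      "Parameters": {
        "ApiEndpoint": "https://api.example.com/items",
        "Method": "GET",
        "Authentication": {
          "ConnectionArn": "arn:aws:events:us-east-1:123456789012:connection/example-connection/00000000-0000-0000-0000-000000000000"
        },
        "QueryParameters": {
          "limit": "10"
        },
        "Headers": {
          "Accept": "application/json"
        }
      },
      "End": true
    }
  }
}
```

For methods that carry a payload, a RequestBody object would sit alongside Headers and QueryParameters inside Parameters.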
There are some limitations though. Firstly, there is a hard 60-second timeout for the request as a whole. Secondly, Step Functions sets some mandatory headers which you cannot override. These are:
- User-Agent (set to Amazon|StepFunctions|HttpInvoke|us-east-1, where us-east-1 is replaced by your region)
- Range (set to bytes=0-262144)
Note that the request will still fail if the response exceeds 256 KB even though the Range header is set. The presence of the header can also cause confusion, as some servers will respond with a 206 Partial Content status code even if all data is returned, so be aware of that.
The client IP address is different for each request and appears to lie within the standard EC2 public IP ranges published by AWS. There is no capability to use Elastic IPs or other networking constructs within your account.
Your state machine IAM role will need to include actions that allow access to the connection and its associated secret, as well as the states:InvokeHTTPEndpoint action. That action supports the optional condition keys states:HTTPEndpoint and states:HTTPMethod, which help scope down the endpoints and HTTP methods the state machine can call. I have included an example of a granular policy in the CloudFormation template at the end of this post.
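To illustrate the shape of such a policy, here is a minimal sketch. The account ID, region, resource ARNs, endpoint pattern and Sid values are hypothetical, and the events and secretsmanager actions are my assumption of what access to the connection and its secret requires; the policy in the template is the authoritative version.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SketchInvokeScopedEndpoint",
      "Effect": "Allow",
      "Action": "states:InvokeHTTPEndpoint",
      "Resource": "arn:aws:states:us-east-1:123456789012:stateMachine:*",
      "Condition": {
        "StringEquals": {
          "states:HTTPMethod": "GET"
        },
        "StringLike": {
          "states:HTTPEndpoint": "https://api.example.com/*"
        }
      }
    },
    {
      "Sid": "SketchUseConnection",
      "Effect": "Allow",
      "Action": "events:RetrieveConnectionCredentials",
      "Resource": "arn:aws:events:us-east-1:123456789012:connection/example-connection/*"
    },
    {
      "Sid": "SketchReadConnectionSecret",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:events!connection/*"
    }
  ]
}
```

With the two condition keys in place, a state machine using this role could only issue GET requests to URLs under the matching prefix.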
In order to demonstrate the capabilities of the new feature, I’ve chosen to consume the Chess.com API. This is a free and anonymous API which retrieves metadata about games and players on their platform.
I will retrieve a list of all grandmasters, their country of origin, and aggregate these details by country.
Because this is a public endpoint, there is no need for an Authorization or similar header when accessing it. However, EventBridge connections require one of Basic, OAuth or API Key authorization. One creative way of avoiding sending an unnecessary header is to create your connection using the API Key type but set the header name to one of the immutable headers, such as User-Agent.
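A connection using this trick might be declared in CloudFormation roughly as follows. This is a sketch under assumptions: the logical ID and the placeholder key value are hypothetical, and it relies on the behaviour described above, where Step Functions sets User-Agent itself so the stored value should never reach the server.

```json
{
  "ExampleConnection": {
    "Type": "AWS::Events::Connection",
    "Metadata": {
      "Comment": "Sketch only: logical ID and key value are placeholders"
    },
    "Properties": {
      "AuthorizationType": "API_KEY",
      "AuthParameters": {
        "ApiKeyAuthParameters": {
          "ApiKeyName": "User-Agent",
          "ApiKeyValue": "unused-placeholder"
        }
      }
    }
  }
}
```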
I created the step to gather the list of grandmasters by hitting the URL https://api.chess.com/pub/titled/GM. Because I am only interested in the content of the response body, I apply an OutputPath filter of $.ResponseBody. This provides me with the list of grandmaster usernames, but not their origin country or actual name. For that, we need to retrieve their details using additional individual HTTPS calls.
To do this efficiently, we use the Distributed Map type within Step Functions. To ensure we do not overload the Chess.com API, we limit the concurrency to 40. We also use a standard exponential backoff for the inner HTTPS call to allow for retries in the event of an occasional error.
This brings us to a state where we have an array of the individual grandmaster details.
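The Map state described here could be sketched along these lines. Several details are assumptions on my part rather than values from the template: the state names, the players key in the list response, the /pub/player/{username} URL, the connection ARN and the specific retry numbers.

```json
{
  "GetGrandmasterDetails": {
    "Type": "Map",
    "Comment": "Sketch only: names, ItemsPath, player URL, ARN and retry values are assumptions",
    "ItemsPath": "$.players",
    "MaxConcurrency": 40,
    "ItemSelector": {
      "username.$": "$$.Map.Item.Value"
    },
    "ItemProcessor": {
      "ProcessorConfig": {
        "Mode": "DISTRIBUTED",
        "ExecutionType": "STANDARD"
      },
      "StartAt": "GetPlayer",
      "States": {
        "GetPlayer": {
          "Type": "Task",
          "Resource": "arn:aws:states:::http:invoke",
          "Parameters": {
            "ApiEndpoint.$": "States.Format('https://api.chess.com/pub/player/{}', $.username)",
            "Method": "GET",
            "Authentication": {
              "ConnectionArn": "arn:aws:events:us-east-1:123456789012:connection/example-connection/00000000-0000-0000-0000-000000000000"
            }
          },
          "Retry": [
            {
              "ErrorEquals": ["States.ALL"],
              "IntervalSeconds": 2,
              "MaxAttempts": 3,
              "BackoffRate": 2
            }
          ],
          "End": true
        }
      }
    },
    "End": true
  }
}
```

MaxConcurrency caps the number of child executions running at once, while BackoffRate doubles the wait between successive retry attempts.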
Aggregating data (using map-reduce style methods) is not natively supported within a state machine; however, with some creative use of intrinsic functions it is possible.
To do this, we first need to ensure all fields are present in the individual grandmaster details. Unfortunately, the name field isn’t always present on these responses, so to fix that we add the following ResultSelector to the HTTPS endpoint step within the distributed map:
{
"output.$": "States.JsonMerge(States.StringToJson('{\"name\":\"Unknown Player\"}'), $.ResponseBody, false)"
}
This takes the detail from the HTTP response and merges it over a static object that defines a default name. If the response includes a name, it overrides the default; if not, the default of Unknown Player is used.
Next, we format the resulting name in the way we would like it, and extract the 2-letter country code from the country URL, which looks like https://api.chess.com/pub/country/US. To do this, we use a Pass state. The Parameters of the Pass state are as follows:
{
"displayName.$": "States.Format('{} ({})', $.output.name, $.output.username)",
"country.$": "States.ArrayGetItem(States.StringSplit($.output.country, '/'), 4)"
}
Note that the array index used is 4 and not 5. This is because empty segments (such as the one between https:/ and the next /) get discarded during the States.StringSplit operation.
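To make the index concrete, here is a worked example with hypothetical data (only the URL shape comes from the API). It shows the input to the Pass state, the segments that remain after the split once the empty segment is dropped, and the resulting output.

```json
{
  "comment": "Hypothetical data for illustration; the username is made up",
  "passStateInput": {
    "output": {
      "name": "Unknown Player",
      "username": "example_gm",
      "country": "https://api.chess.com/pub/country/US"
    }
  },
  "segmentsAfterStringSplit": ["https:", "api.chess.com", "pub", "country", "US"],
  "passStateOutput": {
    "displayName": "Unknown Player (example_gm)",
    "country": "US"
  }
}
```

Counting from zero, US sits at index 4 of the five remaining segments.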
Using the output of the distributed map, we apply a new Pass state with the following parameters:
{
"original.$": "$",
"countries.$": "States.ArrayUnique($[*].country)",
"countriesCount.$": "States.ArrayLength(States.ArrayUnique($[*].country))",
"iterator": 0,
"output": {}
}
The original key contains the distributed map output, the countries key uses JSONPath and States.ArrayUnique to select the unique list of countries, the countriesCount key is the length of that list, the iterator key is initialised at 0, and the output key is initialised with an empty map.
Then we enter a loop. The loop continues whilst the iterator is less than the length of countries. Inside the loop, we use a Pass state to set the country key to the country at the iterator index of the countries list. We then use one more Pass state to increase the iterator with:
States.MathAdd($.iterator, 1)
We also set the output key to the following (spaced for visibility):
States.JsonMerge(
  States.StringToJson(
    States.Format(
      '\{"{}":{}\}',
      $.country,
      States.JsonToString(
        $.original[?(@.country == $.country)]['displayName']
      )
    )
  ),
  $.output,
  false
)
The above performs the following transformations:
- Selects, using JSONPath, the array of displayName strings from the entries within the original key whose country key is equal to the current country
- Converts that array to a JSON string
- Formats a JSON object string where the key is the country and the value is the above string-encoded array of names
- Parses that string into a JSON object
- Merges the JSON object with the existing output variable
We’re basically adding each country code as a key of the output JSON object one at a time, then increasing the iterator to reference the next country in the list.
Once it has completed the loop, we are left with our final output.
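Put together, the loop could be wired up roughly as below. The state names and the way each Pass state re-emits the unchanged keys are my own assumptions; the intrinsic expressions are transcribed from the steps described earlier, collapsed onto one line and escaped for JSON.

```json
{
  "Comment": "Sketch of the aggregation loop; state names and key pass-through are assumptions",
  "StartAt": "HasMoreCountries",
  "States": {
    "HasMoreCountries": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.iterator",
          "NumericLessThanPath": "$.countriesCount",
          "Next": "SelectCountry"
        }
      ],
      "Default": "Done"
    },
    "SelectCountry": {
      "Type": "Pass",
      "Parameters": {
        "original.$": "$.original",
        "countries.$": "$.countries",
        "countriesCount.$": "$.countriesCount",
        "iterator.$": "$.iterator",
        "output.$": "$.output",
        "country.$": "States.ArrayGetItem($.countries, $.iterator)"
      },
      "Next": "MergeCountry"
    },
    "MergeCountry": {
      "Type": "Pass",
      "Parameters": {
        "original.$": "$.original",
        "countries.$": "$.countries",
        "countriesCount.$": "$.countriesCount",
        "iterator.$": "States.MathAdd($.iterator, 1)",
        "output.$": "States.JsonMerge(States.StringToJson(States.Format('\\{\"{}\":{}\\}', $.country, States.JsonToString($.original[?(@.country == $.country)]['displayName']))), $.output, false)"
      },
      "Next": "HasMoreCountries"
    },
    "Done": {
      "Type": "Pass",
      "OutputPath": "$.output",
      "End": true
    }
  }
}
```

Because a Pass state's Parameters replace its whole input, each state in this sketch copies forward the keys it does not change.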
I have provided a CloudFormation template that contains the full state machine and associated connection here. Feel free to deploy this into your own AWS account and try it yourself.
The HTTPS Endpoints feature is a very useful addition to the Step Functions service that I believe will have huge uptake. I personally want to do more with Step Functions, as I believe many architectures can be more than serverless; they can be “functionless” (i.e. no Lambda functions). I would, however, like to see more useful intrinsics become available in the service. As you can see from this post, developers are often pushing the limits of what is available. Consider this my #awswishlist item.
A big thank you to Aidan Steele for helping review this post. If you liked what I’ve written, or want to hear more on this topic, reach out to me on 𝕏 at @iann0036.