One Cloud Please

HTTPS Endpoints and more tricks with AWS Step Functions

13 January 2024

AWS re:Invent 2023 is now behind us and one of my favourite announcements was the introduction of HTTPS Endpoints to AWS Step Functions. In this post, I explain the feature, test its limits and also show off some other tricks for data manipulation within your state machines.

For the impatient, here is the final result.

HTTPS Endpoints feature

HTTPS endpoints use Amazon EventBridge API destination connections to determine the authentication mechanism. EventBridge in turn stores the connection's credentials in AWS Secrets Manager, and those credentials are added to each request for authentication.

Then within the state machine, you reference this connection and specify your own URL and HTTP method. You can also optionally include your own query parameters, headers and/or request body.
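As a rough sketch, here is how such a task might look in Amazon States Language, built as a Python dict so it can be dumped to JSON. The connection ARN, the example URL and the query parameter, header and body values are hypothetical placeholders. The field names (ApiEndpoint, Method, Authentication.ConnectionArn, QueryParameters, Headers, RequestBody) reflect my understanding of the http:invoke task and are worth checking against the Step Functions documentation.

```python
import json

# Hypothetical connection ARN; substitute the ARN of your own EventBridge connection.
CONNECTION_ARN = "arn:aws:events:us-east-1:123456789012:connection/example/0000"

def http_task(url, method="GET", query=None, headers=None, body=None):
    """Build an HTTP Task state definition (a sketch, not a full state machine)."""
    params = {
        "ApiEndpoint": url,
        "Method": method,
        "Authentication": {"ConnectionArn": CONNECTION_ARN},
    }
    # The remaining fields are optional, so only include them when supplied.
    if query:
        params["QueryParameters"] = query
    if headers:
        params["Headers"] = headers
    if body is not None:
        params["RequestBody"] = body
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::http:invoke",
        "Parameters": params,
        "End": True,
    }

state = http_task(
    "https://example.com/api/items",  # hypothetical endpoint
    method="POST",
    query={"page": "1"},
    headers={"Content-Type": "application/json"},
    body={"hello": "world"},
)
print(json.dumps(state, indent=2))
```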

There are some limitations, though. Firstly, there is a hard 60-second timeout for the entire request. Secondly, Step Functions sets the following mandatory headers, which you cannot override:

  • Host (value: hostname of the URL)
  • User-Agent (value: Amazon|StepFunctions|HttpInvoke|us-east-1, where us-east-1 is replaced by your region)
  • Range (value: bytes=0-262144)

Note that the request will still fail if the response exceeds 256 KB, even though the Range header is set. The header can also cause confusion, as some servers respond with a 206 Partial Content status code even when all of the data is returned, so be aware of that.
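The arithmetic behind that header is worth spelling out. HTTP byte ranges are inclusive, so bytes=0-262144 asks for 262,145 bytes, one more than 256 KiB (262,144 bytes). The sketch below also shows one way a client could tell whether a 206 response actually holds the whole resource, by reading the standard Content-Range header (bytes start-end/total). The helper names are my own, and the check assumes the server sends a Content-Range with a known total.

```python
import re

def range_length(range_header):
    """Number of bytes requested by a single 'bytes=start-end' range (inclusive)."""
    m = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header)
    if not m:
        raise ValueError(f"unsupported Range header: {range_header!r}")
    start, end = int(m.group(1)), int(m.group(2))
    return end - start + 1

def is_complete(status_code, content_range=None):
    """True if the response body holds the entire resource."""
    if status_code == 200:
        return True
    if status_code == 206 and content_range:
        m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", content_range)
        if m:
            start, end, total = map(int, m.groups())
            # A 206 that starts at byte 0 and ends at the last byte is the full body.
            return start == 0 and end == total - 1
    return False

print(range_length("bytes=0-262144"))             # 262145
print(256 * 1024)                                 # 262144
print(is_complete(206, "bytes 0-999/1000"))       # True
print(is_complete(206, "bytes 0-262144/500000"))  # False
```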

The client IP address differs for each request and appears to lie within the standard EC2 public IP ranges published by AWS. There is no way to use Elastic IPs or other networking constructs within your account.
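If you want to check that observation yourself, AWS publishes its address ranges as JSON (ip-ranges.json), where each IPv4 entry carries ip_prefix, region and service fields. The sketch below filters those entries to EC2 and tests whether an observed client IP falls inside one of them. The sample data uses documentation-only address blocks rather than real AWS prefixes, so swap in the downloaded file before drawing any conclusions.

```python
import ipaddress

# Made-up sample in the shape of ip-ranges.json (documentation prefixes, not real AWS ranges).
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "AMAZON"},
    ]
}

def ec2_networks(ranges, region):
    """Return the EC2 IPv4 networks listed for a region."""
    return [
        ipaddress.ip_network(p["ip_prefix"])
        for p in ranges["prefixes"]
        if p["service"] == "EC2" and p["region"] == region
    ]

def in_ec2_range(ip, ranges, region):
    """True if the address is inside any EC2 prefix for the region."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ec2_networks(ranges, region))

print(in_ec2_range("198.51.100.23", SAMPLE_RANGES, "us-east-1"))  # True
print(in_ec2_range("203.0.113.5", SAMPLE_RANGES, "us-east-1"))    # False
```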

Your state machine's IAM role needs permissions to access the connection and its associated secret, as well as the states:InvokeHTTPEndpoint action. That action supports the optional condition keys states:HTTPEndpoint and states:HTTPMethod, which help scope down the endpoints and HTTP methods the state machine can call. I have included an example of a granular policy in the CloudFormation template at the end of this post.
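For illustration, here is a sketch of what such a scoped-down policy could look like, generated in Python. The events:RetrieveConnectionCredentials, secretsmanager:GetSecretValue and secretsmanager:DescribeSecret actions reflect my reading of the AWS documentation for HTTP Tasks and should be verified there. All ARNs, including the state machine name, are hypothetical placeholders.

```python
import json

# Hypothetical ARNs; replace with your own connection, secret and state machine.
CONNECTION_ARN = "arn:aws:events:us-east-1:123456789012:connection/chess/0000"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:events!connection/chess/*"
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:ChessGMs"

def http_task_policy(endpoint_pattern, method):
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Let the state machine read the connection's credentials...
                "Effect": "Allow",
                "Action": "events:RetrieveConnectionCredentials",
                "Resource": CONNECTION_ARN,
            },
            {   # ...and the Secrets Manager secret that backs the connection.
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"],
                "Resource": SECRET_ARN,
            },
            {   # Only allow calls to matching URLs with the given HTTP method.
                "Effect": "Allow",
                "Action": "states:InvokeHTTPEndpoint",
                "Resource": STATE_MACHINE_ARN,
                "Condition": {
                    "StringLike": {"states:HTTPEndpoint": endpoint_pattern},
                    "StringEquals": {"states:HTTPMethod": method},
                },
            },
        ],
    }

print(json.dumps(http_task_policy("https://api.chess.com/pub/*", "GET"), indent=2))
```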

Gathering the data

In order to demonstrate the capabilities of the new feature, I’ve chosen to consume the Chess.com API. This is a free and anonymous API which retrieves metadata about games and players on their platform.

I will retrieve a list of all grandmasters, their country of origin, and aggregate these details by country.

Because this is a public endpoint, no Authorization or similar header is needed to access it. However, EventBridge API destination connections require Basic authorization, OAuth or an API key header. One creative way to avoid sending an unnecessary header is to create your connection using the API Key type but set the header name to one of the immutable headers, such as User-Agent.
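A sketch of that trick, under stated assumptions: the connection properties below follow the AWS::Events::Connection CloudFormation resource (AuthorizationType of API_KEY with ApiKeyAuthParameters), and the merge function is a toy model of the behaviour described above, where Step Functions' own values for Host, User-Agent and Range replace anything supplied by the connection. The key value is an arbitrary placeholder, since it never reaches the server.

```python
# CloudFormation-style properties for an API key connection whose "key" header is User-Agent.
connection_properties = {
    "AuthorizationType": "API_KEY",
    "AuthParameters": {
        "ApiKeyAuthParameters": {
            "ApiKeyName": "User-Agent",
            "ApiKeyValue": "unused",  # arbitrary placeholder; overwritten by Step Functions
        }
    },
}

def effective_headers(connection, region, url_host):
    """Toy model: apply the connection header first, then let the mandatory headers win."""
    params = connection["AuthParameters"]["ApiKeyAuthParameters"]
    headers = {params["ApiKeyName"]: params["ApiKeyValue"]}
    headers.update({
        "Host": url_host,
        "User-Agent": f"Amazon|StepFunctions|HttpInvoke|{region}",
        "Range": "bytes=0-262144",
    })
    return headers

print(effective_headers(connection_properties, "us-east-1", "api.chess.com"))
```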

I created the step to gather the list of grandmasters by hitting the URL https://api.chess.com/pub/titled/GM. Because I am only interested in the content of the response body, I apply an OutputPath filter of $.ResponseBody. This provides me with the list of grandmaster usernames, but not their origin country or actual name. For that, we need to retrieve their details using additional individual HTTPS calls.
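To make the effect of OutputPath concrete, here is a small simulation. It assumes the HTTP Task result contains StatusCode, StatusText and ResponseBody keys, and that, as I understand the PubAPI, the GM endpoint returns a body of the form {"players": [...]}. The usernames are invented, and the resolver only handles simple dotted paths such as $.ResponseBody, not full JSONPath.

```python
def resolve(path, document):
    """Resolve a simple dotted path like '$.ResponseBody.players' (no filters or wildcards)."""
    if path == "$":
        return document
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value = document
    for key in path[2:].split("."):
        value = value[key]
    return value

# Invented task result in the assumed shape of an HTTP Task output.
task_result = {
    "StatusCode": 200,
    "StatusText": "OK",
    "ResponseBody": {"players": ["player_one", "player_two"]},
}

output = resolve("$.ResponseBody", task_result)  # what OutputPath keeps
print(output)                      # {'players': ['player_one', 'player_two']}
print(resolve("$.players", output))  # ['player_one', 'player_two']
```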

To do this efficiently, we use the Distributed Map type within Step Functions. To ensure we do not overload the Chess.com API, we limit the concurrency to 40. We also use a standard exponential backoff for the inner HTTPS call to allow for retries in the event of an occasional error.
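A sketch of what that Map state could look like, again as a Python dict. The per-player URL (https://api.chess.com/pub/player/{username}), the ItemsPath of $.players, the EXPRESS child execution type and the retry numbers are my assumptions rather than values from the post's template; only the MaxConcurrency of 40 comes from the text above. The helper at the end lists the waits an exponential backoff produces: IntervalSeconds, multiplied by BackoffRate for each further retry.

```python
import json

# Hypothetical connection ARN; substitute your own.
CONNECTION_ARN = "arn:aws:events:us-east-1:123456789012:connection/chess/0000"

map_state = {
    "Type": "Map",
    "ItemsPath": "$.players",  # assumes the GM list body looks like {"players": [...]}
    "MaxConcurrency": 40,      # keep the load on the Chess.com API modest
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
        "StartAt": "GetPlayer",
        "States": {
            "GetPlayer": {
                "Type": "Task",
                "Resource": "arn:aws:states:::http:invoke",
                "Parameters": {
                    # Each item is a username string, so $ is the username itself.
                    "ApiEndpoint.$": "States.Format('https://api.chess.com/pub/player/{}', $)",
                    "Method": "GET",
                    "Authentication": {"ConnectionArn": CONNECTION_ARN},
                },
                # Exponential backoff: retry on any error, doubling the wait each time.
                "Retry": [{
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 5,
                    "BackoffRate": 2.0,
                }],
                "End": True,
            }
        },
    },
    "End": True,
}

def backoff_delays(interval_seconds, backoff_rate, max_attempts):
    """Seconds waited before each retry: interval, interval*rate, interval*rate**2, ..."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]

print(json.dumps(map_state, indent=2))
print(backoff_delays(2, 2.0, 5))  # [2.0, 4.0, 8.0, 16.0, 32.0]
```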

This brings us to a state where we have an array of the individual grandmaster details.

Aggregating the data

Aggregating data (using map-reduce style methods) is not natively supported within a state machine; however, with some clever usage it is possible.

To do this, we first need to ensure all fields are present in the individual grandmaster details. Unfortunately, the name field isn't always present in these responses, so to fix that we add the following ResultSelector to the HTTPS endpoint step within the distributed map:

{
    "output.$": "States.JsonMerge(States.StringToJson('{\"name\":\"Unknown Player\"}'), $.ResponseBody, false)"
}

This takes the response body from the HTTP call and performs a shallow JSON merge with the static object we defined, which contains a default name. Because keys in the second object take precedence, the default name is only used when the response has no name field.
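A Python emulation of that default, assuming the documented shallow-merge semantics of States.JsonMerge with false (the later object's keys override the earlier one's). The player records are invented.

```python
def json_merge_shallow(first, second):
    """Emulate States.JsonMerge(first, second, false): keys in `second` override `first`."""
    merged = dict(first)
    merged.update(second)
    return merged

defaults = {"name": "Unknown Player"}

with_name = {"username": "player_one", "name": "Player One"}
without_name = {"username": "player_two"}

print(json_merge_shallow(defaults, with_name)["name"])     # Player One
print(json_merge_shallow(defaults, without_name)["name"])  # Unknown Player
```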

Next, we format the resulting name the way we would like it, and extract the two-letter country code from the country field, which holds a URL like https://api.chess.com/pub/country/US. To do this, we use a Pass state with the following Parameters:

{
    "displayName.$": "States.Format('{} ({})', $.output.name, $.output.username)",
    "country.$": "States.ArrayGetItem(States.StringSplit($.output.country, '/'), 4)"
}

Note that the array index used is 4, not 5. This is because empty segments (such as the one between the two slashes in https://) are discarded by the States.StringSplit operation.
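Here is the same pair of transformations emulated in Python, assuming (as described above) that States.StringSplit drops empty segments. Python's str.split keeps them, so the emulation filters them out explicitly. The player record is invented.

```python
def string_split(value, delimiter):
    """Emulate States.StringSplit, which (per the behaviour noted above) drops empty segments."""
    return [part for part in value.split(delimiter) if part]

def format_player(output):
    return {
        # States.Format('{} ({})', $.output.name, $.output.username)
        "displayName": f"{output['name']} ({output['username']})",
        # States.ArrayGetItem(States.StringSplit($.output.country, '/'), 4)
        "country": string_split(output["country"], "/")[4],
    }

player = {
    "username": "player_one",
    "name": "Player One",
    "country": "https://api.chess.com/pub/country/US",
}
print(string_split(player["country"], "/"))  # ['https:', 'api.chess.com', 'pub', 'country', 'US']
print(format_player(player))  # {'displayName': 'Player One (player_one)', 'country': 'US'}
```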

Using the output of the distributed map, we apply a new Pass state with the following parameters:

{
    "original.$": "$",
    "countries.$": "States.ArrayUnique($[*].country)",
    "countriesCount.$": "States.ArrayLength(States.ArrayUnique($[*].country))",
    "iterator": 0,
    "output": {}
}

The original key contains the distributed map output, the countries key uses JSONPath and States.ArrayUnique to select the unique list of countries, the countriesCount key is the length of the countries, the iterator key is initialised at 0, and the output key is initialised with an empty map.
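In Python terms, that Pass state computes roughly the following. Keeping the unique countries in first-occurrence order is an assumption about States.ArrayUnique; the aggregation itself does not depend on the order. The input data is invented.

```python
def array_unique(items):
    """Remove duplicates, keeping first-occurrence order (an assumption about States.ArrayUnique)."""
    seen = []
    for item in items:
        if item not in seen:  # list membership also works for unhashable JSON values
            seen.append(item)
    return seen

def init_state(players):
    countries = array_unique([p["country"] for p in players])  # $[*].country
    return {
        "original": players,
        "countries": countries,
        "countriesCount": len(countries),
        "iterator": 0,
        "output": {},
    }

players = [
    {"displayName": "Player One (player_one)", "country": "US"},
    {"displayName": "Player Two (player_two)", "country": "NO"},
    {"displayName": "Player Three (player_three)", "country": "US"},
]
state = init_state(players)
print(state["countries"], state["countriesCount"])  # ['US', 'NO'] 2
```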

Then we enter a loop, which continues while the iterator is less than the number of countries. Inside the loop, a Pass state sets the country key to the entry of the countries list at the iterator index. One more Pass state then increments the iterator with:

States.MathAdd($.iterator, 1)

We also set the output key to the following (spaced for visibility):

States.JsonMerge(
    States.StringToJson(
        States.Format(
            '\{"{}":{}\}',
            $.country,
            States.JsonToString(
                $.original[?(@.country == $.country)]['displayName']
            )
        )
    ),
    $.output,
    false
)

The above performs the following transformations:

  1. Use a JSONPath filter to select the displayName of every entry in the original key whose country matches the current country key
  2. Convert that list to a JSON string
  3. Create a new JSON-compatible string where the key is the country and the value is the above string-encoded array of names
  4. Convert the string to a JSON object
  5. Merge that object with the output variable
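The five steps can be emulated in Python for a single loop iteration. This is a sketch assuming the shallow-merge semantics described earlier: the JSONPath filter becomes a list comprehension, and json.dumps/json.loads stand in for States.JsonToString/States.StringToJson. The data is invented.

```python
import json

def merge_country(state):
    country = state["country"]
    # 1. $.original[?(@.country == $.country)]['displayName']
    names = [p["displayName"] for p in state["original"] if p["country"] == country]
    # 2. States.JsonToString
    names_json = json.dumps(names)
    # 3. States.Format('\{"{}":{}\}', ...): a one-key JSON object as a string
    #    (the country code is inserted unescaped, just as in the intrinsic)
    fragment = '{"' + country + '":' + names_json + '}'
    # 4. States.StringToJson
    parsed = json.loads(fragment)
    # 5. States.JsonMerge(parsed, $.output, false): existing output keys win on conflict
    return {**parsed, **state["output"]}

state = {
    "original": [
        {"displayName": "Player One (player_one)", "country": "US"},
        {"displayName": "Player Two (player_two)", "country": "NO"},
        {"displayName": "Player Three (player_three)", "country": "US"},
    ],
    "country": "US",
    "output": {"NO": ["Player Two (player_two)"]},
}
print(merge_country(state))
# {'US': ['Player One (player_one)', 'Player Three (player_three)'], 'NO': ['Player Two (player_two)']}
```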

We’re basically adding the country code as a key of the output JSON object one at a time, then increasing the iterator to reference the next country in the list.

Once it has completed the loop, we are left with our final output.
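For reference, here is one possible shape for the loop itself, sketched as Amazon States Language in a Python dict. The state names are hypothetical. Because a Pass state's Parameters replace its input, carrying every working key forward in each Pass state is my assumption about how to keep the data intact; the merge expression is the one shown above, kept on one line with the backslashes that escape its literal braces.

```python
import json

# The output expression from above. Raw strings keep the backslashes that escape the
# literal braces in the States.Format template; json.dumps later doubles them for JSON.
MERGE_EXPR = (
    r"""States.JsonMerge(States.StringToJson(States.Format('\{"{}":{}\}', $.country, """
    r"""States.JsonToString($.original[?(@.country == $.country)]['displayName']))), $.output, false)"""
)

# Keys that every Pass state copies through unchanged.
KEEP = {k + ".$": "$." + k for k in ("original", "countries", "countriesCount")}

loop_states = {
    "HasMoreCountries": {  # continue while iterator < countriesCount
        "Type": "Choice",
        "Choices": [{
            "Variable": "$.iterator",
            "NumericLessThanPath": "$.countriesCount",
            "Next": "SelectCountry",
        }],
        "Default": "Done",
    },
    "SelectCountry": {  # country = countries[iterator]
        "Type": "Pass",
        "Parameters": {
            **KEEP,
            "iterator.$": "$.iterator",
            "output.$": "$.output",
            "country.$": "States.ArrayGetItem($.countries, $.iterator)",
        },
        "Next": "MergeCountry",
    },
    "MergeCountry": {  # iterator += 1 and merge this country's names into output
        "Type": "Pass",
        "Parameters": {
            **KEEP,
            "iterator.$": "States.MathAdd($.iterator, 1)",
            "output.$": MERGE_EXPR,
        },
        "Next": "HasMoreCountries",
    },
    "Done": {"Type": "Pass", "OutputPath": "$.output", "End": True},
}

print(json.dumps(loop_states, indent=2))
```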

Finishing up

I have provided a CloudFormation template that contains the full state machine and associated connection here. Feel free to deploy this into your own AWS account and try it yourself.

The HTTPS Endpoints feature is a very useful addition to the Step Functions service that I believe will have huge uptake. I personally want to do more with the Step Functions service, as I believe more architectures can be more than serverless: they can be “functionless” (i.e. no Lambda functions). However, I would like to see more useful intrinsics become available in the service. As you can see from this post, developers are often pushing the limits of what is available. Consider this my #awswishlist item.

A big thank you to Aidan Steele for helping review this post. If you liked what I’ve written, or want to hear more on this topic, reach out to me on 𝕏 at @iann0036.