Tips for Using AWS X-Ray Traces for Long-Running Requests

jslattery · ‎May 30, 2019

AWS X-Ray is a tool for understanding how requests flow through your microservices and identifying any issues or performance bottlenecks. A single trace can show how one request into the system flows to all of the backend services.

AWS advertises that a single trace can store up to 500 KB, which is enough to illustrate your trace in fine detail — hundreds or thousands of segments.

However, if your trace proceeds slowly, you might find that it is closed prematurely and any remaining updates are rejected.

Why is this?

A trace can actually be open for seven days. However, there are undocumented “dynamic” limits that kick in, which may mean that you have far less than the advertised 500 KB of data to work with in your trace.

Here’s the result of a simple experiment:

In my test, I simply created a trace and sent subsegments at a steady rate. I counted the total bytes of JSON data sent to the AWS X-Ray daemon before the API returned an InvalidSegment error code, refusing to add any more segments to the trace.
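For readers who want to try something similar, here is a minimal sketch of that kind of test loop. It is not the code used for the experiment. The `send` callable is a hypothetical stand-in for whatever client you use to submit documents and learn whether they were accepted, and the subsegment names and durations are made up for illustration. The trace ID layout and the subsegment fields follow the documented X-Ray segment document format.

```python
import json
import os
import time


def new_trace_id(now=None):
    """Return a trace ID shaped like 1-<8 hex epoch seconds>-<24 hex random>."""
    epoch = int(now if now is not None else time.time())
    return "1-{:08x}-{}".format(epoch, os.urandom(12).hex())


def build_subsegment(trace_id, parent_id, name, start, end):
    """Build an independently sent subsegment document as compact JSON bytes."""
    doc = {
        "type": "subsegment",
        "id": os.urandom(8).hex(),  # 16 hex digits
        "trace_id": trace_id,
        "parent_id": parent_id,
        "name": name,
        "start_time": start,
        "end_time": end,
    }
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


def bytes_until_rejected(send, interval_seconds, max_subsegments=10000,
                         sleep=time.sleep):
    """Send subsegments at a steady rate and total the bytes that were accepted.

    `send` is a hypothetical callable (an assumption of this sketch): it takes
    the document bytes and returns True if accepted, False if rejected.
    """
    trace_id = new_trace_id()
    parent_id = os.urandom(8).hex()
    total = 0
    for i in range(max_subsegments):
        now = time.time()
        payload = build_subsegment(trace_id, parent_id,
                                   "step-{}".format(i), now, now + 0.001)
        if not send(payload):
            break  # first rejection: the trace is effectively closed
        total += len(payload)
        sleep(interval_seconds)
    return total
```

Running `bytes_until_rejected` with different `interval_seconds` values would give one data point per sending rate.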

When sending the data quickly, I could send in more than the advertised limit of 500 KB. But when sending data slowly, that limit quickly plummeted. For any trace lasting more than six minutes, I could only send 14 KB before the trace was closed! It was sobering to see the limit reduced by 97%.

I spoke with the AWS X-Ray team and they acknowledged the lack of documentation on these dynamic limits. They expressed an interest in eventually improving the documentation and perhaps even relaxing these limits.

Meanwhile, their general advice is to send less data — fewer segments, subsegments, annotations, or metadata. And if needed, split up a single large trace into several different traces.
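One way to act on the "split it up" advice is to track how much you have sent on the current trace and switch to a fresh trace ID before you hit the limit. The sketch below is only an illustration: `TraceRotator` is a hypothetical helper, and its default budgets (14,000 bytes, 300 seconds) are assumptions loosely based on the numbers from my experiment, not limits published by AWS.

```python
import os
import time


class TraceRotator:
    """Hand out a trace ID, starting a new trace once a byte or age budget is spent."""

    def __init__(self, max_bytes=14_000, max_age_seconds=300, clock=time.time):
        # Budgets are assumptions for this sketch, not documented X-Ray limits.
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._start_new_trace()

    def _start_new_trace(self):
        self.started = self.clock()
        self.bytes_sent = 0
        self.trace_id = "1-{:08x}-{}".format(int(self.started),
                                             os.urandom(12).hex())

    def trace_id_for(self, payload_size):
        """Return the trace ID to use for a payload of `payload_size` bytes."""
        too_big = self.bytes_sent + payload_size > self.max_bytes
        too_old = self.clock() - self.started > self.max_age_seconds
        # Rotate on age, or on size if the current trace already holds data.
        if too_old or (too_big and self.bytes_sent > 0):
            self._start_new_trace()
        self.bytes_sent += payload_size
        return self.trace_id
```

Before sending each segment or subsegment, you would call `trace_id_for(len(payload))` and put the returned ID in the document. The trade-off is that one logical request now spans several traces, so you need some way of your own to tie them back together.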