So close to the launch, help me solve this logging issue

OK, so we’re on staging and everything runs perfectly.
ECS (Fargate) <—> RDS <—> CloudWatch <—> ELB

For now, we’ve got a task count of 2 for our web service in Fargate.

Now imagine a scenario where a container produces a 5xx error. Since logging is already configured, it’ll go to the CloudWatch log group ‘/ecs/my-service’… so far so good!

But how do I know which container caused that error, since logs are aggregated by log stream?

How would you work out that, say, if a 5xx happened at 1:31:20 PM, then this is the log stream I should look into?

I’m pretty sure I’m missing something obvious here - any help will be greatly appreciated!

Hi @abhi

I’m happy to hear that your launch is coming up!

Each log stream in your service’s log group belongs to one task, which maps to a single container (the ID at the end of the log stream name is the task ID). When you search through all logs, you always get the log stream information and can map back to the individual task. Does this help?
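For context: if your task definition uses the awslogs driver with, say, awslogs-stream-prefix set to ecs, the stream names usually look something like this (the names here are only placeholders):

ecs/my-container/0123456789abcdef0123456789abcdef

The last segment is the task ID.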

Thanks for the quick reply, Michael. Always appreciated!

To expand the question further, how do I know which task ID resulted in the 5xx error? Imagine I have 8 tasks running in parallel, and one of them emitted a 5xx. How can I pinpoint that task?

To make it even more complicated, imagine the task exited a few hours after emitting the 5xx, and a new task replaced it. So now I have 8 + 1 (in the real world it might be even more) tasks to look into.

Does that make sense? Or am I missing something critical in my setup?

I wonder why you are so keen to figure out the task ID? :slight_smile:
I’m usually more interested in things such as stack traces to point me to the issue in the code.

Actually, yes! I don’t care about the task ID, only the stack traces. I thought the task ID was an intermediate step to get to them.

Anyway, I just found a solution; I’ll share it here in case anyone needs it.

In such a case, instead of going through each log stream, head to ‘Logs Insights’ and select the log group in question.

You can run customized queries like this:

fields @timestamp, @log, @logStream, @message
| sort @timestamp desc
| limit 50

This looks across all log streams in the group, so for my use case it solves the problem.
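Since the task ID is the last segment of the stream name, you can also pull it out and narrow the results down to 5xx responses. This is just a sketch: it assumes the default prefix/container/task-id stream naming and an access-log-style @message where the status code follows the quoted request line, so adjust the parse and filter patterns to match your own logs:

fields @timestamp, @logStream, @message
| parse @logStream "*/*/*" as prefix, container, taskId
| filter @message like /" 5\d\d /
| sort @timestamp desc
| limit 50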

Michael, is there a better way to get to stack traces? The only downside of this method is that it’s a bit slow for real-time watching, but it’s fine for looking up older events.

Your approach looks great! I’m happy to hear that you can find your stack traces :slight_smile:

I personally love to use Sentry for all my application monitoring and error tracking needs.

If you are willing to integrate a third-party tool, this might be worth looking into.

When using the old CloudWatch Logs UI, there’s a “Search log group” button you can use to search all streams within the group.
Otherwise you can use the newer CloudWatch Logs Insights feature that you already discovered :).

Finally, CloudTrail is sometimes useful for detailed investigation.

