Collaborative outage monitoring and management?

In SaaS, every outage is a learning opportunity, and each one needs to be communicated to the dev team so they can debug it, fix it, and learn from it. Currently, our app sometimes experiences a few minutes of downtime, and while the dev team handles it well, there is no record keeping. I would like to hear from other teams: do you maintain a log of all outages with root cause analysis, and what tech stack do you use for collaborative record keeping?
Here is our current tech stack for this problem:

CloudWatch
Alarms
Uptime Robot
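
To make the question concrete, here is a rough sketch of the kind of outage record I have in mind. It is only an illustration, not something we run today: the `OutageRecord` class, its field names, and the JSON-lines file format are all hypothetical, and it does not call CloudWatch or Uptime Robot.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class OutageRecord:
    # All field names are hypothetical, chosen only for illustration.
    started_at: datetime
    ended_at: datetime
    detected_by: str         # e.g. "Uptime Robot" or "CloudWatch"
    summary: str
    root_cause: str = "TBD"  # filled in after the root cause analysis

    @property
    def duration_minutes(self) -> float:
        return (self.ended_at - self.started_at).total_seconds() / 60

    def to_json(self) -> str:
        data = asdict(self)
        # datetime is not JSON-serializable, so store ISO 8601 strings.
        data["started_at"] = self.started_at.isoformat()
        data["ended_at"] = self.ended_at.isoformat()
        data["duration_minutes"] = self.duration_minutes
        return json.dumps(data)


def append_record(path: str, record: OutageRecord) -> None:
    # One JSON object per line keeps the log append-only and easy to review.
    with open(path, "a", encoding="utf-8") as f:
        f.write(record.to_json() + "\n")


record = OutageRecord(
    started_at=datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc),
    ended_at=datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc),
    detected_by="Uptime Robot",
    summary="App unreachable",
)
print(record.to_json())
```

A file like this could live in a shared repository so the whole team can review and amend entries, but I am curious whether others use a dedicated tool instead.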

Thanks for reading!
