
The exciting adventures of an alert notification

A year ago, the alert and notification systems for our server monitoring service, Server Density, were very simple. They were based on batch cron jobs which processed all the items in a database table every minute.

Since then, we have grown significantly and this approach would no longer work. We now have a very robust alert and notification backend which can easily be scaled just by adding new servers. It’s quite interesting from a technical standpoint, so this is the exciting story of the adventure an alert notification takes through our systems on its way to your inbox.

1: Agent sends a postback

Our monitoring agent reports back every 60 seconds. The stats payload is sent over HTTP (or HTTPS) as a JSON object and is immediately inserted into the database to display the latest data on the dashboard, through the monitoring API and on our graphs. The data is also stored in a postbacks capped collection inside MongoDB. A separate process transfers these JSON payloads from the postbacks collection into our RabbitMQ alertdetection queue. The web server does not queue directly to RabbitMQ because the various PHP AMQP libraries we tried caused too much load on the web server.
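The transfer step can be sketched as a small loop that drains payloads from the capped collection and publishes each one, unchanged, to the queue. This is a minimal illustration, not our actual code: the `postbacks` iterable stands in for a tailable cursor on the capped collection, and `publish` stands in for a RabbitMQ `basic_publish` wrapper.

```python
import json

# Queue name from the post; everything else here is a hypothetical sketch.
ALERT_QUEUE = "alertdetection"

def transfer_postbacks(postbacks, publish):
    """Drain postback documents and publish each as raw JSON to the
    alertdetection queue. `postbacks` is any iterable of payload dicts
    (in production, a tailable cursor on the capped collection);
    `publish(queue, body)` would wrap a RabbitMQ basic_publish call."""
    sent = 0
    for doc in postbacks:
        body = json.dumps(doc)      # the payload is forwarded unchanged
        publish(ALERT_QUEUE, body)  # hand off to RabbitMQ
        sent += 1
    return sent
```

Injecting `publish` keeps the loop independent of any particular AMQP client library.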

2: Is there an alert condition?

We have multiple RabbitMQ consumers listening to the queue waiting for new items. One of these sees there’s a new alertdetection item and pulls it down. The message pulled from the queue contains the same raw JSON payload. The data is then parsed and compared to all configured alerts to see if there is an alert condition match. In this case, load is a bit too high and so triggers an alert.
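The comparison itself boils down to checking a metric from the parsed payload against a configured threshold. The field and config names below are assumptions for illustration, not our actual schema:

```python
# Illustrative only: the payload field names and alert config shape are
# assumptions, not the real Server Density schema.
def check_alert(payload, alert):
    """Return True if the payload matches a configured alert condition.
    `alert` holds the metric name, a comparison operator and a threshold."""
    value = payload.get(alert["metric"])
    if value is None:
        return False  # metric not present in this postback
    if alert["comparison"] == ">":
        return value > alert["threshold"]
    if alert["comparison"] == "<":
        return value < alert["threshold"]
    return False
```

For the load example in this step, a payload with `loadAvrg` above the configured threshold would match and trigger the alert.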

3: An alert is triggered

Alerts can have a delay, so we need to check the configuration to see whether we should alert right away. In this case, we do. The alert is set to be sent via e-mail and iPhone push notification, so two queue items are inserted into the iphonealerts and emailalerts RabbitMQ queues.
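The delay check and fan-out to the per-channel queues might look like this. The queue names match the post; the config shape and `publish` callable are hypothetical:

```python
# Queue names from the post; the alert config shape is an assumption.
NOTIFICATION_QUEUES = {"email": "emailalerts", "iphone": "iphonealerts"}

def fan_out(alert, triggered_at, now, publish):
    """Queue one item per configured notification channel, unless the
    alert's optional delay (in seconds) has not yet elapsed."""
    if now - triggered_at < alert.get("delay", 0):
        return 0  # too early; a later postback will re-check
    queued = 0
    for channel in alert["channels"]:
        publish(NOTIFICATION_QUEUES[channel], {"alertId": alert["id"]})
        queued += 1
    return queued
```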

4: Notifications are sent

Different consumers listening to the notification queues pick up the new items. One process builds the iPhone alert payload and sends it to the Apple Push Notification service, whilst another constructs the e-mail message and calls the Postmark API, which queues and delivers the e-mail.
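For the push side, the payload handed to the Apple Push Notification service is a small JSON object with a reserved `aps` dictionary. This is the standard APNs payload format; the message text is made up:

```python
import json

def build_push_payload(message, badge=1):
    """Build an Apple Push Notification payload: a JSON object whose
    reserved `aps` key carries the alert text, badge count and sound.
    (Standard APNs format; the message content here is illustrative.)"""
    return json.dumps({
        "aps": {"alert": message, "badge": badge, "sound": "default"},
    })
```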

5: The problem hasn’t been fixed

The alert is configured to alert every 5 minutes until the alert condition disappears. Every time the stats postback comes in, we run the comparisons and check already triggered alerts to see if there’s anything we need to do. 5 minutes later we see that the alert is still open and new notifications are triggered.
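The repeat-notification decision reduces to comparing the time since the last notification against the configured interval. The `alert_state` dict below is a hypothetical stand-in for what we track per open alert:

```python
REPEAT_INTERVAL = 5 * 60  # the alert in this example repeats every 5 minutes

def should_renotify(alert_state, now):
    """For an alert that is still open, decide whether enough time has
    passed since the last notification to send another one.
    `alert_state` is a hypothetical dict, not the real schema."""
    if not alert_state["open"]:
        return False
    return now - alert_state["last_notified"] >= REPEAT_INTERVAL
```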

6: All is well

Shortly afterwards, a postback comes in with the alert condition fixed – load is back down again. We mark the alert fixed and send notifications to tell the user that all is well again.
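The resolution step is the mirror image of detection: if the latest postback no longer matches the condition, we close the alert and owe the user a "fixed" notification. Again, a sketch with assumed names:

```python
def resolve_if_fixed(alert_state, payload, check):
    """If the latest postback no longer matches the alert condition,
    mark the alert fixed and report that a 'fixed' notification is due.
    `check(payload)` returns True while the condition still holds;
    the state dict is a hypothetical stand-in."""
    if alert_state["open"] and not check(payload):
        alert_state["open"] = False
        return True  # queue the "all is well" notifications
    return False
```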

But what if we stop receiving data?

If your server stops reporting back, there’s no payload to trigger the alert process. We therefore run a separate set of consumers which constantly check when we last received data from your server; if postbacks have stopped, we trigger the no data alerts after the time period defined in the alert configuration.
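The no-data check those consumers run amounts to comparing the age of the last seen postback against the configured period. Names and units below are illustrative:

```python
def no_data_alert(last_postback, now, no_data_after):
    """Fire the no-data alert once no postback has been seen for longer
    than the configured period. All times are seconds; the parameter
    names are illustrative, not the real configuration keys."""
    return now - last_postback > no_data_after
```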

It takes seconds

Alert log

From a postback payload coming in to notifications being delivered takes only seconds, because we can easily scale out the number of consumers processing queue items. Every action is logged, and these events are exposed in the alert log within the Server Density UI so you can see the times between them.

We are always working on improving this. One item on our roadmap is to combine the first two steps so that alert triggering bypasses the postbacks collection and the payload is inserted into the queue immediately. Unfortunately, the various PHP AMQP libraries available aren’t robust enough (connection pooling is the main missing piece) to handle that many inserts, so we’re investigating other queuing systems and methods of handling the high number of inserts.

  • Wouter

    Wouldn’t MongoDB itself be a good fit for a queue? Especially with the recently added database command FindAndModify?

    What were your considerations to choose for a standalone queue (RabbitMQ) instead of MongoDB?

    • http://www.serverdensity.com David Mytton

Using RabbitMQ allows us to use the AMQP protocol to acknowledge / commit queue messages, so if a consumer takes an item and then fails, the item can be put back into the queue. Also, with MongoDB we’d have to be constantly querying the DB, whereas with AMQP we can just keep connections open.