Fundraising Monitoring

From Wikitech

This page documents the monitoring infrastructure that exists for fundraising and keeps track of desired monitoring functionality.

Existing monitoring infrastructure

Log monitoring

Minfraud log

We currently monitor the 'minfraud' log, which is aggregated from the payments cluster to Loudon.

Monitoring wishlist

See RT ticket #405

  • Hudson
    • Nagios check for liveness
    • Nagios check for failed builds
      • Note: some scripts run by Hudson need to be modified to return a non-zero exit status when they don't complete properly (e.g. the send/receive mail scripts for civimail)
    • Ganglia graphs for build success/failure
    • Nagios check for too many files in build folders (if the limit of 63999 gets hit, builds will fail)
  • ActiveMQ
    • Nagios check for queues filling up too fast
    • Ganglia graphs for message volume in various queues
  • Service communication times
    • Ganglia graphs for communication timings with PayPal and MaxMind
    • Nagios checks for timeouts/unacceptably high communication times
  • Third-party service accessibility from payments cluster
    • Nagios check for communications access to MaxMind/PayPal
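
As an illustration of the wishlist item about build folders filling up, the following is a minimal sketch of what such a Nagios check might look like. It is not an existing script. The function names (classify, check_dir, main), the BUILDDIR label, and the warning/critical thresholds (80% and 95% of the limit) are assumptions made for this example; only the 63999 figure comes from the list above. It also assumes that every directory entry, file or subfolder, counts toward the limit. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check for the number of entries in a build folder."""
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Limit quoted in the wishlist; builds are said to fail once it is hit.
ENTRY_LIMIT = 63999
# Assumed thresholds (not from the original page): 80% and 95% of the limit.
WARN_AT = ENTRY_LIMIT * 80 // 100
CRIT_AT = ENTRY_LIMIT * 95 // 100


def classify(count, warn=WARN_AT, crit=CRIT_AT):
    """Map an entry count to a (exit_code, status_name) pair."""
    if count >= crit:
        code = CRITICAL
    elif count >= warn:
        code = WARNING
    else:
        code = OK
    return code, STATUS_NAMES[code]


def check_dir(path):
    """Count the entries directly inside path and return (exit_code, message)."""
    try:
        with os.scandir(path) as entries:
            count = sum(1 for _ in entries)
    except OSError as exc:
        # Missing or unreadable folder: report UNKNOWN rather than guessing.
        return UNKNOWN, f"BUILDDIR UNKNOWN - cannot read {path}: {exc}"
    code, name = classify(count)
    return code, f"BUILDDIR {name} - {count} entries in {path} (limit {ENTRY_LIMIT})"


def main(argv):
    """Print one status line for Nagios and return the exit code."""
    if len(argv) != 2:
        print("BUILDDIR UNKNOWN - usage: check_build_dir.py <path>")
        return UNKNOWN
    code, message = check_dir(argv[1])
    print(message)
    return code
```

To use this as a plugin, the script would end with sys.exit(main(sys.argv)) so that Nagios sees the exit status. The same exit-code convention is what the Hudson note above relies on: a script that fails must return a non-zero status for the failure to be detected.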