Fundraising Monitoring
This page documents the existing monitoring infrastructure for fundraising and keeps track of desired monitoring functionality.
Existing monitoring infrastructure
Log monitoring
Minfraud log
We currently monitor the 'minfraud' log, which is aggregated from the payments cluster to Loudon.
- Ganglia (http://ganglia.wikimedia.org/?r=hour&c=Miscellaneous&h=loudon.wikimedia.org)
- Average risk score (as reported by MinFraud) (Ganglia)
- Number of successful transactions (Ganglia)
- Number of failed transactions (Ganglia)
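As a rough illustration of how the three metrics above could be derived from the aggregated log, here is a minimal sketch. The line format (`risk_score=... status=...`), the function names, and the metric names are all hypothetical; the real minfraud log layout is not described on this page. The `gmetric` invocation uses the standard Ganglia command-line flags, but how metrics are actually submitted on Loudon is an assumption.

```python
import re

# HYPOTHETICAL line format: the real minfraud log layout is not documented here.
# Example of an assumed line: "2011-01-01 12:00:00 risk_score=12.5 status=success"
LINE_RE = re.compile(
    r"risk_score=(?P<score>\d+(?:\.\d+)?)\s+status=(?P<status>success|failed)"
)


def summarize(lines):
    """Return the average risk score and success/failure counts for the given lines."""
    scores = []
    successful = 0
    failed = 0
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not match the assumed format
        scores.append(float(match.group("score")))
        if match.group("status") == "success":
            successful += 1
        else:
            failed += 1
    average = sum(scores) / len(scores) if scores else 0.0
    return {"avg_risk_score": average, "successful": successful, "failed": failed}


def gmetric_command(name, value, value_type, units):
    """Build (but do not run) a gmetric command line for one metric."""
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", value_type,
        "--units", units,
    ]
```

A periodic job could call `summarize` on the lines written since the last run and pass each resulting value to `gmetric_command`, for example with the hypothetical metric name `minfraud_avg_risk` and type `float`.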
Monitoring wishlist
See RT ticket #405.
- Hudson
  - Nagios check for alive-ness
  - Nagios check for failed builds
    - Note: some scripts run by Hudson need to be modified to return a non-zero exit status when they don't complete properly (e.g. the send/receive mail scripts for civimail)
  - Ganglia graphs for build success/failure
  - Nagios check for too many files in build folders (if the limit of 63999 gets hit, builds will fail)
- ActiveMQ
  - Nagios check for queues filling up too fast
  - Ganglia graphs for message volume in various queues
- Service communication times
  - Ganglia graphs for communication timings with PayPal and MaxMind
  - Nagios checks for timeouts/unacceptably high communication times
- 3rd party service accessibility from payments cluster
  - Nagios check for communications access to MaxMind/PayPal
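The wished-for check on build folder size could look roughly like the sketch below. It follows the usual Nagios plugin convention of exit status 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN). The limit of 63999 comes from this page; the warning and critical ratios, the function names, and the idea of passing the build folder as the single argument are assumptions made for illustration.

```python
import os

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Limit mentioned on this page; builds fail once it is reached.
ENTRY_LIMIT = 63999


def check_entry_count(path, warn_ratio=0.8, crit_ratio=0.95, limit=ENTRY_LIMIT):
    """Compare the number of entries in `path` with the limit.

    The warn/crit ratios are ASSUMED thresholds, not values from this page.
    Returns a (status, message) tuple.
    """
    try:
        count = len(os.listdir(path))
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: cannot read {path}: {exc}"
    if count >= limit * crit_ratio:
        return CRITICAL, f"CRITICAL: {count} entries in {path} (limit {limit})"
    if count >= limit * warn_ratio:
        return WARNING, f"WARNING: {count} entries in {path} (limit {limit})"
    return OK, f"OK: {count} entries in {path} (limit {limit})"


def main(argv):
    """Print the status line Nagios expects and return the exit status."""
    if len(argv) != 2:
        print("UNKNOWN: usage: check_build_entries <build-folder>")
        return UNKNOWN
    status, message = check_entry_count(argv[1])
    print(message)
    return status
```

To use this as a plugin, a wrapper script would end with `sys.exit(main(sys.argv))`. The same pattern of returning a non-zero status on failure is what the Hudson-run scripts noted above would need in order for failed builds to be detected.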