Analytics/EventLogging/Data representations
This page gives an overview of the various representations of EventLogging data available on the WMF production cluster, and the expectations around each representation.
In a nutshell: when consuming EventLogging data, rely only on the log database available from m2 replicas, like analytics-store.eqiad.wmnet. Other representations might not get updated, might not receive fix-ups, or may (on purpose) give you unvalidated data.
MySQL / MariaDB database on m2
This database is the best place to consume EventLogging data from.
Available as the log database on m2 replicas, such as analytics-store.eqiad.wmnet.
Only validated events enter the database.
In case of bugs, this database is the only place that gets fixes like cleanup of historic data, or live fixes.
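As a sketch of how one might query this database: EventLogging tables are named after the schema and its revision id. The schema "Echo", revision 7731316, and the event_action column below are hypothetical, purely for illustration.

```python
# Hedged sketch of querying EventLogging data from the log database.
# EventLogging tables are named <SchemaName>_<revisionId>; the schema
# "Echo", revision 7731316, and the "event_action" column are
# hypothetical examples, not guaranteed to exist.
def build_query(schema, revision, since):
    """Build a SQL query for all events of one schema revision
    recorded at or after the given MediaWiki-style timestamp."""
    table = "{}_{}".format(schema, revision)
    return ("SELECT timestamp, event_action FROM {} "
            "WHERE timestamp >= '{}' "
            "ORDER BY timestamp".format(table, since))

query = build_query("Echo", 7731316, "20140101000000")
# Run the resulting query against the log database on an m2 replica.
```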
'all-events' JSON log files
Use this data source only to debug issues around ingestion into the m2 database.
Entries are JSON objects.
Only validated events get written.
In case of bugs, historic data does not get fixed.
Those files are available as:
stats1002:/a/eventlogging/archive/all-events.log-$DATE.gz
stats1003:/srv/eventlogging/archive/all-events.log-$DATE.gz
eventlog1001:/var/log/eventlogging/...
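Assuming one JSON object per line (the usual layout of such log archives, though that is an assumption here), the files can be read like this:

```python
import gzip
import json

def read_all_events(path):
    """Yield one decoded event dict per line of an all-events log.
    Assumes one JSON object per line; archived files are gzipped,
    current files are plain text."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `read_all_events("all-events.log-20140101.gz")` (a hypothetical file name) would yield each validated event of that day as a dict.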
Raw client and server side log files
Use this data source only to debug issues around ingestion into the m2 database.
Entries are the raw parameters of the event.gif request; they are not decoded at all.
In case of bugs, historic data does not get fixed, and hot-fixes need not reach those files.
Those files are available as:
stats1002:/a/eventlogging/archive/client-side-events.log-$DATE.gz
stats1002:/a/eventlogging/archive/server-side-events.log-$DATE.gz
stats1003:/srv/eventlogging/archive/client-side-events.log-$DATE.gz
stats1003:/srv/eventlogging/archive/server-side-events.log-$DATE.gz
eventlog1001:/var/log/eventlogging/...
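Since the raw entries are undecoded, recovering an event means URL-decoding the payload yourself. The sketch below assumes the payload is a percent-encoded JSON object after the '?', optionally followed by ';'-separated trailing metadata; that framing is an assumption, not a documented format.

```python
import json
from urllib.parse import unquote

def decode_raw_entry(request):
    """Decode the percent-encoded JSON payload of a raw event.gif entry.
    Assumes the payload is the URL-encoded JSON object after '?',
    optionally followed by ';'-separated trailing metadata."""
    payload = request.split("?", 1)[-1].split(";", 1)[0]
    return json.loads(unquote(payload))
```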
Kafka
EventLogging now feeds the following topics in Kafka:
- eventlogging_valid_mixed: All valid events that come from all schemas.
- eventlogging_<schemaName>: All events from the given schema; there is one such topic per schema.
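A minimal consumer sketch, assuming the kafka-python client and a placeholder broker address; each Kafka message value is assumed to carry one JSON-encoded event:

```python
import json

def decode_event(raw_bytes):
    """Assumed framing: each Kafka message value is one JSON event."""
    return json.loads(raw_bytes.decode("utf-8"))

# With kafka-python (the broker address is a placeholder, not a real host):
# from kafka import KafkaConsumer
# consumer = KafkaConsumer("eventlogging_valid_mixed",
#                          bootstrap_servers="broker.example.org:9092")
# for message in consumer:
#     event = decode_event(message.value)
```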
MongoDB
EventLogging data is no longer fed into MongoDB since 2014-02-13.
The EventLogging data in MongoDB did not appear to get used.
ZMQ
ZMQ is available from eventlog1001.
In case of bugs, historic data cannot get fixed :-)
Data coming from the forwarders (ports 8421, 8422) is not validated, and hot-fixes need not reach it.
Data coming from the processors (ports 8521, 8522) and the multiplexer (port 8600) is validated.
These streams will cease working soon; Analytics is working to move all streams to Kafka.
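For completeness, a subscription sketch using pyzmq; the port semantics follow the notes above, while the message framing (one JSON object per message) is an assumption:

```python
def zmq_endpoint(host, port):
    """tcp:// endpoint for an EventLogging ZMQ stream.
    Per the notes above: 8421/8422 (forwarders) are unvalidated;
    8521/8522 (processors) and 8600 (multiplexer) are validated."""
    return "tcp://{}:{}".format(host, port)

# With pyzmq (message framing is assumed, not documented here):
# import zmq
# sock = zmq.Context().socket(zmq.SUB)
# sock.connect(zmq_endpoint("eventlog1001", 8600))
# sock.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to everything
# while True:
#     print(sock.recv_string())
```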
Nginx pipeline
Since EventLogging data typically comes in through HTTPS, and the EventLogging payload is encoded in the URL, EventLogging data is available in all the log targets from the SSL terminators.
In case of bugs, historic data does not get fixed, and hot-fixes need not reach this pipeline.
Varnish pipeline
Since EventLogging data is extracted at the bits caches, and the EventLogging payload is encoded in the URL, EventLogging data is available in all log targets from the bits caches.
In case of bugs, historic data does not get fixed, and hot-fixes need not reach this pipeline.