Logs

From Wikitech

This page is about server log files. For IRC channel logs, see e.g. http://wm-bot.wmflabs.org/

Logs of several sorts are generated across the cluster and collected in a single location replicated on some machines. You can explore most logs through the Kibana front-end at https://logstash.wikimedia.org/.

fluorine:/a/mw-log/

All cluster-wide logs are aggregated here (configured through $wmfUdp2logDest; see also wmgMonologChannels). There are dozens of log files, amounting to around 15 GB compressed per day as of April 2015. Some are not sent to logstash (settings) and some are sampled; log archives are kept for a variable amount of time, up to 180 days.
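The per-source, per-day archive layout can be sketched with a hypothetical local mock (the directory name, file names and dates below are illustrative only, not real data):

```shell
# Mock of the archive layout: one compressed file per log source per day.
mkdir -p archive
touch archive/api.log-20151101.gz archive/api.log-20151102.gz

# List the most recent archived days for a given log source:
ls archive/api.log-*.gz | tail -n 2
```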

Source: Cluster-wide

  • exception.log: Exceptions, exposed to users in simplified form with a hexadecimal fingerprint (e.g. for "[1903eff7] 2013-06-18 02:39:00: Fatal exception of type MWException", grep the exception log for "1903eff7" to find the complete stack trace). See bug 38095 for background.
  • antispoof.log: Collision check passes and failures from the AntiSpoof extension. This checks for strings that look the same using different Unicode characters (such as spoofed usernames).
  • apache2.log: aggregated Apache error logs, see #syslog
  • api.log: API requests (including URLs and some agent info, like username and IP address). Sampled 1:1000 from 2014-12-15 to some time in 2015, flushed every 30 days as of November 2015.
  • badpass.log: Failed login attempts to wikis.
  • captcha.log: Captcha attempts (both failed and successful attempts).
  • centralauth.log (2013-05-09–), centralauth-bug39996.log, centralauthrename.log (2014-07-14–): (temporary) debug logs for bugzilla:35707, bugzilla:39996, bugzilla:67875. In theory, rare events; can include username and page visited/request made.
  • CirrusSearchRequests.log: Logs every request made between MediaWiki and the Elasticsearch cluster.
  • dberror.log: Database errors (invalid queries, missing tables, deadlocks, lock-wait timeouts, disconnections).
  • dbperformance.log: DB transactions that hold DB locks open for a long time while running slow functions.
  • exec.log: Errors from shell commands run by MediaWiki via wfShellExec (logs the command and error string).
  • external.log: ExternalStore blob fetch failures (see External storage)
  • fatal.log: Fatal PHP errors during web requests, responded to with a Wikimedia Error page. (aggregated graph). With HHVM, they generally appear under "hhvm" logs in logstash.[1]
  • filebackend-ops.log: FileBackendStore operation failures (i.e. backend errors that happen during user file uploads).
  • generated-pp-node-count.log: Parses with a high preprocessor node count (typically slow parses of very large and complex articles).
  • gettingstarted.log: ?
  • imagemove.log: Page renames in the File namespace (both failed and successful renames).
  • memcached-serious.log: Memcached access failures (affects caching and storage of ephemeral data, like rate limiting counters and advisory locks).
  • poolcounter.log: PoolCounter failures (connection problems, excess queue size, wait timeouts).
  • redis.log: Redis query and connection failures (might involve sessions, job queues, and some other assorted features).
  • redis-jobqueue.log: Redis query and connection failures from JobQueueRedis.
  • resourceloader.log: Exceptions related to ResourceLoader.
  • runJobs.log: Tracks job queue activity, including errors (both failed and successful runs).
    • Can be used to produce stats on jobs run on the various wikis, e.g. with Tim's perl ~/job-stats.pl runJobs.log.
  • slow-parse.log (since May 2012; 6 months archive)
  • spam.log: SimpleAntiSpam honeypot hits from bots (attempted user actions are discarded).
  • swift-backend.log: Errors in the SwiftFileBackend class (timeouts and HTTP 500 type errors for file and listing reads/writes).
  • temp-debug.log: Used for temporary logging of misc things during live debug sessions.
  • test2wiki.log: Full wfDebug log of test2.wikipedia.org.
  • testwiki.log: Full wfDebug log of test.wikipedia.org.
  • thumbnail.log: Failed thumbnail transformations (e.g. missing file, conversion failure, 0-byte output files).
  • xff.log: User agent and IP data for POST requests.
  • zero.log (since May 2013)
  • archive: Directory holding historical versions of the above logs (one compressed file per log source per day).
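The fingerprint lookup described for exception.log above can be sketched with a self-contained mock (the sample log line below is fabricated; on fluorine the real file is /a/mw-log/exception.log):

```shell
# Fabricated sample line in the exception.log format; the hexadecimal
# fingerprint shown to the user keys the full stack trace.
printf '[1903eff7] 2013-06-18 02:39:00: Fatal exception of type MWException\n' > exception.log

# Against the real aggregate this would be:
#   grep 1903eff7 /a/mw-log/exception.log
grep -c '1903eff7' exception.log   # prints 1
```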

syslog

The syslog for all application servers can be found in apache2.log on fluorine, or in /srv/syslog/apache.log on lithium. This includes things like segmentation faults.
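A minimal sketch of pulling segmentation faults out of the aggregated Apache syslog (the sample line below is fabricated; the real file on lithium is /srv/syslog/apache.log):

```shell
# Fabricated syslog-style line for illustration only.
printf 'Nov  3 12:00:01 mw1010 apache2: child pid 1234 exit signal Segmentation fault (11)\n' > apache.log

# Against the real aggregate this would be:
#   grep 'Segmentation fault' /srv/syslog/apache.log
grep -c 'Segmentation fault' apache.log   # prints 1
```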

5xx errors

"Wikimedia error" pages (503) usually[vague] end up in varnishlog, which is available on individual machines and aggregated in TODO. 504, 500 and other 5xx errors are TODO.

tin:/var/log/l10updatelog/l10update.log

Source: tin

  • l10update.log: Error log for LocalisationUpdate runs.

vanadium:/var/log/eventlogging/

  • various: Logs of EventLogging entries. Potentially useful, in case their transformation into SQL and MongoDB records fails.

Deprecated properties and features in clients

Harvesting deprecation warnings from clients in production is certainly possible: like translatewiki.net, Wikimedia does it. Deprecated properties[2] and features[3] use mw.track[4] to emit an event.

The WikimediaEvents extension forwards these events to EventLogging (at a sampled rate, of course).[5] They are then available privately in the analytics database and, anonymised, in Graphite.[6]

When something, like jQuery Migrate, doesn't have nice keys, you'll have to make do with the full descriptive sentence of the warning (as done at TWN).

Request logs

Logs of any kind of request, e.g. viewing a wiki page, editing, using the API, loading an image.

  • Analytics/Data/Webrequest: "wmf.webrequest" is the name of an unsampled request archive in Hive. We started deleting older wmf.webrequest data in March 2015. We currently keep 62 days.

erbium:/a/log/webrequest/

The squid (now varnish) request logs; see Squid logging#Log files.

The 1:1000 sampled logs are used for about 15 monthly and quarterly reports and for day-to-day operations (source).
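With a 1:1000 sample, total request volume can be estimated by scaling the sampled line count; a minimal sketch with a fabricated three-line sample file standing in for a real sampled log:

```shell
# Three fabricated sampled request lines (placeholders, not real entries).
printf 'req1\nreq2\nreq3\n' > sampled.log

# Each sampled line represents roughly 1000 real requests.
lines=$(wc -l < sampled.log)
echo $(( lines * 1000 ))   # prints 3000
```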

Beta cluster

The mw:Beta cluster in labs has a similar logging configuration to production. Various server logs are written to the remote syslog server deployment-fluorine.eqiad.wmflabs in /srv/mw-log.

Apache access logs are written to /var/log/apache2/other_vhosts_access.log on each beta cluster host.
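As an illustration, status codes can be pulled out of that access log format with awk (the sample line below is fabricated; on a beta host the real file is /var/log/apache2/other_vhosts_access.log):

```shell
# Fabricated line in Apache's other_vhosts_access.log combined format.
printf 'en.wikipedia.beta.wmflabs.org:80 10.0.0.1 - - [03/Nov/2015:12:00:00 +0000] "GET / HTTP/1.1" 200 1234\n' > access.log

# The status code is the second-to-last whitespace-separated field here:
awk '{print $(NF-1)}' access.log   # prints 200
```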

Dead

Lucene (search)

Each host logs to /a/search/log/log (now less noisy); see Search#Trouble for how to identify which host serves which pool, etc.

fenari:/home/wikipedia/syslog

Source: All apaches

  • apache.log: Error log of all apaches (includes stderr of PHP, so PHP notices, PHP warnings, etc.)
    • Use fatalmonitor to aggregate this into a (tailing) report
    • This has been deprecated in favor of fluorine:/a/mw-log/apache2.log and logstash.

fenari:/var/log/

Source: Machine-specific logs