Swift/Logging and Metrics

This project is active.
Swift
Due: 2012-04-01

Trending graphs

Ganglia graphs several metrics (count, average duration, 90th-percentile duration, and maximum duration) for HTTP queries against Swift. These are broken out by HTTP method (GET, HEAD, PUT, etc.) and by the status code of the response (200, 204, 404, etc.).
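
As an illustration of what these four metrics mean, here is a minimal sketch that groups request durations by method and status code and computes the same statistics. It is not the code behind the Ganglia graphs: the function name `summarize`, the input tuple layout, and the nearest-rank percentile definition are all assumptions made for this example.

```python
from collections import defaultdict

def summarize(samples):
    """Group (method, status, duration) samples and compute per-group stats.

    Returns {(method, status): {"count", "avg", "p90", "max"}}.
    The 90th percentile uses the nearest-rank definition, which is an
    assumption; the real graphs may be computed differently.
    """
    groups = defaultdict(list)
    for method, status, duration in samples:
        groups[(method, status)].append(duration)

    summary = {}
    for key, durations in groups.items():
        durations.sort()
        n = len(durations)
        # Nearest rank: ceil(0.9 * n), done in integer arithmetic.
        rank = (9 * n + 9) // 10
        summary[key] = {
            "count": n,
            "avg": sum(durations) / n,
            "p90": durations[rank - 1],
            "max": durations[-1],
        }
    return summary
```

Called with `[("GET", 404, 0.3171), ("GET", 404, 0.2171), ("GET", 200, 0.1088)]`, it returns one entry for `("GET", 404)` with a count of 2 and one for `("GET", 200)` with a count of 1.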

Ganglia graphs for:

Monitoring

Nagios currently connects to each Swift proxy server on port 80 to verify that the proxy is running. It does not yet check any other Swift processes, but it should.
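
The kind of check described here can be sketched as a plain TCP connect test. This is not the actual Nagios check definition in use; the function name `check_tcp` and its parameters are hypothetical. The only external convention relied on is that Nagios plugins signal OK with exit status 0 and CRITICAL with exit status 2. Because the port is a parameter, the same sketch could be pointed at other Swift processes once their ports are known.

```python
import socket

# Nagios plugin convention: exit status 0 means OK, 2 means CRITICAL.
OK, CRITICAL = 0, 2

def check_tcp(host, port=80, timeout=5.0):
    """Return (status, message) for a plain TCP connect to host:port."""
    try:
        # create_connection resolves the host and completes the handshake;
        # the socket is closed again when the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, "OK - %s:%d accepts connections" % (host, port)
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return CRITICAL, "CRITICAL - %s:%d: %s" % (host, port, exc)
```

A wrapper script would print the message and call `sys.exit(status)` so that Nagios can pick up the result.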

Log lines

proxy examples

A few example log lines:

  • The object doesn't exist; the 404 handler falls through, finds the object, and returns it successfully.
    • note: the PUT is logged before the GET (the PUT happens entirely within the GET, so it finishes first; logging is written at the end of a query, not the beginning)
    • the GET is logged as 404 even though the object was successfully returned (contrast with the third log entry here, where the object is completely missing)
    • the HTTP response from the client's perspective was 200
 Jan 27 01:39:08 copper proxy-server 127.0.0.1 127.0.0.1 27/Jan/2012/01/39/08 PUT /v1/AUTH_ade95207-9bcc-4bc9-bb67-06b417895b49/wikipedia-commons-local-thumb.97/9/97/Subaru_XV.jpg/800px-Subaru_XV.jpg HTTP/1.0 201 - - test%3Atester%2CAUTH_tka3106db61d6d47d9801162a4d9c3d174 75403 - - - - 0.1305
 Jan 27 01:39:08 copper proxy-server 208.80.152.165 208.80.152.165 27/Jan/2012/01/39/08 GET /v1/AUTH_ade95207-9bcc-4bc9-bb67-06b417895b49/wikipedia-commons-local-thumb.97/9/97/Subaru_XV.jpg/800px-Subaru_XV.jpg HTTP/1.0 404 - curl/7.19.7%20%28x86_64-pc-linux-gnu%29%20libcurl/7.19.7%20OpenSSL/0.9.8k%20zlib/1.2.3.3%20libidn/1.15 - - - - - - 0.3171
  • The object exists and is simply returned.
    • note the shorter transaction time (0.1088 vs. 0.3171)
    • note the 200 HTTP result
 Jan 27 01:40:18 copper proxy-server 208.80.152.165 208.80.152.165 27/Jan/2012/01/40/18 GET /v1/AUTH_ade95207-9bcc-4bc9-bb67-06b417895b49/wikipedia-commons-local-thumb.97/9/97/Subaru_XV.jpg/800px-Subaru_XV.jpg HTTP/1.0 200 - curl/7.19.7%20%28x86_64-pc-linux-gnu%29%20libcurl/7.19.7%20OpenSSL/0.9.8k%20zlib/1.2.3.3%20libidn/1.15 - - 75403 - - - 0.1088


  • The object doesn't exist.
    • note there is no PUT
    • note the GET is logged as 404, exactly as in the first log entry above, where the object was successfully returned
    • the HTTP response from the client's perspective was 404
 Jan 27 01:42:24 copper proxy-server 208.80.152.165 208.80.152.165 27/Jan/2012/01/42/24 GET /v1/AUTH_ade95207-9bcc-4bc9-bb67-06b417895b49/wikipedia-commons-local-thumb.97/9/97/Subaru_XaoeuaoeuaeoV.jpg/800px-Subaaoeuaoeuaeoru_XV.jpg HTTP/1.0 404 - curl/7.19.7%20%28x86_64-pc-linux-gnu%29%20libcurl/7.19.7%20OpenSSL/0.9.8k%20zlib/1.2.3.3%20libidn/1.15 - - - - - - 0.2171
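
Since a GET logged as 404 is ambiguous on its own, the examples above suggest a heuristic for telling the two cases apart: look for a successful PUT of the same path shortly before the GET. The sketch below illustrates that idea only. The function name `classify_404s`, the labels "filled" and "missing", and the `window` size are hypothetical, and a genuine 404 for a path that was PUT within the window would be mislabeled.

```python
from collections import deque

def classify_404s(entries, window=50):
    """Yield (request, label) for each GET logged with status 404.

    entries: iterable of (method, request, status) tuples in log order.
    label is "filled" when a PUT with status 201 for the same request
    path appears among the previous `window` entries, else "missing".
    """
    recent_puts = deque(maxlen=window)
    for method, request, status in entries:
        if method == "GET" and status == 404:
            label = "filled" if request in recent_puts else "missing"
            yield request, label
        # Record every entry (None for non-PUTs) so that the window
        # counts log lines rather than only PUTs.
        is_put = method == "PUT" and status == 201
        recent_puts.append(request if is_put else None)
```

Applied to the three examples above in order, the first GET would be labeled "filled" and the last one "missing"; the GET that returned 200 is not reported at all.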

full format description for proxy logs

The format is defined in /usr/lib/pymodules/python2.6/swift/proxy/server.py, lines 1697-1714:

1697         self.access_logger.info(' '.join(quote(str(x)) for x in (
1698                 client or '-',
1699                 req.remote_addr or '-',
1700                 time.strftime('%d/%b/%Y/%H/%M/%S', time.gmtime()),
1701                 req.method,
1702                 the_request,
1703                 req.environ['SERVER_PROTOCOL'],
1704                 status_int,
1705                 req.referer or '-',
1706                 req.user_agent or '-',
1707                 req.headers.get('x-auth-token', '-'),
1708                 getattr(req, 'bytes_transferred', 0) or '-',
1709                 getattr(response, 'bytes_transferred', 0) or '-',
1710                 req.headers.get('etag', '-'),
1711                 req.headers.get('x-trans-id', '-'),
1712                 logged_headers or '-',
1713                 trans_time,
1714             )))
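
To make the field order concrete, here is a minimal sketch of a parser that splits a proxy log line back into the 16 values in the order they are logged. The field names in `PROXY_LOG_FIELDS` and the function `parse_proxy_log_line` are labels chosen for this example, not identifiers defined by Swift. It also assumes the syslog prefix seen in the example lines on this page (timestamp, hostname, then `proxy-server`) comes before the fields.

```python
from urllib.parse import unquote

# Illustrative labels for the 16 logged values, in logging order.
PROXY_LOG_FIELDS = (
    "client", "remote_addr", "timestamp", "method", "request", "protocol",
    "status", "referer", "user_agent", "auth_token", "bytes_in", "bytes_out",
    "etag", "trans_id", "logged_headers", "trans_time",
)

def parse_proxy_log_line(line):
    """Return a dict keyed by PROXY_LOG_FIELDS for one proxy log line."""
    # Drop the syslog prefix, e.g. "Jan 27 01:39:08 copper proxy-server ".
    _, found, payload = line.partition("proxy-server ")
    if not found:
        raise ValueError("not a proxy-server log line")
    # Every value went through quote(), so spaces inside a value are
    # percent-encoded and splitting on whitespace is safe.
    values = payload.split()
    if len(values) != len(PROXY_LOG_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(PROXY_LOG_FIELDS), len(values))
        )
    record = {k: unquote(v) for k, v in zip(PROXY_LOG_FIELDS, values)}
    record["status"] = int(record["status"])
    record["trans_time"] = float(record["trans_time"])
    return record
```

Fields that were logged as `-` are left as the string `-`, so callers need to treat that value as "not present" before converting, for example, `bytes_out` to an integer.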

Additional monitoring we need

Please add suggestions here of additional monitoring that would be useful.