Analytics/PageviewAPI
This page documents the Pageview API (v1), a public API developed and maintained by the Wikimedia Foundation that serves analytical data about article pageviews of Wikipedia and its sister projects. With it, you can get pageview trends for specific articles or projects, filter by agent type or access method, choose different time ranges and granularities, and get the most viewed articles of a given project and timespan. Have fun!
Quick Start
Technical Documentation: https://wikimedia.org/api/rest_v1/?doc (includes interactive examples).
Pageview counts by article
Get a pageview count timeseries of en.wikipedia's article Albert Einstein for the month of October 2015:
GET
http://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Albert_Einstein/daily/2015100100/2015103100
Get a pageview count timeseries of de.wikipedia's article Johann Wolfgang von Goethe from October 13th 2015 to October 27th 2015, counting only the pageviews generated by human users:
GET
http://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/de.wikipedia/all-access/user/Johann_Wolfgang_von_Goethe/daily/2015101300/2015102700
Get the number of pageviews of es.wiktionary's entry hoy generated via mobile web on November 1st, 2015:
GET
http://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/es.wiktionary/mobile-web/all-agents/hoy/daily/2015110100/2015110100
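The per-article URLs above all follow the same pattern, so they can be assembled programmatically. The following is a minimal sketch, not an official client: the function name `per_article_url` and its default arguments are hypothetical, the `https` scheme follows the "How to access" section, and percent-encoding of characters other than spaces in the title is an assumption.

```python
from urllib.parse import quote

# Root of the pageview endpoints; https as stated in "How to access".
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def per_article_url(project, article, start, end,
                    access="all-access", agent="all-agents",
                    granularity="daily"):
    """Build a per-article request URL; start and end are YYYYMMDDHH strings."""
    # The examples above write titles with underscores instead of spaces.
    # Percent-encoding the remaining characters is an assumption of this sketch.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE_URL, "per-article", project, access, agent,
                     title, granularity, start, end])


if __name__ == "__main__":
    print(per_article_url("en.wikipedia", "Albert Einstein",
                          "2015100100", "2015103100"))
```

Passing `agent="user"` or `access="mobile-web"` reproduces the other two examples in this section.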
Slice and dice pageview counts
Get a daily pageview count timeseries of all projects for the month of October 2015:
GET
http://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/all-access/all-agents/daily/2015100100/2015103100
Get an hourly timeseries of all projects' pageviews generated by human users on the mobile app on October 1st, 2015:
GET
http://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/mobile-app/user/hourly/2015100100/2015100123
Get the number of pageviews of ca.wikipedia generated by spiders on mobile web on November 1st, 2015:
GET
http://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/ca.wikipedia/mobile-web/spider/daily/2015110100/2015110100
Most viewed articles
Get the top 1000 most visited articles from en.wikipedia for October 10th, 2015:
GET
http://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2015/10/10
Get the top 1000 articles from pt.wikipedia visited via the mobile app on November 1st, 2015:
GET
http://wikimedia.org/api/rest_v1/metrics/pageviews/top/pt.wikipedia/mobile-app/2015/11/01
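Unlike the other endpoints, the top URLs above spell the date as separate year, month and day path segments, with month and day zero-padded. A small sketch of building such a URL from a `datetime.date` might look like this; the helper name `top_url` is hypothetical.

```python
from datetime import date

BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def top_url(project, day, access="all-access"):
    """Build a 'top' request URL for one calendar day (a datetime.date)."""
    # The strftime-style format spec yields zero-padded YYYY/MM/DD segments.
    return f"{BASE_URL}/top/{project}/{access}/{day:%Y/%m/%d}"


if __name__ == "__main__":
    print(top_url("pt.wikipedia", date(2015, 11, 1), access="mobile-app"))
```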
The API
What is it?
The Pageview API is a collection of REST endpoints that serve analytical data about pageviews in Wikimedia's projects. It's developed and maintained by WMF's Analytics and Services teams, and is implemented using Analytics' Hadoop cluster and RESTBase. This API is meant to be used by anyone interested in pageview statistics on Wikimedia wikis: Foundation, communities, and the rest of the world.
How to access
The API is accessible via https at wikimedia.org/api/rest_v1. As it is public, it doesn't need authentication and it supports CORS. The URLs are structured like this:
/metrics/pageviews/{endpoint}/{parameter 1}/{parameter 2}/.../{parameter N}
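To illustrate the URL structure and the absence of authentication, here is a hedged sketch using only the Python standard library. The names `pageviews_url` and `fetch_json` are hypothetical, the `User-Agent` value is a placeholder, and the sketch assumes the response body is JSON.

```python
import json
import urllib.request
from urllib.parse import quote

API_ROOT = "https://wikimedia.org/api/rest_v1"


def pageviews_url(endpoint, *params):
    """Join an endpoint name and its positional parameters into a request URL."""
    # Each parameter becomes one path segment, percent-encoded if necessary.
    segments = [quote(str(param), safe="") for param in params]
    return "/".join([API_ROOT, "metrics", "pageviews", endpoint, *segments])


def fetch_json(url, opener=urllib.request.urlopen):
    """GET a URL and decode the body as JSON (assumed format); no credentials are sent."""
    # Placeholder User-Agent; replace it with something identifying your client.
    request = urllib.request.Request(
        url, headers={"User-Agent": "pageview-example/0.1"})
    with opener(request) as response:
        return json.load(response)


if __name__ == "__main__":
    url = pageviews_url("aggregate", "all-projects", "all-access",
                        "all-agents", "daily", "2015100100", "2015103100")
    print(url)
    # fetch_json(url) would perform the actual network request.
```

The `opener` argument exists only so the function can be exercised without network access; by default it uses `urllib.request.urlopen`, which raises `urllib.error.HTTPError` on a 404 response.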
Reference
Please see AQS's RESTBase docs for a complete and interactive technical reference on Pageview API endpoints.
Updates and backfilling
The data is loaded at the end of the timespan in question. So data for 2015-12-01 will be loaded on 2015-12-02 00:00:00 UTC; data for 2015-11-10 18:00:00 UTC will be loaded on 2015-11-10 19:00:00 UTC; and so on. The loading can take up to 5 hours depending on the endpoint and timespan. See the Gotchas section for more details.
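The loading rule can be turned into a small calculation: given a timestamp and a granularity, compute when loading starts (the end of the timespan) and the latest time by which it should be finished (5 hours later). This is only a sketch of the rule as described here. The function name `loading_window` is hypothetical, and treating the 5-hour figure as a hard upper bound is an assumption.

```python
from datetime import datetime, timedelta, timezone

# "Up to 5 hours" according to the paragraph above.
MAX_LOADING_DELAY = timedelta(hours=5)


def loading_window(timestamp, granularity):
    """Return (earliest, latest) UTC datetimes at which a data point should be loaded.

    timestamp is a YYYYMMDDHH string; granularity is "hourly" or "daily".
    """
    start = datetime.strptime(timestamp, "%Y%m%d%H").replace(tzinfo=timezone.utc)
    if granularity == "hourly":
        end = start + timedelta(hours=1)
    elif granularity == "daily":
        # A daily data point covers the whole UTC day, whatever the hour digits say.
        end = start.replace(hour=0) + timedelta(days=1)
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return end, end + MAX_LOADING_DELAY


if __name__ == "__main__":
    earliest, latest = loading_window("2015120100", "daily")
    print(earliest.isoformat(), latest.isoformat())
```

A client could compare the current time with the second value before deciding whether a missing data point is more likely a real zero or data that has not been loaded yet.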
The API serves data starting at 2015-09-01. There are plans to backfill older data, but the Analytics team doesn't yet know when that might happen.
Gotchas
- 404 means zero or not loaded yet
- At some point you may get a 404 not found response from the API. Sometimes this means that there are 0 pageviews for the project, timespan and filters you specified in the query. It can also happen when your client requests data for today and the corresponding data has not yet been loaded into the API's database (see Updates and backfilling). The problem is that the API, for implementation reasons, cannot distinguish between actual zeros and data that hasn't been loaded yet. For now, it's up to the user to handle that.
- 404s within timeseries
- Because of the same caveat (404 means zero or not loaded yet), if you request a timeseries from the API, you might get no data for the dates that have 0 pageviews. This may create holes in the timeseries and break charting libraries. For now, it's up to the user to handle that and fill in the missing zeros.
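One way a client might fill in the missing zeros is to walk the requested range step by step and insert a 0 wherever no data point was returned. The sketch below assumes each returned item is a dict with a `timestamp` key in YYYYMMDDHH form and a `views` key; the function name `fill_missing_zeros` is hypothetical.

```python
from datetime import datetime, timedelta

STEP = {"hourly": timedelta(hours=1), "daily": timedelta(days=1)}


def fill_missing_zeros(items, start, end, granularity="daily"):
    """Return one {'timestamp', 'views'} dict per step from start to end, inclusive.

    items: dicts assumed to carry 'timestamp' (YYYYMMDDHH) and 'views' keys.
    start, end: YYYYMMDDHH strings, as used in the request URL.
    """
    views_by_timestamp = {item["timestamp"]: item["views"] for item in items}
    current = datetime.strptime(start, "%Y%m%d%H")
    last = datetime.strptime(end, "%Y%m%d%H")
    filled = []
    while current <= last:
        key = current.strftime("%Y%m%d%H")
        # Timestamps absent from the response are treated as zero pageviews.
        filled.append({"timestamp": key, "views": views_by_timestamp.get(key, 0)})
        current += STEP[granularity]
    return filled


if __name__ == "__main__":
    sample = [{"timestamp": "2015110100", "views": 7},
              {"timestamp": "2015110300", "views": 2}]
    for point in fill_missing_zeros(sample, "2015110100", "2015110300"):
        print(point["timestamp"], point["views"])
```

Note that this treats every hole as a zero, so for very recent dates it can hide data that simply has not been loaded yet.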
Sample app
Here is a simple web application sample that shows how to access the Analytics Query Service via JavaScript.
Clients
As of December 2015 the API is pretty new but there are a few clients already available:
Changelog
- 2015-11-TODO
- Initial release. Featuring 3 endpoints for pageview metrics: per-article, aggregate and top. Some endpoints do not support all granularities yet.