Nova Resource:Dumps
Resource Type | Resource Type::project |
---|---|
Project Name | [[Project::dumps]] |
Monitoring | nagf |
Admins | {{#arraymap:User:hydriz,User:novaadmin|,|x|* Member::x|\n}} |
Members | {{#arraymap:User:hydriz,User:novaadmin|,|x|* Member::x|\n}} |
Documentation
Dumps
Description
Description::This is a project that archives the public datasets generated by Wikimedia.
Purpose
Purpose::Archive the public Wikimedia datasets.
Anticipated time span
Anticipated Time Span::indefinite
Project status
Project Status::currently running
Contact address
Contact Address::https://groups.google.com/forum/#!forum/wikiteam-discuss
Willing to take contributors or not
not willing
Subject area narrow or broad
broad
Project information
Introduction
This project was created to provide a dedicated space for transferring Wikimedia dump files to the Internet Archive. These dumps were created as a possible backup in case of cluster-wide hardware failure, and they are also often used by researchers and bots. Sometimes the files are also used to fork a Wikimedia project, when many members of a project have aims that differ from the original Wikimedia goal.
More information about the archiving process is available at Dumps/Archive.org
Data currently being archived
Here is some information and links regarding the data that this project is archiving:
- Wikimedia main database dumps
- Wikimedia incremental dumps
- Wikidata JSON dumps
- Wikimania videos
- OpenStreetMap datasets
Servers
- dumps-N (where N is an integer): Main archiving servers
- dumps-stats: Wikimedia data manipulation, including the dumps above and other data relevant to Wikimedia research.
Storage:
- Before the eqiad migration we used to have a 900 GB quota (hardly sufficient for comfortable work).
- Currently all heavy operations are conducted on /data/scratch/. We keep to a soft limit of 3 TB of space; this disk usage is always temporary, and the data is deleted once it has been pushed to the Archive.
- Everything is retained locally only for very short periods, just the time needed to pack it for archive.org.
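The soft limit described above could be checked with a small script before starting a heavy operation. The following is only a minimal sketch, not the project's actual tooling: the function names `dir_size` and `over_soft_limit` are hypothetical, the 3 TB figure is read as decimal terabytes (an assumption), and the directory to measure is a parameter because the text does not name the project's own subdirectory under /data/scratch/.

```python
import os

# Soft limit from the text; treating "3 TB" as decimal terabytes is an assumption.
SOFT_LIMIT_BYTES = 3 * 10**12
# Path from the text; in practice a project-specific subdirectory may be more appropriate.
SCRATCH_DIR = "/data/scratch"


def dir_size(path):
    """Return the total size in bytes of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Do not follow symlinks, so linked data is not counted twice.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file may have been deleted while we were walking the tree.
                pass
    return total


def over_soft_limit(used_bytes, limit_bytes=SOFT_LIMIT_BYTES):
    """Return True if used_bytes is strictly above the soft limit."""
    return used_bytes > limit_bytes


if __name__ == "__main__":
    used = dir_size(SCRATCH_DIR)
    status = "over" if over_soft_limit(used) else "within"
    print(f"{SCRATCH_DIR}: {used / 10**9:.1f} GB used, {status} the soft limit")
```

Because the limit is a soft one, the sketch only reports the state and does not delete anything; cleanup still happens after the data has been pushed to the Archive.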
Links
- Wikimedia dumps server
- OpenStreetMap datasets
- Wikimedia Downloads collection on the Internet Archive
- OpenStreetMap data collection
Server admin log
2015-09-15
- 20:40 andrewbogott: resuming dumps-1
- 20:36 andrewbogott: suspending dumps-1 briefly to assess its performance impact
October 3
- 20:00 mutante: adding new member Nemo_bis
September 29
- 11:10 Hydriz: Reboot dumps-bot1, out of memory space...
September 2
- 11:18 Hydriz: Migrated bot instances over here from Incubator project.
August 3
- 11:11 Hydriz: Having persistent problems with accessing the gluster storage, grrr......
Instances for this project
{{#ask:Resource Type::instance[[Project::dumps]] |?Instance Type |?Image Id |?Public IP |?Number of CPUs |?RAM Size |?Amount of Storage |?Modification date |mainlabel=FQDN |format=broadtable |limit=20 |order=DESC |sort=Modification date |offset=0 }}