Nova Resource Talk:Mwoffliner
This documentation describes how to set up a virtual machine (VM) that can create ZIM files from Wikimedia projects.
Virtual machine creation
- Create a new VM with the maximum CPU/storage in the "mwoffliner" project
- On wikitech, choose ‘configure instance’ and then select the labs::lvm::srv class. This will create /srv (if nothing happens, run "sudo puppet agent -tv")
- Ask admin to configure a public IP
- Create a new hostname "mwofflinerX.wmflabs.org" here
Optimize filesystem
Then reformat the /srv filesystem. This is necessary to get more inodes:
sudo umount /dev/mapper/vd-second--local--disk
sudo mkfs.ext4 -T news /dev/mapper/vd-second--local--disk
sudo mount /srv
sudo rm -rf /srv/lost+found/
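The "-T news" usage type gives up some data space for a denser inode table. As a rough sketch, assuming the ratios found in a stock /etc/mke2fs.conf (one inode per 4096 bytes for "news", one per 16384 bytes by default; check your own file, as distributions may differ), the gain can be estimated as follows. The helper name estimate_inodes and the 100 GiB size are illustrative only.

```shell
#!/bin/sh
# Estimate how many inodes mkfs.ext4 would allocate for a filesystem.
# Assumption: inode count ~= filesystem size / bytes-per-inode ratio.
estimate_inodes() {
    size_bytes=$1
    bytes_per_inode=$2
    echo $(( size_bytes / bytes_per_inode ))
}

# Hypothetical 100 GiB volume
size=$(( 100 * 1024 * 1024 * 1024 ))
echo "default: $(estimate_inodes "$size" 16384) inodes"
echo "news:    $(estimate_inodes "$size" 4096) inodes"
```

After reformatting, "df -i /srv" shows the inode count that was actually allocated.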
As root, put the following code in the crontab:
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# m h dom mon dow command
@reboot umount /srv ; mount -o rw,errors=remount-ro,noatime,barrier=0,data=writeback,nobh /srv
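Once the VM has rebooted, it can be worth confirming that /srv really came back with these options. The following is a minimal sketch that reads lines in /proc/mounts format from standard input; mount_has_option is a hypothetical helper name. The kernel may report some options under a different spelling, so treat a miss as a reason to read the line by hand rather than as proof of a problem.

```shell
#!/bin/sh
# Succeeds if the mount point (arg 1) is listed with the option (arg 2).
# Reads lines in /proc/mounts format from stdin:
#   device mountpoint fstype options dump pass
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Typical use on the VM:
#   mount_has_option /srv noatime < /proc/mounts && echo "noatime is active"
```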
Ubuntu basic setup
Make a dist-upgrade:
sudo apt-get update
sudo apt-get dist-upgrade
Install mandatory packages:
sudo apt-get install nginx git htop screen rsync unzip g++ p7zip-full libzim-dev automake libtool pkg-config libmagic-dev \
redis-server emacs23 imagemagick advancecomp gifsicle pngquant nscd liblog-log4perl-perl zip cmake uuid-dev \
texinfo xapian-tools jpegoptim zlib1g-dev iotop
Increase standard system limits
Edit /etc/security/limits.conf
* - nofile unlimited
* - stack unlimited
Edit /etc/sysctl.conf and add at the end of the file
vm.overcommit_memory = 1
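When these steps are replayed on a VM that was partly configured before, it is easy to append the same setting twice. One possible way to avoid that is an idempotent append; ensure_line is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
ensure_line() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string, -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Typical use on the VM (as root):
#   ensure_line /etc/sysctl.conf 'vm.overcommit_memory = 1'
#   sysctl -p    # load /etc/sysctl.conf without waiting for the reboot
```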
Checkout Kiwix code
Clone these online code repositories:
sudo git clone git://git.code.sf.net/p/kiwix/kiwix /srv/kiwix-kiwix
sudo git clone git://git.code.sf.net/p/kiwix/tools /srv/kiwix-tools
sudo git clone git://git.code.sf.net/p/kiwix/maintenance /srv/kiwix-maintenance
sudo git clone git://git.code.sf.net/p/kiwix/other /srv/kiwix-other
sudo git clone https://gerrit.wikimedia.org/r/p/openzim.git /srv/openzim
Install and compile zimwriterfs
zimwriterfs is called by mwoffliner.js and transforms an HTML directory into a ZIM file:
cd /srv/openzim/zimwriterfs/
./autogen.sh
./configure
make
sudo make install
Configure redis
Edit /etc/redis/redis.conf
unixsocket /dev/shm/redis.sock
unixsocketperm 777
save ""
appendfsync no
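These settings put Redis on a Unix socket under /dev/shm and switch off snapshotting. After restarting the service, a quick check along the following lines can confirm that the socket answers. This is only a sketch; redis_socket_ok is a hypothetical helper built on the standard "redis-cli -s" option.

```shell
#!/bin/sh
# Report whether Redis answers on the given Unix socket.
redis_socket_ok() {
    sock=$1
    # -S: path exists and is a socket
    [ -S "$sock" ] || { echo "no socket at $sock"; return 1; }
    # redis-cli -s connects through a Unix socket; a healthy server replies PONG
    [ "$(redis-cli -s "$sock" ping 2>/dev/null)" = "PONG" ] || { echo "no PONG from $sock"; return 1; }
    echo "redis is up on $sock"
}

# Typical use on the VM, after "sudo service redis-server restart":
#   redis_socket_ok /dev/shm/redis.sock
```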
Compile & install Node.js
mwoffliner.sh needs Node.js 0.12:
cd /tmp
wget http://nodejs.org/dist/v0.12.2/node-v0.12.2.tar.gz
tar -xvf node-v0.12.2.tar.gz
cd node-v0.12.2/
./configure
make
sudo make install
cd /tmp
rm -rf node-v0.12.2*
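Since another Node.js may already be on the PATH, it can help to confirm which version is picked up after the install. A minimal sketch, with is_node_0_12 as a hypothetical helper name:

```shell
#!/bin/sh
# Succeeds if a version string such as "v0.12.2" belongs to the 0.12 series.
is_node_0_12() {
    case $1 in
        v0.12.*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Typical use on the VM:
#   is_node_0_12 "$(node --version)" || echo "unexpected Node.js version"
```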
Install mwoffliner dependencies
mwoffliner is responsible for dumping the HTML from the MediaWiki/Parsoid API:
cd /srv/kiwix-other/mwoffliner/
sudo npm install -g node-gyp
export LINK=g++
npm install
Create download.kiwix.org mirror
This mirror is needed to avoid too much network traffic.
Create a directory (this has to be done only on "mwoffliner1"):
sudo mkdir -p /data/scratch/mwoffliner/download.kiwix.org/
Make a first sync (this has to be done only on "mwoffliner1"):
rsync -vzrlptD --delete download.kiwix.org::download.kiwix.org/dev download.kiwix.org::download.kiwix.org/bin download.kiwix.org::download.kiwix.org/src /data/scratch/mwoffliner/download.kiwix.org
Configure cron (this has to be done only on "mwoffliner1"):
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# m h dom mon dow command
*/5 * * * * flock -n /tmp/download.kiwix.org.rsync.lock -c "rsync -vzrlptD --delete download.kiwix.org::download.kiwix.org/dev download.kiwix.org::download.kiwix.org/bin download.kiwix.org::download.kiwix.org/src /data/scratch/mwoffliner/download.kiwix.org"
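The "flock -n" wrapper in this cron entry matters because a sync can take longer than five minutes: with a non-blocking lock, the next run gives up immediately instead of piling up behind the first. The sketch below reproduces that behaviour with a throwaway lock file; run_exclusive is a hypothetical helper and the sleep durations are arbitrary.

```shell
#!/bin/sh
# Run a command under a non-blocking lock, the way the cron entry does.
# Returns non-zero immediately if another process holds the lock.
run_exclusive() {
    lockfile=$1
    shift
    flock -n "$lockfile" -c "$*"
}

# Demonstration with a temporary lock file
lock=$(mktemp)
run_exclusive "$lock" "sleep 3" &      # first job holds the lock for 3 s
sleep 1
if run_exclusive "$lock" "echo second job ran"; then
    echo "lock was free"
else
    echo "lock busy, second job skipped"
fi
wait
rm -f "$lock"
```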
Create the download.kiwix.org virtualhost configuration at /etc/nginx/sites-available/download_dev_mirror:
server {
listen 127.0.0.1:80;
server_name download_dev_mirror;
location / {
alias /data/scratch/mwoffliner/download.kiwix.org/;
autoindex on;
}
}
... and enable it:
cd /etc/nginx/sites-enabled/
sudo ln -s ../sites-available/download_dev_mirror .
Finally, update the /etc/hosts by adding the following line:
127.0.0.1 download_dev_mirror
Make a link (necessary for build_portable_packages.sh):
sudo ln -s /data/scratch/mwoffliner/download.kiwix.org/ /srv/
Create upload directories
Prepare upload directories like this:
sudo mkdir -p /srv/upload/zim/
sudo mkdir -p /srv/upload/portable/
sudo mkdir -p /srv/upload/zim2index/wikipedia/
sudo mkdir -p /srv/upload/zim2index/wiktionary/
sudo mkdir -p /srv/upload/zim2index/wikiquote/
sudo mkdir -p /srv/upload/zim2index/wikibooks/
sudo mkdir -p /srv/upload/zim2index/wikisource/
sudo mkdir -p /srv/upload/zim2index/wikinews/
sudo mkdir -p /srv/upload/zim2index/wikiversity/
sudo mkdir -p /srv/upload/zim2index/wikispecies/
sudo mkdir -p /srv/upload/zim2index/wikivoyage/
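The same tree can also be created with a loop, which makes it easier to add a project later. This is just one way to write it; make_upload_dirs is a hypothetical helper name and the base directory defaults to /srv/upload.

```shell
#!/bin/sh
# Create the upload tree under a base directory (default: /srv/upload).
make_upload_dirs() {
    base=${1:-/srv/upload}
    mkdir -p "$base/zim" "$base/portable" || return 1
    for project in wikipedia wiktionary wikiquote wikibooks wikisource \
                   wikinews wikiversity wikispecies wikivoyage; do
        mkdir -p "$base/zim2index/$project" || return 1
    done
}

# Typical use on the VM (as root):
#   make_upload_dirs /srv/upload
```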
Configure rsync
Configure the rsync daemon by putting the following content in /etc/rsyncd.conf:
log file = /var/log/rsync.log
max connections = 15
timeout = 100
[mwofflinerX.wmflabs.org]
path = /srv/upload
comment = kiwix upload directory
list = no
uid = kelson
gid = wikidev
read only = false
hosts allow = 62.210.143.55
To make rsync run as a daemon, activate it in /etc/default/rsync:
RSYNC_ENABLE=true
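In the rsyncd.conf above, the module is named after the host itself, so a client addresses it as host::module/path with the host name appearing twice. A small sketch of how such a destination might be assembled; upload_target is a hypothetical helper and the file name in the usage comment is made up.

```shell
#!/bin/sh
# Build the rsync destination for a module that is named after its host.
# rsync daemon syntax: host::module/path
upload_target() {
    host=$1
    subdir=$2
    printf '%s::%s/%s\n' "$host" "$host" "$subdir"
}

# Typical use from the allowed client (example.zim is a placeholder):
#   rsync -vzrlptD example.zim "$(upload_target mwofflinerX.wmflabs.org zim/)"
```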
Compile & install kiwix-install
kiwix-install is needed to prepare portable packages:
cd /srv/kiwix-kiwix/
./autogen.sh
./configure --enable-compileall --enable-staticbins --disable-android
cd src/dependencies/
make
cd /srv/kiwix-kiwix
./configure --enable-compileall --enable-staticbins --disable-android
cd src/installer
make
sudo make install
Install kiwix-compact
kiwix-compact is a xapian-compact based tool that compacts the full-text search index. Install it:
cd /srv/kiwix-kiwix/kiwix/
sudo cp kiwix-compact /usr/local/bin/
Configure cron
These are the jobs that run periodically:
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# m h dom mon dow command
*/6 * * * * flock -n /tmp/build_portable.lock -c "/srv/kiwix-maintenance/maintenance_tools/build_portable_packages.sh"
Reboot
The VM needs to be rebooted to apply all changes:
sudo reboot