Analytics/Cluster/AQS

Analytics Query Service

The Analytics Query Service (AQS) is a public facing API that serves analytics data.
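
For example, the pageview data that AQS stores can be queried through the public metrics REST API. The request below is only illustrative; the project, article, and date range are placeholders:

# Fetch daily pageview counts for one article from the public
# metrics API that AQS backs; adjust project, article and dates as needed.
curl -s 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Albert_Einstein/daily/2015100100/2015103100'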

Monitoring

Grafana dashboards were created to help monitor AQS: https://phabricator.wikimedia.org/T116590#1754386

Deploying

Steps to deploy the Analytics Query Service (AQS); a condensed sketch of the same commands follows the list.

  • update the deploy repository
  • push a change to the deploy repository
    • cd /home/dan/projects/restbase
    • MAKE SURE the gerrit MIRROR of RESTBase is up to date
    • git pull
    • git config deploy.dir /home/dan/projects/aqs-deploy
    • rm -rf node_modules && npm install
    • node server.js build --deploy-repo --force
    • The last command creates a commit in aqs-deploy; push it to gerrit for review and check that the commits you expect are present
    • MERGE the change that was just submitted (https://gerrit.wikimedia.org/r/#/c/247726/ for example)
  • deploy
    • git clone https://github.com/wikimedia/ansible-deploy.git && cd ansible-deploy
    • install ansible, then check and deploy:
    • ansible-playbook --check -i production -e target=aqs roles/restbase/deploy.yml
    • ansible-playbook -i production -e target=aqs roles/restbase/deploy.yml
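
The same sequence, condensed into a single sketch. Paths such as /home/dan/projects are specific to the deployer's checkout, and the gerrit review/merge step is elided:

#!/bin/bash
# Build the aqs-deploy commit from an up-to-date restbase checkout.
cd /home/dan/projects/restbase
git pull
git config deploy.dir /home/dan/projects/aqs-deploy
rm -rf node_modules && npm install
node server.js build --deploy-repo --force

# ...push the resulting aqs-deploy commit to gerrit, review and merge it,
# then deploy with ansible:
git clone https://github.com/wikimedia/ansible-deploy.git && cd ansible-deploy
ansible-playbook --check -i production -e target=aqs roles/restbase/deploy.yml
ansible-playbook -i production -e target=aqs roles/restbase/deploy.yml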

Administration

Create a new node

aqs* nodes have 12 disks. The layout is:

  • /: /dev/md0, RAID 0 on sda1 and sdb1
  • swap: /dev/md1, RAID 0 on sda2 and sdb2
  • /dev/md2: LVM on RAID 10 across sda3, sdb3, and sdc1-sdl1

partman is not smart enough to make this layout, so it has to be done manually. Assuming the raid1-30G.cfg recipe was used to install these hosts, run the following to create the desired partition layout:

#!/bin/bash

# Delete partition 3 if you have it left over from a previous installation.
for disk in /dev/sd{a,b}; do
fdisk $disk <<EOF
d
3
w
EOF
done

# Delete DataNode partitions if leftover from previous installation.
for disk in /dev/sd{c,d,e,f,g,h,i,j,k,l}; do
fdisk $disk <<EOF
d
1
w
EOF
done

# Create RAID partition 3 on sda and sdb
for disk in /dev/sd{a,b}; do
fdisk $disk <<EOF
n
p
3


t
3
fd
w
EOF
done


# Create RAID on a single partition spanning full disk for remaining 10 disks.
for disk in /dev/sd{c,d,e,f,g,h,i,j,k,l}; do
fdisk $disk <<EOF
n
p
1


t
fd
w
EOF
done

# run partprobe to refresh partition table
# (apt-get install parted)
partprobe

# Create mirrored RAID 10 on sda3, sdb3, and sdc1-sdl1
md_name=/dev/md/2
mdadm --create ${md_name} --level 10 --raid-devices=12 /dev/sda3 /dev/sdb3 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 <<EOF
y
EOF
/usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf

# set up LVM on /dev/md2 for cassandra
pvcreate /dev/md2
vgcreate "${HOSTNAME}-vg" /dev/md2
lvcreate -L 10T "${HOSTNAME}-vg" -n cassandra

# Make an ext4 filesystem on the new cassandra partition
mkfs.ext4 /dev/"${HOSTNAME}-vg"/cassandra
tune2fs -m 0 /dev/"${HOSTNAME}-vg"/cassandra

cassandra_directory=/var/lib/cassandra
mkdir -pv $cassandra_directory

# Add the LV to fstab
grep -q $cassandra_directory /etc/fstab || echo -e "# Cassandra Data Partition\n/dev/${HOSTNAME}-vg/cassandra\t${cassandra_directory}\text4\tdefaults,noatime\t0\t2" | tee -a /etc/fstab
mount $cassandra_directory
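
Once the script has run, the result can be sanity-checked with standard tools (a sketch; it only inspects the state created above):

# Verify the new array, volume group, and mount.
cat /proc/mdstat                    # md2 should be listed as an active raid10 array
vgs && lvs                          # the "${HOSTNAME}-vg" group with its "cassandra" LV
mount | grep /var/lib/cassandra     # mounted ext4 with the noatime option from fstab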