Analytics/Cluster/Refinery

From Wikitech

Refinery is the software infrastructure used on the Analytics Cluster. Its source code is in the analytics/refinery repository.

This repository uses jars built from analytics/refinery/source; see this page for how to deploy them.

How to deploy

  1. SSH into Tin
  2. Run:
    cd /srv/deployment/analytics/refinery
    git deploy start
    git checkout master
    git pull
    git deploy sync

    (git deploy sync will complain that only “2/3 minions completed fetch”. Answer “y” to continue.)

    This step brings the refinery code from Gerrit to stat1002.
  3. SSH into stat1002
  4. Run:
    sudo -u hdfs /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose --no-dry-run

    This step brings the refinery code to HDFS (but it does not resubmit Oozie jobs).
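
The two stages above could be collected into a pair of shell functions so that the exact commands can be reviewed before anything runs. The following is only a sketch: the function names (run, deploy_stage_tin, deploy_stage_hdfs) and the DRY_RUN and REFINERY_DIR variables are hypothetical and not part of refinery; the commands themselves are the ones listed in the steps above.

```shell
#!/bin/bash
# Hypothetical helper, not part of refinery: wraps the manual steps above.
# With DRY_RUN=1 (the default here) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
REFINERY_DIR=/srv/deployment/analytics/refinery

# Print the command; execute it only when DRY_RUN is 0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Stage 1 (run on Tin): bring the refinery code from Gerrit to stat1002.
# git deploy sync may still prompt interactively about incomplete minions.
deploy_stage_tin() {
    run cd "$REFINERY_DIR" &&
    run git deploy start &&
    run git checkout master &&
    run git pull &&
    run git deploy sync
}

# Stage 2 (run on stat1002): bring the refinery code to HDFS.
# This does not resubmit Oozie jobs.
deploy_stage_hdfs() {
    run sudo -u hdfs "$REFINERY_DIR/bin/refinery-deploy-to-hdfs" --verbose --no-dry-run
}
```

After sourcing this file, calling deploy_stage_tin on Tin or deploy_stage_hdfs on stat1002 prints the commands for that stage; setting DRY_RUN=0 beforehand would execute them instead. The chain stops at the first failing command, so a failed git pull is not followed by a sync.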

How to deploy Oozie jobs

Please see the Deployment section in the Oozie docs.