Nova Resource:Tools/Admin/new exec host
New exec node checklist
Note: this checklist is probably not completely correct or up to date. Please update it if you encounter any issues.
Initial notes
- Host types:
- exec
- webgrid-lighttpd
- webgrid-generic
- custom (cyberbot, catscan, ...)
- Hosts typically exist in Precise (-12xx) and Trusty (-14xx) variants.
- Hosts are numbered incrementally.
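As a quick illustration of the naming scheme used in this checklist (tools-<host type>-NNxx), the sketch below builds an instance name from a host type, a release, and a counter. The function name instance_name is hypothetical, and the sketch assumes xx is a two-digit, zero-padded number, which this page does not state explicitly.

```shell
#!/bin/sh
# Build an instance name of the form tools-<host type>-NNxx.
# instance_name is a hypothetical helper, not an existing tool.
# Assumption: xx is a two-digit, zero-padded counter.
# Usage: instance_name <host type> <precise|trusty> <index>
# Pass the index as a plain decimal number without leading zeros.
instance_name() {
    host_type=$1
    release=$2
    index=$3
    case "$release" in
        precise) nn=12 ;;
        trusty)  nn=14 ;;
        *)
            echo "unknown release: $release" >&2
            return 1 ;;
    esac
    printf 'tools-%s-%s%02d\n' "$host_type" "$nn" "$index"
}
```

For example, instance_name exec precise 3 prints tools-exec-1203.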
Host setup
- Create a new host
- Instance name: tools-<host type>-NNxx
- precise: NN=12, trusty: NN=14
- xx is incremental
- Instance type: m1.large
- Image type: precise or trusty
- Security groups:
- exec: default, execnode
- webgrid-lighttpd: default, execnode, webserver
- webgrid-generic: default, execnode, webserver
- custom: default, execnode
- Configure host:
- all hosts: role::labs::tools::compute
- exec: toollabs::node::compute::general
- webgrid-lighttpd: toollabs::node::web::lighttpd
- webgrid-generic: toollabs::node::web::generic
- custom: ??
- all hosts:
- run sudo apt-get update && puppet agent -tv until there are no failures
- For precise instances, reboot after the first puppet run, then run puppet again. This fixes an NFS permissions issue, turns on the swap partition properly, and produces the correct vmem value for the gridengine configuration.
- kill mpt-statusd
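The "run until no failures" step can be sketched as a small retry loop. This is a minimal sketch, not part of the original procedure: run_until_clean is a hypothetical helper, the attempt limit is an arbitrary choice, and it assumes the command reports its result through puppet's detailed exit codes (which puppet agent -t enables), where 0 and 2 mean the run finished without failures.

```shell
#!/bin/sh
# Re-run a command until it reports no failures, or give up after N attempts.
# run_until_clean is a hypothetical helper, not an existing tool.
# Assumption: the command uses puppet's detailed exit codes:
#   0 = no changes, 2 = changes applied, anything else = failure.
# Usage: run_until_clean <max attempts> <command> [args...]
run_until_clean() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]; then
            return 0
        fi
        echo "attempt $attempt failed with exit code $rc" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example: sudo apt-get update && run_until_clean 5 puppet agent -tv. On precise instances the reboot after the first run still has to be done by hand.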
Grid configuration
- add the host as exec host: qconf -Ae /var/lib/gridengine/etc/exechosts/<hostname>
- If pooling precise instances, check that swap is enabled ('sudo swapon -s') and that the exec host config file lists 30G as the value for vmem (on a large host)
- webgrid, custom: add the host as submit host: qconf -as <hostname>
- exec: add the host to hostgroup @general: qconf -mhgrp \@general
- webgrid-lighttpd: add the host to hostgroup @webgrid: qconf -mhgrp \@webgrid
- webgrid-generic: add the host to queue webgrid-generic: qconf -mq webgrid-generic
- custom: add the host to the custom queue: qconf -mq <queue name>
- qmod -e "*@<hostname>" should now tell you the new host's queues are enabled
- start gridengine-exec on the new host: sudo service gridengine-exec start
- qhost -q -h <hostname> should show the new queues without trailing 'au', indicating the host is up and running
- qhost -j -h <hostname> hopefully already shows jobs being submitted on the host
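To make the per-type differences easier to see, the sketch below prints the commands from this section for a given host type instead of running them. grid_plan is a hypothetical helper, not an existing tool. The qconf -mhgrp and qconf -mq steps open an editor, so they are only listed here for an admin to run by hand.

```shell
#!/bin/sh
# Print (do not run) the grid commands from this checklist for one host.
# grid_plan is a hypothetical helper; the commands are copied from the steps above.
# Usage: grid_plan <host type> <hostname> [<queue name, custom hosts only>]
grid_plan() {
    host_type=$1
    host=$2
    queue=${3:-}
    case "$host_type" in
        exec|webgrid-lighttpd|webgrid-generic) ;;
        custom)
            if [ -z "$queue" ]; then
                echo "custom hosts need a queue name" >&2
                return 1
            fi ;;
        *)
            echo "unknown host type: $host_type" >&2
            return 1 ;;
    esac

    # Every host type is first added as an exec host.
    printf '%s\n' "qconf -Ae /var/lib/gridengine/etc/exechosts/$host"

    # Type-specific steps: submit host, hostgroup, or queue membership.
    case "$host_type" in
        exec)
            printf '%s\n' 'qconf -mhgrp \@general' ;;
        webgrid-lighttpd)
            printf '%s\n' "qconf -as $host" 'qconf -mhgrp \@webgrid' ;;
        webgrid-generic)
            printf '%s\n' "qconf -as $host" 'qconf -mq webgrid-generic' ;;
        custom)
            printf '%s\n' "qconf -as $host" "qconf -mq $queue" ;;
    esac

    # Enable the queues, start the exec daemon, then verify.
    printf '%s\n' "qmod -e \"*@$host\""
    printf '%s\n' "sudo service gridengine-exec start  # run on $host"
    printf '%s\n' "qhost -q -h $host" "qhost -j -h $host"
}
```

For example, grid_plan webgrid-generic <hostname> lists the exec host, submit host, and queue steps, followed by the enable, start, and verification commands.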