Analytics/Cluster/Access

CLI Access

Access Hadoop and Hive in the Analytics Cluster via stat1002.eqiad.wmnet. To request access to the Analytics Cluster, see the instructions at requesting shell access and the information about available access groups.

Steps to log in to stat1002:

1. Request access for stat1002 (https://wikitech.wikimedia.org/wiki/Requesting_shell_access). Most of you have already done this and are in the pipeline to get access. Feel free to reach out if you run into any trouble with this process.

2. Set up your ssh config (https://wikitech.wikimedia.org/wiki/SSH_access)

Add/update your ~/.ssh/config file. It should look something like this: 

 ## Use bastion-eqiad.wmflabs.org as proxy to labs
 Host bastlabs
     HostName bastion-eqiad.wmflabs.org
     User madhuvishy
 
 Host *.eqiad.wmflabs !bastion-eqiad.wmflabs.org
     User madhuvishy
     IdentityFile ~/.ssh/id_rsa
     ProxyCommand ssh -a -W %h:%p bastlabs
 
 ## Prod
 Host bastprod
     HostName bast1001.wikimedia.org
     User madhuvishy
 
 Host *.eqiad.wmnet *.wikimedia.org !bast1001.wikimedia.org
     User madhuvishy
     IdentityFile ~/.ssh/id_rsa_prod
     ProxyCommand ssh -a -W %h:%p bastprod
 
 Host bast1001.wikimedia.org
     User madhuvishy
     IdentityFile ~/.ssh/id_rsa_prod

Replace each User value with your own username: your labs username in the labs blocks and your production username in the Prod blocks.
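To see which block applies to a given host, it can help to model the Host patterns of the sample config as a shell case statement. This is only an illustrative sketch: proxy_for is a hypothetical helper, not part of ssh, and it mirrors only the ProxyCommand lines of the sample config.

```shell
# Hypothetical helper: report which bastion alias the sample config would
# route a host through. Shell globs stand in for ssh Host patterns.
proxy_for() {
  case "$1" in
    # Excluded by the "!" patterns (or not matched at all), so reached directly.
    bastion-eqiad.wmflabs.org|bast1001.wikimedia.org) echo direct ;;
    *.eqiad.wmflabs) echo bastlabs ;;
    *.eqiad.wmnet|*.wikimedia.org) echo bastprod ;;
    # Aliases such as bastprod match no wildcard block, so there is no proxy loop.
    *) echo direct ;;
  esac
}

proxy_for stat1002.eqiad.wmnet      # prints: bastprod
proxy_for bast1001.wikimedia.org    # prints: direct
```

To inspect what ssh itself resolves, ssh -G stat1002.eqiad.wmnet prints the options it would apply to that host (available in OpenSSH 6.8 and later).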

3. Add keys to the ssh-agent. On the terminal, something like:

 ssh-add ~/.ssh/id_rsa
 ssh-add ~/.ssh/id_rsa_prod
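If you add keys often, a small wrapper can make a missing key file obvious. The following is a minimal sketch, assuming an ssh-agent is already running; add_keys is a hypothetical helper name, and the key paths in the usage comment are the ones from the sample config.

```shell
# Hypothetical helper: add every key file that exists to the running
# ssh-agent and report the ones that are missing.
# Returns non-zero if any key is missing or ssh-add fails.
add_keys() {
  rc=0
  for key in "$@"; do
    if [ -f "$key" ]; then
      ssh-add "$key" || rc=1
    else
      echo "missing key: $key" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (paths taken from the sample config); ssh-add -l then lists the
# keys the agent currently holds:
#   add_keys ~/.ssh/id_rsa ~/.ssh/id_rsa_prod && ssh-add -l
```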

4. If your access has been granted and your ssh config is correct, you should be able to get into stat1002 from the terminal, like this:

 ssh stat1002.eqiad.wmnet

The first time you connect, ssh will prompt you to confirm the host's RSA key fingerprint. Once you answer yes, it will log you in to the server.

You can quit the session by typing exit.

HTTP Access

Access to HTTP GUIs in the Analytics Cluster is currently very restricted. You must have shell accounts on analytics nodes, and you must use a SOCKS proxy or ssh tunnels to access HTTP services.

At the very minimum, you must have a shell account on the primary NameNode (analytics1001). HDFS uses POSIX accounts on the NameNode (analytics1001) for granting access to files.

The Hue (Hadoop User Experience) GUI is available at https://hue.wikimedia.org. Log in using your shell username and your LDAP credentials. If you already have cluster access but can't log into Hue, it is likely that your LDAP account needs to be manually synced. Ask ottomata (aotto@wikimedia.org) for help.

sshuttle

sshuttle is a "Transparent proxy server that works as a poor man's VPN. Forwards over ssh. Doesn't require admin. Works with Linux and MacOS. Supports DNS tunneling." You can use it to proxy traffic through a bastion host to the cluster.

Download and install sshuttle following these instructions. Then, run

 ./sshuttle --dns -vvr bast1001.wikimedia.org 10.0.0.0/8

Be warned, this will proxy DNS requests through the Wikimedia network, and any requests to an IP on the internal Wikimedia network will be proxied through the bastion.

While this is running, you should be able to navigate to internally hosted web services from your browser. Try accessing the ResourceManager jobbrowser at http://analytics1001.eqiad.wmnet:8088/
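The 10.0.0.0/8 argument tells sshuttle which destinations to proxy: every IPv4 address whose first octet is 10. As a rough illustration, the hypothetical helper below performs that check textually; it does not validate the address, and the sample addresses in the comments are arbitrary, not real cluster hosts.

```shell
# Hypothetical helper: succeed if an IPv4 address falls inside 10.0.0.0/8,
# the range passed to sshuttle. This is a loose textual check on the first
# octet only; it does not verify that the input is a well-formed address.
in_proxied_range() {
  case "$1" in
    10.*.*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with arbitrary addresses:
#   in_proxied_range 10.64.0.1  && echo "sent through the bastion"
#   in_proxied_range 192.0.2.10 || echo "sent directly"
```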

ssh tunnel(s)

If you have access to the nodes you want to send HTTP requests to, then you can access specific HTTP services using direct ssh tunneling.

To access the Hadoop ResourceManager jobbrowser, run:

 ssh -N bast1001.wikimedia.org -L 8088:analytics1001.eqiad.wmnet:8088

And then navigate to http://localhost:8088/cluster in your browser.

You might want to check out the FairScheduler interface here too. It will show you usage of the cluster per user: http://localhost:8088/cluster/scheduler
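The -L argument has the form local_port:remote_host:remote_port, and the local port does not have to match the remote one. The sketch below wraps the tunnel command so that a different local port can be chosen if 8088 is already in use on your machine; forward_spec and rm_tunnel are hypothetical helper names, and 18088 is an arbitrary example port.

```shell
# Build the argument for ssh -L: local_port:remote_host:remote_port.
forward_spec() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Open a tunnel to the ResourceManager web UI through the bastion.
# $1: local port (defaults to 8088; it need not match the remote port).
rm_tunnel() {
  ssh -N bast1001.wikimedia.org \
    -L "$(forward_spec "${1:-8088}" analytics1001.eqiad.wmnet 8088)"
}

# Example: listen on local port 18088 instead, then browse to
# http://localhost:18088/cluster
#   rm_tunnel 18088
```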

SOCKS proxy & FoxyProxy

Also see the explanation in Help:Access#Setting up the proxy

For this to work, you need automatic ssh proxying to analytics1001.eqiad.wmnet through bast1001.wikimedia.org. You can add the following to your .ssh/config file if you don't already have something more generic (see SSH access):

 Host analytics*
     ProxyCommand ssh -a -W %h:%p bast1001.wikimedia.org 

Once that works (verify that you can ssh into analytics1001.eqiad.wmnet), you can open up a SOCKS proxy through analytics1001.eqiad.wmnet:

 ssh -N -D 8999 analytics1001.eqiad.wmnet

Finally, configure your browser to use a SOCKS proxy at host localhost, port 8999. If you use FoxyProxy, you can set up specific URL patterns that you would like to proxy. https?://analytics.* should do.
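To get a feel for which URLs the suggested pattern would send through the proxy, you can try it against a few URLs with grep. This is a sketch under two assumptions: that the pattern is entered in FoxyProxy as a regular expression, and that it is matched from the start of the URL (the ^ anchor is added here for that reason). matches_proxy_pattern is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a URL matches the suggested pattern.
# Assumes regular-expression matching anchored at the start of the URL.
matches_proxy_pattern() {
  printf '%s\n' "$1" | grep -Eq '^https?://analytics.*'
}

# Example:
#   matches_proxy_pattern http://analytics1001.eqiad.wmnet:8088/cluster && echo proxied
#   matches_proxy_pattern https://hue.wikimedia.org || echo direct
```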

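Once the proxy is configured, you should be able to navigate to cluster services. Try http://analytics1001.eqiad.wmnet:8088/cluster to be sure that it works. Note that 8999 is only the local SOCKS port; the ResourceManager itself listens on port 8088.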
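The proxy can also be checked from the command line before touching browser settings. The sketch below assumes curl is installed and that the SOCKS proxy from the ssh -D command is listening on localhost:8999; it targets the ResourceManager on port 8088 mentioned earlier on this page. run and check_via_socks are hypothetical helper names, and DRY_RUN is an illustrative variable that prints the command instead of executing it.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Request a page through the SOCKS proxy and print only the HTTP status code.
# --socks5-hostname makes curl resolve the hostname through the proxy, which
# matters if *.eqiad.wmnet names only resolve inside the internal network.
# $1: URL to fetch (defaults to the ResourceManager cluster page).
check_via_socks() {
  run curl --silent --show-error --output /dev/null \
    --write-out '%{http_code}' \
    --socks5-hostname localhost:8999 \
    "${1:-http://analytics1001.eqiad.wmnet:8088/cluster}"
}

# Example:
#   check_via_socks              # prints the HTTP status code
#   DRY_RUN=1 check_via_socks    # only shows the curl command
```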