Help:Tool Labs/Web

From Wikitech

Every tool can have a dedicated web server started on a dedicated queue of the grid; that web server is lighttpd by default (documentation).

It is also possible to run your own webserver (e.g. to run a Scala-based tool). See #Other web servers below.

  • Tools get general error logs in ~/error.log
  • PHP scripts are automatically invoked with FCGI
  • The web server is mostly configurable (including adding other FCGI handlers), with customization handled through ~/.lighttpd.conf
  • Everything runs with the tool's UID, regardless of file ownership.
  • Similar to other Wikimedia servers, HTTP requests to Tool Labs require a User-Agent header (see also User-Agent policy on Meta).
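
As a minimal sketch of the User-Agent requirement, the following Python snippet builds a request to a tool with an explicit User-Agent header. The tool URL in the usage note, the client name and the contact address are placeholders rather than real services.

```python
import urllib.request

# Hypothetical client identification; replace with your own name and contact.
USER_AGENT = "ExampleClient/0.1 (https://example.org/contact)"


def build_request(url):
    """Return a Request for url that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url):
    """Fetch url and return the response body as bytes (performs network I/O)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

Calling fetch("https://tools.wmflabs.org/mytool/") would then send the request with the header set; mytool is a made-up tool name.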

Using cookies

Since all tools in the 'tools' project reside under the same domain, you should prefix the name of any cookie you set with your tool's name. In addition, you should be aware that cookies you set may be read by every other web tool your user visits.

Accordingly, you should avoid storing privacy-related or security information in cookies. A simple workaround is to store session information in a database, and use the cookie as an opaque key to that information. Additionally, you can explicitly set a path in a cookie to limit its applicability to your tool; most clients should obey the Path directive properly.
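
The advice above can be sketched in Python as follows. This is only an illustration: the SESSIONS dictionary stands in for a real database, the tool name mytool is hypothetical, and the secrets module assumes Python 3.6 or newer.

```python
import secrets
from http.cookies import SimpleCookie

# Stand-in for a real database table keyed by the opaque session key.
SESSIONS = {}


def start_session(tool_name, data):
    """Store data server-side and return (key, Set-Cookie header value)."""
    key = secrets.token_urlsafe(32)      # opaque key, carries no user data
    SESSIONS[key] = data

    name = tool_name + "_session"        # prefix the cookie name with the tool name
    cookie = SimpleCookie()
    cookie[name] = key
    cookie[name]["path"] = "/" + tool_name + "/"   # limit the cookie to this tool
    return key, cookie[name].OutputString()
```

The returned string is the value for a Set-Cookie response header; the session data itself never leaves the server.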

Default web server

You can start your web server (from the tool account) with the command:

webservice start

Likewise, you can use the webservice command to stop and restart your server, or to request its status.

Configuring the web server

As it starts, the web server reads any configuration in ~/.lighttpd.conf, and merges it with the default configuration (which is likely to be adequate for most tools).

The merge sometimes fails when an option is already set in the default configuration. In that case, append to the option with   option += value   instead of assigning it with   option = value.

Default configuration

The default configuration applies when your tool's ~/.lighttpd.conf does not specify any other or additional settings.

See lighttpd-starter in operations/puppet.git for the canonical configuration.

Example configurations

FCGI Flask config
fastcgi.server += ( "/gerrit-patch-uploader" =>
    ((
        "socket" => "/tmp/patchuploader-fcgi.sock",
        "bin-path" => "/data/project/gerrit-patch-uploader/src/gerrit-patch-uploader/app.fcgi",
        "check-local" => "disable",
        "max-procs" => 1,
    ))
)

For Flask, the fcgi handler looks like this: https://github.com/valhallasw/gerrit-patch-uploader/blob/master/app.fcgi

Url rewrite
url.rewrite-once += ( "/id/([0-9]+)" => "/index.php?id=$1",
                      "/link/([a-zA-Z]+)" => "/index.php?link=$1" )

Details: ModRewrite
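
To see what these two rules do to a request, here is a rough Python emulation. It only illustrates the pattern-to-target mapping and is not lighttpd's implementation; it assumes that the first matching rule wins, that patterns are not anchored, and that $1 stands for the first captured group.

```python
import re

# The two rules from the example above, in order.
REWRITE_ONCE = [
    (r"/id/([0-9]+)", "/index.php?id=$1"),
    (r"/link/([a-zA-Z]+)", "/index.php?link=$1"),
]


def rewrite_once(url, rules=REWRITE_ONCE):
    """Return the rewritten URL for the first matching rule, else url unchanged."""
    for pattern, target in rules:
        match = re.search(pattern, url)
        if match:
            # Replace each $N in the target with the N-th captured group.
            return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), target)
    return url
```

For example, rewrite_once("/id/42") returns "/index.php?id=42", while a URL that matches neither rule is returned as is.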

Header, mimetype, character encoding, error handler
# Allow Cross-Origin Resource Sharing (CORS); the origin value must include the scheme
setenv.add-response-header  += ( "Access-Control-Allow-Origin" => "https://en.wikipedia.org",
                                 "Access-Control-Allow-Methods" => "POST, GET, OPTIONS" )

# Set cache-control directive for static files and resources
$HTTP["url"] =~ "\.(jpg|gif|png|css|js|txt|ico)$" {
	setenv.add-response-header += ( "Cache-Control" => "max-age=386400, public" )
}

mimetype.assign  += (
    # Add custom mimetype
    ".bulk"  => "text/plain",
    # Avoid [[Mojibake]] in JavaScript files
    ".js"   => "application/javascript; charset=utf-8",
    # Default MIME type with UTF-8 character encoding
    ""      => "text/plain; charset=utf-8"
)

# Add custom error-404 handler
server.error-handler-404  += "/error-404.php" 

Details: ModSetEnv  Mimetype-Assign   Error-Handler-404   HTTP access control (CORS)

Directory or file index
# Enable basic directory index
$HTTP["url"] =~ "^/?" {
	dir-listing.activate = "enable"
}

Deny access to hidden files
# Deny access to hidden files
$HTTP["url"] =~ "/\." {
	url.access-deny = ("")
}

Details: ModAccess

Custom index
# Enable index for specific directory 
$HTTP["url"] =~ "^/download($|/)" {
	dir-listing.activate = "enable" 
}

# Custom index file or custom directory generator
index-file.names += ("index.py")

Details: ModDirlisting

Request logging

Add the line:

# Enable request logging
debug.log-request-handling = "enable"

Apache-like cgi-bin directory

Add the following stanza:

$HTTP["url"] =~ "^/your_tool/cgi-bin" {
	cgi.assign = ( "" => "" )
}

This does require that cgi-bin be under your public_html rather than alongside it.
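
For reference, a CGI script that such a stanza could run might look like the sketch below. The file name hello.cgi is hypothetical. Because the stanza assigns no interpreter, this sketch assumes the file is executable and starts with a shebang line.

```python
#!/usr/bin/env python3
import os
import sys


def render_response(environ):
    """Build a complete CGI response: header lines, a blank line, then the body."""
    script = environ.get("SCRIPT_NAME", "(unknown)")
    query = environ.get("QUERY_STRING", "")
    body = "Hello from {}\nQuery string: {}\n".format(script, query)
    return "Content-Type: text/plain; charset=utf-8\n\n" + body


if __name__ == "__main__":
    # The web server passes request details through environment variables.
    sys.stdout.write(render_response(os.environ))
```

Saved as cgi-bin/hello.cgi under public_html and made executable with chmod +x, the script writes its response to standard output, which the web server relays to the client.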

To run CGI scripts from any directory under your public_html, you need only this one line (without the $HTTP["url"] { ... } block):

cgi.assign += ( ".cgi" => "" )

The part on the left is the file name or extension to match ("" = any file). The part on the right is the program that runs it ("" = execute the file directly). Another example:

cgi.assign += ( "script.sh" => "/bin/bash" )

Enable Status & Statistics
# modify <toolname> for your tool
# this will enable counters  http://tools.wmflabs.org/<toolname>/server-status (resp: .../server-statistics)
server.modules += ("mod_status")
status.status-url = "/<toolname>/server-status"
status.statistics-url = "/<toolname>/server-statistics"

Details: ModStatus

Web logs

Your tool's web logs are written to the tool account's ~/access.log in common log format. The logs are anonymized in accordance with the Foundation’s privacy policy; for example, every user IP address appears to be that of the local host. In general, the privacy policy precludes logging personally identifiable information, and collecting such information requires special permission from Foundation legal counsel.

Error logs can be found in the tool account's ~/error.log; this includes the standard error of invoked scripts.
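
Assuming the entries in ~/access.log follow the usual common log format layout (host, identity, user, bracketed timestamp, quoted request line, status, size), a short Python sketch for counting requests per path could look like this. The regular expression is an assumption about the layout, not taken from the server configuration, and lines that do not match are skipped.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "request" status size ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def count_paths(lines):
    """Count requests per URL path, ignoring lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        parts = match.group("request").split()
        if len(parts) >= 2:          # e.g. ["GET", "/mytool/", "HTTP/1.1"]
            counts[parts[1]] += 1
    return counts
```

Typical use would be to open the log file and pass the file object to count_paths, then look at counts.most_common(10).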

Error pages

The proxy provides its own error pages when your application responds with HTTP status 500, 502 or 503. This behavior is currently under review, and might change in the near future.

You can bypass the proxy error pages by passing an X-Wikimedia-Debug header.

Static file server

Static files in a tool's www/static directory are available directly from the URL tools-static.wmflabs.org/toolname. This does not require any action on the tool's part: just putting the files in the appropriate folder (and making the directory readable) should 'just work'. You can use this to quickly serve static assets (CSS, HTML, JS, etc).

node.js web services

Node.js (with websocket support) now runs fairly well on Tool Labs. All Node.js web services run on trusty nodes and use node version v0.10.25.

  1. Use trusty.tools.wmflabs.org as bastion for everything.
  2. Put your node application in ~/www/js in your tool's home directory.
  3. Make sure your server starts up properly when npm start is executed. The default way to do this is to name your main script server.js
  4. Your server should bind to a port that is passed in as an environment variable (TOOL_WEB_PORT). You can access this via process.env.TOOL_WEB_PORT. Without this your tool would not work.
  5. Run webservice2 nodejs start to start your webserver (or webservice2 nodejs restart to restart it after a code change)
  6. PROFIT! :)

This is example code for a node.js web server running as a tool:

var http = require('http');
var port = parseInt(process.env.TOOL_WEB_PORT, 10) ; // IMPORTANT!! You HAVE to use this environment variable as port!

http.createServer(function (req, res) {
	res.writeHead(200, {'Content-Type': 'text/plain'});
	res.end('Hello World\n');
}).listen(port);

Saving this as ~/www/js/server.js and running webservice2 nodejs start should work.

Troubleshooting

If you run into errors doing npm install, try LINK=g++ npm install

Java (Tomcat)

Similar to the lighttpd webservice, there is also a Tomcat webservice for Java applications.

Before using Tomcat, you have to set it up:

$ setup-tomcat

This will create a local Tomcat installation at ~/public_tomcat/. You can manage the Tomcat webservice similar to lighttpd using webservice -tomcat (start|stop|restart). (Note: If there is a running lighttpd webservice, Tomcat won’t work.)

To deploy a Web Application Archive (WAR), move it to ~/public_tomcat/webapps/tool.war where tool is the name of your tool. Archive extraction, deployment, and configuration are done automatically by Tomcat. A Tomcat restart may be required. The application will be available at tools.wmflabs.org/tool/.

To test the Tomcat webservice, you can use the Tomcat sample application (available on tomcat.apache.org).

If your Java application is more complex, the standard memory settings might not be sufficient. You might get errors like There is insufficient memory for the Java Runtime Environment to continue, and Tomcat will simply stop working. In that case, copy /usr/bin/webservice to your home directory, raise the memlimit setting in the copy (e.g. to 6g), and use your copy to restart the service.

The settings for the JVM itself can be modified in public_tomcat/bin/setenv.sh. If the memory setting in JAVA_OPTS is too low, you'll get the well-known OutOfMemoryError from Java. In some cases, Tomcat may then no longer stop on request. You can use ssh tools-webgrid-tomcat and kill your Tomcat process manually (only consider this if you know what you're doing).

Python 2 (uwsgi)

There is specialized support for Python uwsgi applications (such as Flask).

Place your application in ~/www/python/src/app.py in a variable named "app" (example).

Create a virtualenv in ~/www/python/venv (virtualenv ~/www/python/venv). This should be created on a trusty bastion as the code will be run on a trusty instance.

Run webservice2 uwsgi-python start and watch your application run!

You can put custom additional config for uwsgi in ini file form in ~/www/python/uwsgi.ini
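
A minimal ~/www/python/src/app.py for this setup could look like the following sketch. It uses a plain WSGI callable rather than Flask so that it has no dependencies; the only thing taken from the text above is that the callable is named app.

```python
def app(environ, start_response):
    """Smallest possible WSGI application: answer every request with plain text."""
    body = b"Hello World\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI expects an iterable of byte strings.
    return [body]
```

A Flask application works the same way from the server's point of view, since a Flask object is itself a WSGI callable.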

Python 3 is not yet supported, unfortunately, but see the section on uwsgi-plain below. See also phab:T104374 for status and a workaround.

Python (uwsgi-plain)

The uwsgi-plain webservice allows a more customisable uwsgi configuration, including using Python 3. To set up a Python 3 Flask application, for instance:

  • Place your application in ~/www/python/src/app.py in a variable named "app".
  • Create a virtualenv in ~/www/python/venv.
  • Create ~/uwsgi.ini:
[uwsgi]
plugin = python3
socket = /data/project/myproject/myproject.sock
chdir = /data/project/myproject/www/python/src
venv = /data/project/myproject/www/python/venv
module = app
callable = app
manage-script-name = true
mount = /myproject=app:app
  • Run the web service:

webservice2 uwsgi-plain start

Note that there are two things called 'app' here: the module is the Python module containing your application, while the callable is the name of the application variable (or function) within that module. So, for instance, if your application is in a file called foo.py and you have this line in it:

bar = Flask(__name__)

then you would have these lines in your uwsgi.ini:

module = foo
callable = bar
mount = /myproject=foo:bar
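
To make the relationship between the tool name, the module and the callable explicit, here is a small Python sketch that renders the same uwsgi.ini from those three values. The helper render_uwsgi_ini is hypothetical and not part of Tool Labs; every key and path it emits is copied from the example configuration above.

```python
def render_uwsgi_ini(tool, module="app", callable_name="app"):
    """Return uwsgi.ini text for a tool, following the example layout above."""
    home = "/data/project/" + tool
    lines = [
        "[uwsgi]",
        "plugin = python3",
        "socket = {}/{}.sock".format(home, tool),
        "chdir = {}/www/python/src".format(home),
        "venv = {}/www/python/venv".format(home),
        "module = " + module,                # the Python file, without .py
        "callable = " + callable_name,       # the variable inside that file
        "manage-script-name = true",
        "mount = /{}={}:{}".format(tool, module, callable_name),
    ]
    return "\n".join(lines) + "\n"
```

Calling render_uwsgi_ini("myproject", "foo", "bar") produces the module, callable and mount lines shown above for foo.py.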

Other web servers

First of all, create a file called 'httpserver.sh', with the following contents:

#!/bin/bash
exec portgrabber <tool name> <command to run tool>

and make sure it's executable: chmod +x ./httpserver.sh

portgrabber will make sure the proxy is configured correctly, and will pass the port number to the tool as the last argument. For instance, Python's SimpleHTTPServer takes a port number as an argument:

python -m SimpleHTTPServer 8000

would run the server on port 8000. To run it under the web proxy, you would use the following script:

#!/bin/bash
exec portgrabber MyToolName python -m SimpleHTTPServer

Then submit it to the grid, using the 'webgrid-generic' queue:

jstart -mem 4G -l release=trusty -q webgrid-generic ./httpserver.sh

You can check the status with qstat, which will show a running job after 10 seconds or so. You can now reach your tool at https://tools.wmflabs.org/MyToolName.

Note that, as with the lighttpd setup, your tool will receive URLs that include your tool prefix, e.g. /MyToolName/index.html instead of /index.html. You may need to adapt your tool configuration to handle this.

Note that portgrabber may not be available directly from the command line, but only on the grid via jstart.
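
As a sketch of a custom server that fits this setup, the following Python 3 script reads the port from its last command-line argument, as passed by portgrabber, and strips the tool prefix from incoming paths. The file name server.py, the helper names and the availability of python3 on the grid are assumptions; MyToolName is the example name used above.

```python
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

TOOL_PREFIX = "/MyToolName"   # requests arrive with this prefix


def strip_prefix(path, prefix=TOOL_PREFIX):
    """Turn /MyToolName/index.html into /index.html; leave other paths alone."""
    if path == prefix or path.startswith(prefix + "/"):
        return path[len(prefix):] or "/"
    return path


def port_from_argv(argv):
    """Read the port from the last argument."""
    return int(argv[-1])


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = "You asked for {}\n".format(strip_prefix(self.path)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only start serving when a port argument was actually supplied.
    HTTPServer(("", port_from_argv(sys.argv)), Handler).serve_forever()
```

With this saved as server.py, httpserver.sh would contain exec portgrabber MyToolName python3 server.py, and the port would be appended as the final argument.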