Help:Tool Labs/Web
Every tool can have a dedicated web server started on a dedicated queue of the grid; that web server is lighttpd by default (documentation).
It is also possible to run your own webserver (e.g. to run a Scala-based tool). See #Other web servers below.
- Tools get general error logs in ~/error.log
- PHP scripts are automatically invoked with FCGI
- The web server is mostly configurable (including adding other FCGI handlers), with customization handled through ~/.lighttpd.conf
- Everything runs with the tool's UID, regardless of file ownership.
- Similar to other Wikimedia servers, HTTP requests to Tool Labs require a User-Agent header (see also User-Agent policy on Meta).
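As a rough illustration of the User-Agent requirement, the following Python 3 sketch builds a request that carries a descriptive header. The tool name, contact address, and header wording are placeholders chosen for this example, not values prescribed by the policy.

```python
import urllib.request


def build_request(url, tool_name, contact):
    """Return a Request object carrying a descriptive User-Agent header."""
    user_agent = "{0}/1.0 ({1})".format(tool_name, contact)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical tool name and contact address, for illustration only.
request = build_request(
    "https://tools.wmflabs.org/mytool/",
    "mytool",
    "mytool-maintainer@example.org",
)
# urllib.request.urlopen(request) would then send the request with this header.
```

Without an explicit header, urllib sends its own generic User-Agent, which says nothing about who is making the request.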
Using cookies
Since all tools in the 'tools' project reside under the same domain, you should prefix the name of any cookie you set with your tool's name. In addition, you should be aware that cookies you set may be read by every other web tool your user visits.
Accordingly, you should avoid storing privacy-related or security information in cookies. A simple workaround is to store session information in a database, and use the cookie as an opaque key to that information. Additionally, you can explicitly set a path in a cookie to limit its applicability to your tool; most clients should obey the Path directive properly.
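A minimal Python 3 sketch of this advice, assuming a tool called "mytool": the cookie name is prefixed with the tool name, its Path is limited to the tool, and its value is only an opaque key. The in-memory dictionary stands in for a real database table and is purely illustrative.

```python
import secrets
from http.cookies import SimpleCookie

TOOL_NAME = "mytool"  # hypothetical tool name

# Stand-in for a database table keyed by the opaque session id.
SESSION_STORE = {}


def create_session_cookie(session_data):
    """Keep session data server-side and return a Set-Cookie header value."""
    session_id = secrets.token_urlsafe(32)  # opaque, unguessable key
    SESSION_STORE[session_id] = session_data

    cookie_name = TOOL_NAME + "_session"  # name prefixed with the tool name
    cookie = SimpleCookie()
    cookie[cookie_name] = session_id
    cookie[cookie_name]["path"] = "/" + TOOL_NAME + "/"  # limit to this tool
    return cookie[cookie_name].OutputString()


header_value = create_session_cookie({"username": "ExampleUser"})
```

The cookie itself then reveals nothing to other tools beyond a random key, which is useless without access to the store.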
Default web server
You can start your web server (from the tool account) with the command:
webservice start
Likewise, you can use the webservice command to stop and restart your server, or to request its status.
Configuring the web server
As it starts, the web server reads any configuration in ~/.lighttpd.conf and merges it with the default configuration (which is likely to be adequate for most tools).
Sometimes the merge fails because an option is already set in the default configuration. In that case, try option += value instead of option = value.
Default configuration
This is the default (if you don't specify any other/additional settings in your tool's .lighttpd.conf):
server.modules = (
  "mod_setenv",
  "mod_access",
  "mod_accesslog",
  "mod_alias",
  "mod_compress",
  "mod_redirect",
  "mod_rewrite",
  "mod_fastcgi",
  "mod_cgi",
)

server.port = $port
server.use-ipv6 = "disable"
server.username = "$prefix.$tool"
server.groupname = "$prefix.$tool"
server.core-files = "disable"
server.document-root = "$home/public_html"
server.pid-file = "$runbase.pid"
server.errorlog = "$home/error.log"
server.breakagelog = "$home/error.log"
server.follow-symlink = "enable"
server.max-connections = 300
server.stat-cache-engine = "fam"
server.event-handler = "linux-sysepoll"
ssl.engine = "disable"

alias.url = ( "/$tool" => "$home/public_html/" )

index-file.names = ( "index.php", "index.html", "index.htm" )
dir-listing.encoding = "utf-8"
server.dir-listing = "disable"
url.access-deny = ( "~", ".inc" )
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi" )

accesslog.use-syslog = "disable"
accesslog.filename = "$home/access.log"

cgi.assign = (
  ".pl" => "/usr/bin/perl",
  ".py" => "/usr/bin/python",
  ".pyc" => "/usr/bin/python",
)

fastcgi.server += ( ".php" =>
  ((
    "bin-path" => "/usr/bin/php-cgi",
    "socket" => "/tmp/php.socket.$tool",
    "max-procs" => 2,
    "bin-environment" => (
      "PHP_FCGI_CHILDREN" => "2",
      "PHP_FCGI_MAX_REQUESTS" => "500"
    ),
    "bin-copy-environment" => ( "PATH", "SHELL", "USER" ),
    "broken-scriptfilename" => "enable",
    "allow-x-send-file" => "enable"
  ))
)
(config as of Aug 4, 2014)
See lighttpd-starter in operations/puppet.git for the canonical configuration.
Example configurations
- FCGI Flask config
fastcgi.server += ( "/gerrit-patch-uploader" =>
  ((
    "socket" => "/tmp/patchuploader-fcgi.sock",
    "bin-path" => "/data/project/gerrit-patch-uploader/src/gerrit-patch-uploader/app.fcgi",
    "check-local" => "disable",
    "max-procs" => 1,
  ))
)
For Flask, the fcgi handler looks like this: https://github.com/valhallasw/gerrit-patch-uploader/blob/master/app.fcgi
- Url rewrite
url.rewrite-once += (
  "/id/([0-9]+)" => "/index.php?id=$1",
  "/link/([a-zA-Z]+)" => "/index.php?link=$1"
)
Details: ModRewrite
- Header, mimetype, character encoding, error handler
# Allow Cross-Origin Resource Sharing (CORS)
setenv.add-response-header += (
  "Access-Control-Allow-Origin" => "en.wikipedia.org",
  "Access-Control-Allow-Methods" => "POST, GET, OPTIONS"
)

# Set cache-control directive for static files and resources
$HTTP["url"] =~ "\.(jpg|gif|png|css|js|txt|ico)$" {
  setenv.add-response-header += ( "Cache-Control" => "max-age=386400, public" )
}

mimetype.assign += (
  # Add custom mimetype
  ".bulk" => "text/plain",
  # Avoid mojibake in JavaScript files
  ".js" => "application/javascript; charset=utf-8",
  # Default MIME type with UTF-8 character encoding
  "" => "text/plain; charset=utf-8"
)

# Add custom error-404 handler
server.error-handler-404 += "/error-404.php"
Details: ModSetEnv Mimetype-Assign Error-Handler-404 HTTP access control (CORS)
- Directory or file index
# Enable basic directory index
$HTTP["url"] =~ "^/?" {
  dir-listing.activate = "enable"
}
- Deny access to hidden files
# Deny access to hidden files
$HTTP["url"] =~ "/\." {
  url.access-deny = ("")
}
Details: ModAccess
- Custom index
# Enable index for specific directory
$HTTP["url"] =~ "^/download($|/)" {
  dir-listing.activate = "enable"
}

# Custom index file or custom directory generator
index-file.names += ("index.py")
Details: ModDirlisting
- Request logging
Add the line:
# Enable request logging
debug.log-request-handling = "enable"
- Apache-like cgi-bin directory
Add the following stanza:
$HTTP["url"] =~ "^/your_tool/cgi-bin" {
  cgi.assign = ( "" => "" )
}
This does require that cgi-bin be under your public_html rather than alongside it.
To run CGI from any directory under your public_html, you only need this one line (without the surrounding $HTTP["url"] block):
cgi.assign += ( ".cgi" => "" )
The part to the left is the file name or extension ("" = any). The part to the right is the program which will run it ("" = execute the file itself). Another example:
cgi.assign += ( "script.sh" => "/bin/bash" )
- Enable Status & Statistics
# modify <toolname> for your tool
# this will enable counters http://tools.wmflabs.org/<toolname>/server-status (resp: .../server-statistics)
server.modules += ("mod_status")
status.status-url = "/<toolname>/server-status"
status.statistics-url = "/<toolname>/server-statistics"
Details: ModStatus
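Since the default cgi.assign maps .py files to /usr/bin/python, a script placed under public_html can act as a CGI program. The following is a minimal sketch of such a script; the file name hello.py and the response text are made up for illustration. A CGI response consists of headers, a blank line, and then the body.

```python
#!/usr/bin/python
# Minimal CGI sketch; hello.py is a hypothetical file under public_html.
import os
import sys


def render_response(environ):
    """Build a complete CGI response: headers, a blank line, then the body."""
    query = environ.get("QUERY_STRING", "")
    body = "Hello from CGI\nquery string: " + query + "\n"
    return "Content-Type: text/plain; charset=utf-8\r\n\r\n" + body


if __name__ == "__main__":
    # The web server passes request details through environment variables.
    sys.stdout.write(render_response(os.environ))
```

Anything the script writes to standard error should end up in ~/error.log, as described under "Web logs".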
Web logs
Your tool's web logs are placed in the tool account's ~/access.log in common format. Please note that the web logs are anonymized in accordance with the Foundation’s privacy policy; for example, each user IP address will appear to be that of the local host. In general, the privacy policy precludes the logging of personally identifiable information; special permission from Foundation legal counsel is needed if such information is required.
Error logs can be found in the tool account's ~/error.log; this includes the standard error of invoked scripts.
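To get a quick overview of an access log, a small parser can help. The sketch below assumes that each line follows the usual common-log layout (host, two identity fields, a bracketed timestamp, the quoted request, status, and size) and ignores anything after the size field. The sample line is invented for illustration, and the exact fields written on Tool Labs may differ.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


def count_statuses(lines):
    """Count HTTP status codes over all lines that could be parsed."""
    entries = (parse_line(line) for line in lines)
    return Counter(entry["status"] for entry in entries if entry is not None)


# Invented sample line; the host is the local host because logs are anonymized.
sample = '127.0.0.1 - - [04/Aug/2014:12:00:00 +0000] "GET /mytool/index.php HTTP/1.1" 200 512'
entry = parse_line(sample)
```

In practice, count_statuses could be fed the lines of an open ~/access.log file to see how many requests ended in errors.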
Error pages
The proxy provides its own error pages when your application returns HTTP/500, HTTP/502 or HTTP/503. This behavior is currently under review, and might change in the near future.
You can bypass the proxy error pages by passing an X-Wikimedia-Debug header.
Static file server
Static files in a tool's www/static directory are available directly from the URL tools-static.wmflabs.org/toolname. This does not require any action on the tool's part: just putting the files in the appropriate folder (and making the directory readable) should 'just work'. You can use this to quickly serve static assets (CSS, HTML, JS, etc).
node.js web services
Node.js (with websocket support) can run fairly well on Tool Labs now. All Node.js web services run on trusty nodes, using node version v0.10.25.
- Use trusty.tools.wmflabs.org as bastion for everything.
- Put your node application in ~/www/js in your tool's home directory.
- Make sure your server starts up properly when npm start is executed. The default way to do this is to name your main script server.js.
- Your server should bind to a port that is passed in as an environment variable (TOOL_WEB_PORT). You can access this via process.env.TOOL_WEB_PORT. Without this your tool will not work.
- Run webservice2 nodejs start to start your webserver (or webservice2 nodejs restart to restart it after a code change).
- PROFIT! :)
This is example code for a node.js web server running as a tool:
var http = require('http');
// IMPORTANT!! You HAVE to use this environment variable as the port!
var port = parseInt(process.env.TOOL_WEB_PORT, 10);

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(port);
Keeping this in ~/www/js/server.js and running webservice2 nodejs start should work.
Troubleshooting
If you run into errors doing npm install, try LINK=g++ npm install
Java (Tomcat)
Similar to the lighttpd webservice, there is also a Tomcat webservice for Java applications.
Before using Tomcat, you have to set it up:
$ setup-tomcat
This will create a local Tomcat installation at ~/public_tomcat/. You can manage the Tomcat webservice similarly to lighttpd using webservice -tomcat (start|stop|restart). (Note: If there is a running lighttpd webservice, Tomcat won’t work.)
To deploy a Web Application Archive (WAR), move it to ~/public_tomcat/webapps/tool.war, where tool is the name of your tool. Archive extraction, deployment, and configuration are done automatically by Tomcat. A Tomcat restart may be required. The application will be available at tools.wmflabs.org/tool/.
To test the Tomcat webservice, you can use the Tomcat sample application (available on tomcat.apache.org).
If your Java application is more complex, the standard memory settings might not work. You might get errors like There is insufficient memory for the Java Runtime Environment to continue, and Tomcat will simply stop working. In that case, try copying /usr/bin/webservice to your home directory, adapting the memlimit setting to e.g. 6g, and then using your copy to restart the service. The settings for the JVM itself, on the other hand, can be modified in public_tomcat/bin/setenv.sh. If the memory setting from JAVA_OPTS is too low, you'll get the well-known OutOfMemoryError from Java. In some cases, Tomcat may then no longer stop. You can use ssh tools-webgrid-tomcat and kill your Tomcat process manually (only consider this if you know what you're doing).
Python 2 (uwsgi)
There is specialized support for Python uwsgi applications (such as Flask).
Place your application in ~/www/python/src/app.py in a variable named "app" (example).
Create a virtualenv in ~/www/python/venv (virtualenv ~/www/python/venv). This should be created on a trusty bastion as the code will be run on a trusty instance.
Run webservice2 uwsgi-python start and watch your application run!
You can put custom additional config for uwsgi in ini file form in ~/www/python/uwsgi.ini.
Python 3 is not yet supported, unfortunately, but see the section on uwsgi-plain below. See also phab:T104374 for status and a workaround.
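For orientation, here is a minimal sketch of what ~/www/python/src/app.py could contain. It uses a plain WSGI callable instead of Flask so that it has no dependencies; a Flask application object assigned to the same name would play the same role. The response text is arbitrary.

```python
# Sketch of ~/www/python/src/app.py using only the WSGI interface.
def app(environ, start_response):
    """Plain WSGI callable, exposed under the variable name "app"."""
    body = b"Hello World\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because the body is a bytes literal, the same file runs unchanged under Python 2 and Python 3.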
Python (uwsgi-plain)
The uwsgi-plain webservice allows a more customisable UWSGI configuration, including using Python 3. To set up a Python 3 Flask application, for instance:
- Place your application in ~/www/python/src/app.py in a variable named "app".
- Create a virtualenv in ~/www/python/venv.
- Create ~/uwsgi.ini:
[uwsgi]
plugin = python3
socket = /data/project/myproject/myproject.sock
chdir = /data/project/myproject/www/python/src
venv = /data/project/myproject/www/python/venv
module = app
callable = app
manage-script-name = true
mount = /myproject=app:app
- Run the web service:
webservice2 uwsgi-plain start
Note that there are two things called 'app' here - the module is the python module containing your application, while the callable is the name of the application variable (or function) within that module. So, for instance, if your application is in a file called foo.py and you have this line in it:
bar = Flask(__name__)
then you would have these lines in your uwsgi.ini:
module = foo
callable = bar
mount = /myproject=foo:bar
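To check locally that a module and callable pair lines up, the lookup can be mimicked with a few lines of Python. This is only a sketch of the naming convention, not a description of how uWSGI loads applications internally; resolve_callable is a hypothetical helper, and a standard-library module is used so that the example runs anywhere.

```python
import importlib


def resolve_callable(spec):
    """Resolve a "module:callable" string to the object it names."""
    module_name, _, attribute = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# With the layout described above, the spec would be "foo:bar";
# "json:dumps" is used here only because it is always importable.
dumps = resolve_callable("json:dumps")
```

Running such a check from within ~/www/python/src, with the virtualenv activated, would raise an ImportError or AttributeError if either name is misspelled.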
Other web servers
First of all, create a file called 'httpserver.sh', with the following contents:
#!/bin/bash
exec portgrabber <tool name> <command to run tool>
and make sure it's executable: chmod +x ./httpserver.sh
portgrabber will make sure the proxy is configured correctly, and will pass the port number to the tool as last argument. For instance, Python's SimpleHTTPServer takes a port number as argument:
python -m SimpleHTTPServer 8000
would run the server on port 8000. To run it under the web proxy, you would use the following script:
#!/bin/bash
exec portgrabber MyToolName python -m SimpleHTTPServer
Then submit it to the grid, using the 'webgrid-generic' queue:
jstart -mem 4G -l release=trusty -q webgrid-generic ./httpserver.sh
You can check the status with qstat, which will show a running job after 10 seconds or so. You can now reach your tool at https://tools.wmflabs.org/MyToolName !
Note that, as with the lighttpd setup, your tool will receive URLs that include your tool prefix, e.g. /MyToolName/index.html instead of /index.html. You may need to adapt your tool configuration to handle this.
Note that portgrabber may not be available directly from the command line, but only on the grid via jstart.
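The two points above (the port arriving as the last argument, and request paths carrying the tool prefix) can be sketched in a small custom server. This example assumes a Python 3 interpreter is available on the grid and uses http.server, the Python 3 counterpart of SimpleHTTPServer; the handler and helper names are made up for illustration.

```python
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

TOOL_NAME = "MyToolName"  # tool name as used in the example above


def strip_tool_prefix(path, tool_name=TOOL_NAME):
    """Turn /MyToolName/index.html into /index.html; leave other paths alone."""
    prefix = "/" + tool_name
    if path == prefix or path.startswith(prefix + "/"):
        return path[len(prefix):] or "/"
    return path


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        local_path = strip_tool_prefix(self.path)
        body = ("You requested " + local_path + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main(argv):
    port = int(argv[-1])  # portgrabber appends the port as the last argument
    HTTPServer(("", port), Handler).serve_forever()


# Only start serving when a numeric port was actually passed in.
if __name__ == "__main__" and sys.argv[-1].isdigit():
    main(sys.argv)
```

In httpserver.sh, the command after the tool name would then be the interpreter followed by the path to this file, with portgrabber adding the port at the end.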