Analytics/Reportupdater
ATTENTION: This codebase was developed with the intention to replace generate.py script from analytics-limn-mobile-data project while the analytics team would find a reliable and generic solution for report scheduling. It was built temporarily inside the same project in a separate directory.
What is reportupdater?
It's a python program that creates and updates TSV files in a periodic manner, given the inputs:
- SQL queries
- Scripts that return SQL-like responses
Whenever a report needs to be updated, reportupdater queries the given databases or the given scripts for the missing data and appends it to the TSV report files.
Dependencies and dev environment
As reportupdater is built inside analytics-limn-mobile-data project, its dependencies and dev environment are the same, you can see them here.
How to execute
There are two options:
Run directly from the command line
Inside reportupdater folder, there lives this script:
usage: update_reports.py [-h] [--config-path CONFIG_PATH] [--wikis_path WIKIS_PATH] [-l LOG_LEVEL] query_folder output_folder Periodically execute SQL queries or scripts and write/update the results into TSV files. positional arguments: query_folder Folder with *.sql files and scripts. output_folder Folder to write the TSV files to. optional arguments: -h, --help show this help message and exit --config-path CONFIG_PATH Yaml configuration file. Default: <query_folder>/config.yaml. --wikis_path WIKIS_PATH All wikis list file path. Default: wikis.txt. -l LOG_LEVEL, --log-level LOG_LEVEL (debug|info|warning|error|critical)
Run through generate.py
Some unused parameters were left out of the description.
usage: generate.py [-h] [-c CONFIG_OVERRIDE] folder Generate data for the mobile dashboard. positional arguments: folder folder with config.yaml and *.sql files optional arguments: -h, --help show this help message and exit -c CONFIG_OVERRIDE, --config-override CONFIG_OVERRIDE config.yaml override -g GRAPH, --graph GRAPH the name of a single graph you want to generate for
Config file
As reportupdater will run alongside generate.py, the general expected contents of the config file are the same as in analytics-limn-mobile-data. The only difference is the reportupdater-reports section. Reportupdater will assume the config file describes the following yaml block:
reportupdater-reports: report_key_1: # Type: 'sql' or 'script'. Tells reportupdater how to execute the # report. Optional. Default: 'sql'. type: sql # Frequency: hours, days or weeks. The frequency in which the report # will be updated. Mandatory. frequency: hours # Granularity: days, weeks or months. Reportupdater will assume the data # to have this granularity. Mandatory. granularity: days # Lag: Positive integer. Time to wait (in seconds) to start the report's # execution after the execution is triggered. Useful to wait until source # database is updated. Optional. Default: 0. lag: 300 # Timeboxed: If true, the program will expect the SQL query to have # {from_timestamp} and {to_timestamp} placeholders and instantiate # them with the corresponding values for the missing date intervals. # Optional. Default: false. timeboxed: true # Starts: First date to be computed. Inclusive. # Only and mandatory for timeboxed = true. starts: 2015-01-01 # Funnel: If true, the program will assume funnel data. # Meaning that for a single date, there may be various data rows. # Optional. Default: false. funnel: true # Explode by: (dict) The initial report will be exploded, # and a different report file will be generated for each # value. The output path will be: # <report_name>/value1/value2/.../valueN.tsv # Where the order of the folders will be dictated by the # names of the placeholders (alphabetically). # In the sql query use {placeholder} to indicate where to # instantiate the values. # In the scripts, the values will be passed as arguments # in alphabetical order. explode_by: placeholder1: value1, value2, value3 placeholder2: value4, value5 # By wiki: If true, this will emulate explode_by with # placeholder = 'wiki' and values = a list of all wikis. # The value 'all' will also be part of the list. Default: false. by_wiki: true report_key_2: ...
When executing generate.py, reportupdater will only run if the reportupdater-reports section exists in the config file. Otherwise, it will never be called. If reportupdater runs, the report keys that exist inside the reportupdater-reports section, will not be executed by generate.py, even if they are also in generate.py's graphs section.
If you're running reportupdater through generate.py, and you want the reports generated by reportupdater to be outputed to a different folder, you can specify it by adding the following to the root of the config file:
reportupdater-output: "/your/desired/output/folder"
Limitations
Reportupdater works only for SQL queries and scripts whose result's first column represents a date in the following format: YYYY-MM-DD.
Testing
To run unit tests, use:
nosetests
To actually test reportupdater against the database, create an ssh tunnel to the database as explained in analytics-limn-mobile-data readme. And run reportupdater through generate.py overriding the config file, like:
python generate.py -c scripts/config.yaml <folder> -g wrong_graph_id
Where <folder> is the folder that contains all SQL files and the config.yaml file. The graph id wrong_graph_id will make generate.py crash with a KeyError, but that will be after executing reportupdater, and will prevent all generate.py reports to execute, if this makes sense for testing.