Open source code: https://github.com/chris-zen/wok
Source: https://bitbucket.org/bbglab/wok
Introduction
Wok is a workflow management system implemented in Python that makes it easy to structure workflows, parallelize their execution, and monitor their progress, among other things. Its modular design allows it to be adapted to different infrastructures.
For the time being it is strongly focused on clusters running any DRMAA-compatible resource manager (e.g. Oracle Grid Engine) whose worker nodes share a common folder. Other, more flexible infrastructures (such as Amazon EC2) are being considered for future implementations.
Workflows in Wok are defined in an XML file with the .flow extension. This definition includes:
- the different modules (or pieces of processing)
- the interconnections between modules (e.g. the input of module B is linked to the output of module A)
- explicit dependencies (e.g. module A cannot be executed until module B has finished)
- descriptions that can be used to generate documentation automatically or to create web forms
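To make the idea of interconnections and explicit dependencies concrete, the following is a minimal sketch in plain Python. It does not use Wok's .flow format or its API; the module names A, B and C and the `execution_order` helper are hypothetical. It only illustrates how both kinds of relation reduce to "run this module before that one", which is what lets an engine decide an execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical modules: each name maps to the set of modules it must wait for.
# A data link (B reads the output of A) and an explicit dependency both
# become a "run before" relation for scheduling purposes.
workflow = {
    "A": set(),
    "B": {"A"},       # the input of B is linked to the output of A
    "C": {"A", "B"},  # explicit dependency: C waits for A and B
}

def execution_order(modules):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(modules).static_order())

order = execution_order(workflow)
print(order)
```

Modules that do not depend on each other could be run in parallel; this sketch only computes a sequential order.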
Each module corresponds to a piece of software that has to be run in order to process some input and generate an output. For now, only Python scripts are allowed, but they can be used to execute software written in other languages.
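As an illustration of how a Python script can wrap software written in another language, here is a minimal sketch based on the standard `subprocess` module. It is not Wok's module API, which is not shown here; the `run_external` helper is hypothetical, and a child Python process stands in for the real external tool so that the example is self-contained.

```python
import subprocess
import sys

def run_external(command, input_text):
    """Run an external program, feed it text on stdin, and return its stdout."""
    result = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout

# Stand-in for a real tool: a child process that upper-cases its input.
stand_in = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
output = run_external(stand_in, "acgt\n")
print(output)
```

In a real module the `command` list would name the actual program and its arguments instead of the stand-in.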
Workflows in Wok can be treated like any other software project and managed with version control tools and the IDE of your choice.
Wok can be used as a terminal script or can be run in server mode.
A workflow is executed from the terminal with the wok-run script, which accepts a few options:
- An instance name (-n name), which allows the same workflow to be run several times simultaneously and independently
- Configuration files (-c file.conf); the configuration can be split across as many files as desired
- Configuration parameters (-D param=value), which override any previous values set in the configuration files
The workflow definition file (e.g. myworkflow.flow) is passed as the first argument.
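The options described above can be assembled into a command line as in the following sketch. The `build_wok_run_command` helper and the values run1, base.conf, cluster.conf and threads are hypothetical; only wok-run, -n, -c, -D and the position of the workflow file come from the description above. Whether options may follow the workflow file exactly like this should be checked against the script's own help.

```python
import shlex

def build_wok_run_command(flow_file, instance=None, conf_files=(), params=None):
    """Build an argument list for wok-run from the documented options."""
    command = ["wok-run", flow_file]      # workflow file as first argument
    if instance is not None:
        command += ["-n", instance]       # instance name
    for conf in conf_files:
        command += ["-c", conf]           # one -c per configuration file
    for key, value in (params or {}).items():
        command += ["-D", f"{key}={value}"]  # parameter overrides
    return command

command = build_wok_run_command(
    "myworkflow.flow",
    instance="run1",                          # hypothetical instance name
    conf_files=["base.conf", "cluster.conf"],  # hypothetical file names
    params={"threads": 4},                    # hypothetical parameter
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` to launch the workflow without going through a shell.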
Several resources are available to monitor the execution of a workflow:
- The web server, which allows you to interact with the engine in a very straightforward way (recommended)
- The logs emitted by wok-run on standard output
- The intermediate files generated by Wok (e.g. the task output files)
Wok has been designed for workflow developers who feel more comfortable programming than doing hundreds of clicks and drag-and-drop operations, and also for those who want infrastructure flexibility and full control over, and monitoring of, the execution.
Authors
Wok is being developed by Christian Pérez-Llamas under the Biomedical Genomics Research Group.