Azkaban

A batch job scheduler can be seen as the cron and make Unix utilities combined with a friendly UI. Batch jobs need to be scheduled to run periodically. They also typically have intricate dependency chains, for example dependencies on various data extraction processes or on previous steps. Larger processes might have 50 or 60 steps, some of which can run in parallel while others must wait for the output of earlier steps. Combining all these steps into a single program lets you control the dependency management, but can lead to sprawling monolithic programs that are difficult to test or maintain. Simply scheduling the individual pieces to run at different times avoids the monolith problem, but introduces many timing assumptions that are inevitably broken. Azkaban is a workflow scheduler that allows the independent pieces to be declaratively assembled into a single workflow, and allows that workflow to be scheduled to run periodically.

A good batch workflow system allows a program to be built out of small reusable pieces that need not know about one another. By declaring dependencies, you can control sequencing. Other functionality available from Azkaban can then be declaratively layered on top of the job without having to add any code. This includes things like email notifications of success or failure, resource locking, retry on failure, log collection, historical job run time information, and so on.
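
The idea of controlling sequencing purely by declaring dependencies can be illustrated with a small sketch. This is not Azkaban's job file format or API; the job names and the `execution_waves` helper are hypothetical, chosen only to show how a scheduler could derive a run order (and spot steps that may run in parallel) from declarations alone.

```python
# Hypothetical job declarations: each job lists the jobs it depends on.
# The jobs themselves know nothing about one another.
JOBS = {
    "extract_users": [],
    "extract_clicks": [],
    "join_data": ["extract_users", "extract_clicks"],
    "build_report": ["join_data"],
    "send_email": ["build_report"],
}


def execution_waves(jobs):
    """Group jobs into waves; jobs within one wave could run in parallel."""
    remaining = {name: set(deps) for name, deps in jobs.items()}

    # Reject dependencies on jobs that were never declared.
    for name, deps in remaining.items():
        unknown = deps - set(remaining)
        if unknown:
            raise ValueError(f"{name} depends on undeclared jobs: {sorted(unknown)}")

    waves = []
    done = set()
    while remaining:
        # A job is ready once all of its dependencies have finished.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(JOBS), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

In this sketch the two extraction jobs land in the same wave because neither depends on the other, while the later steps each wait for the output of the step before them. A real scheduler would layer features such as retries and notifications around each job rather than inside it.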

Why was it made?

Schedulers are readily available (both open source and commercial), but tend to be extremely unfriendly to work with: they are basically bad graphical user interfaces grafted onto 20-year-old command-line clients. We wanted something that made it reasonably easy to visualize job hierarchies and run times without the pain. Previous experience made it clear that a good batch programming framework can make batch programming easy and successful, in the same way that a web framework aids web development beyond what you can do with an HTTP library and sockets.

State of the project

We have been using Azkaban internally at LinkedIn since early 2009, and have several hundred jobs running in it, mostly Hadoop jobs or ETL of some type. Azkaban is quite new as an open source project, though, and we are now working to generalize it enough to make it useful for everyone.

Any patches, bug reports, or feature ideas are quite welcome. We have created a mailing list for this purpose.
