(not the guy from Mario Bros.)

(except Spotify did name it after the Luigi guy; they just don't use his image so they don't get sued by Nintendo)

What is Luigi?

  • Luigi is a Python-based data pipeline tool
  • Built by Spotify engineers
  • System for managing a data workflow
  • Create steps as Python code for processing data
  • Create dependencies between steps
  • Run steps with CLI tool
  • Similar to Airflow by Airbnb
pip install luigi

Based on DAGs

Yeah, I use DAGs:
  • Directed
  • Acyclic
  • Graphs
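
A rough sketch of why the "acyclic" part matters, using Python's standard-library `graphlib` (Python 3.9+) rather than Luigi itself. The step names are hypothetical. As long as there are no loops, a valid run order exists; close a loop and there is none.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a step; its value is the set of steps it depends on.
# The step names are hypothetical.
steps = {
    "download": set(),
    "split": {"download"},
    "extract": {"split"},
}

# Directed + acyclic: every step can be placed after its dependencies.
order = list(TopologicalSorter(steps).static_order())
print(order)

# Make "download" depend on "extract": the graph now has a cycle,
# so no valid run order exists and graphlib raises CycleError.
steps["download"] = {"extract"}
try:
    list(TopologicalSorter(steps).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```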

Example: MARC

  • Downloading MARC
  • Splitting a MARC file up
  • Joining MARC files
  • Extract ISBNs from MARC

Example

Also: email notifications! retry failing stuff!

Later...

Sending emails and retries

Works without the Central Scheduler

Retries can be configured globally or per task

That's it!

EM: jfournie@vcc.ca

TW: @robo_james

GH: @jamesrf