For some projects, I use Netflix’s Metaflow to build machine learning pipelines. I generally enjoy using it because it lets me run a pipeline either locally or remotely on AWS Batch, depending on the pipeline's resource requirements. What I struggled with, however, was using pdb inside pipeline steps. Because Metaflow runs each step in a separate background process, a breakpoint has no terminal to attach to, and the pipeline simply gets stuck.
As a solution, you can use web-pdb, a web-based interface for pdb, which works very well with Metaflow. Install it with:
pip install web-pdb
To use it in Metaflow steps:
from metaflow import FlowSpec, step


class Flow(FlowSpec):
    @step
    def start(self):
        self.next(self.a)

    @step
    def a(self):
        import web_pdb
        web_pdb.set_trace()  # set a breakpoint here
        self.next(self.end)

    @step
    def end(self):
        print('success')


if __name__ == '__main__':
    Flow()
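Run the flow as usual (for example, python flow.py run if the file is saved as flow.py). When step a reaches the breakpoint, web-pdb starts a web interface at http://<your Python machine hostname or IP>:5555. Open it in a browser to get the standard pdb interface and debug the step as you would in a terminal.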
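One thing to watch for: if several tasks on the same machine hit a breakpoint at once (for example, parallel branches of a foreach), they presumably cannot all listen on port 5555. The sketch below is a hypothetical helper, not part of web-pdb or Metaflow, that assigns each branch its own port and builds the URL to open. It assumes that offsetting the port by the branch index is acceptable for your setup.

```python
import socket
from typing import Optional

# web-pdb's default port, as mentioned above.
WEB_PDB_DEFAULT_PORT = 5555


def debugger_port(branch_index: int = 0, base_port: int = WEB_PDB_DEFAULT_PORT) -> int:
    """Hypothetical helper: give each parallel branch its own port.

    Assumption: concurrent tasks on one machine cannot share a port,
    so branch N listens on base_port + N.
    """
    if branch_index < 0:
        raise ValueError("branch_index must be non-negative")
    return base_port + branch_index


def debugger_url(port: int = WEB_PDB_DEFAULT_PORT, host: Optional[str] = None) -> str:
    """Hypothetical helper: URL to open in the browser (defaults to this machine's hostname)."""
    return f"http://{host or socket.gethostname()}:{port}"


if __name__ == "__main__":
    port = debugger_port(branch_index=2)
    print(port)
    print(debugger_url(port, host="localhost"))
```

Inside a foreach step, where Metaflow exposes the branch number as self.index, you could then print debugger_url(debugger_port(self.index)) and call web_pdb.set_trace(port=debugger_port(self.index)), since web-pdb's set_trace accepts a port argument. Calling set_trace directly in the step, rather than from inside a wrapper function, keeps the debugger stopped in the step's own frame.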