Where does the scheduler/pipeline manager run, and where are the individual task containers executed?

Scheduling and pipeline management are handled by the DataOps SaaS platform in the cloud. After building the complete pipeline graph, the platform maintains a queue of pending jobs for each DataOps runner. These are the jobs that are ready to run because all their requirements and other dependencies have been met.
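To make the "graph plus per-runner queue" idea concrete, here is a minimal sketch of how ready jobs could be derived from a pipeline graph. It is not the DataOps platform's actual scheduler: `Job`, `validate_graph`, `ready_queues`, and the idea that each job names its target runner are hypothetical, chosen for illustration. The only real library used is Python's standard `graphlib`, which here just checks that the graph has no dependency cycles.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Job:
    name: str
    runner: str                                     # hypothetical: runner that should execute this job
    needs: list[str] = field(default_factory=list)  # names of jobs that must finish first


def validate_graph(jobs: list[Job]) -> None:
    """Raise graphlib.CycleError if the pipeline graph contains a dependency cycle."""
    graph = {job.name: set(job.needs) for job in jobs}
    TopologicalSorter(graph).prepare()


def ready_queues(jobs: list[Job], finished: set[str], started: set[str]) -> dict[str, list[str]]:
    """Group jobs that have not started and whose dependencies have all finished, by runner."""
    queues: dict[str, list[str]] = defaultdict(list)
    for job in jobs:
        if job.name in started or job.name in finished:
            continue  # already handed out or done
        if all(dep in finished for dep in job.needs):
            queues[job.runner].append(job.name)
    return dict(queues)
```

Recomputing `ready_queues` whenever a job finishes is the simplest approach. A production scheduler would more likely update the queues incrementally, but the readiness rule stays the same.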

Each DataOps runner, a long-running, stateless container, dials home regularly (typically every second) to ask whether there are any pending jobs for it to execute. If there are, the runner starts another container of the appropriate type, passes in the relevant job run information, and monitors that container until it completes, streaming its logs back in real time.
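The runner's poll, launch, monitor, and stream cycle can be sketched as the following loop. This is a minimal sketch under stated assumptions, not the DataOps runner's implementation. `poll`, `report_log`, and `report_result` are hypothetical stand-ins for the runner's calls home to the platform. A plain subprocess stands in for the child container, so in practice `command` would be something like a container launch command. The sketch also runs one job at a time for simplicity.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    id: str
    command: list[str]  # hypothetical: in practice, e.g. a container launch command


def run_job(job: Job, report_log: Callable[[str], None]) -> int:
    """Start the child process, forward its output line by line, and return its exit code."""
    proc = subprocess.Popen(
        job.command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the streamed log
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:  # forward each line as soon as it is read from the pipe
        report_log(line.rstrip("\n"))
    proc.stdout.close()
    return proc.wait()


def runner_loop(
    poll: Callable[[], Optional[Job]],
    report_log: Callable[[str], None],
    report_result: Callable[[str, int], None],
    interval: float = 1.0,
    max_polls: Optional[int] = None,
) -> None:
    """Ask for work every `interval` seconds; run each job to completion and report the result.

    The loop keeps no job state between polls: everything a job needs arrives with it.
    `max_polls` exists only so the sketch can terminate in a demo or test.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = poll()
        if job is None:
            time.sleep(interval)  # nothing pending: wait, then dial home again
            continue
        exit_code = run_job(job, report_log)
        report_result(job.id, exit_code)


if __name__ == "__main__":
    queue = [None, Job("job-1", [sys.executable, "-c", "print('hello'); print('done')"])]
    runner_loop(
        poll=lambda: queue.pop(0) if queue else None,
        report_log=print,
        report_result=lambda job_id, code: print(f"{job_id} exited with {code}"),
        interval=0.1,
        max_polls=3,
    )
```

Because the loop holds no state of its own, a runner like this can be restarted at any time without losing anything. Pending work stays queued on the platform side until some runner asks for it.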

Our standard deployment model today runs the long-running DataOps runner and the child containers it spawns on a single Linux machine, typically an EC2 instance in AWS. As a result, resource allocation is fairly simple.
