Managing Pipeline Ownership in DataOps: A Script-Based Approach

  • 5 March 2024

In the fast-paced world of DataOps, ensuring seamless operation of pipelines is crucial for maintaining data integrity and workflow efficiency. However, managing pipeline ownership, especially when users leave the organisation, can pose challenges. In this article, we'll explore a practical use case and showcase a script-based approach to address this issue effectively.

The Challenge: Handling User Departures

Imagine a scenario where several users have left your organisation, but their accounts are still associated with active pipelines in your DataOps environment. The question arises: if these users are blocked or deactivated, what impact will it have on the pipelines they own? Furthermore, how can you efficiently identify and manage these pipelines to prevent disruptions in your data workflows?

Assessing the Situation:

To address this concern, we need to understand the implications of blocking a user in the context of pipeline ownership. Upon blocking a user, several questions arise:

  • Will blocking a user lead to the failure of unfinished jobs within pipelines?
  • What happens to pipelines owned by blocked users? Do they become inactive?
  • Can other users claim ownership of schedules previously owned by blocked users?

Testing and Observations:

To answer these questions, we conducted thorough testing within the platform. Here are our observations:

  • Blocking a user causes any of that user's unfinished pipeline jobs to fail. This highlights the importance of ensuring job completion before deactivating a user, to prevent disruptions in pipeline execution.

  • Blocking a user renders that user's pipelines inactive. Reactivating these pipelines involves a four-step process (a hedged API sketch of these steps follows this list):

    • Taking ownership
    • Editing the pipeline
    • Activating the pipeline
    • Saving the changes

    This process ensures that pipelines continue operating smoothly under new ownership.

  • Upon blocking a user, other users can claim ownership of schedules previously owned by the blocked user. This facilitates seamless transfer of ownership and ensures uninterrupted management of pipeline schedules.
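
For teams that want to automate those reactivation steps, here is a minimal sketch against the DataOps API (which follows GitLab's v4 conventions, like the script later in this article). The take_ownership and schedule-update endpoints, as well as the placeholder project and schedule IDs, are assumptions to verify against your own environment:

import requests

API_BASE = "https://app.dataops.live/api/v4"
headers = {
    "PRIVATE-TOKEN": "YOUR-TOKEN-HERE",
    "accept": "application/json",
}

project_id = 12345   # hypothetical project containing the inactive schedule
schedule_id = 678    # hypothetical schedule previously owned by the blocked user

# step 1: take ownership of the schedule
requests.post(
    f"{API_BASE}/projects/{project_id}/pipeline_schedules/{schedule_id}/take_ownership",
    headers=headers,
).raise_for_status()

# steps 2-4: edit the schedule, set it back to active, and save the change
requests.put(
    f"{API_BASE}/projects/{project_id}/pipeline_schedules/{schedule_id}",
    headers=headers,
    data={"active": "true"},
).raise_for_status()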

The Solution: A Script for Pipeline Ownership Management

To tackle this challenge, we've developed a Python script that leverages the DataOps API to identify pipelines owned by users marked for deactivation.

import json
import requests

# user IDs (and emails) of the accounts that are going to be blocked
user_ids = {
    9999: "*@****.****.com",
    1111: "*.****.com",
}

headers = {
    "PRIVATE-TOKEN": "YOUR-TOKEN-HERE",
    "accept": "application/json",
}

# loop through all subgroups of Customer:
subgroups_raw = requests.get(
    "https://app.dataops.live/api/v4/groups/YOUR_GROUP_ID/subgroups", headers=headers
)
subgroups_raw = json.loads(subgroups_raw.text)

# get only subgroup ids:
subgroup_ids = []
for group in subgroups_raw:
    subgroup_ids.append(group.get("id"))

# get project ids:
project_ids = []
for subgroup_id in subgroup_ids:
    subgroup_projects_raw = requests.get(
        f"https://app.dataops.live/api/v4/groups/{subgroup_id}/projects",
        headers=headers,
    )
    subgroup_projects_raw = json.loads(subgroup_projects_raw.text)
    for project in subgroup_projects_raw:
        project_ids.append(project.get("id"))

# get pipeline schedules and keep the ones owned by a blocked user:
pipeline_schedules_owned_by_blocked_users = []
for project_id in project_ids:
    pipeline_schedules_raw = requests.get(
        f"https://app.dataops.live/api/v4/projects/{project_id}/pipeline_schedules",
        headers=headers,
    )
    pipeline_schedules_raw_json = json.loads(pipeline_schedules_raw.text)
    if len(pipeline_schedules_raw_json) >= 1:
        for pipeline_schedule in pipeline_schedules_raw_json:
            owner_id = pipeline_schedule.get("owner").get("id")
            if owner_id in user_ids.keys():
                pipeline_schedules_owned_by_blocked_users.append(pipeline_schedule)

print(pipeline_schedules_owned_by_blocked_users)
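
One caveat when running this against larger environments: the v4 endpoints used above paginate their results (typically 20 items per page), so long lists of subgroups, projects, or schedules may be truncated. Assuming the standard GitLab-style per_page parameter is supported, a simple mitigation is to request larger pages, for example:

# request up to 100 items per page (the usual maximum for GitLab-style APIs);
# groups with more entries than that would still need to follow the pagination
# headers or loop over the page parameter
subgroups_raw = requests.get(
    "https://app.dataops.live/api/v4/groups/YOUR_GROUP_ID/subgroups",
    headers=headers,
    params={"per_page": 100},
)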

Let's break down the script's functionality:

  • The script begins by authenticating with the DataOps API using a private token. It then retrieves a list of subgroup and project IDs within the DataOps environment.

  • Next, the script iterates through each project to fetch pipeline schedules. For each schedule, it checks if the owner's ID matches any of the user IDs marked for deactivation. If a match is found, the schedule is added to a list of pipelines owned by blocked users.

  • Finally, the script prints out the list of pipeline schedules owned by blocked users, enabling administrators to review and take appropriate action, such as reassigning ownership or deactivating the pipelines.
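
As a small illustration of that last step, the raw list printed by the script can be turned into a more readable summary. The fields used below (id, description, and owner) are those a GitLab-style pipeline-schedule payload normally carries, but they should be checked against your own API responses:

# assumes pipeline_schedules_owned_by_blocked_users was built by the script above
for schedule in pipeline_schedules_owned_by_blocked_users:
    owner = schedule.get("owner", {})
    print(
        f"Schedule {schedule.get('id')} ('{schedule.get('description')}') "
        f"is owned by {owner.get('username')} (user id {owner.get('id')})"
    )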

Understanding the Script in Action

In our example scenario, we received a query about the impact of blocking users on pipeline ownership. Our response combined the insights gained from testing with the script-based approach above for identifying pipelines owned by departing users. Managing pipeline ownership is a critical aspect of DataOps governance, particularly when users leave the organisation, and automating the identification and hand-over of their pipelines helps administrators keep disruption to data workflows to a minimum.

Final Thoughts

In the dynamic landscape of DataOps, efficient pipeline management is essential for maintaining data integrity and operational efficiency. The script showcased in this article offers a practical way to identify pipelines owned by departing or blocked users, helping organisations navigate user departures with confidence and take proactive measures around access control and operational continuity. This approach not only mitigates the risks associated with user transitions but also improves the overall efficiency of DataOps processes.

