
In today's fast-paced data engineering environment, efficient project management is crucial for ensuring smooth workflows and timely delivery. DataOps offers powerful features for version control, issue tracking, continuous integration, and more. However, managing multiple projects and ensuring consistent settings across them can be a tedious task, especially in large organisations or complex environments.

In this article, we'll explore a real-world use case in the domain of DataOps (Data Operations), where automation plays a vital role in streamlining project management tasks. We'll focus on a scenario where a team needs to apply default settings and protect branches across multiple projects within specific DataOps subgroups.

Understanding the Use Case

Imagine you're part of a DataOps team responsible for managing numerous projects related to different domains within your organisation. To ensure consistency and security, you need to:

  1. Set a default branch (dev) for all projects.
  2. Protect certain branches (dev, test, master) with specific access controls.
  3. Automate these tasks across projects within designated DataOps subgroups.

To accomplish this, you'll leverage Python scripting along with the DataOps API to automate the configuration process.

Solution Overview

The solution involves two Python scripts:

  • Script 1: Setting Default Branch

import gitlab
import click

# import logging
from gitlab.exceptions import GitlabCreateError
from typing import List
from time import sleep

from get_project_ids import get_project_ids
from get_subgroup_ids import get_subgroup_ids


# logging.getLogger().setLevel(logging.INFO)
@click.command()
@click.option(
    "--dataops-url",
    "-u",
    "dataops_url",
    required=True,
    help="e.g. 'https://app.dataops.live'",
)
@click.option(
    "--private-token", "-t", required=True, help="your private access token 'xxxxxx'"
)
@click.option(
    "--top-level-group-id",
    "-id",
    required=True,
    help="your top_level_group_id",
    type=int,
)
@click.option(
    "--set-default-branch-to",
    "-b",
    "set_default_branch_to",
    required=True,
    help="the default branch you wish to set the project to",
)
@click.option(
    "--include-subgroups",
    "-i",
    "include_subgroups",
    required=False,
    default=False,
    type=bool,
    help="If you wish to recursively check subgroup's subgroups and grab projects in it",
)
@click.option(
    "--name_of_master_branch",
    "-m",
    required=False,
    default=None,
    help="The name of the master branch. If omitted, the project's current default branch is used.",
)
@click.option(
    "--patterns",
    "-p",
    required=False,
    default=None,
    multiple=True,
    help="Any pattern to look for the names of any subgroups inside the top level group id.",
)
@click.option(
    "--create-missing-branch",
    "-cmb",
    "create_missing_branch",
    required=False,
    default=False,
    type=bool,
    help="If set to True, each missing branch is created; otherwise missing branches are skipped and logged.",
)
def set_default_branch(
    dataops_url: str,
    private_token: str,
    top_level_group_id: int,
    set_default_branch_to: str,
    include_subgroups: bool = False,
    create_missing_branch: bool = False,
    name_of_master_branch: str = None,
    patterns: List[str] = None,
) -> None:
    """
    Goes into a top-level group and, for each project in the given subgroup ids
    (with or without pattern), sets the default branch.

    params: dataops_url: e.g. 'https://app.dataops.live' or 'https://app.qa.dataops.live'
    params: private_token: your private access token 'xxxxxx'
    params: top_level_group_id: your top_level_group_id
    params: set_default_branch_to: the default branch you wish to set the project to
    params: include_subgroups: If you wish to recursively check subgroup's subgroups and grab projects in it
    params: create_missing_branch: If True, create the target branch when it is missing; otherwise skip the project
    params: name_of_master_branch: Optional, the ref branch to create from; otherwise the current default branch is used
    params: patterns: Optional, name pattern(s) of the subgroups whose projects you wish to edit
    """

    gl = gitlab.Gitlab(url=dataops_url, private_token=private_token)
    # get subgroup ids that correspond to provided logic
    if include_subgroups:
        subgroup_ids = get_subgroup_ids(
            dataops_url, private_token, top_level_group_id, patterns=patterns
        )
    else:
        subgroup_ids = [top_level_group_id]
    project_ids = []
    if not subgroup_ids:
        click.echo("No subgroup found! Check subgroup id and pattern(s)! Exiting!")
        return
    for subgroup_id in subgroup_ids:
        subgroup_project_ids = get_project_ids(
            dataops_url, private_token, subgroup_id, include_subgroups=include_subgroups
        )
        project_ids.extend(subgroup_project_ids)
    # in each project - check if branches exist
    for project_id in project_ids:
        # grabbing a project instance
        project = gl.projects.get(project_id)
        # click.echo(
        #     f"""We are checking if changes need to happen for:
        #     project_path_with_namespace: {project.path_with_namespace}
        #     project id: {project_id}, """
        # )
        branch_objects = project.branches.list(all=True)
        branches_list = [branch.name for branch in branch_objects]
        # ref branch used when the target branch has to be created
        if name_of_master_branch:
            ref_branch = name_of_master_branch
        else:
            ref_branch = project.default_branch
        if set_default_branch_to not in branches_list and create_missing_branch:
            try:
                branch = project.branches.create(
                    {"branch": set_default_branch_to, "ref": ref_branch}
                )
                click.echo(
                    f"""Branch - {branch.name} for
                    project namespace: {project.path_with_namespace}
                    project id: {project_id}
                    created with commit {branch.commit}"""
                )
            except GitlabCreateError as e:
                click.echo(f"Error {e.response_code} with {e.error_message}.", err=True)
                # the branch does not exist, so it cannot become the default
                continue
            except Exception as e:
                click.echo(e, err=True)
                continue
        elif set_default_branch_to not in branches_list:
            click.echo(
                f"""Branch - {set_default_branch_to} is missing for
                project namespace: {project.path_with_namespace}
                project id: {project_id}
                Skipping creation because create_missing_branch={create_missing_branch}
                """
            )
            continue
        # compare against the project's actual default branch, not the ref override
        if project.default_branch == set_default_branch_to:
            click.echo(
                f"""For
                project: {project.path_with_namespace}
                project id: {project_id},
                branch {set_default_branch_to} is already set as default!
                """
            )
        else:
            project.default_branch = set_default_branch_to
            project.save()
            click.echo(
                f"""We have successfully set for
                project: {project.path_with_namespace}
                project id: {project_id},
                branch {set_default_branch_to} as default!
                """
            )
        # short pause between projects to go easy on the API
        sleep(0.2)
    click.echo("Finished!")


if __name__ == "__main__":
    set_default_branch()

 

  • This script sets the default branch for all projects within specified DataOps subgroups.
  • It checks if the desired default branch exists. If not, it creates it from the existing default branch.
  • The script provides options to recursively check subgroups and create missing branches if required.
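The per-project decision described in the bullets above can be isolated as a pure function, which is handy for reasoning about (or dry-running) a change before any API call is made. This is only a minimal sketch and not part of the original script: the function name `plan_default_branch_action` and the action labels it returns are hypothetical.

```python
from typing import List


def plan_default_branch_action(
    branches: List[str],
    current_default: str,
    target: str,
    create_missing_branch: bool = False,
) -> str:
    """Return the action Script 1 would take for a single project.

    Hypothetical helper: the action labels are illustrative only.
    """
    if target not in branches:
        if not create_missing_branch:
            # Target branch is missing and creation is disabled: skip the project.
            return "skip"
        # Target branch is created from the reference branch, then made default.
        return "create_and_set_default"
    if current_default == target:
        # The target branch is already the default: nothing to change.
        return "already_default"
    return "set_default"


if __name__ == "__main__":
    # A project with only 'master', asked to default to 'dev' without creation.
    print(plan_default_branch_action(["master"], "master", "dev"))
```

Keeping the decision separate from the API calls means the four outcomes can be checked without touching a real project.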
  • Script 2: Protecting Branches

import gitlab

# import logging
import click
from gitlab.exceptions import GitlabCreateError
from typing import List, Dict
from time import sleep

from get_project_ids import get_project_ids
from get_subgroup_ids import get_subgroup_ids


# logging.getLogger().setLevel(logging.INFO)
@click.command()
@click.option(
    "--dataops-url",
    "-u",
    "dataops_url",
    required=True,
    help="e.g. 'https://app.dataops.live'",
)
@click.option(
    "--private-token", "-t", required=True, help="your private access token 'xxxxxx'"
)
@click.option(
    "--branch-names",
    "-b",
    "branch_names",
    required=True,
    multiple=True,
    help="""The branch(es) you wish to protect in each project,
    accepts multiple values, for example
    -b 'dev' -b 'qa' -b 'test' etc.""",
)
@click.option(
    "--access-level",
    "-a",
    "access_level",
    required=True,
    type=(str, int),
    multiple=True,
    help="""Access settings as key/value pairs, one pair per -a flag, e.g.
    -a merge_access_level 40 -a push_access_level 40 -a allow_force_push 0
    which is converted to:
    access_level={
    "merge_access_level": 40,
    "push_access_level": 40,
    "allow_force_push": 0
    }""",
)
@click.option(
    "--top-level-group-id",
    "-id",
    "top_level_group_id",
    required=True,
    help="your top_level_group_id",
    type=int,
)
@click.option(
    "--patterns",
    "-p",
    required=False,
    default=None,
    multiple=True,
    help="Any pattern to look for the names of any subgroups inside the top level group id.",
)
@click.option(
    "--include_subgroups",
    "-i",
    required=False,
    default=False,
    # without type=bool the value arrives as a string, so "-i False" would be truthy
    type=bool,
    help="If you wish to recursively check subgroup's subgroups and grab projects in it",
)
@click.option(
    "--name_of_master_branch",
    "-m",
    required=False,
    default=None,
    help="The branch to use as ref if branches need to be created before being protected.",
)
@click.option(
    "--create-missing-branch",
    "-cmb",
    "create_missing_branch",
    required=False,
    default=False,
    type=bool,
    help="If set to True, each missing branch is created; otherwise missing branches are skipped and logged.",
)
def set_protected_branch(
    dataops_url: str,
    private_token: str,
    branch_names: List[str],
    access_level: Dict,  # click passes (key, value) pairs; converted with dict() below
    top_level_group_id: int,
    include_subgroups=False,
    create_missing_branch: bool = False,
    patterns: List[str] = None,
    name_of_master_branch=None,
) -> None:
    """
    Goes into a group id and, for each project in the given subgroup ids, sets protected branches.
    params: dataops_url: e.g. 'https://app.dataops.live' or 'https://app.qa.dataops.live'
    params: private_token: your private access token 'xxxxxx'
    params: top_level_group_id: your top_level_group_id
    params: branch_names: list of branches to become protected
    params: include_subgroups: if you wish to recursively grab projects in the top-level group
    params: access_level: access settings passed as key/value pairs
    params: create_missing_branch: If set to True, each missing branch is created;
        otherwise missing branches are skipped and logged.
    params: patterns: Optional, name pattern(s) of the subgroups whose projects you wish to edit
    params: name_of_master_branch: Optional, the ref branch for created branches
    """
    # - protected branches = dev, test, master, allowed to merge = Maintainers, allowed to push = Maintainers, allowed to force push = no

    gl = gitlab.Gitlab(url=dataops_url, private_token=private_token)
    # get subgroup ids that correspond to provided logic
    if include_subgroups:
        subgroup_ids = get_subgroup_ids(
            dataops_url, private_token, top_level_group_id, patterns=patterns
        )
    else:
        subgroup_ids = [top_level_group_id]
    if not subgroup_ids:
        click.echo("No subgroup found! Check subgroup id and pattern(s)! Exiting!")
        return
    project_ids = []
    for subgroup_id in subgroup_ids:
        subgroup_project_ids = get_project_ids(
            dataops_url, private_token, subgroup_id, include_subgroups=include_subgroups
        )
        project_ids.extend(subgroup_project_ids)
    # in each project - check if branches exist
    access_level_dict = dict(access_level)
    for project_id in project_ids:
        # grabbing a project instance
        project = gl.projects.get(project_id)
        if name_of_master_branch:
            ref_branch = name_of_master_branch
        else:
            ref_branch = project.default_branch
        branch_objects = project.branches.list(all=True)
        branches_list = [branch.name for branch in branch_objects]
        # if any of the branches we wish to protect do not exist - create them if allowed
        missing_branches = [
            branch for branch in branch_names if branch not in branches_list
        ]
        if missing_branches and create_missing_branch:
            click.echo(
                f"""For project namespace: {project.path_with_namespace}
                project id: {project_id}
                branches {missing_branches} are missing.
                """
            )
            for missing_branch in missing_branches:
                try:
                    branch = project.branches.create(
                        {"branch": missing_branch, "ref": ref_branch}
                    )
                    # record the new branch so it is protected in this same run
                    branches_list.append(branch.name)
                    click.echo(
                        f"Branch - {branch.name} for project id:{project_id} created with commit {branch.commit}"
                    )
                except GitlabCreateError as e:
                    click.echo(
                        f"Error {e.response_code} with {e.error_message}.", err=True
                    )
                except Exception as e:
                    click.echo(f"Exception {e}", err=True)
        elif missing_branches:
            click.echo(
                f"""Project namespace: {project.path_with_namespace}
                project id: {project_id}
                has missing branches {missing_branches} - skipping creation for them."""
            )

        protected_branch_objects = project.protectedbranches.list(all=True)
        project_protected_branches_list = [
            protected_branch.name for protected_branch in protected_branch_objects
        ]
        # only branches that actually exist can be protected
        branches_to_protect = [
            branch for branch in branch_names if branch in branches_list
        ]

        for branch in branches_to_protect:
            if branch not in project_protected_branches_list:
                p_branch = project.protectedbranches.create(
                    {"name": branch, **access_level_dict}
                )
                click.echo(
                    f"Branch {p_branch.name} protected in project {project_id}"
                )
            else:
                # already protected: drop and recreate so the new access levels apply
                project.protectedbranches.delete(branch)
                p_branch = project.protectedbranches.create(
                    {"name": branch, **access_level_dict}
                )
                click.echo(
                    f"""Protected branch '{p_branch.name}' access level
                    in project namespace: '{project.path_with_namespace}'
                    with project id: '{project_id}'
                    has been updated accordingly
                    """
                )
        # short pause between projects to go easy on the API
        sleep(0.2)


if __name__ == "__main__":
    set_protected_branch()

This script protects specified branches with defined access controls for projects within designated subgroups.

  • It verifies the existence of branches and creates missing ones if instructed.
  • Access controls such as merge, push, and force-push permissions are configurable.
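Because the `--access-level` option is declared with `type=(str, int)` and `multiple=True`, the script receives a tuple of `(key, value)` pairs and turns it into a dictionary with `dict(...)`. The sketch below shows that conversion with a small validation step added. It is not part of the original script: `build_access_level`, `ALLOWED_KEYS` and the local `AccessLevel` enum are hypothetical names, and the enum assumes the DataOps instance uses the GitLab-style numeric roles (the script's own comment maps 40 to Maintainers).

```python
from enum import IntEnum
from typing import Dict, Iterable, Tuple


class AccessLevel(IntEnum):
    """Local mirror of GitLab-style role values (assumed to apply to DataOps too)."""

    NO_ACCESS = 0
    DEVELOPER = 30
    MAINTAINER = 40


# Keys used in the article's example; treated here as the only accepted ones.
ALLOWED_KEYS = {"merge_access_level", "push_access_level", "allow_force_push"}


def build_access_level(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Convert ((key, value), ...) pairs into the dict passed to the API."""
    access = dict(pairs)
    unknown = set(access) - ALLOWED_KEYS
    if unknown:
        # Fail early on typos instead of sending an unexpected attribute.
        raise ValueError(f"Unknown access-level keys: {sorted(unknown)}")
    return access


if __name__ == "__main__":
    # Equivalent to: -a merge_access_level 40 -a push_access_level 40 -a allow_force_push 0
    pairs = (
        ("merge_access_level", AccessLevel.MAINTAINER),
        ("push_access_level", AccessLevel.MAINTAINER),
        ("allow_force_push", 0),
    )
    print(build_access_level(pairs))
```

Validating the keys up front is a design choice of this sketch; the original script passes whatever pairs it is given straight through.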

Implementation Details

The scripts talk to the DataOps API through the python-gitlab library (the gitlab import in both scripts), pointing its client at the DataOps URL and authenticating with a private access token. Key functionalities include:

  • Authentication using private access tokens.
  • Retrieving project and subgroup IDs based on specified criteria.
  • Checking existing branches and protected branches within projects.
  • Creating missing branches and setting up branch protection.
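Both scripts import `get_subgroup_ids` and `get_project_ids` from helper modules that are not shown in this article. The following is a hedged sketch of what such helpers might look like, matching only the call signatures used above. Everything else is an assumption: the extra `client` parameter is hypothetical (it lets a stub stand in for the real connection), patterns are treated as shell-style globs on subgroup names, and only direct subgroups are inspected. The original helpers may behave differently.

```python
import fnmatch
from typing import Any, List, Optional, Sequence


def _connect(dataops_url: str, private_token: str) -> Any:
    # Imported lazily so the helpers can be exercised with a stub client.
    import gitlab

    return gitlab.Gitlab(url=dataops_url, private_token=private_token)


def get_subgroup_ids(
    dataops_url: str,
    private_token: str,
    top_level_group_id: int,
    patterns: Optional[Sequence[str]] = None,
    client: Any = None,  # hypothetical: inject a pre-built or stub client
) -> List[int]:
    """Return ids of direct subgroups whose name matches any pattern (all if none)."""
    gl = client if client is not None else _connect(dataops_url, private_token)
    group = gl.groups.get(top_level_group_id)
    subgroups = group.subgroups.list(all=True)
    if not patterns:
        return [subgroup.id for subgroup in subgroups]
    # Assumption: patterns are shell-style globs such as "finance-*".
    return [
        subgroup.id
        for subgroup in subgroups
        if any(fnmatch.fnmatchcase(subgroup.name, pattern) for pattern in patterns)
    ]


def get_project_ids(
    dataops_url: str,
    private_token: str,
    group_id: int,
    include_subgroups: bool = False,
    client: Any = None,  # hypothetical: inject a pre-built or stub client
) -> List[int]:
    """Return ids of the projects in a group, optionally including nested subgroups."""
    gl = client if client is not None else _connect(dataops_url, private_token)
    group = gl.groups.get(group_id)
    projects = group.projects.list(all=True, include_subgroups=include_subgroups)
    return [project.id for project in projects]
```

Injecting the client keeps the lookup logic testable without network access; in normal use the parameter is simply omitted.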

The scripts are designed to be versatile, allowing customisation of parameters such as subgroup patterns, default branch names, and access control levels.

Conclusion

In the realm of DataOps, where collaboration and version control are paramount, automating DataOps project management tasks can greatly enhance productivity and consistency. By leveraging Python scripting and the DataOps API, teams can effortlessly enforce best practices, ensure branch security, and streamline administrative workflows across diverse projects and domains.

As organisations continue to embrace automation and DevOps practices, solutions like these empower teams to focus more on innovation and less on repetitive manual tasks, ultimately driving efficiency and agility in software development processes.
