Artifacts are a great way to share files and directories generated during job execution with other jobs in your pipeline. However, as the number of jobs in your pipeline grows and artifact usage increases, you may see the time it takes for your jobs to complete go up significantly, and often unnecessarily.
A key point to consider is that, by default, each job will pull all artifacts from all previous jobs. This can slow down your pipeline unnecessarily if you have jobs that need only some, or none, of those artifacts.
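To illustrate this default behaviour, here is a minimal sketch (the job names, stages, and artifact path are hypothetical). Because "My Report Job" declares no dependencies config, it downloads the artifacts from every job in earlier stages, whether it uses them or not:

My Build Job:
  stage: build
  script:
    - /dataops
  artifacts:
    paths:
      - outputs/

My Report Job:
  stage: reporting
  script:
    - /dataops
  # No dependencies config here, so artifacts from all jobs in
  # earlier stages (including My Build Job) are downloaded.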
Jobs that don’t need any artifacts:
For any jobs that don’t use previous job artifacts, you can specify an empty array [] for the dependencies config. This means the job doesn’t download any artifacts, potentially speeding up your overall pipeline considerably. E.g.
My Build Job:
  stage: build
  script:
    - /dataops
  dependencies: []
Warning! For the majority of jobs running in a DataOps.live pipeline, you will need to specify the "Initialise Reference Project" job as a dependency. The artifact it generates contains the before_script, which sets various dynamic variables, such as DATAOPS_DATABASE and variables relating to branch/environment names, which are then available to the apps and scripts running in the job's main part.
Therefore, the minimum dependencies config is usually:
dependencies:
  - "Initialise Reference Project"
Jobs that require only a subset of the artifacts:
For jobs that require only a subset of the previous job artifacts, you can supply a list of the jobs to fetch artifacts from. The job then downloads only the artifacts it needs, again helping to minimise your overall pipeline execution time. E.g.
My Reporting Job:
  stage: reporting
  script:
    - /dataops
  dependencies:
    - "Initialise Reference Project"
    - "My Modelling Job"
See the DataOps.live documentation for more info about artifacts: https://docs.dataops.live/docs/develop-development-principles/job-artifacts/