At times, errors impacting the health of the DataOps Runner may arise. One such error is described below, and we've compiled a set of steps to help mitigate this issue:
ERROR: Preparation failed: adding cache volume: set volume permissions: running permission container "..." for volume "...": starting permission container: Error response from daemon: failed to create endpoint ... on network bridge: failed to add the host (...) <=> sandbox (...) pair interfaces: cannot allocate memory (linux_set.go:105:0s)
This error indicates a problem with the runner or the runner host. When it occurs, work through the steps below to identify the root cause and put preventive measures in place so that it does not recur.
- First, confirm that you are running the latest version of the `dataops-runner`. Refer to our documentation here to ensure that your runner version matches the current `dataopslive/dataops-runner:latest`. Run `docker pull dataopslive/dataops-runner:latest` and verify that the image you are using matches the pulled one. If they differ, we strongly recommend updating the runner before proceeding with any of the steps below.
- Check system resources on the runner host and ensure that it has enough available memory. You can check memory usage with commands such as `free -m`, `docker stats` (`docker stats --no-stream`), `docker top container_name`, and `top` (`htop`). If the system is running low on memory, free up some resources or consider upgrading your hardware.
- Check the memory limits for your containers. You can increase the memory limit for Docker by modifying the Docker daemon configuration.
- Verify your Docker network configuration. Ensure that the Docker daemon is running and that there are no issues with the bridge network. You can check Docker network information using the `docker network ls` and `docker network inspect` commands.
- Check disk space, as insufficient disk space can also cause issues. Verify that there is enough free disk space on the host where DataOps Runner is installed and inside the container itself.
- Review the maximum user processes and open files limits on your system, as these can affect the ability to create new containers. Use the `ulimit` command to check the limits and compare them with current usage:
ulimit -u # Check the current maximum user processes limit
ulimit -n # Check the current maximum open files limit
ps aux | wc -l # Approximate process count across all users; compare with the user processes limit (which applies per user) to see whether you are close to it
lsof | wc -l # Approximate system-wide count of open files; compare with the open files limit (which applies per process) as a rough indicator only
- Create the following (or similar) cron job:
0 3 * * * docker system prune -f && docker volume prune -f
0 6 * * 0 reboot
The first entry runs a Docker cleanup at 3:00 AM every day, removing unused resources and volumes. The second entry schedules a system reboot every Sunday at 6:00 AM and is optional.
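The memory, disk space, and process limit checks from the list above could be combined into a small pre-flight script that is run on the runner host before starting jobs. The following is only a sketch under stated assumptions: it assumes a Linux host with `/proc/meminfo`, it assumes Docker uses its default data root `/var/lib/docker`, and the function names and the thresholds (512 MiB of memory, 2048 MiB of disk, 80% of the process limit) are hypothetical values chosen for illustration, not recommendations from DataOps.

```shell
#!/usr/bin/env bash
# Pre-flight resource check for a runner host (illustrative sketch, Linux assumed).
# All function names and thresholds below are assumptions, not DataOps defaults.

# Print MemAvailable in MiB from a meminfo-style file (default: /proc/meminfo).
mem_available_mib() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Print how much of a limit is in use, as a whole percentage.
# A limit of "unlimited" (or 0) is reported as 0% used.
usage_pct() {
  if [ "$2" = "unlimited" ] || [ "$2" -eq 0 ]; then
    echo 0
  else
    echo $(( $1 * 100 / $2 ))
  fi
}

# Print free space in MiB on the filesystem that holds the given path.
disk_free_mib() {
  df -Pm "$1" | awk 'NR == 2 { print $4 }'
}

# Run all checks, print a summary, and return non-zero if any check warns.
preflight() {
  min_mem_mib=512               # assumed threshold
  min_disk_mib=2048             # assumed threshold
  max_proc_pct=80               # assumed threshold
  docker_root=/var/lib/docker   # Docker's default data root; adjust if relocated
  [ -d "$docker_root" ] || docker_root=/

  mem="$(mem_available_mib)"
  disk="$(disk_free_mib "$docker_root")"
  # Rough count of the current user's processes (threads are not included).
  procs="$(ps -u "$(id -un)" -o pid= | wc -l)"
  proc_pct="$(usage_pct "$procs" "$(ulimit -u)")"

  echo "available memory: ${mem} MiB"
  echo "free disk on ${docker_root}: ${disk} MiB"
  echo "user process limit in use: ${proc_pct}%"

  status=0
  [ "$mem" -ge "$min_mem_mib" ] || { echo "WARNING: low available memory"; status=1; }
  [ "$disk" -ge "$min_disk_mib" ] || { echo "WARNING: low disk space"; status=1; }
  [ "$proc_pct" -le "$max_proc_pct" ] || { echo "WARNING: close to process limit"; status=1; }
  return "$status"
}

# Only run the checks when explicitly requested, e.g. ./preflight.sh --run
if [ "${1:-}" = "--run" ]; then
  preflight
fi
```

Saved as, for example, `preflight.sh` (a hypothetical file name), the script can be run with `./preflight.sh --run`. A non-zero exit status means at least one check produced a warning, which makes it possible to call the script from the same crontab as the cleanup job above.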
In conclusion, addressing the 'Cannot Allocate Memory' error in DataOps Runner calls for a systematic approach: confirm that the runner is on the latest version, validate system resources, inspect the Docker configuration, and monitor kernel limits. Preventive measures such as periodic cleanup routines and scheduled reboots can also reduce the chance of similar issues recurring. By following these steps and maintaining the host proactively, you can troubleshoot this error and keep your DataOps Runner environment stable.