Graph Docker Container Memory Usage with Gnuplot
- Get the data from docker
- Read the stats in Python
- Adding a loop
- Generate the gnuplot file
- Run and render
- Full script
Sometimes you need to know how much memory your docker containers are using. Wouldn't it be nice to be able to view their memory usage on a graph? Let's build that.
Get the data from docker
First, we're going to need a docker container to be running so we can read how much memory it is using. You can use any container you want, but personally I used:
docker run --rm -i -t alpine:3.20
Now in another terminal, we want to run the docker stats command:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
1165735d5e55 sharp_chaplygin 0.00% 984KiB / 90.17GiB 0.00% 4.63kB / 0B 1.01MB / 0B 1
That's great, but we want to read this in a script so we can graph the memory usage over time. Our current docker stats invocation has a few problems:
- We don't want it to continuously run. We want a single snapshot of the memory and then the program should exit.
- It would be better if we didn't have to parse this bespoke format.
- Personally I'd rather the command didn't truncate the container ID. That is a convenience for human users, but our script can handle the full ID and we can choose to truncate it in our scripts later if we want.
We can solve all of that with a few additional flags:
$ docker stats --no-stream --no-trunc --format json | jq
{
"BlockIO": "1.01MB / 0B",
"CPUPerc": "0.00%",
"Container": "1165735d5e55",
"ID": "1165735d5e558652cb70345367dca725e3c48e5a9b6e1aba772f4496274b1fea",
"MemPerc": "0.00%",
"MemUsage": "984KiB / 90.17GiB",
"Name": "sharp_chaplygin",
"NetIO": "4.84kB / 0B",
"PIDs": "1"
}
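Note that when multiple containers are running, docker stats --format json prints one JSON object per line, one line per container. As a quick sketch of reading that output in Python (this is the same approach our script will use below):

import json
import subprocess

# One JSON object per line, one line per running container.
result = subprocess.run(
    ["docker", "stats", "--no-stream", "--no-trunc", "--format", "json"],
    stdout=subprocess.PIPE,
)
for line in result.stdout.decode("utf8").splitlines():
    stat = json.loads(line)
    print(stat["Name"], stat["MemUsage"])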
Read the stats in Python
We are going to be generating a graph of this data using gnuplot. For the user's convenience, we are going to generate a single gnuplot file that contains both the gnuplot commands and the data. In our gnuplot file, all the stats for one container will appear first, followed by a separator, and then all the stats for the next docker container. This means we are going to buffer all of the metrics in memory and write them out at the end, rather than streaming them out to the file as we go. So we need to define a class that will record our samples:
#!/usr/bin/env python
from __future__ import annotations

from dataclasses import dataclass
from typing import NewType
from datetime import datetime

ContainerId = NewType("ContainerId", str)


@dataclass
class Sample:
    instant: datetime
    stats: dict[ContainerId, Stats]


@dataclass
class Stats:
    memory_usage_bytes: int
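To make the shape concrete, a single sample covering the container from earlier might look like this (the ID and the 984KiB figure are taken from the docker stats output above):

sample = Sample(
    instant=datetime.now(),
    stats={
        ContainerId(
            "1165735d5e558652cb70345367dca725e3c48e5a9b6e1aba772f4496274b1fea"
        ): Stats(memory_usage_bytes=984 * 1024),  # 984KiB
    },
)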
Each time we run docker stats, we will record one Sample. Let's write a take_sample() function that shells out to docker and parses the JSON it prints:
import logging
from typing import Tuple
import subprocess
import json

ContainerName = NewType("ContainerName", str)


def main():
    logging.basicConfig(level=logging.INFO)
    samples: list[Sample] = []
    labels: dict[ContainerId, ContainerName] = {}
    print(take_sample())


def take_sample() -> Tuple[Sample, dict[ContainerId, ContainerName]]:
    labels: dict[ContainerId, ContainerName] = {}
    stats: dict[ContainerId, Stats] = {}
    docker_inspect = subprocess.run(
        ["docker", "stats", "--no-stream", "--no-trunc", "--format", "json"],
        stdout=subprocess.PIPE,
    )
    for container_stat in (
        json.loads(l) for l in docker_inspect.stdout.decode("utf8").splitlines()
    ):
        if not container_stat["ID"]:
            # When containers are starting up, they sometimes have no ID and "--" as the name.
            continue
        labels[ContainerId(container_stat["ID"])] = ContainerName(
            container_stat["Name"]
        )
        memory_usage = parse_mem_usage(container_stat["MemUsage"])
        stats[ContainerId(container_stat["ID"])] = Stats(
            memory_usage_bytes=memory_usage
        )
    return Sample(instant=datetime.now(), stats=stats), labels


def parse_mem_usage(mem_usage: str) -> int:
    # TODO: We will implement this.
    return 0


if __name__ == "__main__":
    main()
Unfortunately, despite exporting in a machine-readable format, docker stats still reports memory usage as a human-readable string like "984KiB / 90.17GiB". This means we still need to parse that data. We are going to use regular expressions to grab the number and the unit, and then we are going to use the unit to convert to the number of bytes:
import re


def parse_mem_usage(mem_usage: str) -> int:
    parsed_mem_usage = re.match(
        r"(?P<number>[0-9]+\.?[0-9]*)(?P<unit>[^\s]+)", mem_usage
    )
    if parsed_mem_usage is None:
        raise Exception(f"Invalid Mem Usage: {mem_usage}")
    number = float(parsed_mem_usage.group("number"))
    unit = parsed_mem_usage.group("unit")
    for multiplier, identifier in enumerate(["B", "KiB", "MiB", "GiB", "TiB"]):
        if unit == identifier:
            return int(number * (1024**multiplier))
    raise Exception(f"Unrecognized unit: {unit}")
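As a quick sanity check against the values we saw earlier (984KiB is 984 * 1024 = 1,007,616 bytes; re.match only consumes the leading "984KiB", so the " / 90.17GiB" limit is ignored):

assert parse_mem_usage("984KiB / 90.17GiB") == 984 * 1024  # 1007616 bytes
assert parse_mem_usage("1.5GiB / 8GiB") == int(1.5 * 1024**3)
assert parse_mem_usage("512B / 8GiB") == 512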
And finally, we can run our script to get a single sample:
$ python graph_docker_memory.py
(Sample(instant=datetime.datetime(2024, 10, 22, 17, 31, 25, 529549), stats={'1165735d5e558652cb70345367dca725e3c48e5a9b6e1aba772f4496274b1fea': Stats(memory_usage_bytes=1007616)}), {'1165735d5e558652cb70345367dca725e3c48e5a9b6e1aba772f4496274b1fea': 'sharp_chaplygin'})
Adding a loop
Now that we can read docker memory usage in Python, it's time to add our business logic. The flow for our program will be:
- Wait for any docker containers to exist.
- Record memory every 2 seconds until no docker containers exist.
- Print out the gnuplot file to stdout.
We will use the empty response from our take_sample() function to detect both conditions. First, we wait for any docker container to exist:
from time import sleep
from typing import Final

SAMPLE_INTERVAL_SECONDS: Final[int] = 2


def main():
    logging.basicConfig(level=logging.INFO)
    samples: list[Sample] = []
    labels: dict[ContainerId, ContainerName] = {}
    first_pass = True
    # First wait for any docker container to exist.
    while True:
        sample, labels_in_sample = take_sample()
        if labels_in_sample:
            break
        if first_pass:
            first_pass = False
            logging.info("Waiting for a docker container to exist to start recording.")
        sleep(1)
Then we want to take samples every 2 seconds, updating our map of docker IDs to docker names, and recording their memory usage:
    # And then record memory until no containers exist.
    while True:
        sample, labels_in_sample = take_sample()
        if not labels_in_sample:
            break
        samples.append(sample)
        labels = {**labels, **labels_in_sample}
        sleep(SAMPLE_INTERVAL_SECONDS)
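One caveat with sleeping a fixed interval: each iteration actually takes SAMPLE_INTERVAL_SECONDS plus however long docker stats itself takes (often a second or more), so the samples drift apart. If you care about evenly spaced samples, here is a minimal sketch of deadline-based pacing (a variation, not what the script below does):

from time import monotonic

# Sketch: schedule each sample against a deadline so docker's own
# latency does not stretch the sampling period.
next_deadline = monotonic()
while True:
    sample, labels_in_sample = take_sample()
    if not labels_in_sample:
        break
    samples.append(sample)
    labels = {**labels, **labels_in_sample}
    next_deadline += SAMPLE_INTERVAL_SECONDS
    sleep(max(0.0, next_deadline - monotonic()))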
Finally, when no containers exist anymore, we want to print out our gnuplot graph:
    if labels:
        write_plot(
            samples,
            labels,
        )
from typing import Collection


def write_plot(
    samples: Collection[Sample],
    labels: dict[ContainerId, ContainerName],
):
    # TODO: We will implement this.
    return
Generate the gnuplot file
Finally, we need to write out our gnuplot file. First, we print out a header that tells gnuplot information about our graph:
def write_plot(
    samples: Collection[Sample],
    labels: dict[ContainerId, ContainerName],
):
    print(
        """set terminal svg background '#FFFFFF'
set title 'Docker Memory Usage'
set xdata time
set timefmt '%s'
set format x '%tH:%tM:%tS'
# Please note this is in SI units (base 10), not IEC (base 2). So, for example, this would show a Gigabyte, not a Gibibyte.
set format y '%.0s%cB'
set datafile separator "|"
"""
    )
This is telling gnuplot to:
- Output an SVG with a white background.
- Set the title of the graph.
- Treat the x-axis as time.
- Read the x-axis data as unix timestamps (the number of seconds since Jan 1st 1970 in UTC).
- Display the x-axis as Hours:Minutes:Seconds using relative time (data starts at "0").
- Format the y-axis as bytes using SI units (base 10, 1GB = 1,000,000,000 bytes).
- Use the pipe character to separate values in our data.
Since we're using relative time, we're going to need to know when each container started so we can normalize the timestamps to starting at 0:
    starting_time_per_container = {
        container_id: min(
            (sample.instant for sample in samples if container_id in sample.stats)
        )
        for container_id in labels.keys()
    }
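To see what this comprehension computes, here is a tiny hypothetical example: container "b" first appears one sample after container "a", so it gets the later start time.

from datetime import datetime

# Hypothetical data: "a" appears in both samples, "b" only in the second.
a, b = ContainerId("a"), ContainerId("b")
samples = [
    Sample(datetime(2024, 10, 22, 17, 31, 25), {a: Stats(100)}),
    Sample(datetime(2024, 10, 22, 17, 31, 29), {a: Stats(140), b: Stats(300)}),
]
# starting_time_per_container == {a: ...17:31:25, b: ...17:31:29}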
Now, we need to tell gnuplot about each line (each docker container). The output will look like:
"-" using 1:2 title 'my-first-container' with lines, \
"-" using 1:2 title 'my-second-container' with lines
This is telling gnuplot to read the data from stdin (which is the same gnuplot file), with the x and y axes defined as the first and second values in the file (which will be separated by a pipe character), and with the specified titles.
To accomplish this in the code, we'll add:
    line_definitions = ", ".join(
        [
            f""""-" using 1:2 title '{name}' with lines"""
            for container_id, name in sorted(labels.items())
        ]
    )
    print("plot", line_definitions)
And then finally we need to output the data for each container, terminating each container's data with an "e". For example, the data for two containers, one that goes from 100 bytes to 120 bytes of memory over the span of 8 seconds, and another that goes from 300 bytes to 900 bytes of memory over the span of 6 seconds, would look like:
0|100
2|140
4|290
6|118
8|120
e
0|300
2|450
4|650
6|900
e
To accomplish this in code, we loop over the containers, and over the samples for each container, then we subtract the time the container started from the timestamp (which aligns all containers to starting at 0 on the left in the graph, regardless of when they started up) and print out the data:
    for container_id in sorted(labels.keys()):
        start_time = int(starting_time_per_container[container_id].timestamp())
        for sample in sorted(samples, key=lambda x: x.instant):
            if container_id in sample.stats:
                print(
                    "|".join(
                        [
                            str(int((sample.instant).timestamp()) - start_time),
                            str(sample.stats[container_id].memory_usage_bytes),
                        ]
                    )
                )
        print("e")
Run and render
That is everything we need. The full script is below, but first, to test the script:
- Run python graph_docker_memory.py | tee memory.gnuplot.
- Then launch a docker container or two, and perform some actions like installing a package. This gives us changing data to create a more interesting line.
- Then close all docker containers you have open.
Now memory.gnuplot should contain a full gnuplot definition to graph the memory usage of your containers. We can generate an SVG with:
gnuplot memory.gnuplot > memory.svg
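If you prefer to drive the rendering step from Python too, here is a minimal sketch (assuming the gnuplot binary is on your PATH; the svg terminal writes to stdout, which we redirect to a file):

import subprocess

# Render a saved gnuplot file to an SVG, equivalent to the shell command above.
with open("memory.svg", "wb") as svg:
    subprocess.run(["gnuplot", "memory.gnuplot"], stdout=svg, check=True)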
And then open memory.svg in your preferred image viewer and/or web browser to see your graph.
You may notice that underscores in the container names render oddly (gnuplot treats them as subscript markers). We just need to escape the underscores when writing the container names; this has been added to the full script below. The full script also adds support for defining horizontal lines (for example, if you have a memory limit of 1GiB, you might want to put a red horizontal line at 1GiB).
Full script
Below is the full version of the script in one piece:
#!/usr/bin/env python
from __future__ import annotations

import json
import logging
import re
import subprocess
from dataclasses import dataclass
from datetime import datetime
from time import sleep
from typing import Collection, Final, NewType, Tuple

ContainerId = NewType("ContainerId", str)
ContainerName = NewType("ContainerName", str)

SAMPLE_INTERVAL_SECONDS: Final[int] = 2


@dataclass
class Sample:
    instant: datetime
    stats: dict[ContainerId, Stats]


@dataclass
class Stats:
    memory_usage_bytes: int


def main():
    logging.basicConfig(level=logging.INFO)
    samples: list[Sample] = []
    labels: dict[ContainerId, ContainerName] = {}
    first_pass = True
    # First wait for any docker container to exist.
    while True:
        sample, labels_in_sample = take_sample()
        if labels_in_sample:
            break
        if first_pass:
            first_pass = False
            logging.info("Waiting for a docker container to exist to start recording.")
        sleep(1)
    # And then record memory until no containers exist.
    while True:
        sample, labels_in_sample = take_sample()
        if not labels_in_sample:
            break
        samples.append(sample)
        labels = {**labels, **labels_in_sample}
        sleep(SAMPLE_INTERVAL_SECONDS)
    if labels:
        # Draws a red horizontal line at 32 GiB since that is the memory limit for cloud run.
        write_plot(
            samples,
            labels,
            horizontal_lines=[(32 * 1024**3, "red", "Cloud Run Max Memory")],
        )


def write_plot(
    samples: Collection[Sample],
    labels: dict[ContainerId, ContainerName],
    *,
    horizontal_lines: Collection[Tuple[int, str, str | None]] = (),
):
    starting_time_per_container = {
        container_id: min(
            (sample.instant for sample in samples if container_id in sample.stats)
        )
        for container_id in labels.keys()
    }
    print(
        """set terminal svg background '#FFFFFF'
set title 'Docker Memory Usage'
set xdata time
set timefmt '%s'
set format x '%tH:%tM:%tS'
# Please note this is in SI units (base 10), not IEC (base 2). So, for example, this would show a Gigabyte, not a Gibibyte.
set format y '%.0s%cB'
set datafile separator "|"
"""
    )
    for y_value, color, label in horizontal_lines:
        print(
            f'''set arrow from graph 0, first {y_value} to graph 1, first {y_value} nohead linewidth 2 linecolor rgb "{color}"'''
        )
        if label is not None:
            print(f"""set label "{label}" at graph 0, first {y_value} offset 1,-0.5""")
    # Include the horizontal lines in the range.
    if len(horizontal_lines) > 0:
        print(f"""set yrange [*:{max(x[0] for x in horizontal_lines)}<*]""")
    line_definitions = ", ".join(
        [
            f""""-" using 1:2 title '{gnuplot_escape(name)}' with lines"""
            for container_id, name in sorted(labels.items())
        ]
    )
    print("plot", line_definitions)
    for container_id in sorted(labels.keys()):
        start_time = int(starting_time_per_container[container_id].timestamp())
        for sample in sorted(samples, key=lambda x: x.instant):
            if container_id in sample.stats:
                print(
                    "|".join(
                        [
                            str(int((sample.instant).timestamp()) - start_time),
                            str(sample.stats[container_id].memory_usage_bytes),
                        ]
                    )
                )
        print("e")


def gnuplot_escape(inp: str) -> str:
    out = ""
    for c in inp:
        if c == "_":
            out += "\\"
        out += c
    return out


def take_sample() -> Tuple[Sample, dict[ContainerId, ContainerName]]:
    labels: dict[ContainerId, ContainerName] = {}
    stats: dict[ContainerId, Stats] = {}
    docker_inspect = subprocess.run(
        ["docker", "stats", "--no-stream", "--no-trunc", "--format", "json"],
        stdout=subprocess.PIPE,
    )
    for container_stat in (
        json.loads(l) for l in docker_inspect.stdout.decode("utf8").splitlines()
    ):
        if not container_stat["ID"]:
            # When containers are starting up, they sometimes have no ID and "--" as the name.
            continue
        labels[ContainerId(container_stat["ID"])] = ContainerName(
            container_stat["Name"]
        )
        memory_usage = parse_mem_usage(container_stat["MemUsage"])
        stats[ContainerId(container_stat["ID"])] = Stats(
            memory_usage_bytes=memory_usage
        )
    for container_id, container_stat in stats.items():
        logging.info(
            f"Recorded stat {labels[container_id]}: {container_stat.memory_usage_bytes} bytes"
        )
    return Sample(instant=datetime.now(), stats=stats), labels


def parse_mem_usage(mem_usage: str) -> int:
    parsed_mem_usage = re.match(
        r"(?P<number>[0-9]+\.?[0-9]*)(?P<unit>[^\s]+)", mem_usage
    )
    if parsed_mem_usage is None:
        raise Exception(f"Invalid Mem Usage: {mem_usage}")
    number = float(parsed_mem_usage.group("number"))
    unit = parsed_mem_usage.group("unit")
    for multiplier, identifier in enumerate(["B", "KiB", "MiB", "GiB", "TiB"]):
        if unit == identifier:
            return int(number * (1024**multiplier))
    raise Exception(f"Unrecognized unit: {unit}")


if __name__ == "__main__":
    main()