At Green Coding Berlin (GCB), one goal is to enable reproducible runs on our cluster. An important step towards accurate measurements was the creation of NOP Linux, our custom Linux distro that disables as many background processes as possible to avoid interruptions during measurements. Another crucial step was ensuring the reliable operation of the PowerSpy2, so we could measure the entire power consumption.
We wanted to create a cluster that allowed users to select the server on which they’d like to run the benchmark. Initially, we aimed for full automation and looked at MAAS, the excellent tool from Canonical. As we use Ubuntu as our reference system, this seemed to be the logical choice. Although the tool was impressive, it required a daemon running on the machine, which created multiple interruptions during our measurements.

This led us to reevaluate our tooling, and we decided to try a simpler approach using PXE. While there is a great description of the process, and the general flow worked very well, we invested a significant amount of time and effort in configuring the machines correctly. Getting the entire installation flow working with reboots, different configurations like the PowerSpy, and the multitude of different servers we wanted to use presented considerable overhead. Additionally, our machines are distributed across various data centers, so we needed to set up a complex networking layer for DHCP discovery to work. While this was a scalable solution, it required substantial infrastructure that had to be maintained. Moreover, our tool develops quite rapidly, so we would have had to keep updating the installation process. As a small company, this was not feasible in our scenario.

Consequently, we decided to sacrifice scalability in favor of simplicity. In the meantime, we had built a complex test setup with various servers that we could now disassemble. The main lesson learned for the future: start with the simplest solution that solves the problem, and continually reevaluate your assumptions and needs.
We are aware that there are a multitude of configuration systems out there that don’t require a client running on the machine to be configured and that automate some of the tasks we will now do manually. But we decided to keep it very simple for now and not invest more time into another solution.
At Green Coding Berlin, we are committed to not only creating efficient and reproducible programming solutions but also sharing our findings and tools with the wider community. We firmly believe in the principles of open-source, the power of shared knowledge, and the benefits of collaborative development. Our aim is to create tools and systems that can be utilized by anyone, without the restrictions of proprietary licenses. We don’t just want to make our solutions better - we want to make programming better, for everyone. One of the exciting initiatives that align with our philosophy is the Blue Angel for Software. We support this cause and believe that our tools and systems should be made available for such uses. By making our developments publicly available, we hope to contribute to the broader objective of creating software that is efficient, effective, and transparent.
The system we are using now
As previously mentioned, the current system will not scale to accommodate thousands of machines, but it will suffice for a considerable amount of time in our situation.
The files shown in this article might already be outdated when you read it as we will not update the article! For a detailed discussion please check out our documentation under https://docs.green-coding.berlin/
We have now opted for quite a simple solution. You need one server that exposes the database externally; all results are written to this server. A client.py script runs on every measurement machine and periodically queries the server for jobs; if one is available, it executes the measurement undisturbed. After a job is finished, the client does some cleanup tasks and checks whether there is an update for the GMT and for the operating system. It then keeps fetching jobs until none are left, at which point it sleeps for 5 minutes and retries. On every wake-up, the client sends a message to the server that it is up and functional, so we can check server-side that all clients are working.
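As a rough illustration of that loop, here is a simplified sketch. This is not the actual client.py from the GMT repository: the `fetch_job`, `run_job`, and `send_heartbeat` callables are invented stand-ins for the real HTTP calls against the GMT API.

```python
import time

SLEEP_SECONDS = 300  # 5 minutes between idle polls

def run_client(fetch_job, run_job, send_heartbeat, cycles=None):
    """Poll the server for jobs; run each one, then sleep when idle.

    fetch_job() returns the next job or None when the queue is empty.
    cycles limits the number of wake-ups (None = run forever).
    """
    done = 0
    while cycles is None or done < cycles:
        send_heartbeat()           # tell the server we are up and functional
        job = fetch_job()
        while job is not None:     # keep working until no jobs are left
            run_job(job)
            job = fetch_job()
        done += 1
        if cycles is None or done < cycles:
            time.sleep(SLEEP_SECONDS)  # idle: retry in 5 minutes
```

Dependency injection of the three callables keeps the loop testable without a running server.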
To create your own GCB cluster, you can follow these steps:
These configurations are tested with Ubuntu 22.04 LTS, but newer versions should also work. Older versions are discouraged.
Use this cloud config file to install your client machine:
```yaml
#cloud-config
autoinstall:
  version: 1
  apt:
    disable_components: []
    geoip: true
    preserve_sources_list: false
    primary:
    - arches:
      - amd64
      - i386
      uri: http://de.archive.ubuntu.com/ubuntu
    - arches:
      - default
      uri: http://ports.ubuntu.com/ubuntu-ports
  drivers:
    install: false
  identity:
    hostname: gc
    username: gc
    realname: gc
    password: $6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0
  kernel:
    package: linux-generic
  keyboard:
    layout: us
    toggle: null
    variant: ''
  locale: en_US.UTF-8
  network:
    ethernets:
      enp1s0:
        dhcp4: true
    version: 2
  source:
    id: ubuntu-server-minimal
    search_drivers: true
  ssh:
    allow-pw: true
    authorized-keys: []
    install-server: true
  storage:
    layout:
      name: direct
      match:
        ssd: yes
  updates: security
  shutdown: poweroff
```
You can follow the description of how to create a custom ISO ("Using another volume to provide the autoinstall config" in the Ubuntu autoinstall documentation). We then put the ISO on a USB stick and boot the machine by hand. As we have physical access, this is fine for now. In this example the password is ubuntu; obviously change this in your case. Once the install has finished, you can pull the USB stick and reboot.
You can now SSH into the machine and start configuring. This is mainly done by copy-pasting scripts manually. As we don’t install machines that often, this is totally fine for now. Please note that all of the following commands need to be run as root: SSH into the machine as the gc user and then become root (for example with sudo -i).
```bash
#!/bin/bash
set -euox pipefail

# Remove all the packages we don't need
apt purge -y --purge snapd cloud-guest-utils cloud-init apport apport-symptoms \
    cryptsetup cryptsetup-bin cryptsetup-initramfs curl gdisk lxd-installer mdadm \
    open-iscsi squashfs-tools ssh-import-id wget xauth unattended-upgrades \
    update-notifier-common python3-update-manager needrestart command-not-found \
    cron lxd-agent-loader modemmanager motd-news-config pastebinit packagekit
systemctl daemon-reload
apt autoremove -y --purge

# Get newest versions of everything
apt update
apt install psmisc -y # on some versions killall might be missing, so install psmisc first
killall unattended-upgrade-shutdown || true # "|| true" so set -e does not abort when nothing is running
apt upgrade -y

# These are packages that get (re-)installed through the upgrade
apt remove -y --purge networkd-dispatcher multipath-tools
apt autoremove -y --purge

# Disable services that might do things in the background
systemctl disable --now apt-daily-upgrade.timer
systemctl disable --now apt-daily.timer
systemctl disable --now dpkg-db-backup.timer
systemctl disable --now e2scrub_all.timer
systemctl disable --now fstrim.timer
systemctl disable --now motd-news.timer
# the following timers might be missing on newer Ubuntus
systemctl disable --now systemd-tmpfiles-clean.timer
systemctl disable --now fwupd-refresh.timer
systemctl disable --now logrotate.timer
systemctl disable --now ua-timer.timer
systemctl disable --now man-db.timer
systemctl disable --now systemd-journal-flush.service
systemctl disable --now systemd-timesyncd.service
systemctl disable --now systemd-fsckd.socket
systemctl disable --now systemd-initctl.socket
systemctl disable --now cryptsetup.target

# Packages to install for editing and later bluetooth.
# Some of us prefer nano, some vim :)
apt install -y vim nano bluez

# Setup networking
NET_NAME=$(networkctl list "en*" --no-legend | cut -f 4 -d " ")
cat <<EOT > /etc/systemd/network/en.network
[Match]
Name=$NET_NAME

[Network]
DHCP=ipv4
EOT

# Disable the kernel watchdogs
echo 0 > /proc/sys/kernel/soft_watchdog
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 0 > /proc/sys/kernel/watchdog
echo 0 > /proc/sys/kernel/watchdog_thresh

# Removes the large header when logging in
rm /etc/update-motd.d/*

# Remove all cron files. Cron shouldn't be running anyway but just to be safe
rm /etc/cron.d/*
rm /etc/cron.daily/*

apt autoremove -y --purge
```
Now you should have a machine that runs only a minimal number of services and hence should not create a significant number of interrupts that disturb measurements. We can verify this by starting NOP Linux in a virtual machine and checking the CPU statistics.
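One simple way to check is to compare the system-wide interrupt counter in /proc/stat over a fixed window: fewer interrupts per second after the trimming means fewer disturbances. Below is a small sketch of this idea (my own illustration, not part of the GMT tooling):

```python
import time

def total_interrupts(proc_stat_text: str) -> int:
    """Return the total interrupt count from /proc/stat content.

    The 'intr' line begins with the sum of all interrupts serviced
    since boot, followed by per-IRQ counters.
    """
    for line in proc_stat_text.splitlines():
        if line.startswith("intr "):
            return int(line.split()[1])
    raise ValueError("no intr line found")

def interrupts_per_second(window_seconds: float = 10.0) -> float:
    """Sample /proc/stat twice and return the interrupt rate."""
    with open("/proc/stat") as f:
        before = total_interrupts(f.read())
    time.sleep(window_seconds)
    with open("/proc/stat") as f:
        after = total_interrupts(f.read())
    return (after - before) / window_seconds
```

Run `interrupts_per_second()` on a stock install and on the trimmed system; the difference gives a rough feel for how much background noise was removed.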
Now we need the tooling installed on the client to start the measurements.
```bash
#!/bin/bash
set -euox pipefail

apt update
apt install -y make gcc python3 python3-pip libpq-dev uidmap git iproute2
apt remove -y docker docker-engine docker.io containerd runc
apt install -y ca-certificates curl gnupg lsb-release

# Install the Green Metrics Tool as the gc user
su gc << 'EOF'
git clone https://github.com/green-coding-berlin/green-metrics-tool ~/green-metrics-tool
cd ~/green-metrics-tool
git submodule update --init
python3 -m pip install -r ~/green-metrics-tool/requirements.txt
python3 -m pip install -r ~/green-metrics-tool/metric_providers/psu/energy/ac/xgboost/machine/model/requirements.txt
EOF

# Install Docker from the official repository
mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
systemctl disable --now docker.service docker.socket
apt install -y docker-ce-rootless-extras dbus-user-session

shutdown -r now
#
# You need to reboot here before running the rest of the script
#
systemctl disable --now docker.service docker.socket
su - gc -c "dockerd-rootless-setuptool.sh install"
cat <<EOT >> /home/gc/.bashrc
docker context use rootless
EOT
su - gc -c 'systemctl --user enable docker; loginctl enable-linger $(whoami)'

apt install -y lm-sensors libsensors-dev libglib2.0-0 libglib2.0-dev
sensors-detect --auto

# You only need these commands if you are planning to use the PowerSpy2
pip install pyserial==3.5
apt install -y bluez
```
This might also change; please refer to the GMT documentation.
If you want to use the PowerSpy2 device please follow the installation under https://docs.green-coding.berlin/docs/measuring/metric-providers/psu-energy-ac-powerspy2/
Now that you have installed the GMT, you need to configure it to run in client mode. You can run the install script with the following parameters to generate a first version of the config file. You will need to change the api/metrics endpoints to the URL of your server.
It is important that you don’t run the GMT server or database on the same machine you benchmark on, as the additional load will distort your measurements.
Now please also edit the following points:

- In the `postgresql` section, `host` must point to your server. You will need to replace the `green-coding-postgres-container` value; this should be the same URL you specified when running the install script. Also check that the password is correct.
- Set `machine_id` to the number you gave this client when adding it to the `machines` table on the server.
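Putting these edits together, the relevant fragment of the config file might look roughly like this. This is a hedged sketch: the exact key names and nesting can differ between GMT versions, so treat the GMT documentation as authoritative, and note that the host name and machine id below are placeholders.

```yaml
postgresql:
  host: gmt-server.example.com   # placeholder; replaces the green-coding-postgres-container default
  password: your-db-password     # must match the password set on the server

machine_id: 7                    # placeholder; the id of this client in the machines table
```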
In this setup, only one machine, the server, is configured to send emails. You can add email-sending capabilities to any client by adding the `smtp` data in the configuration. Don’t forget to also set the admin values (`no_emails=False`) at the end of the file. You will also need to set up a cron job for this; please see the GMT documentation for details.
The following allows the cleanup script to be run with root privileges without a password prompt:

```bash
chmod a+x /home/gc/green-metrics-tool/tools/cluster/cleanup.sh
echo "ALL ALL=(ALL) NOPASSWD:/home/gc/green-metrics-tool/tools/cluster/cleanup.sh" | sudo tee /etc/sudoers.d/green_coding_cleanup
```
To make sure that the client is always running you can create a service that will start at boot and keep running.
Create a file under /etc/systemd/system/green-coding-client-service.service with the following content:
```ini
[Unit]
Description=The Green Metrics Client Service
After=network.target

[Service]
Type=simple
User=gc
Group=gc
WorkingDirectory=/home/gc/green-metrics-tool/
ExecStart=/usr/bin/python3 /home/gc/green-metrics-tool/tools/client.py
Restart=always
RestartSec=30s

[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl start green-coding-client-service
sudo systemctl enable green-coding-client-service
sudo systemctl status green-coding-client-service
```
You should now see the client reporting its status to the server.