Slurm Raspberry Pi Cluster for Data Science

Be part of a team to create a Slurm cluster on Raspberry Pis.

At http://piplanet.org we discuss setting up a Raspberry Pi cluster with a convenient approach for burning SD cards. The Pis receive preconfigured SD cards that allow a cluster to start as soon as the Pis are switched on. You will be able to communicate between the computer on which you burned the SD cards and each of the Pis, as well as between the Pis themselves.

Now that the basic cluster is set up, we would like to install a distributed queuing system for the cluster based on Slurm. Slurm is a very popular framework for HPC systems, but it also allows scheduling of long-running jobs for data science.

We would like to implement a simple one-line deployment command-line tool, supported by an API, that deploys Slurm on a list of hosts specified by IP address. This includes:

  1. Create a proper method to deploy the framework on the cluster given the IPs or the names of the machines.
  2. Provide a mechanism to verify if the deployment was successful
  3. Provide a set of unit tests using pytest to execute some basic use cases.
  4. Develop a Python API that supports the deployment while exposing it through a command line (see the sketch after this list)
  5. Explore the development of a REST API that facilitates the deployment by reusing the developed Python API.
  6. Develop a manual
  7. Develop a high-quality report with benchmarks.
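
To make items 1 to 4 more concrete, here is a minimal sketch of what such a Python deployment API could look like. All names (the module slurm_deploy, the class SlurmDeployer, and its methods) are hypothetical placeholders and not existing cloudmesh code; a real deployment also needs to distribute the munge key and a generated slurm.conf, which the sketch omits.

    # slurm_deploy.py -- hypothetical sketch of a deployment API; the names
    # and the package selection are placeholders, not existing cloudmesh code.
    import subprocess
    from typing import Dict, List


    class SlurmDeployer:
        """Deploy and verify Slurm on a list of hosts reachable via ssh."""

        def __init__(self, hosts: List[str], user: str = "pi"):
            self.hosts = hosts
            self.user = user

        def _run(self, host: str, command: str) -> subprocess.CompletedProcess:
            """Run a single command on a remote host over ssh."""
            return subprocess.run(["ssh", f"{self.user}@{host}", command],
                                  capture_output=True, text=True)

        def deploy(self) -> None:
            """Install the Slurm packages on every host.

            A real deployment also distributes the munge key and slurm.conf,
            which is omitted here for brevity.
            """
            for host in self.hosts:
                self._run(host,
                          "sudo apt-get update && "
                          "sudo apt-get install -y slurm-wlm")

        def verify(self) -> Dict[str, bool]:
            """Return a host -> bool map indicating whether slurmd responds."""
            return {host: self._run(host, "slurmd -V").returncode == 0
                    for host in self.hosts}

A pytest test for item 3 could then simply instantiate SlurmDeployer with the test cluster's hostnames, call deploy(), and assert that every value returned by verify() is True.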

Requirements

In order for you to participate in this project, you will need:

  1. You need financial resources to buy the material for creating a Pi cluster. You will need at least 3 Pis with 8 GB of memory. Each of them costs $75, and you also need a power supply costing $35, power cables, network cables, and at least one HDMI cable suitable for connecting an HDMI monitor to a Pi. More details about parts can be found at https://cloudmesh.github.io/pi/docs/hardware/parts/
  2. If you also have a laptop or desktop to which you would like to connect the Pis, make sure it can run Docker (Windows Home will not work). However, one of the Raspberry Pis will do.
  3. Significant Python knowledge
  4. Be highly motivated
  5. Be willing to have meetings on this project once or twice a week
  6. Showcase significant progress over the lifetime of the project.
  7. Be knowledgeable about GitHub (a repository will be provided to which Dr. von Laszewski will contribute)
  8. Conduct task management in GitHub (Gregor will explain)
  9. Be honest and do not hide problems or implementation bugs.
  10. You must be able to do a videoconference and share your screen (I typically use Google Meet or Zoom).

Options

We would like you to explore the deployment both on Raspberry Pi OS and on Ubuntu.

Please be aware that the goal is not to replicate tutorials from the Web that require input by hand or repeated installs on the various Pis. Instead, we would like to have a single command such as

cms pi cluster deploy slurm --hosts "red,red[01-02]"

where simple options such as the hostnames in the Pi cluster are used.
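
A host specification such as "red,red[01-02]" implies that the tool expands name ranges into individual hostnames. The following is an illustrative sketch of such an expansion, assuming a simple comma-and-range syntax; cloudmesh-common already ships a parameter-expansion helper, which should be preferred in the actual implementation.

    import re
    from typing import List


    def expand_hosts(spec: str) -> List[str]:
        """Expand a spec like "red,red[01-02]" into ["red", "red01", "red02"].

        Illustrative only; the real tool would reuse the parameter expansion
        that cloudmesh-common already provides.
        """
        hosts = []
        for part in spec.split(","):
            match = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", part)
            if match:
                name, lo, hi = match.groups()
                width = len(lo)
                hosts.extend(f"{name}{i:0{width}d}"
                             for i in range(int(lo), int(hi) + 1))
            else:
                hosts.append(part)
        return hosts


    print(expand_hosts("red,red[01-02]"))  # ['red', 'red01', 'red02']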

Getting started

  1. In the first week, collect a list of all projects on the internet that do something similar
  2. Study the GitHub repositories cloudmesh-pi-burn, cloudmesh-pi-cluster, and cloudmesh-common intensely. Explore the old code and identify what is wrong with it (just by looking at it)
  3. Identify a base method. This can use existing DevOps approaches such as Ansible, Chef, snapcraft (Ubuntu), or cloudmesh parallel runtime methods. However, their use will be hidden behind an API and command-line tool. Evaluate the LLNL simple deployment method, which could be used. Also explore whether a snap is available.
  4. From the command line, derive an API interface and use FastAPI to implement it (a sketch of such a REST wrapper is shown after this list).
  5. Identify a mechanism for dealing with security.
  6. Before you start implementing, describe and showcase a couple of commands showing how it will be used.
  7. Your deployment should target Ubuntu on the Pi as well as Raspberry Pi OS. If there is only time for one, pick one.
  8. Please be reminded that you also need to set up NFS on the nodes. Many examples of how to do that are available. You will be using the USB port to attach a disk; this may require an external USB drive or a simple USB stick. Make sure they are USB 3 compatible. In case of questions, please contact Gregor at

laszewski@gmail.com
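
For item 4 of the list above, the REST layer can stay a thin wrapper around the same Python API. The sketch below reuses the hypothetical SlurmDeployer class from the earlier example; the endpoint path, module name, and request model are illustrative only.

    # rest_service.py -- illustrative FastAPI wrapper around the hypothetical
    # SlurmDeployer sketch shown earlier; endpoint and model names are placeholders.
    from typing import Dict, List

    from fastapi import FastAPI
    from pydantic import BaseModel

    from slurm_deploy import SlurmDeployer  # hypothetical module from the earlier sketch

    app = FastAPI(title="Pi Cluster Slurm Deployment")


    class DeployRequest(BaseModel):
        hosts: List[str]  # e.g. ["red", "red01", "red02"]
        user: str = "pi"


    @app.post("/deploy/slurm")
    def deploy_slurm(request: DeployRequest) -> Dict[str, bool]:
        """Deploy Slurm on the given hosts and report per-host verification."""
        deployer = SlurmDeployer(request.hosts, user=request.user)
        deployer.deploy()
        return deployer.verify()

Such a service could be started with, for example, uvicorn rest_service:app, so the command line and the REST interface share the same code path.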
