AI4EOSC models deployed on HPC clusters via interLink: Offloading Workloads from the Cloud to HPC supercomputers
As part of its effort to offer solutions along the Computing Continuum, AI4EOSC has recently collaborated with interTwin, a Horizon Europe-funded project, to enable AI models from the AI4EOSC Marketplace to run on High Performance Computing (HPC) clusters.
This collaboration has successfully demonstrated the feasibility of using OSCAR, our core inference platform within the AI4OS software stack, to run AI model inference on the VEGA supercomputer, hosted at IZUM in Maribor, Slovenia. A key enabler of this setup is a dedicated virtual node within the OSCAR Kubernetes-based cluster, which leverages interLink to establish connectivity with the remote HPC environment. Developed by INFN under the interTwin project, interLink provides a powerful abstraction layer that allows Kubernetes pods to be executed transparently on any remote container-compatible host.
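To make the virtual-node idea more concrete, the following is a minimal sketch of how a pod could be pinned to such a node using standard Kubernetes scheduling fields (`nodeSelector` and `tolerations`). It is not the manifest OSCAR generates. The node name `vega-virtual-node`, the container image, the toleration key `virtual-node.interlink/no-schedule`, and the annotation key `slurm-job.vk.io/flags` are assumptions made for illustration and should be checked against the interLink documentation for a given deployment.

```python
import json


def build_offload_pod(name, image, command, virtual_node, slurm_flags=None):
    """Return a Kubernetes Pod manifest (as a dict) pinned to a virtual node.

    Only standard Pod fields are used. The toleration key and the
    annotation key below are ASSUMPTIONS for illustration.
    """
    annotations = {}
    if slurm_flags:
        # Assumed annotation key for passing Slurm options to the remote side.
        annotations["slurm-job.vk.io/flags"] = slurm_flags

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "annotations": annotations},
        "spec": {
            # Schedule the pod only on the virtual node that fronts the HPC site.
            "nodeSelector": {"kubernetes.io/hostname": virtual_node},
            # Virtual nodes are usually tainted so that ordinary pods avoid
            # them; the taint key here is an assumption.
            "tolerations": [
                {"key": "virtual-node.interlink/no-schedule", "operator": "Exists"}
            ],
            "restartPolicy": "Never",
            "containers": [{"name": name, "image": image, "command": command}],
        },
    }


# Hypothetical values: none of these names come from the actual experiment.
pod = build_offload_pod(
    name="thermography-inference",
    image="example.org/thermography-model:latest",
    command=["python", "predict.py"],
    virtual_node="vega-virtual-node",
    slurm_flags="--partition=gpu --gres=gpu:1",
)
print(json.dumps(pod, indent=2))
```

From the cluster's point of view this is an ordinary pod. Only the node selector and the toleration steer it towards the virtual node, which is what lets the offloading stay transparent to the user.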
In our experiment, we executed one of the models of the AI4EOSC Automated Thermography use case on the VEGA supercomputer. The deployment involved an OSCAR cluster running in the cloud on an elastic Kubernetes setup provided by UKRI, consisting of four worker nodes (each with 16 vCPUs and 61 GB RAM). The interLink Virtual Node was configured to offload computation tasks to VEGA, which runs Red Hat Enterprise Linux 8 and uses Slurm as its workload manager.
VEGA comprises a GPU partition with 60 nodes, each equipped with 4 Nvidia A100 GPUs, 2 AMD Rome 7H12 CPUs, and 512 GB of RAM, and a CPU partition with 768 standard nodes (256 GB RAM each) and 192 large-memory nodes (1 TB RAM each), all featuring local 1.92 TB M.2 SSD storage.
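To give a sense of scale, the per-node figures above can be aggregated with some simple arithmetic. The sketch below uses only the numbers quoted in this post. Treating 1 TB as 1024 GB is an assumption, and the totals are nominal hardware capacity, not what was allocated to the experiment.

```python
# Cloud-side OSCAR cluster (figures from the text above).
cloud_workers = 4
cloud_vcpus = cloud_workers * 16      # 16 vCPUs per worker
cloud_ram_gb = cloud_workers * 61     # 61 GB RAM per worker

# VEGA GPU partition (figures from the text above).
gpu_nodes = 60
total_gpus = gpu_nodes * 4            # 4 Nvidia A100 GPUs per node
gpu_partition_ram_gb = gpu_nodes * 512

# VEGA CPU partition; 1 TB is taken as 1024 GB (assumption).
standard_ram_gb = 768 * 256
large_mem_ram_gb = 192 * 1024
cpu_partition_ram_gb = standard_ram_gb + large_mem_ram_gb

print(f"Cloud cluster: {cloud_vcpus} vCPUs, {cloud_ram_gb} GB RAM")
print(f"VEGA GPU partition: {total_gpus} A100 GPUs, {gpu_partition_ram_gb} GB RAM")
print(f"VEGA CPU partition: {cpu_partition_ram_gb} GB RAM")
```

Under these assumptions, the cloud cluster offers 64 vCPUs and 244 GB of RAM in total, while VEGA's GPU partition alone provides 240 A100 GPUs.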
This successful experiment validated the integration of OSCAR with interLink, showcasing the potential to transparently offload AI workloads to HPC clusters like VEGA, all without requiring user intervention.
Link to the video demo: