Sokovan - Container Orchestrator for Accelerated AI/ML Workloads and Massive Scale GPU Computing
OpenInfra Foundation via YouTube
Overview
Learn about Sokovan, a Python-based container orchestrator, in this 28-minute conference talk presented by Jeongkyu Shin and Joongi Kim. Discover how it manages resource-intensive batch workloads in containerized environments through acceleration-aware, multi-tenant scheduling. Explore its two-level scheduling system: a cluster-level scheduler that supports customizable job placement strategies and workload control, and a node-level scheduler that optimizes per-container performance by automatically mapping hardware accelerators to containers. Hear the presenters' case for how this design outperforms traditional tools such as Slurm for AI workloads, and how it has been deployed across various industries for GPU-intensive tasks, including AI training and services. See how integrating multiple hardware acceleration technologies helps container-based MLOps platforms make full use of cutting-edge hardware.
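The two-level idea described above can be illustrated with a small Python sketch. This is not Sokovan's actual API: every name here (Node, Job, pack, spread, schedule_cluster, allocate_on_node, place) is hypothetical, and the logic is a deliberately simplified assumption about how a cluster-level placement strategy and a node-level accelerator mapping might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical worker node; free_gpus holds unallocated GPU device indices."""
    name: str
    free_gpus: List[int] = field(default_factory=list)


@dataclass
class Job:
    """Hypothetical batch job that requests a number of GPUs."""
    name: str
    gpus: int


# A placement strategy picks one node among the candidates that can fit the job.
Strategy = Callable[[List[Node], Job], Node]


def pack(candidates: List[Node], job: Job) -> Node:
    # Prefer the node with the fewest free GPUs, filling nodes up first.
    return min(candidates, key=lambda n: len(n.free_gpus))


def spread(candidates: List[Node], job: Job) -> Node:
    # Prefer the node with the most free GPUs, balancing load across nodes.
    return max(candidates, key=lambda n: len(n.free_gpus))


def schedule_cluster(nodes: List[Node], job: Job, strategy: Strategy) -> Optional[Node]:
    # Cluster level: filter nodes with enough capacity, then apply the strategy.
    candidates = [n for n in nodes if len(n.free_gpus) >= job.gpus]
    return strategy(candidates, job) if candidates else None


def allocate_on_node(node: Node, job: Job) -> List[int]:
    # Node level: choose concrete device indices and mark them as in use.
    assigned = node.free_gpus[:job.gpus]
    node.free_gpus = node.free_gpus[job.gpus:]
    return assigned


def place(nodes: List[Node], job: Job, strategy: Strategy) -> Optional[Tuple[str, List[int]]]:
    # Run both levels; return (node name, GPU indices), or None if nothing fits.
    node = schedule_cluster(nodes, job, strategy)
    if node is None:
        return None
    return node.name, allocate_on_node(node, job)
```

Swapping pack for spread changes only the cluster-level decision; the node-level device mapping stays the same, which is the separation of concerns the talk description points to.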
Syllabus
Sokovan - Container Orchestrator for Accelerated AI/ML Workloads and Massive Scale GPU Computing
Taught by
OpenInfra Foundation