Resilient On-Premises AI Workloads on Kubernetes with Hyperconverged Infrastructure

Platform Engineering via YouTube

Overview

Learn how to deploy resilient AI workloads on Kubernetes using hyperconverged infrastructure (HCI) in this 14-minute conference talk. Discover how to build fault-tolerant systems by integrating compute, storage, and networking into unified platforms designed to avoid single points of failure. Explore deploying OpenShift clusters on HCI to provide high availability and operational efficiency for complex AI workloads. Learn the design principles for building robust systems across multiple servers and networks, and see how software-defined storage (SDS) provides scalability, resilience, and seamless data access. Examine business continuity strategies that minimize downtime and guard against data loss, including backup policies, disaster recovery (DR) plans, and DR protection. Compare the performance and reliability trade-offs of bare-metal versus virtual machine deployments for AI workloads. Gain practical insights into streamlining day-two operations through automated monitoring and alerting, firmware upgrades, auto-scaling, and proactive issue resolution to improve overall system reliability and performance.

Syllabus

Resilient on-premises AI workloads on Kubernetes with hyperconverged infrastructure

Taught by

Platform Engineering
