Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore how to accelerate and autoscale deep learning inference on GPUs using KFServing in this 37-minute conference talk from KubeCon + CloudNativeCon Europe 2021. Learn about the challenges of implementing large-scale language models like BERT and GPT-2 for real-time applications, and discover how KFServing provides a simple model serving interface for common model servers. Gain insights into Bloomberg's use of KFServing for deploying BERT models trained on specialized financial news data, addressing scalability, latency, and throughput issues with Knative's Autoscaler and Activator. Delve into performance debugging tips and examine GPU benchmark results for TensorFlow and PyTorch BERT models deployed to KFServing. Understand how KFServing enables hardware acceleration and autoscaling for improved deep learning inference performance in production environments.
Syllabus
Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing - Dan Sun
Taught by
CNCF [Cloud Native Computing Foundation]