Auto-scalable Microservices for Machine Learning - UnifyID Case Study

Explore how UnifyID scales its machine learning back end to serve over 1 million users in this 16-minute conference talk. Discover techniques for running containers on EC2 GPU instances and for addressing common challenges in deploying machine learning clusters in production. Learn about horizontal scaling driven by GPU information from NVML (the NVIDIA Management Library), creating a uniform API for ML microservices across multiple frameworks, and running unreliable academic ML code reliably in production. Gain insights into the design of an auto-scalable ML back end, the open-source uniform API for machine learning microservices available on DockerHub, and the limitations of GPU horizontal scaling in Kubernetes and Mesos. Examine UnifyID's in-house auto-scaler, which uses GPU information from NVML to optimize performance and resource utilization.
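
To give a rough sense of what scaling horizontally on GPU information can look like, here is a minimal sketch of a scaling decision based on per-GPU utilization percentages. It is not UnifyID's auto-scaler, whose internals this description does not cover. The `ScalingPolicy` class, the `desired_replicas` function, and all threshold values are hypothetical, and the proportional rule is borrowed from the formula used by the Kubernetes Horizontal Pod Autoscaler. In a real deployment the utilization samples could be read through NVML, for example with `nvmlDeviceGetUtilizationRates`; here they are passed in as plain numbers so the sketch runs without a GPU.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy values; none of these come from the talk."""
    target_util: float = 60.0  # desired average GPU utilization, in percent
    min_replicas: int = 1
    max_replicas: int = 8


def desired_replicas(gpu_utils, current_replicas, policy=ScalingPolicy()):
    """Return how many replicas to run, given GPU utilization samples (0-100).

    Uses a proportional rule: scale the current replica count by the ratio
    of observed average utilization to the target, round up, then clamp to
    the policy bounds.
    """
    if not gpu_utils:
        # No samples available: keep the current count, within bounds.
        wanted = current_replicas
    else:
        avg_util = sum(gpu_utils) / len(gpu_utils)
        wanted = math.ceil(current_replicas * avg_util / policy.target_util)
    return max(policy.min_replicas, min(policy.max_replicas, wanted))


if __name__ == "__main__":
    # Two replicas whose GPUs are both busy: the rule asks for more replicas.
    print(desired_replicas([90.0, 90.0], current_replicas=2))
    # Four replicas that are mostly idle: the rule asks for fewer.
    print(desired_replicas([15.0], current_replicas=4))
```

A production auto-scaler would usually also smooth the samples over a time window and add a cooldown between scaling actions, so that short utilization spikes do not cause replicas to be added and removed repeatedly.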