AI Orchestration: From local models to cloud

Pragmatic AI Labs via Coursera

Overview

Learn to orchestrate AI systems across local and cloud environments through hands-on infrastructure setup, model deployment, and workflow integration. You will build a prompt engineering pyramid, from basic prompts to chain-of-thought reasoning implemented in Rust, then evaluate six decision factors for choosing between local and cloud models, including latency, throughput, cost, and privacy.

The course covers local AI infrastructure in depth: running Ollama with custom Modelfiles for task-specific assistants, deploying llamafile for zero-dependency portable inference, compiling Rust Candle with CUDA for GPU-accelerated local inference, and optimizing local RAG with caching strategies. You will also configure a complete AI workstation, using tmux for session management and nvidia-smi and Zenith for GPU monitoring, with an emphasis on NVIDIA GPU optimization.

The final module covers cloud workflows, including AWS Spot Instances for cost-effective GPU compute, Hugging Face model discovery and download, and GitHub AI model integration. By completing this course, you will be able to set up local AI infrastructure, deploy models across local and cloud environments, and design orchestration workflows that balance cost, privacy, and performance.
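
As a taste of the local-first workflow, here is a minimal sketch of querying a locally running Ollama server from Rust over its REST API. It assumes `ollama serve` is running on its default port (11434) and that a model has already been pulled (the model name `llama3` here is an assumption), and it uses the `reqwest` and `serde_json` crates; this is an illustration, not code from the course.

```rust
// Minimal sketch: one-shot completion against a local Ollama server.
// Assumed Cargo.toml deps:
//   reqwest = { version = "0.12", features = ["blocking", "json"] }
//   serde_json = "1"
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // Ollama's /api/generate endpoint takes a model name and a prompt;
    // stream=false returns the whole completion as a single JSON object.
    let body = json!({
        "model": "llama3", // assumption: any locally pulled model works here
        "prompt": "Explain chain-of-thought prompting in one sentence.",
        "stream": false
    });

    let resp: serde_json::Value = client
        .post("http://localhost:11434/api/generate")
        .json(&body)
        .send()?
        .json()?;

    // The completion text comes back in the "response" field.
    println!("{}", resp["response"].as_str().unwrap_or(""));
    Ok(())
}
```

Keeping the request shape identical while swapping the endpoint for a cloud-hosted model is, in miniature, the local-versus-cloud tradeoff the course explores.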

Syllabus

  • Orchestration Fundamentals
    • A comprehensive course covering prompt engineering with chain-of-thought reasoning, local inference runtimes (Ollama, llamafile, Candle), GPU workstation configuration, and cost-optimized cloud deployment with AWS Spot instances.
  • Local AI Infrastructure
    • Covers local vs cloud model tradeoffs, caching strategies, local RAG optimization, Ollama with custom Modelfiles, llamafile portable deployment, and Candle GPU-accelerated Rust inference.
  • Workstation and Cloud Workflows
    • Covers tmux session management, nvidia-smi and Zenith GPU monitoring, local workstation orchestration, AWS Spot instance deployment, Hugging Face and GitHub AI model workflows, and Rust project structure.
  • Capstone
    • Head-to-head comparison of Ollama vs `apr` ([paiml/aprender](https://github.com/paiml/aprender)) running Qwen2.5-Coder-1.5B on the same prompt suite, same hardware. Build a chain-of-thought routing engine that selects runtimes based on task complexity and validation requirements, with cost analysis spanning local workstations, Spot Instances, and Bedrock. (A rough sketch of the routing idea appears after this syllabus.)
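
As a rough illustration of the capstone's routing idea, the sketch below picks a runtime from task complexity, validation needs, and privacy constraints. Every name and threshold here is hypothetical and not taken from the course materials.

```rust
// Toy routing engine: pick the cheapest runtime that satisfies a task's
// constraints. All types, variants, and thresholds are illustrative only.
#[derive(Debug)]
enum Runtime {
    OllamaLocal,  // local workstation inference
    SpotInstance, // AWS Spot GPU for heavier batch work
    Bedrock,      // managed endpoint for output that must be validated
}

struct Task {
    complexity: u8,         // 0-10, rough estimate of reasoning depth
    needs_validation: bool, // output must pass an external check
    privacy_sensitive: bool,
}

fn route(task: &Task) -> Runtime {
    if task.privacy_sensitive {
        return Runtime::OllamaLocal; // sensitive data never leaves the box
    }
    match (task.complexity, task.needs_validation) {
        (0..=3, false) => Runtime::OllamaLocal, // simple, unvalidated: stay local
        (_, true) => Runtime::Bedrock,          // validated output: managed cloud
        _ => Runtime::SpotInstance,             // complex batch work: Spot GPU
    }
}

fn main() {
    let task = Task { complexity: 7, needs_validation: false, privacy_sensitive: false };
    println!("{:?}", route(&task)); // -> SpotInstance
}
```

A full version would also fold in the cost model spanning local workstations, Spot Instances, and Bedrock that the capstone calls for.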

Taught by

Alfredo Deza and Noah Gift
