Fine-tuning Pixtral - Multi-modal Vision and Text Model

Trelis Research via YouTube

How to fine-tune Pixtral.

Classroom Contents

1. How to fine-tune Pixtral
2. Video Overview
3. Pixtral architecture and design choices
4. Mistral’s custom image encoder - trained from scratch
5. Fine-tuning Pixtral in a Jupyter notebook
6. GPU setup for notebook fine-tuning and VRAM requirements
7. Getting a “transformers” version of Pixtral for fine-tuning
8. Loading Pixtral (sketched after this list)
9. Dataset loading and preparation
10. Chat templating (somewhat advanced, but recommended)
11. Inspecting and evaluating baseline performance on the custom data
12. Setting up data collation, including for multi-turn training
13. Training on completions only (tricky, but improves performance)
14. Setting up LoRA fine-tuning (see the sketch after this list)
15. Setting up training arguments: batch size, learning rate, gradient checkpointing (see the sketch after this list)
16. Setting up TensorBoard
17. Evaluating the trained model
18. Merging LoRA adapters and pushing the model to the Hub (see the sketch after this list)
19. Measuring performance on OCR (optical character recognition)
20. Inferencing Pixtral with vLLM and setting up an API endpoint (see the sketch after this list)
21. Video resources
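
For orientation, the chapters above map onto a handful of standard Hugging Face workflows. First, loading a transformers-format Pixtral checkpoint (chapters 7 and 8): a minimal sketch, assuming the community conversion mistral-community/pixtral-12b; substitute whichever transformers-compatible checkpoint you actually use.

```python
# Minimal sketch: load a transformers-format Pixtral checkpoint.
# "mistral-community/pixtral-12b" is an assumed community conversion,
# not necessarily the exact checkpoint used in the video.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mistral-community/pixtral-12b"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 roughly halves VRAM vs fp32
    device_map="auto",           # place layers across available GPUs
)
```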
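
For LoRA fine-tuning (chapter 14), a minimal PEFT sketch; the rank, alpha, and target modules below are illustrative assumptions, not the video's exact settings.

```python
# Illustrative LoRA setup with PEFT; hyperparameters are assumptions.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                  # adapter rank (assumed)
    lora_alpha=32,         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train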
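
For training arguments and TensorBoard (chapters 15 and 16), the values below are common starting points for a LoRA run, again as assumptions rather than the video's exact numbers; report_to="tensorboard" is what routes training logs to TensorBoard.

```python
# Illustrative training arguments; values are assumptions, not the video's.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pixtral-lora",        # placeholder output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,    # effective batch size of 8
    learning_rate=1e-4,               # common starting point for LoRA
    num_train_epochs=1,
    gradient_checkpointing=True,      # trades recompute for lower VRAM
    bf16=True,
    report_to="tensorboard",
    logging_dir="runs",               # then view with: tensorboard --logdir runs
    logging_steps=10,
)
```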
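
For merging and uploading (chapter 18), continuing from the sketches above: with PEFT, merge_and_unload folds the low-rank updates into the base weights, so the merged model loads without PEFT installed. The repo id is a placeholder.

```python
# Merge the trained LoRA adapters into the base weights and push to the Hub.
# "your-username/pixtral-12b-finetuned" is a placeholder repo id.
merged_model = model.merge_and_unload()
merged_model.push_to_hub("your-username/pixtral-12b-finetuned")
processor.push_to_hub("your-username/pixtral-12b-finetuned")  # ship the processor too
```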
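
Finally, for vLLM inference (chapter 20), a sketch of offline chat inference, assuming the official mistralai/Pixtral-12B-2409 checkpoint, which needs tokenizer_mode="mistral"; for an HTTP API endpoint the equivalent is `vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral`.

```python
# Sketch of offline Pixtral inference with vLLM; the checkpoint and
# image URL are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Transcribe the text in this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},
    ],
}]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```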
