ML Infra design for the GPU Poor

Taming the Beast: How to Design a Queueing System for GPU-Intensive Workloads

TL;DR:

When designing for scale, the limiting factor is GPU availability, so every rate limit and queue in the system must be sized around the number of GPUs you actually have.
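To make the idea concrete, here is a minimal sketch of a queue whose limits are derived from GPU count rather than from request volume. It is an illustration under stated assumptions, not the design this post goes on to describe: the class name `GpuBoundQueue`, the `QueueFull` exception, and the `num_gpus` and `max_queue_depth` parameters are all hypothetical, and each job is assumed to occupy exactly one GPU for its whole duration.

```python
import asyncio


class QueueFull(Exception):
    """Raised when the backlog is already as deep as we are willing to hold."""


class GpuBoundQueue:
    """Admit work based on GPU capacity (hypothetical sketch).

    At most `num_gpus` jobs run at once; at most `max_queue_depth`
    more may wait. Anything beyond that is rejected immediately so
    callers can back off instead of piling up.
    """

    def __init__(self, num_gpus: int, max_queue_depth: int) -> None:
        # One semaphore slot per GPU: this is the real concurrency limit.
        self._gpu_slots = asyncio.Semaphore(num_gpus)
        # Running jobs plus waiting jobs may never exceed this number.
        self._max_in_flight = num_gpus + max_queue_depth
        self._in_flight = 0
        self._running = 0
        self.peak_running = 0  # highest concurrency observed, for inspection

    async def submit(self, job):
        """Run `job` (a zero-argument coroutine function) when a GPU is free."""
        if self._in_flight >= self._max_in_flight:
            raise QueueFull("all GPUs busy and backlog is full")
        self._in_flight += 1
        try:
            async with self._gpu_slots:  # waits here until a GPU slot frees up
                self._running += 1
                self.peak_running = max(self.peak_running, self._running)
                try:
                    return await job()
                finally:
                    self._running -= 1
        finally:
            self._in_flight -= 1


async def fake_inference() -> str:
    """Stand-in for a GPU job; sleeps briefly instead of touching a GPU."""
    await asyncio.sleep(0.01)
    return "done"
```

Create the queue inside a running event loop, for example `queue = GpuBoundQueue(num_gpus=2, max_queue_depth=4)` followed by `await queue.submit(fake_inference)`. Rejecting early with `QueueFull` is one possible policy; a real system might instead return an estimated wait time or route the request to a lower-priority queue.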