ai infrastructure

looked into a few different ai infra companies for my contrary report

Baseten: a platform for deploying and scaling machine learning models with serverless infrastructure.
Lambda Labs: GPU cloud services and workstations specifically designed for AI and deep learning workloads.
Together AI: open-source AI infrastructure to make powerful models accessible to developers.
Anyscale: a unified compute platform for scaling AI applications.
Modal: serverless compute for running AI models and data pipelines with automatic scaling.
Replicate: run machine learning models in the cloud with simple APIs.
Hugging Face: tools, infrastructure, and a community platform for building, training and deploying ML models.

i asked P @ baseten how baseten is different from these competitors. it was a very big question for the last minute of our chat, but he touched on four things:

focusing on model performance (dedicated deployments for customers with high-volume workloads, better throughput)
infra (multi-cluster deployment across multiple regions and clouds, treating them as a single global resource with a k8s autoscaler)
real self-hosting and hybrid hosting (something a lot of companies have not been able to build)
good model management tooling (truss)

while browsing through their website i learned about speculative decoding, an inference optimization technique that makes educated guesses about future tokens while generating the current token, then verifies those guesses so that the overall output is identical to that of vanilla decoding (exactly identical for greedy decoding; with sampling, the output follows the same distribution).
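to make the "guess, then verify" idea concrete, here's a minimal sketch of the greedy version in python. it's not baseten's implementation: the two "models" are hypothetical toy functions standing in for a big target model and a cheap draft model, and the target checks each drafted position in a loop, whereas a real system scores all of them in one batched forward pass. the sampling version swaps the exact-match check for a probabilistic accept/reject step.

```python
from typing import Callable, List, Tuple

# a "model" here is just a function from a token prefix to its greedy next token.
# both toy models below are hypothetical stand-ins for real LLMs.
NextToken = Callable[[Tuple[int, ...]], int]


def target_model(prefix: Tuple[int, ...]) -> int:
    """hypothetical large model: the output we must reproduce exactly."""
    return (sum(prefix) * 7 + len(prefix)) % 10


def draft_model(prefix: Tuple[int, ...]) -> int:
    """hypothetical small model: agrees with the target except at every 4th position."""
    guess = target_model(prefix)
    return (guess + 1) % 10 if len(prefix) % 4 == 0 else guess


def vanilla_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(tuple(out)))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n: int, k: int = 4) -> Tuple[List[int], int]:
    """greedy speculative decoding; returns (tokens, number of verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n:
        # 1. the cheap draft model guesses k future tokens autoregressively.
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(tuple(out + drafted)))
        # 2. the target checks every drafted position. a real system does this in
        #    one batched forward pass; here we loop and count it as one round.
        rounds += 1
        for i in range(k):
            expected = target(tuple(out + drafted[:i]))
            if drafted[i] != expected:
                # first mismatch: keep the accepted prefix plus the target's token.
                out.extend(drafted[:i])
                out.append(expected)
                break
        else:
            # all k guesses accepted; the same pass also yields one bonus token.
            out.extend(drafted)
            out.append(target(tuple(out)))
    # a round can overshoot n, so trim to exactly n new tokens.
    return out[len(prompt):len(prompt) + n], rounds


prompt = [3, 1]
fast, rounds = speculative_decode(target_model, draft_model, prompt, n=20)
assert fast == vanilla_decode(target_model, prompt, n=20)  # same tokens as vanilla
print(rounds)  # 6 verification rounds instead of 20 sequential target calls
```

every accepted token is one the target would have picked anyway, which is why the output can't differ from vanilla decoding; the speedup depends entirely on how often the draft guesses right.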

BENEDICT NEO 梁耀恩