Pinterest Cut AI Costs 90% by Gutting a Frontier Model's Vision Layer

VentureBeat | May 29, 2026

SECTION 2: AI Industry and Government News

At 620 million monthly users, calling a frontier model for every image recommendation is less a strategy than a bill. Pinterest CTO Matt Madrigal's answer was to gut Qwen3-VL's vision layer and rebuild it around proprietary embeddings, cutting costs 90% and boosting accuracy 30%. Madrigal's team "ripped out" Qwen's vision encoder layer and fine-tuned the model on proprietary multimodal embeddings, which capture metadata around pins and images and can be precomputed offline. Because the embeddings already exist when a request arrives, the model does less work at inference time; without them, developers would have to call and encode each image at runtime, resulting in latency "20 times worse." Madrigal emphasized that "if you've got really unique data that you can then fine-tune an open source model with, data quality will, frankly, outweigh or overcome model size." The company also built a "taste graph," a dynamic representation of what individual users actually like that combines graph structures with representational learning to transform discovery into intent.
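The offline precomputation described above can be illustrated with a minimal Python sketch. Nothing here is Pinterest's code: `encode_image`, `EmbeddingStore`, `recommend`, the 8-dimensional vectors, and the dot-product scoring are all hypothetical stand-ins, and the "encoder" is a cheap hash rather than a real vision model. The sketch only shows the general pattern of paying the encoding cost once in a batch job so that serving becomes a table lookup.

```python
import hashlib
from typing import Dict, List

EMBED_DIM = 8  # illustrative size only; not a figure from the article


def encode_image(image_bytes: bytes) -> List[float]:
    """Hypothetical stand-in for an expensive vision encoder call."""
    digest = hashlib.sha256(image_bytes).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


class EmbeddingStore:
    """Holds embeddings computed ahead of time, keyed by pin ID."""

    def __init__(self) -> None:
        self._table: Dict[str, List[float]] = {}
        self.encoder_calls = 0  # counts how often the costly step runs

    def precompute(self, pins: Dict[str, bytes]) -> None:
        # Offline batch job: encode every image once and keep the result.
        for pin_id, image_bytes in pins.items():
            self._table[pin_id] = encode_image(image_bytes)
            self.encoder_calls += 1

    def lookup(self, pin_id: str) -> List[float]:
        # Serving path: a dictionary read, with no encoder call.
        return self._table[pin_id]


def score(user_vec: List[float], pin_vec: List[float]) -> float:
    """Dot product as a simple, assumed relevance score."""
    return sum(u * p for u, p in zip(user_vec, pin_vec))


def recommend(store: EmbeddingStore, user_vec: List[float],
              candidate_ids: List[str], k: int = 2) -> List[str]:
    """Rank candidates using only precomputed embeddings."""
    ranked = sorted(candidate_ids,
                    key=lambda pid: score(user_vec, store.lookup(pid)),
                    reverse=True)
    return ranked[:k]


# Made-up pins for illustration.
pins = {"pin-a": b"red sofa", "pin-b": b"oak desk", "pin-c": b"wool rug"}
store = EmbeddingStore()
store.precompute(pins)  # done once, ahead of any user request

user_vec = [1.0] * EMBED_DIM
top = recommend(store, user_vec, list(pins))
```

In this sketch, `store.encoder_calls` stays at the number of pins no matter how many recommendation requests are served, which is the property the article attributes to precomputed embeddings. A production system would presumably replace the in-memory dictionary with a feature store or vector index.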