CASE STUDY: Spotify

Spotify: An Early Adopter of Containers, Spotify Is Migrating from Homegrown Orchestration to Kubernetes

MAYANK VARSHNEY
Dec 24, 2020

Challenges Faced:

Launched in 2008, the audio-streaming platform Spotify has grown to over 200 million monthly active users across the world.

“Their goal is to empower creators and enable a really immersive listening experience for all of the consumers that they have today, and hopefully the consumers they’d have in the future.”

An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs with a homegrown container orchestration system called Helios. By late 2017, it became clear that “having a small team working on the features was just not as efficient as adopting something that was supported by a much bigger community.”

Solution they came up with:

“They saw the amazing community that had grown up around Kubernetes, and they wanted to be a part of that.”

Kubernetes was more feature-rich than Helios. Plus, “they wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools”.

At the same time, the team wanted to contribute its expertise and influence to the flourishing Kubernetes community. The migration, which would run in parallel with Helios, could go smoothly because “Kubernetes fit very nicely as a complement and now as a replacement to Helios.”

Impact on Spotify:

The team spent much of 2018 addressing the core technology issues required for a migration, which started late that year and is a big focus for 2019. “A small percentage of their fleet has been migrated to Kubernetes, and some of the things that they’ve heard from their internal teams are that they have less of a need to focus on manual capacity provisioning and more time to focus on delivering features for Spotify.”

The biggest service currently running on Kubernetes takes about 10 million requests per second as an aggregate service and benefits greatly from autoscaling. Plus, “Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes.” In addition, with Kubernetes’ bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold.
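The case study does not say which autoscaling mechanism that service uses. Assuming it is Kubernetes’ Horizontal Pod Autoscaler (HPA), the core scaling rule documented for the HPA can be sketched in Python as below. The helper name and the example numbers are illustrative only, and the sketch deliberately ignores the HPA’s tolerance band, stabilization window, and min/max replica bounds.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core Horizontal Pod Autoscaler rule (hypothetical helper name, illustrative only).

    desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    Tolerance, stabilization windows and min/max replica bounds are omitted.
    """
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # 10 pods averaging 90% CPU against a 60% target: scale out to 15 pods.
    print(desired_replicas(10, 90, 60))
    # 4 pods averaging 30% CPU against a 60% target: scale in to 2 pods.
    print(desired_replicas(4, 30, 60))
```

Scaling out when load rises and back in when it falls is what lets a service follow its traffic without someone provisioning capacity by hand.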

Matt Brown, Staff Software Engineer of Infrastructure at Spotify, talks about how Kubernetes played a key role in the migration of backend microservices, keeping everything as seamless as possible for the 200+ teams involved.

Success story:

A story that’s come out of the early days of Kubernetes is a tool called Slingshot that a Spotify team built on Kubernetes. “With a pull request, it creates a temporary staging environment that self-destructs after 24 hours.” “It’s all facilitated by Kubernetes, so that’s kind of an exciting example of how, once the technology is out there and ready to use, people start to build on top of it and craft their own solutions.”
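Slingshot’s internals are not described here. As a hypothetical sketch only (not Spotify’s code), the “self-destructs after 24 hours” behavior amounts to a time-to-live check: record when each pull-request environment was created and periodically tear down any that are 24 hours old or older. The `StagingEnv` record and the `expired` helper are made-up names; a real system would delete the matching Kubernetes resources (for example, a per-PR namespace) instead of printing a message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Lifetime taken from the case study: environments self-destruct after 24 hours.
TTL = timedelta(hours=24)


@dataclass
class StagingEnv:
    # Hypothetical record for one pull request's temporary staging environment.
    pr_number: int
    created_at: datetime


def expired(envs, now):
    """Return the environments whose age has reached the 24-hour TTL."""
    return [env for env in envs if now - env.created_at >= TTL]


if __name__ == "__main__":
    now = datetime(2020, 12, 24, 12, 0, tzinfo=timezone.utc)
    envs = [
        StagingEnv(101, now - timedelta(hours=30)),  # past its TTL
        StagingEnv(102, now - timedelta(hours=2)),   # still live
    ]
    for env in expired(envs, now):
        # A real cleanup job would delete this environment's Kubernetes resources here.
        print(f"tearing down staging env for PR #{env.pr_number}")
```

Running a check like this on a schedule keeps per-PR environments cheap: nobody has to remember to clean them up.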

Technologies later added:

Spotify has also started to use gRPC and Envoy, replacing existing homegrown solutions, just as it had with Kubernetes. “They created things because of the scale they were at, and there was no other solution existing.”

Both of those technologies are in early stages of adoption, but already “they have reason to believe that gRPC will have a more drastic impact during early development by helping with a lot of issues like schema management, API design, weird backward compatibility issues, things like that.” “So they’re leaning heavily on gRPC to help in that space.”

As the team continues to fill out Spotify’s cloud native stack (tracing is up next), it is using the CNCF landscape as a helpful guide. “They look at things they need to solve, and if there are a bunch of projects, they evaluate them equivalently, but there is definitely value to the project being a CNCF project.”

Spotify’s experiences so far with Kubernetes bear this out. “The community has been extremely helpful in getting their community to work through all the technology much faster and much easier.” “It’s been surprisingly easy to get in touch with anybody they wanted to, to get expertise on any of the things they’re working with, and it’s helped them validate all the things they’re doing.”

This is it…

Thank you!
