The Elusive Infrastructure Team

At some point you're making enough money and have enough excellent engineers that it makes sense to have some people spend all their time making everyone else go faster. If this is you right now, congratulations, you have (partially) solved one set of problems and now have another set of problems.

(Shouldn't you always be working to make yourself go faster, even when it's just you, or just you and a few other people? Yes. Everyone should be doing the little things that speed up development. Build those habits! But really your focus should be on delighting customers. You can't delight people and keep your sanity with slow builds and useless, flaky tests. Everyone on the team will just have to do those chores. Later on, though, it will make sense for someone's full-time job to be making everyone go faster.)

How do you go about it?

The obvious move is to create a team dedicated to engineering velocity. Call it Infrastructure. Or Platform. Or Developer Experience. Whatever. You'll have someone who's clearly supposed to work on this stuff.

A funny thing happens once you go from delighting customers to trying to improve internal velocity. In theory, the new team's customers are the other engineers at the company, and the team should shift its focus onto them. In practice, once you step off the customer treadmill, you have the time to think of ideal solutions to everyone's problems. After a few months, you may find the infrastructure team building The Grand New System that's supposed to ship in three months while feature teams go off and build the infrastructure they need themselves because at least then they'll get it when they need it.

To make sure the infrastructure folks stay in touch with customer needs, you can have them embed with feature teams. They'll tackle customer problems head-on and be able to build some sweet repeatable infrastructure along the way! This works well in bursts, until the moment when the feature team “captures” the infrastructure engineer(s) and they start building one-off hacks to solve the problem in front of them.

The null option is to not create a dedicated team and let the existing teams wrestle with velocity and developer experience all on their own. Embrace the status quo! The Tragedy of the Commons will crop up again and again. The build's slow, but I have to ship this thing by Friday and can't go off on a goose chase to find out what's making it slow. My pull request has failed to merge three times in a row because of this flaky test, but I'm not sure I can delete the test and I don't know how to fix it. People know what to do; they're just not sure they're the ones to do it.

I've tried all three approaches. Nothing magically erases all problems. I do have a soft spot for infrastructure engineers embedding with feature teams. Having the infrastructure engineer ship what the feature team needs right away and then take the next N weeks to make a repeatable version of that solution sounds like the sweet spot, but turning down urgent work because someone's polishing an already-working system sounds hard.

More important than which approach you take is deciding what you're aiming for and making that clear. Start measuring before you think you need to. You can say you're focused on shipping speed to the exclusion of all else, but that probably means you will spend increasing amounts of time firefighting. Measure it and put guardrails in place! And know what you're going to do when the firefighting takes more time than you bargained for!
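One way to make a guardrail like that concrete is a tiny script that tracks what share of the team's hours goes to firefighting each week and flags the weeks that blow past a budget. The sketch below is only an illustration: the `WeekLog` record, the sample hours, and the 20% budget are all made up here, so swap in whatever you actually track.

```python
from dataclasses import dataclass

# Hypothetical budget: at most 20% of a week's hours spent firefighting.
FIREFIGHTING_BUDGET = 0.20


@dataclass
class WeekLog:
    week: str                  # label for the week (hypothetical naming)
    firefighting_hours: float  # hours on incidents, broken builds, flaky tests
    total_hours: float         # all engineering hours logged that week


def firefighting_share(log: WeekLog) -> float:
    """Return the fraction of the week's hours that went to firefighting."""
    if log.total_hours <= 0:
        return 0.0
    return log.firefighting_hours / log.total_hours


def weeks_over_budget(logs: list[WeekLog],
                      budget: float = FIREFIGHTING_BUDGET) -> list[str]:
    """Return the labels of weeks whose firefighting share exceeded the budget."""
    return [log.week for log in logs if firefighting_share(log) > budget]


# Made-up sample data, for illustration only.
SAMPLE_LOGS = [
    WeekLog("week-1", firefighting_hours=10, total_hours=200),
    WeekLog("week-2", firefighting_hours=50, total_hours=200),
    WeekLog("week-3", firefighting_hours=30, total_hours=200),
]

if __name__ == "__main__":
    for log in SAMPLE_LOGS:
        print(f"{log.week}: {firefighting_share(log):.0%} firefighting")
    print("Over budget:", weeks_over_budget(SAMPLE_LOGS))
```

The number itself matters less than agreeing ahead of time what happens when a week lands on the over-budget list.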

Before growing your team, know which habits from the current team you want to preserve. You're going to have to really reward those good habits (fixing broken tests straight away) to maintain them as new people join. Know what your team's bad habits are too! They're going to pass those on to new folks.