Modernizing legacy code at LinkedIn: how big bang versus gradual approach caused conflict

Former LinkedIn senior staff engineer Chris Krycho has spoken about the complexities of modernizing legacy code and how conflicts over the best approach ended with him leaving the Microsoft-owned company.

Krycho was at LinkedIn for nearly five years, between January 2019 and October 2023, and according to his bio on the site, was tech lead for the LinkedIn.com web site. His bio also notes that he “designed a migration strategy away from Ember. We did not use it – ask me!”

He was asked about this and other things in an episode of the CoRecursive podcast, which gives insight into both the code behind LinkedIn and the challenges of maintaining and modernizing a large legacy codebase.

The LinkedIn front end in 2019 had 2 million lines of code, he said, and took 17 minutes to build. On the back end there were “monstrous API servers” supporting the LinkedIn functionality, the ad server, LinkedIn Learning and more.

The site was built using Ember.js, Krycho said, and his first big project was to modernize the code to use JavaScript classes. He ran up against a corporate policy under which a project involving multiple teams could not tie up more than 10 percent of their time. His solution was to automate the revisions as much as possible, but the work still took 18 months to complete and was constantly delayed by teams who said they did not have time because of other demands.

This experience was a clue that there were organizational and cultural factors that made major improvements hard to implement, even when all agreed they were worthwhile. 

Despite the move to classes, there were still many errors in the code, and Krycho’s next task was to migrate it piece by piece to TypeScript. That work went ahead, though it added to his workload as TypeScript migration issues landed on his desk.

There was still the problem of Ember.js. LinkedIn was the “biggest user of Ember.js in the world,” Krycho said, but he saw that it was not the best solution in many parts of the code. He resolved to move from Ember.js to React. How could that be done with what was now 3 million lines of code, in a way that would not break the 10 percent rule?

Krycho came up with a plan that he believed would work, though it would take “three to five years. And three years was very optimistic.” It would again involve developing automations, and “the idea was that product teams will never really have to stop.”

In the meantime, though, another group within the company came up with an alternative that Krycho summarized as being “about rewriting both the mobile and the desktop apps.” This big-bang approach, versus Krycho’s five-year plan, resonated better with LinkedIn leadership, he said. “My pitch lost.”

Along the way, Krycho describes some deep-seated issues with the LinkedIn codebase, one of which led to outages over the Christmas holiday. The cause was memory leaks in pre-rendering services, combined with automated processes that restart servers if they use too much memory. A misconfiguration meant that too many servers got restarted at the same time, causing the remaining servers to fail as well. “We would end up taking down an entire data center of these servers,” he said.
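
The failure mode described here, where memory-triggered restarts remove too much capacity at once, is commonly guarded against by capping how many servers may restart in one pass. The following Python sketch is purely illustrative and is not LinkedIn's implementation: the function name `select_restarts`, the `max_fraction` budget, and the sample fleet are all hypothetical.

```python
import math


def select_restarts(memory_mb, limit_mb, max_fraction=0.25):
    """Pick servers to restart in this pass (hypothetical policy sketch).

    memory_mb    -- mapping of server name to current memory use in MB
    limit_mb     -- memory threshold above which a server wants a restart
    max_fraction -- assumed cap on the share of the fleet restarted at once
    """
    # Servers over the limit, worst offenders first.
    over_limit = sorted(
        (name for name, used in memory_mb.items() if used > limit_mb),
        key=lambda name: memory_mb[name],
        reverse=True,
    )
    # Allow at least one restart, but never more than the budget, so the
    # remaining servers are not left to absorb the whole load.
    budget = max(1, math.floor(len(memory_mb) * max_fraction))
    return over_limit[:budget]


# Hypothetical fleet: three of four servers have leaked past the limit.
fleet = {"a": 900, "b": 950, "c": 400, "d": 990}
restart_now = select_restarts(fleet, limit_mb=800, max_fraction=0.25)
```

With the cap at 25 percent of a four-server fleet, only the worst offender is restarted in this pass and the others wait their turn. Setting the budget too high, for example `max_fraction=1.0`, would restart all three leaking servers together, which is the kind of misconfiguration that can cascade.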

Why were there so many problems with the code? Krycho attributes it to technical debt caused by prioritising new features over code quality. “When velocity becomes the primary or driving value that everything else is subservient to, it leaves you in a spot where maybe you have good velocity initially, but you can’t sustain it over time,” he said.

His reflection implies that the LinkedIn big-bang rewrite proposal may repeat that same problem.

Krycho emphasised that he “loved my time at LinkedIn” and that, “I’ve learned a ton,” even if the outcome for him was that “I’m not going to spend years of my life trying to build in a way and on things that I ultimately don’t believe in.” He gives only his own perspective and there is much that is unknown, not least how the alternative plan is proceeding.

Irrespective of the outcome for LinkedIn though, it is an example of how hard it is to modernize a large codebase, the reality of technical debt, and the challenge of balancing business demands with the less exciting task of making code more performant and resilient.