I’m fascinated by the way Facebook operates. It’s a unique environment, not easily replicated (nor would their system work for all companies, even if they tried). These are notes gathered from talking with many friends at Facebook about how the company develops and releases software.
It’s been over six months since I assembled these observations and I’m sure Facebook has continuously evolved its software development practices in the meantime. So these notes are probably a little bit out-of-date. It also seems like Facebook’s developer-driven culture is coming under greater public scrutiny. So I’m feeling more comfortable now about releasing these notes… HUGE thanks to the many folks who helped put together this view inside of Facebook!
- as of June 2010, the company has nearly 2000 employees, up from roughly 1100 employees 10 months ago. Nearly doubling staff in under a year!
- the two largest teams are Engineering and Ops, with roughly 400-500 team members each. Between the two they make up about 50% of the company.
- product manager to engineer ratio is roughly 1-to-7 or 1-to-10
- all engineers go through 4 to 6 week “Boot Camp” training where they learn the Facebook system by fixing bugs and listening to lectures given by more senior/tenured engineers. estimate 10% of each boot camp’s trainee class don’t make it and are counseled out of the organization.
- after boot camp, all engineers get access to live DB (comes with standard lecture about “with great power comes great responsibility” and a clear list of “fire-able offenses”, e.g., sharing private user data)
- any engineer can modify any part of FB’s code base and check-in at-will
- very engineering driven culture. “product managers are essentially useless here” is a quote from an engineer. engineers can modify specs mid-process, re-order work projects, and inject new feature ideas anytime.
- during monthly cross-team meetings, the engineers are the ones who present progress reports. product marketing and product management attend these meetings, but if they are particularly outspoken, there is actually feedback to the leadership that “product spoke too much at the last meeting.” they really want engineers to publicly own products and be the main point of contact for the things they built.
- resourcing for projects is purely voluntary.
- a PM lobbies a group of engineers, tries to get them excited about their ideas.
- Engineers decide which ones sound interesting to work on.
- Engineer talks to their manager, says “I’d like to work on these 5 things this week.”
- Engineering Manager mostly leaves engineers’ preferences alone, may sometimes ask that certain tasks get done first.
- arguments about whether a feature idea is worth doing generally get resolved by just spending a week implementing it and then testing it on a sample of users, e.g., 1% of Nevada users.
- engineers generally want to work on infrastructure, scalability and “hard problems” — that’s where all the prestige is. can be hard to get engineers excited about working on front-end projects and user interfaces. this is the opposite of what you find in some consumer businesses where everyone wants to work on stuff that customers touch so you can point to a particular user experience and say “I built that.” At facebook, the back-end stuff like news feed algorithms, ad-targeting algorithms, memcache optimizations, etc. are the juicy projects that engineers want.
- commits that affect certain high-priority features (e.g., news feed) get code reviewed before merge. News Feed is important enough that Zuckerberg reviews any changes to it, but that’s an exceptional case.
- no QA at all, zero. engineers responsible for testing, bug fixes, and post-launch maintenance of their own work. there are some unit-testing and integration-testing frameworks available, but only sporadically used.
- re: surprise at lack of QA or automated unit tests — “most engineers are capable of writing bug-free code. it’s just that they don’t have an incentive to do so at most companies. when there’s a QA department, it’s easy to just throw it over to them to find the errors.”
- re: surprise at lack of PM influence/control: product managers have a lot of independence and freedom. The key to being influential is to have really good relationships with engineering managers. Need to be technical enough not to suggest stupid ideas. Aside from that, there’s no need to ask for any permission or pass any reviews when establishing roadmaps/backlogs. “My product director doesn’t even really know all the things I have on my roadmap.” There are relatively few PMs, but they all feel like they have responsibility for a really important and personally-interesting area of the company.
- by default all code commits get packaged into weekly releases (tuesdays)
- with extra effort, changes can go out same day
- tuesday code releases require all engineers who committed code in that week’s release candidate to be on-site
- engineers must be present in a specific IRC channel for “roll call” before the release begins or else suffer a public “shaming”
- ops team runs code releases by gradually rolling code out
- facebook has around 60,000 servers
- there are 9 concentric levels for rolling out new code
- the smallest level is only 6 servers
- e.g., new tuesday release is rolled out to 6 servers (level 1), ops team then observes those 6 servers and makes sure that they are behaving correctly before rolling forward to the next level.
- if a release is causing any issues (e.g., throwing errors, etc.) then push is halted. the engineer who committed the offending changeset is paged to fix the problem. and then the release starts over again at level 1.
- so a release may go thru levels repeatedly: 1-2-3-fix. back to 1. 1-2-3-4-5-fix. back to 1. 1-2-3-4-5-6-7-8-9.
- ops team is really well-trained, well-respected, and very business-aware. their server metrics go beyond the usual error logs, load & memory utilization stats — also include user behavior. E.g., if a new release changes the percentage of users who engage with Facebook features, the ops team will see that in their metrics and may stop a release for that reason so they can investigate.
- during the release process, ops team uses an IRC-based paging system that can ping individual engineers via Facebook, email, IRC, IM, and SMS if needed to get their attention. not responding to ops team results in public shaming.
- once code has rolled out to level 9 and is stable, then done with weekly push.
- if a feature doesn’t get coded in time for a particular weekly push, it’s not that big a deal (unless there are hard external dependencies) — features will just generally get shipped whenever they’re completed.
- getting svn-blamed, publicly shamed, or slipping projects too often will result in an engineer getting fired. “it’s a very high performance culture”. people that aren’t productive or aren’t super talented really stick out. Managers will literally take poor performers aside within 6 months of hiring and say “this just isn’t working out, you’re not a good culture fit”. this actually applies at every level of the company, even C-level and VP-level hires have been quickly dismissed if they aren’t super productive.
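To make the weekly push described above more concrete, here is a minimal sketch of the "roll forward through nine levels, restart at level 1 after a fix" loop. This is only my illustration of the notes, not Facebook's actual tooling: the function name `run_weekly_push`, the `is_healthy` and `fix` callbacks, and the `max_attempts` cap are all hypothetical. Only the number of levels comes from the notes, so the sketch does not model how many servers sit at each level.

```python
from typing import Callable, List

NUM_LEVELS = 9  # from the notes: nine concentric rollout levels


def run_weekly_push(
    is_healthy: Callable[[int, int], bool],  # hypothetical: (attempt, level) -> ok?
    fix: Callable[[int], None],              # hypothetical: page the engineer, apply fix
    max_attempts: int = 10,                  # hypothetical safety cap, not from the notes
) -> List[int]:
    """Return every level visited until level 9 is reached with no failure."""
    visited: List[int] = []
    for attempt in range(1, max_attempts + 1):
        for level in range(1, NUM_LEVELS + 1):
            visited.append(level)
            if not is_healthy(attempt, level):
                # Halt the push, get the problem fixed, then start over at level 1.
                fix(level)
                break
        else:
            # The inner loop finished without a break: level 9 is stable.
            return visited
    raise RuntimeError("push abandoned after too many attempts")


# Reproduce the example from the notes: a failure at level 3 on the first
# attempt, a failure at level 5 on the second, then a clean run.
failures = {(1, 3), (2, 5)}
fixed_levels: List[int] = []
visited = run_weekly_push(
    is_healthy=lambda attempt, level: (attempt, level) not in failures,
    fix=fixed_levels.append,
)
print("-".join(str(level) for level in visited))
```

With those two simulated failures, the visited sequence matches the "1-2-3-fix. back to 1. 1-2-3-4-5-fix. back to 1. 1-2-3-4-5-6-7-8-9" pattern from the notes. In a real system `is_healthy` would be where error rates, load, and the user-behavior metrics mentioned above get checked.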
It’ll be super interesting to see how Facebook’s development culture evolves over time, and especially whether it can keep scaling as the company grows into the thousands of employees.
What do you think? Would “developer-driven culture” work at your company?