This is the first of many posts about my shocking experience inside Twitter’s DC (data center) in Atlanta, GA. I got a job as a Site Operations Technician with the company in October 2017 and worked there until December 2020. Since most of the employees from Twitter 1.0 have departed, including the management who wanted to keep the data centers a secret, I can now write about my time there.
My Shocking Experience Inside Twitter’s Data Center
Oh yes, when I arrived at the Atlanta, GA data center after attending Flight School at the Twitter headquarters in San Francisco, CA, the size of the fleet shocked me! After I took a tour through the building with my manager, however, I saw many other things that shocked me as well.
Back in 2017 there were close to 100,000 individual servers in four huge rooms, which came out to thousands of racks. So think of it this way: all I could see upon entering a room was row after row of servers. Actually, it’s a nice sight, especially when the lights are off in the room. You just see the flickering lights and it’s quite calming. I digress, however.
Another shocking part of the data center was the amount of old hardware the company still utilized. I will go into further detail about the old hardware in a future post, but Twitter still used outdated Dell and HP servers. Granted, these beasts can run for years without failure, but they were slow. The company used HP ProLiant DL380 Gen9 servers to host its images and/or videos. With 1 Gb network uplinks, these were slow, and their disks failed quite often due to the heavy use. In addition, we had plenty of HP ProLiant DL360 Gen9 servers to service, as those ran other parts of the platform (which I can’t remember at this time). These were more reliable than the DL380s, as their disks didn’t fail as often, although I remember replacing RAM sticks quite often.
I can’t remember the Dell server model numbers. However, I do remember the Dell network switches I troubleshot. Oh boy, those sucked. These were old Force10 switches that were in too many racks. When one failed, we just swapped it out with a newer Juniper switch if possible. If not, we put in another Force10 that was going to fail eventually.
Another part of my shocking experience inside Twitter’s DC was the loose procedures for the techs. My data center experience came from working in vendor-neutral locations, where various customers used the facilities, so I had learned to follow certain procedures. That meant keeping a clean desk, a tidy data center floor, and following the written rules regarding ticket handling, diagnosing, and repairing servers. That wasn’t so at my new employer.
I get it: Twitter still ran like a startup in that sense. In my opinion, management didn’t have enough people with data center backgrounds. Thus, they didn’t understand the need for running a tight ship. While my training was good, there wasn’t a set procedure. It was more like “when you run into that situation, tell us so we can inform you how to do it.” Oh yes, I had access to an internal wiki, but it was out of date because our department didn’t have a technical writer, and the techs who maintained the documentation got caught up in other duties.
I had worked in chaos before and this was far from it, so I was able to get my footing. However, my belief that a Silicon Valley company would have its stuff together crumbled.