While working at the Twitter data center as a Site Operations Technician I decided to join the on-call rotation. When my turn came I would spend a week as the Primary on-call tech, then the following week as the Secondary on-call tech. My main task as the Secondary on-call tech was to do the data center after hours work (in addition to assisting the Primary on-call tech). And when it came time to perform the data center after hours work, it was a bunch of hurry up and wait.
Data Center After Hours Work Couldn’t Begin Until Failover Finished
I would arrive at the building between 10:30 and 10:45 PM EST to prepare for the after hours work. The equipment would already be in the room or rooms, ready for installation, because I had put it there earlier that day. So I would take my cart into the first room, set up my laptop, and sit down to check the various Slack channels to make sure everything was normal.
Then around 11:30 PM EST failover would start. The way Twitter performed after hours maintenance was to shut one data center off (not literally, though) and shift all the customer traffic onto the other. Thus, we could do our work without disturbance.
Now failover took a good amount of time, usually about thirty minutes. So I would monitor the process in the Slack channel and just browse the Internet or watch videos on YouTube. Then when failover finished it was time to…hurry up and wait for Network Engineering to put the device into maintenance mode.
Now To Wait For Network Engineering
Yep, after the data center failover I had to wait for Network Engineering to put the device I was working on (a major switch or router) into maintenance mode. Thankfully this didn’t take as long, usually around five to ten minutes. Once I got the okay from the engineer I would check the LEDs to make sure the switch or router was in maintenance mode and then get to work.
My work was either to install a new linecard into a switch or router, or to remove an older linecard and install a new one with improved capabilities. To see what a linecard looks like you can view some of the Cisco models, Juniper models, and the Arista models. Out of all the linecards the Juniper ones gave me the most trouble, but only because they were long and heavy. I could install one myself but it was easier with two people.
After installing the linecard or linecards I would tell the Network Engineer, who would then run their commands to bring up the card or cards. That took more time, so I would sit down and watch more YouTube videos. If everything worked properly the engineer would put the device back online and do their final checks. Again, I would wait. Once they finished, I would get the okay to move on to the next switch or router and repeat the steps.
However, if there was a problem I would have to troubleshoot it with the engineer. Sometimes the linecard was faulty and I would have to rush to get another one from inventory so the maintenance could continue. Other times it just needed to be reseated in the slot, or rebooting the switch or router resolved the issue.
Data Center After Hours Work Didn’t Bother Me Because I Was Hourly
Yeah, I didn’t mind sitting around and waiting to do my work since I was hourly. Data center after hours work usually took me about four hours to complete. However, I’ve been stuck there for more than six hours because failover didn’t go as planned or I ran into issues after installation. Either way the extra hours would usually bump me into overtime, even if I went to work later the next day.