What’s it like to work at a fully distributed company with a worldwide team and culture guided by a Creed? Welcome to “Life @ Automattic,” our series of Q&As with the people behind the products. Today we spoke with Rudy Faile, a senior systems engineer.
Under “Job Title” on our company intranet, you wrote, “Literally web master.” I love that! But what exactly did you mean by it?
It’s actually “Literal Web Master” 🙂 – it’s a play on “webmaster,” which was a more commonly used term in the early days of the internet, particularly in the 1990s and 2000s. A webmaster is/was a person responsible for maintaining one or more websites.
The number of sites hosted at WordPress.com alone is over 181 million, and that isn’t even counting blogs on Tumblr or Gravatars or Simplenotes, or the myriad of other services Automattic provides. As a member of a small team responsible for the availability, performance, reliability, and security of all Automattic’s services, I am making a tongue-in-cheek joke – “literal master of the web.” It was actually Gary Pendergast’s title before he retired it and I totally stole it, er, asked him if I could use it.
For those who may not know, what exactly is Sysops, and how does it contribute to a company’s stability, growth, and well-being?
“Sysops” is short for “systems operators” (or systems/operations). The term is most often used in the context of networks, databases, or system administration. In general it refers to folks who manage and maintain computer systems, particularly servers.
Here at Automattic, we refer to our Sysops as “Opers,” which is a throwback to IRC Operators. We still use IRC in Automattic systems, and you can find our sysadmins hanging out in the #opers channel there.
In terms of contributions to a company’s stability, growth, and well-being, it’s kind of hard to overstate the importance there. We’re a software company, and our business is reliant on our customers being able to access the software they pay us for. If we get more users, we need to be able to accommodate them. If our services are down, we’re not going to have happy customers, and if we don’t plan appropriately, we aren’t going to be able to handle new customers.
How does Sysops work at Automattic? What’s different about our approach?
Hah. I think maybe literally everything 🙂. Considering the number of servers we have and the number of customers we serve, we’re an extremely small team. I think, at most companies, you have structured divisions under infrastructure that specialize in each facet of systems administration. For example, folks who only work in the data centers, ops folks who only respond to alerts, infrastructure developers, database administrators, network specialists, and so on.
But at Automattic, an Oper is going to be expected to handle any of these tasks and more. It doesn’t matter if it’s a technology or hardware or a programming language you’ve never seen before—you’re expected to figure it out. It’s really impressive what we’re able to do on a daily basis with a few incredible folks across different time zones. My teammates are some of the most dedicated and talented people I’ve ever worked with. You might be fixing a server while working on a database schema change while writing code while helping someone troubleshoot their access in Slack. We do it all.
About what percentage of your time is spent interacting directly with our company’s customers?
This is kind of an interesting paradigm because when it comes to systems, in general our “customers” tend to be the other folks in the company. Developers come to us for help in implementing scalable solutions or deploying backend services, HR for access control, and everyone else for anything and everything in between.
However, all Automatticians, from the CEO on down, are required to work directly with our customers during something we call a “support rotation,” a week-long period that happens every year. During this rotation you work in direct support: answering customers’ emails, communicating with them via chat, and everything else.
What work did you do before becoming a Systems Wrangler, and how did that work prepare you for what you do now?
I did a number of other jobs before Automattic, including short stints as a mobile developer and the Director of IT Operations at a small accounting firm. However, I think probably the most impactful job I had was the 8 years I spent in the United States Navy.
Funnily enough, my job in the U.S. Navy had nothing to do with computers, but I think the experience imparted a lot of intangible traits that I’ve carried with me ever since. Things like commitment to a mission, resilience in the face of adversity, and attention to detail. These are especially important traits in an environment like Sysops where maintaining composure in the midst of chaos, and systematically determining what the problem is, is critically important.
How were you onboarded? Describe what you learned and how you learned it.
When I did my trial, I got right to work in planning an expansion project at one of our origin data centers. We were basically doubling our footprint in our Dallas data center, and it was an eye-opening experience to see that basically two people were responsible for making the entire thing happen.
I made spreadsheets that planned everything for the expansion down to which fiber connectors were going to plug into each specific switch port. I worked with the data center providers to coordinate cage and cabinet installation. I made 3D models that planned the layout for how things would fit based on the physical space we were purchasing.
I worked with vendors to order the servers, switches, routers, PDUs, fiber, copper, drives, and so on. I spent time at home cutting each Cat5 cable to length, terminating and labeling them, testing them, and doing the same for the other equipment. This is thousands of cables and other pieces of equipment that we’re talking about here, and it had to be perfect.
Then, I flew out to the data center to actually execute the project. Everything from racking the servers to configuring them to getting them in a state where they would be serving traffic to our customers. A lot of this I had never done before, but I was expected to look up everything I didn’t know to try and figure it out, asking questions if I got stuck.
At the end of the project, our legal team at Automattic was in town for a team meetup, and I was asked to give them a tour of the data center as a sort of “capstone” for my trial. I explained the history of Automattic in data centers, our presence in Dallas, and how each thing worked. I learned a lot about Automattic’s infrastructure and services-at-scale in general. It was an experience that I’ll never forget.
This type of work takes so long and requires so much focus that now, when I’m looking at a customer’s site on the backend and I see which physical server it’s on, sometimes it triggers a memory of work I performed in-person on the server. Sometimes it’s a server I physically racked or replaced RAM in or something. In my mind, I can see the server in the data hall: where it’s at, what the chassis looks like pulled out. It’s a wild experience.
Walk us through a typical work day.
Most days begin by saying good morning to my teammates and looking at alerts. If anything is critical, it takes priority over everything else. After that, we look at the systemsrequests p2, which is sort of like an internal forum where folks can come and request things from Systems, and start working through requests there. A systemsrequest can be anything. It can be something that takes minutes, or months, and it generally requires a thorough investigation into whatever the request is about.
It’s crucially important that we return accurate information or set things up properly for the requestor. Accuracy and correctness are the KPIs for systemsrequests, but we also try to get back to the requestor as quickly as possible, whether with an acknowledgement that we’re working on it, a request for more information, or confirmation that their request has been fulfilled, including the details.
After that, we look into things like long-term projects and improvements to the system. For example, looking into recurring alerts and figuring out what can be improved there, or writing code for something else. While doing all of this, we are monitoring all of our alerting systems and Slack, responding to anyone who has a question, request, or report.
What are your specific responsibilities on the team, and are they the same as every other systems engineer’s responsibilities, or do you folks vary in the work and roles you take on?
This is a good question. In general we all sort of have the mandate I mentioned previously, in that we’re all responsible for the availability, performance, reliability, and security of all Automattic’s services. However, some of us definitely have specialties or areas of expertise. For me in particular, I’m more of a generalist but help out from time to time with some of the project management around our data center operations, and I also help out our Opers focused on Tumblr Systems because I spent a couple of years focused on Tumblr Systems full time.
We do have some other teams under Systems that have more specific mandates. For example we have systems folks who focus entirely on some of our different systems like WordPress.com Business, Enterprise (WPVIP), and Tumblr. We also have some more traditional teams like Devops, Secops, and Perfops. We look at Systems at Automattic as a “Spoke and Hub” model where all of the spokes are the other divisions and Sysops is at the core. Because of this approach, Sysops is expected to know a bit about how everything works.
For companies to succeed, systems must be up and running all the time. How do you handle that responsibility? And does it affect how you schedule your working day?
One really nice thing about working in a distributed company is that we have great coverage across time zones. We all still have to respond to alerts, and everyone jumps on to help out, but no one has to work nights or anything like that (unless they want to 🙂). I think for the most part it’s a pretty simple hierarchy of importance, in that critical alerts take precedence over everything, and responding to other folks in the company is also pretty important to us. Everything else can slide to the right if it needs to.
What part of your role do you enjoy the most?
I like solving problems, helping people, and automating things. The best part of the role comes when you can do all three of those things at once. For example, if someone comes to you with a problem, and you solve it by automating it and reducing their manual toil, you’ve just become their best friend 🙂. I like that a lot.
What are some of your biggest challenges with the job?
Sometimes it’s difficult to fix something that’s broken or implement something that you’ve never seen before. Working on systems at scale can be like working on a running car, in that you have to kind of try and add or repair things without turning the engine off. A seemingly benign change or mistake can affect the whole system if you fail to attend to the details at all times.
Sometimes you can do “everything right” and still break things in some unexpected way. The best thing you can do is try and make each change in small, separate steps, then test, evaluate, and undo or move forward as needed.
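As an illustration only (this is not Automattic’s tooling), the small-steps approach described above could be sketched in Python roughly as follows. The `Step` structure, the `run_steps` helper, and the in-memory `config` dictionary are all hypothetical names made up for this example; a real system would apply and verify changes against actual servers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One small, independent change (hypothetical structure for illustration)."""
    name: str
    apply: Callable[[], None]    # makes the change
    verify: Callable[[], bool]   # returns True if the system still looks healthy
    undo: Callable[[], None]     # reverts the change


def run_steps(steps: List[Step]) -> List[str]:
    """Apply steps one at a time; undo and stop at the first step that fails its check.

    Returns the names of the steps that were applied and kept.
    """
    kept: List[str] = []
    for step in steps:
        step.apply()
        if step.verify():
            kept.append(step.name)   # change looks good, move forward
        else:
            step.undo()              # change misbehaved, revert it
            break                    # stop so the failure can be investigated
    return kept


# Usage example: an in-memory "config" stands in for a real system.
config = {"workers": 4, "timeout": 30}


def set_value(key: str, value: int) -> Callable[[], None]:
    """Build a callable that sets config[key] to value."""
    def _apply() -> None:
        config[key] = value
    return _apply


steps = [
    Step("raise workers", set_value("workers", 8),
         lambda: config["workers"] <= 16, set_value("workers", 4)),
    Step("lower timeout", set_value("timeout", 0),
         lambda: config["timeout"] > 0, set_value("timeout", 30)),
    Step("never reached", set_value("workers", 32),
         lambda: True, set_value("workers", 8)),
]

result = run_steps(steps)
print(result)   # ['raise workers']
print(config)   # {'workers': 8, 'timeout': 30}
```

In this sketch the second step fails its check, so it is undone and the third step never runs, while the first, already verified change stays in place.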
Can you tell us how you may have worked through any particular on-the-job challenge?
One particular challenge that comes to mind was when we built out our own Matrix solution earlier this year. Even though the Matrix protocol was released 9 years ago, it’s a technology that’s still somewhat in its infancy, with a lot of questions around scale and application. We had a three-person team with a deadline of six weeks to complete a stable, scalable backend to turn over to our Matrix developers.
We went through multiple iterations, and there were times when I thought we not only would miss the deadline but perhaps not even be able to deliver the end product at all, given the constraints we had. However, in the Automattic Creed, we have a line that says: “I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.” We kept this in mind as we continued to push forward day after day. We ended up meeting the deadline and delivering on time.
How does “lifelong learning” enter into your role?
This one’s easy! Everyone at Automattic in general is a lifelong learner. The first sentence of our Creed that I talked about earlier is: “I will never stop learning.” In terms of learning specific to my role, there is really no choice: as technology progresses, developers want to use the latest and greatest tools for development, and end-users want to enjoy the benefits of them. We always have to keep a sharp lookout for emerging trends.
Twenty years ago it was basic web servers running a LAMP Stack. Since then we’ve seen the rise of technologies like Cloud Computing, Blockchain, IoT, AI, and, from a systems perspective, containerization technologies like Docker and Kubernetes. Staying up-to-date with industry trends and standards is what keeps us delivering the best product possible for our customers.
How would you describe the culture of Automattic and the people you get to work with?
Our hiring processes are so good that we end up with the best and brightest folks. I’ve never met an Automattician that wasn’t super impressive in some way. We have PhDs, marathoners, successful business owners, just folks who are world class in everything you could imagine. As a result, during the five and a half years I’ve worked here, Automattic has always taken the “we are all adults” approach to things.
From the day you’re hired on, you’re handed an incredible level of trust and responsibility. I think Automattic is the apex of what remote work and a horizontal organizational structure should look like. On our “org chart” page there is a message from our CEO, Matt Mullenweg, that has been posted for as long as I’ve worked here. It really encapsulates what I mean:
“This shows an organizational chart that looks much like a traditional company. We’ve found keeping lines clear to be good for accountability, responsibility, and navigation. However the “hierarchy” does not imply worth or value—a key philosophy of Automattic is that good ideas can come from anywhere, and often do. Everyone in the company can influence any other part of the company; in fact if you believe in something, you are expected to.”—Matt
What new things is the Systems team working on, and how might they affect customer experience and the bottom line in the future?
I think some of the most exciting projects include the Matrix hosting thing I mentioned earlier. There are a lot of interesting ways we’re looking at using Matrix to innovate in the messaging space. We’ve formed an entirely new division of Systems called “AI Operations,” which is focused on enabling and empowering our newly formed Applied AI Development Team with internal systems that meet the requirements of the next generation of Automattician and product AI-augmentations, as well as new AI product development. Lastly we’ve started looking into “Edge Compute” since we run our own Content Delivery Network across the world. Think AWS Lambda or “Functions-as-a-Service”.
Stepping back a bit, what life lessons, if any, has working as an Automattic Systems Engineer taught you?
It’s really taught me how to apply fundamental reasoning in any situation. As I mentioned, I look at problems almost every day that I’ve never seen before, and have no idea where to start sometimes. My go-to method is always stepping back through first principles and working from there.
This has translated into my life in the same way. For example: I don’t really know anything about cars, but when my car breaks now, instead of it being an immobile piece of metal until someone that knows something can look at it, I step through the fundamentals of what that car needs to work. I’m often surprised at how simple things really are “under the hood,” if you just take a look with an open mind.
Founded in 2005 by Matt Mullenweg, the co-creator of WordPress software, Automattic has been recognized as one of the world’s most innovative companies. We’re the people behind WordPress.com, WooCommerce, Jetpack, Simplenote, Longreads, WPScan, Akismet, Gravatar, Crowdsignal, Cloudup, Tumblr, Day One, Pocket Casts, WordPress VIP, and more. At this moment, there are 1,954 of us in 95 countries, speaking 120 languages. Maybe you can be one of us.