Sometimes, as some of you may have noticed now and then, I do Stupid Things.
This isn’t exclusive to my personal life; it leaks into my professional life, as well.
The company I work for is a web hosting company. For those of you who aren’t sure what that is, I can put it simply: We are where the web lives. Well, for a large number of folks, anyhow. We provide the computers upon which websites live, and we provide a ridiculous amount of bandwidth for those computers to access the internet.
There are several different levels of accounts, from the most limited account on a shared server (where many websites live together, sometimes in harmony) to the middle-of-the-road “virtual private server” (where only a few websites live together, often in complete discord), up to the top-of-the-line private dedicated server.
As support technicians, we log in as “root” on these machines most of the time. The root user is the superuser – the one who can break absolutely everything with a mere typo or a moment’s inattention.
We also have a group Jabber (instant messenger) service we all hang out in, and it’s our primary means of communication. People ask questions in one of the support channels, and usually get answers immediately. Being new, I ask a lot of questions in Jabber.
One day, not too long after starting second shift, I made a minor, stupid noob mistake. I don’t even remember what it was, but it wasn’t a huge deal. One of my friends, a more senior UNIX admin, came over to console me. “Let me tell you a story,” Rex began.
“They occasionally call me The Panda Killer. You know why? Because one day, I asked someone a question in Jabber about how to fix a problem. Someone wrote back, totally kidding around, with a command. I didn’t realize he was kidding, and I implemented the command. It destroyed the entire Panda server. So don’t feel bad… everyone makes mistakes around here. Live and learn.”
We laughed, and I said, “someday, that’ll be me. No, seriously, that’s totally something I would do.” It is important to note, for reasons which will soon become obvious, that Rex did not tell me what command he implemented.
Cut to the very next night.
A customer called in saying he had 28,000 emails in a folder, and he couldn’t get rid of them. I logged in, and lo, I could not get rid of them with my newbie skills, either. I needed a more powerful command to handle that many files.
So I went into the support channel in Jabber, bearing in mind what Rex had told me about potential tomfoolery. I crafted my question carefully: “Without any shenanigan-like commands that will kill the server, could someone please tell me how to delete 28,000 files in a directory? rm cannot handle it.”
Almost immediately, another tech responded with: rm -Rf $dir/*
Now, those of you who are at all conversant with any form of UNIX or Linux know there is one thing seriously missing in that command. However, I was in a rush to get my customer’s problem fixed, I trust my co-workers implicitly, and thus… I copied and pasted that command, exactly as it stood, into my window and hit enter.
Several moments passed. There were 28,000 files there… surely it would take a few moments. Right? I started to get nervous and hit ^C to break the command. It broke after another moment. My customer said, “Um, my website seems to be down,” just as I hit “ls” to list the contents of the directory, and the server told me “command not found.”
“ls” is one of the most basic commands in Linux; it simply returns the contents of the directory one is in. If that command isn’t there… things are very, very bad.
“cd command not found”
In a very calm voice, I put my customer on hold, and went frantically back to the support channel: “OH MY HELL WHAT HAVE I DONE?! ls – command not found!”
Three people, including a shift supervisor, immediately responded, “you DID replace the variable… right?”
Um, no. No I did not. And thus, I almost deleted the entire server.
Replacing the variable means swapping $dir for the actual directory I wanted to nuke. I didn’t. With $dir unset, the shell expanded it to nothing, so the command I actually ran was rm -Rf /*, and I was trying to nuke everything. Fortunately, it was salvageable: we had a back-up, and the customer was back up and running in … oh, about 10 hours. TEN HOURS it took the system restorations team to fix my fuck-up.
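For the curious, here is a minimal sketch of how a delete like that can be guarded so that an empty $dir fails loudly instead of quietly turning into the root directory. The function name purge_dir is made up for illustration and is not what anyone at work actually uses. It assumes a find that supports -mindepth, -maxdepth and -delete (GNU and BSD find do; they are not in POSIX), and it only removes regular files sitting directly in the directory.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: delete the regular files
# directly inside one directory, and refuse to run on an empty path or "/".
purge_dir() {
    target=$1

    # An unset or empty variable arrives here as "", which is exactly the
    # case that turns "$dir/*" into "/*". Bail out instead of deleting.
    if [ -z "$target" ] || [ "$target" = "/" ]; then
        echo "refusing to run: no safe directory given" >&2
        return 1
    fi

    if [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi

    # find removes each file itself, so the shell never has to build one
    # enormous argument list the way "rm $dir/*" does with 28,000 files.
    find "$target" -mindepth 1 -maxdepth 1 -type f -delete
}
```

If you would rather keep the one-liner, writing the path as "${dir:?}"/* makes the shell stop with an error when dir is unset or empty, rather than expanding to /*.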
I pretty much wanted to die.
Within a few moments, I had the customer placated, as Severin, who is a supervisor, stood over me compassionately, hovering in case any questions came up I couldn’t answer. I asked him if I could give the guy any credits for free service, and he authorized 2 weeks. Finally, the call was over, and I nearly burst into flames.
Siena, the shift manager, came over with a couple of other people, all of whom had big, big grins on their faces. It was too late to crawl under my desk; they knew I was there.
Siena, a big, giant Viking of a man, said, through the enormous grin, “SO. Do you know what you did wrong?”
“Will you do it again?”
“Oh my god, no.”
“Ok, then. So let me tell you about the time…” whereupon he regaled me with a few tales of other fuckery implemented by LiquidWeb support technicians, himself included.
Everyone was so very cool about it, and they did a great job making me feel better. My natural inclination is to run and hide when I do something patently stupid, lick my wounds and beat myself up for a good, long while. They refused to let me do that, dragging me outside for a break, telling me stories of stupidity, and telling me it was not that big of a deal. A few said, “Well, now you’re officially One of Us.”
In a crazy kind of way, I did feel more like a part of the team after that happened. I still get ribbed about it from time to time, and deservedly so. I’ll never copy and paste a command I don’t fully understand again, that’s for damn sure. [EDIT: A year TO THE DAY later… I did the same fucking thing.]
What killed me was that, the VERY NIGHT BEFORE, I had been warned about this!
What kills me even further was the revelation, a week or more later, that it was Mike N. who jokingly gave Rex the panda-killer command.
Last night, Mike made an uncharacteristically stupid mistake that brought down a fair chunk of one of our data centers for about an hour and a half. I imagine his sense of “oh, shit” outweighed mine by several orders of magnitude. He corrected the simple mistake in about one minute, but the weirdo cascading after-effects had the networking team scrambling for a while.
A bunch of us already had tentative plans to go out after work for beer, and it was decided this was an even more urgent matter than previously thought, since we all needed post-crisis alcohol. Several of us tried to get Mike to come out, but he chose not to come, instead licking his wounds, beating himself up, and generally feeling crappy about himself. I relate, I truly do, because that’s how I have historically dealt with my own stupidity when it impacts others.
Still, I wished I could have helped him to feel better, or that he would at least let me try to help him feel better.
I’m glad I have friends who occasionally force me to break out of that pattern of self-pounding, and I’m glad I’ve decided to let them. 🙂
I have, once again, been extremely fortunate to have found a very cool group of people to immerse myself in, and I’m grateful.