(image by Zakaria Ahada)
How important is it to back up your data?
This is a leading question, of course, but it’s worth asking anyway. I’ve worked with a lot of companies, a lot of people over the years, and while plenty of folks understand what they should be doing, few of us give enough time and attention to actually doing it. It’s like wearing a seat belt, washing your hands, eating vegetables. I’m going to be fine. I did a good enough job. I’ll do it in the future.
But as with car accidents, viruses and diabetes, when it comes to data loss you’re much better off preparing for the worst than dealing with the consequences after the fact. The following story is about just that: what can happen if you overlook your backups. It offers lessons in not just how important backups are, but how important it is to back up in the right way.
Lesson #1: You Can Lose Everything in an Instant
On one fateful day in 1997, three executives at Pixar Studios were gathered in an office.
They were working on a technical adjustment to the main character--Woody--of their upcoming picture, “Toy Story 2.” While looking through the file directory where the Woody data was stored, an error popped up on screen.
The directory was no longer valid.
It was odd. As if, for some reason, the Woody files were disappearing.
Oren Jacob, the company’s CTO, recalled the incident to The Next Web. He was sitting beside his Technical Director, Larry Cutler, when “[h]e looked at the directory and it had like 40 files, and he looked again and it had four files. Then we saw sequences start to vanish as well and we were like, ‘Oh my god.’”
Every time they refreshed the page, there were fewer and fewer files. Before they knew it, all the Woody data was gone, “and then we saw Hamm, Potato Head, and Rex. Then we looked at it again and there was just Hamm and then nothing.” Entire scenes began to fall away.
They hadn’t prepared for a scenario like this--who would’ve imagined it?--but there was no time to waste. Every second that passed represented thousands, if not millions, of dollars’ worth of data. Jacob recalled the moments that followed going something like this:
So I grabbed the phone.
“Transfer me to systems. Ah!”
So they transfer me to the systems group.
Then I said: “Unplug the machine. Just pull it out of the wall. Pull it out!”
[Systems replied:] “We can't pull it out of the wall because the-...”
“Pull it out of the ****** wall! Pull it out!”
Systems complied. Pixar Studios’ main server--the core of their whole network--was physically disconnected from the electricity.
Later, when the dust settled, they plugged the server back in. They got on a terminal, and typed in a command to return the total size of their movie’s database. It was one tenth of what it’d been earlier that day.
90% of the movie was gone.
A lot of companies operate under an “it won’t happen to me” mentality, until it does. When Pixar analyzed how their data had disappeared, they discovered the simplest, most avoidable of causes:
Just before the files went missing, an employee of Pixar Studios had typed a simple command into their computer: “rm -rf *”. On Unix-like systems, “rm -rf *” recursively deletes every file and directory in the current location, without asking for confirmation. The employee was probably trying to delete some old, unwanted files. However, in a stroke of bad luck, they’d accidentally run the command in the root directory for the whole movie. The folder which contained all folders.
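To make that concrete, here is a rough Python equivalent of what the command does, with one safety net added: a dry run that lists what would be removed before anything is touched. This is only a sketch. The function name `delete_contents` and the `dry_run` flag are my own inventions for illustration, not anything Pixar used.

```python
import shutil
from pathlib import Path


def delete_contents(directory: Path, dry_run: bool = True) -> list[str]:
    """Roughly what `rm -rf *` does: remove everything inside `directory`.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the names that would be removed.
    """
    # The shell glob `*` skips names starting with a dot, so mirror that here.
    victims = sorted(p for p in directory.iterdir() if not p.name.startswith("."))
    if not dry_run:
        for p in victims:
            if p.is_dir() and not p.is_symlink():
                shutil.rmtree(p)  # recurse into real directories
            else:
                p.unlink()  # plain files and symlinks
    return [p.name for p in victims]
```

Calling `delete_contents(some_dir)` first, and reading the list it returns, is the kind of five-second check that would make it obvious you are standing in the wrong directory.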
It goes to show just how quickly you can lose everything. For Pixar, months of time, effort and money came down to one accidental keystroke. For other companies it’s a wayward click, an earthquake, a hard drive failure, a ransomware attack.
Lesson #2: Backups Require Active Maintenance
Pixar were lucky, though: they had all their data backed up.
The team immediately pulled out and dusted off the bulky, plastic tapes they used for backups. Within a couple days’ time, everybody at the company returned to normal operation. “We lost a week of work,” Jacob recalled. “So those last 10 shots are the last week, but other than that…O.K…” Crisis averted.
Except something was off. Strange errors kept popping up. Renderings turned out all cockeyed, with important information skewed or missing.
After a week, they finally conceded:
The backups were broken.
It turned out that though Pixar had been backing up their data, they hadn’t been testing those backups. An all too common mistake. What if the data didn’t upload correctly, or some of it was missing? What if it was corrupted? Companies still make this oversight all the time. Only when it’s too late do they realize that their backup software was misconfigured, or that some change in their software or hardware threw everything off, or that they backed up their data but not their settings (and that backing up configurations is important in the first place). Sometimes, testing is simply an exercise in reminding yourself what your backups actually contain, and what will happen if and when you need them.
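One way to test a backup is to restore it somewhere and compare the result against the source, file by file. Here is a minimal sketch in Python, assuming both the source tree and a restored copy are reachable as ordinary directories. The names `file_digest` and `verify_restore` are made up for illustration; this is not what Pixar ran, nor how any particular backup product works.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Compare every file under `source` with its counterpart under `restored`."""
    report: dict[str, list[str]] = {"missing": [], "corrupted": []}
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        copy = restored / rel
        if not copy.is_file():
            report["missing"].append(rel.as_posix())
        elif file_digest(src) != file_digest(copy):
            report["corrupted"].append(rel.as_posix())
    return report
```

An empty report means every source file came back intact. Anything listed under "missing" or "corrupted" is a problem you want to find on a quiet afternoon, not in the middle of a crisis.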
If Pixar had tested their backups, they would’ve realized that the chunky plastic tapes they were using only had 4 GB of available storage. After hitting that cap, every time they uploaded new data, they were overwriting existing data. As a result, the only data that remained was what was most recently added--a fraction of the total.
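A simple guard against that failure mode is to compare the size of the data with the capacity of the medium before writing anything, and to fail loudly rather than overwrite. The sketch below assumes you know the capacity in bytes. `tree_size`, `check_capacity` and `BackupTooLargeError` are hypothetical names, and for simplicity it treats a gigabyte as 1024³ bytes.

```python
from pathlib import Path

GIB = 1024 ** 3  # bytes per gigabyte, as used in this sketch


class BackupTooLargeError(Exception):
    """Raised when the data to back up does not fit on the target medium."""


def tree_size(root: Path) -> int:
    """Total size in bytes of all regular files under `root`."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def check_capacity(data_bytes: int, capacity_bytes: int) -> float:
    """Return the fraction of the medium that would be used, or fail loudly."""
    if data_bytes > capacity_bytes:
        raise BackupTooLargeError(
            f"{data_bytes} bytes will not fit in {capacity_bytes} bytes"
        )
    return data_bytes / capacity_bytes
```

With the figures from this story, `check_capacity(10 * GIB, 4 * GIB)` raises `BackupTooLargeError` instead of quietly keeping only part of the data.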
But it was worse than that. The data that did manage to survive relied on data they no longer had. As Jacob explained:
“[T]he restoral is bad, the work on it is bad, the deletion was horrible, and the backup tapes are busted. All possible directions to move are broken and, maybe worse, we don’t quite understand how they’re broken. If only 10 percent of the show is not on the tape, which 10 percent? I don’t know.”
They had nothing. Millions of dollars down the drain, due to an entirely avoidable oversight.
Lesson #3: You Need Redundant, Off-Site Backups
Even if Pixar’s backups had been working properly, their data would still have been at risk. They’d made a fundamental oversight, albeit a common one enterprises are still making to this day.
For example, not long ago I met with a prospective client. This client wasn’t forgetting or misconfiguring backups--in fact, they took the matter very seriously, and had invested a lot of resources into it. They’d designed an entire room in their building to be dedicated to backups, and backups alone. It was “state-of-the-art,” they said, and had run them tens of thousands of dollars. But at least their data was safe.
It’s true that this client was completely impervious to certain worst-case scenarios, like ransomware, and the kind of data loss Pixar faced in 1997. But what about fire? They’d spent so much time and effort on an on-premise solution that they’d not even considered backing up their data off-site. Therefore, any physical destruction in the building posed an equal threat to all copies of their data.
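The check this client skipped is easy to express: list every copy of your data along with where it physically lives, then ask which copies would survive losing the main building. The sketch below is just that idea in Python. The `BackupCopy` class, the site labels and the `audit` function are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    site: str  # hypothetical location label, e.g. "hq" or "cloud"


def offsite_copies(copies: list[BackupCopy], primary_site: str) -> list[BackupCopy]:
    """Copies that would survive the physical loss of `primary_site`."""
    return [c for c in copies if c.site != primary_site]


def audit(copies: list[BackupCopy], primary_site: str) -> str:
    """Summarize whether any copy lives away from the primary site."""
    survivors = offsite_copies(copies, primary_site)
    if not survivors:
        return f"AT RISK: every copy is at '{primary_site}'"
    names = ", ".join(c.name for c in survivors)
    return f"OK: {len(survivors)} off-site copy/copies ({names})"
```

For a setup like the one described above, with every copy labelled "hq", `audit` reports that all copies share the same fate. Add a single copy at a different site and the answer changes.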
Pixar’s backups were also on-premise. At least, they thought all their backups were on-premise.
Galyn Susman, Supervising Technical Director, had recently given birth. She needed to be home for some time, but didn’t want to miss weeks or months of work. The solution Pixar landed on was to send a copy of the full file tree--everything (which, remarkably, added up to only around 10 GB total)--to her home desktop. She’d be able to work from home, and every couple of weeks they’d forward an update.
Basically, Susman’s home computer was an off-site backup. And the only chance they had left at recovering months’ worth of work. She and Oren Jacob exited the meeting.
“She and I just stood up and walked out, back to her Volvo, drove across the bridge [from Richmond, CA], got the machine, got some blankets, I hugged it with seatbelts, across the back seat.
That old computer was worth its weight in gold.
[We d]rove at like 35 with blinking lights on, hoping to get a police escort. No cops saw us, so it didn’t help us.
[. . .]
Eight people met us with a plywood sheet out in the parking lot [at Pixar] and, like a sedan carrying the Pharaoh, walked it into the machine room.”
Sweating through their shirts, they booted up the machine. They plugged it into the company network and immediately copied the contents off its hard drive. Thousands of files were missing, tens of thousands needed to be manually verified, but the necessary stuff was there. The picture was saved.
Bottom Line: Backups are Crucial
Toy Story 2 teaches us about love, identity, abandonment. Toy Story 2’s production teaches us that backing up your data is really important.
Pixar lost 90% of its file tree in a matter of seconds. They had backups, but nobody knew they were broken because they hadn’t been tested. Luckily, by pure accident, a deus ex machina appeared in the form of a working mother and her home desktop.
It may have happened years ago but, as someone who works in this field, I can tell you: the same problems, the same oversights, are as much with us today as they were back then. Some companies still don’t back up off-site, they don’t perform disaster recovery tests often enough, or they don’t test at all, or they don’t back up at all. This, in spite of the wealth of new tools and services we have today that we didn’t in 1997, like cloud backup services, which can address most of these problems at once.
(A license plate in Toy Story 4 which reads “RMRF97”; image capture via u/Numerous-Lemon)
Whether you’re an executive managing tens of millions of dollars in value, or just a person who has important stuff on your computer, it’s important to learn the lessons of the past. Pixar did. According to The Next Web, following their crisis:
The systems administrators definitely “went through some deep soul-searching” about the backup plans and came to the big production meeting with a new backup plan in place, which was talked over very thoroughly.
Oddly enough, it ended up making no difference. In December 1998, four of Pixar’s best writers were tasked with taking a look at the final movie and...they didn’t like it. They rewrote the whole thing, and the studio had to start everything over from scratch.
So there you go. If you’d like the chance to rebuild your entire company from scratch--everything you’ve worked on, everything you’ve built, back to square one--then maybe backing up isn’t necessary for you. Otherwise, it probably is.