Saturday, March 27, 2010

Using Foremost to recover files from a dead hard drive

A client gave me a 250 gig hard drive that wouldn't boot any more. I was hoping it was a problem with Windows, so that I could image it and move on. However, when I tried imaging the drive, it failed after 145 gigs. I tried this a couple of times, and it failed at the 145 gig mark every time. Without a complete physical image, I wasn't able to pull out the logical partition. However, the client was asking which Word documents I could pull off the machine.

So, with an image (as complete as I could make it), I decided to carve out whatever I could find. I edited the foremost.conf file to uncomment the "doc" file type, and then I ran foremost:

foremost -o /path/to/foremost/output -c /path/to/foremost.conf /path/to/image

This bombed right away. Actually, I shouldn't say that it bombed. Rather, it brought back many files, and most of them were huge files that were quite obviously not Word documents. After looking at the documentation, I decided to add the -q switch (quick mode), which searches for file headers only at the start of each sector. This produced more files, but none of them were usable; at least, I couldn't read anything meaningful from them. I took another look at the foremost.conf file and some postings on the internet and found that the ole type has built-in (automatic) extraction, so I wouldn't need the config file. My final command was:

foremost -q -t ole -o /path/to/foremost/output /path/to/image
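To illustrate what -q changes: in quick mode foremost only compares headers at the start of each sector (512 bytes by default), so a file whose header doesn't begin on a sector boundary is never found. The sketch below mimics that check for the 8-byte OLE2 compound-file signature (D0 CF 11 E0 A1 B1 1A E1) that binary Word .doc files start with. The function name ole_sector_hits is hypothetical, and spawning dd once per sector is far too slow for a real 145 gig image. Treat it as a way to poke at a small test file, not as a replacement for foremost.

```shell
#!/bin/sh
# Sketch only (hypothetical helper): print the byte offset of every sector
# that begins with the OLE2 signature, mimicking foremost's -q behaviour of
# checking headers at sector starts only. Default sector size: 512 bytes.

OLE_SIG=d0cf11e0a1b11ae1   # OLE2 magic: D0 CF 11 E0 A1 B1 1A E1

ole_sector_hits() {
    image=$1
    sector=${2:-512}
    size=$(wc -c < "$image" | tr -d ' ')   # BSD wc pads with spaces
    n=0
    while [ $((n * sector)) -lt "$size" ]; do
        # Read one whole sector, keep its first 8 bytes, render them as hex.
        head8=$(dd if="$image" bs="$sector" skip="$n" count=1 2>/dev/null |
            head -c 8 | od -An -tx1 | tr -d ' \n')
        if [ "$head8" = "$OLE_SIG" ]; then
            echo $((n * sector))
        fi
        n=$((n + 1))
    done
}

# Run only when given an image path (optionally a sector size).
if [ "$#" -ge 1 ]; then
    ole_sector_hits "$@"
fi
```

For example, `sh ole_hits.sh small_test.img` prints one offset per line. A signature sitting at byte 1120 would not be reported, because 1120 is not a multiple of 512. Quick mode skips such a header in the same way.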

This carved out plenty of Word files for me. I'm going to try carving jpgs in a few minutes. One thing I haven't found is a spec for Word 2007 (docx) or Excel files. If you have a config that can be used in a foremost.conf file for those formats, I'd appreciate it. Just leave a comment.


  1. It is recommended to always use the demo first to get an overview of how the full version will perform on a particular set of data.

  2. For mission-critical databases, you should have redundant backup sets that can extend over several backup periods. In some scenarios, organizations keep database backups for years to meet legal compliance requirements.

  3. Excuse me for dredging up an old post, but just in case anyone searching for info on foremost stumbles across this (as I did):

    foremost also handles .docx through its internal / automatic extraction. You just have to use the 'zip' file type, since .docx files are zipped XML, like .odt etc.

    I usually work with .odt, so when I get back hundreds of them, I write a shell script to unzip them (specifically "content.xml", which contains what the user actually wrote), and grep that for a particular phrase (paying attention to various whitespace and line-ending issues which can arise).

    If the file matches, the script constructs as indicative a filename as possible from various meta information which the file format includes, and saves the file in an output directory.

    Since .docx is all XML, I'd assume the same should be possible there without much bother.

    Today I'm after a .doc though, so I'm not sure whether the above will work. Unzipping .doc files always gives me an error, which suggests they're not simply zipped XML, so unfortunately I'm not sure whether the same meta-info is available for .doc.

    To add a possible complication, it was a .doc file created with Word 2007... but I'm assuming that MS kept the file format indicated by that extension the same despite the newer version of Word.
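The workflow in the third comment can be sketched as a shell script. It unzips the text part of each carved zip, flattens the XML, greps for a phrase, and saves matches under a name built from the document metadata. This is a minimal illustration, not the commenter's actual script. It assumes Info-ZIP's unzip is installed, and the helper names (magic, kind, xml_text, body_of, title_of, safe_name) are hypothetical. The body is read from content.xml (ODF) or word/document.xml (.docx), and the title from dc:title in meta.xml (ODF) or docProps/core.xml (.docx). The script also checks each file's first eight bytes, which shows why unzipping a .doc fails. A genuine .doc, including one saved from Word 2007 in Word 97-2003 format, is an OLE2 compound file starting with D0 CF 11 E0 A1 B1 1A E1. A zip container starts with "PK" 03 04.

```shell
#!/bin/sh
# Sketch of the "unzip, grep, rename" triage (hypothetical helpers).
# Usage: triage.sh PHRASE OUTDIR carved_file...

# First 8 bytes of a file as lowercase hex.
magic() {
    head -c 8 "$1" | od -An -tx1 | tr -d ' \n'
}

# zip container (.odt/.docx) vs. binary OLE2 compound file (.doc).
kind() {
    case $(magic "$1") in
        504b0304*) echo zip ;;
        d0cf11e0a1b11ae1) echo ole ;;
        *) echo unknown ;;
    esac
}

# Flatten XML to text: line endings and tags become spaces, then runs of
# spaces are squeezed so a phrase broken across lines or paragraphs matches.
# Entities such as &amp; are NOT decoded.
xml_text() {
    tr '\r\n' '  ' | sed 's/<[^>]*>/ /g' | tr '\t' ' ' | tr -s ' '
}

# Document body: content.xml for ODF, word/document.xml for .docx.
body_of() {
    unzip -p "$1" content.xml 2>/dev/null ||
        unzip -p "$1" word/document.xml 2>/dev/null
}

# <dc:title> from ODF meta.xml or OOXML docProps/core.xml, if present.
title_of() {
    { unzip -p "$1" meta.xml 2>/dev/null ||
        unzip -p "$1" docProps/core.xml 2>/dev/null; } |
        tr '\r\n' '  ' |
        sed -n 's/.*<dc:title>\([^<]*\)<\/dc:title>.*/\1/p'
}

# Replace anything unsafe in a filename with '_' and cap the length.
safe_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_' | cut -c1-60
}

main() {
    phrase=$(printf '%s' "$1" | tr '\t' ' ' | tr -s ' ')
    outdir=$2
    shift 2
    mkdir -p "$outdir"
    for f in "$@"; do
        case $(kind "$f") in
            zip) ;;
            ole) echo "skipping binary .doc (OLE2, not zipped XML): $f" >&2
                 continue ;;
            *) continue ;;
        esac
        if body_of "$f" | xml_text | grep -qiF -- "$phrase"; then
            name=$(safe_name "$(title_of "$f")")
            [ -n "$name" ] || name=untitled
            cp -- "$f" "$outdir/${name}_$(basename "$f")"
        fi
    done
}

if [ "$#" -ge 3 ]; then
    main "$@"
fi
```

For example, `sh triage.sh "quarterly report" found /path/to/foremost/output/zip/*` copies matching archives into found/. One design caveat: because tags are replaced by spaces, a word that Word split across two text runs (e.g. "hel" and "lo") will not match "hello". Replacing tags with nothing would fix that but would glue adjacent paragraphs together instead.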