Thursday, January 16, 2014

Convert PDFs to Word

Optical Character Recognition (OCR) is the process of converting scanned images of printed or handwritten text into machine-readable text. It works by analyzing a document and comparing it with fonts stored in its database and/or by noting features typical of characters. OCR enables you to take a book or magazine article, feed it directly into a computer file, and then edit the file using a word processor.
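For the curious, the "comparing it with fonts stored in its database" idea can be sketched in a few lines of Python. This is only a toy illustration, not how Online OCR or any real OCR program is implemented: the 3x3 letter shapes, the TEMPLATES table, and the score and recognize functions are all made up for this example.

```python
# Toy illustration of template matching, one of the ideas behind OCR.
# The 3x3 "font database" below is invented for this example.
TEMPLATES = {
    "I": ["010",
          "010",
          "010"],
    "L": ["100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010"],
}

def score(glyph, template):
    """Count how many pixels agree between a scanned glyph and a template."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def recognize(glyph):
    """Return the letter whose template agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda letter: score(glyph, TEMPLATES[letter]))

# A scanned "L" with one pixel of noise (the extra dot in the top-right corner).
scanned = ["101",
           "100",
           "111"]
print(recognize(scanned))
```

Even with the stray pixel, the scanned shape still agrees with the stored "L" on more pixels than with any other letter, which is why a slightly smudged scan can still be read correctly.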

What that means:
OCR software can convert a PDF into a Word document. There are free websites that offer this service. One that I recently discovered is called Online OCR. They claim to recognize thirty-two languages and can read PDF, TIFF, JPG, and PNG files (among others) and convert them into a number of formats: PDF, Word, Excel, HTML, RTF, and Plain Text. Best of all, it’s free and no account is required.
Scenario:
Let’s say you have a resume that needs to be updated. The original Word document is missing, but you do have a hard copy. It is possible to digitize that document and edit it in Word.
The process:

1)      Scan the hard copy to PDF and email it to yourself (a free service available at all SCDL locations).

2)      Log-in to your email and save the PDF to your computer.

3)      Go to onlineocr.net and click the Browse… button.

4)      Locate the PDF that was saved to your computer in step 2. Click Open. Then click Upload.

5)      Select your Recognition language and Output format. The Output format is the type of file you would like to create (Word, Excel, or Plain Text). To convert your PDF to a Word document, select MS Word (doc).

6)      Enter the numbers displayed in the CAPTCHA box (a test some websites use to determine whether or not a user is human).

7)      Click Recognize.

8)      Scroll to the bottom of the page and click on Download Output File.

The resume will now open in Microsoft Word. From here you can make changes and save it for future reference.

Disclaimer:

Online OCR is impressively accurate, but remember that it is a free service. Much of the original document’s formatting is preserved during the conversion, but it is not always exact. After your conversion, it is highly recommended that you proofread your document. There could be misspelled words and perhaps some formatting differences compared with the original (text is usually bold, for example). Compared to when the document was a PDF and could not be edited, the formatting errors are pretty minor.
As I said earlier, the site is free and does not require registration. However, there is an option to create an account, which has some benefits. It is free to create an account (all that is required is an email address) and you get twenty-five free credits (worth twenty-five scans). There are options to purchase or earn more credits, which are pretty reasonable. The biggest advantage of creating an account is multipage scanning. Without an account, a document longer than one page has to be scanned and converted one page at a time. With a free account, you can scan and convert all of the pages at once (you could even tell it to convert specific pages, such as only pages one and three). Your converted documents are also saved in your account, so if you lose the Word version you can retrieve it from your Online OCR account.
So next time you have a hard copy of a document (or just a PDF) and you want to make changes, remember that there are options. Free options! One of them is called Online OCR, and it only takes a few minutes to convert your PDF into a Word document.