Don't Judge a PDF by its cover.
What is a PDF document? Is a PDF document only an image that you can email and print? There is much more to PDF documents than most people realize. The PDF format is a widely used standard for secure, dependable electronic information exchange that is recognized by industries and governments around the world. According to Adobe, there are more than 200 million PDF documents on the web today. The website www.wikipedia.com describes the PDF format as follows: “the Portable Document Format (PDF) is the file format created by Adobe Systems in 1993 for document exchange. PDF is used for representing two-dimensional documents in a device-independent and display resolution-independent fixed-layout document format.”
Even though this is an accurate description, it is important to note a few other characteristics. Scanned PDF documents generally come in two types: searchable and non-searchable. The main difference between the two is that a searchable PDF has an invisible layer underneath the image that contains the text shown in the image. This invisible layer is inserted during a process known as OCR, or Optical Character Recognition. A non-searchable PDF does not have this layer and is simply an image of the document. The easiest way to tell whether a document is searchable is to choose the Select Tool, located on the toolbar in most Adobe products, and place the pointer over text in the document. If the pointer turns into an I-beam shape, the document is searchable; if it does not, the document is non-searchable.
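The same check can be roughly approximated in code. The sketch below is only a heuristic, not how Adobe's tools decide: it uses nothing but the Python standard library, looks inside each stream in the file (inflating it when it is Flate-compressed), and reports whether any stream contains the PDF text-showing operators Tj or TJ inside a BT block. The function name looks_searchable is made up for this illustration. It assumes streams are either uncompressed or Flate-compressed, so it can miss text in PDFs that use other filters or encryption, and it says nothing about how good the OCR text is.

```python
import re
import zlib

# Body of each stream object: everything between "stream" and "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A text-showing operator (Tj or TJ) right after a string or array operand.
TEXT_OP_RE = re.compile(rb"[\)\]>]\s*T[jJ]\b")


def looks_searchable(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream appears to draw text (hypothetical helper)."""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates the newline that trails the stream data.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate-compressed (for example a JPEG image); scan as-is
        if b"BT" in data and TEXT_OP_RE.search(data):
            return True
    return False
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example looks_searchable(open("scan.pdf", "rb").read()), where scan.pdf stands in for whatever file you want to check. A dedicated PDF library will give a more reliable answer than this sketch.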
OCR is described by www.wikipedia.com as “the mechanical or electronic translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text. OCR is a field of research in pattern recognition, artificial intelligence and machine vision.” Though academic research in the field continues, the focus on OCR has shifted to implementation of proven techniques. Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few applications survive that use true optical techniques, the OCR term has now been broadened to include digital image processing as well.
There are many programs that perform the OCR process, such as Adobe Acrobat and ABBYY FineReader. The quality of the character recognition varies from software to software and document to document, but in general, the quality has improved greatly in recent years. There are also many different options (such as DPI, and scanning in color or black and white) that can greatly impact document quality and functionality. When creating PDF documents, it is recommended that a few test documents be created first to find the correct combination of options. This is critical: almost all options are set at the time of creation and cannot be easily changed afterward.
If you have any questions about PDFs and scanning, feel free to leave me a comment and I will get back to you.
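As a rough illustration of why the scanning options above matter, the sketch below estimates the raw, uncompressed size of one scanned page from its dimensions, DPI, and color depth. The function name raw_scan_bytes and the US Letter page size are assumptions chosen for the example. Real PDF files compress their images, so actual file sizes will be much smaller, but the proportions between settings are still a useful guide when planning test scans.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Estimate the uncompressed size in bytes of one scanned page.

    Ignores row padding and compression, so treat the result as a
    ballpark figure rather than an actual file size.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# US Letter page (8.5 x 11 inches) under a few common settings.
for dpi in (300, 600):
    for label, bpp in (("black and white", 1), ("24-bit color", 24)):
        size = raw_scan_bytes(8.5, 11, dpi, bpp)
        print(f"{dpi} DPI, {label}: {size:,} bytes")
```

Doubling the DPI quadruples the pixel count, and 24-bit color carries 24 times the raw data of 1-bit black and white, which is why a handful of test scans is worth the time before committing to one combination.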