A scanner is a photocopier connected to a computer. The scanner takes a picture of the paper and the computer plots each speck of color. This electronic pattern can then be used as an art file or analyzed further for shapes that form letters (optical character recognition, or OCR).
     In all cases, however, the scanner follows the GIGO rule: garbage in, garbage out. It is imperative that the item being scanned be as clean as possible and that any editing or corrections be made on a duplicate copy.

Handle material to be scanned as you would handle original art.

It is best to assume a piece of manuscript may be scannable whenever you do not have a corresponding electronic file. Always make a clean photocopy, save the original, and use the copy for editing and marking.

Scanning line art
Properly executed line art can be scanned, which offers an economical solution that will meet your demands for quality. Even a dollar bill can be scanned and beautifully reproduced in your books (although we do not recommend this practice).
     The problem with scanned art is that alterations are difficult. For instance, authors often place leader lines and temporary labels on their art, and ask that they be redone. Labels can easily be reset if they are placed outside the margin of the rest of the art, but this may necessitate spatial adjustment of the leader lines—and lines crossing other elements of art cannot be easily moved.
     Scanned art should be prepared so that nothing needs to be altered or removed. Prepare the base art and save it as camera copy; then print labels and draw leader lines on a copy. Remember, we can add information, but we cannot easily alter or delete background elements.

Scanning halftone art
The computer can duplicate anything that a camera can do with halftones. However, scanning them requires specialized training, and the results can at best equal the camera's; the computer cannot outperform traditional technology.
     For the best balance of quality and price, you will want to have halftones shot and inserted by the printer. Our policy is to scan and size halftones and place them as FPOs at no charge.
     There is one situation in which computers outshine the camera: retouching halftones. However, this is not simply scanning, but rather a creative technique for generating new art.

Scanning for optical character recognition
The text of a book must be keyboarded accurately. The quickest, most accurate way to capture keystrokes is to translate the author's file; the slowest and possibly least accurate way is to retype the manuscript. A third way to capture keystrokes is to scan a manuscript and use an OCR program.
     The speed of OCR depends on the clarity of the letters and the cleanness of the page. With a clean, sharp manuscript, OCR can read up to 2000 characters a minute, which works out to about 10 hours for the manuscript of a 400-page book. A messy manuscript, on the other hand, can require several minutes for a single page.
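
The reading-time figure above can be checked with a little arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the figure of 3000 characters per manuscript page is an assumption (it is the page density implied by 2000 characters a minute, 400 pages, and 10 hours, not a number stated in this guide).

```python
def ocr_hours(pages, chars_per_page, chars_per_minute=2000):
    """Estimate OCR reading time in hours at a steady reading rate."""
    total_chars = pages * chars_per_page
    minutes = total_chars / chars_per_minute
    return minutes / 60


# Assumed density: 3000 characters per manuscript page (not from the guide).
print(ocr_hours(pages=400, chars_per_page=3000))  # 10.0
```

A denser or messier manuscript raises the estimate accordingly; lowering `chars_per_minute` models the slower reading of poor copy.
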
     The accuracy of OCR also depends on the clarity of the letters and the page. The cleanest copy generates a file with only 1 to 2 typos per 100 pages. Copy that is marked up or otherwise of poor quality has an avalanche effect: errors compound until the output is gibberish.
     OCR accuracy is not noticeably affected by small type sizes or tight leading, but display and other large type sizes do create problems. Italics, underlined letters with descenders, and superscripts and subscripts are often misread.
     OCR programs are trainable: you can teach OCR to recognize dot matrix, a carbon copy, or a photocopy of a photocopy. However, it is not economical to train an OCR program for a small amount of material, and material that is not physically consistent cannot benefit from the training.

Summary of OCR considerations

  1. Consider OCR only when no electronic file is available.
  2. The manuscript should be clean and without handwritten notations (make additions or edit on clean photocopies).
  3. The letters should not be broken or filled in.
  4. There should be large sections of physically consistent material.
  5. Clean inserts or small portions of manuscript on separate pages do not present problems.
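
The first four considerations above can be read as a simple screening checklist. The sketch below is one hypothetical way to express it; the class, field, and function names are illustrative assumptions, not part of any OCR program.

```python
from dataclasses import dataclass


@dataclass
class Manuscript:
    has_electronic_file: bool       # consideration 1
    has_handwritten_marks: bool     # consideration 2
    letters_broken_or_filled: bool  # consideration 3
    physically_consistent: bool     # consideration 4


def ocr_objections(ms):
    """Return the reasons, if any, that a manuscript is a poor OCR candidate."""
    reasons = []
    if ms.has_electronic_file:
        reasons.append("an electronic file exists; translate it instead")
    if ms.has_handwritten_marks:
        reasons.append("handwritten notations on the pages")
    if ms.letters_broken_or_filled:
        reasons.append("letters are broken or filled in")
    if not ms.physically_consistent:
        reasons.append("material is not physically consistent")
    return reasons


clean = Manuscript(False, False, False, True)
print(ocr_objections(clean))  # []
```

An empty list means none of the four objections applies; it does not by itself guarantee a good scan.
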

Benefits of OCR use

  1. A 400-page book that is reasonably clean might cost about $1800 to retype but only about $1200 to scan (saving $600).
  2. This same book would take 2 weeks to type or 2 days to scan.
  3. The scanned file would have about the same error rate as a typist's file.
  4. The error rate would improve if the manuscript were exceptionally clean, to perhaps only 8 typos in the entire book.
N.B.: This manuscript could not be considered for OCR if editing had been done on the original pages.
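
The figures in this list can be worked through as follows. This is a minimal sketch with hypothetical function names; it assumes that 2 weeks means 10 working days, which the list does not state.

```python
def compare_ocr_to_retyping(retype_cost, scan_cost, retype_days, scan_days):
    """Return the cost and schedule savings from scanning instead of retyping."""
    savings = retype_cost - scan_cost
    return {
        "savings": savings,
        "savings_pct": round(100 * savings / retype_cost, 1),
        "days_saved": retype_days - scan_days,
    }


def typos_per_100_pages(typos, pages):
    """Convert a whole-book typo count to a rate per 100 pages."""
    return 100 * typos / pages


# Assumption: 2 weeks of typing is taken as 10 working days.
result = compare_ocr_to_retyping(1800, 1200, retype_days=10, scan_days=2)
print(result)
print(typos_per_100_pages(8, 400))
```

On these numbers scanning saves $600, a third of the retyping cost, and 8 typos in 400 pages is 2 typos per 100 pages.
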

