Do you really Caere about OCR?
Optical character recognition used to be expensive and none too good,
but about 3 or 4 years ago Caere's OmniPage program tipped the scale enough
that it became faster to scan text and clean it up than to retype it from
hard copy.
Version 8 of the software was the first version I could recommend without
reservation. Then version 9 raised the standard. Now version 10 has gone
beyond that standard. I won't say it's perfect because Caere is probably
working on version 11 - but it's very, very good.
When you need to edit or reuse a document that you've received on paper,
you have two choices: Type the material yourself or use an OCR program.
No matter how good the OCR program is, it will invariably make mistakes
and you'll have to fix them. Version 10 of OmniPage Pro is surprisingly
accurate even with less-than-perfect copies and dead-on with good copies.
The program runs under Windows 9x, Windows NT 4, and Windows 2000. New
to this version is a voice read-back capability that enables users to
verify OCR results. It could also be used by sight-impaired people to
read printed materials.
OmniPage Web (Personal Edition) is in the box with OmniPage Pro, making
it possible to convert multi-page paper documents into hyperlinked Web
sites. The program tries to recognize the hierarchy of the paper documents
and create links accordingly.
OCR used to be a complex operation. You had to fiddle with the scanner
to get it in the proper mode. Then you had to scan the page. Next came
manually marking columns of text. Then the recognition phase. And the
correction phase, during which most users decided that retyping the material
would have been easier.
That's not the case with OmniPage Pro. If your scanner has a page feeder,
you can scan multiple pages by pressing a single button. I still prefer
some human intervention, but the program is surprisingly accurate in automatic
mode - determining columns, text flow, graphics, headlines, and such.
OCR parallels speech recognition
Both optical character recognition and speech recognition have a long
history in computing. Both were envisioned at a time when computers weren't
powerful enough to do the job - not even the old "mainframe" computers.
The earliest OCR program was introduced in 1959 by the Intelligent Machine
Corporation. It could read just one font in one point size, and was used
for processing preprinted mortgage loan applications in the banking industry.
Later, programs that could read nearly a dozen typefaces were developed,
but they were accurate only when the operator selected the right typeface
library.
In 1966, an American standardized font called OCR-A and a European font
called OCR-B were developed. You've probably seen these in your list of
fonts.
Kurzweil Computer Products introduced a system in 1978 that could be
trained to read any font, but each new typeface required several hours
of training time.
Until the early 1980s, OCR systems were rule-based. They broke each character
image into a set of lines and curves and then determined which character
most closely matched the extracted features. This method worked well as
long as the original was clean.
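To make the rule-based idea concrete, here is a rough sketch in Python. It is not how any particular product worked. The feature table (counts of straight strokes, curves, and enclosed loops) and the names TEMPLATES, distance, and classify are my own assumptions, chosen only to show why a clean original matters.

```python
# Hypothetical feature table: (straight strokes, curved strokes, enclosed loops).
# The counts are illustrative only, not taken from any real OCR product.
TEMPLATES = {
    "A": (3, 0, 1),
    "B": (1, 2, 2),
    "C": (0, 1, 0),
    "D": (1, 1, 1),
    "L": (2, 0, 0),
    "O": (0, 1, 1),
}


def distance(a, b):
    """Sum of absolute differences between two feature tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(features):
    """Return the template character whose features are closest."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], features))


print(classify((3, 0, 1)))  # features extracted from a clean "A"
print(classify((2, 0, 0)))  # an "A" that lost a stroke and its loop in a bad copy
```

The first call finds "A". The second, with one stroke and the loop missing, lands on "L" instead, which is the kind of mistake a dirty original could cause.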
In 1986, Palantir introduced an "omnifont" program that could read many typefaces
and, instead of being rule-based, used new technology - "neural networks".
This is a technique that allows the computer to learn. Later, Palantir
became Calera and in 1993 the company introduced "Adaptive Recognition
Technology" that improved recognition again.
Caere Corporation developed Language Analyst software that added linguistic
information and the ability to examine three letters at a time to look
for common patterns. Caere also added a dictionary to help improve accuracy.
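Here is a small Python sketch of how three-letter patterns and a dictionary might be used to choose among competing readings of a word. This is my own illustration, not Caere's Language Analyst. The trigram list, the dictionary, and the bonus of 5 points for a dictionary hit are all assumptions.

```python
# Hypothetical data: a handful of common English trigrams and a tiny dictionary.
COMMON_TRIGRAMS = {"the", "and", "ing", "ion", "ent", "mod", "ode", "der", "ern"}
DICTIONARY = {"modern", "the", "and"}


def trigrams(word):
    """Split a word into overlapping three-letter groups."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


def score(word):
    """One point per common trigram, plus an assumed bonus for a dictionary hit."""
    word = word.lower()
    points = sum(1 for t in trigrams(word) if t in COMMON_TRIGRAMS)
    if word in DICTIONARY:
        points += 5
    return points


def best_reading(candidates):
    """Pick the candidate reading with the highest score."""
    return max(candidates, key=score)


# Three readings a recognizer might produce for the same blurry word.
print(best_reading(["rnodern", "modern", "mociern"]))  # modern
```

The letter pair "rn" is easily mistaken for "m", so all three readings look plausible by shape alone. The letter patterns and the dictionary break the tie.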
OCR programs began to use a variety of tricks to figure out what they
were looking at. They began with letter forms, but they also considered
spelling rules. Caere refers to these various technologies as "experts"
and during recognition, each expert has a vote. The interpretation that
receives the most votes is the one the program selects.
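The voting idea is simple enough to sketch in a few lines of Python. The expert names and their ballots below are made up for illustration, and the real program presumably weighs its experts in ways I don't know about.

```python
from collections import Counter


def vote(ballots):
    """Return the reading named by the most experts (first seen wins a tie)."""
    return Counter(ballots).most_common(1)[0][0]


# Hypothetical ballots from three experts looking at the same smudged word.
ballots = {
    "letter shapes": "c1ean",
    "spelling rules": "clean",
    "dictionary": "clean",
}

print(vote(ballots.values()))  # clean
```

The letter-shape expert is fooled by a smudge, but it is outvoted two to one.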
After purchasing Calera, Caere combined technologies developed by both
companies. When the individual characters in a word were difficult to
isolate and recognize, as was common on degraded document images, "Predictive
Optical Word Recognition" took over and worked on the word as a whole.
This technology has been tweaked, tuned, and tinkered with over the past
few years. Computers have become faster and more powerful. Combined, these
forces make it possible for version 10 to be faster and more accurate
than any previous version.
Do you need OCR?
Not everybody needs OCR, but if you're someone who does, make sure you
see OmniPage Pro before you buy any other program.
How much should you pay? $99. You'll probably see a higher price because
$99 is the "upgrade" price. This is the price you should pay even if you think
you don't own any OCR software. If you have a scanner (if you want to
do OCR, you need a scanner!) you probably have OCR software. Most scanners
ship with some OCR product. That's enough to qualify you for the upgrade.
Don't pay more than you have to.
For more information, see http://www.caere.com/.