The Telugu Script
The Telugu language is one of the most widely spoken languages in India,
and is one of its 22 official languages. It is also known as the
"Italian of the East" and is considered easy to learn to speak and write.
The Telugu script, however, is very complex for a machine to recognize,
in other words, for an OCR (optical character recognition) system. The
script has 4 classes of symbols which form words: 1. vowels,
2. consonants, 3. vowel modifiers (maatras) and
4. consonant modifiers (vatthus).
Words are formed as combinations of symbols from these classes. A small
example is shown below.
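
As a rough text-based illustration of how the four classes combine, the sketch below labels the characters of an encoded Telugu word. It rests on assumptions that are not part of the original page: the word is Unicode text rather than a scanned image, the classes are approximated by code-point ranges in the Unicode Telugu block (U+0C00 to U+0C7F), and a consonant modifier (vatthu) is taken to be a virama (U+0C4D) followed by a consonant. The names symbol_class and classify are hypothetical helpers, and this is not the symbol set used by the OCR described here.

```python
VIRAMA = "\u0C4D"  # Telugu sign virama; joins consonants into conjuncts


def symbol_class(ch):
    """Approximate class of one Telugu character, by code-point range.

    The ranges are assumptions based on the Unicode Telugu block and
    ignore the few unassigned code points inside them.
    """
    cp = ord(ch)
    if 0x0C05 <= cp <= 0x0C14:
        return "vowel"
    if 0x0C15 <= cp <= 0x0C39:
        return "consonant"
    if 0x0C3E <= cp <= 0x0C4C:
        return "vowel modifier"
    return "other"


def classify(word):
    """Split a word into (symbol, class) pairs.

    A virama followed by a consonant is grouped into a single
    "consonant modifier" symbol, since that pair is rendered as a vatthu.
    """
    result = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == VIRAMA and nxt and symbol_class(nxt) == "consonant":
            result.append((ch + nxt, "consonant modifier"))
            i += 2
        else:
            result.append((ch, symbol_class(ch)))
            i += 1
    return result


# "amma" (mother): vowel A, consonant MA, then virama + MA as a vatthu
for symbol, cls in classify("\u0C05\u0C2E\u0C4D\u0C2E"):
    print(" ".join(f"U+{ord(c):04X}" for c in symbol), cls)
# U+0C05 vowel
# U+0C2E consonant
# U+0C4D U+0C2E consonant modifier
```

Note that an OCR works on glyph shapes rather than code points, so the symbols it must tell apart are the rendered combinations, which are far more numerous than the characters listed here.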

I researched an OCR system for printed, real-world Telugu document
images (as in scans from magazines and old printed material). Developing
one was a formidable task, since Telugu orthography has characters
(and combinations of characters) which are highly complicated and very
close to each other in the statistical, structural and visual senses.
More information on Telugu
can be found at this link.