Tesseractindic: Tesseract OCR engine that supports Indic scripts

This is a port of the Tesseract OCR engine that supports Indic scripts.
The aim of this project is to add Indic script support to the Tesseract OCR engine, which currently does not support connected scripts such as Devanagari. This involves adding routines to the existing code base, training the engine with sample images, and then testing for accuracy to guide subsequent debugging and refinement of the algorithms.

Tools and software used

Tesseract OCR engine 2.03 http://code.google.com/p/tesseract-ocr/

Gimp 2.2.17 http://www.gimp.org/

bbtesseract 0.5.34 (GUI for editing training data, such as box files) http://code.google.com/p/bbtesseract/

Project Plan: Take the input image and manipulate it so that it is fit to be processed by the Tesseract OCR engine. For Devanagari script, this translates to clipping the maatra (shironaam) between successive characters.
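
The clipping step in the plan could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual routine: the image is assumed to be already binarised into a list of rows of 0/1 values (1 = ink), the connecting line is assumed to be the densest row(s) of the word, and the function names and the 0.8 threshold are hypothetical choices made for this example.

```python
def find_headline_rows(image, threshold=0.8):
    """Return indices of rows whose ink count is close to the densest row.

    Assumption: the connecting line shows up as the row(s) with the most ink
    in a horizontal projection of the word image.
    """
    row_ink = [sum(row) for row in image]
    peak = max(row_ink, default=0)
    if peak == 0:
        return []  # blank image, nothing to clip
    return [y for y, ink in enumerate(row_ink) if ink >= threshold * peak]


def clip_headline(image, threshold=0.8):
    """Return a copy of the image with the connecting line erased between characters.

    A column counts as a gap between characters when it has no ink outside
    the headline rows; only in those columns is the headline cleared.
    """
    headline = set(find_headline_rows(image, threshold))
    clipped = [row[:] for row in image]
    if not headline:
        return clipped
    height, width = len(image), len(image[0])
    for x in range(width):
        ink_outside = any(image[y][x] for y in range(height) if y not in headline)
        if not ink_outside:
            for y in headline:
                clipped[y][x] = 0
    return clipped


if __name__ == "__main__":
    # Hypothetical toy word: two characters joined by a line across the top row.
    word = [
        [1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 0, 1, 0],
        [1, 1, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
    ]
    for row in clip_headline(word):
        print("".join("#" if pixel else "." for pixel in row))
```

On the toy word the top row is cut in columns 2, 3 and 6, so the two characters are no longer connected. Real scans would likely need binarisation, skew correction and a more careful gap test before a step like this is useful.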
