Tesseract: Correcting page orientation
Recently at work I had to run OCR on inverted pages, something like this:
I came across some threads (1, 2, 3) on the Tesseract Google group and Stack Overflow that were close to the problem I was dealing with. I tried the example from the Tesseract API page and looked at the OSDetect code (4). Although it already has a couple of implementations, such as updatebestorientation, somehow they were not working for me. Maybe I was doing something wrong, but eventually I ended up with the following code, which works for me: it rotates the page upright regardless of which edge the top line is on.
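#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>
#include <tesseract/osdetect.h>
#include <leptonica/allheaders.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <image file>\n", argv[0]);
        return 1;
    }
    const char *filename = argv[1];

    OSResults os_results;
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }
    Pix *image = pixRead(filename);
    if (image == NULL) {
        fprintf(stderr, "Could not read image %s.\n", filename);
        exit(1);
    }
    api->SetImage(image);

    // Run orientation and script detection (OSD) so the image can be flipped upright
    api->DetectOS(&os_results);
    OrientationDetector os_detector = OrientationDetector(&os_results);
    int correct_orientation = os_detector.get_orientation();

    // Had to add this condition because get_orientation result and
    // pixRotateOrth were not in sync.
    // pixRotate90 returns a new Pix, so the original is released afterwards.
    Pix *rotated = NULL;
    if (correct_orientation == 1) {
        rotated = pixRotate90(image, -1);  // 90 degrees counter-clockwise
    }
    else if (correct_orientation == 3) {
        rotated = pixRotate90(image, 1);   // 90 degrees clockwise
    }
    else if (correct_orientation == 2) {
        pixRotate180(image, image);        // 180 degrees, in place
    }
    if (rotated != NULL) {
        pixDestroy(&image);
        image = rotated;
    }

    // Hand the corrected image back to Tesseract and run recognition
    api->SetImage(image);
    char* ocrResult = api->GetUTF8Text();
    fprintf(stdout, "Recognized Text: %s\n", ocrResult);
    delete [] ocrResult;
    api->End();
    delete api;
    pixDestroy(&image);
    return 0;
}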
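The three branches in the program above boil down to one small mapping, which may be why get_orientation and pixRotateOrth looked out of sync: judging by those branches, the orientation id seems to count quarter-turns in the opposite direction from the clockwise quarter-turns that pixRotateOrth expects. Here is a minimal sketch of that mapping. The helper name clockwiseQuadsForOrientation is hypothetical (it is not part of Tesseract or Leptonica), and the meaning of each id is an assumption read off the code above rather than taken from the Tesseract documentation.

```cpp
// Hypothetical helper, not part of Tesseract or Leptonica.
// Assumption (inferred from the branches in the program above):
//   id 0 -> page is upright, no rotation needed
//   id 1 -> rotate 90 degrees counter-clockwise (3 clockwise quarter-turns)
//   id 2 -> rotate 180 degrees                  (2 clockwise quarter-turns)
//   id 3 -> rotate 90 degrees clockwise         (1 clockwise quarter-turn)
// Returns the number of clockwise quarter-turns, always in the range 0..3.
int clockwiseQuadsForOrientation(int orientation_id)
{
    // Normalise first so negative or out-of-range ids still land in 0..3.
    int id = ((orientation_id % 4) + 4) % 4;
    // Undo the detected orientation by turning the remaining way round.
    return (4 - id) % 4;
}
```

With that in place, the if/else chain could probably be replaced by a single call such as pixRotateOrth(image, clockwiseQuadsForOrientation(correct_orientation)). Since pixRotateOrth returns a new Pix, the original image would still need to be released with pixDestroy afterwards.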