On the use of neural text generation for the task of optical character recognition

Mohammadi, Mahnaz; Jaf, Sardar; Breckon, Toby; Matthews, Peter; McGough, Andrew Stephen; Theodoropoulos, Georgios; Obara, Boguslaw

doi:10.1109/aiccsa47632.2019.9035333

Authors

Mahnaz Mohammadi

Sardar Jaf

Toby Breckon

Peter Matthews

Andrew Stephen McGough

Georgios Theodoropoulos

Boguslaw Obara

Abstract

Optical Character Recognition (OCR) is the extraction of textual data from scanned text documents to facilitate their indexing, searching and editing, and to reduce storage space. Although OCR systems have improved significantly in recent years, they still fall short in situations where the OCR output does not match the text in the original document. Deep learning models have contributed positively to many problems, but their full potential for many other problems is yet to be explored. In this paper we propose a post-processing approach based on the application of deep learning to improve the accuracy of OCR systems (i.e., to minimize the error rate). We report on the use of neural network language models to correct characters and words incorrectly predicted by OCR systems. We applied our approach to the IAM handwriting database. Compared with previously reported context-based OCR empirical results on the IAM handwriting database, our proposed approach delivers significant accuracy improvements of 20.41% in F-score, 10.86% in character-level comparison using Levenshtein distance and 20.69% in document-level comparison.
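The abstract reports gains in F-score and in a character-level comparison based on Levenshtein distance. The exact metric definitions are not given on this page, so the sketch below is only an illustration under stated assumptions: character-level accuracy is taken as one minus the edit distance normalized by the reference length, and the F-score is computed over bags of whitespace-separated words. The function names `levenshtein`, `char_accuracy` and `word_f1` are hypothetical and not taken from the paper.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed character-level accuracy: 1 - edit distance / reference length,
    clipped at 0 so heavily wrong outputs do not go negative."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))


def word_f1(reference: str, hypothesis: str) -> float:
    """Assumed word-level F-score over bags (multisets) of words."""
    ref = Counter(reference.split())
    hyp = Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())  # words matched in both, with counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Example: compare an OCR output (before or after post-correction) to ground truth.
print(levenshtein("kitten", "sitting"))          # 3
print(char_accuracy("the cat", "tha cat"))       # 1 - 1/7
print(word_f1("the cat sat", "the cat sad"))     # 2/3
```

Running the same functions on the raw OCR output and on the language-model-corrected output, against the same ground truth, gives the kind of before/after comparison the abstract describes, though the paper's own normalization may differ.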

Citation

Mohammadi, M., Jaf, S., Breckon, T., Matthews, P., McGough, A. S., Theodoropoulos, G., & Obara, B. (2019, November). On the use of neural text generation for the task of optical character recognition. Presented at 16th ACS/IEEE International Conference on Computer Systems and Applications AICCSA 2019, Abu Dhabi, UAE.

Presentation Conference Type	Conference Paper (published)
Conference Name	16th ACS/IEEE International Conference on Computer Systems and Applications AICCSA 2019
Start Date	Nov 3, 2019
End Date	Nov 7, 2019
Acceptance Date	Jul 5, 2019
Online Publication Date	Mar 16, 2020
Publication Date	2019
Deposit Date	Jul 8, 2019
Publicly Available Date	Mar 19, 2020
Pages	1-8
Series ISSN	2161-5330
Book Title	2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA).
DOI	https://doi.org/10.1109/aiccsa47632.2019.9035333
Public URL	https://durham-repository.worktribe.com/output/1142440

Copyright Statement
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.