Research

Research in Handwritten Text Recognition (HTR) has seen remarkable progress in the past few years. However, many challenges remain before computers will be able to read handwritten documents (especially historical documents) the way human beings do.

Some of these challenges are:

  • Current machine learning algorithms still need large amounts of training data (“ground truth”). It would be desirable to reuse available resources in order to speed up the training process.
  • The layout of historical documents is often arbitrary and complex. New methods will be necessary to “understand” additional layers of information as they appear in tables, forms and repeated elements of documents.
  • Though the basic technology is language independent, language data and models are necessary when actually decoding the information and outputting human-readable text. Historical language is non-standardized and often shaped by the habits of the individual writer.
  • Processing huge numbers of documents, such as millions or hundreds of millions of pages, will require large amounts of computing power. Research will therefore also focus on reducing the effort in this respect.

Competitions

One important means of supporting research in the Handwritten Text Recognition domain is to organise research competitions. Several partners of READ have years of experience in this field and have organised such competitions at leading conferences, such as the International Conference on Document Analysis and Recognition (ICDAR) and the International Conference on Frontiers in Handwriting Recognition (ICFHR).

ScriptNet is the READ project platform for research competitions. Computer scientists can participate in open competitions or start their own.

Open Access – Open Research Data – Open Source

READ is strongly committed to the concept of “openness”.

  • Our publications will be accessible as Open Access.
  • Research Data forming the basis for scientific publications is made available via the Zenodo research repository.
  • The software developed in READ is available as Open Source via GitHub.