READ revolutionizes access to handwritten documents

From the Middle Ages to today, from old Greek to modern English, from running text to tables or forms

About

READ's mission is to revolutionize access to archival documents with the support of cutting-edge technology such as Handwritten Text Recognition (HTR) and Keyword Spotting (KWS).

Network

READ addresses archives and libraries, humanities scholars, family historians, volunteers - and computer scientists

Research

Research in READ comprises exciting fields such as Artificial Intelligence, Pattern Recognition, Machine Learning and Natural Language Processing.

Services

READ technology is available via the service platform Transkribus. Upload documents, train a Handwritten Text Recognition (HTR) model, process text and follow the progress of the project.

Recent Posts

+ Special models on Slavic handwriting released

Prof. Achim Rabus from the University of Freiburg has released two specialized models that can read Russian Church Slavonic. The first model is called VMC_Test_4+. Its training data consist of parts of the Russian Church Slavonic Great Reading Menology (16th century), and the model is tailored towards transcribing Cyrillic semi-uncial script from the 16th century. The Character Error Rate (CER) is 3.72% on the training data and 3.92% (also reported as 3.82%) on the validation set.

The second model is called Combined_Full_VKS_2. Its training data consist of parts of the Russian Church Slavonic Great Reading Menology (16th century), the Old Church Slavonic Codex Suprasliensis (11th century), and the 11th-century manuscript of the Catecheses of Cyril of Jerusalem. This is a generic model suitable for transcribing a variety of Old Cyrillic script styles, including uncial and semi-uncial. The Character Error Rate is 4.42% on the training data and 3.92% on the validation set.

Achim has written a detailed report about his use of Transkribus, which is an excellent example of how such a general model can be created. Thanks a lot!

+ General model for “Fraktur” released

Thanks to the Library Labs of the Austrian National Library and the NewsEye project, we are happy to announce the release of a free model that can read German Fraktur documents, especially from the 19th and 20th centuries, with convincing quality that outperforms most standard OCR engines. The model is based on training data from the ANNO collection of the Austrian National Library and comprises 442,141 words. It shows a CER of 1.55% on the training set and 1.65% on the test set without any dictionary support. Note: the model is trained on German-language documents. It will provide less convincing results for other languages, such as Swedish or Finnish Fraktur. However, models for these languages are also in preparation and may be released in the coming months. The Fraktur model is available to every registered user in Transkribus and is called ONB _Newseye_GT_M1+. Have fun!
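
For readers unfamiliar with the Character Error Rate (CER) quoted in these posts: it is commonly defined as the edit distance between the recognized text and the ground-truth transcription, divided by the length of the ground truth. The following is a minimal Python sketch under that assumption. The function names are illustrative only, and the exact computation used in Transkribus (for example, any text normalization applied beforehand) may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference.
    Assumes the standard definition; no normalization is applied."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in an 11-character word.
print(f"{character_error_rate('Handschrift', 'Handschrlft'):.2%}")  # 9.09%
```

Under this definition, a CER of 1.65% corresponds to roughly one or two character errors per 100 characters of ground truth.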