READ revolutionizes access to handwritten documents

From the Middle Ages to today, from old Greek to modern English, from running text to tables or forms

About

READ's mission is to revolutionize access to archival documents with the support of cutting-edge technology such as Handwritten Text Recognition (HTR) and Keyword Spotting (KWS).

Network

READ addresses archives and libraries, humanities scholars, family historians, volunteers, and computer scientists.

Research

Research in READ comprises exciting fields such as Artificial Intelligence, Pattern Recognition, Machine Learning and Natural Language Processing.

Services

READ technology is available via the service platform Transkribus. Users can upload documents, train a Handwritten Text Recognition (HTR) model, process text, and follow the progress of their projects.

Recent Posts

National Archives releases first version of a Dutch handwriting model

The digitisation team led by Liesbeth Keyser at the National Archives of the Netherlands is working hard to create training data for their collections in preparation for large-scale HTR processing. As a first result, a model trained on 475,769 words is now available to Transkribus users. The model has a Character Error Rate of 7.48% on the training set and 6.15% on the validation set. It is based on careful transcriptions of dozens of different hands and includes scans from the Incoming Documents of the Dutch East India Company (Overgekomen Brieven en Papieren van de VOC), held by the National Archives of the Netherlands, and from 19th-century notarial deeds held by the Noord-Hollands Archief. The model is named NAN/NHA_GT_M3+. Enjoy!

Special models for Slavic handwriting released

Prof. Achim Rabus from the University of Freiburg has released two specialized models that can read Russian Church Slavonic. The first model, VMC_Test_4+, was trained on parts of the Russian Church Slavonic Great Reading Menology (16th century) and is tailored to transcribing 16th-century Cyrillic semi-uncial script. Its Character Error Rate is 3.72% on the training set and 3.82% on the validation set.

The second model, Combined_Full_VKS_2, was trained on parts of the Russian Church Slavonic Great Reading Menology (16th century), the Old Church Slavonic Codex Suprasliensis (11th century), and the 11th-century manuscript of the Catecheses of Cyril of Jerusalem. It is a generic model suitable for transcribing a variety of Old Cyrillic script styles, including uncial and semi-uncial. Its Character Error Rate is 4.42% on the training set and 3.92% on the validation set.

Achim has written a detailed report about his use of Transkribus, which is an excellent example of how such a general model can be created. Thanks a lot!