READ revolutionizes access to handwritten documents
From the Middle Ages to today, from old Greek to modern English, from running text to tables or forms
Prof. Achim Rabus from the University of Freiburg has released two specialized models that can read Russian Church Slavonic. The first model is called VMC_Test_4+. Its training data consist of parts of the Russian Church Slavonic Great Reading Menology (16th century). The model is tailored towards transcribing Cyrillic semi-uncial script from the 16th century. The Character Error Rate (CER) is 3.72% on the training data and 3.92% (also reported as 3.82%) on the validation set.
The second model is called Combined_Full_VKS_2. Its training data consist of parts of the Russian Church Slavonic Great Reading Menology (16th century), the Old Church Slavonic Codex Suprasliensis (11th century), and the 11th-century manuscript of the Catecheses of Cyril of Jerusalem. This is a generic model suitable for transcribing a variety of Old Cyrillic script styles, including uncial and semi-uncial. The Character Error Rate is 4.42% on the training data and 3.92% on the validation set.
Achim has written a detailed report about his use of Transkribus, which is an excellent example of how such a general model can be created. Thanks a lot!
Thanks to the Library Labs of the Austrian National Library and the NewsEye project, we are happy to announce the release of a free model that can read German Fraktur documents, especially from the 19th and 20th centuries, at a convincing quality that outperforms most standard OCR engines. The model is based on training data from the ANNO collection of the Austrian National Library and comprises 442,141 words. It shows a CER of 1.55% on the training set and 1.65% on the test set without any dictionary support. Note: the model is trained on German-language documents. It will give less convincing results for other languages, such as Swedish or Finnish Fraktur. However, models for these languages are also in preparation and may be released in the coming months. The Fraktur model is available to every registered user in Transkribus and is called ONB_Newseye_GT_M1+. Have fun!
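For readers who want to check a model against their own ground truth, here is a minimal sketch of how a Character Error Rate is commonly computed: the Levenshtein (edit) distance between the reference transcription and the recognized text, divided by the length of the reference. This is the usual textbook definition, not necessarily the exact procedure used in Transkribus; the function names and the sample strings are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca by cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length.
    Assumption: normalization by the reference length, the common convention."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example strings, not taken from any real model output.
    truth = "Wiener Zeitung"
    recognized = "Wiener Zeitnng"
    print(f"CER: {cer(truth, recognized):.2%}")
```

Read this way, a CER of 1.65% corresponds to roughly one or two character edits per hundred reference characters.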