Researchers at the University of Zurich compared two versions of the ABBYY FineReader OCR (Optical Character Recognition) software, FineReader XIX and FineReader Server 11, with the Transkribus HTR (Handwritten Text Recognition) engine to find out which delivers the best recognition results on black letter type in historical newspapers. For the test, they used PDFs with medium-resolution images of the German-language Neue Zürcher Zeitung.
The recognition of black letter type in historical newspapers can be particularly challenging: characters are often hard to tell apart, the paper quality can be poor and, in many cases, small font sizes are used. Systems like ABBYY FineReader and Transkribus aim to tackle such problems. We are happy that the University of Zurich's experiment shows that Transkribus provides significantly better results than the commercial system ABBYY FineReader.
The article explains why HTR is so effective: only a modest amount of manual work is needed to create ground truth, which makes it practical to apply HTR to such documents. Error rates in Transkribus are usually low, especially for printed text in newspapers. Moreover, the test shows that the model trained for the Neue Zürcher Zeitung also produced good results for other newspapers of the same period, such as the Bundesblatt and the Neue Zuger Zeitung. The good news is that the Neue Zürcher Zeitung model will be made public during 2019.
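For readers who want to see what an error rate means in practice, a common measure is the character error rate (CER): the edit distance between the recognised text and the ground truth, divided by the length of the ground truth. The sketch below is only an illustration under that assumption. This post does not say which metric the study used, the function names are our own, and the sample strings are invented rather than taken from the experiment.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the ground-truth reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: two wrong characters in a 20-character line.
ground_truth = "Neue Zürcher Zeitung"
recognised = "Neue Zurcher Zeitnng"
print(f"CER: {character_error_rate(ground_truth, recognised):.1%}")
```

Here the recognised line differs from the ground truth in two characters out of twenty, which gives a CER of 10%. A lower value means the recognised text is closer to the manually transcribed ground truth.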
If you would like to take a closer look at the experiment, you can find the full article here: https://dev.clariah.nl/files/dh2019/boa/0694.html