+ Decoding a famous British engineer’s handwriting

The SS Great Britain Trust accepted the challenge of deciphering Isambard Kingdom Brunel’s handwriting. Without technical help this had proved difficult: the team in charge found his handwriting “almost impossible to read”. That’s where Transkribus came into play. Even though the project team has only just started to use Transkribus and the amount of training data is still small, useful results have already been achieved. There is still a lot of work to do, as there are several thousand pages of Brunel’s handwriting.

“It’s hugely sped up the process and we’re learning new bits about his life; there’s so much potential to unlock”, says Nick Booth from the SS Great Britain Trust. The SS Great Britain in Bristol holds the collection of material about the life and works of Brunel and tells the story of one of Britain’s greatest engineers and of his SS Great Britain, one of the most important historic ships in maritime history.

BBC News became aware of the project and published an article about it, which you can find here: https://www.bbc.com/news/uk-england-bristol-49347472

More information about Brunel and the SS Great Britain can be found on this website: https://www.ssgreatbritain.org

Source: https://www.bbc.com/news/uk-england-bristol-49347472

+ National Archives releases first version of a Dutch handwriting model

The digitisation team around Liesbeth Keyser at the National Archives of the Netherlands is working hard on creating training data for their collections in order to prepare for HTR processing on a large scale. As a first result, a model based on 475,769 words is now available to Transkribus users. The model shows a Character Error Rate of 7.48% on the training set and 6.15% on the validation set. It is based on the careful transcription of dozens of different hands and comprises scans of the incoming documents of the Dutch East India Company (Overgekomen Brieven en Papieren van de VOC) from the National Archives of the Netherlands and of 19th-century notarial deeds from the Noord-Hollands Archief. The model is called NAN/NHA_GT_M3+. Enjoy!
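For readers who wonder what a Character Error Rate actually measures, here is a minimal sketch. It assumes the common definition: the edit distance between the reference transcription and the recognised text, divided by the length of the reference. How Transkribus itself computes the figure is not described here, and the function names `edit_distance` and `character_error_rate` are our own, chosen purely for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn one string into the other."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up example: one wrong character in a 19-character line.
    truth = "Overgekomen Brieven"
    recognised = "Overgekomen Brieuen"
    print(f"CER: {character_error_rate(truth, recognised):.2%}")
```

Under this definition, a CER of around 6% means that roughly six characters in every hundred need to be corrected by hand.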

+ Special models on Slavic handwriting released

Prof. Achim Rabus from the University of Freiburg has released two specialized models that are able to read Russian Church Slavonic. The first model is called VMC_Test_4+. Its training data consist of parts of the Russian Church Slavonic Great Reading Menology (16th century). The model is tailored towards transcribing Cyrillic semi-uncial script from the 16th century. Character Error Rates for the training data are 3.72% and for the validation set 3.92% and for the validation set 3.82%.

The second model is called Combined_Full_VKS_2. Its training data consist of parts of the Russian Church Slavonic Great Reading Menology (16th century), the Old Church Slavonic Codex Suprasliensis (11th century), and the 11th-century manuscript of the Catecheses of Cyril of Jerusalem. This is a generic model suitable for transcribing a variety of Old Cyrillic script styles, including uncial and semi-uncial. The Character Error Rates are 4.42% on the training data and 3.92% on the validation set.

Achim has written a detailed report about his use of Transkribus, which is an excellent example of how such a general model can be created. Thanks a lot!

+ General model for “Fraktur” released

Thanks to the Library Labs of the Austrian National Library and the NewsEye project, we are happy to announce the release of a free model that is capable of reading German Fraktur documents, especially from the 19th and 20th centuries, in convincing quality, outperforming most standard OCR engines. The model is based on training data from the ANNO collection of the Austrian National Library and comprises 442,141 words. It shows a CER of 1.55% on the training set and 1.65% on the test set without any dictionary support. Note: the model is trained on German-language documents. It will provide less convincing results for other languages, such as Swedish or Finnish Fraktur. However, models for these languages are also in preparation and may be released in the coming months. The Fraktur model is available to every registered Transkribus user and is called ONB_Newseye_GT_M1+. Have fun!

+ In preparation for the next Transkribus User Conference

We are excited about our next Transkribus User Conference, which this time will take place in Innsbruck. Preparations are in full swing and the date will be announced within the next few weeks.

Our last Transkribus User Conference in November 2018 was a great success, but there is always room for improvement. If you have some input to share on topics for the next conference or other suggestions for improvement, we are happy to receive them via email@transkribus.eu.

The program of our last user conference can be found here: https://read.transkribus.eu/transkribus-user-conference-2018/

Videos of the presentations are available on our YouTube channel: https://www.youtube.com/playlist?list=PL7UbQtd4qlhKCEgLnZbJKQu9qpA5iF-sC

Hope to meet you there!

+ Transkribus HTR competing in an OCR test at the University of Zurich

Members of the University of Zurich compared two versions of the ABBYY FineReader OCR (Optical Character Recognition) software, FineReader XIX and FineReader Server 11, with the Transkribus HTR (Handwritten Text Recognition) in order to find out which is most effective at recognising blackletter type in historical newspapers. For the test they used PDFs with medium-resolution images of the German-language Neue Zürcher Zeitung.

The recognition of blackletter type in historical newspapers can be particularly challenging because characters are often hard to tell apart, the paper quality can be poor and, in many cases, small font sizes are used. Systems like ABBYY FineReader and Transkribus are working on tackling such problems. We are happy that the experiment of the University of Zurich shows that Transkribus provides significantly better results than the commercial system ABBYY FineReader.

The article explains why the HTR is effective: only a modest amount of manual work is needed to create the ground truth that makes it possible to apply the HTR to documents. Especially with printed texts in newspapers, error rates in Transkribus are usually low. Moreover, the test shows that the model, which had been trained for the Neue Zürcher Zeitung, also provided good results for other newspapers of the same period, such as the Bundesblatt and the Neue Zuger Zeitung. The good news is that the model for the Neue Zürcher Zeitung will become public during 2019.

If you would like to have a closer look at the experiment, you can find the whole article here: https://dev.clariah.nl/files/dh2019/boa/0694.html

Source: https://dev.clariah.nl/files/dh2019/boa/0694.html

+ Foundation of READ-COOP

On 1st of July 2019 the READ project will turn into a European Cooperative Society (SCE). READ-COOP will serve as the basis for sustaining and further developing the Transkribus platform and related services and tools.

READ-COOP will be based on the EU directive of a European Cooperative (SCE). Though the SCE will be set up according to EU law, it will be open to members outside of the European Community as well. If you are interested in working with Transkribus in the long run, join READ-COOP and benefit from the work done by your collaborators.

One of the main reasons that we decided to go for a coop is that we want to support a “culture of collaboration” between archives/libraries, humanities scholars, computer scientists and the public (volunteers). We believe that intersectoral collaboration and full control over data are key for a successful integration of machine learning technologies into society and daily life. And an SCE delivers the best infrastructure to realize this goal.

An SCE is a legal entity which is open to new members (institutions, natural persons). Members benefit from an SCE directly; there is no shareholder value. Moreover, SCEs are organised in a democratic way: the General Meeting has the final say.

More information can be found here: https://read.transkribus.eu/coop/

+ Transkribus goes to America

In May, Barbara Denicolò from the Transkribus team in Innsbruck and Elena Mühlbauer from the Diözesanarchiv in Passau travelled to the Midwest of the USA on behalf of READ to present Transkribus to an American audience.

Though small, Kalamazoo in the state of Michigan is well known for one of the major congresses of the various medieval disciplines, which takes place every year at Western Michigan University (WMU). At the 54th International Congress on Medieval Studies, Elena and Barbara presented Transkribus as a practical tool for philologists and historians to transcribe and annotate old manuscripts and prints manually or automatically.

After a brief general introduction, the participants had a total of 90 minutes to segment and transcribe various documents themselves in a test collection created especially for them, and to apply particularly good models. Although the workshop was unfortunately scheduled for the last time slot on Sunday, when noticeably fewer congress participants were walking across the large university campus, a good dozen interested people found their way to the workshop.

The great interest was evident as participants talked about their own projects and discussed possible applications and use cases. Hanna Lloyd from the University of Toronto, for example, reported on her user experiences and research results in her lecture “Digitizing Paleography: Transcribing Latin Charters with Transkribus”.

More information about the conference can be found here:

https://wmich.edu/sites/default/files/attachments/u434/2019/medieval-congress-program-2019.pdf

https://wmich.edu/medievalcongress

+ Handwritten Text Recognition at the National Archives of Finland

Over the past three years, research groups and archives from all over Europe have been working on Handwritten Text Recognition for historical documents. The results can now be seen at the public Transkribus seminar at the National Archives of Finland in Helsinki on Wednesday, 26.6.2019!

The Transkribus platform enables non-technical users to train neural networks in order to recognize and search historical documents. The seminar will provide an update on the latest technical developments and showcase how Transkribus can be used in various scenarios. Moreover, a first version of a web interface for searching Finnish court records from the 19th century will be launched. With this search interface, users can search historical documents in a “Google-like” way.

The READ project is currently being transformed into one of the first European Cooperative Societies in the research, education and cultural heritage domain. Institutions and private persons are warmly invited to join this initiative.

If you would like to take part, please register via the following link (participation is free of charge and registration is possible until 18.6.2019): https://www.eventbrite.com/e/transkribus-seminar-at-the-national-archives-of-finland-tickets-61567839064

The program includes an inspiring set of presentations from our international partners, as well as lunch and a panel discussion:

10.00 Welcoming words

10.15 READ-COOP: Günter Mühlberger (UIBK)

      Transkribus and the technology behind it

10.45 Transkribus platform: Sebastian Colutto (UIBK)

11.15 HTR in READ and Transkribus: Roger Labahn, Gundram Leifert (URO and CITlab)

11.30 Segmentation tools: Sofia Ares Oliveira (EPFL)

11.45 Table recognition: Hervé Déjean (NAVER)

12.00 ScanTent and DocScan: Matthias Wödlinger (CVL)

12.15-13.15 Lunch

      Transkribus in practice

13.15 Edelfelt project: Maria Vainio-Kurtakko (SLS)

13.45 VeleHanden: Marc Ponte, Jirsi Reinders

14.15 Court Records Collection: (NAF and UPVLC)

15.00 Panel discussion

Source: https://pixabay.com/photos/helsinki-city-night-finland-1269310/

+ HTR+ reads old Slavonic documents with a 3-5% Character Error Rate

Our new HTR+ was recently tested on different styles of Church Slavonic handwriting by Achim Rabus, who holds the Chair of Slavic Linguistics at the University of Freiburg in Germany. With Transkribus’ technology the error rates went down to 3 to 5 percent. Superscript letters, abbreviations and word separation were the challenges the HTR+ had to deal with.

A paper by Achim Rabus on recognizing handwritten text in Slavic manuscripts with Transkribus is about to be published. Within this project he discovered the potential of Transkribus for digitizing Church Slavonic manuscripts: being able to search large documents without even having a special model for the individual handwriting, and being able to correct the mistakes of an automated transcription instead of producing a full manual transcription, makes “digitisation life” a lot easier.

Some of the models Achim Rabus has trained already cover different hands and provide useful automatic transcripts. Nevertheless, the READ team is working on further improving Transkribus so that automatic transcripts with a low Character Error Rate can also be produced for documents with mixed handwriting.

Cooperation is the key to getting the greatest benefit for everybody. Achim Rabus is convinced of this too and is therefore happy to share his model with interested people. You can get in touch with him via email: achim.rabus@slavistik.uni-freiburg.de

You can have a look at the draft of the paper “Recognizing handwritten text in Slavic manuscripts: A neural-network approach using Transkribus” at the following link: https://www.academia.edu/38835297/Recognizing_handwritten_text_in_Slavic_manuscripts_A_neural-network_approach_using_Transkribus_1_Achim_Rabus

Source: Rabus, Achim: Recognizing handwritten text in Slavic manuscripts: A neural-network approach using Transkribus