+ Special models for Slavic handwriting released

Prof. Achim Rabus from the University of Freiburg has released two specialized models that can read Russian Church Slavonic. The first model, VMC_Test_4+, was trained on parts of the Russian Church Slavonic Great Reading Menology (16th century) and is tailored to transcribing Cyrillic semi-uncial script of the 16th century. Its Character Error Rate is 3.72% on the training data and 3.82% on the validation set.

The second model, Combined_Full_VKS_2, was trained on parts of the Russian Church Slavonic Great Reading Menology (16th century), the Old Church Slavonic Codex Suprasliensis (11th century) and the 11th-century manuscript of the Catecheses of Cyril of Jerusalem. It is a generic model suitable for transcribing a variety of Old Cyrillic script styles, including uncial and semi-uncial. Its Character Error Rate is 4.42% on the training data and 3.92% on the validation set.

Achim has written a detailed report about his use of Transkribus, which is an excellent example of how such a general model can be created. Thanks a lot!

 

+ General model for “Fraktur” released

Thanks to the Library Labs of the Austrian National Library and the NewsEye project, we are happy to announce the release of a free model that reads German Fraktur documents, especially from the 19th and 20th centuries, with a quality that outperforms most standard OCR engines. The model is based on training data from the ANNO collection of the Austrian National Library comprising 442,141 words. Without any dictionary support, it shows a CER of 1.55% on the training set and 1.65% on the test set. Note that the model was trained on German-language documents and will give less convincing results for Fraktur in other languages, such as Swedish or Finnish. However, models for these languages are also in preparation and may be released in the coming months. The Fraktur model is available to every registered Transkribus user under the name ONB_Newseye_GT_M1+. Have fun!
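The figures above are Character Error Rates (CER). To give a feel for what a value like 1.55% means, here is a minimal sketch of the commonly used definition: the character-level edit distance (insertions, deletions and substitutions) between the ground truth and the recognised text, divided by the length of the ground truth. This is an illustration under that assumption, not Transkribus' own evaluation code; the function names and the sample strings are hypothetical, and details such as whitespace or Unicode normalisation may be handled differently in the platform.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn hyp into ref (classic dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # ref character missing in hyp
                            curr[j - 1] + 1,      # extra character in hyp
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by the reference length.
    (Assumed common definition; hypothetical helper, not a Transkribus API.)"""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ground_truth = "Wiener Zeitung"   # hypothetical ground-truth line
    htr_output = "Wiencr Zeitnng"     # hypothetical recognised line, 2 errors
    print(f"CER: {cer(ground_truth, htr_output):.2%}")
```

Read this way, a CER of 1.55% corresponds to roughly 1.55 character errors per 100 characters of ground truth.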

+ In preparation for the next Transkribus User Conference

We are excited about our next Transkribus User Conference, which this time will take place in Innsbruck. Preparations are in full swing and the date will be announced within the next few weeks.

Our last Transkribus User Conference in November 2018 was a great success, but there is always room for improvement. If you have some input to share on topics for the next conference or other suggestions for improvement, we are happy to receive them via email@transkribus.eu.

The program of our last user conference can be found here: https://read.transkribus.eu/transkribus-user-conference-2018/

Videos of the presentations are available on our YouTube channel: https://www.youtube.com/playlist?list=PL7UbQtd4qlhKCEgLnZbJKQu9qpA5iF-sC

Hope to meet you there!

+ Transkribus HTR competing in OCR test at the University of Zurich

Members of the University of Zurich compared two versions of ABBYY FineReader OCR (Optical Character Recognition), FineReader XIX and FineReader Server 11, with Transkribus HTR (Handwritten Text Recognition) to find out which delivers the best recognition results on blackletter type in historical newspapers. For the test they used PDFs with medium-resolution images of the German-language Neue Zürcher Zeitung.

Recognising blackletter type in historical newspapers can be particularly challenging because characters are often hard to distinguish, the paper quality can be poor and, in many cases, small font sizes are used. Systems like ABBYY FineReader and Transkribus are designed to tackle such problems. We are happy that the experiment at the University of Zurich shows that Transkribus provides significantly better results than the commercial system ABBYY FineReader.

The article attributes the effectiveness of HTR to the fact that only a modest amount of manual work is needed to create ground truth, which makes it practical to apply HTR to such documents. Especially for printed newspaper texts, error rates in Transkribus are usually low. Moreover, the test shows that the model trained for the Neue Zürcher Zeitung also provided good results for other newspapers of the same period, such as the Bundesblatt and the Neue Zuger Zeitung. The good news is that the Neue Zürcher Zeitung model will become public during 2019.

If you would like to take a closer look at the experiment, you can find the whole article here: https://dev.clariah.nl/files/dh2019/boa/0694.html

Source: https://dev.clariah.nl/files/dh2019/boa/0694.html

+ Foundation of READ-COOP

On 1 July 2019 the READ project will turn into a European Cooperative Society (SCE). READ-COOP will serve as the basis for sustaining and further developing the Transkribus platform and its related services and tools.

READ-COOP will be based on the EU legal framework for the European Cooperative Society (SCE). Although the SCE will be set up under EU law, it will also be open to members from outside the European Union. If you are interested in working with Transkribus in the long run, join READ-COOP and benefit from the work done by your collaborators.

One of the main reasons we decided on a cooperative is that we want to support a “culture of collaboration” between archives and libraries, humanities scholars, computer scientists and the public (volunteers). We believe that intersectoral collaboration and full control over data are key to successfully integrating machine learning technologies into society and daily life, and that an SCE provides the best infrastructure to achieve this goal.

An SCE is a legal entity that is open to new members (institutions and natural persons). Members benefit from an SCE directly; there is no shareholder value. Moreover, SCEs are organised democratically: the General Meeting has the final say.

More information can be found here: https://read.transkribus.eu/coop/

 

+ Transkribus goes to America

In May, Barbara Denicolò from the Transkribus team in Innsbruck and Elena Mühlbauer from the Diözesanarchiv Passau travelled on behalf of READ to the Midwest of the USA to present Transkribus to an American audience.

Though small, Kalamazoo, Michigan, is well known for hosting one of the major congresses in medieval studies, which takes place every year at Western Michigan University (WMU). At the 54th International Congress on Medieval Studies, Elena and Barbara presented Transkribus as a practical tool for philologists and historians to transcribe and annotate old manuscripts and prints, either manually or automatically.

After a brief general introduction, participants spent the 90-minute session segmenting and transcribing various documents in a test collection created especially for them, and applying some particularly good models themselves. Although the workshop was unfortunately scheduled in the last time slot on Sunday, when noticeably fewer congress participants were walking across the large university campus, a good dozen interested people found their way to it.

Interest was high: participants talked about their own projects and discussed possible applications and use cases. Hanna Lloyd from the University of Toronto, for example, reported on her user experience and research results in her lecture “Digitizing Paleography: Transcribing Latin Charters with Transkribus”.

More information about the conference can be found here:

https://wmich.edu/sites/default/files/attachments/u434/2019/medieval-congress-program-2019.pdf

https://wmich.edu/medievalcongress

+ Handwritten Text Recognition at the National Archives of Finland

Over the past three years, research groups and archives from all over Europe have been working on Handwritten Text Recognition for historical documents. The results can now be seen at the public Transkribus seminar at the National Archives of Finland in Helsinki on Wednesday, 26 June 2019!

The Transkribus platform enables non-technical users to train neural networks to recognise and search historical documents. The seminar will provide an update on the latest technical developments and showcase how Transkribus can be used in various scenarios. Moreover, a first version of a web interface for searching 19th-century Finnish court records will be launched. With this interface, users can search historical documents in a “Google-like” way.

The READ project is currently being transformed into one of the first European Cooperative Societies in the research, education and cultural heritage domain. Institutions and private persons are warmly invited to join this initiative.

If you would like to take part, please register at the following link (participation is free of charge and registration is possible until 18 June 2019): https://www.eventbrite.com/e/transkribus-seminar-at-the-national-archives-of-finland-tickets-61567839064

The program includes an inspiring set of presentations from our international partners, as well as lunch and a panel discussion:

10.00 Welcoming words

10.15 READ-COOP: Günter Mühlberger (UIBK)

      Transkribus and the technology behind it

10.45 Transkribus platform: Sebastian Colutto (UIBK)

11.15 HTR in READ and Transkribus: Roger Labahn, Gundram Leifert (URO and CITlab)

11.30 Segmentation tools: Sofia Ares Oliveira (EPFL)

11.45 Table recognition: Hervé Déjean (NAVER)

12.00 ScanTent and DocScan: Matthias Wödlinger (CVL)

12.15-13.15 Lunch

      Transkribus in practice

13.15 Edelfelt project: Maria Vainio-Kurtakko (SLS)

13.45 VeleHanden: Marc Ponte, Jirsi Reinders

14.15 Court Records Collection (NAF and UPVLC)

15.00 Panel discussion


+ HTR+ reads old Slavonic documents with a 3 to 5% Character Error Rate

Achim Rabus, who holds the Chair of Slavic Linguistics at the University of Freiburg in Germany, recently tested our new HTR+ on different styles of Church Slavonic handwriting. With Transkribus’ technology the error rates went down to 3 to 5 percent. Superscript letters, abbreviations and word separation were the main challenges the HTR+ had to deal with.

Achim Rabus is about to publish a paper on recognizing handwritten text in Slavic manuscripts with Transkribus. In this project he discovered the potential of Transkribus for digitizing Church Slavonic manuscripts: being able to search large documents even without a special model for the individual hand, and being able to correct the mistakes of an automated transcription instead of transcribing everything manually, makes “digitisation life” a lot easier.

Some of the models Achim Rabus has trained already cover several different hands and provide useful automatic transcripts. Nevertheless, the READ team is working on further improving Transkribus so that automatic transcripts with a low character error rate can also be produced for documents written in mixed hands.

Cooperation is the key to getting the greatest benefit for everybody. Achim Rabus is convinced of this too, and he is therefore happy to share his model with interested people. You can get in touch with him via email: achim.rabus@slavistik.uni-freiburg.de

You can read a draft of the paper “Recognizing handwritten text in Slavic manuscripts: A neural-network approach using Transkribus” at the following link: https://www.academia.edu/38835297/Recognizing_handwritten_text_in_Slavic_manuscripts_A_neural-network_approach_using_Transkribus_1_Achim_Rabus

Source: Rabus, Achim: Recognizing handwritten text in Slavic manuscripts: A neural-network approach using Transkribus

 

+ Transkribus support for DIGITENS

Transkribus is now helping to produce a digital encyclopaedia of articles on sociability during the British Enlightenment. The encyclopaedia is being produced within the H2020 DIGITENS project, coordinated by the University of Western Brittany (UBO) in Brest, France, which gives young scholars the chance to become familiar with new digital humanities research tools and with work in archives. At the same time, the project gives them the opportunity to spend time abroad and thereby supports mobility.

For the DIGITENS project, a standardized workflow is important in order to work efficiently. This is where Transkribus comes into play. Our software covers the whole workflow, from scanning with the ScanTent and the DocScan app to international cooperation on Handwritten Text Recognition models. In this way Transkribus can provide the project with the infrastructure it needs.

To give the scholars an insight into working with Transkribus, we organised a workshop at the UBO in Brest on 22 May.

For more information about the project, visit the DIGITENS website: https://www.univ-brest.fr/digitens/

The project is coordinated by the GIS Sociability and the research lab HCTI. The DIGITENS encyclopaedia will soon be available online at the following address: http://www.digitens.fr/1/accueil

+ First meeting of Dutch Transkribus network

On 4 April 2019 Transkribus users from the Netherlands and Belgium gathered under cloudy skies in The Hague to discuss the possibility of forming a network to improve the automated recognition of Dutch language documents.

The event was kindly hosted by Liesbeth Keijser and her team at Nationaal Archief.

The event attracted 45 people from more than 15 institutions including Nationaal Archief, Huygens ING, Koninklijke Bibliotheek, Stadsarchief Amsterdam, Europeana, International Institute of Social History, Ghent University, Het Utrechts Archief, Stadsarchief Antwerpen, Noord-Hollands Archief and Picturae.

There are many active Transkribus users in the Dutch-speaking regions and The Netherlands in particular is advanced in the realms of digitisation, technological innovation and digital humanities. This event was designed to allow users to share information about their work with Transkribus and forge collaboration on a generic ‘Dutch model’ capable of recognising a sizeable variety of Dutch language documents.

We were welcomed with an introduction from Marens Engelhard, Director of Nationaal Archief.

The event moved on with a presentation by Günter Mühlberger, coordinator of the READ project, about the vision for a Dutch network.

Günter explained that the launch of READ-COOP in the summer of 2019 will provide sustainability for Transkribus after the end of the READ project. He also offered a sneak peek at some forthcoming features of the platform, including:

  • Advanced error rate tool to assess the accuracy of individual pages of automatically generated text
  • Trainable Layout Analysis capable of recognising page features like dates and marginalia
  • Improved interface for HTR model training, which makes it easier to mix different models together

We then heard from four sets of Transkribus users about the HTR models they had created and their experiences with the platform:

Liesbeth Keijser (Nationaal Archief) presenting

The afternoon was dedicated to discussion about necessary milestones and next steps. Participants discussed the challenge of recognising Dutch material, possible avenues for exchanging data and the desirability of an individual coordinator to play a leading role in the network.

The idea of a Dutch-language Transkribus network was enthusiastically received, and the event concluded with suggestions of funding avenues to investigate and a proposal to hold twice-yearly meetings at different venues. There is huge potential here to work collaboratively to significantly improve the recognition of Dutch-language material, and we look forward to seeing what develops!