+ Foundation of READ-COOP

On 1 July 2019, the READ project will become a European Cooperative Society (SCE). READ-COOP will serve as the basis for sustaining and further developing the Transkribus platform and related services and tools.

READ-COOP will be based on the EU legal framework for the European Cooperative Society (SCE). Although the SCE will be set up under EU law, it will also be open to members from outside the European Union. If you are interested in working with Transkribus in the long run, join READ-COOP and benefit from the work done by your collaborators.

One of the main reasons we decided to set up a cooperative is that we want to support a “culture of collaboration” between archives/libraries, humanities scholars, computer scientists and the public (volunteers). We believe that intersectoral collaboration and full control over data are key to successfully integrating machine learning technologies into society and daily life, and that an SCE provides the best infrastructure to achieve this goal.

An SCE is a legal entity that is open to new members (institutions and natural persons). Members benefit from an SCE directly; it does not pursue shareholder value. Moreover, SCEs are organised democratically: the General Meeting has the final say.

More information can be found here: https://read.transkribus.eu/coop/

 

+ Transkribus goes to America

In May, Barbara Denicolò from the Transkribus team in Innsbruck and Elena Mühlbauer from the Diözesanarchiv Passau travelled on behalf of READ to the American Midwest to present Transkribus to a US audience.

Though small, Kalamazoo, Michigan, is well known as the host of one of the major congresses in medieval studies, which takes place every year at Western Michigan University (WMU). At the 54th International Congress on Medieval Studies, Elena and Barbara presented Transkribus as a practical tool that lets philologists and historians transcribe and annotate old manuscripts and prints, either manually or automatically.

After a brief general introduction, participants spent the 90-minute session segmenting and transcribing various documents in a test collection created especially for them, and applying some particularly well-performing models themselves. Although the workshop was scheduled for the last time slot on Sunday, when noticeably fewer congress participants were walking across the large university campus, a good dozen interested people found their way to it.

Interest was high: participants talked about their own projects and discussed possible applications and use cases. Hanna Lloyd from the University of Toronto, for example, reported on her experiences and research results in her lecture “Digitizing Paleography: Transcribing Latin Charters with Transkribus”.

More information about the conference can be found here:

https://wmich.edu/sites/default/files/attachments/u434/2019/medieval-congress-program-2019.pdf

https://wmich.edu/medievalcongress

+ Handwritten Text Recognition at the National Archives of Finland

Over the past three years, research groups and archives from all over Europe have been working on Handwritten Text Recognition for historical documents. The results will be presented at the public Transkribus seminar at the National Archives of Finland in Helsinki on Wednesday 26.6.2019!

The Transkribus platform enables non-technical users to train neural networks to recognise and search historical documents. The seminar will provide an update on the latest technical developments and showcase how Transkribus can be used in various scenarios. In addition, a first version of a web interface for searching 19th-century Finnish court records will be launched. With this interface, users can search historical documents in a “Google-like” way.

The READ project is currently being transformed into one of the first European Cooperative Societies in the research, education and cultural heritage domain. Institutions and private persons are warmly invited to join this initiative.

If you would like to take part, please register via the following link (participation is free of charge and registration is open until 18.6.2019): https://www.eventbrite.com/e/transkribus-seminar-at-the-national-archives-of-finland-tickets-61567839064

The program includes an inspiring set of presentations from our international partners, as well as lunch and a panel discussion:

10.00 Welcoming words

10.15 READ-COOP: Günter Mühlberger (UIBK)

      Transkribus and the technology behind it

10.45 Transkribus platform: Sebastian Colutto (UIBK)

11.15 HTR in READ and Transkribus: Roger Labahn, Gundram Leifert (URO and CITlab)

11.30 Segmentation tools: Sofia Ares Oliveira (EPFL)

11.45 Table recognition: Hervé Déjean (NAVER)

12.00 ScanTent and DocScan: Matthias Wödlinger (CVL)

12.15-13.15 Lunch

      Transkribus in practice

13.15 Edelfelt project: Maria Vainio-Kurtakko (SLS)

13.45 VeleHanden: Mark Ponte, Jirsi Reinders

14.15 Court Records Collection (NAF and UPVLC)

15.00 Panel discussion

Source: https://pixabay.com/photos/helsinki-city-night-finland-1269310/

+ HTR+ reads old Slavonic documents with a 3-5% Character Error Rate

Achim Rabus, who holds the Chair of Slavic Linguistics at the University of Freiburg in Germany, recently tested our new HTR+ engine on different styles of Church Slavonic handwriting. With Transkribus’ technology, the character error rates went down to 3 to 5 percent. The main challenges HTR+ had to deal with were superscript letters, abbreviations and word separation.

Achim Rabus is about to publish a paper on recognising handwritten text in Slavic manuscripts with Transkribus. In this project he found that Transkribus has great potential for digitising Church Slavonic manuscripts: it makes it possible to search large documents even without a model trained on the individual hand, and users can correct the mistakes in an automated transcription instead of transcribing everything manually. Both make “digitisation life” a lot easier.

Some of the models Achim Rabus has trained already cover several different hands and produce useful automatic transcripts. Nevertheless, the READ team is working on further improving Transkribus so that documents with mixed handwriting can also be transcribed automatically with a low character error rate.

Cooperation is the key to getting the greatest benefit for everybody. Achim Rabus shares this conviction and is therefore happy to share his model with anyone interested. You can get in touch with him via email: achim.rabus@slavistik.uni-freiburg.de

You can read a draft of the paper “Recognizing handwritten text in Slavic manuscripts: A neural-network approach using Transkribus” at the following link: https://www.academia.edu/38835297/Recognizing_handwritten_text_in_Slavic_manuscripts_A_neural-network_approach_using_Transkribus_1_Achim_Rabus

Source: Rabus, Achim: Recognizing handwritten text in Slavic manuscripts: A neural-network approach using Transkribus

 

+ Transkribus support for DIGITENS

Transkribus is now helping to produce a digital encyclopaedia of articles on sociability in the British Enlightenment. The encyclopaedia is being created within the H2020 DIGITENS project, coordinated by the University of Western Brittany (UBO) in Brest, France, which gives young scholars the chance to become familiar with new digital humanities research tools and with archival work. At the same time, the project gives them the opportunity to spend time abroad and thereby supports mobility.

For the DIGITENS project, a standardised workflow is essential for working efficiently. This is where Transkribus comes into play. Our software covers the whole workflow, from scanning with the ScanTent and the DocScan app to international cooperation on Handwritten Text Recognition models. In this way, Transkribus provides the infrastructure the project needs.

To give the scholars an insight into working with Transkribus, we have organised a workshop at UBO in Brest on 22 May.

For more information about the project, visit the DIGITENS website: https://www.univ-brest.fr/digitens/

The project is coordinated by the GIS Sociability and the research lab HCTI. The DIGITENS encyclopaedia will soon be available online at the following address: http://www.digitens.fr/1/accueil

+ First meeting of Dutch Transkribus network

On 4 April 2019 Transkribus users from the Netherlands and Belgium gathered under cloudy skies in The Hague to discuss the possibility of forming a network to improve the automated recognition of Dutch language documents.

The event was kindly hosted by Liesbeth Keijser and her team at Nationaal Archief.

The event attracted 45 people from more than 15 institutions including Nationaal Archief, Huygens ING, Koninklijke Bibliotheek, Stadsarchief Amsterdam, Europeana, International Institute of Social History, Ghent University, Het Utrechts Archief, Stadsarchief Antwerpen, Noord-Hollands Archief and Picturae.

There are many active Transkribus users in the Dutch-speaking regions, and the Netherlands in particular is advanced in digitisation, technological innovation and digital humanities. The event was designed to let users share information about their work with Transkribus and to forge collaboration on a generic ‘Dutch model’ capable of recognising a wide variety of Dutch language documents.

We were welcomed with an introduction from Marens Engelhard, Director of Nationaal Archief.

The event continued with a presentation by Günter Mühlberger, coordinator of the READ project, on the vision for a Dutch network.

Günter explained that the launch of READ-COOP in the summer of 2019 will provide sustainability for Transkribus after the end of the READ project. He also offered a sneak peek at some forthcoming features of the platform, including:

  • Advanced error rate tool to assess the accuracy of individual pages of automatically generated text
  • Trainable Layout Analysis capable of recognising page features like dates and marginalia
  • Improved interface for HTR model training, which makes it easier to mix different models together

We then heard from four sets of Transkribus users about the HTR models they had created and their experiences with the platform:

Liesbeth Keijser (Nationaal Archief) presenting

The afternoon was dedicated to discussion about necessary milestones and next steps. Participants discussed the challenge of recognising Dutch material, possible avenues for exchanging data and the desirability of an individual coordinator to play a leading role in the network.

The idea of a Dutch language Transkribus network was enthusiastically received, and the event concluded with suggestions of funding avenues to investigate and a proposal for twice-yearly meetings to be held at different venues. There is huge potential here to work collaboratively to significantly improve the recognition of Dutch language material, and we look forward to seeing what develops!

+ Searching more than 100 years of mountaineering history with Transkribus

We are proud to be part of a successful project carried out by the New Zealand Alpine Club and the University of Innsbruck (Linguistic Institute). The complete workflow was carried out within Transkribus: in addition to uploading files and running the text recognition, volunteers used the web-based Transkribus transcription interface to carefully correct all 17,500 pages of the New Zealand Alpine Journal.

To make the journal searchable as well, Transkribus team members developed a simple but effective web application that enables users to browse all issues and to search the full text of the complete journal. The application also runs very well on smartphones. All data is hosted by Transkribus.

The project received its funding via the crowdfunding platform Givealittle. People donated about 6,000 NZD, which was a great support.

Check out the web application for searching the journal issues here.

 

Photo credit: https://www.nzaj-archive.nz/

+ Crowdsourcing with Transkribus at Amsterdam City Archives

When we work together, there’s so much we can achieve! Amsterdam City Archives and VeleHanden have just launched a fantastic crowdsourcing initiative which combines the power of our Handwritten Text Recognition (HTR) technology with the talents of volunteer transcribers.

Image credit: Amsterdam City Archives

Amsterdam City Archives are interested in opening up access to the records of Amsterdam’s notaries, which span from the sixteenth to the twentieth century. These documents are ripe for further exploration by those interested in the rich social and economic history of the Dutch capital. The ultimate aim is to create a fully searchable record of this precious handwritten collection.

The team have been working with our Transkribus platform to train HTR models to recognise different parts of this collection.

The HTR models were used to generate automated transcripts of the documents. It is now up to volunteers to correct any errors made by the machine!

The project is hosted on VeleHanden, a successful crowdsourcing platform created by the company Picturae. Crowd leert computer lezen (“The crowd teaches the computer to read”) is directly connected to the Transkribus web interface, meaning that any changes made by volunteers can be fed straight back into the system to improve the automated recognition.

Anyone can take part in this new project and explore various difficulty levels to find documents they are interested in. Volunteers collect points for their transcription work, which can be redeemed at exhibitions and events at Amsterdam City Archives.

We are really looking forward to seeing what the computer can learn from the crowd…

Mark Ponte from Amsterdam City Archives gave us a sneak peek of the project at our recent Transkribus User Conference.

CORNELIS STAAL 1749-1753 – 1 – Beginner – 13131 – A31239000579. Screenshot from Crowd leert computer lezen. Image credit: Amsterdam City Archives.

+ 20,000 tremendous Transkribus users!

Our latest milestone has put a big smile on our faces: there are now over 20,000 registered users of our Transkribus platform for Handwritten Text Recognition! People across the globe are working with Transkribus, using it to train hundreds of models to recognise texts of diverse dates, languages and styles.

Over the course of the READ project, we have welcomed over 13,000 new users of the platform and created a formidable network of people interested in opening up access to historical documents. We look forward to continued growth as we move into our next phase with READ-COOP.

Become part of our community! Follow us on Twitter or Facebook to keep up to date with our latest news.

+ Plant power! Results from the Royal College of Physicians’ Herbarium

The Royal College of Physicians has been devoted to advancing medicine for the past 500 years and has amassed outstanding historical collections of rare books, medical instruments and medicinal plant specimens.

The RCP has recently digitised the 6,000 sheets of the (mostly) nineteenth-century Herbarium of the Pharmaceutical Society of Great Britain. This collection comprises thousands of preserved plant specimens and their associated labels.

Dr Michael de Swiet, Dr Henry Oakley and Professor Anthony Dayan of the RCP then decided to work with the Transkribus team to try to recognise the text from the Herbarium collection.

The documents present various challenges for Handwritten Text Recognition (HTR) technology. They contain a mix of printed and handwritten text (in ink and pencil), various languages, abbreviations and specialist vocabulary. They are also written in several (similar) hands.

A first HTR model was trained on 29,083 transcribed words from the collection, using the pre-existing ‘English Writing M1’ model as part of the training process. The ‘English Writing M1’ model is trained to recognise the writing of the English philosopher Jeremy Bentham (1748-1832) and his secretaries, and it is freely available to all Transkribus users for their experiments.

In the best cases, the resulting model can automatically transcribe pages from the collection with a Character Error Rate (CER) of around 10%.
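To give a feel for what a CER of around 10% means: CER is commonly computed as the character-level edit (Levenshtein) distance between an automatic transcript and a correct reference transcript, divided by the number of characters in the reference, so 10% corresponds to roughly one wrong, missing or extra character in every ten. The Python sketch below illustrates that standard definition under this assumption. It is not the Transkribus implementation, which may normalise whitespace, case or special characters differently, and the function names `levenshtein` and `character_error_rate` as well as the sample strings are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Return the minimum number of single-character substitutions,
    deletions and insertions needed to turn `reference` into `hypothesis`."""
    # previous[j] holds the distance between the first i-1 characters of
    # `reference` and the first j characters of `hypothesis`.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # reference character deleted
                current[j - 1] + 1,      # extra character inserted
                previous[j - 1] + cost,  # substitution (or exact match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the recogniser drops one letter.
    reference = "Jeremy Bentham"   # 14 characters
    hypothesis = "Jeremy Bentam"   # one deletion
    print(f"CER = {character_error_rate(reference, hypothesis):.1%}")
    # prints: CER = 7.1%
```

In practice the CER is usually aggregated over whole pages or a validation set (total edits divided by total reference characters) rather than averaged line by line, so a few very short lines do not distort the result.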

Image from the Herbarium with an automated transcription of the label. Image credit: Royal College of Physicians.

The team at the RCP are pleased with these results and would be happy to see them shared and improved upon by other people working with herbarium material. If you would like to find out more about their work or gain access to their HTR model, please contact the Transkribus team (email@transkribus.eu).