+ HTR+ reads old Slavonic documents with 3-5 % Character Error Rate

Our new HTR+ was recently tested on different styles of Church Slavonic handwriting by Achim Rabus, who holds the Chair of Slavic Linguistics at the University of Freiburg in Germany. With Transkribus’ technology, the error rates went down to 3 to 5 percent. Superscript letters, abbreviations and word separation were the main challenges HTR+ had to deal with.

Achim Rabus is about to publish a paper on recognizing handwritten text in Slavic manuscripts with Transkribus. In the course of this project he discovered the potential of Transkribus for digitizing Church Slavonic manuscripts: being able to search large documents without even having a dedicated model for the individual handwriting, and being able to correct the mistakes of an automated transcription instead of transcribing everything manually, makes “digitisation life” a lot easier.

Some of the models Achim Rabus has trained already cover different hands and provide useful automatic transcripts. Nevertheless, the READ team is working on further improving Transkribus so that automatic transcripts with a low character error rate can also be produced for documents with mixed handwriting.

Cooperation is the key to getting the greatest benefit for everybody. Achim Rabus is convinced of this too, and he is therefore happy to share his model with anyone interested. You can get in touch with him via email: achim.rabus@slavistik.uni-freiburg.de

You can read the draft of the paper “Recognizing handwritten text in Slavic manuscripts: A neural-network approach using Transkribus” at the following link: https://www.academia.edu/38835297/Recognizing_handwritten_text_in_Slavic_manuscripts_A_neural-network_approach_using_Transkribus_1_Achim_Rabus

Source: Rabus, Achim: Recognizing handwritten text in Slavic manuscripts: A neural-network approach using Transkribus

 

+ Transkribus-support for DIGITENS

Transkribus now helps to produce a digital encyclopaedia containing articles on sociability in the age of the British Enlightenment. This is the goal of the H2020 DIGITENS project, coordinated by the University of Western Brittany (UBO) in Brest, France, which gives young scholars the chance to become familiar with new digital humanities research tools and with archival work. At the same time, the project gives them the opportunity to spend time abroad and thus supports mobility.

For the DIGITENS project, a standardized workflow is important for working efficiently. This is where Transkribus comes into play. Our software covers the whole workflow, from scanning with the ScanTent and the DocScan app to international cooperation on Handwritten Text Recognition models. In this way Transkribus can give the project the infrastructure it requires.

To give the scholars an insight into working with Transkribus, we have organised a workshop at the UBO in Brest on 22 May.

For more information about the project, visit the DIGITENS website: https://www.univ-brest.fr/digitens/

The project is coordinated by the GIS Sociability and the research lab HCTI. The DIGITENS encyclopedia will soon be available online at the following address: http://www.digitens.fr/1/accueil

+ First meeting of Dutch Transkribus network

On 4 April 2019 Transkribus users from the Netherlands and Belgium gathered under cloudy skies in The Hague to discuss the possibility of forming a network to improve the automated recognition of Dutch language documents.

The event was kindly hosted by Liesbeth Keijser and her team at Nationaal Archief.

The event attracted 45 people from more than 15 institutions including Nationaal Archief, Huygens ING, Koninklijke Bibliotheek, Stadsarchief Amsterdam, Europeana, International Institute of Social History, Ghent University, Het Utrechts Archief, Stadsarchief Antwerpen, Noord-Hollands Archief and Picturae.

There are many active Transkribus users in the Dutch-speaking regions and The Netherlands in particular is advanced in the realms of digitisation, technological innovation and digital humanities. This event was designed to allow users to share information about their work with Transkribus and forge collaboration on a generic ‘Dutch model’ capable of recognising a sizeable variety of Dutch language documents.

We were welcomed with an introduction from Marens Engelhard, Director of Nationaal Archief.

The event moved on with a presentation about the vision for a Dutch network from Günter Mühlberger, coordinator of the READ project.

Günter explained that the launch of READ-COOP in the summer of 2019 will provide sustainability for Transkribus after the end of the READ project. He also offered a sneak peek at some forthcoming features in the platform, including:

  • Advanced error rate tool to assess the accuracy of individual pages of automatically generated text
  • Trainable Layout Analysis capable of recognising page features like dates and marginalia
  • Improved interface for HTR model training, which makes it easier to mix different models together

We then heard from four sets of Transkribus users about the HTR models they had created and their experiences with the platform:

Liesbeth Keijser (Nationaal Archief) presenting

The afternoon was dedicated to discussion about necessary milestones and next steps. Participants discussed the challenge of recognising Dutch material, possible avenues for exchanging data and the desirability of an individual coordinator to play a leading role in the network.

The idea of a Dutch language Transkribus network was enthusiastically received, and the event concluded with suggestions of funding avenues to investigate and a proposal for twice-yearly meetings to be held at different venues. There is huge potential here to work collaboratively to significantly improve the recognition of Dutch language material and we look forward to seeing what develops!

+ Searching more than 100 years of mountaineering history with Transkribus

We are proud to be part of a successful project carried out by the New Zealand Alpine Club and the University of Innsbruck (Linguistic Institute). The complete workflow was done within Transkribus: apart from uploading files and running the text recognition, volunteers used the web-based transcription interface from Transkribus to carefully correct all 17,500 pages of the New Zealand Alpine Journal.

To make the journal searchable as well, Transkribus team members developed a simple but effective web application, which enables users to browse all issues and to search the full text of the complete journal. The application also runs very well on a smartphone. All data is hosted by Transkribus.

The project was funded via the crowdfunding platform Give a little. People donated about 6,000 NZD, which is a great support.

Check out the web-application for searching the journal editions here.

 

Photo credit: https://www.nzaj-archive.nz/

+ Crowdsourcing with Transkribus at Amsterdam City Archives

When we work together, there’s so much we can achieve! Amsterdam City Archives and VeleHanden have just launched a fantastic crowdsourcing initiative which combines the power of our Handwritten Text Recognition (HTR) technology with the talents of volunteer transcribers.

Image credit: Amsterdam City Archives

Amsterdam City Archives are interested in opening up access to the records of Amsterdam’s notaries, which span from the sixteenth to the twentieth century. These documents are ripe for further exploration for those interested in the rich social and economic history of the Dutch capital.  The ultimate aim is to create a fully searchable record of this precious handwritten collection.

The team have been working with our Transkribus platform to train HTR models to recognise different parts of this collection.

The HTR models were used to generate automated transcripts of the documents. It is now up to volunteers to correct any errors made by the machine!

The project is hosted on VeleHanden, a successful crowdsourcing platform created by the company Picturae. Crowd leert computer lezen (‘Crowd teaches computer to read’) is directly connected to the Transkribus web interface, meaning that any changes made by volunteers can be fed straight back into the system to improve the automated recognition.

Anyone can take part in this new project and explore various difficulty levels to find documents they are interested in.  Volunteers collect points for their transcription work which can be redeemed at exhibitions and events at Amsterdam City Archives.

We are really looking forward to seeing what the computer can learn from the crowd…

Mark Ponte from Amsterdam City Archives gave us a sneak peek of the project at our recent Transkribus User Conference.

CORNELIS STAAL 1749-1753 – 1 – Beginner – 13131 – A31239000579. Screenshot from Crowd leert computer lezen. Image credit: Amsterdam City Archives.

+ 20,000 tremendous Transkribus users!

Our latest milestone has put a big smile on our faces – there are now over 20,000 registered users of our Transkribus platform for Handwritten Text Recognition! People are working with Transkribus across the globe, using it to train hundreds of models to recognise texts of diverse dates, languages and styles.

Over the course of the READ project, we have welcomed over 13,000 new users of the platform and created a formidable network of people interested in opening up access to historical documents. We look forward to continued growth as we move into our next phase with READ-COOP.

Become part of our community! Follow us on Twitter or Facebook to keep up to date with our latest news.

+ Plant power! Results from the Royal College of Physicians’ Herbarium

The Royal College of Physicians has been devoted to advancing medicine for the past 500 years and has amassed outstanding historical collections of rare books, medical instruments and medicinal plant specimens.

The RCP has recently digitised the 6000 sheets from the (mostly) nineteenth-century Herbarium of the Pharmaceutical Society of Great Britain.  This collection comprises thousands of preserved plant specimens and their associated labels.

Dr Michael de Swiet, Dr Henry Oakley and Professor Anthony Dayan of the RCP then decided to work with the Transkribus team to try to recognise the text from the Herbarium collection.

The documents present various challenges for Handwritten Text Recognition (HTR) technology.  They contain a mix of printed and handwritten text (in ink and pencil), various languages, abbreviations and specialist vocabulary. They are also written in several (similar) hands.

A first HTR model was trained on 29,083 transcribed words from the collection, using the pre-existing ‘English Writing M1’ model as part of the training process. The ‘English Writing M1’ model is trained to recognise the writing of the English philosopher Jeremy Bentham (1748–1832) and his secretaries; it is freely available to all Transkribus users for their experiments.

In the best cases, the resulting model can automatically transcribe pages from the collection with a Character Error Rate (CER) of around 10%.
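
For readers unfamiliar with the metric: CER is commonly defined as the number of single-character insertions, deletions and substitutions needed to turn the automatic transcript into the correct one, divided by the length of the correct transcript. The short Python sketch below illustrates that common definition only. It is not the code Transkribus uses, details such as whitespace or case normalisation may differ in practice, and the two label strings are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca by cb (or keep a match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcript."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: a manually checked label vs. a hypothetical HTR output.
reference = "Digitalis purpurea"
hypothesis = "Digitalis purpurca"
cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.1%}")
```

Under this definition, a CER of around 10% means that roughly one character in ten has to be corrected by hand.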

Image from the Herbarium with an automated transcription of the label. Image credit: Royal College of Physicians.

The team at the RCP are pleased with these results and would be happy if they could be shared and improved upon by other people working with Herbarium material.  If you would like to find out more about their work or have access to their HTR model, please contact the Transkribus team (email@transkribus.eu).

+ Birth of the First Republic: Recognising Austrian Parliamentary papers

After the tumult of World War One, the First Austrian Republic was declared in September 1919. Nearly 100 years later, the Austrian Parliamentary Directorate are making documents from this crucial period of national history available online.

They have recently digitised 9,000 parliamentary papers from the First Republic and are working with our Transkribus platform to provide the fully searchable text of this collection.

Document from the Meeting of the Provisional National Assembly on 6 February 1919. Image credit: L1.5 Kompetenzzentrum, Parlamentsdirektion.

These parliamentary documents contain both handwritten and printed text (the latter set in Fraktur, a blackletter typeface of the Latin alphabet).

5,000 pages of the collection have now been processed in Transkribus with a mixture of Handwritten Text Recognition and our ABBYY FineReader OCR engine. 3,500 of these pages were then manually checked to ensure the accuracy of the recognised words. Keywords have been captured and assigned to documents, allowing for further search options. The next task is to record the titles and metadata of the documents to enhance their usefulness.

+ Learn how to read historical handwriting with Transkribus Learn

We’re sure many people know how challenging it can be to read historical manuscripts. Our new e-learning website, Transkribus LEARN, is here to help! Transkribus LEARN does not replace systematic paleography training, but it allows users to practice reading and transcribing individual words, learning as they go.

Learners can practice on a computer or mobile phone and check their learning progress.

Transkribus LEARN has two transcription modes: ‘Study’ and ‘Test’. In the former, users can guess and then reveal the transcription of individual words in a manuscript. In the latter, users are prompted to transcribe the missing word in a series of examples. At the end, they receive a score, along with a list of correct and incorrect answers. Users can keep studying and testing themselves as often as they like.

Screenshot from Transkribus Learn. Sütterlin script from a 1926 Austrian recipe book.

Levels of difficulty vary from simple, computer-generated cursive scripts to challenging historical documents in various languages and of different time periods. Users can also upload their own documents to the platform as a training exercise for students or volunteers.

To try it out:

Happy learning one and all! We welcome any feedback at: learn@transkribus.eu