Another day, another news feature for Transkribus! Il Fatto Quotidiano, an Italian daily newspaper, recently published a summary of our huge achievements in Handwritten Text Recognition and invited readers to try out our technology.
After the tumult of World War One, the First Austrian Republic was declared in September 1919. Nearly 100 years later, the Austrian Parliamentary Directorate are making documents from this crucial period of national history available online.
They have recently digitised 9,000 parliamentary papers from the First Republic and are working with our Transkribus platform to provide the fully searchable text of this collection.
These parliamentary documents contain both handwritten and printed text (the latter set in Fraktur, a blackletter typeface of the Latin alphabet).
5,000 pages of the collection have now been processed in Transkribus with a mixture of Handwritten Text Recognition and our ABBYY FineReader OCR engine. 3,500 of these pages were then manually checked to ensure the accuracy of the recognised words. Keywords have been captured and assigned to documents, allowing for further search options. The next task is to record the titles and metadata of the documents to enhance their usefulness.
We’re sure many people know how challenging it can be to read historical manuscripts. Our new e-learning website, Transkribus LEARN, is here to help! Transkribus LEARN does not replace systematic paleography training, but it allows users to practise reading and transcribing individual words, learning as they go.
Learners can practise on a computer or mobile phone and check their learning progress.
Transkribus LEARN has two transcription modes – ‘Study’ and ‘Test’. In the former, users can guess and then reveal the transcription of individual words in a manuscript. In the latter, users are prompted to transcribe the missing word in a series of examples. At the end, they receive a score, along with a list of correct and incorrect answers. Users can keep studying and testing themselves as often as they like.
Levels of difficulty vary from simple, computer-generated cursive scripts to challenging historical documents in various languages and of different time periods. Users can also upload their own documents to the platform as a training exercise for students or volunteers.
To try it out:
- Register for a free Transkribus account
- Log in at Transkribus LEARN (works on web or mobile)
- Explore the site and choose a document to practise on
Happy learning one and all! We welcome any feedback at: firstname.lastname@example.org
At the READ project, we believe in using cutting-edge technology to help people study a rich variety of historical documents.
The DocScan app automatically detects the page area of a document in milliseconds and provides real-time feedback on the quality of the image. It also has an auto-shoot feature which takes a picture every time a page is turned. It works especially well when used alongside our ScanTent device (also developed by the Computer Vision Lab), which holds a mobile phone in place above a historical document and allows for hands-free scanning.
The remarkable potential of these tools was revealed at our recent Transkribus User Conference. Dirk Alvermann from the University of Greifswald Library (one of READ’s MOU partners) presented the results of an experiment his team had been conducting.
The idea is that archival users can work with DocScan and the ScanTent to digitise historical documents with their mobile phone and then share the resulting images directly with the archive.
As shown in the video below, Greifswald University Library assigned a QR code to a set of documents and asked users to scan this code with the DocScan app before they started digitising documents. Once images were scanned using DocScan and the ScanTent, they were uploaded to our Transkribus platform and became available to view and transcribe in the Transkribus Web interface. The library was then able to create links in its digital repository, connecting archival metadata with the digitised images on Transkribus Web. A future version of DocScan will make it easier for images to be ingested directly into archival systems.
Dirk Alvermann emphasised that this workflow could be incredibly beneficial for small archives that lack funding for digitisation. Whilst user-generated content is not a substitute for a full digitisation strategy, it has the advantage of creating new resources and engaging interested archival users.
The DocScan app is available to download now, free of charge. The ScanTent is still in development and will be available for sale and hire later in 2019.
Our Transkribus platform for Handwritten Text Recognition (HTR) is used by thousands of researchers and archivists all over the world. And we’ve just been featured in a news report on the New Zealand television network TVNZ.
Archives New Zealand explained how they have been experimenting with Transkribus to recognise text in different historical collections. They underlined some of the huge benefits of working with Transkribus – transcription is sped up, physical documents receive less wear and tear and, most importantly, historical treasures become much easier to access!
With thousands of Transkribus users working all over the world, there is huge potential for collaborative work on the automated recognition of historical documents.
Dr Tobias Hodel (State Archives of Zurich, University of Zurich) has set up the ‘Gothic Hands’ working group with this in mind, hoping to improve recognition of medieval Gothic script. The ‘Comb_Gothic_Bookwriting’ model has been trained on different sets of medieval scripts and is already available to all Transkribus users. In the best cases, it can produce automated transcripts with a Character Error Rate of less than 10%.
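For readers unfamiliar with the term, a Character Error Rate is commonly defined as the number of character insertions, deletions and substitutions needed to turn the automated transcript into the correct one, divided by the length of the correct transcript. The Python sketch below illustrates that common definition only. The function names are hypothetical, and we are assuming this textbook formula rather than describing exactly how Transkribus calculates its figures.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum single-character edits between two strings."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the correct (reference) transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in a ten-character line.
    rate = character_error_rate("abcdefghij", "abcdefghix")
    print(f"Character Error Rate: {rate:.0%}")
```

Under this definition, a rate below 10% means that fewer than one character in ten would need correcting by hand.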
We are looking for more users to join this working group and share images and transcripts of Gothic scripts written between the 11th and 15th centuries. The current model has been trained primarily on German language material, so we are especially keen to receive documents written in Latin.
The latest contributor to the working group is Digital Statius: The Achilleid, a project which is producing a digital edition of the Achilleid, an unfinished Latin epic poem in which the poet Statius (later 1st c. AD) narrates the childhood of Achilles and the hero’s stay on the island of Scyros. This text was part of the school curriculum in the Middle Ages, before losing its status as a classic. The project, funded by the Swiss National Science Foundation (SNSF) and based at the University of Geneva, aims to produce a new critical edition of the Achilleid, fully and exclusively digital, which takes into account the complete manuscript tradition of the poem (224 manuscripts, c. 8000 images). The open access digital critical edition will include a new text, a full interactive apparatus criticus, comparative visualisation of numerous readings, comments, translations, links to other tools and/or platforms, and images of as many manuscripts as possible.
If you work with Gothic script, you can join the team behind the Achilleid edition and many others by becoming part of the ‘Gothic Hands’ working group.
To participate in the group, you can:
- share existing training data that you have already prepared in Transkribus
- prepare new images and transcripts in Transkribus in the ‘Gothic Hands’ collection
- send over files containing images and transcripts which can be matched automatically and converted into training data
Please contact Tobias Hodel (email@example.com) with any questions about the group.
Working together gives us a great chance to transcribe and search medieval documents more efficiently!
The READ consortium, together with several other institutions, is currently preparing the foundation of a legal entity (working title: READ-COOP) which will serve as the basis for sustaining and further developing the Transkribus platform and related services.
The governance model will be based on the EU directive for European Cooperative Societies (SCE). Though the SCE will be set up according to EU law, it will be open to members outside of the European Community as well.
An SCE:
- is a legal entity that allows its members to carry out common activities, while preserving their independence
- has the principal objective of satisfying its members’ needs and not the return of capital investment
- allows members to benefit proportionally to their profit and not to their capital contribution.
Günter Mühlberger, coordinator of READ and head of the Digital Humanities Research Center at the University of Innsbruck, has recently been interviewed on a new podcast (in German).
The interview was recorded by the NewsEye project which, like READ, is funded by the European Union’s Horizon 2020 scheme. NewsEye aims to use digital tools to provide enhanced access to digitised historical newspapers, and the project will build upon READ’s existing achievements relating to the automated recognition of printed text.
We have another new How-to Guide for users of our Transkribus platform. This time we’re showing you how to enrich documents with structural tags like ‘paragraph’, ‘heading’, or ‘footer’.
In the near future, it will be possible to train models to automatically recognise the structure of historical documents. Adding structural tags creates training data for this process. If you work with this feature, there is no need to tag every element of your documents – just focus on marking up the sections that are of interest to you.
If you have any questions about structural tags, the Transkribus team are here to help (firstname.lastname@example.org).