We are proud to be part of a successful project carried out by the New Zealand Alpine Club and the University of Innsbruck (Linguistic Institute). The complete workflow was done within Transkribus: apart from uploading files and running the text recognition, volunteers used the web-based Transkribus transcription interface to carefully correct all 17,500 pages of the New Zealand Alpine Journal.
To make the journal searchable as well, Transkribus team members developed a simple but effective web application, which enables users to browse all issues and to search the full text of the complete journal. The application also runs very well on a smartphone. All data is hosted by Transkribus.
The project received its funding via the crowdfunding platform Give a little. People donated about 6,000 NZD, which is a great support.
When we work together, there’s so much we can achieve! Amsterdam City Archives and VeleHanden have just launched a fantastic crowdsourcing initiative which combines the power of our Handwritten Text Recognition (HTR) technology with the talents of volunteer transcribers.
Amsterdam City Archives are interested in opening up access to the records of Amsterdam’s notaries, which span from the sixteenth to the twentieth century. These documents are ripe for further exploration for those interested in the rich social and economic history of the Dutch capital. The ultimate aim is to create a fully searchable record of this precious handwritten collection.
The team have been working with our Transkribus platform to train HTR models to recognise different parts of this collection.
The HTR models were used to generate automated transcripts of the documents. It is now up to volunteers to correct any errors made by the machine!
The project is hosted on VeleHanden, a successful crowdsourcing platform created by the company Picturae. Crowd leert computer lezen is directly connected to the Transkribus web interface, meaning that any changes made by volunteers can be fed straight back into the system to improve the automated recognition.
Anyone can take part in this new project and explore various difficulty levels to find documents they are interested in. Volunteers collect points for their transcription work which can be redeemed at exhibitions and events at Amsterdam City Archives.
We are really looking forward to seeing what the computer can learn from the crowd…
Mark Ponte from Amsterdam City Archives gave us a sneak peek of the project at our recent Transkribus User Conference.
Our latest milestone has put a big smile on our faces – there are now over 20,000 registered users of our Transkribus platform for Handwritten Text Recognition! People are working with Transkribus across the globe, using it to train hundreds of models to recognise texts of diverse dates, languages and styles.
Across the course of the READ project, we have welcomed over 13,000 new users of the platform and created a formidable network of people interested in opening up access to historical documents. We look forward to continued growth as we move into our next phase with READ COOP.
Become part of our community! Follow us on Twitter or Facebook to keep up to date with our latest news.
The Royal College of Physicians has been devoted to advancing medicine for the past 500 years and has amassed outstanding historical collections of rare books, medical instruments and medicinal plant specimens.
The RCP has recently digitised the 6,000 sheets from the (mostly) nineteenth-century Herbarium of the Pharmaceutical Society of Great Britain. This collection comprises thousands of preserved plant specimens and their associated labels.
Dr Michael de Swiet, Dr Henry Oakley and Professor Anthony Dayan of the RCP then decided to work with the Transkribus team to try to recognise the text from the Herbarium collection.
The documents present various challenges for Handwritten Text Recognition (HTR) technology. They contain a mix of printed and handwritten text (in ink and pencil), various languages, abbreviations and specialist vocabulary. They are also written in several (similar) hands.
In the best cases, the resulting model can automatically transcribe pages from the collection with a Character Error Rate (CER) of around 10%.
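To give a sense of what a CER of around 10% means, here is a minimal sketch in Python. It assumes the common definition of CER: the edit distance between the automated transcript and a manually checked reference, divided by the number of characters in the reference. The function names and the sample label are made up for illustration and are not taken from the RCP collection; the exact calculation inside Transkribus may differ in details such as text normalisation.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical label text (20 characters) and a recognition result
# with one dropped letter and one wrong punctuation mark.
reference = "Atropa belladonna L."
hypothesis = "Atropa beladonna L,"
cer = character_error_rate(reference, hypothesis)
print(f"{cer:.0%}")  # prints "10%"
```

In other words, at this error rate roughly one character in ten needs correcting by hand.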
Image from the Herbarium with an automated transcription of the label. Image credit: Royal College of Physicians.
The team at the RCP are pleased with these results and would be happy if they could be shared and improved upon by other people working with Herbarium material. If you would like to find out more about their work or get access to their HTR model, please contact the Transkribus team (email@example.com).
Another day, another news feature for Transkribus! Il Fatto Quotidiano, an Italian daily newspaper, recently published a summary of our huge achievements in Handwritten Text Recognition and invited readers to try out our technology.
After the tumult of World War One, the First Austrian Republic was declared in September 1919. Nearly 100 years later, the Austrian Parliamentary Directorate are making documents from this crucial period of national history available online.
They have recently digitised 9,000 parliamentary papers from the First Republic and are working with our Transkribus platform to provide the fully searchable text of this collection.
Document from the Meeting of the Provisional National Assembly on 6 February 1919. Image credit: L1.5 Kompetenzzentrum, Parlamentsdirektion.
These parliamentary documents contain both handwritten and printed text (the latter set in Fraktur, a blackletter typeface of the Latin alphabet).
5,000 pages of the collection have now been processed in Transkribus with a mixture of Handwritten Text Recognition and our ABBYY FineReader OCR engine. 3,500 of these pages were then manually checked to ensure the accuracy of the recognised words. Keywords have been captured and assigned to documents, allowing for further search options. The next task is to record the titles and metadata of the documents to enhance their usefulness.
We’re sure many people know how challenging it can be to read historical manuscripts. Our new e-learning website, Transkribus LEARN, is here to help! Transkribus LEARN does not replace systematic paleography training but it allows users to practise reading and transcribing individual words, learning as they go.
Learners can practise on a computer or mobile phone and check their learning progress.
Transkribus LEARN has two transcription modes – ‘Study’ and ‘Test’. In the former, users can guess and then reveal the transcription of individual words in a manuscript. In the latter, users are prompted to transcribe the missing word in a series of examples. At the end, they receive a score, along with a list of correct and incorrect answers. Users can keep studying and testing themselves as often as they like.
Screenshot from Transkribus LEARN. Sütterlin script from a 1926 Austrian recipe book.
Levels of difficulty vary from simple, computer-generated cursive scripts to challenging historical documents in various languages and of different time periods. Users can also upload their own documents to the platform as a training exercise for students or volunteers.
At the READ project, we believe in using cutting-edge technology to help people study a rich variety of historical documents.
The Computer Vision Lab at the Technical University of Vienna (one of the READ project partners) have developed the DocScan mobile app for this very purpose.
The app automatically detects the page area of a document in milliseconds and provides real-time feedback on the quality of the image. It also has an auto-shoot feature which will take a picture every time a page is turned. It works especially well when used alongside our ScanTent device (also developed by the Computer Vision Lab), which holds a mobile phone in place above a historical document and allows for hands-free scanning.
The idea is that archival users can work with DocScan and the ScanTent to digitise historical documents with their mobile phone and then share the resulting images directly with the archive.
As shown in the video below, Greifswald University Library assigned a QR code to a set of documents and asked users to scan this code with the DocScan app before they started digitising documents. Once images were scanned using DocScan and the ScanTent, they were uploaded to our Transkribus platform and became available to view and transcribe in the Transkribus Web interface. The library was then able to create links in its digital repository, connecting archival metadata with the digitised images on Transkribus Web. A future version of DocScan will make it easier for images to be ingested directly into archival systems.
Dirk Alvermann emphasised that this workflow could be incredibly beneficial for small archives that lack funding for digitisation. Whilst user-generated content is not a substitute for a full digitisation strategy, it has the advantage of creating new resources and engaging interested archival users.
The DocScan app is available to download now, free of charge. The ScanTent is still in development and will be available for sale and hire later in 2019.
Our Transkribus platform for Handwritten Text Recognition (HTR) is used by thousands of researchers and archivists all over the world. And we’ve just been featured on the news on the New Zealand television network TVNZ.
Archives New Zealand explained how they have been experimenting with Transkribus to recognise different historical collections. They underlined some of the huge benefits of working with Transkribus – transcription is sped up, physical documents receive less wear and tear and, most importantly, historical treasures become much easier to access!