+ National Archives Finland takes first steps towards Handwritten Text Recognition

The National Archives of Finland is committed to promoting access to documents relating to Finland’s cultural heritage.  Handwritten Text Recognition (HTR) technology is now part of its mission.

The National Archives of Finland has digitised millions of documents, most of which are handwritten.  As a first step, 500 of these digitised pages have now been uploaded to the Transkribus platform and transcribed there.  The documents range from the sixteenth to the nineteenth century and include estate inventories of the Finnish nobility, court books and land tax registers.  These 500 pages serve as training data and will play a vital role in enabling HTR engines to recognise Swedish handwriting (Swedish was the language of official documents in Finland during this period).

Manuscript page from the records of the Turku Court of Appeal, 1828-1829  (Image courtesy of Digital Archive, National Archives Finland)

READ researchers will use these pages to generate an HTR model that can be applied to other documents held in the National Archives of Finland.  This model will make it possible to automatically transcribe and search images of historical manuscripts, thereby ensuring easier access to the records of Finnish history.

500 pages is just the beginning!  The National Archives of Finland will continue to contribute training data as the READ project progresses, and this data will help to improve the accuracy of the HTR technology.

+ Videos of READ presentations

If you want to find out more about READ, you can now watch the READ partners live in action!  Videos of presentations made at the co:op project’s ‘Technology meets Scholarship’ conference in January 2016 are now available.

Text transcripts and slides from the READ presentations are available here: http://read.transkribus.eu/2016/03/31/presentations-from-the-read-partners-now-available/

Videos of the READ presentations can be found here: https://www.youtube.com/playlist?list=PLElrWLCQvZaRny2G_gXAINGpCtNrpPBBO

For an introduction to HTR technology, try watching Dr Roger Labahn (University of Rostock) on ‘Handwritten Text Recognition.  Key Concepts’: https://youtu.be/3d-Iru6qLRc

+ READ meets in Valencia

At the beginning of May, the READ partners met in sunny Valencia!  The Universitat Politècnica de València hosted a technical meeting where the group considered a range of current research questions in small workshops.  Eva Lang (Passau Diocesan Archives) has summarised the discussions on the co:op blog.

+ Presentations from the READ partners now available!

The READ project was launched in January 2016 at the ‘Technology meets Scholarship’ conference at the Hessian State Archives in Marburg (Germany).  This conference was organised by the co:op (community as opportunity – the creative archives’ and users’ network) project.

Slides and presentations from the READ partners are now available at the co:op website (see links below).  Videos of the presentations are also coming soon!

These presentations give a great introduction to the READ partners and the objectives of our project.

………………………………………

Francesco Roberg (Hessian State Archives): Short Introduction to co:op and READ

Roger Labahn (University of Rostock): Handwritten Text Recognition.  Key Concepts

Enrique Vidal (Polytechnic University of Valencia): Keyword Searching as a Trade-Off Between Recall and Precision.  A New Way to Search Large Collections of Digitised Documents

Basilis Gatos (National Centre for Scientific Research “Demokritos”): Hard Tasks in the Background.  Layout Analysis

Stefan Fiel (Technical University of Vienna): Automated Writer Identification and its Use Cases for Archival Documents

Louise Seaward (University College London): The Crowd, the Volunteers and the Supertranscribers.  Building and Supporting an Online User Community for the Bentham Edition

Christian Sieber (State Archives of Zurich): Transcription – Swiss Made.  The Projects of the State Archives of Zurich

István Kecskeméti (National Archives Finland): In-House Digitisation as a Core Task.  The Finnish National Archives

Sebastian Colutto (University of Innsbruck): Transkribus.  A Virtual Research Environment for the Transcription and Recognition of Historical Documents 

Günter Mühlberger (University of Innsbruck): The READ project.  Objectives, Tasks and Partner Organisations

 

+ Conference at Archives Nationales

University College London presented at a conference at the Archives Nationales in Paris on 16 March 2016.

Researchers from France, the UK and Ireland came together to share ideas on using crowdsourcing for collaborative transcription and digital scholarly editing.  Louise Seaward gave an introduction to the READ project and explained how handwritten text recognition is being used by volunteers working on UCL’s Transcribe Bentham initiative.

 

+ READ Project launches

In January 2016, more than 150 people gathered at the Hessian State Archives in Marburg (Germany) for the ‘Technology meets Scholarship’ conference.  Organised by the co:op (community as opportunity – the creative archives’ and users’ network) project, this conference was also the first public event of the READ project.

The audience brought together archivists, scholars and computer scientists.  The idea behind the conference was to share information about how handwritten text recognition technology could benefit those who work with archival documents.

To kick things off, the READ technical partners demonstrated some of the tools they are developing for analysing handwritten manuscripts, including layout analysis, writer identification and keyword spotting.  The READ archival partners then explained how they have been digitising and transcribing their collections.

Technology and scholarship came even closer together on the second day of the conference.  A selection of German archivists gave an overview of their collections and the READ technical partners responded live to suggest how computers could deal with some of the problems of legibility and layout.  We were also treated to a ‘behind the scenes’ tour of the Hessian State Archives – which was a first for some of our computer scientists!

Members of READ enjoyed meeting for the first time and planning the initial stages of the project.  The papers from the conference will be shared on the co:op website in due course.  We look forward to many more conferences and workshops as the project moves forward.