+ Another partner joins the READ project network

The READ project network continues to expand, as we welcome a new Memorandum of Understanding partner from Finland!  The Society of Swedish Literature in Finland exists to preserve and promote knowledge about Swedish language and culture in Finland.  Swedish is one of Finland’s two national languages and the Society was established in 1885 to help ensure that Swedish culture was protected in the country.

The Society has a large collection of manuscripts, letters and diaries written by Finnish authors and artists.  It is aiming to use Handwritten Text Recognition technology to transcribe letters written by two notable Finns: the painter Albert Edelfelt and the author Zachris Topelius.

This is a great opportunity for READ to work with an institution which plays an essential role in studying and safeguarding the cultural heritage of the Swedish-speaking part of Finland.

Institutions that would also like to become part of the READ project network might like to think about signing a Memorandum of Understanding with us.  Consult our list of Memorandum of Understanding partners to see who we’re working with or send us an email (email@transkribus.eu) to find out more.

+ Meet the READ project partners – Tobias Hodel

What’s your name?

Tobias Hodel.

Where do you work?

State Archives of Zurich.

Tell us a bit about your background…

I have a PhD in History from the University of Zurich.  I’m interested in all things digital (history, humanities, archiving).  I like to travel to new places, especially when visiting other partners in the READ project.

What is your role in the READ project?

Connecting the world of (digital) archives and scholars with the possibilities offered by READ.

What is top of your to-do list at the moment?

Evaluating models trained for Handwritten Text Recognition, scaling tests using material from the archives and contacting interested parties to tell them about READ and the Transkribus tool.

What do you like best about working on READ?

Getting to work on a regular basis with people with different backgrounds and diverse research interests.

If you could do another job for just one day, what would it be?

President of the United States (if it’s similar to House of Cards!)

What can you see out of the window of your office? 

[Image: the view from Tobias’s office window]

Thanks Tobias! 

+ Presenting READ at the next International Medieval Congress!

We are excited to report that the READ project will be presenting a panel at the next International Medieval Congress at the University of Leeds in July 2017.

READ partners from Zurich State Archives, National Archives Finland and Passau Diocesan Archives will be demonstrating how they have been working with Handwritten Text Recognition technology to transcribe and search their document collections.

This is a great opportunity to showcase the possibilities of Handwritten Text Recognition to medievalists – over 2000 of them gather in Leeds every year for this conference!

+ Working with a small crowd – Transcribing the ‘Bozner Ratsprotokolle’

The READ project is working to make handwritten historical collections more accessible through the development and application of Handwritten Text Recognition (HTR) technology.  This technology is certainly of interest to archivists and scholars but we hope that members of the public will find it useful too!  The crowdsourcing initiative Transcribe Bentham is already part of the READ project and we will be creating a new open source crowdsourcing platform which can be used and adapted by any institution which would like to get volunteers to work on a manuscript collection.  We have also begun working with a small focus group of volunteers to introduce them to the Transkribus transcription platform and the possibilities of HTR technology.  Barbara Denicolo is working with the Civic Archives of Bozen-Bolzano (one of the READ MOU partners) to manage this project and she gives a summary below of her progress so far:

‘‘Transcribing the Bozner Ratsprotokolle’ is a collaboration between READ and the Civic Archives of Bozen-Bolzano, Italy, which was set up at the beginning of 2016.  Our aim is to recruit and train volunteers to work with Transkribus to transcribe the ‘Bozner Ratsprotokolle’: records of the municipal council of the town which were written between the fifteenth and nineteenth centuries.  These transcripts will help to train an HTR engine to read the ‘Bozner Ratsprotokolle’ collection.  Once a computer is capable of processing these documents, users will be able to view automatically-generated transcripts and search for particular keywords that they might be interested in.  The archive could also use these transcripts to create an enriched digital edition of the collection.

Page from the Ratsprotokoll (1600) [Image from Civic Archives of Bozen-Bolzano]

Before I asked any volunteers to work with Transkribus, I needed to learn how to use it myself!  I am familiar with the process of transcribing historical documents.  I have studied medieval history for many years, worked with sources in various archives and am about to start my PhD.  I managed to transcribe 60 pages from the ‘Bozner Ratsprotokolle’ collection, which is a strong basis for training the HTR.  The next step was to recruit some volunteers who could help us to produce even more training data.

Between May and September 2016 we sent out a call for volunteers using the archives’ website, flyers, emails and word of mouth.  An advert placed in a local newspaper seems to have attracted quite a few participants, whilst Facebook posts helped me to get in contact with several students.

We now have a group of around 30 interested people, about half of whom have started to work with Transkribus.  My experience so far suggests that it is difficult to find an ideal volunteer – older people generally have time to participate and are skilled in reading old handwriting but need more support to understand and work with Transkribus on their computers.  For students, the opposite is true!

This project offers the opportunity to connect different generations and use historical documents to contribute to innovative research.  This focus group is working to make the ‘Bozner Ratsprotokolle’ more accessible and providing feedback on Transkribus that will help the READ project team to refine the platform in the future.  I look forward to continuing my work with the volunteers and will report back on their next milestones!’

+ Meet the READ project partners – Joan Andreu Sánchez

What’s your name?

Joan Andreu Sánchez

Where do you work?

Universitat Politècnica de València

Tell us a bit about your background…

I earned my Diploma and PhD in Computer Science from the Universitat Politècnica de València, Spain, in 1991 and 1999, respectively.  I am currently an Associate Professor in the Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València and have been an active member of the Pattern Recognition and Human Language Technology research center since 1990.  My current research interests include the areas of pattern recognition, machine learning, and their applications to language, speech, handwriting recognition and image processing.  I have led several Spanish and European research projects and have co-authored more than 80 articles published in international conferences and journals.

What is your role in the READ project?

I lead the Universitat Politècnica de València’s contribution to READ.  We are focusing on Handwritten Text Recognition and Keyword Spotting.

What is top of your to-do list at the moment?

To survive another day.

What do you like best about working on READ?

Great team, exciting problems.

If you could do another job for just one day, what would it be?

I would like to be rich for a day, or just richer!

What can you see out of the window of your office? 

[Image: the view from Joan Andreu’s office window]

Thanks Joan Andreu!

+ Transkribus comes to YouTube!

Lights, camera, action!  Transkribus now has a YouTube channel.  Our channel will be a place for us to share ‘How to’ videos to help users work with the Transkribus platform.

We are currently working on our first videos – they will be made available soon!

For the moment, we have created a playlist of videos relating to the READ project.  These videos give an insight into the research behind Transkribus and our plans for the future.  Enjoy!

+ See Transkribus in action at Digital Humanities Austria Conference

The 3rd Digital Humanities Austria Conference is taking place at the Austrian Academy of Sciences in Vienna, between 5 and 7 December 2016.

The READ project will be there to tell an audience of digital humanities specialists how they can use Transkribus to apply Handwritten Text Recognition technology to historical documents.  Dr Günter Mühlberger, coordinator of the READ project, will be giving a demo of Transkribus on 6 December.

Registration for the conference is free and open now.

+ Meet the READ project partners – Florian Kleber

We thought it was about time that we got to know the people working on the READ project a little better!  We are armed with a list of questions that we’ll be asking the computer scientists, archivists, historians and researchers working on READ over the coming months to find out more about their research.  Read on for our first interview…

What’s your name?

Florian Kleber

Where do you work?

Computer Vision Lab, Vienna University of Technology

Tell us a bit about your background…

I am a senior scientist at the Computer Vision Lab.  My research interests relate to Cultural Heritage applications of Document Image Analysis.  I finished my PhD in 2014, in which I worked on Document Image Analysis Preprocessing of Low-Quality and Sparsely Inscribed Documents.  I have recently worked on the multispectral acquisition of ancient documents as part of a project to process and analyse the Sinaitic Glagolitic Sacramentary Fragments, two medieval Slavonic manuscripts which were discovered in a monastery in Egypt in 1975.  In my spare time I like skiing, rowing and watching TV series.

What is your role in the READ project?

Layout Analysis of documents, with a special focus on Form Classification.

What is top of your to-do list at the moment?

I am working on Form Classification and preparing for ScriptNet, the READ programme of competitions in Handwritten Text Recognition and Document Image Analysis.

What do you like best about working on READ?

The challenge of working with a large number of different documents and the interdisciplinarity of the project.

If you could do another job for just one day, what would it be?

Helicopter pilot 🙂

What can you see out of the window of your office? 

[Image: the view from Florian’s office window]

Nice view of Vienna!  Thanks Florian! 

+ A user’s perspective on Transkribus!

What is it like to work with Transkribus as a user?  Melina Jander from the Institute of Computer Science at the University of Göttingen has provided some insights in a new User Report.

Melina has been using Transkribus to create training data for a Handwritten Text Recognition (HTR) model that can provide automatic transcripts of the correspondence of the Brothers Grimm.  This work was undertaken as part of a pilot project called Tracing Authorship in Noise (TrAIN), which is analysing how page noise affects the accuracy of HTR and OCR.  This is part of the wider eTRAP project on the reuse of electronic texts.  We are happy to have this feedback and we look forward to seeing the results of the HTR!

+ Searching Handwritten Manuscripts at Greifswald University Library

One of the oldest university libraries in Germany is working with some of the newest technology!  Greifswald University Library and Archives have been cooperating with the Transkribus team since September 2015 and now have some exciting results to share.

Around 800 pages of documents and transcripts from the University Archives have been uploaded to Transkribus.  These documents are a collection of minutes from meetings of the Konzil, the central administrative body of Greifswald University.  The pages were written by three professional writers in Kurrent script between 1775 and 1840.  The Transkribus team used these documents to generate a Handwritten Text Recognition (HTR) model capable of automatically reading documents in the Konzil collection.

Greifswald University Library has been able to integrate the HTR technology from Transkribus directly into its digital library system (Digitale Bibliothek Mecklenburg-Vorpommern).  This innovation was realised using the Open Source Goobi software provided by Intranda.  Library users are now able to conduct keyword searches in a sample of handwritten material from the Konzil collection.  You can see the full-text search in action in this example query, where the system has searched for the word ‘Greifswald’.  Why not try searching for yourself?

This is a first for the READ project and an important milestone in our mission to disseminate HTR technology.  We are grateful to Greifswald University Library and Archives for showing that it is possible to provide HTR technology directly to users in order to facilitate research.  Over the next few weeks, Greifswald University Library will be importing 100 more volumes from the Konzil collection into Transkribus to allow for more comprehensive searching of the collection.

A reminder that a full-text search function is also now available in the latest version of the Transkribus platform.  Once you have trained an HTR model for your manuscript collection, you will be able to conduct a full-text search of your documents.