+ A new Transkribus User Report

Chiara Petrolini, a post-doctoral fellow at the German Historical Institute in Rome (DHI), recently spent a few days with the Transkribus team at the University of Innsbruck.

She has kindly written a User Report about her experience of working with Transkribus so far.

Dr Petrolini is an early modern scholar, working on a project about the court librarian Sebastian Tengnagel and the Imperial Library in Vienna.  She has begun transcribing Tengnagel’s seventeenth-century correspondence, with a view to training the Handwritten Text Recognition engine to recognise this handwriting.  She is also finding Transkribus a useful transcription tool for documents written in more than one language, as scholars with different skills can work on the same document from different locations.

This new project will help us to spread the word about Transkribus in Italy – we will be coming there for a workshop soon!

+ Looking back on 2016…

January is always a time for reflection, and at the READ project we have a lot to reflect on!  We’ve been busy over the past 12 months in our mission to use new technologies to make historical documents more accessible.  Here is a quick recap of our major milestones and our future plans.

Research

Our research teams have been exploring the fields of Handwritten Text Recognition, Layout Analysis, Document Understanding, Writer Identification, Language Models and more.  Some of these technologies are already available in our Transkribus tool and more will be integrated over the coming months.  Towards the end of 2016 we also started to prepare for the launch of our ScriptNet platform, a new collection of research competitions where computer scientists will experiment with huge amounts of data to improve their technologies.

Discussion topics at one of the READ project meetings [Image by Louise Seaward]

Services

The Transkribus tool has been maintained and improved across the year.  Over 2000 new users registered for a Transkribus account in 2016 and they are now able to access new features such as full-text search and a table editing tool.  We have also developed How-to Guides to help people navigate the platform.

We are working with partners inside and outside of the project to develop bespoke Handwritten Text Recognition models capable of transcribing and searching specific collections of documents.  Our most successful models so far relate to eighteenth- and nineteenth-century German and English handwriting.  But we are working with many more languages, styles and time-frames – watch this space!

Demonstrating the Table Editing Tool in Transkribus [Image by Louise Seaward]

Dissemination

Dissemination is a key part of READ – we want to raise awareness about the technology that we are developing and ensure that it is used by the people who need it.

We have helped to organise four conferences in Germany, Austria and the United Kingdom for collection holders, researchers and computer scientists.  We have also been travelling a lot – delivering 30 Transkribus workshops (at last count!) in different European cities.  In these workshops, we teach people how to use Transkribus and explain the potential of Handwritten Text Recognition.  If you are interested in organising a workshop at your institution, just send us an email!

READ project members taking a break from their computers at a meeting in Passau, Germany [Image by Louise Seaward]

In terms of our research outputs, we are working to ensure that our project publications are Open Access, our research tools are Open Source via GitHub and our published research data is made available on Zenodo.

We have had fun spreading the word about Transkribus on Twitter and will be branching out to YouTube and Facebook this year.

Collaboration

Our network grew steadily across 2016.  Over 30 institutions have now signed a Memorandum of Understanding with READ, which brings them into the project network.  To give just a couple of examples, we are working with the Belgrade University Library on training computers to understand Cyrillic text and receiving advice from the Institute for Documentology and Editing on the role of Transkribus in digital scholarly editing.

Cyrillic document from the University Library of Belgrade.

What’s next?

All this work will continue into 2017 but there will also be some exciting new developments.

The project technologies are beginning to be integrated into new web tools which will be made available via the Transkribus website.  An e-learning module, a platform for crowdsourced transcription and a mobile app for scanning documents are all in the works.  We are also developing our business plan to ensure that we can sustain the services provided by Transkribus far into the future.

Want to find out more?

You can find more detailed summaries of the work that READ has completed in these different areas by taking a look at the latest reports (deliverables) that we have submitted to the European Commission.

+ Watch presentations from our ‘Digital Toolbox’ Conference

On 10 October 2016, we asked researchers, archivists and curators to discuss ‘What should be in your Digital Toolbox?’ at our conference in London.  This event was organised by the Linnean Society (part of the READ MOU network) and the Bentham Project at University College London (one of the READ partners).  Videos and slides of the speakers’ presentations are now available.

Networking in the Linnean Society Library [Image by Louise Seaward]

There was a great exchange of ideas on the day, both in person and on Twitter, about the best means of extracting data from complex handwritten and printed records.  You can now get a flavour of what went on through the videos and slides below.

Professor Melissa Terras (University College London), If you teach a computer to READ: Transcribe Bentham, Transkribus, and Handwriting Technology Recognition 

Dr Günter Mühlberger (University of Innsbruck), Transkribus as a Toolkit for Text Recognition, Transcription and Information Extraction

Dr Roger Labahn (University of Rostock), Key concepts of Handwritten Text Recognition

Dr Mia Ridge (The British Library), The Art of Work in the Age of Mechanical Reproduction

Professor James Loxley (University of Edinburgh), Lines of Enquiry: Reordering Edinburgh’s Literary History

Dr Elspeth Haston (Royal Botanic Garden Edinburgh), Automating Label Data Capture from Natural History Specimens

Alison Harding and Lisa Cardy (Natural History Museum/Biodiversity Heritage Library), Unlocking Biodiversity Data @ The Biodiversity Heritage Library 

Dr Victoria Van Hyning (University of Oxford/Zooniverse), Metadata Extraction and Full Text Transcription on the Zooniverse Platform

+ Meet the READ project partners – Max Bryan

What’s your name?

Max Bryan.

Where do you work?

The Department for Natural Language Processing at Leipzig University.

Tell us a bit about your background…

My main research interests lie in everything that has to do with neural networks.  I first became interested in this subject at Hamburg University, where I wrote my Master’s thesis on different learning strategies.  In my free time, I like to paint or cook with my Chinese friends.

What is your role in the READ project?

Our group is responsible for creating various language resource tools to be integrated into Handwritten Text Recognition models.  We are also sharing our knowledge of language models with the READ project partners.

What is top of your to-do list at the moment?

Using dictionaries to create various formats for the experiments and training a language model that learns to recognize abbreviations.

What do you like best about working on READ?

Working with people who do very similar things but come from different directions and thus have different views.

If you could do another job for just one day, what would it be?

Pilot or train conductor.

What can you see out of the window of your office? 

[Image: view from the office window, Leipzig]

Thanks Max! 

+ What’s that written in the margin? Handwritten Text Recognition, Marginalia and John Stuart Mill

Some people are horrified by the thought of writing notes on the pages of books.  But for the English philosopher John Stuart Mill (1806–1873), marginal notes were a useful way to record his thoughts and observations as he read.

Mill’s collection of books is now in the possession of Somerville College at the University of Oxford.  The John Stuart Mill Collection holds more than 1500 books once owned by Mill.  Many of these texts contain annotations and markings made by Mill.

The John Stuart Mill Collection, Somerville College, University of Oxford [Image by Louise Seaward]

Somerville College, in collaboration with the University of Alabama, is currently undertaking a project to digitise and categorise these marginalia.  These partners have now begun to work with Transkribus, with a view to applying Handwritten Text Recognition to Mill’s scribblings.

READ partners from Xerox Research Centre Europe and the Computer Vision Lab at Vienna Technical University are working with hundreds of images from the Mill collection.  They aim to use Document Understanding to distinguish between the printed and handwritten text on the pages of these books and also use Handwritten Text Recognition to transcribe the comments which Mill wrote in the margins.   Transcripts of the Mill marginalia would be an invaluable resource to scholars and would complement the forthcoming Mill Marginalia database.

This is an exciting experiment for the READ project, as the methods and results of this endeavour could be applicable to other collections where marginal annotations appear on printed texts.  Many other writers, including Oscar Wilde and Mark Twain, were habitual annotators and technology from the READ project could help us to understand how they read, processed and understood books and articles.

+ Another partner joins the READ project network

The READ project network continues to expand, as we welcome a new Memorandum of Understanding partner from Finland!  The Society of Swedish Literature in Finland  exists to preserve and promote knowledge about Swedish language and culture in Finland.  Swedish is one of Finland’s two national languages and the Society was established in 1885 to help to ensure that Swedish culture was protected in the country.

The Society has a large collection of manuscripts, letters and diaries written by Finnish authors and artists.  It is aiming to use Handwritten Text Recognition technology to transcribe letters written by two notable Finns: the painter Albert Edelfelt and the author Zachris Topelius.

This is a great opportunity for READ to work with an institution which plays an essential role in studying and safeguarding the cultural heritage of the Swedish-speaking part of Finland.

Institutions that would also like to become part of the READ project network might like to think about signing a Memorandum of Understanding with us.  Consult our list of Memorandum of Understanding partners to see who we’re working with or send us an email (email@transkribus.eu) to find out more.

+ Meet the READ project partners – Tobias Hodel

What’s your name?

Tobias Hodel.

Where do you work?

State Archives of Zurich.

Tell us a bit about your background…

I have a PhD in History from the University of Zurich.  I’m interested in all things digital (history, humanities, archiving).  I like to travel to new places, especially when visiting other partners in the READ project.

What is your role in the READ project?

Connecting the world of (digital) archives and scholars with the possibilities offered by READ.

What is top of your to-do list at the moment?

Evaluating models trained for Handwritten Text Recognition, scaling tests using material from the archives and contacting interested parties to tell them about READ and the Transkribus tool.

What do you like best about working on READ?

Getting to work on a regular basis with people with different backgrounds and diverse research interests.

If you could do another job for just one day, what would it be?

President of the United States (if it’s similar to House of Cards!)

What can you see out of the window of your office? 

[Image: view from the office window]

Thanks Tobias! 

+ Presenting READ at the next International Medieval Congress!

We are excited to report that the READ project will be presenting a panel at the next International Medieval Congress at the University of Leeds in July 2017.

READ partners from Zurich State Archives, National Archives Finland and Passau Diocesan Archives will be demonstrating how they have been working with Handwritten Text Recognition technology to transcribe and search their document collections.

This is a great opportunity to showcase the possibilities of Handwritten Text Recognition to medievalists – over 2000 of them gather in Leeds every year for this conference!

+ Working with a small crowd – Transcribing the ‘Bozner Ratsprotokolle’

The READ project is working to make handwritten historical collections more accessible through the development and application of Handwritten Text Recognition (HTR) technology.  This technology is certainly of interest to archivists and scholars but we hope that members of the public will find it useful too!  The crowdsourcing initiative Transcribe Bentham is already part of the READ project and we will be creating a new open source crowdsourcing platform which can be used and adapted by any institution which would like to get volunteers to work on a manuscript collection.  We have also begun working with a small focus group of volunteers to introduce them to the Transkribus transcription platform and the possibilities of HTR technology.  Barbara Denicolo is working with the Civic Archives of Bozen-Bolzano (one of the READ MOU partners) to manage this project and she gives a summary below of her progress so far:

‘‘Transcribing the Bozner Ratsprotokolle’ is a collaboration between READ and the Civic Archives of Bozen-Bolzano, Italy, which was set up at the beginning of 2016.  Our aim is to recruit and train volunteers to work with Transkribus to transcribe the ‘Bozner Ratsprotokolle’, the records of the town’s municipal council, which were written between the fifteenth and nineteenth centuries.  These transcripts will help to train an HTR engine to read the ‘Bozner Ratsprotokolle’ collection.  Once a computer is capable of processing these documents, users will be able to view automatically-generated transcripts and search for particular keywords that they might be interested in.  The archive could also use these transcripts to create an enriched digital edition of the collection.

hs28a_16_object_21183

Page from the Ratsprotokoll (1600) [Image from Civic Archives of Bozen-Bolzano]

Before I asked any volunteers to work with Transkribus, I needed to learn how to use it myself!  I am familiar with the process of transcribing historical documents.  I have studied medieval history for many years, worked with sources in various archives and am about to start my PhD.  I managed to transcribe 60 pages from the ‘Bozner Ratsprotokolle’ collection, which is a strong basis for training the HTR.  The next step was to recruit some volunteers who could help us to produce even more training data.

Between May and September 2016 we sent out a call for volunteers using the archives’ website, flyers, emails and word of mouth.  An advert placed in a local newspaper seems to have attracted quite a few participants, whilst Facebook posts helped me to get in contact with several students.

We now have a group of around 30 interested people, about half of whom have started to work with Transkribus.  My experience so far suggests that it is difficult to find an ideal volunteer – older people generally have time to participate and are skilled in reading old handwriting but need more support to understand and work with Transkribus on their computers.  For students, the opposite is true!

This project offers the opportunity to connect different generations together and use historical documents to contribute to innovative research.  This focus group is working to make the ‘Bozner Ratsprotokolle’ more accessible and providing feedback on Transkribus that will help the READ project team to refine the platform in the future.  I look forward to continuing my work with the volunteers and will report back on their next milestones!’

+ Meet the READ project partners – Joan Andreu Sánchez

What’s your name?

Joan Andreu Sánchez

Where do you work?

Universitat Politècnica de València

Tell us a bit about your background…

I earned my Diploma and PhD in Computer Science from the Universitat Politècnica de València, Spain, in 1991 and 1999, respectively.  I am currently an Associate Professor in the Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València and have been an active member of the Pattern Recognition and Human Language Technology research center since 1990.  My current research interests include the areas of pattern recognition, machine learning, and their applications to language, speech, handwriting recognition and image processing.  I have led several Spanish and European research projects and have co-authored more than 80 articles published in international conferences and journals.

What is your role in the READ project?

I lead the Universitat Politècnica de València’s contribution to READ.  We are focusing on Handwritten Text Recognition and Keyword Spotting.

What is top of your to-do list at the moment?

To survive another day.

What do you like best about working on READ?

Great team, exciting problems.

If you could do another job for just one day, what would it be?

I would like to be rich for a day, or just richer!

What can you see out of the window of your office? 

[Image: view from the office window]

Thanks Joan Andreu!