READ revolutionizes access to handwritten documents

From the Middle Ages to today, from old Greek to modern English, from running text to tables or forms

About

READ's mission is to revolutionize access to archival documents with the support of cutting-edge technology such as Handwritten Text Recognition (HTR) and Keyword Spotting (KWS).

Network

READ addresses archives and libraries, humanities scholars, family historians, volunteers and computer scientists.

Research

Research in READ comprises exciting fields such as Artificial Intelligence, Pattern Recognition, Machine Learning and Natural Language Processing.

Services

READ technology is available via the service platform Transkribus. Upload documents, train a Handwritten Text Recognition (HTR) model, process text and follow the progress of the project.

Recent Posts

Plant power! Results from the Royal College of Physicians’ Herbarium

The Royal College of Physicians has been devoted to advancing medicine for the past 500 years and has amassed outstanding historical collections of rare books, medical instruments and medicinal plant specimens.

The RCP has recently digitised the 6,000 sheets from the (mostly) nineteenth-century Herbarium of the Pharmaceutical Society of Great Britain. This collection comprises thousands of preserved plant specimens and their associated labels.

Dr Michael de Swiet, Dr Henry Oakley and Professor Anthony Dayan of the RCP then decided to work with the Transkribus team to try to recognise the text from the Herbarium collection.

The documents present various challenges for Handwritten Text Recognition (HTR) technology. They contain a mix of printed and handwritten text (in ink and pencil), various languages, abbreviations and specialist vocabulary. They are also written in several (similar) hands.

A first HTR model was trained on 29,083 transcribed words from the collection, using the pre-existing ‘English Writing M1’ model as part of the training process. The ‘English Writing M1’ model is trained to recognise the writing of the English philosopher Jeremy Bentham (1748–1832) and his secretaries. It is freely available to all Transkribus users for their experiments.

In the best cases, the resulting model can automatically transcribe pages from the collection with a Character Error Rate (CER) of around 10%.
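For readers unfamiliar with the metric, CER is commonly defined as the edit distance between the automated transcription and a correct reference transcription, divided by the length of the reference. The sketch below illustrates that common definition only. The function names and the sample label text are made up for illustration, and how Transkribus itself computes CER (for example, how it treats whitespace or punctuation) is not described here.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `reference` into `hypothesis`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete ref_char
                curr[j - 1] + 1,                  # insert hyp_char
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical label text: one wrong character out of 18.
reference = "Digitalis purpurea"
hypothesis = "Digitalis purpurca"
print(f"CER: {character_error_rate(reference, hypothesis):.1%}")
```

Under this definition, a CER of around 10% means roughly one character in ten needs correcting, measured against the length of the correct transcription.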

Image from the Herbarium with an automated transcription of the label. Image credit: Royal College of Physicians.

The team at the RCP are pleased with these results and would be happy for them to be shared and improved upon by other people working with Herbarium material. If you would like to find out more about their work or gain access to their HTR model, please contact the Transkribus team (email@transkribus.eu).