Label Babel

The Notes from Nature team has been thinking a lot about bold next steps to make transcription faster and better. To that end, we have launched something of an outlier expedition. This new expedition, which we have called “Label Babel”, asks for help outlining the main label on herbarium sheets, which is usually (but not always!) in the bottom right corner. We are also asking you to tell us whether the label is “all typewritten”, “all handwritten” or “both handwritten and typewritten”.

So you might be thinking, “Why is Notes from Nature asking me to do this?” The short answer is that we think we can use machine learning to detect where a label is on a sheet and what type of content (handwritten, typewritten, or both) it contains. Your work helps us develop a training dataset for this machine learning effort. If we can indeed build such a tool, it would give us a quick way to sort herbarium sheets and apply the right Optical Character Recognition (OCR) or handwriting detection tools depending on the label.
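For readers who like to see things in code, the sorting step described above could look roughly like the sketch below. It is an illustration only: the `LabelPrediction` record, the `choose_tools` function and the tool names `"ocr"` and `"handwriting"` are all hypothetical, and the record stands in for the output of a model that is still to be built.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The three label types, matching the choices offered in the expedition.
TYPEWRITTEN = "all typewritten"
HANDWRITTEN = "all handwritten"
BOTH = "both handwritten and typewritten"


@dataclass
class LabelPrediction:
    """Hypothetical model output for one herbarium sheet."""
    sheet_id: str
    # Outline of the main label as (left, top, right, bottom) in pixels.
    box: Tuple[int, int, int, int]
    label_type: str


def choose_tools(prediction: LabelPrediction) -> List[str]:
    """Decide which text-reading tools to run on the label region."""
    if prediction.label_type == TYPEWRITTEN:
        return ["ocr"]
    if prediction.label_type == HANDWRITTEN:
        return ["handwriting"]
    if prediction.label_type == BOTH:
        # Mixed labels would need both tools.
        return ["ocr", "handwriting"]
    raise ValueError(f"Unknown label type: {prediction.label_type!r}")


# Example: a made-up sheet whose label sits in the bottom right corner.
example = LabelPrediction("sheet-001", (1800, 2600, 2400, 3000), HANDWRITTEN)
print(example.sheet_id, choose_tools(example))
```

The point of the sketch is simply that once a label's position and type are known, choosing the right tool becomes a small lookup rather than a manual sorting job.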

We hope this is a fun diversion from the usual task and that the work you do here helps us build a better Notes from Nature. We will let you know how these initial efforts at building the machine learning tools turn out as soon as we can.

— The Notes from Nature Team
