Loading & Inferencing with SpanMarker¶

SpanMarker is an accessible yet powerful Python module for training Named Entity Recognition models.

In this short notebook, we’ll look at how to load a SpanMarker model from the Hugging Face Hub for inference. For a broader, more general tutorial on how to use SpanMarker, please have a look at the Getting Started notebook.

Setup¶

First of all, the span_marker Python module needs to be installed.

[ ]:
%pip install span_marker

Loading the Model¶

We’re going to load the tomaarsen/span-marker-bert-base-fewnerd-fine-super model from the Hub, which was previously trained to 0.7020 Test F1 on the fine-grained, supervised FewNERD dataset. We use SpanMarkerModel.from_pretrained for this. Note that we place the model on the GPU with .cuda(). If you’re running this on Google Colab, be sure to set the hardware accelerator to “GPU” under Runtime > Change runtime type.

[ ]:
from span_marker import SpanMarkerModel

model_name = "tomaarsen/span-marker-bert-base-fewnerd-fine-super"
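# Download the model from the Hub and place it on the GPU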
model = SpanMarkerModel.from_pretrained(model_name).cuda()
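
The .cuda() call assumes that a CUDA-capable GPU is available. If you might be running without one, a minimal device-agnostic variant looks like this (a sketch; torch is a dependency of span_marker and is therefore already installed):

[ ]:
import torch

from span_marker import SpanMarkerModel

model_name = "tomaarsen/span-marker-bert-base-fewnerd-fine-super"
model = SpanMarkerModel.from_pretrained(model_name)
# Only move the model to the GPU when CUDA is actually available
if torch.cuda.is_available():
    model = model.cuda()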

Let’s try the model out with some predictions. For this, we can use the model.predict method, which accepts any of the following (see the sketch after this list):

  • A sentence as a string.

  • A tokenized sentence as a list of strings.

  • A list of sentences as a list of strings.

  • A list of tokenized sentences as a list of lists of strings.
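
For instance, all of the following calls are valid (a minimal sketch; the sentences are only illustrations):

[ ]:
# A single sentence as a string
model.predict("Tom lives in the Netherlands.")
# A tokenized sentence as a list of strings
model.predict(["Tom", "lives", "in", "the", "Netherlands", "."])
# A list of sentences as a list of strings
model.predict(["Tom lives in the Netherlands.", "Nintendo is based in Kyoto."])
# A list of tokenized sentences as a list of lists of strings
model.predict([["Tom", "lives", "in", "the", "Netherlands", "."]])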

For each sentence, the method returns a list of dictionaries, one per predicted entity, with the following keys (the sketch after this list prints such a dictionary in full):

  • "label": The string label for the found entity.

  • "score": The probability score indicating the model its confidence.

  • "span": The entity span as a string.

  • "word_start_index" and "word_end_index": Integers useful for indexing the entity from a tokenized sentence.

  • "char_start_index" and "char_end_index": Integers useful for indexing the entity from a string sentence.

[3]:
sentences = [
    "The Ninth suffered a serious defeat at the Battle of Camulodunum under Quintus Petillius Cerialis in the rebellion of Boudica (61), when most of the foot-soldiers were killed in a disastrous attempt to relieve the besieged city of Camulodunum (Colchester).",
    "He was born in Wellingborough, Northamptonshire, where he attended Victoria Junior School, Westfield Boys School and Sir Christopher Hatton School.",
    "Nintendo continued to sell the revised Wii model and the Wii Mini alongside the Wii U during the Wii U's first release year.",
    "Dorsa has a Bachelor of Music in Composition from California State University, Northridge in 2001, Master of Music in Harpsichord Performance at Cal State Northridge in 2004, and a Doctor of Musical Arts at the University of Michigan, Ann Arbor in 2008."
]

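# Predict the entities for all sentences at once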
entities_per_sentence = model.predict(sentences)

for entities in entities_per_sentence:
    for entity in entities:
        print(entity["span"], "=>", entity["label"])
    print()
Battle of Camulodunum => event-attack/battle/war/militaryconflict
Quintus Petillius Cerialis => person-soldier
Camulodunum => location-GPE
Colchester => location-GPE

Wellingborough => location-GPE
Northamptonshire => location-GPE
Victoria Junior School => organization-education
Westfield Boys School => organization-education
Sir Christopher Hatton School => organization-education

Nintendo => organization-company
Wii => product-other
Wii Mini => product-other
Wii U => product-other
Wii U' => product-other

Dorsa => person-other
Bachelor of Music in Composition => other-educationaldegree
California State University => organization-education
Northridge => location-GPE
Master of Music in Harpsichord Performance => other-educationaldegree
Cal State Northridge => organization-education
Doctor of Musical Arts => other-educationaldegree
University of Michigan => organization-education
Ann Arbor => location-GPE
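
Because every entity carries character offsets, you can also slice the spans straight out of the original strings. A minimal sketch using the keys described above:

[ ]:
# Slice each entity out of its source sentence via the character offsets
for sentence, entities in zip(sentences, entities_per_sentence):
    for entity in entities:
        start, end = entity["char_start_index"], entity["char_end_index"]
        print(f"{sentence[start:end]!r} => {entity['label']} ({entity['score']:.2f})")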

Feel free to compare this to the Getting Started notebook, which trains a model using bert-base-cased on the simpler, coarse-grained FewNERD dataset.