Configuring SpanMarker¶

SpanMarker is an accessible yet powerful Python module for training Named Entity Recognition models.

In this short notebook, we’ll have a look at how to configure a SpanMarker model for training. For a larger and more general tutorial on how to use SpanMarker, please have a look at the Getting Started notebook.

Configuring SpanMarkerModel.from_pretrained¶

SpanMarkerModel.from_pretrained is the go-to method for initializing a new or pretrained SpanMarker model. We will consider these two cases separately here.

Initializing a new model¶

A new model is initialized using a pretrained encoder, e.g. bert-base-cased. See Initializing & Training for details on valid encoders. Additionally, a new SpanMarker model must be initialized using a list of string labels. These labels can have the IOB, IOB2, BIOES or BILOU labeling scheme, or no scheme at all. For example:

from span_marker import SpanMarkerModel

model_name = "bert-base-cased"
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]
# or depending on your dataset, you may have:
# labels = ["O", "PER", "ORG", "LOC", "MISC"]
model = SpanMarkerModel.from_pretrained(model_name, labels=labels)

Additionally, any keyword arguments passed to SpanMarkerModel.from_pretrained will be passed along to SpanMarkerConfig as well as to the underlying 🤗 Transformers configuration and model initializers.

In particular, SpanMarkerConfig is noteworthy. Its parameters (model_max_length, marker_max_length and entity_max_length) have a large impact on training & inference speeds, as well as on final performance.
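For example, a minimal sketch of passing these options through from_pretrained (the specific values are illustrative starting points, not recommendations for every dataset):

from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained(
    "bert-base-cased",
    labels=["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"],
    model_max_length=256,   # truncate tokenized inputs to at most 256 tokens
    marker_max_length=128,  # at most 128 marker pairs (span candidates) per sample
    entity_max_length=8,    # only consider spans of up to 8 words as entity candidates
)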

Initializing a pretrained model¶

A pretrained model is initialized in the same way as a new model, but most of the valuable parameters will already have been set when the model was originally initialized, trained and saved. As a result, we no longer need to specify the labels. Beyond that, we should be careful not to provide configuration parameters that differ from those used to train the model, as doing so may degrade performance.

Loading a pretrained SpanMarker model usually looks like this:

from span_marker import SpanMarkerModel

model_name = "tomaarsen/span-marker-bert-base-fewnerd-fine-super"
model = SpanMarkerModel.from_pretrained(model_name)
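Once loaded, a pretrained model can be used for inference right away via SpanMarkerModel.predict. A quick sketch, continuing from the snippet above (the example sentence is arbitrary):

entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
# Each prediction is a dictionary containing the span text, its label,
# a confidence score and the character offsets of the span in the sentence.
print(entities)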

Configuring 🤗 TrainingArguments¶

SpanMarker relies on the Hugging Face TrainingArguments. As a result, online documentation and guides on optimizing these training arguments for 🤗 Transformers tend to apply to SpanMarker as well.

The TrainingArguments class is quite large and can be a bit overwhelming, so here are my recommendations for parameters to look at, with a combined example after the list:

  • learning_rate: One of the most important parameters for training; values between 1e-5 and 5e-5 have performed well for me.

  • per_device_train_batch_size and per_device_eval_batch_size: Increasing them can boost training speed at the cost of memory. If you experience Out Of Memory exceptions, these are the first parameters to reduce.

  • gradient_accumulation_steps: Allows you to lower the per-device batch size (saving memory) while keeping the effective batch size, and thus performance, unchanged.

  • bf16 or fp16: Mixed precision training - allows for notable speedups if your GPU supports it.

  • dataloader_num_workers: The default of 0 means that data is loaded in the main process, which is generally notably slower (e.g. 30-40%) than using separate workers. Note that higher is not strictly better; 2 or 4 is usually a good middle ground.

  • evaluation_strategy + eval_steps, logging_strategy + logging_steps, save_strategy + save_steps: Crucial for tracking model performance during training exactly the way you want.

  • run_name: Used in third-party logging tools like (the recommended) wandb. This parameter makes it easier to track which run is which.
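Putting these recommendations together, a configuration could look like the following sketch. The values are reasonable starting points rather than universal optima, and the output directory and run name are placeholders:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="models/my-span-marker-model",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,  # effective train batch size: 32 * 2 = 64
    bf16=True,                      # or fp16=True, depending on your GPU
    dataloader_num_workers=2,
    evaluation_strategy="steps",
    eval_steps=200,
    logging_strategy="steps",
    logging_steps=50,
    save_strategy="steps",
    save_steps=200,
    run_name="my-span-marker-model",  # placeholder, e.g. shown in wandb
)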