span_marker.model_card module¶

class span_marker.model_card.ModelCardCallback(trainer)[source]¶

Bases: TrainerCallback

Parameters:

trainer (Trainer) –

on_train_begin(args, state, control, model, **kwargs)[source]¶
Parameters:
  • args (TrainingArguments) –

  • state (TrainerState) –

  • control (TrainerControl) –

  • model (SpanMarkerModel) –

on_evaluate(args, state, control, model, metrics, **kwargs)[source]¶
Parameters:
  • args (TrainingArguments) –

  • state (TrainerState) –

  • control (TrainerControl) –

  • model (SpanMarkerModel) –

  • metrics (Dict[str, float]) –
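
This callback collects details from the Trainer at the start of training and after each evaluation so they can be included in the generated model card. A minimal sketch of registering it by hand, assuming a Hub dataset with tokens/ner_tags columns such as conll2003; in typical usage the SpanMarker Trainer is expected to attach this callback itself:

>>> from datasets import load_dataset
>>> from span_marker import SpanMarkerModel, Trainer
>>> from span_marker.model_card import ModelCardCallback
>>> dataset = load_dataset("conll2003")
>>> labels = dataset["train"].features["ner_tags"].feature.names
>>> model = SpanMarkerModel.from_pretrained("bert-base-uncased", labels=labels)
>>> trainer = Trainer(
...     model=model,
...     train_dataset=dataset["train"],
...     eval_dataset=dataset["validation"],
... )
>>> # The callback records the training arguments and evaluation metrics from
>>> # the trainer so they can be rendered into the model card.
>>> trainer.add_callback(ModelCardCallback(trainer))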

class span_marker.model_card.SpanMarkerModelCardData(language=None, license=None, tags=<factory>, model_name=None, model_id=None, encoder_name=None, encoder_id=None, dataset_name=None, dataset_id=None, dataset_revision=None, task_name='Named Entity Recognition')[source]¶

Bases: CardData

A dataclass storing data used in the model card.

Parameters:
  • language (Optional[Union[str, List[str]]]) – The model language, either a string or a list, e.g. “en” or [“en”, “de”, “nl”]

  • license (str | None) – The license of the model, e.g. “apache-2.0”, “mit” or “cc-by-nc-sa-4.0”

  • model_name (str | None) – The pretty name of the model, e.g. “SpanMarker with mBERT-base on CoNLL03”. If not defined, uses encoder_name/encoder_id and dataset_name/dataset_id to generate a model name.

  • model_id (str | None) – The model ID when pushing the model to the Hub, e.g. “tomaarsen/span-marker-mbert-base-multinerd”.

  • encoder_name (str | None) – The pretty name of the encoder, e.g. “mBERT-base”.

  • encoder_id (str | None) – The model ID of the encoder, e.g. “bert-base-multilingual-cased”.

  • dataset_name (str | None) – The pretty name of the dataset, e.g. “CoNLL03”.

  • dataset_id (str | None) – The dataset ID of the dataset, e.g. “tner/bionlp2004”.

  • dataset_revision (str | None) – The dataset revision/commit that was used for training/evaluation.

  • tags (List[str] | None) –

  • task_name (str) –

Note

Install nltk to detokenize the examples used in the model card, i.e. to reattach punctuation and brackets. Additionally, codecarbon can be installed to automatically track carbon emissions during training.

Example:

>>> model = SpanMarkerModel.from_pretrained(
...     "bert-base-uncased",
...     labels=["O", "B-DNA", "I-DNA", "B-protein", ...],
...     # SpanMarker hyperparameters:
...     model_max_length=256,
...     marker_max_length=128,
...     entity_max_length=8,
...     # Model card variables
...     model_card_data=SpanMarkerModelCardData(
...         model_id="tomaarsen/span-marker-bbu-bionlp",
...         encoder_id="bert-base-uncased",
...         dataset_name="BioNLP2004",
...         dataset_id="tner/bionlp2004",
...         license="apache-2.0",
...         language="en",
...     ),
... )
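
Once the model has been trained, saving or pushing it is expected to render these fields into the model's README.md. A minimal sketch, continuing from the example above:

>>> model.save_pretrained("models/span-marker-bbu-bionlp")  # local path of your choosing
>>> # or push straight to the Hub under the configured model_id:
>>> # model.push_to_hub("tomaarsen/span-marker-bbu-bionlp")
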
language: str | List[str] | None = None¶
license: str | None = None¶
tags: List[str] | None¶
model_name: str | None = None¶
model_id: str | None = None¶
encoder_name: str | None = None¶
encoder_id: str | None = None¶
dataset_name: str | None = None¶
dataset_id: str | None = None¶
dataset_revision: str | None = None¶
task_name: str = 'Named Entity Recognition'¶
infer_dataset_id(dataset)[source]¶
Parameters:

dataset (Dataset) –

Return type:

None
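
A minimal sketch, assuming a 🤗 Datasets Dataset loaded from the Hub; if dataset_id was not set explicitly, the method tries to infer it from the dataset's metadata:

>>> from datasets import load_dataset
>>> from span_marker.model_card import SpanMarkerModelCardData
>>> card_data = SpanMarkerModelCardData(language="en", license="apache-2.0")
>>> train_dataset = load_dataset("tner/bionlp2004", split="train")
>>> card_data.infer_dataset_id(train_dataset)  # fills card_data.dataset_id when it can be inferred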

to_dict()[source]¶
Return type:

Dict[str, Any]

to_yaml(line_break=None)[source]¶
Return type:

str
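
A minimal sketch of the serialization helpers, assuming stand-alone card data; in practice the Trainer populates additional fields before the card is rendered, so treat this purely as an illustration of the return types:

>>> from span_marker.model_card import SpanMarkerModelCardData
>>> card_data = SpanMarkerModelCardData(
...     language="en",
...     license="apache-2.0",
...     encoder_id="bert-base-uncased",
...     dataset_id="tner/bionlp2004",
... )
>>> metadata = card_data.to_dict()     # dictionary form of the card fields
>>> yaml_header = card_data.to_yaml()  # YAML string, e.g. for a README metadata block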