span_marker.configuration module

class span_marker.configuration.SpanMarkerConfig(encoder_config=None, model_max_length=None, marker_max_length=128, entity_max_length=8, max_prev_context=None, max_next_context=None, **kwargs)

Bases: PretrainedConfig

Configuration class for SpanMarkerModel instances.

Parameters:
  • encoder_config (Optional[Dict[str, Any]]) – The configuration dictionary for the underlying encoder used by the SpanMarkerModel instance. Defaults to None.

  • model_max_length (Optional[int]) – The total number of tokens that can be processed before truncation. If None, the tokenizer's model_max_length is used, and if that value is not defined either, 512 is used instead. Defaults to None.

  • marker_max_length (int) – The maximum number of span markers of each kind. A value of 128 means that each training and inference sample contains at most 128 start markers and 128 end markers, for a total of 256 markers per sample. Defaults to 128.

  • entity_max_length (int) – The maximum length of an entity span in terms of words. Defaults to 8.

  • max_prev_context (Optional[int]) – The maximum number of previous sentences to include as context. If None, as many previous sentences as fit within model_max_length are included. Defaults to None.

  • max_next_context (Optional[int]) – The maximum number of next sentences to include as context. If None, as many next sentences as fit within model_max_length are included. Defaults to None.
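The fallback order for model_max_length and the marker arithmetic for marker_max_length can be sketched in plain Python. This is a minimal illustration only: resolve_model_max_length and markers_per_sample are hypothetical helpers, not part of span_marker, and tokenizer_max_length stands in for the tokenizer's own model_max_length.

```python
from typing import Optional


def resolve_model_max_length(
    model_max_length: Optional[int], tokenizer_max_length: Optional[int]
) -> int:
    """Hypothetical helper mirroring the documented fallback order."""
    # 1. An explicit configuration value takes precedence.
    if model_max_length is not None:
        return model_max_length
    # 2. Otherwise fall back to the tokenizer's model_max_length, if defined.
    if tokenizer_max_length is not None:
        return tokenizer_max_length
    # 3. If neither is defined, use 512.
    return 512


def markers_per_sample(marker_max_length: int = 128) -> int:
    """Hypothetical helper: up to marker_max_length start markers plus
    the same number of end markers per sample."""
    return 2 * marker_max_length


print(resolve_model_max_length(256, 512))    # 256
print(resolve_model_max_length(None, 384))   # 384
print(resolve_model_max_length(None, None))  # 512
print(markers_per_sample(128))               # 256
```

This sketch models "not defined" as None. In transformers, a tokenizer without a configured limit typically reports a very large sentinel integer instead, so a real implementation would need to account for that case.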

Example:

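from span_marker import SpanMarkerModel

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # example label set; must include "O"

# These configuration settings are provided via kwargs to `SpanMarkerModel.from_pretrained`:
model = SpanMarkerModel.from_pretrained(
    "bert-base-cased",
    labels=labels,
    model_max_length=256,
    marker_max_length=128,
    entity_max_length=8,
)

Raises: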

ValueError – If the labels provided to from_pretrained() do not contain the required “O” label.
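A minimal sketch of the kind of check that triggers this error, assuming the labels are available as a plain list of strings. check_outside_label is a hypothetical name, and the error message is illustrative rather than the library's exact wording.

```python
from typing import Iterable


def check_outside_label(labels: Iterable[str]) -> None:
    """Hypothetical check: raise ValueError unless "O" is among the labels."""
    if "O" not in set(labels):
        raise ValueError('The labels must contain the required "O" label.')


check_outside_label(["O", "B-PER", "I-PER"])  # passes silently

try:
    check_outside_label(["B-PER", "I-PER"])  # no "O" label
except ValueError as err:
    print(err)
```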


model_type: str = 'span-marker'
is_composition: bool = True
property outside_id: None
are_labels_schemed()

Returns True if all labels are strings that match one of the following two rules:

  • label == “O”

  • label[0] in “BIESLU” and label[1] == “-”, e.g. “I-LOC”

The first character must be one of “BIESLU” because of these definitions:

  • “B” for “begin”

  • “I” for “inside”

  • “E” for “end”

  • “L” for “last”

  • “S” for “singular”

  • “U” for “unit”

Uses:

self.id2label (Dict[int, str]) – The configuration's dictionary of label ids to label strings. The method itself takes no arguments.

Returns:

True if the labels appear to follow a tagging scheme such as IOB2 or BIOES.

Return type:

bool
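The rule above can be expressed as a standalone function, shown here as a sketch rather than the library's own code. labels_are_schemed is a hypothetical name that takes the id2label mapping explicitly, and the len(label) > 1 guard is an added assumption so that single-character labels such as “I” return False instead of raising an IndexError.

```python
from typing import Dict

# Allowed first characters: begin, inside, end, single, last, unit.
SCHEME_PREFIXES = "BIESLU"


def labels_are_schemed(id2label: Dict[int, str]) -> bool:
    """Hypothetical standalone version of the documented rule."""
    return all(
        label == "O"
        or (len(label) > 1 and label[0] in SCHEME_PREFIXES and label[1] == "-")
        for label in id2label.values()
    )


print(labels_are_schemed({0: "O", 1: "B-LOC", 2: "I-LOC"}))  # True
print(labels_are_schemed({0: "O", 1: "LOC", 2: "PER"}))      # False
```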

get_scheme_tags()
Return type:

Set[str]

group_label_ids_by_tag()
Return type:

Dict[str, Set]

get(options, default=None)
Return type:

Any