span_marker.configuration module

class span_marker.configuration.SpanMarkerConfig(encoder_config=None, model_max_length=None, marker_max_length=128, entity_max_length=8, max_prev_context=None, max_next_context=None, **kwargs)

Bases: PretrainedConfig

Configuration class for SpanMarkerModel instances.

Parameters:

encoder_config (Optional[Dict[str, Any]]) – The configuration dictionary for the underlying encoder used by the SpanMarkerModel instance. Defaults to None.
model_max_length (Optional[int]) – The total number of tokens that can be processed before truncation. If None, the tokenizer's model_max_length is used; if that value is not defined either, 512 is used. Defaults to None.
marker_max_length (int) – The maximum number of span markers of each kind per sample. A value of 128 means that each training and inference sample contains at most 128 start markers and 128 end markers, for a total of 256 markers per sample. Defaults to 128.
entity_max_length (int) – The maximum length of an entity span in terms of words. Defaults to 8.
max_prev_context (Optional[int]) – The maximum number of previous sentences to include as context. If None, the maximum amount that fits in model_max_length is chosen. Defaults to None.
max_next_context (Optional[int]) – The maximum number of next sentences to include as context. If None, the maximum amount that fits in model_max_length is chosen. Defaults to None.
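The length-related parameters above can be illustrated with a small, self-contained sketch. It does not use the library: the helper names (resolve_model_max_length, markers_per_sample, candidate_spans) are hypothetical, an undefined tokenizer limit is modeled as None, and the span enumeration is only one plausible reading of "maximum length of an entity span in terms of words".

```python
from typing import List, Optional, Tuple

# Fallback described for model_max_length when nothing else is defined.
DEFAULT_MODEL_MAX_LENGTH = 512


def resolve_model_max_length(
    configured: Optional[int], tokenizer_value: Optional[int]
) -> int:
    """Hypothetical helper: config value, else tokenizer value, else 512."""
    if configured is not None:
        return configured
    if tokenizer_value is not None:
        return tokenizer_value
    return DEFAULT_MODEL_MAX_LENGTH


def markers_per_sample(marker_max_length: int = 128) -> int:
    """Each sample holds up to N start markers and N end markers."""
    return 2 * marker_max_length


def candidate_spans(
    num_words: int, entity_max_length: int = 8
) -> List[Tuple[int, int]]:
    """Assumed reading: every (start, end) word span of at most
    entity_max_length words is a candidate, with end exclusive."""
    return [
        (start, start + length)
        for length in range(1, entity_max_length + 1)
        for start in range(num_words - length + 1)
    ]
```

Under these assumptions, markers_per_sample(128) gives the 256 markers mentioned above, and a 5-word sentence with entity_max_length=2 yields 9 candidate spans (5 of one word, 4 of two words).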

Example:

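from span_marker import SpanMarkerModel

# Illustrative label set; it must include the "O" (outside) label.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]

# These configuration settings are provided via kwargs to `SpanMarkerModel.from_pretrained`:
model = SpanMarkerModel.from_pretrained(
    "bert-base-cased",
    labels=labels,
    model_max_length=256,
    marker_max_length=128,
    entity_max_length=8,
)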

Raises:

ValueError – If the labels provided to from_pretrained() do not contain the required “O” label.
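The "O" requirement can be pictured with the following sketch. It is not the library's implementation: find_outside_id is a hypothetical helper, and the error message is invented for illustration.

```python
from typing import List


def find_outside_id(labels: List[str]) -> int:
    """Hypothetical check: return the index of the "O" label,
    or raise ValueError if the label list does not contain it."""
    if "O" not in labels:
        raise ValueError(
            'The label list must contain the "O" (outside) label.'
        )
    return labels.index("O")
```

With this sketch, find_outside_id(["O", "B-PER", "I-PER"]) returns 0, while find_outside_id(["PER", "LOC"]) raises ValueError.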

model_type: str = 'span-marker'

is_composition: bool = True

property outside_id: None

are_labels_schemed()

True if all labels are strings matching one of the two following rules:

label == “O”
label[0] in “BIESLU” and label[1] == “-”, e.g. in “I-LOC”

We ensure that the first character is in “BIESLU” because of these definitions:

“B” for “begin”
“I” for “in”
“E” for “end”
“S” for “singular”
“L” for “last”
“U” for “unit”

Parameters: id2label (Dict[int, str]) – Dictionary of label ids to label strings.
Returns: True if it seems like a labeling scheme is used.
Return type: bool
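The two rules above can be expressed as a standalone function. This is a minimal sketch written from the description, not the method's actual source; it takes id2label as an explicit argument, and the length guard for very short labels is an added assumption.

```python
from typing import Dict

# Allowed scheme prefixes: begin, in, end, singular, last, unit.
SCHEME_PREFIXES = "BIESLU"


def are_labels_schemed(id2label: Dict[int, str]) -> bool:
    """Return True if every label is "O" or looks like "<prefix>-<type>"."""
    return all(
        isinstance(label, str)
        and (
            label == "O"
            or (
                len(label) >= 2  # guard against empty or 1-char labels
                and label[0] in SCHEME_PREFIXES
                and label[1] == "-"
            )
        )
        for label in id2label.values()
    )
```

For example, {0: "O", 1: "B-PER", 2: "I-PER"} satisfies both rules, whereas {0: "O", 1: "PER", 2: "LOC"} does not, because "PER" and "LOC" have no scheme prefix followed by "-".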

get_scheme_tags()

Return type: Set[str]

group_label_ids_by_tag()

Return type: Dict[str, Set]

get(options, default=None)

Parameters:

options (str | Iterable[str]) –
default (Any | None) –

Return type:

Any