span_marker.configuration module
- class span_marker.configuration.SpanMarkerConfig(encoder_config=None, model_max_length=None, marker_max_length=128, entity_max_length=8, max_prev_context=None, max_next_context=None, **kwargs)
Bases: PretrainedConfig
Configuration class for SpanMarkerModel instances.
- Parameters:
encoder_config (Optional[Dict[str, Any]]) – The configuration dictionary for the underlying encoder used by the SpanMarkerModel instance. Defaults to None.
model_max_length (Optional[int]) – The total number of tokens that can be processed before truncation. If None, the tokenizer's model_max_length is used; if that value is not defined either, 512 is used instead. Defaults to None.
marker_max_length (int) – The maximum length for each of the span markers. A value of 128 means that each training and inference sample contains a maximum of 128 start markers and 128 end markers, for a total of 256 markers per sample. Defaults to 128.
entity_max_length (int) – The maximum length of an entity span in terms of words. Defaults to 8.
max_prev_context (Optional[int]) – The maximum number of previous sentences to include as context. If None, the maximum amount that fits in model_max_length is chosen. Defaults to None.
max_next_context (Optional[int]) – The maximum number of next sentences to include as context. If None, the maximum amount that fits in model_max_length is chosen. Defaults to None.
Example:
    from span_marker import SpanMarkerModel

    # These configuration settings are provided via kwargs to `SpanMarkerModel.from_pretrained`:
    model = SpanMarkerModel.from_pretrained(
        "bert-base-cased",
        labels=labels,
        model_max_length=256,
        marker_max_length=128,
        entity_max_length=8,
    )
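The defaulting rule for model_max_length and the marker arithmetic for marker_max_length described in the parameter list can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names resolve_model_max_length and markers_per_sample are hypothetical, and an undefined tokenizer limit is modelled here as None, which is an assumption.

```python
from typing import Optional

# Fallback value named in the model_max_length parameter description.
DEFAULT_MODEL_MAX_LENGTH = 512


def resolve_model_max_length(configured: Optional[int], tokenizer_limit: Optional[int]) -> int:
    """Hypothetical helper: choose the effective token limit.

    Order of precedence: the configured value, then the tokenizer's value,
    then the 512 fallback. None stands for "not defined" (an assumption).
    """
    if configured is not None:
        return configured
    if tokenizer_limit is not None:
        return tokenizer_limit
    return DEFAULT_MODEL_MAX_LENGTH


def markers_per_sample(marker_max_length: int) -> int:
    """Hypothetical helper: one start marker and one end marker per span."""
    return 2 * marker_max_length


print(resolve_model_max_length(256, 512))    # configured value wins: 256
print(resolve_model_max_length(None, 384))   # tokenizer value is used: 384
print(resolve_model_max_length(None, None))  # fallback: 512
print(markers_per_sample(128))               # 128 start + 128 end: 256
```

With the default marker_max_length of 128, the sketch reproduces the total of 256 markers per sample stated above.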
- Raises:
ValueError – If the labels provided to from_pretrained() do not contain the required “O” label.
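The requirement behind this ValueError can be illustrated with a small standalone check. This is only a sketch under assumptions: check_outside_label is a hypothetical helper, and the error message is invented for the example rather than taken from the library.

```python
from typing import Sequence


def check_outside_label(labels: Sequence[str]) -> None:
    """Hypothetical check: raise ValueError when the "O" label is missing."""
    if "O" not in labels:
        # Message text is illustrative, not the library's actual wording.
        raise ValueError('The provided labels must contain the "O" label.')


# A label list with "O" passes silently.
check_outside_label(["O", "PER", "LOC"])

# A label list without "O" is rejected.
try:
    check_outside_label(["PER", "LOC"])
except ValueError as error:
    print(f"Rejected: {error}")
```

Running a check like this before calling from_pretrained() surfaces a missing “O” label early, before any model weights are loaded.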
- are_labels_schemed()
True if all labels are strings matching one of the two following rules:
label == “O”
label[0] in “BIESLU” and label[1] == “-”, e.g. in “I-LOC”
We ensure that the first character is in “BIESLU” because of these definitions:
“B” for “begin”
“I” for “in”
“E” for “end”
“L” for “last”
“S” for “singular”
“U” for “unit”
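The two rules above can be sketched as a standalone function. This is an illustration of the documented rules only, not the method's actual source: labels_look_schemed is a hypothetical name, it takes a plain iterable of labels rather than reading them from a config, and the length guard is added here so that one-character labels do not raise an IndexError.

```python
from typing import Iterable

# Allowed first characters: begin, in, end, singular, last, unit.
SCHEME_PREFIXES = "BIESLU"


def labels_look_schemed(labels: Iterable[object]) -> bool:
    """Hypothetical sketch: True if every label is "O" or looks like "X-...",
    where X is one of the scheme prefixes."""
    return all(
        isinstance(label, str)
        and (
            label == "O"
            or (len(label) >= 2 and label[0] in SCHEME_PREFIXES and label[1] == "-")
        )
        for label in labels
    )


print(labels_look_schemed(["O", "B-PER", "I-PER"]))           # True
print(labels_look_schemed(["O", "B-LOC", "L-LOC", "U-PER"]))  # True
print(labels_look_schemed(["O", "PER", "LOC"]))               # False
```

Note that “LOC” fails even though it starts with “L”, because its second character is not “-”. In this sketch an empty label collection yields True, since all() over no items is True; whether the real method behaves the same way is not stated above.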