span_marker.spacy_integration module

class span_marker.spacy_integration.SpacySpanMarkerWrapper(pretrained_model_name_or_path, *args, batch_size=4, device=None, overwrite_entities=False, **kwargs)

Bases: object

This wrapper allows SpanMarker to be used as a drop-in replacement for spaCy's “ner” pipeline component.

Usage (the line marked + is the only addition to an existing spaCy script):

  import spacy

  nlp = spacy.load("en_core_web_sm")
+ nlp.add_pipe("span_marker", config={"model": "tomaarsen/span-marker-roberta-large-ontonotes5"})

  text = '''Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the
  Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her
  death in 30 BCE.'''
  doc = nlp(text)

Example:

>>> import spacy
>>> import span_marker
>>> nlp = spacy.load("en_core_web_sm", exclude=["ner"])
>>> nlp.add_pipe("span_marker", config={"model": "tomaarsen/span-marker-roberta-large-ontonotes5"})
>>> text = '''Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the
... Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her
... death in 30 BCE.'''
>>> doc = nlp(text)
>>> doc.ents
(Cleopatra VII, Cleopatra the Great, 69 BCE, Egypt, 51 BCE, 30 BCE)
>>> for span in doc.ents:
...     print((span, span.label_))
(Cleopatra VII, 'PERSON')
(Cleopatra the Great, 'PERSON')
(69 BCE, 'DATE')
(Egypt, 'GPE')
(51 BCE, 'DATE')
(30 BCE, 'DATE')

Parameters:
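  • pretrained_model_name_or_path (str | PathLike) – Name of a SpanMarker model on the Hugging Face Hub, such as "tomaarsen/span-marker-roberta-large-ontonotes5", or a path to a local model.

  • batch_size (int) – Number of samples per inference batch. Defaults to 4.

  • device (str | device | None) – Device to place the model on. Defaults to None.

  • overwrite_entities (bool) – Whether to overwrite entities already present in doc.ents. Defaults to False.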
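
As a rough sketch of how these constructor parameters might be supplied through spaCy, the snippet below collects them into a config dictionary and passes it to nlp.add_pipe. Only the "model" key appears in the usage above; that "batch_size", "device" and "overwrite_entities" are accepted as config keys under the same names as the constructor arguments is an assumption, and build_span_marker_config and load_pipeline are hypothetical helper names.

```python
from typing import Any, Dict, Optional


def build_span_marker_config(
    model: str,
    batch_size: int = 4,
    device: Optional[str] = None,
    overwrite_entities: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the wrapper's arguments into a config dict.

    The defaults mirror the constructor signature documented above.
    Assumption: the spaCy component accepts these keys under these names.
    """
    return {
        "model": model,
        "batch_size": batch_size,
        "device": device,
        "overwrite_entities": overwrite_entities,
    }


def load_pipeline(config: Dict[str, Any]):
    """Hypothetical helper: build a pipeline (needs spacy and span_marker installed)."""
    # Imported lazily so that the config helper works without spaCy installed.
    import spacy
    import span_marker  # noqa: F401  (imported as in the example above)

    nlp = spacy.load("en_core_web_sm", exclude=["ner"])
    nlp.add_pipe("span_marker", config=config)
    return nlp


# Build a config with a larger batch size; nothing is downloaded at this point.
config = build_span_marker_config(
    "tomaarsen/span-marker-roberta-large-ontonotes5",
    batch_size=8,
)
```

Calling load_pipeline(config) would then return an nlp object that can be used as in the example above.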

static convert_inputs_to_dataset(inputs)

set_ents(doc, ents)

Parameters:
  • doc (Doc) – The spaCy Doc whose entities are set.

  • ents (List[Span]) – The entity spans to assign to doc.
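
This page does not say how set_ents combines new spans with entities already on the document. Because spaCy does not allow overlapping spans in doc.ents, some resolution is needed whenever existing entities are kept. The sketch below shows one plausible policy in plain Python: with overwrite_entities=True the predictions replace everything, otherwise predictions and existing entities are merged with the longest span winning, in the manner of spaCy's filter_spans utility. Whether the wrapper does exactly this is an assumption; Entity, filter_longest_first and merge_entities are hypothetical names, and entities are modelled as (start, end, label) tuples rather than spaCy Span objects.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for a spaCy Span: (start token, end token exclusive, label).
Entity = Tuple[int, int, str]


def filter_longest_first(entities: List[Entity]) -> List[Entity]:
    """Drop overlapping entities, preferring longer spans, then earlier ones."""
    ordered = sorted(entities, key=lambda e: (e[1] - e[0], -e[0]), reverse=True)
    kept: List[Entity] = []
    seen: Set[int] = set()
    for start, end, label in ordered:
        # Keep the span only if none of its tokens is already claimed.
        if not any(i in seen for i in range(start, end)):
            kept.append((start, end, label))
            seen.update(range(start, end))
    return sorted(kept)


def merge_entities(
    existing: List[Entity],
    predicted: List[Entity],
    overwrite_entities: bool = False,
) -> List[Entity]:
    """Assumed policy: replace everything, or merge and resolve overlaps."""
    if overwrite_entities:
        return sorted(predicted)
    return filter_longest_first(predicted + existing)


existing = [(0, 1, "PERSON"), (10, 11, "GPE")]
predicted = [(0, 2, "PERSON"), (5, 7, "DATE")]

merged = merge_entities(existing, predicted)
replaced = merge_entities(existing, predicted, overwrite_entities=True)
```

Here merged keeps the longer (0, 2) prediction over the existing (0, 1) entity and retains the non-overlapping (10, 11) entity, while replaced contains only the two predictions.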

pipe(stream, batch_size=128)

Fill doc.ents and span.label_ using the chosen SpanMarker model.
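
The signature suggests that pipe consumes stream in chunks of at most batch_size documents. The following is a minimal, library-free sketch of that batching pattern; minibatch here is a hypothetical stand-in written for illustration, and whether the wrapper batches its input exactly this way is an assumption.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def minibatch(stream: Iterable[T], size: int = 128) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `stream`."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # the stream is exhausted
        yield batch


# Strings stand in for documents; a real stream would hold spaCy Doc objects.
texts = ["doc one", "doc two", "doc three", "doc four", "doc five"]
batches = list(minibatch(texts, size=2))
```

In practice this method is normally reached through spaCy itself, for example by iterating over nlp.pipe(texts), which streams documents through every component in the pipeline.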