Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.5.0]
Added
Added support for BILO tagging schemes.
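To illustrate what a BILO scheme encodes (B = begin, I = inside, L = last, O = outside, i.e. BILOU without the U tag), here is a minimal decoder sketch that turns a BILO tag sequence into entity spans. The function name bilo_to_spans is hypothetical and not part of SpanMarker. It also assumes that a lone B or L tag marks a single-word entity; SpanMarker's own handling may differ.

```python
from typing import List, Tuple

# Hypothetical helper, not a SpanMarker API: decode BILO tags into
# (label, start, end) spans with `end` exclusive.
# Assumption: a lone "B-X" or a lone "L-X" marks a single-word entity,
# because BILO has no dedicated "U" tag.
def bilo_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "B":
            if start is not None:
                # Close the still-open entity (it had no L tag) first.
                spans.append((label, start, i))
            start, label = i, tag_label
        elif prefix in ("I", "L") and start is not None and tag_label == label:
            if prefix == "L":
                # "L" ends the open entity, including this word.
                spans.append((label, start, i + 1))
                start, label = None, None
        elif prefix == "L":
            # "L" without a matching open entity: treat it as one word.
            if start is not None:
                spans.append((label, start, i))
            spans.append((tag_label, i, i + 1))
            start, label = None, None
        else:
            # "O", or an "I" that does not continue the open entity.
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:
        spans.append((label, start, len(tags)))
    return spans
```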
Changed
Changed the error when an empty sentence is provided to the tokenizer.
Using spaCy nlp.pipe now processes texts sentence-wise, just like for nlp(...).
Fixed
No longer override language metadata from the dataset if the language was also set manually via SpanMarkerModelCardData.
No longer crash on predict with ValueError: Failed to concatenate on axis=1 ... if the first sentence in a list of sentences is just one word.
[1.4.0]
Added
Added SpanMarkerModel.generate_model_card() method to get a model card string.
Added SpanMarkerModelCardData that should be passed to SpanMarkerModel.from_pretrained with additional information like language, license, model_name, model_id, encoder_name, encoder_id, dataset_name, dataset_id, dataset_revision.
Added transformers pipeline support, e.g. pipeline(task="span-marker", model="tomaarsen/span-marker-mbert-base-multinerd").
Changed
Heavily improved automatic model card generation.
Evaluating outside of training now returns per-label outputs instead of only "overall" F1, precision and recall.
Warn if the used tokenizer distinguishes between punctuation directly attached to a word and punctuation separated from a word by a space. If so, inference with that model requires the punctuation to be split from the words.
Improve label normalization speed.
Allow calling SpanMarkerModel.from_pretrained with a pre-initialized SpanMarkerConfig.
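For models affected by the punctuation warning, sentences should be pre-split so that each punctuation mark is its own word. Below is a minimal regex-based sketch of that pre-processing. split_punctuation and to_spaced_sentence are hypothetical helpers. The regular expression is an assumption, not the tokenization that SpanMarker or the model was trained with.

```python
import re
from typing import List

# Hypothetical pre-processing (not a SpanMarker API): words may contain
# inner hyphens or apostrophes ("state-of-the-art", "don't"); every other
# non-space, non-word character becomes a separate token.
_TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def split_punctuation(text: str) -> List[str]:
    # findall skips whitespace, so only words and punctuation remain.
    return _TOKEN_RE.findall(text)

def to_spaced_sentence(text: str) -> str:
    # Re-join with single spaces so punctuation is separated from words.
    return " ".join(split_punctuation(text))
```

The resulting word list, or the re-spaced string, can then be passed to the model for prediction.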
Deprecated
Deprecated Python 3.7.
Fixed
Fixed tokenization mismatch between training and inference for XLM-RoBERTa models: allows for normal inference of those models.
Resolve niche bug when TrainingArguments are not provided.
[1.3.0]
Added
Added an overwrite_entities parameter to the spaCy pipeline component to allow for overwriting spaCy entities.
Added .pipe() method to spaCy integration to allow for batched inference.
Changed
Stop overwriting spaCy entities by default.
[1.2.5]
Fixed
Allow for immutable TrainingArguments from a newer transformers release.
[1.2.4]
Fixed
Resolved broken license information.
[1.2.3]
Fixed
Fix crash in spaCy inference when using subsequent whitespace.
[1.2.2]
Added
Added support for using the span_marker spaCy pipeline component without importing SpanMarker.
[1.2.1]
Added
Added support for load_in_8bit=True and device_map="auto".
[1.2.0]
Added
Added trained_with_document_context to the SpanMarkerConfig.
Added warnings if a model is trained with document context and evaluated or used for inference without it, or vice versa.
Added spaCy integration via nlp.add_pipe("span_marker"). See the SpanMarker with spaCy documentation for more information.
Changed
Heavily improved computational efficiency of sample spreading, resulting in notably faster inference speeds.
Disable progress bar for inference by default, and add show_progress_bar parameter to SpanMarkerModel.predict.
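"Sample spreading" presumably refers to distributing a sentence's candidate spans over several samples when they exceed the per-sample marker budget (marker_max_length in SpanMarkerConfig). The sketch below shows only that chunking idea, under this assumption. spread_spans is a hypothetical helper, and SpanMarker's optimized implementation is not reproduced here.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive

def spread_spans(spans: List[Span], marker_max_length: int) -> List[List[Span]]:
    """Hypothetical helper: chunk candidate spans so that no sample holds
    more than `marker_max_length` spans. The sentence itself would be
    repeated once per chunk."""
    if marker_max_length <= 0:
        raise ValueError("marker_max_length must be positive")
    return [
        spans[i : i + marker_max_length]
        for i in range(0, len(spans), marker_max_length)
    ]
```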
Fixed
Fixed evaluation method failing when the testing dataset contains two adjacent and identical sentences.
[1.1.1]
Fixed
Add missing space in model card template.
Return nested list if input is a singular list of sentences or a dataset with one sample.
[1.1.0]
Added
Added support for document-level context in training, evaluation and inference.
Use it by supplying document_id and sentence_id columns to the Trainer datasets.
Tune it by supplying max_prev_context and max_next_context to the SpanMarkerConfig via SpanMarkerModel.from_pretrained(..., max_prev_context=3).
Added batch inference support via SpanMarkerModel.predict(..., batch_size=4).
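A minimal sketch of how document-level context could be gathered from document_id and sentence_id columns, bounded by max_prev_context and max_next_context. add_document_context is a hypothetical helper. Treating None as "no limit" is an assumption of this sketch. A real implementation would also have to respect the encoder's maximum input length, which this sketch ignores.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

def add_document_context(
    rows: List[Dict[str, Any]],
    max_prev_context: Optional[int] = None,
    max_next_context: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: attach the tokens of neighbouring sentences from
    the same document. None means "no limit" in this sketch."""
    by_doc: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        by_doc[row["document_id"]].append(row)

    out = []
    for doc_rows in by_doc.values():
        # Order sentences within each document by their sentence_id.
        doc_rows = sorted(doc_rows, key=lambda r: r["sentence_id"])
        for i, row in enumerate(doc_rows):
            prev_start = 0 if max_prev_context is None else max(0, i - max_prev_context)
            next_end = len(doc_rows) if max_next_context is None else i + 1 + max_next_context
            out.append({
                **row,
                "prev_context": [t for r in doc_rows[prev_start:i] for t in r["tokens"]],
                "next_context": [t for r in doc_rows[i + 1 : next_end] for t in r["tokens"]],
            })
    return out
```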
Changed
Ensure models are in evaluation mode when using SpanMarkerModel.predict.
Deprecated
Removed the allow_overlapping optional keyword from SpanMarkerModel.predict.
[1.0.1]
Fixed
Fixed critical issue with incorrect predictions for inputs that require multiple samples.
[1.0.0]
Added
Added a warning for entities that are ignored/skipped due to the maximum entity length or maximum model input length.
Added info-level logs displaying the detected labeling scheme (IOB/IOB2, BIOES, BILOU, none).
Added a warning suggesting model.cuda() when predictions are performed on a CPU while CUDA is available.
Added try_cuda method to SpanMarkerModel, which tries to place the model on CUDA and does nothing if that fails.
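The labeling scheme can be inferred from the label prefixes. Below is a heuristic sketch that assumes labels of the form PREFIX-TYPE (e.g. B-PER) plus a plain O label. detect_scheme is hypothetical, and its rules are not necessarily SpanMarker's detection logic.

```python
from typing import Iterable

def detect_scheme(labels: Iterable[str]) -> str:
    """Hypothetical heuristic: guess the tagging scheme from label prefixes.
    "O" has no prefix and is ignored; labels without "-" mean no scheme."""
    prefixes = {label.split("-", 1)[0] for label in labels if "-" in label}
    if not prefixes:
        return "none"
    if prefixes <= {"B", "I"}:
        return "IOB/IOB2"
    if prefixes <= {"B", "I", "E", "S"}:
        return "BIOES"
    if prefixes <= {"B", "I", "L", "U"}:
        return "BILOU"
    # e.g. a plain label such as "WORK-OF-ART" would land here.
    return "unknown"
```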
Changed
Updated where in the input IDs the span markers are stored, resulting in a 40% increase in training and inference speed.
Updated default marker_max_length in SpanMarkerConfig from 256 to 128.
Updated default entity_max_length in SpanMarkerConfig from 16 to 8.
Add support for datasets<2.6.0.
Add warning if a <v1.0.0 model is loaded using v1.0.0 or newer.
Propagate SpanMarkerModel.from_pretrained kwargs to the encoder's AutoModel.from_pretrained.
Ignore UndefinedMetricWarning when evaluation F1 is 0.
Improved model card generation.
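The entity_max_length default matters because the number of candidate spans grows with it. Assuming every span of up to entity_max_length words is a candidate, the sketch below enumerates them. candidate_spans is a hypothetical helper, not SpanMarker's code.

```python
from typing import List, Tuple

def candidate_spans(num_words: int, entity_max_length: int = 8) -> List[Tuple[int, int]]:
    """Hypothetical sketch: every (start, end) word span, end exclusive,
    that is at most `entity_max_length` words long."""
    return [
        (start, end)
        for start in range(num_words)
        for end in range(start + 1, min(start + entity_max_length, num_words) + 1)
    ]
```

Under this sketch, a 40-word sentence yields 520 candidates with the old default of 16 and 292 with the new default of 8.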
Fixed
Resolved tricky issue causing models to learn to never predict the last token as an entity (Closes #1).
Fixed label normalization for BILOU datasets.
[0.2.2] - 2023-04-13
Fixed
Correctly propagate SpanMarkerModel.from_pretrained kwargs to the config initialisation.
[0.2.1] - 2023-04-07
Added
Save span_marker_version in config files from now on.
Changed
SpanMarkerModel.save_pretrained and SpanMarkerModel.push_to_hub now also include the tokenizer and a simple model card.
[0.2.0] - 2023-04-06
Added
Added missing docstrings.
Changed
Updated how entity span indices are returned for SpanMarkerModel.predict.
Fixed
Prevent incorrect labels when loading a model trained with a schemed (e.g. IOB, BIOES) dataset.
Fix several bugs with loading finetuned SpanMarker models.
Add missing methods to SpanMarkerTokenizer.
Fix endless recursion bug when providing compute_metrics to the Trainer.
[0.1.1] - 2023-03-31
Fixed
Prevent crash when args is not supplied to the Trainer.
Prevent crash on evaluation when using fp16=True as a training argument.
[0.1.0] - 2023-03-30
Added
Implement initial working version.