Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.7.0]
Fixed
Fixed compatibility with transformers v4.47+.
[1.6.0]
Fixed
Fixed integrations with newer dependency versions, like transformers and huggingface_hub.
Deprecated
Deprecated Python 3.8.
[1.5.0]
Added
Added support for BILO tagging schemes.
Changed
Changed the error when an empty sentence is provided to the tokenizer.
spaCy's nlp.pipe now processes texts sentence-wise, just like nlp(...).
Fixed
No longer override language metadata from the dataset if the language was also set manually via SpanMarkerModelCardData.
No longer crash on predict with ValueError: Failed to concatenate on axis=1 ... if the first sentence in a list of sentences is just one word.
[1.4.0]
Added
Added SpanMarkerModel.generate_model_card() method to get a model card string.
Added SpanMarkerModelCardData that should be passed to SpanMarkerModel.from_pretrained with additional information like language, license, model_name, model_id, encoder_name, encoder_id, dataset_name, dataset_id, dataset_revision.
Added transformers pipeline support, e.g. pipeline(task="span-marker", model="tomaarsen/span-marker-mbert-base-multinerd").
Changed
Heavily improved automatic model card generation.
Evaluating outside of training now returns per-label outputs instead of only "overall" F1, precision and recall.
Warn if the used tokenizer distinguishes between punctuation directly attached to a word and punctuation separated from a word by a space.
If so, then inference of that model will require the punctuation to be split from the words.
Improve label normalization speed.
Allow calling SpanMarkerModel.from_pretrained with a pre-initialized SpanMarkerConfig.
Deprecated
Deprecated Python 3.7.
Fixed
Fixed tokenization mismatch between training and inference for XLM-RoBERTa models, which allows for normal inference with those models.
Resolve niche bug when TrainingArguments are not provided.
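The punctuation warning above means that, for affected models, input text should have punctuation separated from words before inference. One possible pre-processing step is sketched below. The helper name split_punctuation and the regular expression are assumptions for illustration and are not part of SpanMarker.

```python
import re
from typing import List

# Hypothetical helper, not part of SpanMarker: a word is a run of word
# characters, and every other non-space character becomes its own token.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def split_punctuation(text: str) -> List[str]:
    """Return words and punctuation marks as separate tokens."""
    return _TOKEN_PATTERN.findall(text)


tokens = split_punctuation("Amelia Earhart flew to Paris, France.")
print(tokens)
```

This simple pattern also splits contractions and hyphenated words (for example "don't" becomes three tokens), so a tokenizer that matches the training data is usually the better choice when one is available.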
[1.3.0]
Added
Added an overwrite_entities parameter to the spaCy pipeline component to allow for overwriting spaCy entities.
Added .pipe() method to spaCy integration to allow for batched inference.
Changed
Stop overwriting spaCy entities by default.
[1.2.5]
Fixed
Allow for immutable TrainingArguments from newer transformers releases.
[1.2.4]
Fixed
Resolved broken license information.
[1.2.3]
Fixed
Fix crash in spaCy inference when using subsequent whitespace.
[1.2.2]
Added
Added support for using span_marker spaCy pipeline component without importing SpanMarker.
[1.2.1]
Added
Added support for load_in_8bit=True and device_map="auto".
[1.2.0]
Added
Added trained_with_document_context to the SpanMarkerConfig.
Added warnings if a model is trained with document-context and evaluated/inferenced without, or vice versa.
Added spaCy integration via nlp.add_pipe("span_marker"). See the SpanMarker with spaCy documentation for information.
Changed
Heavily improved computational efficiency of sample spreading, resulting in notably faster inference speeds.
Disable progress bar for inference by default, and add show_progress_bar parameter to SpanMarkerModel.predict.
Fixed
Fixed evaluation method failing when the testing dataset contains two adjacent and identical sentences.
[1.1.1]
Fixed
Add missing space in model card template.
Return nested list if input is a singular list of sentences or a dataset with one sample.
[1.1.0]
Added
Added support for document-level context in training, evaluation and inference.
Use it by supplying document_id and sentence_id columns to the Trainer datasets.
Tune it by supplying max_prev_context and max_next_context to the SpanMarkerConfig via SpanMarkerModel.from_pretrained(..., max_prev_context=3).
Added batch inference support via SpanMarkerModel.predict(..., batch_size=4).
Changed
Ensure models are in evaluation mode when using SpanMarkerModel.predict.
Deprecated
Removed the allow_overlapping optional keyword from SpanMarkerModel.predict.
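To illustrate what the document_id and sentence_id columns and the max_prev_context and max_next_context settings from this release describe, here is a minimal sketch of a context window over sentences. It only shows the windowing idea. The function gather_context, the "tokens" column name, and the output layout are assumptions for illustration and do not reflect how SpanMarker builds its model inputs.

```python
from collections import defaultdict
from typing import Dict, List


def gather_context(rows: List[Dict], max_prev_context: int, max_next_context: int) -> List[Dict]:
    """Attach neighbouring sentences from the same document to every sentence."""
    # Group sentences per document so context never crosses document boundaries.
    by_document = defaultdict(list)
    for row in rows:
        by_document[row["document_id"]].append(row)

    contexts = []
    for document_rows in by_document.values():
        # Order sentences within the document by their sentence_id.
        ordered = sorted(document_rows, key=lambda row: row["sentence_id"])
        for index, row in enumerate(ordered):
            previous_rows = ordered[max(0, index - max_prev_context):index]
            next_rows = ordered[index + 1:index + 1 + max_next_context]
            contexts.append({
                "document_id": row["document_id"],
                "sentence_id": row["sentence_id"],
                "tokens": row["tokens"],
                "previous": [r["tokens"] for r in previous_rows],
                "next": [r["tokens"] for r in next_rows],
            })
    return contexts


# Hypothetical toy data: two documents, the first with three sentences.
rows = [
    {"document_id": 0, "sentence_id": 0, "tokens": ["Paris", "is", "large", "."]},
    {"document_id": 0, "sentence_id": 1, "tokens": ["It", "is", "in", "France", "."]},
    {"document_id": 0, "sentence_id": 2, "tokens": ["Many", "people", "visit", "."]},
    {"document_id": 1, "sentence_id": 0, "tokens": ["Berlin", "is", "a", "city", "."]},
]
contexts = gather_context(rows, max_prev_context=1, max_next_context=1)
for context in contexts:
    print(context["document_id"], context["sentence_id"], len(context["previous"]), len(context["next"]))
```

With both limits set to 1, the middle sentence of the first document receives one sentence of context on each side, while the only sentence of the second document receives none.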
[1.0.1]
Fixed
Fixed critical issue with incorrect predictions at inputs that require multiple samples.
[1.0.0]
Added
Added a warning for entities that are ignored/skipped due to the maximum entity length or maximum model input length.
Added info-level logs displaying the detected labeling scheme (IOB/IOB2, BIOES, BILOU, none).
Added a warning suggesting to use model.cuda() when predictions are performed on a CPU while CUDA is available.
Added try_cuda method to SpanMarkerModel which tries to place the model on CUDA and does nothing if that fails.
Changed
Updated where in the input IDs the span markers are stored, resulting in a 40% training and inference speed increase.
Updated default marker_max_length in SpanMarkerConfig from 256 to 128.
Updated default entity_max_length in SpanMarkerConfig from 16 to 8.
Add support for datasets<2.6.0.
Add warning if a <v1.0.0 model is loaded using v1.0.0 or newer.
Propagate SpanMarkerModel.from_pretrained kwargs to the encoder's AutoModel.from_pretrained.
Ignore UndefinedMetricWarning when evaluation f1 is 0.
Improved model card generation.
Fixed
Resolved tricky issue causing models to learn to never predict the last token as an entity (Closes #1).
Fixed label normalization for BILOU datasets.
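The labeling schemes named in this release (IOB/IOB2, BIOES, BILOU, none) can be told apart by the tag prefixes that appear in the label list. The sketch below shows one way to detect a scheme and reduce the labels to plain entity types. The functions detect_scheme and normalize_labels are hypothetical and are not SpanMarker's implementation, whose detection rules are not described in this changelog.

```python
from typing import List


def detect_scheme(labels: List[str]) -> str:
    """Guess the labeling scheme from the tag prefixes in the label list."""
    prefixes = {label.split("-", 1)[0] for label in labels if "-" in label}
    if not prefixes:
        return "none"
    if prefixes <= {"B", "I"}:
        return "IOB/IOB2"
    if prefixes <= {"B", "I", "E", "S"}:
        return "BIOES"
    if prefixes <= {"B", "I", "L", "U"}:
        return "BILOU"
    # Hyphenated labels with unknown prefixes are treated as scheme-less.
    return "none"


def normalize_labels(labels: List[str]) -> List[str]:
    """Strip scheme prefixes, keeping the first occurrence of each label."""
    if detect_scheme(labels) == "none":
        return list(dict.fromkeys(labels))
    stripped = (label.split("-", 1)[1] if "-" in label else label for label in labels)
    return list(dict.fromkeys(stripped))


bilou_labels = ["O", "B-PER", "I-PER", "L-PER", "U-PER", "B-LOC", "I-LOC", "L-LOC", "U-LOC"]
print(detect_scheme(bilou_labels))
print(normalize_labels(bilou_labels))
```

A label list that only contains B- and I- tags is reported as IOB/IOB2 here even if the dataset was meant to use a richer scheme, because the prefixes alone cannot distinguish the two.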
[0.2.2] - 2023-04-13
Fixed
Correctly propagate SpanMarkerModel.from_pretrained kwargs to the config initialisation.
[0.2.1] - 2023-04-07
Added
Save span_marker_version in config files from now on.
Changed
SpanMarkerModel.save_pretrained and SpanMarkerModel.push_to_hub now also include the tokenizer and a simple model card.
[0.2.0] - 2023-04-06
Added
Added missing docstrings.
Changed
Updated how entity span indices are returned for SpanMarkerModel.predict.
Fixed
Prevent incorrect labels when loading a model trained with a schemed (e.g. IOB, BIOES) dataset.
Fix several bugs with loading finetuned SpanMarker models.
Add missing methods to SpanMarkerTokenizer.
Fix endless recursion bug when providing a compute_metrics to the Trainer.
[0.1.1] - 2023-03-31
Fixed
Prevent crash when args is not supplied to Trainer.
Prevent crash on evaluation when using fp16=True as a Training Argument.
[0.1.0] - 2023-03-30
Added
Implement initial working version.