span_marker.tokenizer module
- class span_marker.tokenizer.EntityTracker(entity_max_length, model_max_length, split='train', total_num_entities=0, skipped_entities=<factory>, enabled=False)
- Bases: object
- Tracks the annotated entities that are ignored/skipped, so that a warning can be given about what percentage of all entities the model cannot predict.
- Example:
  This SpanMarker model won't be able to predict 5.930931% of all annotated entities in the evaluation dataset. This is caused by the SpanMarkerModel maximum entity length of 6 words and the maximum model input length of 64 tokens. These are the frequencies of the missed entities due to maximum entity length out of 1332 total entities:
  - 7 missed entities with 7 words (0.525526%)
  - 2 missed entities with 8 words (0.150150%)
  - 2 missed entities with 9 words (0.150150%)
  - 2 missed entities with 13 words (0.150150%)
  Additionally, a total of 66 (4.954955%) entities were missed due to the maximum input length.
- Parameters:
  - entity_max_length
  - model_max_length
  - split
  - total_num_entities
  - skipped_entities
  - enabled
 - add(num_entities)
- Add to the counter of the total number of entities.
- Parameters:
- num_entities (int) – How many entities to increment by. 
- Return type:
- None 
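The percentages in the example warning are plain ratios against the running total that add() increments: (7 + 2 + 2 + 2) entities skipped for length plus 66 skipped for input length gives 79 of 1332, or 5.930931%. The sketch below reproduces that arithmetic. ToyEntityTracker, skipped_by_length, skipped_by_input_length, percentage and total_skipped are hypothetical names for illustration only, not the library's implementation; the real class's internal bookkeeping is not documented on this page. The `<factory>` default in the real signature is how Python's dataclasses render a field with a default_factory, which the toy class mimics.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyEntityTracker:
    """Hypothetical stand-in for EntityTracker, for illustration only."""

    entity_max_length: int
    total_num_entities: int = 0
    # Skipped entity counts keyed by entity length in words; a default_factory
    # field, which dataclasses render as `<factory>` in the signature.
    skipped_by_length: Counter = field(default_factory=Counter)
    # Entities lost because the input exceeded the maximum model input length.
    skipped_by_input_length: int = 0

    def add(self, num_entities: int) -> None:
        """Increment the total entity counter, like EntityTracker.add."""
        self.total_num_entities += num_entities

    def percentage(self, count: int) -> float:
        """Share of all annotated entities, in percent."""
        return 100 * count / self.total_num_entities

    def total_skipped(self) -> int:
        """Entities skipped for either reason."""
        return sum(self.skipped_by_length.values()) + self.skipped_by_input_length


# Reproduce the numbers from the example warning.
tracker = ToyEntityTracker(entity_max_length=6)
tracker.add(1332)
tracker.skipped_by_length.update({7: 7, 8: 2, 9: 2, 13: 2})
tracker.skipped_by_input_length = 66

print(f"{tracker.percentage(tracker.total_skipped()):f}%")  # 5.930931%
for length, count in sorted(tracker.skipped_by_length.items()):
    print(f"{count} missed entities with {length} words ({tracker.percentage(count):f}%)")
print(f"{tracker.skipped_by_input_length} ({tracker.percentage(tracker.skipped_by_input_length):f}%)")
```

Every percentage uses the same denominator (the total accumulated through add()), which is why the per-length shares and the input-length share sum to the overall figure.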
 
 
- class span_marker.tokenizer.SpanMarkerTokenizer(tokenizer, config, **kwargs)
- Bases: object
- Parameters:
- tokenizer (PreTrainedTokenizer) – The Hugging Face transformers tokenizer that this class wraps.
- config (SpanMarkerConfig) – The SpanMarker model configuration.
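The maximum entity length mentioned in the EntityTracker warning is presumably read from the config passed to this tokenizer. Assuming (this page does not confirm it) that SpanMarker treats every span of up to entity_max_length words as a candidate entity, the sketch below shows why a longer entity can never be predicted. candidate_spans and is_predictable are hypothetical helpers, not part of span_marker.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive


def candidate_spans(num_words: int, entity_max_length: int) -> List[Span]:
    """Hypothetical helper: every word span of at most `entity_max_length` words."""
    return [
        (start, end)
        for start in range(num_words)
        for end in range(start + 1, min(start + entity_max_length, num_words) + 1)
    ]


def is_predictable(entity: Span, entity_max_length: int) -> bool:
    """Hypothetical check: an entity is reachable only if it is short enough."""
    start, end = entity
    return end - start <= entity_max_length


spans = candidate_spans(num_words=10, entity_max_length=6)
print((0, 6) in spans)  # True: a 6-word entity is a candidate
print((0, 7) in spans)  # False: a 7-word entity never is
print(is_predictable((2, 9), entity_max_length=6))  # False
```

Capping span length keeps the number of candidates linear in sentence length for a fixed maximum (45 spans for 10 words here) instead of quadratic, at the cost of the skipped entities that EntityTracker reports.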