lib package
Subpackages
Submodules
lib.constants module
This module provides common constants to the other (sub)modules.
In particular, the location of the data is defined here, in a single place. The definition is always looked up dynamically so that tests, which run on smaller datasets, can override it.
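The difference between a dynamic lookup and a value copied at import time can be sketched as follows. This is a minimal illustration, not the module's actual contents: the stand-in module and the name `DATA_DIR` are assumptions made for the example.

```python
import types

# Stand-in for lib.constants; DATA_DIR is a hypothetical constant name.
constants = types.ModuleType("constants")
constants.DATA_DIR = "/data/full"


def data_path_dynamic(filename: str) -> str:
    # The attribute is read from the module at call time,
    # so a later override by a test is picked up.
    return f"{constants.DATA_DIR}/{filename}"


# Equivalent of `from constants import DATA_DIR`: the value is copied once.
FROZEN_DATA_DIR = constants.DATA_DIR


def data_path_frozen(filename: str) -> str:
    # Uses the copy made above, so later overrides are not seen.
    return f"{FROZEN_DATA_DIR}/{filename}"


# A test points the constant at a smaller dataset.
constants.DATA_DIR = "/data/test_small"

print(data_path_dynamic("corpus.json"))
print(data_path_frozen("corpus.json"))
```

Only the dynamic variant follows the override, which is why reading the constant through the module on every use matters for the tests.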
lib.pipeline module
class lib.pipeline.Pipeline(config)
Bases: object
Responsible for the whole AI pipeline:
Post-correction
NER using BERT
Modernisation (performed last so that named entities can be exempted from modernisation)
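Why modernisation benefits from running after NER can be illustrated with a toy sketch. Everything here is an assumption made for the example: the token layout (`text`, `is_entity`), the spelling map, and the function name are hypothetical and do not reflect the pipeline's real data model.

```python
from typing import Dict, List

# Toy spelling map, for illustration only.
MODERN_SPELLING: Dict[str, str] = {"ende": "en", "vreede": "vrede"}


def modernise(tokens: List[dict]) -> List[str]:
    """Modernise spelling, leaving tokens flagged as named entities untouched."""
    result = []
    for token in tokens:
        if token["is_entity"]:
            # NER has already run, so entity tokens can be exempted.
            result.append(token["text"])
        else:
            result.append(MODERN_SPELLING.get(token["text"].lower(), token["text"]))
    return result


tokens = [
    {"text": "schip", "is_entity": False},
    {"text": "Vreede", "is_entity": True},   # hypothetical ship name
    {"text": "ende", "is_entity": False},
    {"text": "vreede", "is_entity": False},
]
print(modernise(tokens))
```

If modernisation ran before NER, the entity flags would not yet exist and the name would be rewritten along with the ordinary words.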
__call__(input_data, steps=None)
Parameters:
input_data (Union[str, List[List[Dict]]]) – Either a string of historical Dutch text, if the step “string_to_sentences” is involved, or a dictionary corresponding to the json schema if it is not. In the latter case, the format of the input data should be compatible with the remaining steps that are to be executed. The table below shows which steps are required and/or preferred to have been executed before each subsequent step. A required step must either be executed in the same call to the pipeline, or its results must already be present in the input data.

Step               requires STS   requires PC   requires BERT
Post-correction    yes
NER using BERT     yes            yes [1]
NER using Lists    yes            yes [1]       depends [2]
Modernisation      yes            yes [1]       preferably [3]
steps (Optional[Tuple[str, …]]) – Which steps to execute within this call to the pipeline.
Returns:
Dict with a format according to lib.schema (format.schema.json). Depending on which parts of the pipeline are called, the required input and expected output will differ.
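The requirement rule from the table (a required step is either executed in the same call or already present in the input data) can be sketched as a small check. This is not the pipeline's own validation code. Apart from "string_to_sentences", the step identifiers are hypothetical, the sketch assumes STS and PC abbreviate string_to_sentences and post-correction, and it treats the footnoted "yes [1]" cells as plain requirements while ignoring the "depends" and "preferably" cells.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical step identifiers; only "string_to_sentences" appears in the docs.
STS = "string_to_sentences"
PC = "post_correction"
NER_BERT = "ner_bert"
NER_LISTS = "ner_lists"
MODERNISATION = "modernisation"

# Hard requirements read off the table (simplified, see the note above).
REQUIRES: Dict[str, Tuple[str, ...]] = {
    STS: (),
    PC: (STS,),
    NER_BERT: (STS, PC),
    NER_LISTS: (STS, PC),
    MODERNISATION: (STS, PC),
}


def missing_requirements(
    steps: Iterable[str], already_done: Iterable[str] = ()
) -> Dict[str, List[str]]:
    """Map each requested step to the required steps that are not available."""
    available = set(already_done)  # results already present in the input data
    missing: Dict[str, List[str]] = {}
    for step in steps:
        lacking = [req for req in REQUIRES[step] if req not in available]
        if lacking:
            missing[step] = lacking
        available.add(step)  # executed in this call, available to later steps
    return missing


print(missing_requirements((STS, PC, NER_BERT)))
print(missing_requirements((NER_BERT,)))
print(missing_requirements((NER_BERT,), already_done=(STS, PC)))
```

An empty result means every requested step has its requirements met, either by an earlier step in the same call or by results carried in the input data.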
__init__(config)