lib package

Subpackages

Submodules

lib.constants module

This module provides common constants to various other (sub)modules.

In particular, the location of the data is defined here, in one place. The definition is always looked up dynamically so that it can be overridden by tests, which run on smaller datasets.
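The difference between a dynamic lookup and a value copied at import time can be illustrated with a minimal sketch. The module object below is only a stand-in for lib.constants, and the names `DATA_DIR`, `data_path_dynamic` and `data_path_frozen` are hypothetical; they are not taken from the library.

```python
import types

# Hypothetical stand-in for lib.constants; DATA_DIR is an assumed name.
constants = types.ModuleType("constants")
constants.DATA_DIR = "/data/full"


def data_path_dynamic(filename):
    # The attribute is looked up at call time, so later overrides are seen.
    return f"{constants.DATA_DIR}/{filename}"


# Copying the value once (as `from constants import DATA_DIR` would) freezes it.
FROZEN_DATA_DIR = constants.DATA_DIR


def data_path_frozen(filename):
    return f"{FROZEN_DATA_DIR}/{filename}"


# A test overrides the location to point at a smaller dataset.
constants.DATA_DIR = "/data/test_small"

dynamic_result = data_path_dynamic("corpus.txt")  # follows the override
frozen_result = data_path_frozen("corpus.txt")    # still uses the old value
```

Under these assumptions, only the function that reads the attribute through the module at call time picks up the test override.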

lib.pipeline module

class lib.pipeline.Pipeline(config)

Bases: object

Responsible for the whole AI pipeline:

  1. String to sentences

  2. Post-correction

  3. NER using BERT

  4. NER using lists

  5. Modernisation (performed last so that named entities can be exempted from modernisation)

__call__(input_data, steps=None)
Parameters:
  • input_data (Union[str, List[List[Dict]]]) –

    Either a string of historical Dutch text, if the step “string_to_sentences” is involved, or a dictionary corresponding to the JSON schema if it is not. In the latter case, the input data must already contain the results of the earlier steps on which the remaining steps depend. The table below shows which steps are required and/or preferred to have been executed for each subsequent step. A required step must either be executed in the same call to the pipeline, or its results must already be present in the input data.

    Dependency on previous steps

    Step - requires    STS    PC         BERT
    Post-correction    yes
    NER using BERT     yes    yes [1]
    NER using Lists    yes    yes [1]    depends [2]
    Modernisation      yes    yes [1]    preferably [3]

  • steps (Optional[Tuple[str, …]]) – Which steps to execute within this call to the pipeline

Returns:

Dict with a format according to lib.schema (format.schema.json). Depending on which parts of the pipeline are called, the required input and expected output differ.
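The dependency table for __call__ can be read as a simple lookup, sketched below as a stand-alone check. This is not the library's implementation. Only “string_to_sentences” is a step name confirmed by this documentation; the other identifiers and the function `missing_requirements` are hypothetical. The sketch covers only the “yes” cells, and ignores both the footnote qualifications [1] to [3] and the “depends” and “preferably” cells.

```python
# Hard requirements taken from the "yes" cells of the dependency table.
# Only "string_to_sentences" is a confirmed step name; the rest are assumed.
REQUIRES = {
    "string_to_sentences": (),
    "post_correction": ("string_to_sentences",),
    "ner_bert": ("string_to_sentences", "post_correction"),
    "ner_lists": ("string_to_sentences", "post_correction"),
    "modernisation": ("string_to_sentences", "post_correction"),
}


def missing_requirements(steps, already_done=()):
    """Return {step: [unmet required steps]} for the requested steps.

    `already_done` lists steps whose results are present in the input data.
    """
    available = set(already_done)
    problems = {}
    for step in steps:
        unmet = [req for req in REQUIRES[step] if req not in available]
        if unmet:
            problems[step] = unmet
        # Results of this step are available to the steps that follow it.
        available.add(step)
    return problems


ok = missing_requirements(
    ("string_to_sentences", "post_correction", "ner_bert")
)
bad = missing_requirements(("ner_bert",), already_done=("string_to_sentences",))
```

Here `ok` is empty because every requirement is met within the same call, while `bad` reports that post-correction results are neither requested nor present in the input.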

__init__(config)

lib.schema module