3.12.1. sciexp2.common.text.Extractor

Methods

extract(text)

Apply extraction to given text.

class Extractor(template)

Bases: object

Extract a dict with the variable values that match a given template.

Variables and sections in the template define regular expressions, which follow Python’s regular expression syntax.

Parameters:
template : str

Template text that defines which variables to extract.

extract(text)

Apply extraction to given text.

Parameters:
text : str

Text to extract from.

Examples

You can perform simple text extractions, where variables correspond to the simple regex .+:

>>> e = Extractor('Hello {{a}}')
>>> e.extract('Hello world')
{'a': 'world'}
>>> e.extract('Hello 123!')
{'a': '123!'}

More complex regexes can be specified using section tags:

>>> Extractor('Hello {{#a}}[0-9]+{{/a}}.*').extract('Hello 123!')
{'a': 123}

Using the same variable in multiple tags ensures that they all match the same contents:

>>> extracted = Extractor('{{#a}}[0-9]+{{/a}}.*{{a}}{{b}}').extract('123-123456')
>>> extracted == {'a': 123, 'b': 456}
True
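
To make the matching rules in the examples above more concrete, here is a minimal sketch of how such a template could be translated into a regular expression with the standard re module. This is not the sciexp2 implementation. The names template_to_regex and sketch_extract are hypothetical, and the sketch assumes that the whole text must match the template and that text outside tags is used as a regex unchanged. It also returns every value as a string and does not model any type conversion of the extracted values.

```python
import re

# Hypothetical helpers for illustration only; they are not part of sciexp2.
# Matches either a section tag pair {{#name}}regex{{/name}} or a variable {{name}}.
_TAG = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}|\{\{(\w+)\}\}", re.DOTALL)


def template_to_regex(template):
    """Translate a template into a compiled regex with one named group per variable."""
    seen = set()
    parts = []
    pos = 0
    for m in _TAG.finditer(template):
        # Text between tags is kept unchanged, so it is interpreted as a regex.
        parts.append(template[pos:m.start()])
        section_name, section_body, var_name = m.groups()
        name = section_name or var_name
        if name in seen:
            # A repeated variable must match the same contents: use a backreference.
            parts.append(f"(?P={name})")
        elif section_name:
            # A section tag supplies its own regex for the variable.
            parts.append(f"(?P<{name}>{section_body})")
        else:
            # A plain variable corresponds to the simple regex .+
            parts.append(f"(?P<{name}>.+)")
        seen.add(name)
        pos = m.end()
    parts.append(template[pos:])
    return re.compile("".join(parts))


def sketch_extract(template, text):
    """Return a dict of variable values, or None when the text does not match."""
    m = template_to_regex(template).fullmatch(text)
    return m.groupdict() if m else None


print(sketch_extract('Hello {{a}}', 'Hello world'))
print(sketch_extract('{{#a}}[0-9]+{{/a}}.*{{a}}{{b}}', '123-123456'))
```

In this sketch the third template above becomes (?P<a>[0-9]+).*(?P=a)(?P<b>.+), which shows why the second occurrence of a is forced to repeat the digits captured by the first.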