Reference¶

The following sections describe the available classes, methods and properties of Greynir.

Separate sections describe grammar Nonterminals and Terminals.

The following classes are documented herein:

The Greynir class
The _Job class
The _Paragraph class
The _Sentence class
The NounPhrase class
The SimpleTree class

Initializing Greynir¶

After installing the reynir package (see Installation), use the following code to import it and initialize an instance of the Greynir class:

from reynir import Greynir
g = Greynir()

Now you can use the g instance to parse text, by calling the Greynir.submit(), Greynir.parse() and/or Greynir.parse_single() methods on it. To tokenize text without parsing it, you can call Greynir.tokenize().

If you are only going to be using the NounPhrase class, you don’t need to initialize a Greynir instance.

The Greynir instance

It is recommended to initialize only one instance of the Greynir class for the duration of your program/process, since each instance needs to read its own configuration data. This includes the compressed Database of Modern Icelandic Inflection (BÍN) which occupies about 60 megabytes of memory. However, if you run Greynir in multiple processes, BÍN will – under most operating systems – only be mapped once into the computer’s physical address space.

The Greynir class¶

class Greynir¶

__init__(self, **options)¶

Parameters

options –

Tokenizer options can be passed via keyword arguments, as in g = Greynir(convert_numbers=True). See the documentation for the Tokenizer package for further information.

Additionally, if the parameter parse_foreign_sentences=True is given, the parser will attempt to parse all sentences, even those that seem to be in a foreign language. The default is not to try to parse sentences where >= 50% of the tokens are not found in DMII/BÍN.

Initializes the Greynir instance.

tokenize(self, text: StringIterable) → Iterable[Tok]¶

Parameters: text (StringIterable) – A string or an iterable of strings, containing the text to tokenize.
Returns: A generator of tokenizer.Tok instances.

Tokenizes a string or an iterable of strings, returning a generator of tokenizer.Tok instances. The returned tokens include a val attribute populated with word meanings, lemmas and inflection paradigms from DMII/BÍN, or, in the case of person names, information about gender and case.

The tokenizer options given in the class constructor are automatically passed to the tokenizer.

parse_single(self, sentence: str, *, max_sent_tokens: int = 90) → Optional[_Sentence]¶

Parameters

sentence (str) – The single sentence to parse.
max_sent_tokens (int) – If given, this specifies the maximum number of tokens that a sentence may contain for Greynir to attempt to parse it. The default is 90 tokens. In practice, sentences longer than this are expensive to parse in terms of memory use and processor time. This parameter can be used to make Greynir more brave in its parsing attempts, by specifying a higher number than 90. Setting it to None or zero disables the length limit. Note that the default may be increased from 90 in future versions of Greynir.

Returns

A _Sentence object, or None if no sentence could be extracted from the string.

Parses a single sentence from a string and returns a corresponding _Sentence object.

The given sentence string is tokenized. An internal parse job is created and the first sentence found in the string is parsed. Paragraph markers are ignored.

A single _Sentence object is returned. If the sentence could not be parsed, _Sentence.tree is None and _Sentence.combinations is zero.

Example:

from reynir import Greynir
g = Greynir()
my_text = "Litla gula hænan fann fræ"
sent = g.parse_single(my_text)
if sent.tree is None:
    print("The sentence could not be parsed.")
else:
    print("The parse tree for '{0}' is:\n{1}"
        .format(sent.tidy_text, sent.tree.view))

Output:

The parse tree for 'Litla gula hænan fann fræ' is:
S0
+-S-MAIN
  +-IP
    +-NP-SUBJ
      +-lo_nf_et_kvk: 'Litla'
      +-lo_nf_et_kvk: 'gula'
      +-no_et_nf_kvk: 'hænan'
    +-VP
      +-VP
        +-so_1_þf_et_p3: 'fann'
      +-NP-OBJ
        +-no_et_þf_hk: 'fræ'

parse_tokens(self, tokens: Iterable[Tok], *, max_sent_tokens: int = 90) → Optional[_Sentence]¶

Parameters

tokens (Iterable[Tok]) – An iterable of tokens to parse.
max_sent_tokens (int) – A maximum number of tokens to attempt to parse. For longer sentences, an empty _Sentence object is returned, i.e. one where the tree attribute is None.

Returns

A _Sentence object, or None if no sentence could be extracted from the token iterable.

Parses a single sentence from an iterable of tokens, and returns a corresponding _Sentence object. Except for the input parameter type, the functionality is identical to parse_single().

submit(self, text: str, parse: bool = False, *, split_paragraphs: bool = False, progress_func: Callable[[float], None] = None, max_sent_tokens: int = 90) → _Job ¶

Submits a text string to Greynir for parsing and returns a _Job object.

Parameters

text (str) – The text to parse. Can be a single sentence or multiple sentences.
parse (bool) – Controls whether the text is parsed immediately or upon demand. Defaults to False.
split_paragraphs (bool) – Indicates that the text should be split into paragraps, with paragraph breaks at newline characters (\n). Defaults to False.
progress_func (Callable[[float],None]) – If given, this function will be called periodically during the parse job. The call will have a single float parameter, ranging from 0.0 at the beginning of the parse job, to 1.0 at the end. Defaults to None.
max_sent_tokens (int) – If given, this specifies the maximum number of tokens that a sentence may contain for Greynir to attempt to parse it. The default is 90 tokens. In practice, sentences longer than this are expensive to parse in terms of memory use and processor time. This parameter can be used to make Greynir more brave in its parsing attempts, by specifying a higher number than 90. Setting it to None or zero disables the length limit. Note that the default may be increased from 90 in future versions of Greynir.

Returns

A fresh _Job object.

The given text string is tokenized and split into paragraphs and sentences. If the parse parameter is True, the sentences are parsed immediately, before returning from the method. Otherwise, parsing is incremental (on demand) and is invoked by calling _Sentence.parse() explicitly on each sentence.

Returns a _Job object which supports iteration through the paragraphs (via _Job.paragraphs()) and sentences (via _Job.sentences() or _Job.__iter__()) of the parse job.

parse(self, text: str, *, progress_func: Callable[[float], None] = None, max_sent_tokens: int = 90) → dict¶

Parses a text string and returns a dictionary with the parse job results.

Parameters

text (str) – The text to parse. Can be a single sentence or multiple sentences.
progress_func (Callable[[float],None]) – If given, this function will be called periodically during the parse job. The call will have a single float parameter, ranging from 0.0 at the beginning of the parse job, to 1.0 at the end. Defaults to None.
max_sent_tokens (int) – If given, this specifies the maximum number of tokens that a sentence may contain for Greynir to attempt to parse it. The default is 90 tokens. In practice, sentences longer than this are expensive to parse in terms of memory use and processor time. This parameter can be used to make Greynir more brave in its parsing attempts, by specifying a higher number than 90. Setting it to None or zero disables the length limit. Note that the default may be increased from 90 in future versions of Greynir.

Returns

A dictionary containing the parse results as well as statistics from the parse job.

The given text string is tokenized and split into sentences. An internal parse job is created and the sentences are parsed. The resulting _Sentence objects are returned in a list in the sentences field in the dictionary. The text is treated as one contiguous paragraph.

The result dictionary contains the following items:

sentences: A list of _Sentence objects corresponding to the sentences found in the text. If a sentence could not be parsed, the corresponding object’s tree property will be None.
num_sentences: The number of sentences found in the text.
num_parsed: The number of sentences that were successfully parsed.
ambiguity: A float weighted average of the ambiguity of the parsed sentences. Ambiguity is defined as the n-th root of the number of possible parse trees for the sentence, where n is the number of tokens in the sentence.
parse_time: A float with the wall clock time, in seconds, spent on tokenizing and parsing the sentences, including finding the best parse trees.
reduce_time: A float with the wall clock time, in seconds, spent on finding the best parse tree in each parse forest. This time is included in the parse_time.

Example (try it!):

from reynir import Greynir
g = Greynir()
my_text = "Litla gula hænan fann fræ. Það var hveitifræ."
d = g.parse(my_text)
print("{0} sentences were parsed".format(d["num_parsed"]))
for sent in d["sentences"]:
    print("The parse tree for '{0}' is:\n{1}"
        .format(
            sent.tidy_text,
            "[Null]" if sent.tree is None else sent.tree.flat
        )
    )

dumps_single(self, sent: _Sentence, **kwargs) → str¶

Parameters

sent (_Sentence) – The _Sentence object to dump in JSON format.
kwargs – Optional keyword parameters to be passed to the standard library’s json.dumps() function.

Returns

A JSON string.

Dumps a _Sentence object to a JSON string. Use Greynir.loads_single() to re-create a _Sentence instance from a JSON string.

loads_single(self, json_str: str, **kwargs) → _Sentence ¶

Parameters

json_str (str) – The JSON string to load back into a _Sentence object.
kwargs – Optional keyword parameters to be passed to the standard library’s json.loads() function.

Returns

A _Sentence object constructed from the JSON string.

Constructs a _Sentence instance from a JSON string.

classmethod cleanup(cls)¶

Deallocates memory resources allocated by __init__().

If your code has finished using Greynir and you want to free up the memory allocated for its resources, including the 60 megabytes for the Database of Modern Icelandic Inflection (BÍN), call Greynir.cleanup().

After calling Greynir.cleanup() the functionality of Greynir is no longer available via existing instances of Greynir. However, you can initialize new instances (via g = Greynir()), causing the configuration to be re-read and memory to be allocated again.

The _Job class¶

Instances of this class are returned from Greynir.submit(). You should not need to instantiate it yourself, hence the leading underscore in the class name.

class _Job¶

paragraphs(self) → Iterable[_Paragraph]¶

Returns a generator of _Paragraph objects, corresponding to paragraphs in the parsed text. Paragraphs are assumed to be delimited by [[ and ]] markers in the text, surrounded by whitespace. These markers are optional and not required. If they are not present, the text is assumed to be one contiguous paragraph.

Example:

from reynir import Greynir
g = Greynir()
my_text = ("[[ Þetta er fyrsta efnisgreinin. Hún er stutt. ]] "
    "[[ Hér er önnur efnisgreinin. Hún er líka stutt. ]]")
j = g.submit(my_text)
for pg in j.paragraphs():
    for sent in pg:
        print(sent.tidy_text)
    print()

Output:

Þetta er fyrsta efnisgreinin.
Hún er stutt.

Hér er önnur efnisgreinin.
Hún er líka stutt.

sentences(self) → Iterable[_Sentence]¶: Returns a generator of _Sentence objects. Each object corresponds to a sentence in the parsed text. If the sentence has already been successfully parsed, its _Sentence.tree property will contain its (best) parse tree. Otherwise, the property is None.

__iter__(self) → Iterable[_Sentence]¶

A shorthand for calling _Job.sentences(), supporting the Python iterator protocol. You can iterate through the sentences of a parse job via a for loop:

for sent in job:
    sent.parse()
    # Do something with sent

num_sentences¶: Returns an int with the accumulated number of sentences that have been submitted for parsing via this job.

num_parsed¶: Returns an int with the accumulated number of sentences that have been sucessfully parsed via this job.

num_tokens¶: Returns an int with the accumulated number of tokens in sentences that have been submitted for parsing via this job.

num_combinations¶: Returns an int with the accumulated number of parse tree combinations for the sentences that have been successfully parsed via this job.

ambiguity¶: Returns a float with the weighted average ambiguity factor of the sentences that have been successfully parsed via this job. The ambiguity factor of a sentence is defined as the n-th root of the total number of parse tree combination for the sentence, where n is the number of tokens in the sentence. The average across sentences is weighted by token count.

parse_time¶: Returns a float with the accumulated wall clock time, in seconds, that has been spent parsing sentences via this job.

The _Paragraph class¶

Instances of this class are returned from _Job.paragraphs(). You should not need to instantiate it yourself, hence the leading underscore in the class name.

class _Paragraph¶

sentences(self) → Iterable[_Sentence]¶: Returns a generator of _Sentence objects. Each object corresponds to a sentence within the paragraph in the parsed text. If the sentence has already been successfully parsed, its _Sentence.tree property will contain its (best) parse tree. Otherwise, the property is None.

__iter__(self) → Iterable[_Sentence]¶

A shorthand for calling _Paragraph.sentences(), supporting the Python iterator protocol. You can iterate through the sentences of a paragraph via a for loop:

for pg in job.paragraphs():
    for sent in pg:
        sent.parse()
        # Do something with sent

The _Sentence class¶

Instances of this class are returned from _Job.sentences() and _Job.__iter__(). You should not need to instantiate it yourself, hence the leading underscore in the class name.

class _Sentence¶

__len__(self) → int¶: Returns an int with the number of tokens in the sentence.

text¶

Returns a str with the raw text representation of the sentence, with spaces between all tokens. For a more correctly formatted version of the text, use the _Sentence.tidy_text property instead.

Example:

from reynir import Greynir
g = Greynir()
s = g.parse_single("Jón - faðir Ásgeirs - átti 2/3 hluta "
    "af landinu árin 1944-1950.")
print(s.text)

Output (note the intervening spaces, also before the period at the end):

Jón - faðir Ásgeirs - átti 2/3 hluta af landinu árin 1944 - 1950 .

__str__(self) → str¶: Returns a str with the raw text representation of the sentence, with spaces between all tokens. For a more correctly formatted version of the text, use the _Sentence.tidy_text property instead.

tidy_text¶

Returns a str with a text representation of the sentence, with correct spacing between tokens, and em- and en-dashes substituted for regular hyphens as appropriate.

Example:

from reynir import Greynir
g = Greynir()
s = g.parse_single("Jón - faðir Ásgeirs - átti 2/3 hluta "
    "af landinu árin 1944-1950.")
print(s.tidy_text)

Output (note the dashes and the period at the end):

Jón — faðir Ásgeirs — átti 2/3 hluta af landinu árin 1944–1950.

tokens¶

Returns a list of tokens in the sentence. Each token is represented by a Tok namedtuple instance from the Tokenizer package.

Example:

from reynir import Greynir, TOK
g = Greynir()
s = g.parse_single("5. janúar sá Ása 5 sólir.")
for t in s.tokens:
    print(TOK.descr[t.kind], t.txt)

outputs:

DATE 5. janúar
WORD sá
PERSON Ása
NUMBER 5
WORD sólir
PUNCTUATION .

parse(self) → bool¶: Parses the sentence (unless it has already been parsed) and returns True if at least one parse tree was found, or False otherwise. For successfully parsed sentences, _Sentence.tree contains the best parse tree. Otherwise, _Sentence.tree is None. If the parse is not successful, the 0-based index of the token where the parser gave up is stored in _Sentence.err_index.

error¶: Returns a ParseError instance if an error was found during the parsing of the sentence, or None otherwise. ParseError is an exception class, derived from Exception. It can be converted to str to obtain a human-readable error message.

err_index¶: Returns an int with the 0-based index of the token where the parser could not find any grammar production to continue the parse, or None if the sentence has not been parsed yet or if no error occurred during the parse.

combinations¶: Returns an int with the number of possible parse trees for the sentence, or 0 if no parse trees were found, or None if the sentence hasn’t been parsed yet.

score¶: Returns an int representing the score that the best parse tree got from the scoring heuristics of Greynir. The score is 0 if the sentence has not been successfully parsed.

tree¶

Returns a SimpleTree object representing the best (highest-scoring) parse tree for the sentence, in a simplified form that is easy to work with.

If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.

deep_tree¶

Returns the best (highest-scoring) parse tree for the sentence, in a detailed form corresponding directly to Greynir’s context-free grammar for Icelandic.

If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.

Example:

from reynir import Greynir
g = Greynir()
s = g.parse_single("Ása sá sól.")
print(repr(s.deep_tree))

Output:

S0
  Málsgrein
    MgrInnihald
      Yfirsetning
        HreinYfirsetning
          Setning
            Setning_et_p3_kvk
              BeygingarliðurÁnUmröðunar_et_p3_kvk
                NlFrumlag_nf_et_p3_kvk
                  Nl_et_p3_nf_kvk
                    NlEind_et_p3_nf_kvk
                      NlStak_et_p3_nf_kvk
                        NlStak_p3_et_nf_kvk
                          NlKjarni_et_nf_kvk
                            Fyrirbæri_nf_kvk
                              'Ása' -> no_et_nf_kvk
                BeygingarliðurMegin_et_p3_kvk
                  SagnRuna_et_p3_kvk
                    SagnRunaKnöpp_et_p3_kvk
                      Sagnliður_et_p3_kvk
                        Sögn_1_et_p3_kvk
                          'sá' -> so_1_þf_et_p3
                          NlBeintAndlag_þf
                            Nl_þf
                              NlEind_et_p3_þf_kvk
                                NlStak_et_p3_þf_kvk
                                  NlStak_p3_et_þf_kvk
                                    NlKjarni_et_þf_kvk
                                      Fyrirbæri_þf_kvk
                                        'sól' -> no_et_þf_kvk
      Lokatákn?
        Lokatákn
          '.' -> "."

flat_tree¶

Returns the best (highest-scoring) parse tree for the sentence, simplified and flattened to a text string. Nonterminal scopes are delimited like so: NAME ... /NAME where NAME is the name of the nonterminal, for example NP for noun phrases and VP for verb phrases. Terminals have lower-case identifiers with their various grammar variants separated by underscores, e.g. no_þf_kk_et for a noun, accusative case, masculine gender, singular.

If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.

Example:

from reynir import Greynir
g = Greynir()
s = g.parse_single("Seldum fasteignum hefur fjölgað.")
print(s.flat_tree)

Output:

S0 S-MAIN IP NP-SUBJ lo_þgf_ft_kvk no_ft_þgf_kvk /NP-SUBJ VP VP-AUX so_et_p3 /VP-AUX VP so_sagnb /VP /VP /IP /S-MAIN p /S0

terminals¶

Returns a list of the terminals in the best parse tree for the sentence, in the order in which they occur in the sentence (token order). Each terminal corresponds to a token in the sentence. The entry for each terminal is a typing.NamedTuple called Terminal, having five fields:

text: The token text.
lemma: The lemma of the word, if the token is a word, otherwise it is the text of the token. Lemmas of composite words include hyphens - at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
category: The word category (no for noun, so for verb, etc.)
variants: A list of the grammatical variants for the word or token, or an empty list if not applicable. The variants include the case (nf, þf, þgf, ef), gender (kvk, kk, hk), person, verb form, adjective degree, etc. This list identical to the one returned from SimpleTree.all_variants for the terminal in question.
index: The index of the token that corresponds to this terminal. The index is 0-based.

If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.

Example:

from reynir import Greynir
g = Greynir()
s = g.parse_single("Þórgnýr fór út og fékk sér ís.")
for t in s.terminals:
    print("{0:8s} {1:8s} {2:8s} {3}"
        .format(t.text, t.lemma, t.category,
            ", ".join(t.variants)))

Output:

Þórgnýr  Þórgnýr  person   nf, kk
fór      fara     so       0, et, fh, gm, p3, þt
út       út       ao
og       og       st
fékk     fá       so       2, þgf, þf, et, fh, gm, p3, þt
sér      sig      abfn     þgf
ís       ís       no       et, kk, þf
.        .

(The line for fékk means that this is the verb (so) fá, having two arguments (2) in dative case (þgf) and accusative case (þf); it is singular (et), indicative (fh), active voice (gm), in the third person (p3), and in past tense (þt). See Variants for a detailed explanation.)

lemmas¶

Returns a list of the lemmas of the words in the sentence, or the text of the token for non-word tokens. sent.lemmas is a shorthand for [ t.lemma for t in sent.terminals ].

Lemmas of composite words include hyphens - at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.

If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.

Example:

from reynir import Greynir
g = Greynir()
s = g.parse_single(
    "Gullsópur ehf. keypti árið 1984 verðlaunafasteignina "
    "að Laugavegi 26."
)
print(s.lemmas)

Output:

['gullsópur', 'ehf.', 'kaupa', 'árið 1984', 'verðlauna-fasteign',
'að', 'Laugavegur', '26', '.']

categories¶

Returns a list of the categories of the words in the sentence, or "" for non-word tokens. sent.categories is a shorthand for [ d.cat for d in sent.terminal_nodes ].

The categories returned are those of the token associated with each terminal, according to BÍN’s category scheme. Nouns (including person names) thus have categories of kk, kvk or hk, for masculine, feminine and neutral gender, respectively. Unrecognized words have the entity category.

If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.

Example:

from reynir import Greynir
g = Greynir()
s = g.parse_single(
    "Gullsópur ehf. keypti árið 1984 verðlaunafasteignina "
    "að Laugavegi 26."
)
print(s.categories)

Output:

['kk', 'hk', 'so', '', 'kvk', 'fs', 'kk', '', '']

lemmas_and_cats¶

Returns a list of (lemma, category) tuples corresponding to the tokens in the sentence. sent.lemmas_and_cats is a shorthand for [ (d.lemma, d.lemma_cat) for d in sent.terminal_nodes ].

For non-word tokens, the lemma is the original token text and the category is an empty string ("").

For person names, the category is person_kk, person_kvk or person_hk for masculine, feminine or neutral gender names, respectively. For unknown words, the category is entity.

Lemmas of composite words include hyphens - at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.

This property is intended to be useful inter alia for topic indexing of text. A good strategy for that purpose could be to index all lemmas having a non-empty category, perhaps also discarding some less significant categories (such as conjunctions).

If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.

Example:

from reynir import Greynir
g = Greynir()
s = g.parse_single(
    "Hallbjörn borðaði ísinn kl. 14 meðan Icelandair át 3 teppi "
    "frá Íran og Xochitl var tilbeðin."
)
print(s.lemmas_and_cats)

Output:

[('Hallbjörn', 'person_kk'), ('borða', 'so'), ('ís', 'kk'),
('kl. 14', ''), ('meðan', 'st'), ('Icelandair', 'entity'),
('éta', 'so'), ('3', ''), ('teppi', 'hk'), ('frá', 'fs'),
('Íran', 'hk'), ('og', 'st'), ('Xochitl', 'entity'),
('vera', 'so'), ('tilbiðja', 'so'), ('.', '')]

terminal_nodes¶

Returns a list of the subtrees (SimpleTree instances) that correspond to terminals in the parse tree for this sentence, in the order in which they occur (token order).

If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.

is_foreign(self, min_icelandic_ratio: float = 0.5) → bool¶

Parameters: min_icelandic_ratio (float) – The minimum ratio of word tokens that must be found in BÍN for a sentence to be considered Icelandic. Defaults to 0.5.

Returns True if the sentence is probably in a foreign language, i.e. not Icelandic. A sentence is probably foreign if it contains at least three word tokens and, out of those, less than 50% are found in the BÍN database. The 50% threshold is adjustable by overriding the min_icelandic_ratio parameter.

The NounPhrase class¶

The NounPhrase class conveniently encapsulates an Icelandic noun phrase (nafnliður), making it easy to obtain correctly inflected forms of the phrase, as required in various contexts.

class NounPhrase¶

__init__(self, np_string: str, *, force_number: str = None)¶

Creates a NounPhrase instance.

Parameters

np_string (str) –
The text string containing the noun phrase (nafnliður). The noun phrase must conform to the grammar specified for the Nl nonterminal in Greynir.grammar. This grammar allows e.g. number, adjective and adverb prefixes, referential phrases (…sem…) and prepositional phrases (…í…). Examples of valid noun phrases include:
- stóri kraftalegi maðurinn sem ég sá í bænum,
- ofboðslega bragðgóði lakkrísinn í nýju umbúðunum, and
- rúmlega 20 millilítrar af kardemommudropum með vanillu.
If the noun phrase cannot be parsed or is empty, the NounPhrase.parsed property will be False and all inflection properties will return None.
force_number (str) – An optional string that can contain "et" or "singular", or "ft" or "plural". If given, it forces the parsing of the noun phrase to be constrained to singular or plural forms, respectively. As an example, NounPhrase("eyjar", force_number="ft") yields a plural result (nominative "eyjar"), while NounPhrase("eyjar") without forcing yields a singular result (nominative "ey").

__str__(self) → str¶: Returns the original noun phrase string as passed to the constructor.

__len__(self) → int¶: Returns the length of the original noun phrase string.

__format__(self, spec: str) → str¶

Formats a noun phrase in the requested inflection form. Works with Python’s format() function as well as in f-strings (available starting with Python 3.6).

Parameters

spec (str) –

An inflection specification for the string to be returned. This can be one of the following:

nf or nom: Nominative case (nefnifall).
þf or acc: Accusative case (þolfall).
þgf or dat: Dative case (þágufall).
ef or gen: Genitive case (eignarfall).
ángr or ind: Indefinite, nominative form (nefnifall án greinis).
stofn or can: Canonical, nominative singular form without attached prepositions or referential phrases (nefnifall eintölu án greinis, án forsetningarliða og tilvísunarsetninga).

Returns

The noun phrase in the requested inflection form, as a string.

Example:

from reynir import NounPhrase as Nl

nl = Nl("blesóttu hestarnir mínir")

print("Hér eru {nl:nf}.".format(nl=nl))
print("Mér þykir vænt um {nl:þf}.".format(nl=nl))
print("Ég segi öllum frá {nl:þgf}.".format(nl=nl))
print("Ég vil tryggja velferð {nl:ef}.".format(nl=nl))
print("Já, {nl:ángr}, þannig er það.".format(nl=nl))
print("Umræðuefnið hér er {nl:stofn}.".format(nl=nl))

# Starting with Python 3.6, f-strings are supported:
print(f"Hér eru {nl:nf}.")  # etc.

Output:

Hér eru blesóttu hestarnir mínir.
Mér þykir vænt um blesóttu hestana mína.
Ég segi öllum frá blesóttu hestunum mínum.
Ég vil tryggja velferð blesóttu hestanna minna.
Já, blesóttir hestar mínir, þannig er það.
Umræðuefnið hér er blesóttur hestur minn.

parsed¶: Returns True if the noun phrase was successfully parsed, or False if not.

tree¶: Returns a SimpleTree object encapsulating the parse tree for the noun phrase.

case¶: Returns a string denoting the case of the noun phrase, as originally passed to the constructor. The case is one of "nf", "þf", "þgf" or "ef", denoting nominative, accusative, dative or genitive case, respectively. If the noun phrase could not be parsed, the property returns None.

number¶: Returns a string denoting the number (singular/plural) of the noun phrase, as originally passed to the constructor. The number is either "et" (singular, eintala) or "ft" (plural, fleirtala). If the noun phrase could not be parsed, the property returns None.

person¶: Returns a string denoting the person (1st, 2nd, 3rd) of the noun phrase, as originally passed to the constructor. The returned string is one of "p1", "p2" or "p3" for first, second or third person, respectively. If the noun phrase could not be parsed, the property returns None.

gender¶: Returns a string denoting the gender (masculine, feminine, neutral) of the noun phrase, as originally passed to the constructor. The returned string is one of "kk", "kvk" or "hk" for masculine (karlkyn), feminine (kvenkyn) or neutral (hvorugkyn), respectively. If the noun phrase could not be parsed, the property returns None.

nominative¶: Returns a string with the noun phrase in nominative case (nefnifall), or None if the noun phrase could not be parsed.

accusative¶: Returns a string with the noun phrase in accusative case (þolfall), or None if the noun phrase could not be parsed.

dative¶: Returns a string with the noun phrase in dative case (þágufall), or None if the noun phrase could not be parsed.

genitive¶: Returns a string with the noun phrase in genitive case (eignarfall), or None if the noun phrase could not be parsed.

indefinite¶: Returns a string with the noun phrase in indefinite form, nominative case (nefnifall án greinis), or None if the noun phrase could not be parsed.

canonical¶: Returns a string with the noun phrase in singular, indefinite form, nominative case, where referential phrases (…sem…) and prepositional phrases (…í…) have been removed. If the noun phrase could not be parsed, None is returned.

Navigation

Related Topics

Reference¶

Initializing Greynir¶

The Greynir class¶

The _Job class¶

The _Paragraph class¶

The _Sentence class¶

The NounPhrase class¶