Reference¶
The following sections describe the available classes, methods and properties of Greynir.
Separate sections describe grammar Nonterminals and Terminals.
The following classes are documented herein:
The Greynir class
The _Job class
The _Paragraph class
The _Sentence class
The NounPhrase class
The SimpleTree class
Initializing Greynir¶
After installing the reynir package (see Installation), use the following code to import it and initialize an instance of the Greynir class:
from reynir import Greynir
g = Greynir()
Now you can use the g instance to parse text by calling the Greynir.submit(), Greynir.parse() and/or Greynir.parse_single() methods on it. To tokenize text without parsing it, you can call Greynir.tokenize().
If you are only going to be using the NounPhrase class, you don’t need to initialize a Greynir instance.
The Greynir instance
It is recommended to initialize only one instance of the Greynir class for the duration of your program/process, since each instance needs to read its own configuration data. This includes the compressed Database of Modern Icelandic Inflection (BÍN) which occupies about 60 megabytes of memory. However, if you run Greynir in multiple processes, BÍN will – under most operating systems – only be mapped once into the computer’s physical address space.
The Greynir class¶
-
class
Greynir
¶ -
__init__
(self, **options)¶ - Parameters
options –
Tokenizer options can be passed via keyword arguments, as in g = Greynir(convert_numbers=True). See the documentation for the Tokenizer package for further information.
Additionally, if the parameter parse_foreign_sentences=True is given, the parser will attempt to parse all sentences, even those that seem to be in a foreign language. The default is not to try to parse sentences where >= 50% of the tokens are not found in DMII/BÍN.
Initializes the Greynir instance.
-
tokenize
(self, text: StringIterable) → Iterable[Tok]¶ - Parameters
text (StringIterable) – A string or an iterable of strings, containing the text to tokenize.
- Returns
A generator of tokenizer.Tok instances.
Tokenizes a string or an iterable of strings, returning a generator of tokenizer.Tok instances. The returned tokens include a val attribute populated with word meanings, lemmas and inflection paradigms from DMII/BÍN, or, in the case of person names, information about gender and case.
The tokenizer options given in the class constructor are automatically passed to the tokenizer.
-
parse_single
(self, sentence: str, *, max_sent_tokens: int = 90) → Optional[_Sentence]¶ - Parameters
sentence (str) – The single sentence to parse.
max_sent_tokens (int) – If given, this specifies the maximum number of tokens that a sentence may contain for Greynir to attempt to parse it. The default is 90 tokens. In practice, sentences longer than this are expensive to parse in terms of memory use and processor time. Specify a number higher than 90 to make Greynir attempt to parse longer sentences. Setting the parameter to None or zero disables the length limit. Note that the default may be increased from 90 in future versions of Greynir.
- Returns
A _Sentence object, or None if no sentence could be extracted from the string.
Parses a single sentence from a string and returns a corresponding _Sentence object.
The given sentence string is tokenized. An internal parse job is created and the first sentence found in the string is parsed. Paragraph markers are ignored.
A single _Sentence object is returned. If the sentence could not be parsed, _Sentence.tree is None and _Sentence.combinations is zero.
Example:
from reynir import Greynir
g = Greynir()
my_text = "Litla gula hænan fann fræ"
sent = g.parse_single(my_text)
if sent.tree is None:
    print("The sentence could not be parsed.")
else:
    print("The parse tree for '{0}' is:\n{1}"
        .format(sent.tidy_text, sent.tree.view))
Output:
The parse tree for 'Litla gula hænan fann fræ' is:
S0
+-S-MAIN
  +-IP
    +-NP-SUBJ
      +-lo_nf_et_kvk: 'Litla'
      +-lo_nf_et_kvk: 'gula'
      +-no_et_nf_kvk: 'hænan'
    +-VP
      +-VP
        +-so_1_þf_et_p3: 'fann'
      +-NP-OBJ
        +-no_et_þf_hk: 'fræ'
-
parse_tokens
(self, tokens: Iterable[Tok], *, max_sent_tokens: int = 90) → Optional[_Sentence]¶ - Parameters
tokens (Iterable[Tok]) – An iterable of tokens to parse.
max_sent_tokens (int) – A maximum number of tokens to attempt to parse. For longer sentences, an empty _Sentence object is returned, i.e. one where the tree attribute is None.
- Returns
A _Sentence object, or None if no sentence could be extracted from the token iterable.
Parses a single sentence from an iterable of tokens, and returns a corresponding _Sentence object. Except for the input parameter type, the functionality is identical to parse_single().
-
submit
(self, text: str, parse: bool = False, *, split_paragraphs: bool = False, progress_func: Callable[[float], None] = None, max_sent_tokens: int = 90) → _Job¶ Submits a text string to Greynir for parsing and returns a
_Job object.
- Parameters
text (str) – The text to parse. Can be a single sentence or multiple sentences.
parse (bool) – Controls whether the text is parsed immediately or upon demand. Defaults to False.
split_paragraphs (bool) – Indicates that the text should be split into paragraphs, with paragraph breaks at newline characters (\n). Defaults to False.
progress_func (Callable[[float], None]) – If given, this function will be called periodically during the parse job. The call will have a single float parameter, ranging from 0.0 at the beginning of the parse job, to 1.0 at the end. Defaults to None.
max_sent_tokens (int) – If given, this specifies the maximum number of tokens that a sentence may contain for Greynir to attempt to parse it. The default is 90 tokens. In practice, sentences longer than this are expensive to parse in terms of memory use and processor time. Specify a number higher than 90 to make Greynir attempt to parse longer sentences. Setting the parameter to None or zero disables the length limit. Note that the default may be increased from 90 in future versions of Greynir.
- Returns
A fresh _Job object.
The given text string is tokenized and split into paragraphs and sentences. If the parse parameter is True, the sentences are parsed immediately, before returning from the method. Otherwise, parsing is incremental (on demand) and is invoked by calling _Sentence.parse() explicitly on each sentence.
Returns a _Job object which supports iteration through the paragraphs (via _Job.paragraphs()) and sentences (via _Job.sentences() or _Job.__iter__()) of the parse job.
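The progress_func callback described above receives a single float between 0.0 and 1.0. The following is a minimal sketch of such a callback. The helper name make_progress_recorder is hypothetical and not part of Greynir, and the calls at the end are simulated so that the sketch runs without the reynir package installed.

```python
from typing import Callable, List, Tuple


def make_progress_recorder() -> Tuple[Callable[[float], None], List[int]]:
    """Return a callback usable as progress_func, plus the list it fills.

    Hypothetical helper, not part of Greynir.
    """
    percentages: List[int] = []

    def on_progress(ratio: float) -> None:
        # Clamp defensively to [0.0, 1.0] and convert to whole percent
        pct = int(round(max(0.0, min(1.0, ratio)) * 100))
        # Skip repeated values so each percentage is recorded once
        if not percentages or percentages[-1] != pct:
            percentages.append(pct)

    return on_progress, percentages


callback, seen = make_progress_recorder()

# With Greynir installed, the callback would be passed like this:
#   job = g.submit(my_text, parse=True, progress_func=callback)
# Here the calls are simulated instead.
for ratio in (0.0, 0.25, 0.25, 1.0):
    callback(ratio)

print(seen)
```

Keeping the callback cheap is probably wise, since it is called periodically while the parse job runs.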
-
parse
(self, text: str, *, progress_func: Callable[[float], None] = None, max_sent_tokens: int = 90) → dict¶ Parses a text string and returns a dictionary with the parse job results.
- Parameters
text (str) – The text to parse. Can be a single sentence or multiple sentences.
progress_func (Callable[[float], None]) – If given, this function will be called periodically during the parse job. The call will have a single float parameter, ranging from 0.0 at the beginning of the parse job, to 1.0 at the end. Defaults to None.
max_sent_tokens (int) – If given, this specifies the maximum number of tokens that a sentence may contain for Greynir to attempt to parse it. The default is 90 tokens. In practice, sentences longer than this are expensive to parse in terms of memory use and processor time. Specify a number higher than 90 to make Greynir attempt to parse longer sentences. Setting the parameter to None or zero disables the length limit. Note that the default may be increased from 90 in future versions of Greynir.
- Returns
A dictionary containing the parse results as well as statistics from the parse job.
The given text string is tokenized and split into sentences. An internal parse job is created and the sentences are parsed. The resulting _Sentence objects are returned in a list in the sentences field in the dictionary. The text is treated as one contiguous paragraph.
The result dictionary contains the following items:
sentences: A list of _Sentence objects corresponding to the sentences found in the text. If a sentence could not be parsed, the corresponding object’s tree property will be None.
num_sentences: The number of sentences found in the text.
num_parsed: The number of sentences that were successfully parsed.
ambiguity: A float weighted average of the ambiguity of the parsed sentences. Ambiguity is defined as the n-th root of the number of possible parse trees for the sentence, where n is the number of tokens in the sentence.
parse_time: A float with the wall clock time, in seconds, spent on tokenizing and parsing the sentences, including finding the best parse trees.
reduce_time: A float with the wall clock time, in seconds, spent on finding the best parse tree in each parse forest. This time is included in the parse_time.
Example (try it!):
from reynir import Greynir
g = Greynir()
my_text = "Litla gula hænan fann fræ. Það var hveitifræ."
d = g.parse(my_text)
print("{0} sentences were parsed".format(d["num_parsed"]))
for sent in d["sentences"]:
    print("The parse tree for '{0}' is:\n{1}"
        .format(
            sent.tidy_text,
            "[Null]" if sent.tree is None else sent.tree.flat
        )
    )
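The statistics in the result dictionary can be condensed into a short report. The sketch below assumes only the dictionary keys documented above. The summarize function is a hypothetical helper, and the dictionary is hand-built stand-in data, so the code runs without the reynir package.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> str:
    """Build a one-line summary from a Greynir.parse() result dictionary."""
    total = result["num_sentences"]
    parsed = result["num_parsed"]
    # Sentences whose tree is None could not be parsed
    failed = [s for s in result["sentences"] if s.tree is None]
    ratio = parsed / total if total else 0.0
    return "{0}/{1} parsed ({2:.0%}), {3} failed, ambiguity {4:.2f}".format(
        parsed, total, ratio, len(failed), result["ambiguity"]
    )


# Stand-in data with the documented keys. A real dictionary would come
# from d = g.parse(my_text) and hold _Sentence objects.
fake_result = {
    "sentences": [SimpleNamespace(tree=object()), SimpleNamespace(tree=None)],
    "num_sentences": 2,
    "num_parsed": 1,
    "ambiguity": 1.5,
    "parse_time": 0.1,
    "reduce_time": 0.02,
}
summary = summarize(fake_result)
print(summary)
```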
-
dumps_single
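(self, sent: _Sentence, **kwargs) → str¶ - Parameters
sent (_Sentence) – The _Sentence object to dump to a JSON string.
kwargs – Optional keyword parameters to be passed to the standard library’s json.dumps() function.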
- Returns
A JSON string.
Dumps a _Sentence object to a JSON string. Use Greynir.loads_single() to re-create a _Sentence instance from a JSON string.
-
loads_single
(self, json_str: str, **kwargs) → _Sentence¶ - Parameters
json_str (str) – The JSON string to load back into a _Sentence object.
kwargs – Optional keyword parameters to be passed to the standard library’s json.loads() function.
- Returns
A _Sentence object constructed from the JSON string.
Constructs a _Sentence instance from a JSON string.
-
classmethod
cleanup
(cls)¶ Deallocates memory resources allocated by
__init__().
If your code has finished using Greynir and you want to free up the memory allocated for its resources, including the 60 megabytes for the Database of Modern Icelandic Inflection (BÍN), call Greynir.cleanup().
After calling Greynir.cleanup() the functionality of Greynir is no longer available via existing instances of Greynir. However, you can initialize new instances (via g = Greynir()), causing the configuration to be re-read and memory to be allocated again.
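One way to make sure cleanup() is called once the work is done is to wrap the instance in a context manager. This is a sketch only. The parser_session helper is hypothetical, and FakeParser is a stand-in that merely has the same cleanup() classmethod shape, so the code runs without the reynir package. With Greynir installed, the Greynir class would be passed instead.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def parser_session(cls: Any, **options: Any) -> Iterator[Any]:
    """Yield an instance of cls, then call cls.cleanup() on exit.

    Hypothetical helper, not part of Greynir.
    """
    instance = cls(**options)
    try:
        yield instance
    finally:
        # Frees the shared resources. Existing instances must not be
        # used after this point.
        cls.cleanup()


class FakeParser:
    """Stand-in with a cleanup() classmethod, used for illustration."""

    cleaned_up = False

    def __init__(self, **options: Any) -> None:
        self.options = options

    @classmethod
    def cleanup(cls) -> None:
        cls.cleaned_up = True


# With Greynir installed: with parser_session(Greynir) as g: ...
with parser_session(FakeParser, convert_numbers=True) as g:
    print(g.options)

print(FakeParser.cleaned_up)
```

Because re-initialization re-reads the configuration, such a session is best opened once for the lifetime of the program rather than around each individual parse.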
-
The _Job class¶
Instances of this class are returned from Greynir.submit(). You should not need to instantiate it yourself, hence the leading underscore in the class name.
-
class
_Job
¶ -
paragraphs
(self) → Iterable[_Paragraph]¶ Returns a generator of
_Paragraph objects, corresponding to paragraphs in the parsed text. Paragraphs are assumed to be delimited by [[ and ]] markers in the text, surrounded by whitespace. These markers are optional. If they are not present, the text is assumed to be one contiguous paragraph.
Example:
from reynir import Greynir
g = Greynir()
my_text = ("[[ Þetta er fyrsta efnisgreinin. Hún er stutt. ]] "
    "[[ Hér er önnur efnisgreinin. Hún er líka stutt. ]]")
j = g.submit(my_text)
for pg in j.paragraphs():
    for sent in pg:
        print(sent.tidy_text)
    print()
Output:
Þetta er fyrsta efnisgreinin.
Hún er stutt.

Hér er önnur efnisgreinin.
Hún er líka stutt.
-
sentences
(self) → Iterable[_Sentence]¶ Returns a generator of
_Sentence objects. Each object corresponds to a sentence in the parsed text. If the sentence has already been successfully parsed, its _Sentence.tree property will contain its (best) parse tree. Otherwise, the property is None.
-
__iter__
(self) → Iterable[_Sentence]¶ A shorthand for calling
_Job.sentences(), supporting the Python iterator protocol. You can iterate through the sentences of a parse job via a for loop:
for sent in job:
    sent.parse()
    # Do something with sent
-
num_sentences
¶ Returns an
int
with the accumulated number of sentences that have been submitted for parsing via this job.
-
num_parsed
¶ Returns an
int
with the accumulated number of sentences that have been successfully parsed via this job.
-
num_tokens
¶ Returns an
int
with the accumulated number of tokens in sentences that have been submitted for parsing via this job.
-
num_combinations
¶ Returns an
int
with the accumulated number of parse tree combinations for the sentences that have been successfully parsed via this job.
-
ambiguity
¶ Returns a
float
with the weighted average ambiguity factor of the sentences that have been successfully parsed via this job. The ambiguity factor of a sentence is defined as the n-th root of the total number of parse tree combinations for the sentence, where n is the number of tokens in the sentence. The average across sentences is weighted by token count.
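The definition above can be worked through in a few lines. This is a sketch of the documented formula only, not Greynir’s own code. The function names are hypothetical, and returning 0.0 for an empty input is an assumption.

```python
from typing import Iterable, Tuple


def sentence_ambiguity(combinations: int, num_tokens: int) -> float:
    """n-th root of the number of parse tree combinations, n = token count."""
    return combinations ** (1.0 / num_tokens)


def weighted_ambiguity(sentences: Iterable[Tuple[int, int]]) -> float:
    """Token-weighted average over (combinations, num_tokens) pairs."""
    total_tokens = 0
    weighted_sum = 0.0
    for combinations, num_tokens in sentences:
        weighted_sum += sentence_ambiguity(combinations, num_tokens) * num_tokens
        total_tokens += num_tokens
    # Assumption: an empty job is reported as 0.0
    return weighted_sum / total_tokens if total_tokens else 0.0


# A 4-token sentence with 16 possible trees has ambiguity 16 ** (1/4) = 2.0;
# a 6-token sentence with a single tree has ambiguity 1.0.
single = sentence_ambiguity(16, 4)
average = weighted_ambiguity([(16, 4), (1, 6)])
print(single, average)
```

The weighted average here is (2.0 × 4 + 1.0 × 6) / 10 = 1.4, so longer sentences pull the figure toward their own ambiguity.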
-
parse_time
¶ Returns a
float
with the accumulated wall clock time, in seconds, that has been spent parsing sentences via this job.
-
The _Paragraph class¶
Instances of this class are returned from _Job.paragraphs(). You should not need to instantiate it yourself, hence the leading underscore in the class name.
-
class
_Paragraph
¶ -
sentences
(self) → Iterable[_Sentence]¶ Returns a generator of
_Sentence objects. Each object corresponds to a sentence within the paragraph in the parsed text. If the sentence has already been successfully parsed, its _Sentence.tree property will contain its (best) parse tree. Otherwise, the property is None.
-
__iter__
(self) → Iterable[_Sentence]¶ A shorthand for calling
_Paragraph.sentences(), supporting the Python iterator protocol. You can iterate through the sentences of a paragraph via a for loop:
for pg in job.paragraphs():
    for sent in pg:
        sent.parse()
        # Do something with sent
-
The _Sentence class¶
Instances of this class are returned from _Job.sentences() and _Job.__iter__(). You should not need to instantiate it yourself, hence the leading underscore in the class name.
-
class
_Sentence
¶ -
__len__
(self) → int¶ Returns an
int
with the number of tokens in the sentence.
-
text
¶ Returns a
str with the raw text representation of the sentence, with spaces between all tokens. For a more correctly formatted version of the text, use the _Sentence.tidy_text property instead.
Example:
from reynir import Greynir
g = Greynir()
s = g.parse_single("Jón - faðir Ásgeirs - átti 2/3 hluta "
    "af landinu árin 1944-1950.")
print(s.text)
Output (note the intervening spaces, also before the period at the end):
Jón - faðir Ásgeirs - átti 2/3 hluta af landinu árin 1944 - 1950 .
-
__str__
(self) → str¶ Returns a
str with the raw text representation of the sentence, with spaces between all tokens. For a more correctly formatted version of the text, use the _Sentence.tidy_text property instead.
-
tidy_text
¶ Returns a
str with a text representation of the sentence, with correct spacing between tokens, and em- and en-dashes substituted for regular hyphens as appropriate.
Example:
from reynir import Greynir
g = Greynir()
s = g.parse_single("Jón - faðir Ásgeirs - átti 2/3 hluta "
    "af landinu árin 1944-1950.")
print(s.tidy_text)
Output (note the dashes and the period at the end):
Jón — faðir Ásgeirs — átti 2/3 hluta af landinu árin 1944–1950.
-
tokens
¶ Returns a
list of tokens in the sentence. Each token is represented by a Tok namedtuple instance from the Tokenizer package.
Example:
from reynir import Greynir, TOK
g = Greynir()
s = g.parse_single("5. janúar sá Ása 5 sólir.")
for t in s.tokens:
    print(TOK.descr[t.kind], t.txt)
Output:
DATE 5. janúar
WORD sá
PERSON Ása
NUMBER 5
WORD sólir
PUNCTUATION .
-
parse
(self) → bool¶ Parses the sentence (unless it has already been parsed) and returns
True if at least one parse tree was found, or False otherwise. For successfully parsed sentences, _Sentence.tree contains the best parse tree. Otherwise, _Sentence.tree is None. If the parse is not successful, the 0-based index of the token where the parser gave up is stored in _Sentence.err_index.
-
error
¶ Returns a
ParseError instance if an error was found during the parsing of the sentence, or None otherwise. ParseError is an exception class, derived from Exception. It can be converted to str to obtain a human-readable error message.
-
err_index
¶ Returns an
int with the 0-based index of the token where the parser could not find any grammar production to continue the parse, or None if the sentence has not been parsed yet or if no error occurred during the parse.
-
combinations
¶ Returns an
int with the number of possible parse trees for the sentence, or 0 if no parse trees were found, or None if the sentence hasn’t been parsed yet.
-
score
¶ Returns an
int representing the score that the best parse tree got from the scoring heuristics of Greynir. The score is 0 if the sentence has not been successfully parsed.
-
tree
¶ Returns a
SimpleTree object representing the best (highest-scoring) parse tree for the sentence, in a simplified form that is easy to work with.
If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.
-
deep_tree
¶ Returns the best (highest-scoring) parse tree for the sentence, in a detailed form corresponding directly to Greynir’s context-free grammar for Icelandic.
If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.
Example:
from reynir import Greynir
g = Greynir()
s = g.parse_single("Ása sá sól.")
print(repr(s.deep_tree))
Output:
S0 Málsgrein MgrInnihald Yfirsetning HreinYfirsetning Setning Setning_et_p3_kvk BeygingarliðurÁnUmröðunar_et_p3_kvk NlFrumlag_nf_et_p3_kvk Nl_et_p3_nf_kvk NlEind_et_p3_nf_kvk NlStak_et_p3_nf_kvk NlStak_p3_et_nf_kvk NlKjarni_et_nf_kvk Fyrirbæri_nf_kvk 'Ása' -> no_et_nf_kvk BeygingarliðurMegin_et_p3_kvk SagnRuna_et_p3_kvk SagnRunaKnöpp_et_p3_kvk Sagnliður_et_p3_kvk Sögn_1_et_p3_kvk 'sá' -> so_1_þf_et_p3 NlBeintAndlag_þf Nl_þf NlEind_et_p3_þf_kvk NlStak_et_p3_þf_kvk NlStak_p3_et_þf_kvk NlKjarni_et_þf_kvk Fyrirbæri_þf_kvk 'sól' -> no_et_þf_kvk Lokatákn? Lokatákn '.' -> "."
-
flat_tree
¶ Returns the best (highest-scoring) parse tree for the sentence, simplified and flattened to a text string. Nonterminal scopes are delimited like so: NAME ... /NAME where NAME is the name of the nonterminal, for example NP for noun phrases and VP for verb phrases. Terminals have lower-case identifiers with their various grammar variants separated by underscores, e.g. no_þf_kk_et for a noun, accusative case, masculine gender, singular.
If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.
Example:
from reynir import Greynir
g = Greynir()
s = g.parse_single("Seldum fasteignum hefur fjölgað.")
print(s.flat_tree)
Output:
S0 S-MAIN IP NP-SUBJ lo_þgf_ft_kvk no_ft_þgf_kvk /NP-SUBJ VP VP-AUX so_et_p3 /VP-AUX VP so_sagnb /VP /VP /IP /S-MAIN p /S0
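Since the flat string uses NAME ... /NAME scopes, its nesting can be rebuilt with a small stack-based reader. The sketch below is not part of Greynir. The names parse_flat_tree and leaf_terminals are hypothetical, and the code assumes that nonterminal names start with an upper-case letter while terminals do not, as in the output above.

```python
from typing import List, Tuple, Union

# A node is either a terminal string or a (nonterminal name, children) pair
Node = Union[str, Tuple[str, List["Node"]]]


def parse_flat_tree(flat: str) -> Node:
    """Rebuild nesting from a flat tree string with NAME ... /NAME scopes."""
    root: List[Node] = []
    stack: List[Tuple[str, List[Node]]] = [("", root)]
    for tok in flat.split():
        if tok.startswith("/"):
            # Closing marker: must match the innermost open nonterminal
            name, _children = stack.pop()
            if name != tok[1:]:
                raise ValueError("Mismatched scope: " + tok)
        elif tok[0].isupper():
            # Opening marker: start a new nonterminal scope
            node: Tuple[str, List[Node]] = (tok, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:
            # Terminal: attach to the innermost open nonterminal
            stack[-1][1].append(tok)
    if len(stack) != 1 or len(root) != 1:
        raise ValueError("Unbalanced flat tree")
    return root[0]


def leaf_terminals(node: Node) -> List[str]:
    """Collect terminals in left-to-right order."""
    if isinstance(node, str):
        return [node]
    result: List[str] = []
    for child in node[1]:
        result.extend(leaf_terminals(child))
    return result


flat = ("S0 S-MAIN IP NP-SUBJ lo_þgf_ft_kvk no_ft_þgf_kvk /NP-SUBJ "
        "VP VP-AUX so_et_p3 /VP-AUX VP so_sagnb /VP /VP /IP /S-MAIN p /S0")
tree = parse_flat_tree(flat)
print(tree[0])
print(leaf_terminals(tree))
```

For most purposes the SimpleTree object in _Sentence.tree is the more convenient interface; a reader like this is mainly useful when only the stored flat string is available.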
-
terminals
¶ Returns a
list of the terminals in the best parse tree for the sentence, in the order in which they occur in the sentence (token order). Each terminal corresponds to a token in the sentence. The entry for each terminal is a typing.NamedTuple called Terminal, having five fields:
text: The token text.
lemma: The lemma of the word, if the token is a word, otherwise it is the text of the token. Lemmas of composite words include hyphens - at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
category: The word category (no for noun, so for verb, etc.)
variants: A list of the grammatical variants for the word or token, or an empty list if not applicable. The variants include the case (nf, þf, þgf, ef), gender (kvk, kk, hk), person, verb form, adjective degree, etc. This list is identical to the one returned from SimpleTree.all_variants for the terminal in question.
index: The index of the token that corresponds to this terminal. The index is 0-based.
If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.
Example:
from reynir import Greynir
g = Greynir()
s = g.parse_single("Þórgnýr fór út og fékk sér ís.")
for t in s.terminals:
    print("{0:8s} {1:8s} {2:8s} {3}"
        .format(t.text, t.lemma, t.category, ", ".join(t.variants)))
Output:
Þórgnýr  Þórgnýr  person   nf, kk
fór      fara     so       0, et, fh, gm, p3, þt
út       út       ao
og       og       st
fékk     fá       so       2, þgf, þf, et, fh, gm, p3, þt
sér      sig      abfn     þgf
ís       ís       no       et, kk, þf
.        .
(The line for fékk means that this is the verb (so) fá, having two arguments (2) in dative case (þgf) and accusative case (þf); it is singular (et), indicative (fh), active voice (gm), in the third person (p3), and in past tense (þt). See Variants for a detailed explanation.)
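The variants list makes it straightforward to filter terminals by grammatical features. The sketch below uses a local NamedTuple that mirrors the five documented fields, filled with the values from the example output above, so that it runs without the reynir package. The index values are assumed from counting the tokens, and the select function is a hypothetical helper.

```python
from typing import List, NamedTuple, Sequence


class Terminal(NamedTuple):
    """Local stand-in mirroring the five documented fields."""
    text: str
    lemma: str
    category: str
    variants: List[str]
    index: int


# Values copied from the example output; with Greynir installed,
# this list would be s.terminals
sample = [
    Terminal("Þórgnýr", "Þórgnýr", "person", ["nf", "kk"], 0),
    Terminal("fór", "fara", "so", ["0", "et", "fh", "gm", "p3", "þt"], 1),
    Terminal("út", "út", "ao", [], 2),
    Terminal("og", "og", "st", [], 3),
    Terminal("fékk", "fá", "so",
             ["2", "þgf", "þf", "et", "fh", "gm", "p3", "þt"], 4),
    Terminal("sér", "sig", "abfn", ["þgf"], 5),
    Terminal("ís", "ís", "no", ["et", "kk", "þf"], 6),
    Terminal(".", ".", "", [], 7),
]


def select(terminals: Sequence[Terminal], category: str,
           *required_variants: str) -> List[Terminal]:
    """Return terminals of a category that carry all the given variants."""
    wanted = set(required_variants)
    return [t for t in terminals
            if t.category == category and wanted.issubset(t.variants)]


# Verbs in past tense (þt), and nouns in accusative case (þf)
past_verbs = [t.lemma for t in select(sample, "so", "þt")]
accusative_nouns = [(t.text, t.index) for t in select(sample, "no", "þf")]
print(past_verbs)
print(accusative_nouns)
```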
-
lemmas
¶ Returns a
list of the lemmas of the words in the sentence, or the text of the token for non-word tokens. sent.lemmas is a shorthand for [ t.lemma for t in sent.terminals ].
Lemmas of composite words include hyphens - at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.
Example:
from reynir import Greynir
g = Greynir()
s = g.parse_single(
    "Gullsópur ehf. keypti árið 1984 verðlaunafasteignina "
    "að Laugavegi 26."
)
print(s.lemmas)
Output:
['gullsópur', 'ehf.', 'kaupa', 'árið 1984', 'verðlauna-fasteign', 'að', 'Laugavegur', '26', '.']
-
categories
¶ Returns a
list of the categories of the words in the sentence, or "" for non-word tokens. sent.categories is a shorthand for [ d.cat for d in sent.terminal_nodes ].
The categories returned are those of the token associated with each terminal, according to BÍN’s category scheme. Nouns (including person names) thus have categories of kk, kvk or hk, for masculine, feminine and neuter gender, respectively. Unrecognized words have the entity category.
If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.
Example:
from reynir import Greynir
g = Greynir()
s = g.parse_single(
    "Gullsópur ehf. keypti árið 1984 verðlaunafasteignina "
    "að Laugavegi 26."
)
print(s.categories)
Output:
['kk', 'hk', 'so', '', 'kvk', 'fs', 'kk', '', '']
-
lemmas_and_cats
¶ Returns a
list of (lemma, category) tuples corresponding to the tokens in the sentence. sent.lemmas_and_cats is a shorthand for [ (d.lemma, d.lemma_cat) for d in sent.terminal_nodes ].
For non-word tokens, the lemma is the original token text and the category is an empty string ("").
For person names, the category is person_kk, person_kvk or person_hk for masculine, feminine or neuter gender names, respectively. For unknown words, the category is entity.
Lemmas of composite words include hyphens - at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
This property is intended to be useful, among other things, for topic indexing of text. A good strategy for that purpose could be to index all lemmas having a non-empty category, perhaps also discarding some less significant categories (such as conjunctions).
If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.
Example:
from reynir import Greynir
g = Greynir()
s = g.parse_single(
    "Hallbjörn borðaði ísinn kl. 14 meðan Icelandair át 3 teppi "
    "frá Íran og Xochitl var tilbeðin."
)
print(s.lemmas_and_cats)
Output:
[('Hallbjörn', 'person_kk'), ('borða', 'so'), ('ís', 'kk'), ('kl. 14', ''), ('meðan', 'st'), ('Icelandair', 'entity'), ('éta', 'so'), ('3', ''), ('teppi', 'hk'), ('frá', 'fs'), ('Íran', 'hk'), ('og', 'st'), ('Xochitl', 'entity'), ('vera', 'so'), ('tilbiðja', 'so'), ('.', '')]
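The topic indexing strategy mentioned above can be sketched as a simple filter over the (lemma, category) pairs. The input below is the example output copied verbatim, so the code runs without the reynir package. The index_terms name is hypothetical, and the choice to skip conjunctions (st) and prepositions (fs) is an assumption made for illustration.

```python
from typing import Iterable, List, Tuple

# Categories treated as insignificant for indexing (an assumption):
# st = conjunctions, fs = prepositions
SKIP_CATEGORIES = frozenset({"st", "fs"})


def index_terms(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep lemmas with a non-empty category that is not skipped."""
    return [lemma for lemma, cat in pairs
            if cat and cat not in SKIP_CATEGORIES]


# With Greynir installed, this list would be s.lemmas_and_cats
pairs = [
    ("Hallbjörn", "person_kk"), ("borða", "so"), ("ís", "kk"),
    ("kl. 14", ""), ("meðan", "st"), ("Icelandair", "entity"),
    ("éta", "so"), ("3", ""), ("teppi", "hk"), ("frá", "fs"),
    ("Íran", "hk"), ("og", "st"), ("Xochitl", "entity"),
    ("vera", "so"), ("tilbiðja", "so"), (".", ""),
]
terms = index_terms(pairs)
print(terms)
```

A real index would likely drop very common verbs such as vera as well; which categories and lemmas count as insignificant depends on the application.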
-
terminal_nodes
¶ Returns a
list of the subtrees (SimpleTree instances) that correspond to terminals in the parse tree for this sentence, in the order in which they occur (token order).
If the sentence has not yet been parsed, or no parse tree was found for it, this property is None.
-
is_foreign
(self, min_icelandic_ratio: float = 0.5) → bool¶ - Parameters
min_icelandic_ratio (float) – The minimum ratio of word tokens that must be found in BÍN for a sentence to be considered Icelandic. Defaults to 0.5.
Returns True if the sentence is probably in a foreign language, i.e. not Icelandic. A sentence is probably foreign if it contains at least three word tokens and, out of those, fewer than 50% are found in the BÍN database. The 50% threshold is adjustable by overriding the min_icelandic_ratio parameter.
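The rule described above can be restated as a few lines of arithmetic. This is a paraphrase of the documented behaviour, not Greynir’s actual implementation. The function name looks_foreign and its count-based parameters are hypothetical.

```python
def looks_foreign(num_words: int, num_in_bin: int,
                  min_icelandic_ratio: float = 0.5) -> bool:
    """Apply the documented rule to token counts.

    num_words is the number of word tokens in the sentence and
    num_in_bin is how many of them are found in BÍN.
    """
    if num_words < 3:
        # Fewer than three word tokens: too short to call foreign
        return False
    return num_in_bin / num_words < min_icelandic_ratio


print(looks_foreign(2, 0))        # too short to judge
print(looks_foreign(4, 1))        # 25% found in BÍN
print(looks_foreign(4, 2))        # exactly 50% found
print(looks_foreign(4, 2, 0.75))  # stricter threshold
```

Raising min_icelandic_ratio makes the check stricter, so more sentences are classified as foreign.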
-
The NounPhrase class¶
The NounPhrase class conveniently encapsulates an Icelandic noun phrase (nafnliður), making it easy to obtain correctly inflected forms of the phrase, as required in various contexts.
-
class
NounPhrase
¶ -
__init__
(self, np_string: str, *, force_number: str = None)¶ Creates a
NounPhrase instance.
- Parameters
np_string (str) –
The text string containing the noun phrase (nafnliður). The noun phrase must conform to the grammar specified for the Nl nonterminal in Greynir.grammar. This grammar allows e.g. number, adjective and adverb prefixes, referential phrases (…sem…) and prepositional phrases (…í…). Examples of valid noun phrases include:
stóri kraftalegi maðurinn sem ég sá í bænum,
ofboðslega bragðgóði lakkrísinn í nýju umbúðunum, and
rúmlega 20 millilítrar af kardemommudropum með vanillu.
If the noun phrase cannot be parsed or is empty, the NounPhrase.parsed property will be False and all inflection properties will return None.
force_number (str) – An optional string that can contain "et" or "singular", or "ft" or "plural". If given, it forces the parsing of the noun phrase to be constrained to singular or plural forms, respectively. As an example, NounPhrase("eyjar", force_number="ft") yields a plural result (nominative "eyjar"), while NounPhrase("eyjar") without forcing yields a singular result (nominative "ey").
-
__str__
(self) → str¶ Returns the original noun phrase string as passed to the constructor.
-
__len__
(self) → int¶ Returns the length of the original noun phrase string.
-
__format__
(self, spec: str) → str¶ Formats a noun phrase in the requested inflection form. Works with Python’s
format() function as well as in f-strings (available starting with Python 3.6).
- Parameters
spec (str) –
An inflection specification for the string to be returned. This can be one of the following:
nf or nom: Nominative case (nefnifall).
þf or acc: Accusative case (þolfall).
þgf or dat: Dative case (þágufall).
ef or gen: Genitive case (eignarfall).
ángr or ind: Indefinite, nominative form (nefnifall án greinis).
stofn or can: Canonical, nominative singular form without attached prepositions or referential phrases (nefnifall eintölu án greinis, án forsetningarliða og tilvísunarsetninga).
- Returns
The noun phrase in the requested inflection form, as a string.
Example:
from reynir import NounPhrase as Nl
nl = Nl("blesóttu hestarnir mínir")
print("Hér eru {nl:nf}.".format(nl=nl))
print("Mér þykir vænt um {nl:þf}.".format(nl=nl))
print("Ég segi öllum frá {nl:þgf}.".format(nl=nl))
print("Ég vil tryggja velferð {nl:ef}.".format(nl=nl))
print("Já, {nl:ángr}, þannig er það.".format(nl=nl))
print("Umræðuefnið hér er {nl:stofn}.".format(nl=nl))
# Starting with Python 3.6, f-strings are supported:
print(f"Hér eru {nl:nf}.")  # etc.
Output:
Hér eru blesóttu hestarnir mínir.
Mér þykir vænt um blesóttu hestana mína.
Ég segi öllum frá blesóttu hestunum mínum.
Ég vil tryggja velferð blesóttu hestanna minna.
Já, blesóttir hestar mínir, þannig er það.
Umræðuefnið hér er blesóttur hestur minn.
-
parsed
¶ Returns
True if the noun phrase was successfully parsed, or False if not.
-
tree
¶ Returns a
SimpleTree
object encapsulating the parse tree for the noun phrase.
-
case
¶ Returns a string denoting the case of the noun phrase, as originally passed to the constructor. The case is one of "nf", "þf", "þgf" or "ef", denoting nominative, accusative, dative or genitive case, respectively. If the noun phrase could not be parsed, the property returns None.
-
number
¶ Returns a string denoting the number (singular/plural) of the noun phrase, as originally passed to the constructor. The number is either "et" (singular, eintala) or "ft" (plural, fleirtala). If the noun phrase could not be parsed, the property returns None.
-
person
¶ Returns a string denoting the person (1st, 2nd, 3rd) of the noun phrase, as originally passed to the constructor. The returned string is one of "p1", "p2" or "p3" for first, second or third person, respectively. If the noun phrase could not be parsed, the property returns None.
-
gender
¶ Returns a string denoting the gender (masculine, feminine, neuter) of the noun phrase, as originally passed to the constructor. The returned string is one of "kk", "kvk" or "hk" for masculine (karlkyn), feminine (kvenkyn) or neuter (hvorugkyn), respectively. If the noun phrase could not be parsed, the property returns None.
-
nominative
¶ Returns a string with the noun phrase in nominative case (nefnifall), or
None
if the noun phrase could not be parsed.
-
accusative
¶ Returns a string with the noun phrase in accusative case (þolfall), or
None
if the noun phrase could not be parsed.
-
dative
¶ Returns a string with the noun phrase in dative case (þágufall), or
None
if the noun phrase could not be parsed.
-
genitive
¶ Returns a string with the noun phrase in genitive case (eignarfall), or
None
if the noun phrase could not be parsed.
-
indefinite
¶ Returns a string with the noun phrase in indefinite form, nominative case (nefnifall án greinis), or
None
if the noun phrase could not be parsed.
-
canonical
¶ Returns a string with the noun phrase in singular, indefinite form, nominative case, where referential phrases (…sem…) and prepositional phrases (…í…) have been removed. If the noun phrase could not be parsed,
None
is returned.
-