The SimpleTree class¶
Instances of this class are returned from _Sentence.tree.
They describe a simplified parse tree or a part (subtree) thereof.
The tree can be queried in various ways for information about its
head (top) node, as well as about its children and contained subtrees.
-
class
SimpleTree
¶ -
is_terminal
¶ Returns True if this subtree corresponds to a grammar terminal (in which case it has no child subtrees), or False if it corresponds to a grammar nonterminal (in which case it has child subtrees).
-
tag
¶ Returns a str with the name of the nonterminal corresponding to the root of this tree or subtree. The tag may have subcategories, separated by a hyphen, e.g. NP-OBJ (Noun Phrase - OBJect).
-
terminal
¶ Returns a str with the terminal corresponding to this subtree. The terminal contains a category followed by any variants, separated by underscores, e.g. no_ef_ft_hvk for a noun, genitive case, plural, neuter gender.
-
variants
¶ Returns a list of the grammatical variants specified in the terminal corresponding to this subtree.
For example, if the terminal is no_ft_ef_hk, this property is ['ft', 'ef', 'hk'] for plural, genitive case, neuter gender.
This property only returns the variants that occur in the terminal name in the context-free grammar, and are thus significant in the parse. To obtain all applicable variants (features) of the associated word form, augmented with data from the Database of Modern Icelandic Inflection (BÍN), use the SimpleTree.all_variants property.
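The relationship between a terminal name, its category and its variants can be sketched in plain Python. The helper split_terminal below is hypothetical and not part of the reynir API; it only assumes that the parts of a terminal name are separated by underscores, as described above.

```python
def split_terminal(terminal: str) -> tuple[str, list[str]]:
    """Split a terminal name into its category and its list of variants."""
    # The first underscore-separated part is the category,
    # everything after it is a variant.
    category, *variants = terminal.split("_")
    return category, variants


category, variants = split_terminal("no_ft_ef_hk")
print(category)  # no
print(variants)  # ['ft', 'ef', 'hk']
```

A terminal without variants yields an empty variant list under this sketch.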
-
all_variants
¶ Returns a list of all grammatical variants (features) associated with this word form, as inferred from its associated grammar terminal and as augmented from the Database of Modern Icelandic Inflection (BÍN).
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Ása sá sól.")
    print(s.tree.S.IP.VP.VP[0].all_variants)
Output (the variants of the verb sá in the verb phrase):
    ['1', 'þf', 'et', 'fh', 'gm', 'p3', 'þt']
These are all the variants (features) of the verb form sá, in this case specifying that it has one argument in accusative case (þf), and that the verb itself is singular (et), indicative (fh), active voice (gm), third person (p3), and past tense (þt).
The variants fh, gm and þt are only returned from the SimpleTree.all_variants property, not from the SimpleTree.variants property, as they are not present in the terminal name in the grammar and are not significant when deriving the parse tree.
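The difference between the two properties can be illustrated without running the parser. The sketch below hard-codes the values from this page (the all_variants list above and the terminal so_1_þf_et_p3 that the view example shows for the same verb); the variable names are made up for illustration.

```python
# Values copied from the documentation examples, not computed by reynir
terminal = "so_1_þf_et_p3"
all_variants = ["1", "þf", "et", "fh", "gm", "p3", "þt"]

# Variants named in the terminal itself (everything after the category)
grammar_variants = terminal.split("_")[1:]

# Variants that are only contributed by BÍN, in their original order
bin_only = [v for v in all_variants if v not in grammar_variants]
print(grammar_variants)  # ['1', 'þf', 'et', 'p3']
print(bin_only)          # ['fh', 'gm', 'þt']
```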
-
tcat
¶ Returns a str with the terminal category corresponding to this subtree, e.g. no for nouns or fs for prepositions. If this subtree does not correspond to a terminal, an empty string is returned.
-
cat
¶ Returns a str with the BÍN word category (orðflokkur, ordfl) corresponding to this subtree, e.g. kk, kvk or hk for masculine, feminine or neuter nouns, respectively; verbs have the category so, etc. If this subtree does not correspond to a terminal, an empty string is returned.
-
fl
¶ Returns a str with the BÍN word subcategory (flokkur, fl) corresponding to this subtree, e.g. alm for general words, örn for place names, etc. If this subtree does not correspond to a terminal, an empty string is returned.
-
match_tag
(self, item: str) → bool¶ Checks whether the nonterminal corresponding to this subtree matches the given nonterminal identifier.
- Parameters
item (str) – The nonterminal tag to match. The match can be partial, i.e. the item NP matches the nonterminals NP-OBJ and NP-SUBJ as well as plain NP.
- Returns
True if the nonterminal matches, or False if not.
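One way to picture partial matching is as a comparison of hyphen-separated components. The function tag_matches below is a hypothetical sketch, not the library's implementation; it assumes that an item matches when its components form a prefix of the tag's components, which is consistent with the NP examples above but may differ from the actual rule in edge cases.

```python
def tag_matches(tag: str, item: str) -> bool:
    """Return True if item equals tag or is a hyphen-delimited prefix of it."""
    tag_parts = tag.split("-")
    item_parts = item.split("-")
    # Compare only as many components as the item specifies
    return tag_parts[: len(item_parts)] == item_parts


for tag in ("NP", "NP-OBJ", "NP-SUBJ", "VP"):
    print(tag, tag_matches(tag, "NP"))
```

Under this assumption a more specific item such as NP-OBJ does not match a plain NP tag.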
-
enclosing_tag
(self, item: str) → SimpleTree¶ Returns the innermost enclosing subtree that matches the given nonterminal tag, or None if no such tree is found.
- Parameters
item (str) – The nonterminal identifier to match. The match can be partial, i.e. the item NP matches the nonterminals NP-OBJ and NP-SUBJ as well as plain NP.
- Returns
A SimpleTree instance that corresponds to a nonterminal having a matching tag, or None.
-
children
¶ Returns a generator for the (immediate) child subtrees of this tree. The generator yields a
SimpleTree
instance for every child.
-
descendants
¶ Returns a generator of all descendants of this tree. The generator yields a
SimpleTree
instance for every child, recursively, using left-first traversal.
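The difference between children and descendants can be sketched with a small stand-in class. Node below is hypothetical and unrelated to the real SimpleTree implementation; the sketch also assumes that "left-first traversal" means depth-first order in which each node is yielded before its own children.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the reynir SimpleTree."""
    name: str
    kids: list["Node"] = field(default_factory=list)

    @property
    def children(self) -> Iterator["Node"]:
        # Immediate children only
        yield from self.kids

    @property
    def descendants(self) -> Iterator["Node"]:
        # Each child is yielded before its own descendants, left to right
        for child in self.kids:
            yield child
            yield from child.descendants


tree = Node("IP", [
    Node("NP-SUBJ", [Node("no_et_nf_kvk")]),
    Node("VP", [
        Node("VP", [Node("so_1_þf_et_p3")]),
        Node("NP-OBJ", [Node("no_et_þf_kvk")]),
    ]),
])
print([n.name for n in tree.children])
print([n.name for n in tree.descendants])
```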
-
span
¶ Returns a
(start, end)
tuple of token indices, pointing to the first and the last token spanned by this subtree.
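Because the tuple points to the first and the last token, both indices are inclusive. A minimal sketch of turning such a span into a slice follows; the token list and the span value are made up for illustration and are not produced by the parser here.

```python
tokens = ["Ása", "sá", "sól", "."]

# Hypothetical span of a subtree covering 'sá sól'
start, end = (1, 2)

# Both indices are inclusive, hence the + 1 when slicing
covered = tokens[start : end + 1]
print(covered)  # ['sá', 'sól']
```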
-
index
¶ Returns the index of the associated token, if this subtree corresponds to a terminal, or
None
otherwise.
-
parent
¶ Returns the parent of this subtree, or
None
if this is the root.
-
root
¶ Returns the root of this subtree, or the subtree itself if it is the root.
-
leaves
¶ Returns a generator of all leaf (terminal) descendants of this tree. The generator yields a SimpleTree instance for every leaf, in left-first traversal order. The output of the leaves generator can for instance be zipped with the tokens list of the parsed sentence, to obtain token-terminal matches, as both have the same number of elements:
    zip(sentence.tokens, sentence.tree.leaves)
-
view
¶ Returns a str representation of this subtree, in an easily viewable indented format with nodes separated by newlines.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Ása sá sól.")
    print(s.tree.view)
outputs:
    S0
    +-S-MAIN
      +-IP
        +-NP-SUBJ
          +-no_et_nf_kvk: 'Ása'
        +-VP
          +-VP
            +-so_1_þf_et_p3: 'sá'
          +-NP-OBJ
            +-no_et_þf_kvk: 'sól'
    +-'.'
-
flat
¶ Returns this subtree, simplified and flattened to a text string. Nonterminal scopes are delimited like so: NAME ... /NAME where NAME is the name of the nonterminal, for example NP for noun phrases and VP for verb phrases. Terminals have lower-case identifiers with their various grammar variants separated by underscores, e.g. no_þf_kk_et for a noun, accusative case, masculine gender, singular.
Note that the terminal identifiers returned by this property correspond directly with the Greynir grammar. If you want all grammatical variants (features) of the word that matched a terminal (for instance whether a verb was in the present (_nt) or past (_þt) tense), use the SimpleTree.flat_with_all_variants property instead.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Jón greiddi bænum 10 milljónir króna.")
    print(s.tree.flat)
Output (line breaks inserted):
    S0 S-MAIN IP
        NP-SUBJ person_nf_kk /NP-SUBJ
        VP
            VP so_2_þgf_þf_et_p3 /VP
            NP-IOBJ no_et_þgf_kk /NP-IOBJ
            NP-OBJ tala no_ft_kvk_þf no_ef_ft_kvk /NP-OBJ
        /VP
    /IP /S-MAIN p /S0
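Since nonterminal scopes are explicitly opened and closed, a flat string can be turned back into a nested structure with a stack. The function parse_flat below is a hypothetical sketch, not part of reynir; it assumes that every token starting with an upper-case letter opens a nonterminal scope, every token starting with / closes one, and everything else is a terminal.

```python
def parse_flat(flat: str) -> list:
    """Parse a flat tree string into nested [name, child, ...] lists."""
    holder: list = []  # artificial container for the root node
    stack = [holder]
    for token in flat.split():
        if token.startswith("/"):
            # Closing marker: must match the innermost open nonterminal
            if len(stack) == 1 or stack[-1][0] != token[1:]:
                raise ValueError(f"unbalanced scope marker: {token}")
            stack.pop()
        elif token[0].isupper():
            # Nonterminal: open a new scope
            node = [token]
            stack[-1].append(node)
            stack.append(node)
        else:
            # Terminal: a leaf in the current scope
            stack[-1].append(token)
    if len(stack) != 1 or len(holder) != 1:
        raise ValueError("expected exactly one complete tree")
    return holder[0]


# The flat string from the example above
flat = (
    "S0 S-MAIN IP NP-SUBJ person_nf_kk /NP-SUBJ "
    "VP VP so_2_þgf_þf_et_p3 /VP NP-IOBJ no_et_þgf_kk /NP-IOBJ "
    "NP-OBJ tala no_ft_kvk_þf no_ef_ft_kvk /NP-OBJ /VP /IP /S-MAIN p /S0"
)
tree = parse_flat(flat)
print(tree[0])   # S0
print(tree[-1])  # p
```

Nested lists are used only to keep the sketch short; when working with a parsed sentence, the SimpleTree object itself already provides this structure.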
-
flat_with_all_variants
¶ Returns this subtree, simplified and flattened to a text string. Nonterminal scopes are delimited like so: NAME ... /NAME where NAME is the name of the nonterminal, for example NP for noun phrases and VP for verb phrases. Terminals have lower-case identifiers with their various grammar variants separated by underscores, e.g. no_þf_kk_et_gr for a noun, accusative case, masculine gender, singular, with attached definite article (greinir).
For each terminal in the tree, this property returns all grammatical variants (features) of the word that matched the terminal. If you only need the variants that were actually specified in the Greynir grammar, use the SimpleTree.flat property instead.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Jón greiddi bænum 10 milljónir króna.")
    print(s.tree.flat_with_all_variants)
Output (line breaks inserted):
    S0 S-MAIN IP
        NP-SUBJ person_kk_nf /NP-SUBJ
        VP
            VP so_2_þgf_þf_et_fh_gm_p3_þt /VP
            NP-IOBJ no_et_gr_kk_þgf /NP-IOBJ
            NP-OBJ tala no_ft_kvk_þf no_ef_ft_kvk /NP-OBJ
        /VP
    /IP /S-MAIN p /S0
-
__getitem__
(self, item) → SimpleTree¶ Returns the specified child subtree of this tree.
- Parameters
item (str/int) – This can be either a nonterminal identifier (e.g. "S-MAIN"), in which case the first child having that nonterminal as its root is returned, or an int, in which case the child having the specified 0-based index is returned. A nonterminal match can be partial, i.e. the item NP matches the roots NP-OBJ and NP-SUBJ as well as plain NP.
- Returns
A SimpleTree instance for the indicated child subtree. If no such subtree is found, the exception KeyError (in the case of a nonterminal identifier) or IndexError (in the case of an integer index) is raised.
Example:
    from reynir import Greynir
    g = Greynir()
    my_text = "Prakkarinn Ása í Garðastræti sá tvær gular sólir."
    s = g.parse_single(my_text)
    print(s.tree[0]["IP"][1].lemmas)
outputs (the lemmas of the verb phrase):
    ['sjá', 'tveir', 'gulur', 'sól']
-
__getattr__
(self, name) → SimpleTree¶ Returns the specified child subtree of this tree.
- Parameters
name (str) – A nonterminal identifier (e.g. "NP"). The first child having that nonterminal as its root is returned. A nonterminal match can be partial, i.e. the item NP matches the roots NP-OBJ and NP-SUBJ as well as plain NP. An underscore in the identifier name matches a hyphen in the root nonterminal name.
- Returns
A SimpleTree instance for the indicated child subtree. If no such subtree is found, the exception KeyError is raised.
Example:
    from reynir import Greynir
    g = Greynir()
    my_text = "Prakkarinn Ása í Garðastræti sá sól."
    s = g.parse_single(my_text)
    print(s.tree.S_MAIN.IP.NP_SUBJ.lemmas)
outputs (the lemmas of the sentence’s subject, frumlag):
    ['prakkari', 'Ása', 'í', 'Garðastræti']
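The way index access and attribute access relate to each other can be sketched with a small stand-in class. Node below is hypothetical and is not how SimpleTree is implemented; it assumes partial matching on hyphen-separated components and maps underscores in attribute names to hyphens, as described for __getitem__ and __getattr__.

```python
class Node:
    """Hypothetical stand-in for a tree node; not the reynir SimpleTree."""

    def __init__(self, tag, *kids):
        self.tag = tag
        self.kids = list(kids)

    def __getitem__(self, item):
        if isinstance(item, int):
            # Integer index: IndexError is raised if it is out of range
            return self.kids[item]
        wanted = item.split("-")
        for kid in self.kids:
            # First child whose tag starts with the requested components
            if kid.tag.split("-")[: len(wanted)] == wanted:
                return kid
        raise KeyError(item)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        if name.startswith("_"):
            raise AttributeError(name)
        # An underscore in the attribute name stands for a hyphen in the tag
        return self[name.replace("_", "-")]


tree = Node("S0", Node("S-MAIN", Node("IP", Node("NP-SUBJ"), Node("VP"))))
print(tree.S_MAIN.IP.NP_SUBJ.tag)   # NP-SUBJ
print(tree["S-MAIN"]["IP"][1].tag)  # VP
print(tree.S.IP.NP.tag)             # NP-SUBJ (partial match)
```

Attribute access is convenient for fixed paths, while index access also covers positional lookups and tags that are only known at run time.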
-
text
¶ Returns a
str
with the raw text corresponding to this subtree, including its children, with spaces between tokens.
-
tidy_text
¶ Returns a str with the text corresponding to this subtree, including its children, spaced correctly by the tokenizer.correct_spaces() function.
-
own_text
¶ Returns a
str
with the raw text corresponding to the root of this subtree only, i.e. not including its children. For nonterminals, this is always an empty string. For terminals, it is the text of the corresponding token.
-
ifd_tags
¶ Returns a list of Icelandic Frequency Dictionary (IFD) part-of-speech tag strings corresponding to raw tokens contained within this subtree.
The IFD tagset is documented here.
The list returned from tree.ifd_tags has the same length as tree.text.split(). Note that it may be longer than the list of terminals within tree, and also longer than the list of tokens within tree, for instance in the case of person names, dates and amounts.
Example:
    from reynir import Greynir
    g = Greynir()
    t = "María Jónsdóttir skuldaði leigusalanum 10.000 krónur."
    s = g.parse_single(t)
    for token, ifd_tag in zip(s.text.split(), s.ifd_tags):
        print("{0:13}: {1:6}".format(token, ifd_tag))
outputs:
    María        : nven-m
    Jónsdóttir   : nven-m
    skuldaði     : sfg3eþ
    leigusalanum : nkeþg
    10.000       : tfvfo
    krónur       : nvfo
    .            : .
-
lemmas
¶ Returns a list of the word lemmas corresponding to terminals contained within this subtree. For terminals that correspond to non-word tokens, the original token text is included in the list.
Lemmas of compound words include hyphens (-) at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
-
lemma
¶ Returns a str containing a concatenation of the word lemmas corresponding to terminals contained within this subtree. For terminals that correspond to non-word tokens, the original token text is included in the string. The lemmas are separated by spaces.
Lemmas of compound words include hyphens (-) at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
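Going by the descriptions of lemmas and lemma, the string form should correspond to the list form joined by spaces. The sketch below uses plain Python values copied from this page rather than a parsed tree. It also shows one way to drop the component hyphens, under the assumption that a hyphen in a word lemma only ever marks a component boundary, which may not hold for every token.

```python
# The list shown in the __getitem__ example on this page
lemmas = ["sjá", "tveir", "gulur", "sól"]

# Space-separated form, as described for the lemma property
lemma = " ".join(lemmas)
print(lemma)  # sjá tveir gulur sól

# Removing the component boundary markers from a compound lemma
compound = "borgar-stjórnarmál"
plain = compound.replace("-", "")
print(plain)  # borgarstjórnarmál
```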
-
own_lemma
¶ Returns a str containing the word lemma corresponding to the root of this subtree only. For nonterminal roots, this returns an empty string.
Lemmas of compound words include hyphens (-) at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
-
nominative
¶ Returns a str containing the nominative form, if it exists, of the word corresponding to the root of this subtree only. If no nominative form exists, the word or token text is returned unchanged. For nonterminal roots, an empty string is returned.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Góðglaða karlana langar í hest.")
    print(" ".join(n.nominative for n in s.tree.descendants if n.is_terminal))
outputs:
    Góðglaðir karlarnir langar í hestur .
-
accusative
¶ Returns a str containing the accusative form, if it exists, of the word corresponding to the root of this subtree only. If no accusative form exists, the word or token text is returned unchanged. For nonterminal roots, an empty string is returned.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Góðglaða karlana langar í hest.")
    print(" ".join(n.accusative for n in s.tree.descendants if n.is_terminal))
outputs:
    Góðglaða karlana langar í hest .
-
dative
¶ Returns a str containing the dative form, if it exists, of the word corresponding to the root of this subtree only. If no dative form exists, the word or token text is returned unchanged. For nonterminal roots, an empty string is returned.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Góðglaða karlana langar í hest.")
    print(" ".join(n.dative for n in s.tree.descendants if n.is_terminal))
outputs:
    Góðglöðum körlunum langar í hesti .
-
genitive
¶ Returns a str containing the genitive form, if it exists, of the word corresponding to the root of this subtree only. If no genitive form exists, the word or token text is returned unchanged. For nonterminal roots, an empty string is returned.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Góðglaða karlana langar í hest.")
    print(" ".join(n.genitive for n in s.tree.descendants if n.is_terminal))
outputs:
    Góðglaðra karlanna langar í hests .
-
indefinite
¶ Returns a str containing the indefinite nominative form, if it exists, of the word corresponding to the root of this subtree only. If no such form exists, the word or token text is returned unchanged. For nonterminal roots, an empty string is returned.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Góðglaða karlana langar í hest.")
    print(" ".join(n.indefinite for n in s.tree.descendants if n.is_terminal))
outputs:
    Góðglaðir karlar langar í hestur .
-
canonical
¶ Returns a str containing the singular indefinite nominative form, if it exists, of the word corresponding to the root of this subtree only. If no such form exists, the word or token text is returned unchanged. For nonterminal roots, an empty string is returned.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Góðglaða karlana langar í hest.")
    print(" ".join(n.canonical for n in s.tree.descendants if n.is_terminal))
outputs:
    Góðglaður karl langar í hestur .
-
nominative_np
¶ Returns a str containing the text within the subtree, except that if the subtree root is a noun phrase (NP) nonterminal, that phrase is converted to nominative form (nefnifall).
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Ótrúlega frábærum bílstjórum "
        "þriggja góðglöðu alþingismannanna "
        "sem fóru út þykir þetta leiðinlegt.")
    print(s.tree.S_MAIN.IP.NP_SUBJ.nominative_np)
    print(s.tree.S_MAIN.IP.NP_SUBJ.NP_POSS.nominative_np)
outputs:
    Ótrúlega frábærir bílstjórar þriggja góðglöðu alþingismannanna sem fóru út
    þrír góðglöðu alþingismennirnir sem fóru út
-
accusative_np
¶ Returns a str containing the text within the subtree, except that if the subtree root is a noun phrase (NP) nonterminal, that phrase is converted to accusative form (þolfall).
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Ótrúlega frábærum bílstjórum "
        "þriggja góðglöðu alþingismannanna "
        "sem fóru út þykir þetta leiðinlegt.")
    print(s.tree.S_MAIN.IP.NP_SUBJ.accusative_np)
    print(s.tree.S_MAIN.IP.NP_SUBJ.NP_POSS.accusative_np)
outputs:
    Ótrúlega frábæra bílstjóra þriggja góðglöðu alþingismannanna sem fóru út
    þrjá góðglöðu alþingismennina sem fóru út
-
dative_np
¶ Returns a str containing the text within the subtree, except that if the subtree root is a noun phrase (NP) nonterminal, that phrase is converted to dative form (þágufall).
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Ótrúlega frábærum bílstjórum "
        "þriggja góðglöðu alþingismannanna "
        "sem fóru út þykir þetta leiðinlegt.")
    print(s.tree.S_MAIN.IP.NP_SUBJ.dative_np)
    print(s.tree.S_MAIN.IP.NP_SUBJ.NP_POSS.dative_np)
outputs:
    Ótrúlega frábærum bílstjórum þriggja góðglöðu alþingismannanna sem fóru út
    þremur góðglöðu alþingismönnunum sem fóru út
-
genitive_np
¶ Returns a str containing the text within the subtree, except that if the subtree root is a noun phrase (NP) nonterminal, that phrase is converted to genitive form (eignarfall).
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Ótrúlega frábærum bílstjórum "
        "þriggja góðglöðu alþingismannanna "
        "sem fóru út þykir þetta leiðinlegt.")
    print(s.tree.S_MAIN.IP.NP_SUBJ.genitive_np)
    print(s.tree.S_MAIN.IP.NP_SUBJ.NP_POSS.genitive_np)
outputs:
    Ótrúlega frábærra bílstjóra þriggja góðglöðu alþingismannanna sem fóru út
    þriggja góðglöðu alþingismannanna sem fóru út
-
indefinite_np
¶ Returns a str containing the text within the subtree, except that if the subtree root is a noun phrase (NP) nonterminal, that phrase is converted to indefinite nominative form (nefnifall án greinis). The determiner (laus greinir) and any demonstrative pronouns (ábendingarfornöfn) are cut off the front of the noun phrases in the conversion, if present. Adjectives are converted from definite (veik beyging) to indefinite forms (sterk beyging).
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Hinum ótrúlega frábæru bílstjórum "
        "þriggja góðglöðu alþingismannanna "
        "sem fóru út þykir þetta leiðinlegt.")
    print(s.tree.S_MAIN.IP.NP_SUBJ.indefinite_np)
    print(s.tree.S_MAIN.IP.NP_SUBJ.NP_POSS.indefinite_np)
outputs:
    ótrúlega frábærir bílstjórar þriggja góðglöðu alþingismannanna sem fóru út
    þrír góðglaðir alþingismenn sem fóru út
-
canonical_np
¶ Returns a str containing the text within the subtree, except that if the subtree root is a noun phrase (NP) nonterminal, that phrase is converted to singular indefinite nominative form (nefnifall eintölu án greinis). The determiner (laus greinir) and any demonstrative pronouns (ábendingarfornöfn) are cut off the front of the noun phrases in the conversion, if present. Also, associated possessive phrases and referential sentences are removed (mennina sem ég þekkti vel -> maður). Adjectives are converted from definite (veik beyging) to indefinite forms (sterk beyging).
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Hinum ótrúlega frábæru bílstjórum "
        "þriggja góðglöðu alþingismannanna "
        "sem fóru út þykir þetta leiðinlegt.")
    print(s.tree.S_MAIN.IP.NP_SUBJ.canonical_np)
    print(s.tree.S_MAIN.IP.NP_SUBJ.NP_POSS.canonical_np)
outputs:
    ótrúlega frábær bílstjóri
    góðglaður alþingismaður
-
nouns
¶ Returns a list of the lemmas of all nouns within this subtree, i.e. the root and all its descendants, recursively. The list is in left-traversal order.
Lemmas of compound words include hyphens (-) at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
-
verbs
¶ Returns a list of the lemmas of all verbs within this subtree, i.e. the root and all its descendants, recursively. The list is in left-traversal order.
Lemmas of compound words include hyphens (-) at the component boundaries. Examples: borgar-stjórnarmál, skugga-kosning.
-
persons
¶ Returns a list of the lemmas (in nominative case) of all person names within this subtree, i.e. the root and all its descendants, recursively. The list is in left-traversal order.
Example:
    from reynir import Greynir
    g = Greynir()
    my_text = "Eftir síðustu kosningar ræddi " \
        "Bjarni Benediktsson við Katrínu Jakobsdóttur " \
        "um myndun ríkisstjórnar."
    s = g.parse_single(my_text)
    print(s.tree.persons)
outputs:
    ['Bjarni Benediktsson', 'Katrín Jakobsdóttir']
-
entities
¶ Returns a
list
of the lemmas (in nominative case, as far as that can be established and is applicable) of all entity names within this subtree, i.e. the root and all its descendants, recursively. The list is in left-traversal order.
-
proper_names
¶ Returns a
list
of the lemmas (in nominative case, as far as that can be established and is applicable) of all proper names (sérnöfn) within this subtree, i.e. the root and all its descendants, recursively. The list is in left-traversal order.
-
match
(self, pattern: str) → bool¶ Checks whether this subtree matches the given pattern.
- Parameters
pattern (str) – The pattern to match against. For information about pattern specifications, see Patterns.
- Returns
True if this subtree matches the pattern, or False if not.
-
first_match
(self, pattern: str) → SimpleTree¶ Finds the first match of the given pattern within this subtree. The first match may be the subtree itself. If no match is found, returns None.
- Parameters
pattern (str) – The pattern to match against. For information about pattern specifications, see Patterns.
- Returns
A SimpleTree instance that matches the given pattern, or None.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Kristín málaði hús Steingríms")
    # Show the first possessive noun phrase ('Steingríms')
    print(s.tree.first_match("NP-POSS").nominative_np)
outputs:
    Steingrímur
-
all_matches
(self, pattern: str) → generator[SimpleTree]¶ Returns a generator of all matches of the given pattern within this subtree. The generator may yield the subtree itself, if it matches the pattern. Note that the search is recursive and exhaustive, so that matches within matching subtrees (for instance noun phrases within noun phrases) will also be returned.
- Parameters
pattern (str) – The pattern to match against. For information about pattern specifications, see Patterns.
- Returns
A generator of SimpleTree instances that match the given pattern.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Stóri feiti jólasveinninn beislaði "
        "fjögur sætustu hreindýrin og ók rauða vagninum "
        "með fjölda gjafa til spenntu barnanna sem biðu "
        "milli vonar og ótta.")
    print("\n".join(n.nominative_np for n in s.tree.all_matches("NP")))
outputs:
    Stóri feiti jólasveinninn
    fjögur sætustu hreindýrin
    rauði vagninn með fjölda gjafa til spenntu barnanna sem biðu milli vonar og ótta
    fjöldi gjafa til spenntu barnanna sem biðu milli vonar og ótta
    gjafir til spenntu barnanna sem biðu milli vonar og ótta
    spenntu börnin sem biðu milli vonar og ótta
Note that milli vonar og ótta is parsed as a fixed adverbial phrase. The nouns von and ótti are thus not included in the list of noun phrases.
Also note that rauði vagninn með fjölda gjafa til spenntu barnanna sem biðu milli vonar og ótta is a noun phrase containing nested noun phrases. SimpleTree.all_matches() returns all of them, including the nested ones. If you only want the outermost (top) matching subtree for a pattern, use SimpleTree.top_matches() instead.
-
top_matches
(self, pattern: str) → generator[SimpleTree]¶ Returns a generator of all topmost (enclosing) matches of the given pattern within this subtree. The generator may yield the subtree itself (only), if it matches the pattern. This search is different from SimpleTree.all_matches() in that it is recursive but not exhaustive, i.e. does not return matches within matches.
- Parameters
pattern (str) – The pattern to match against. For information about pattern specifications, see Patterns.
- Returns
A generator of SimpleTree instances that match the given pattern.
Example:
    from reynir import Greynir
    g = Greynir()
    s = g.parse_single("Stóri feiti jólasveinninn beislaði "
        "fjögur sætustu hreindýrin og ók rauða vagninum "
        "með fjölda gjafa til spenntu barnanna sem biðu "
        "milli vonar og ótta.")
    print("\n".join(n.nominative_np for n in s.tree.top_matches("NP")))
outputs:
    Stóri feiti jólasveinninn
    fjögur sætustu hreindýrin
    rauði vagninn með fjölda gjafa til spenntu barnanna sem biðu milli vonar og ótta
Note that rauði vagninn með fjölda gjafa til spenntu barnanna sem biðu milli vonar og ótta is a single noun phrase containing nested noun phrases. If you want all matching phrases for a pattern, including nested ones, use SimpleTree.all_matches() instead.
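The difference between an exhaustive and a topmost-only search can be sketched on a toy tree. Everything in the block below is hypothetical: Node is a stand-in for a tree node, and a plain predicate function takes the place of the pattern string, so this is not how the real pattern matcher works.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the reynir SimpleTree."""
    tag: str
    kids: list["Node"] = field(default_factory=list)


def all_matches(node: Node, pred: Callable[[Node], bool]) -> Iterator[Node]:
    # Exhaustive: report a match, then keep searching inside it
    if pred(node):
        yield node
    for kid in node.kids:
        yield from all_matches(kid, pred)


def top_matches(node: Node, pred: Callable[[Node], bool]) -> Iterator[Node]:
    # Not exhaustive: report a match and do not descend into it
    if pred(node):
        yield node
        return
    for kid in node.kids:
        yield from top_matches(kid, pred)


def is_np(node: Node) -> bool:
    return node.tag.split("-")[0] == "NP"


tree = Node("IP", [
    Node("NP-SUBJ"),
    Node("VP", [Node("NP-OBJ", [Node("PP", [Node("NP")])])]),
])
print([n.tag for n in all_matches(tree, is_np)])  # ['NP-SUBJ', 'NP-OBJ', 'NP']
print([n.tag for n in top_matches(tree, is_np)])  # ['NP-SUBJ', 'NP-OBJ']
```

The only difference between the two functions is whether the search continues below a node that has already matched.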
-