Terminals
This section lists the terminals that can occur within simplified sentence trees, i.e. instances of the SimpleTree class. The terminal associated with a tree node is available in the SimpleTree.terminal property.
A terminal node always corresponds to a single token from the input text.
A typical terminal string looks like this (for instance matching the word hestur):
'no_kk_nf_et' # Noun, masculine, nominative case, singular
The terminal category, i.e. the first part of the terminal name (no in the example), is available in the SimpleTree.tcat property. The grammatical variants of the terminal are stored in the list SimpleTree.variants, which is ['kk', 'nf', 'et'] in the example.
To obtain the entire set of variants (features) associated with a word form, use the property SimpleTree.all_variants.
The terminal categories and grammatical variants are listed below.
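As a rough illustration of the naming scheme, the split into category and variants can be reproduced with plain string handling. The helper split_terminal below is hypothetical and not part of the library; it only mirrors what the tcat and variants properties are described as returning for a terminal name.

```python
from typing import List, Tuple


def split_terminal(terminal: str) -> Tuple[str, List[str]]:
    """Split a terminal name into (category, variants).

    Hypothetical helper for illustration only; on a real tree node,
    use the SimpleTree.tcat and SimpleTree.variants properties.
    """
    category, *variants = terminal.split("_")
    return category, variants


tcat, variants = split_terminal("no_kk_nf_et")
print(tcat)      # no
print(variants)  # ['kk', 'nf', 'et']
```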
Word categories
no | Noun (nafnorð)
so | Verb (sagnorð)
lo | Adjective (lýsingarorð)
fs | Preposition (forsetning)
nhm | Verb infinitive indicator (nafnháttarmerki, að)
gr | Definite article (laus greinir, hinn/hin/hið)
uh | Exclamation (upphrópun)
ao | Adverb (atviksorð)
eo | Qualifying adverb (atviksorð sem stendur með nafnorði í einkunn)
st | Conjunction (samtenging)
stt | Connective conjunction (sem/er-samtenging)
fn | Pronoun (fornafn)
pfn | Personal pronoun (persónufornafn)
abfn | Reflexive pronoun (afturbeygt fornafn)
person | Person name (mannsnafn)
sérnafn | Proper name (sérnafn)
entity | Proper name of recognized named entity
fyrirtæki | Company name (fyrirtækisnafn)
gata | Street name (götuheiti)
to | Number word, inflectable (beygjanlegt töluorð); only núll, einn, tveir, þrír, fjórir
töl | Number word, uninflectable (óbeygjanlegt töluorð)
Number categories
tala | Number
prósenta | Percentage
ártal | Year
raðnr | Ordinal number
sequence | Sequence: 1, 2, 3…, a, b, c…, i, ii, iii…
Date and time categories
dagsföst | Absolute date (year, month, day)
dagsafs | Relative date (year, month, day; at least one value missing)
tími | Time (hour, minute, second)
tímapunktur | Time point (year, month, day, hour, minute, second)
Other
lén | Domain name, e.g. greynir.is
myllumerki | Hashtag, e.g. #lífiðeryndislegt
tölvupóstfang | E-mail address, e.g. gervi@greynir.is
Punctuation
grm | Punctuation
Variants
This section lists grammatical variants (features) that are included as parts of terminal names, separated by underscores (_).
Gender
kk | Masculine (karlkyn)
kvk | Feminine (kvenkyn)
hk | Neuter (hvorugkyn)
Number
et | Singular (eintala)
ft | Plural (fleirtala)
Case
The case variants may occur with nouns, pronouns, adjectives, prepositions and verbs (lhþt and subj). In the case of prepositions, the variant indicates which case the preposition controls.
nf | Nominative (nefnifall)
þf | Accusative (þolfall)
þgf | Dative (þágufall)
ef | Genitive (eignarfall)
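A minimal sketch of reading the case off a terminal name, assuming the case appears as one of the four variants listed above. The names CASES and terminal_case are made up for this example, and the preposition terminal fs_þgf is an illustrative name built from the rules in this section. The helper returns the first case variant it finds, which is a simplification.

```python
from typing import Optional

# Case variants and their English names, as listed in the table above
CASES = {
    "nf": "nominative",
    "þf": "accusative",
    "þgf": "dative",
    "ef": "genitive",
}


def terminal_case(terminal: str) -> Optional[str]:
    """Return the English name of the first case variant in a terminal
    name, or None if the terminal carries no case of its own.

    Hypothetical helper, not part of the library.
    """
    category, *variants = terminal.split("_")
    if category == "so" and not {"lhþt", "subj"} & set(variants):
        # For other verb terminals, case variants describe the verb's
        # arguments rather than the case of the verb form itself
        return None
    for variant in variants:
        if variant in CASES:
            return CASES[variant]
    return None


print(terminal_case("no_kk_nf_et"))       # nominative
print(terminal_case("fs_þgf"))            # dative (case the preposition controls)
print(terminal_case("so_1_þf_et_p3_gm"))  # None
```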
Arguments
Verb terminals, other than lhþt and subj, indicate the number and cases of the verb’s arguments as follows:
'so_0_et_p3_gm' # No argument, singular/3rd person/active voice
'so_1_þf_et_p3_gm' # Same, but with one argument in accusative case
'so_2_þgf_þf_et_p3_gm' # Two arguments, dative and accusative
An example of a verb that matches the last terminal would be skrifaði (wrote) in the sentence “Hann skrifaði konunni bréf” (“He wrote a letter to the woman”).
0 | No argument
1 | One argument, whose case is in the following variant
2 | Two arguments, whose cases are in the following two variants
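The argument encoding can be unpacked with a few lines of string handling. The sketch below assumes that the argument count, when present, is the first variant after the so category and is followed by one case variant per argument, as in the three examples above. The function verb_arguments is a hypothetical helper, not a library call.

```python
from typing import List, Tuple


def verb_arguments(terminal: str) -> Tuple[int, List[str]]:
    """Return (argument count, argument cases) for a verb terminal name.

    Hypothetical helper; assumes the count directly follows the 'so'
    category and is followed by one case variant per argument.
    """
    category, *variants = terminal.split("_")
    if category != "so":
        raise ValueError(f"not a verb terminal: {terminal!r}")
    if not variants or variants[0] not in ("0", "1", "2"):
        # e.g. lhþt and subj terminals, which carry no argument count
        return 0, []
    count = int(variants[0])
    return count, variants[1 : 1 + count]


print(verb_arguments("so_0_et_p3_gm"))         # (0, [])
print(verb_arguments("so_1_þf_et_p3_gm"))      # (1, ['þf'])
print(verb_arguments("so_2_þgf_þf_et_p3_gm"))  # (2, ['þgf', 'þf'])
```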
Person
Occurs with verbs (so terminal category) only.
p1 | First person (Ég er / Við erum)
p2 | Second person (Þú ert / Þið eruð)
p3 | Third person (Það er / Þau eru)
Degree
Occurs with adjectives (lo terminal category), and in the case of mst with certain adverbs (ao terminal category).
mst | Comparative (stærri)
esb | Superlative, indefinite (maðurinn er stærstur)
evb | Superlative, definite (stærsti maðurinn)
Adjective object case
Occurs with adjectives (lo terminal category) only.
sþf | Accusative (viðstaddur hátíðina)
sþgf | Dative (líkur Páli)
sef | Genitive (fullur orku)
Verb forms
These variants occur with verbs (so terminal category) only.
gm | Active voice (germynd)
mm | Middle voice (miðmynd)
nh | Infinitive (nafnháttur)
fh | Indicative (framsöguháttur)
bh | Imperative (boðháttur)
vh | Subjunctive (viðtengingarháttur)
nt | Present tense (nútíð)
þt | Past tense (þátíð)
lh | Present participle (lýsingarháttur nútíðar); note that the nt variant will also be present
lhþt | Past participle (lýsingarháttur þátíðar); note that the þt variant will NOT be present
sagnb | Supine (sagnbót)
sb | Indefinite (sterk beyging), only occurs with
vb | Definite (veik beyging), only occurs with
op | Impersonal verb (ópersónuleg sögn)
subj | Verb that requires the subject’s case to be non-nominative (sögn sem krefst frumlags í aukafalli)
expl | Expletive (leppur), matches verb forms that can be used with an expletive (það rignir)
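To show how these variants combine, the sketch below sorts a list of verb variants into voice, mood and tense. The grouping into three dictionaries and the function name describe_verb are choices made for this example only, and the variant list passed in is an illustrative combination rather than output taken from the parser.

```python
from typing import Dict, List

# Verb-form variants grouped by the feature they express
# (the grouping is an assumption of this example; the variant
# names and meanings are taken from the table above)
VOICE = {"gm": "active", "mm": "middle"}
MOOD = {
    "nh": "infinitive",
    "fh": "indicative",
    "bh": "imperative",
    "vh": "subjunctive",
}
TENSE = {"nt": "present", "þt": "past"}


def describe_verb(variants: List[str]) -> Dict[str, str]:
    """Map a list of verb variants to a dict of feature names."""
    features: Dict[str, str] = {}
    for variant in variants:
        if variant in VOICE:
            features["voice"] = VOICE[variant]
        elif variant in MOOD:
            features["mood"] = MOOD[variant]
        elif variant in TENSE:
            features["tense"] = TENSE[variant]
    if "lhþt" in variants:
        # Past participles do not carry the þt variant, so flag them here
        features["participle"] = "past"
    elif "lh" in variants:
        features["participle"] = "present"
    return features


print(describe_verb(["gm", "fh", "nt", "et", "p3"]))
# {'voice': 'active', 'mood': 'indicative', 'tense': 'present'}
```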
Noun qualifiers
These variants occur with noun terminals (no category) only.
gr | Definite, attached to noun (viðskeyttur greinir með nafnorði)
abbrev | Abbreviation (skammstöfun)
Word or lemma endings
These variants can be used to constrain matching to word forms or lemmas with particular endings only. They are used to detect certain forms of grammatical errors.
xir | Matches only words with lemmas that end with ir (e.g., læknir, kælir)
zana | Matches only word forms that end with ana (e.g., flokkana, bílana)
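The two ending constraints can be restated as a small check. This is only a sketch of what the table says, not how the parser itself implements the constraint; ENDING_VARIANTS and matches_ending are hypothetical names, and the caller is assumed to supply both the word form and its lemma.

```python
# The two ending variants from the table above: which string is tested
# ("lemma" or "form") and the suffix it must end with
ENDING_VARIANTS = {
    "xir": ("lemma", "ir"),
    "zana": ("form", "ana"),
}


def matches_ending(variant: str, word_form: str, lemma: str) -> bool:
    """Check whether a word satisfies an ending variant.

    Hypothetical helper that restates the table as code.
    """
    target, suffix = ENDING_VARIANTS[variant]
    text = lemma if target == "lemma" else word_form
    return text.endswith(suffix)


print(matches_ending("xir", "lækninum", "læknir"))    # True
print(matches_ending("zana", "flokkana", "flokkur"))  # True
print(matches_ending("zana", "flokkar", "flokkur"))   # False
```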