.. _terminals:

Terminals
=========

This section lists the terminals that can occur within simplified sentence
trees, i.e. instances of the :py:class:`SimpleTree` class. The terminal
associated with a tree node is available in the
:py:attr:`SimpleTree.terminal` property.

A terminal node always corresponds to a single token from the input text.

A typical terminal string looks like this (for instance matching the word
*hestur*)::

    'no_kk_nf_et'   # Noun, masculine, nominative case, singular

The terminal category, i.e. the first part of the terminal name
(``no`` in the example), is available in the :py:attr:`SimpleTree.tcat`
property.

The grammatical variants of the terminal are stored in the list
:py:attr:`SimpleTree.variants`, which is ``[ 'kk', 'nf', 'et' ]`` in the
example. To obtain the entire set of variants (features) associated with a
word form, use the property :py:attr:`SimpleTree.all_variants`.

The terminal categories and grammatical variants are listed below.

.. _categories:

Word categories
---------------

+------------+---------------------------------------------------+
| no         | Noun (nafnorð)                                    |
+------------+---------------------------------------------------+
| so         | Verb (sagnorð)                                    |
+------------+---------------------------------------------------+
| lo         | Adjective (lýsingarorð)                           |
+------------+---------------------------------------------------+
| fs         | Preposition (forsetning)                          |
+------------+---------------------------------------------------+
| nhm        | Verb infinitive indicator (nafnháttarmerki, *að*) |
+------------+---------------------------------------------------+
| gr         | Definite article (laus greinir, *hinn/hin/hið*)   |
+------------+---------------------------------------------------+
| uh         | Interjection (upphrópun)                          |
+------------+---------------------------------------------------+
| ao         | Adverb (atviksorð)                                |
+------------+---------------------------------------------------+
| eo         | Qualifying adverb (atviksorð sem stendur með      |
|            | nafnorði í einkunn)                               |
+------------+---------------------------------------------------+
| st         | Conjunction (samtenging)                          |
+------------+---------------------------------------------------+
| stt        | Connective conjunction (sem/er-samtenging)        |
+------------+---------------------------------------------------+
| fn         | Pronoun (fornafn)                                 |
+------------+---------------------------------------------------+
| pfn        | Personal pronoun (persónufornafn)                 |
+------------+---------------------------------------------------+
| abfn       | Reflexive pronoun (afturbeygt fornafn)            |
+------------+---------------------------------------------------+
| person     | Person name (mannsnafn)                           |
+------------+---------------------------------------------------+
| sérnafn    | Proper name (sérnafn)                             |
+------------+---------------------------------------------------+
| entity     | Proper name of recognized named entity            |
+------------+---------------------------------------------------+
| fyrirtæki  | Company name (fyrirtækisnafn)                     |
+------------+---------------------------------------------------+
| gata       | Street name (götuheiti)                           |
+------------+---------------------------------------------------+
| to         | Number word, inflectable (beygjanlegt töluorð)    |
|            | Only *núll, einn, tveir, þrír, fjórir*            |
+------------+---------------------------------------------------+
| töl        | Number word, uninflectable (óbeygjanlegt töluorð) |
+------------+---------------------------------------------------+

Number categories
-----------------

+------------+---------------------------------------------------+
| tala       | Number                                            |
+------------+---------------------------------------------------+
| prósenta   | Percentage                                        |
+------------+---------------------------------------------------+
| ártal      | Year                                              |
+------------+---------------------------------------------------+
| raðnr      | Ordinal number                                    |
+------------+---------------------------------------------------+
| sequence   | Sequence: *1, 2, 3..., a, b, c..., i, ii, iii...* |
+------------+---------------------------------------------------+

Date and time categories
------------------------

+------------+---------------------------------------------------+
| dagsföst   | Absolute date (year, month, day)                  |
+------------+---------------------------------------------------+
| dagsafs    | Relative date                                     |
|            | (year, month, day - at least one value missing)   |
+------------+---------------------------------------------------+
| tími       | Time (hour, minute, second)                       |
+------------+---------------------------------------------------+
| tímapunktur| Time point                                        |
|            | (year, month, day, hour, minute, second)          |
+------------+---------------------------------------------------+

Other
-----

+---------------+------------------------------------------------+
| lén           | Domain name (*greynir.is*)                     |
+---------------+------------------------------------------------+
| myllumerki    | Hashtag (*#lífiðeryndislegt*)                  |
+---------------+------------------------------------------------+
| tölvupóstfang | E-mail address (*gervi@greynir.is*)            |
+---------------+------------------------------------------------+

Punctuation
-----------

+------------+---------------------------------------------------+
| grm        | Punctuation                                       |
+------------+---------------------------------------------------+

.. _variants:

Variants
========

This section lists the grammatical variants (features) that are included as
parts of terminal names, separated by underscores (``_``).

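
Because the category and the variants are joined by underscores, a terminal
name can be taken apart with ordinary string operations. The following is a
minimal sketch that mirrors what :py:attr:`SimpleTree.tcat` and
:py:attr:`SimpleTree.variants` expose. The helper ``split_terminal`` is
hypothetical, written for illustration only, and is not part of the library.

.. code-block:: python

    from typing import List, Tuple


    def split_terminal(terminal: str) -> Tuple[str, List[str]]:
        """Split a terminal name into (category, variants).

        Hypothetical helper: the first underscore-separated part is
        treated as the category, the remaining parts as variants.
        """
        category, *variants = terminal.split("_")
        return category, variants


    tcat, variants = split_terminal("no_kk_nf_et")
    print(tcat)      # no
    print(variants)  # ['kk', 'nf', 'et']

A terminal without variants, such as ``grm``, yields an empty variant list.
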
Gender
------

+------------+---------------------------------------------------+
| kk         | Masculine (karlkyn)                               |
+------------+---------------------------------------------------+
| kvk        | Feminine (kvenkyn)                                |
+------------+---------------------------------------------------+
| hk         | Neuter (hvorugkyn)                                |
+------------+---------------------------------------------------+

Number
------

+------------+---------------------------------------------------+
| et         | Singular (eintala)                                |
+------------+---------------------------------------------------+
| ft         | Plural (fleirtala)                                |
+------------+---------------------------------------------------+

Case
----

The *case* variants may occur with nouns, pronouns, adjectives and
prepositions, and with verb terminals that carry the ``lhþt`` or ``subj``
variant. For prepositions, the variant indicates the case that the
preposition governs.

+------------+---------------------------------------------------+
| nf         | Nominative (nefnifall)                            |
+------------+---------------------------------------------------+
| þf         | Accusative (þolfall)                              |
+------------+---------------------------------------------------+
| þgf        | Dative (þágufall)                                 |
+------------+---------------------------------------------------+
| ef         | Genitive (eignarfall)                             |
+------------+---------------------------------------------------+

Arguments
---------

Verb terminals, other than ``lhþt`` and ``subj``, indicate the number
and cases of the verb's arguments as follows::

    'so_0_et_p3_gm'         # No argument, singular/3rd person/active voice
    'so_1_þf_et_p3_gm'      # Same, but with one argument in accusative case
    'so_2_þgf_þf_et_p3_gm'  # Two arguments, dative and accusative

An example of a verb that matches the last terminal is *skrifaði* (wrote)
in the sentence *"Hann skrifaði konunni bréf"* ("He wrote a letter to the
woman").

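
The argument encoding can be decoded mechanically: the part right after
``so`` gives the argument count, and that many case variants follow it. The
sketch below illustrates this under stated assumptions. The function
``verb_argument_cases`` and the ``CASES`` mapping are hypothetical names, and
returning ``None`` for verb terminals without an argument count (such as
``lhþt`` and ``subj`` terminals) is a choice made for this example only.

.. code-block:: python

    from typing import List, Optional

    # Case variants as listed in the Case section of this document.
    CASES = {
        "nf": "nominative",
        "þf": "accusative",
        "þgf": "dative",
        "ef": "genitive",
    }


    def verb_argument_cases(terminal: str) -> Optional[List[str]]:
        """Return the argument cases encoded in a verb terminal.

        Hypothetical helper. Returns None if the terminal carries no
        argument count, i.e. its second part is not 0, 1 or 2.
        """
        parts = terminal.split("_")
        if parts[0] != "so":
            raise ValueError(f"not a verb terminal: {terminal!r}")
        if len(parts) < 2 or parts[1] not in ("0", "1", "2"):
            return None
        count = int(parts[1])
        cases = parts[2:2 + count]
        if len(cases) != count or any(c not in CASES for c in cases):
            raise ValueError(f"malformed argument list in {terminal!r}")
        return [CASES[c] for c in cases]


    print(verb_argument_cases("so_0_et_p3_gm"))         # []
    print(verb_argument_cases("so_1_þf_et_p3_gm"))      # ['accusative']
    print(verb_argument_cases("so_2_þgf_þf_et_p3_gm"))  # ['dative', 'accusative']
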
+------------+---------------------------------------------------+
| 0          | No argument                                       |
+------------+---------------------------------------------------+
| 1          | One argument, whose case is in the following      |
|            | variant                                           |
+------------+---------------------------------------------------+
| 2          | Two arguments, whose cases are in the following   |
|            | two variants                                      |
+------------+---------------------------------------------------+

Person
------

Occurs with verbs (``so`` terminal category) only.

+------------+---------------------------------------------------+
| p1         | First person *(Ég er / Við erum)*                 |
+------------+---------------------------------------------------+
| p2         | Second person *(Þú ert / Þið eruð)*               |
+------------+---------------------------------------------------+
| p3         | Third person *(Það er / Þau eru)*                 |
+------------+---------------------------------------------------+

Degree
------

Occurs with adjectives (``lo`` terminal category), and in the case of
``mst`` with certain adverbs (``ao`` terminal category).

+------------+---------------------------------------------------+
| mst        | Comparative *(stærri)*                            |
+------------+---------------------------------------------------+
| esb        | Superlative, indefinite *(maðurinn er stærstur)*  |
+------------+---------------------------------------------------+
| evb        | Superlative, definite *(stærsti maðurinn)*        |
+------------+---------------------------------------------------+

Adjective object case
---------------------

Occurs with adjectives (``lo`` terminal category) only.

+------------+---------------------------------------------------+
| sþf        | Accusative (viðstaddur *hátíðina*)                |
+------------+---------------------------------------------------+
| sþgf       | Dative (líkur *Páli*)                             |
+------------+---------------------------------------------------+
| sef        | Genitive (fullur *orku*)                          |
+------------+---------------------------------------------------+

Verb forms
----------

These variants occur with verbs (``so`` terminal category) only.

+------------+---------------------------------------------------------+
| gm         | Active voice (germynd)                                  |
+------------+---------------------------------------------------------+
| mm         | Middle voice (miðmynd)                                  |
+------------+---------------------------------------------------------+
| nh         | Infinitive (nafnháttur)                                 |
+------------+---------------------------------------------------------+
| fh         | Indicative (framsöguháttur)                             |
+------------+---------------------------------------------------------+
| bh         | Imperative (boðháttur)                                  |
+------------+---------------------------------------------------------+
| vh         | Subjunctive (viðtengingarháttur)                        |
+------------+---------------------------------------------------------+
| nt         | Present tense (nútíð)                                   |
+------------+---------------------------------------------------------+
| þt         | Past tense (þátíð)                                      |
+------------+---------------------------------------------------------+
| lh         | | Present participle (lýsingarháttur nútíðar)           |
|            | | (note that the ``nt`` variant will also be present)   |
+------------+---------------------------------------------------------+
| lhþt       | | Past participle (lýsingarháttur þátíðar)              |
|            | | (note that the ``þt`` variant will NOT be present)    |
+------------+---------------------------------------------------------+
| sagnb      | Supine (sagnbót)                                        |
+------------+---------------------------------------------------------+
| sb         | Indefinite (sterk beyging),                             |
|            | only occurs with ``lhþt``                               |
+------------+---------------------------------------------------------+
| vb         | Definite (veik beyging),                                |
|            | only occurs with ``lhþt``                               |
+------------+---------------------------------------------------------+
| op         | Impersonal verb (ópersónuleg sögn)                      |
+------------+---------------------------------------------------------+
| subj       | Verb that requires the subject's case to be             |
|            | non-nominative (sögn sem krefst frumlags í              |
|            | aukafalli)                                              |
+------------+---------------------------------------------------------+
| expl       | Expletive (leppur), matches verb forms that can be used |
|            | with an expletive (*það rignir*)                        |
+------------+---------------------------------------------------------+

Noun qualifiers
---------------

These variants occur with noun terminals (``no`` category) only.

+------------+---------------------------------------------------+
| gr         | Definite, attached to noun (viðskeyttur greinir   |
|            | með nafnorði)                                     |
+------------+---------------------------------------------------+
| abbrev     | Abbreviation (skammstöfun)                        |
+------------+---------------------------------------------------+

Word or lemma endings
---------------------

These variants constrain matching to word forms or lemmas with particular
endings only. They are used to detect certain kinds of grammatical errors.

+------------+---------------------------------------------------+
| xir        | Matches only words with lemmas that end with      |
|            | *ir* (e.g., *læknir*, *kælir*)                    |
+------------+---------------------------------------------------+
| zana       | Matches only word forms that end with             |
|            | *ana* (e.g., *flokkana*, *bílana*)                |
+------------+---------------------------------------------------+
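
The difference between the two ending variants is what they inspect:
``xir`` looks at the lemma, while ``zana`` looks at the inflected word form.
The sketch below illustrates that distinction only. The function
``matches_ending_variant`` is a hypothetical helper and does not reproduce
the parser's actual matching logic.

.. code-block:: python

    def matches_ending_variant(variant: str, word_form: str, lemma: str) -> bool:
        """Check an ending variant against a word form and its lemma.

        Hypothetical helper: 'xir' tests the lemma, 'zana' tests the
        word form, as described in the table above.
        """
        if variant == "xir":
            return lemma.endswith("ir")
        if variant == "zana":
            return word_form.endswith("ana")
        raise ValueError(f"unknown ending variant: {variant!r}")


    # 'lækni' is an inflected form of the lemma 'læknir'
    print(matches_ending_variant("xir", "lækni", "læknir"))      # True
    # 'flokkana' is an inflected form of the lemma 'flokkur'
    print(matches_ending_variant("zana", "flokkana", "flokkur"))  # True
    print(matches_ending_variant("zana", "flokkar", "flokkur"))   # False
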