.. _reference:

Reference
=========

The following sections describe the available classes, methods
and properties of Greynir. Separate sections describe grammar
:ref:`nonterminals` and :ref:`terminals`.

The following classes are documented herein:

* The :py:class:`Greynir` class
* The :py:class:`_Job` class
* The :py:class:`_Paragraph` class
* The :py:class:`_Sentence` class
* The :py:class:`NounPhrase` class
* The :py:class:`SimpleTree` class

Initializing Greynir
--------------------

After installing the ``reynir`` package (see :ref:`installation`), use the
following code to import it and initialize an instance of
the :py:class:`Greynir` class::

    from reynir import Greynir
    g = Greynir()

Now you can use the ``g`` instance to parse text, by calling
the :py:meth:`Greynir.submit()`, :py:meth:`Greynir.parse()` and/or
:py:meth:`Greynir.parse_single()` methods on it. To tokenize text without
parsing it, you can call :py:meth:`Greynir.tokenize()`.

If you are only going to be using the :py:class:`NounPhrase` class,
you don't need to initialize a :py:class:`Greynir` instance.

.. topic:: The Greynir instance

    It is recommended to initialize **only one instance** of the Greynir class
    for the duration of your program/process, since each instance needs to read
    its own configuration data. This includes the compressed
    *Database of Modern Icelandic Inflection (BÍN)* which occupies about
    60 megabytes of memory. However, if you run Greynir in multiple processes,
    BÍN will -- under most operating systems -- only be mapped once into the
    computer's physical address space.

The Greynir class
-----------------

.. py:class:: Greynir

    .. py:method:: __init__(self, **options)

        :param options: Tokenizer options can be passed via keyword arguments,
            as in ``g = Greynir(convert_numbers=True)``. See the documentation
            for the `Tokenizer `__ package for further information.
            Additionally, if the parameter ``parse_foreign_sentences=True``
            is given, the parser will attempt to parse all sentences, even
            those that seem to be in a foreign language. The default is not
            to try to parse sentences where >= 50% of the tokens are not found
            in DMII/BÍN.

        Initializes the :py:class:`Greynir` instance.

    .. py:method:: tokenize(self, text: StringIterable) -> Iterable[Tok]

        :param StringIterable text: A string or an iterable of strings,
            containing the text to tokenize.

        :return: A generator of `tokenizer.Tok `__ instances.

        Tokenizes a string or an iterable of strings, returning a generator of
        `tokenizer.Tok `__ instances. The returned tokens include a ``val``
        attribute populated with word meanings, lemmas and inflection paradigms
        from DMII/BÍN, or, in the case of person names, information about
        gender and case.

        The tokenizer options given in the class constructor are automatically
        passed to the tokenizer.

    .. py:method:: parse_single( \
        self, sentence: str, *, \
        max_sent_tokens: int=90 \
        ) -> Optional[_Sentence]

        :param str sentence: The single sentence to parse.

        :param int max_sent_tokens: If given, this specifies the maximum number
            of tokens that a sentence may contain for Greynir to attempt to
            parse it. The default is 90 tokens. In practice, sentences longer
            than this are expensive to parse in terms of memory use and
            processor time. Specifying a higher number than 90 makes Greynir
            attempt to parse longer sentences. Setting it to ``None`` or zero
            disables the length limit. Note that the default may be increased
            from 90 in future versions of Greynir.

        :return: A :py:class:`_Sentence` object, or ``None`` if no sentence
            could be extracted from the string.

        Parses a single sentence from a string and returns a corresponding
        :py:class:`_Sentence` object.

        The given sentence string is tokenized. An internal parse job is
        created and the first sentence found in the string is parsed.
        Paragraph markers are ignored.
        A single :py:class:`_Sentence` object is returned. If the sentence
        could not be parsed, :py:attr:`_Sentence.tree` is ``None`` and
        :py:attr:`_Sentence.combinations` is zero.

        Example::

            from reynir import Greynir
            g = Greynir()
            my_text = "Litla gula hænan fann fræ"
            sent = g.parse_single(my_text)
            # parse_single() returns None if no sentence was found at all
            if sent is None or sent.tree is None:
                print("The sentence could not be parsed.")
            else:
                print("The parse tree for '{0}' is:\n{1}"
                    .format(sent.tidy_text, sent.tree.view))

        Output::

            The parse tree for 'Litla gula hænan fann fræ' is:
            S0
            +-S-MAIN
              +-IP
                +-NP-SUBJ
                  +-lo_nf_et_kvk: 'Litla'
                  +-lo_nf_et_kvk: 'gula'
                  +-no_et_nf_kvk: 'hænan'
                +-VP
                  +-VP
                    +-so_1_þf_et_p3: 'fann'
                  +-NP-OBJ
                    +-no_et_þf_hk: 'fræ'

    .. py:method:: parse_tokens( \
        self, tokens: Iterable[Tok], *, \
        max_sent_tokens: int=90 \
        ) -> Optional[_Sentence]

        :param Iterable[Tok] tokens: An iterable of tokens to parse.

        :param int max_sent_tokens: A maximum number of tokens to attempt to
            parse. For longer sentences, an empty :py:class:`_Sentence` object
            is returned, i.e. one where the ``tree`` attribute is ``None``.

        :return: A :py:class:`_Sentence` object, or ``None`` if no sentence
            could be extracted from the token iterable.

        Parses a single sentence from an iterable of tokens,
        and returns a corresponding :py:class:`_Sentence` object. Except for
        the input parameter type, the functionality is identical to
        :py:meth:`parse_single`.

    .. py:method:: submit( \
        self, text: str, parse: bool=False, *, \
        split_paragraphs: bool=False, \
        progress_func: Callable[[float], None]=None, \
        max_sent_tokens: int=90 \
        ) -> _Job

        Submits a text string to Greynir for parsing and returns
        a :py:class:`_Job` object.

        :param str text: The text to parse. Can be a single sentence
            or multiple sentences.

        :param bool parse: Controls whether the text is parsed immediately or
            upon demand. Defaults to ``False``.

        :param bool split_paragraphs: Indicates that the text should be
            split into paragraphs, with paragraph breaks at newline
            characters (``\n``). Defaults to ``False``.
        :param Callable[[float],None] progress_func: If given, this function
            will be called periodically during the parse job. The call will
            have a single ``float`` parameter, ranging from ``0.0`` at the
            beginning of the parse job, to ``1.0`` at the end.
            Defaults to ``None``.

        :param int max_sent_tokens: If given, this specifies the maximum number
            of tokens that a sentence may contain for Greynir to attempt to
            parse it. The default is 90 tokens. In practice, sentences longer
            than this are expensive to parse in terms of memory use and
            processor time. Specifying a higher number than 90 makes Greynir
            attempt to parse longer sentences. Setting it to ``None`` or zero
            disables the length limit. Note that the default may be increased
            from 90 in future versions of Greynir.

        :return: A fresh :py:class:`_Job` object.

        The given text string is tokenized and split into paragraphs and
        sentences. If the ``parse`` parameter is ``True``, the sentences are
        parsed immediately, before returning from the method.
        Otherwise, parsing is incremental (on demand) and is invoked by
        calling :py:meth:`_Sentence.parse()` explicitly on each sentence.

        Returns a :py:class:`_Job` object which supports iteration through
        the paragraphs (via :py:meth:`_Job.paragraphs()`) and sentences
        (via :py:meth:`_Job.sentences()` or :py:meth:`_Job.__iter__()`) of the
        parse job.

    .. py:method:: parse( \
        self, text: str, *, \
        progress_func: Callable[[float], None] = None, \
        max_sent_tokens: int=90 \
        ) -> dict

        Parses a text string and returns a dictionary with the parse job
        results.

        :param str text: The text to parse. Can be a single sentence
            or multiple sentences.

        :param Callable[[float],None] progress_func: If given, this function
            will be called periodically during the parse job. The call will
            have a single ``float`` parameter, ranging from ``0.0`` at the
            beginning of the parse job, to ``1.0`` at the end.
            Defaults to ``None``.
        :param int max_sent_tokens: If given, this specifies the maximum number
            of tokens that a sentence may contain for Greynir to attempt to
            parse it. The default is 90 tokens. In practice, sentences longer
            than this are expensive to parse in terms of memory use and
            processor time. Specifying a higher number than 90 makes Greynir
            attempt to parse longer sentences. Setting it to ``None`` or zero
            disables the length limit. Note that the default may be increased
            from 90 in future versions of Greynir.

        :return: A dictionary containing the parse results as well as
            statistics from the parse job.

        The given text string is tokenized and split into sentences. An
        internal parse job is created and the sentences are parsed. The
        resulting :py:class:`_Sentence` objects are returned in a list in the
        ``sentences`` field in the dictionary. The text is treated as one
        contiguous paragraph.

        The result dictionary contains the following items:

        * ``sentences``: A list of :py:class:`_Sentence` objects corresponding
          to the sentences found in the text. If a sentence could not be
          parsed, the corresponding object's ``tree`` property will be
          ``None``.

        * ``num_sentences``: The number of sentences found in the text.

        * ``num_parsed``: The number of sentences that were successfully
          parsed.

        * ``ambiguity``: A ``float`` weighted average of the ambiguity of the
          parsed sentences. Ambiguity is defined as the *n*-th root of the
          number of possible parse trees for the sentence, where *n* is the
          number of tokens in the sentence.

        * ``parse_time``: A ``float`` with the wall clock time, in seconds,
          spent on tokenizing and parsing the sentences, including finding
          the best parse trees.

        * ``reduce_time``: A ``float`` with the wall clock time, in seconds,
          spent on finding the best parse tree in each parse forest. This time
          is included in the ``parse_time``.

        Example *(try it!)*::

            from reynir import Greynir
            g = Greynir()
            my_text = "Litla gula hænan fann fræ. Það var hveitifræ."
            d = g.parse(my_text)
            print("{0} sentences were parsed".format(d["num_parsed"]))
            for sent in d["sentences"]:
                print("The parse tree for '{0}' is:\n{1}"
                    .format(
                        sent.tidy_text,
                        "[Null]" if sent.tree is None else sent.tree.flat
                    )
                )

    .. py:method:: dumps_single(self, sent: _Sentence, **kwargs) -> str

        :param _Sentence sent: The :py:class:`_Sentence` object to dump
            in JSON format.

        :param kwargs: Optional keyword parameters to be passed to the
            standard library's ``json.dumps()`` function.

        :return: A JSON string.

        Dumps a :py:class:`_Sentence` object to a JSON string. Use
        :py:meth:`Greynir.loads_single()` to re-create a :py:class:`_Sentence`
        instance from a JSON string.

    .. py:method:: loads_single(self, json_str: str, **kwargs) -> _Sentence

        :param str json_str: The JSON string to load back into a
            :py:class:`_Sentence` object.

        :param kwargs: Optional keyword parameters to be passed to the
            standard library's ``json.loads()`` function.

        :return: A :py:class:`_Sentence` object constructed from the
            JSON string.

        Constructs a :py:class:`_Sentence` instance from a JSON string.

    .. py:classmethod:: cleanup(cls)

        Deallocates memory resources allocated by :py:meth:`__init__`.

        If your code has finished using Greynir and you want to free up the
        memory allocated for its resources, including the 60 megabytes for the
        *Database of Modern Icelandic Inflection (BÍN)*,
        call :py:meth:`Greynir.cleanup()`.

        After calling :py:meth:`Greynir.cleanup()` the functionality of Greynir
        is no longer available via existing instances of :py:class:`Greynir`.
        However, you can initialize new instances (via ``g = Greynir()``),
        causing the configuration to be re-read and memory to be allocated
        again.

The _Job class
--------------

Instances of this class are returned from :py:meth:`Greynir.submit()`.
You should not need to instantiate it yourself, hence the leading underscore
in the class name.

.. py:class:: _Job

    .. py:method:: paragraphs(self) -> Iterable[_Paragraph]

        Returns a generator of :py:class:`_Paragraph` objects, corresponding
        to paragraphs in the parsed text. Paragraphs are assumed to be
        delimited by ``[[`` and ``]]`` markers in the text, surrounded by
        whitespace. These markers are optional. If they are not present,
        the text is assumed to be one contiguous paragraph.

        Example::

            from reynir import Greynir
            g = Greynir()
            my_text = ("[[ Þetta er fyrsta efnisgreinin. Hún er stutt. ]] "
                "[[ Hér er önnur efnisgreinin. Hún er líka stutt. ]]")
            j = g.submit(my_text)
            for pg in j.paragraphs():
                for sent in pg:
                    print(sent.tidy_text)
                print()

        Output::

            Þetta er fyrsta efnisgreinin.
            Hún er stutt.

            Hér er önnur efnisgreinin.
            Hún er líka stutt.

    .. py:method:: sentences(self) -> Iterable[_Sentence]

        Returns a generator of :py:class:`_Sentence` objects. Each object
        corresponds to a sentence in the parsed text. If the sentence has
        already been successfully parsed, its :py:attr:`_Sentence.tree`
        property will contain its (best) parse tree. Otherwise, the property
        is ``None``.

    .. py:method:: __iter__(self) -> Iterable[_Sentence]

        A shorthand for calling :py:meth:`_Job.sentences()`, supporting the
        Python iterator protocol. You can iterate through the sentences of a
        parse job via a ``for`` loop::

            for sent in job:
                sent.parse()
                # Do something with sent

    .. py:attribute:: num_sentences

        Returns an ``int`` with the accumulated number of sentences that have
        been submitted for parsing via this job.

    .. py:attribute:: num_parsed

        Returns an ``int`` with the accumulated number of sentences that have
        been successfully parsed via this job.

    .. py:attribute:: num_tokens

        Returns an ``int`` with the accumulated number of tokens in sentences
        that have been submitted for parsing via this job.

    .. py:attribute:: num_combinations

        Returns an ``int`` with the accumulated number of parse tree
        combinations for the sentences that have been successfully parsed
        via this job.

    .. py:attribute:: ambiguity

        Returns a ``float`` with the weighted average ambiguity factor of the
        sentences that have been successfully parsed via this job. The
        ambiguity factor of a sentence is defined as the *n*-th root of the
        total number of parse tree combinations for the sentence, where *n*
        is the number of tokens in the sentence. The average across sentences
        is weighted by token count.

    .. py:attribute:: parse_time

        Returns a ``float`` with the accumulated wall clock time, in seconds,
        that has been spent parsing sentences via this job.

The _Paragraph class
--------------------

Instances of this class are returned from :py:meth:`_Job.paragraphs()`.
You should not need to instantiate it yourself,
hence the leading underscore in the class name.

.. py:class:: _Paragraph

    .. py:method:: sentences(self) -> Iterable[_Sentence]

        Returns a generator of :py:class:`_Sentence` objects. Each object
        corresponds to a sentence within the paragraph in the parsed text.
        If the sentence has already been successfully parsed,
        its :py:attr:`_Sentence.tree` property will contain its (best) parse
        tree. Otherwise, the property is ``None``.

    .. py:method:: __iter__(self) -> Iterable[_Sentence]

        A shorthand for calling :py:meth:`_Paragraph.sentences()`, supporting
        the Python iterator protocol. You can iterate through the sentences of
        a paragraph via a ``for`` loop::

            for pg in job.paragraphs():
                for sent in pg:
                    sent.parse()
                    # Do something with sent

The _Sentence class
-------------------

Instances of this class are returned from :py:meth:`_Job.sentences()` and
:py:meth:`_Job.__iter__()`. You should not need to instantiate it yourself,
hence the leading underscore in the class name.

.. py:class:: _Sentence

    .. py:method:: __len__(self) -> int

        Returns an ``int`` with the number of tokens in the sentence.

    .. py:attribute:: text

        Returns a ``str`` with the raw text representation of the sentence,
        with spaces between all tokens.
        For a more correctly formatted version of the text, use the
        :py:attr:`_Sentence.tidy_text` property instead.

        Example::

            from reynir import Greynir
            g = Greynir()
            s = g.parse_single("Jón - faðir Ásgeirs - átti 2/3 hluta "
                "af landinu árin 1944-1950.")
            print(s.text)

        Output (note the intervening spaces, also before the period
        at the end)::

            Jón - faðir Ásgeirs - átti 2/3 hluta af landinu árin 1944 - 1950 .

    .. py:method:: __str__(self) -> str

        Returns a ``str`` with the raw text representation of the sentence,
        with spaces between all tokens. For a more correctly formatted version
        of the text, use the :py:attr:`_Sentence.tidy_text` property instead.

    .. py:attribute:: tidy_text

        Returns a ``str`` with a text representation of the sentence, with
        correct spacing between tokens, and em- and en-dashes substituted for
        regular hyphens as appropriate.

        Example::

            from reynir import Greynir
            g = Greynir()
            s = g.parse_single("Jón - faðir Ásgeirs - átti 2/3 hluta "
                "af landinu árin 1944-1950.")
            print(s.tidy_text)

        Output (note the dashes and the period at the end)::

            Jón — faðir Ásgeirs — átti 2/3 hluta af landinu árin 1944–1950.

    .. py:attribute:: tokens

        Returns a ``list`` of tokens in the sentence. Each token is represented
        by a ``Tok`` ``namedtuple`` instance from the ``Tokenizer`` package.

        Example::

            from reynir import Greynir, TOK
            g = Greynir()
            s = g.parse_single("5. janúar sá Ása 5 sólir.")
            for t in s.tokens:
                print(TOK.descr[t.kind], t.txt)

        outputs::

            DATE 5. janúar
            WORD sá
            PERSON Ása
            NUMBER 5
            WORD sólir
            PUNCTUATION .

    .. py:method:: parse(self) -> bool

        Parses the sentence (unless it has already been parsed) and returns
        ``True`` if at least one parse tree was found, or ``False`` otherwise.
        For successfully parsed sentences, :py:attr:`_Sentence.tree` contains
        the best parse tree. Otherwise, :py:attr:`_Sentence.tree` is ``None``.
        If the parse is not successful, the 0-based index of the token where
        the parser gave up is stored in :py:attr:`_Sentence.err_index`.

    .. py:attribute:: error

        Returns a ``ParseError`` instance if an error was found during the
        parsing of the sentence, or ``None`` otherwise. ``ParseError`` is an
        exception class, derived from ``Exception``. It can be converted to
        ``str`` to obtain a human-readable error message.

    .. py:attribute:: err_index

        Returns an ``int`` with the 0-based index of the token where the parser
        could not find any grammar production to continue the parse, or
        ``None`` if the sentence has not been parsed yet or if no error
        occurred during the parse.

    .. py:attribute:: combinations

        Returns an ``int`` with the number of possible parse trees for the
        sentence, or ``0`` if no parse trees were found, or ``None`` if the
        sentence hasn't been parsed yet.

    .. py:attribute:: score

        Returns an ``int`` representing the score that the best parse tree
        got from the scoring heuristics of Greynir. The score is ``0`` if
        the sentence has not been successfully parsed.

    .. py:attribute:: tree

        Returns a :py:class:`SimpleTree` object representing the best
        (highest-scoring) parse tree for the sentence, in a *simplified form*
        that is easy to work with.

        If the sentence has not yet been parsed, or no parse tree was found
        for it, this property is ``None``.

    .. py:attribute:: deep_tree

        Returns the best (highest-scoring) parse tree for the sentence, in a
        *detailed form* corresponding directly to Greynir's context-free
        grammar for Icelandic.

        If the sentence has not yet been parsed, or no parse tree was found
        for it, this property is ``None``.

        Example::

            from reynir import Greynir
            g = Greynir()
            s = g.parse_single("Ása sá sól.")
            print(repr(s.deep_tree))

        Output:

        .. code-block:: none

            S0 Málsgrein MgrInnihald Yfirsetning HreinYfirsetning Setning Setning_et_p3_kvk BeygingarliðurÁnUmröðunar_et_p3_kvk NlFrumlag_nf_et_p3_kvk Nl_et_p3_nf_kvk NlEind_et_p3_nf_kvk NlStak_et_p3_nf_kvk NlStak_p3_et_nf_kvk NlKjarni_et_nf_kvk Fyrirbæri_nf_kvk 'Ása' -> no_et_nf_kvk BeygingarliðurMegin_et_p3_kvk SagnRuna_et_p3_kvk SagnRunaKnöpp_et_p3_kvk Sagnliður_et_p3_kvk Sögn_1_et_p3_kvk 'sá' -> so_1_þf_et_p3 NlBeintAndlag_þf Nl_þf NlEind_et_p3_þf_kvk NlStak_et_p3_þf_kvk NlStak_p3_et_þf_kvk NlKjarni_et_þf_kvk Fyrirbæri_þf_kvk 'sól' -> no_et_þf_kvk Lokatákn? Lokatákn '.' -> "."

    .. py:attribute:: flat_tree

        Returns the best (highest-scoring) parse tree for the sentence,
        simplified and flattened to a text string. Nonterminal scopes are
        delimited like so: ``NAME ... /NAME`` where ``NAME`` is the name of
        the nonterminal, for example ``NP`` for noun phrases and ``VP`` for
        verb phrases. Terminals have lower-case identifiers with their
        various grammar variants separated by underscores, e.g.
        ``no_þf_kk_et`` for a noun, accusative case, masculine gender,
        singular.

        If the sentence has not yet been parsed, or no parse tree was found
        for it, this property is ``None``.

        Example::

            from reynir import Greynir
            g = Greynir()
            s = g.parse_single("Seldum fasteignum hefur fjölgað.")
            print(s.flat_tree)

        Output:

        .. code-block:: none

            S0 S-MAIN IP NP-SUBJ lo_þgf_ft_kvk no_ft_þgf_kvk /NP-SUBJ VP VP-AUX so_et_p3 /VP-AUX VP so_sagnb /VP /VP /IP /S-MAIN p /S0

    .. py:attribute:: terminals

        Returns a ``list`` of the terminals in the best parse tree for the
        sentence, in the order in which they occur in the sentence (token
        order). Each terminal corresponds to a token in the sentence. The
        entry for each terminal is a ``typing.NamedTuple`` called
        ``Terminal``, having five fields:

        0. **text**: The token text.

        1. **lemma**: The lemma of the word, if the token is a word, otherwise
           it is the text of the token. Lemmas of composite words include
           hyphens ``-`` at the component boundaries.
           Examples: ``borgar-stjórnarmál``, ``skugga-kosning``.

        2. **category**: The word :ref:`category ` (``no`` for noun, ``so``
           for verb, etc.)

        3. **variants**: A list of the :ref:`grammatical variants ` for the
           word or token, or an empty list if not applicable. The variants
           include the case (``nf``, ``þf``, ``þgf``, ``ef``), gender
           (``kvk``, ``kk``, ``hk``), person, verb form, adjective degree,
           etc. This list is identical to the one returned from
           :py:attr:`SimpleTree.all_variants` for the terminal in question.

        4. **index**: The index of the token that corresponds to this
           terminal. The index is 0-based.

        If the sentence has not yet been parsed, or no parse tree was found
        for it, this property is ``None``.

        Example::

            from reynir import Greynir
            g = Greynir()
            s = g.parse_single("Þórgnýr fór út og fékk sér ís.")
            for t in s.terminals:
                print("{0:8s} {1:8s} {2:8s} {3}"
                    .format(t.text, t.lemma, t.category,
                        ", ".join(t.variants)))

        Output:

        .. code-block:: none

            Þórgnýr  Þórgnýr  person   nf, kk
            fór      fara     so       0, et, fh, gm, p3, þt
            út       út       ao
            og       og       st
            fékk     fá       so       2, þgf, þf, et, fh, gm, p3, þt
            sér      sig      abfn     þgf
            ís       ís       no       et, kk, þf
            .        .

        (The line for *fékk* means that this is the verb (``so``) *fá*,
        having two arguments (``2``) in dative case (``þgf``) and accusative
        case (``þf``); it is singular (``et``), indicative (``fh``), active
        voice (``gm``), in the third person (``p3``), and in past tense
        (``þt``). See :ref:`variants` for a detailed explanation.)

    .. py:attribute:: lemmas

        Returns a ``list`` of the lemmas of the words in the sentence, or
        the text of the token for non-word tokens. ``sent.lemmas`` is a
        shorthand for ``[ t.lemma for t in sent.terminals ]``.

        Lemmas of composite words include hyphens ``-`` at the component
        boundaries. Examples: ``borgar-stjórnarmál``, ``skugga-kosning``.

        If the sentence has not yet been parsed, or no parse tree was found
        for it, this property is ``None``.

        Example::

            from reynir import Greynir
            g = Greynir()
            s = g.parse_single(
                "Gullsópur ehf. keypti árið 1984 verðlaunafasteignina "
                "að Laugavegi 26."
            )
            print(s.lemmas)

        Output:

        .. code-block:: none

            ['gullsópur', 'ehf.', 'kaupa', 'árið 1984', 'verðlauna-fasteign', 'að', 'Laugavegur', '26', '.']

    .. py:attribute:: categories

        Returns a ``list`` of the categories of the words in the sentence, or
        ``""`` for non-word tokens. ``sent.categories`` is a shorthand for
        ``[ d.cat for d in sent.terminal_nodes ]``.

        The categories returned are those of the token associated with each
        terminal, according to BÍN's category scheme. Nouns (including person
        names) thus have categories of ``kk``, ``kvk`` or ``hk``, for
        masculine, feminine and neuter gender, respectively. Unrecognized
        words have the ``entity`` category.

        If the sentence has not yet been parsed, or no parse tree was found
        for it, this property is ``None``.

        Example::

            from reynir import Greynir
            g = Greynir()
            s = g.parse_single(
                "Gullsópur ehf. keypti árið 1984 verðlaunafasteignina "
                "að Laugavegi 26."
            )
            print(s.categories)

        Output:

        .. code-block:: none

            ['kk', 'hk', 'so', '', 'kvk', 'fs', 'kk', '', '']

    .. py:attribute:: lemmas_and_cats

        Returns a ``list`` of (lemma, category) tuples corresponding to the
        tokens in the sentence. ``sent.lemmas_and_cats`` is a shorthand for
        ``[ (d.lemma, d.lemma_cat) for d in sent.terminal_nodes ]``.

        For non-word tokens, the lemma is the original token text and the
        category is an empty string (``""``). For person names, the category
        is ``person_kk``, ``person_kvk`` or ``person_hk`` for masculine,
        feminine or neuter gender names, respectively. For unknown words,
        the category is ``entity``.

        Lemmas of composite words include hyphens ``-`` at the component
        boundaries. Examples: ``borgar-stjórnarmál``, ``skugga-kosning``.

        This property is intended to be useful *inter alia* for topic indexing
        of text. A good strategy for that purpose could be to index all lemmas
        having a non-empty category, perhaps also discarding some less
        significant categories (such as conjunctions).
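
        As a minimal sketch of the indexing strategy just described, the
        following filters a list of (lemma, category) tuples, keeping lemmas
        with a non-empty category and dropping a few less significant ones.
        The tuples are copied from this property's documented example output,
        so the sketch runs without Greynir. The ``index_terms`` helper and the
        choice of ``st`` and ``fs`` as categories to skip are illustrative
        assumptions, not part of the Greynir API.

        .. code-block:: python

            from collections import Counter

            # (lemma, category) tuples, as returned by lemmas_and_cats
            pairs = [
                ("Hallbjörn", "person_kk"), ("borða", "so"), ("ís", "kk"),
                ("kl. 14", ""), ("meðan", "st"), ("Icelandair", "entity"),
                ("éta", "so"), ("3", ""), ("teppi", "hk"), ("frá", "fs"),
                ("Íran", "hk"), ("og", "st"), ("Xochitl", "entity"),
                ("vera", "so"), ("tilbiðja", "so"), (".", ""),
            ]

            # Assumption: conjunctions (st) and prepositions (fs)
            # are not worth indexing
            SKIPPED_CATEGORIES = frozenset({"st", "fs"})

            def index_terms(lemmas_and_cats, skipped=SKIPPED_CATEGORIES):
                """Hypothetical helper: count the lemmas worth indexing."""
                return Counter(
                    lemma
                    for lemma, cat in lemmas_and_cats
                    if cat and cat not in skipped
                )

            terms = index_terms(pairs)
            print(sorted(terms))

        With a real sentence, ``sent.lemmas_and_cats`` could be passed to the
        helper in place of ``pairs``, provided the property is not ``None``.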
        If the sentence has not yet been parsed, or no parse tree was found
        for it, this property is ``None``.

        Example::

            from reynir import Greynir
            g = Greynir()
            s = g.parse_single(
                "Hallbjörn borðaði ísinn kl. 14 meðan Icelandair át 3 teppi "
                "frá Íran og Xochitl var tilbeðin."
            )
            print(s.lemmas_and_cats)

        Output:

        .. code-block:: none

            [('Hallbjörn', 'person_kk'), ('borða', 'so'), ('ís', 'kk'), ('kl. 14', ''), ('meðan', 'st'), ('Icelandair', 'entity'), ('éta', 'so'), ('3', ''), ('teppi', 'hk'), ('frá', 'fs'), ('Íran', 'hk'), ('og', 'st'), ('Xochitl', 'entity'), ('vera', 'so'), ('tilbiðja', 'so'), ('.', '')]

    .. py:attribute:: terminal_nodes

        Returns a ``list`` of the subtrees (:py:class:`SimpleTree` instances)
        that correspond to terminals in the parse tree for this sentence,
        in the order in which they occur (token order).

        If the sentence has not yet been parsed, or no parse tree was found
        for it, this property is ``None``.

    .. py:method:: is_foreign(self, min_icelandic_ratio: float=0.5) -> bool

        :param float min_icelandic_ratio: The minimum ratio of word tokens
            that must be found in BÍN for a sentence to be considered
            Icelandic. Defaults to ``0.5``.

        Returns ``True`` if the sentence is probably in a foreign language,
        i.e. not Icelandic. A sentence is probably foreign if it contains at
        least three word tokens and, out of those, less than 50% are found in
        the BÍN database. The 50% threshold is adjustable by overriding the
        ``min_icelandic_ratio`` parameter.

The NounPhrase class
--------------------

The :py:class:`NounPhrase` class conveniently encapsulates an Icelandic
noun phrase (*nafnliður*), making it easy to obtain correctly inflected
forms of the phrase, as required in various contexts.

.. py:class:: NounPhrase

    .. py:method:: __init__(self, np_string: str, *, force_number: str = None)

        Creates a :py:class:`NounPhrase` instance.

        :param str np_string: The text string containing the noun phrase
            (*nafnliður*).
            The noun phrase must conform to the grammar specified for the
            ``Nl`` nonterminal in ``Greynir.grammar``. This grammar allows
            e.g. number, adjective and adverb prefixes, referential phrases
            (*...sem...*) and prepositional phrases (*...í...*).
            Examples of valid noun phrases include:

            * *stóri kraftalegi maðurinn sem ég sá í bænum*,
            * *ofboðslega bragðgóði lakkrísinn í nýju umbúðunum*, and
            * *rúmlega 20 millilítrar af kardemommudropum með vanillu*.

            If the noun phrase cannot be parsed or is empty, the
            :py:attr:`NounPhrase.parsed` property will be ``False`` and all
            inflection properties will return ``None``.

        :param str force_number: An optional string that can contain ``"et"``
            or ``"singular"``, or ``"ft"`` or ``"plural"``. If given, it
            forces the parsing of the noun phrase to be constrained to
            singular or plural forms, respectively. As an example,
            ``NounPhrase("eyjar", force_number="ft")`` yields a plural result
            (nominative ``"eyjar"``), while ``NounPhrase("eyjar")`` without
            forcing yields a singular result (nominative ``"ey"``).

    .. py:method:: __str__(self) -> str

        Returns the original noun phrase string as passed to the constructor.

    .. py:method:: __len__(self) -> int

        Returns the length of the original noun phrase string.

    .. py:method:: __format__(self, spec: str) -> str

        Formats a noun phrase in the requested inflection form. Works with
        Python's ``format()`` function as well as in f-strings (available
        starting with Python 3.6).

        :param str spec: An inflection specification for the string to be
            returned. This can be one of the following:

            * ``nf`` or ``nom``: Nominative case (*nefnifall*).
            * ``þf`` or ``acc``: Accusative case (*þolfall*).
            * ``þgf`` or ``dat``: Dative case (*þágufall*).
            * ``ef`` or ``gen``: Genitive case (*eignarfall*).
            * ``ángr`` or ``ind``: Indefinite, nominative form
              (*nefnifall án greinis*).
            * ``stofn`` or ``can``: Canonical, nominative singular form without
              attached prepositions or referential phrases
              (*nefnifall eintölu án greinis, án forsetningarliða og
              tilvísunarsetninga*).

        :return: The noun phrase in the requested inflection form,
            as a string.

        Example::

            from reynir import NounPhrase as Nl

            nl = Nl("blesóttu hestarnir mínir")

            print("Hér eru {nl:nf}.".format(nl=nl))
            print("Mér þykir vænt um {nl:þf}.".format(nl=nl))
            print("Ég segi öllum frá {nl:þgf}.".format(nl=nl))
            print("Ég vil tryggja velferð {nl:ef}.".format(nl=nl))
            print("Já, {nl:ángr}, þannig er það.".format(nl=nl))
            print("Umræðuefnið hér er {nl:stofn}.".format(nl=nl))

            # Starting with Python 3.6, f-strings are supported:
            print(f"Hér eru {nl:nf}.")  # etc.

        Output::

            Hér eru blesóttu hestarnir mínir.
            Mér þykir vænt um blesóttu hestana mína.
            Ég segi öllum frá blesóttu hestunum mínum.
            Ég vil tryggja velferð blesóttu hestanna minna.
            Já, blesóttir hestar mínir, þannig er það.
            Umræðuefnið hér er blesóttur hestur minn.

    .. py:attribute:: parsed

        Returns ``True`` if the noun phrase was successfully parsed, or
        ``False`` if not.

    .. py:attribute:: tree

        Returns a :py:class:`SimpleTree` object encapsulating the parse tree
        for the noun phrase.

    .. py:attribute:: case

        Returns a string denoting the case of the noun phrase, as originally
        passed to the constructor. The case is one of ``"nf"``, ``"þf"``,
        ``"þgf"`` or ``"ef"``, denoting nominative, accusative, dative or
        genitive case, respectively. If the noun phrase could not be parsed,
        the property returns ``None``.

    .. py:attribute:: number

        Returns a string denoting the number (singular/plural) of the noun
        phrase, as originally passed to the constructor. The number is either
        ``"et"`` (singular, *eintala*) or ``"ft"`` (plural, *fleirtala*).
        If the noun phrase could not be parsed, the property returns ``None``.

    .. py:attribute:: person

        Returns a string denoting the person (1st, 2nd, 3rd) of the noun
        phrase, as originally passed to the constructor.
        The returned string is one of ``"p1"``, ``"p2"`` or ``"p3"`` for
        first, second or third person, respectively. If the noun phrase could
        not be parsed, the property returns ``None``.

    .. py:attribute:: gender

        Returns a string denoting the gender (masculine, feminine, neuter) of
        the noun phrase, as originally passed to the constructor. The returned
        string is one of ``"kk"``, ``"kvk"`` or ``"hk"`` for masculine
        (*karlkyn*), feminine (*kvenkyn*) or neuter (*hvorugkyn*),
        respectively. If the noun phrase could not be parsed, the property
        returns ``None``.

    .. py:attribute:: nominative

        Returns a string with the noun phrase in nominative case
        (*nefnifall*), or ``None`` if the noun phrase could not be parsed.

    .. py:attribute:: accusative

        Returns a string with the noun phrase in accusative case (*þolfall*),
        or ``None`` if the noun phrase could not be parsed.

    .. py:attribute:: dative

        Returns a string with the noun phrase in dative case (*þágufall*),
        or ``None`` if the noun phrase could not be parsed.

    .. py:attribute:: genitive

        Returns a string with the noun phrase in genitive case
        (*eignarfall*), or ``None`` if the noun phrase could not be parsed.

    .. py:attribute:: indefinite

        Returns a string with the noun phrase in indefinite form, nominative
        case (*nefnifall án greinis*), or ``None`` if the noun phrase could
        not be parsed.

    .. py:attribute:: canonical

        Returns a string with the noun phrase in singular, indefinite form,
        nominative case, where referential phrases (*...sem...*) and
        prepositional phrases (*...í...*) have been removed. If the noun
        phrase could not be parsed, ``None`` is returned.
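
The inflection properties of :py:class:`NounPhrase` can be collected into
a small table, checking :py:attr:`NounPhrase.parsed` first since all of them
are ``None`` for an unparsed phrase. The following is only a sketch: the
``inflection_table`` helper is hypothetical, and the ``demo`` object is a
stand-in built from the forms shown in the ``__format__`` example, so that
the code runs without the ``reynir`` package installed.

.. code-block:: python

    from types import SimpleNamespace

    # Names of the inflection properties documented for NounPhrase
    FORMS = (
        "nominative", "accusative", "dative",
        "genitive", "indefinite", "canonical",
    )

    def inflection_table(np):
        """Hypothetical helper: map each form name to the inflected string.

        Returns an empty dict if the phrase was not parsed.
        """
        if not getattr(np, "parsed", False):
            return {}
        return {form: getattr(np, form) for form in FORMS}

    # Stand-in for NounPhrase("blesóttu hestarnir mínir"); the values are
    # taken from the documented output of the __format__ example
    demo = SimpleNamespace(
        parsed=True,
        nominative="blesóttu hestarnir mínir",
        accusative="blesóttu hestana mína",
        dative="blesóttu hestunum mínum",
        genitive="blesóttu hestanna minna",
        indefinite="blesóttir hestar mínir",
        canonical="blesóttur hestur minn",
    )

    table = inflection_table(demo)
    for form, text in table.items():
        print("{0:11s} {1}".format(form, text))

With ``reynir`` installed, a real ``NounPhrase`` instance could be passed to
the helper in place of ``demo``, since the helper only reads the properties
documented in this section.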