Morpho-Guessing

A new feature of version 4.0 is "morpho-guessing" of unknown words. While the old version could guess the part of speech of an unknown word only from its context, the new version also considers its spelling. Words that end in "-s" are assumed to be plural nouns or third-person singular verbs; those ending in "-ed", past-tense (or passive) verbs; those ending in "-ing", present participles; those ending in "-ly", adverbs. This greatly improves the parser's ability to handle sentences containing multiple unknown words. Words that have been treated in this way are marked with a "[!]". (The parser's old system of "unknown words" is still in place, for handling words whose spelling does not match any of the categories listed above; as before, these are marked with a "[?]".)
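The suffix rules above can be illustrated with a short Python sketch. This is not the parser's actual implementation: the table SUFFIX_RULES, the function morpho_guess, and the category labels are hypothetical names chosen for illustration, and the sketch assumes the word has already been found to be missing from the dictionary. It only shows how a spelling check could assign the "[!]" or "[?]" marker.

```python
# Illustrative sketch only; not the parser's real code.
# SUFFIX_RULES and morpho_guess are hypothetical names.
SUFFIX_RULES = [
    ("ing", ["present participle"]),
    ("ed", ["past-tense or passive verb"]),
    ("ly", ["adverb"]),
    ("s", ["plural noun", "singular verb"]),
]


def morpho_guess(word):
    """Return (marker, candidate categories) for an unknown word.

    "[!]" means the spelling matched a suffix rule; "[?]" means it did
    not, so the category would have to be guessed from context alone.
    """
    lowered = word.lower()
    for suffix, categories in SUFFIX_RULES:
        # Require at least one character before the suffix.
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            return "[!]", categories
    return "[?]", []


if __name__ == "__main__":
    for w in "Overhyped megamergers underperform horrifically".split():
        marker, categories = morpho_guess(w)
        print(w.lower() + marker, categories)
```

In this sketch a word ending in "-s" keeps both candidate categories; choosing between them is left to the surrounding context, as the text describes.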

For example, consider the sentence

        Overhyped megamergers underperform horrifically

None of these four words is in the parser's dictionary. Without morpho-guessing, the parser would have no way of deciding between many possible interpretations. With morpho-guessing, however, the parser is able to correctly identify the first as a passive verb form (used here as an adjective), the second as a noun, and the last as an adverb; from context, it is then able to guess that the third is a verb.

            +-------A-------+-------Sp-------+-------MVa-------+
            |               |                |                 |
     overhyped[!].v megamergers[!].n underperform[?].v   horrifically[!].e 

     (S (NP overhyped megamergers)
            (VP underperform
                (ADVP horrifically)))

(Given the same words in a different order--"overhyped megamergers horrifically underperform"--the parser still identifies them correctly.)