The Link Parser Application Program Interface (API)


CONTENTS

1. Introduction
2. Terminology and Concepts
3. A Simple Example
4. Basic Operations
5. Another Example
6. Reference Manual

1. Introduction

The original version of the parser was designed around a standard interface, where the user types in a sentence, and the parser displays the linkages that it finds. This is fine for showing how the grammar and parser work, but in order to make actual use of the information that the parser provides, it is necessary to have access to its inner workings. The Link Parser API was written to give users flexibility in using the parser in their applications. Examples of the kind of capability the API provides include:
  • Open up more than one dictionary at a time.
  • Parse a sentence with different dictionaries or parsing parameters, and compare the results.
  • Limit the time and memory that the parsing process takes.
  • Use different "cost functions" for ranking linkages.
  • Save linkages from a sentence, and access individual links.
  • Post-process a sentence with more than one set of post-processing rules.
  • Extract the domains that links participate in, to perform transformations on a linkage.
  • Recover the constituent structure corresponding to a phrase-structure grammar.
The API provides a set of basic data structures and function calls that allow the programmer to easily design a customized parser. The API is written in ANSI C, and runs in both UNIX and Windows environments.

2. Terminology and Concepts

There are five basic data structures or "types" defined by the API. To parse a sentence and extract information from it, the user creates and manipulates these types using a standard set of function calls. An overview of these five data structures is given in the following table.

Name Description
Dictionary A Dictionary is the programmer's handle on the set of word definitions that defines the grammar. A user creates a Dictionary from a grammar file and post-process knowledge file, and then passes it to the various parsing routines.
Sentence A Sentence is the API's representation of an input string, tokenized and interpreted according to a specific Dictionary. After a Sentence is created and parsed, various attributes of the resulting set of linkages can be obtained.
Parse_Options Parse_Options specify the different parameters that are used to parse sentences. Examples of the kinds of things that are controlled by Parse_Options include maximum parsing time and memory, whether to use null-links, and whether or not to use "panic" mode. This data structure is passed in to the various parsing and printing routines along with the sentence.
Linkage This is the API's representation of a parse. A Linkage can be constructed after a sentence has been parsed, and can be thought of as a Sentence together with a collection of links. If the parse has a conjunction, then the Linkage is made up of at least two "sublinkages". A Linkage can be pretty printed in either ASCII or Postscript format, and individual links can be extracted.
PostProcessor Individual linkages can be post-processed with different sets of context-sensitive post-processing rules. The API enables this by letting the user open up a set of rules and pass it around as a PostProcessor. A PostProcessor is also associated with each Dictionary, and automatically applied after parsing each Sentence constructed using that dictionary.

Sections 4 and 6 contain more complete descriptions of these five basic types.


3. A Simple Example

In the spirit of the "hello world" program that is often included in an introductory programming language manual, we now give a very simple example. The following C program demonstrates the basic use of the API. The program opens up a dictionary and then parses two sentences, graphically displaying a linkage for each.


#include "link-includes.h"

int main() {

    Dictionary    dict;
    Parse_Options opts;
    Sentence      sent;
    Linkage       linkage;
    char *        diagram;
    int           i, num_linkages;
    char *        input_string[] = {
       "Grammar is useless because there is nothing to say -- Gertrude Stein.",
       "Computers are useless; they can only give you answers -- Pablo Picasso."};

    opts  = parse_options_create();
    dict  = dictionary_create("4.0.dict", "4.0.knowledge", NULL, "4.0.affix");

    for (i=0; i<2; ++i) {
	sent = sentence_create(input_string[i], dict);
	num_linkages = sentence_parse(sent, opts);
	if (num_linkages > 0) {
	    linkage = linkage_create(0, sent, opts);
	    printf("%s\n", diagram = linkage_print_diagram(linkage));
	    string_delete(diagram);
	    linkage_delete(linkage);
	}
	sentence_delete(sent);
    }

    dictionary_delete(dict);
    parse_options_delete(opts);
    return 0;
}

When run (using the dictionary 4.0.dict, post-processing file 4.0.knowledge and affix file 4.0.affix that are provided with the code), the program produces the following output:


                                               +------------MXs------------+    
                                               +----Bs---+    +-----Xd-----+    
    +---Ss--+--Pa--+---MVs--+--Cs--+-SFst+-Ost-+--R--+-I-+    |     +---G--+-Xc+
    |       |      |        |      |     |     |     |   |    |     |      |   |
grammar.n is.v useless.a because there is.v nothing to say.v -- Gertrude Stein . 

                                                                       +-------MXp-------+     
    +----------------Xx---------------+        +-----I-----+----Opn----+      +----Xd----+     
    +----Wd----+---Spx--+---Pa--+     +-Wd+-Sp-+     +--E--+-Ox-+      |      |   +---G--+-Xc-+
    |          |        |       |     |   |    |     |     |    |      |      |   |      |    |
   ////   computers.n are.v useless.a ; they can.v only give.v you answers.n -- Pablo Picasso . 

The first two statements of the program:

    opts  = parse_options_create();
    dict  = dictionary_create("4.0.dict", "4.0.knowledge", NULL, "4.0.affix");

create the Parse_Options and the Dictionary used to process the sentences. To create the dictionary, the program looks for the files 4.0.dict, 4.0.knowledge and 4.0.affix in the current directory, and also in the directory dict/ if it exists.

In the loop through the two input sentences, the statement


    sent = sentence_create(input_string[i], dict);

creates a Sentence from the input string, using the Dictionary that was created earlier to tokenize and define words. The statement

    num_linkages = sentence_parse(sent, opts);

passes the sentence, along with the Parse_Options, to the function sentence_parse, which searches for all possible linkages, and returns the number that were found. If linkages are found, the sequence of statements

    linkage = linkage_create(0, sent, opts);
    printf("%s\n", diagram = linkage_print_diagram(linkage));
    string_delete(diagram);
    linkage_delete(linkage);

extracts the first linkage in the list (the indexing is 0-based), prints the linkage diagram to the standard output, and then deletes the linkage and the string allocated for the diagram. Since the various constructions (sentences, linkages, diagrams) are allowed to survive after the parsing process has finished, the user is responsible for their memory management. After each of the input strings is processed, the program finishes up by deleting the Dictionary and Parse_Options with the statements

    dictionary_delete(dict);
    parse_options_delete(opts);


4. Basic Operations

The basic operations that a typical use of the API will involve include opening a dictionary, creating and customizing a set of parse options, creating sentences, parsing them, and extracting linkages. In this section we briefly describe how these basic operations on the Dictionary, Parse_Options, Sentence and Linkage types are carried out. Further details can be found in the Reference Manual.

Dictionary dictionary_create(char *dict_name, char *post_process_file_name,
                             char *constituent_knowledge_name, char *affix_name);
    

To create the dictionary, the program looks in the current directory and the data directory for the files dict_name, post_process_file_name, constituent_knowledge_name, and affix_name. The last three entries may be omitted by specifying NULL. If dict_name is a fully specified path name, then the other file names, which need not be fully specified, will be prefixed by the directory specified by dict_name.

Apart from deleting it, there is little to do with a Dictionary other than use it to create a Sentence:


Sentence sentence_create(char *input_string, Dictionary dict);

This routine takes the input string and tokenizes it using the word definitions in the Dictionary passed in. If there are any problems defining the words in the sentence, then an appropriate lperror code and message are set, and sentence_create returns NULL.

In order to parse a sentence, it is necessary to tell the parser how the job should be done using a set of Parse_Options. These are created with default parameters, which can be changed using routines such as the following:

void  parse_options_set_min_null_count(Parse_Options opts, int min_null_count);
void  parse_options_set_max_null_count(Parse_Options opts, int max_null_count);

When parsing a sentence, the parser will find all solutions having the minimum number of null links. It carries out its search in the range of null link counts between min_null_count and max_null_count. By default, the minimum and maximum number of null links is 0, so null links are not used.
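
The search order just described can be pictured with a small, self-contained sketch. It is not the parser's own code: first_null_count_with_linkages, CountFn and example_count are hypothetical names introduced here, and the callback stands in for whatever counts the linkages at a given null-link count. The numbers returned by example_count are purely illustrative.

```c
#include <stddef.h>

/* Hypothetical callback: number of linkages that use exactly
   `null_count` null links. */
typedef int (*CountFn)(int null_count);

/* Try each null-link count from min_null_count up to max_null_count and
   stop at the first one that yields at least one linkage.  Returns that
   count (storing the number of linkages in *num_found when num_found is
   not NULL), or -1 if no count in the range yields a linkage. */
int first_null_count_with_linkages(int min_null_count, int max_null_count,
                                   CountFn count_at, int *num_found)
{
    int nulls;
    for (nulls = min_null_count; nulls <= max_null_count; ++nulls) {
        int n = count_at(nulls);
        if (n > 0) {
            if (num_found != NULL) *num_found = n;
            return nulls;
        }
    }
    if (num_found != NULL) *num_found = 0;
    return -1;
}

/* Illustrative stand-in for the counting step: no linkages without
   null links, 18 linkages once one null link is allowed. */
int example_count(int null_count)
{
    return (null_count == 0) ? 0 : 18;
}
```

With the default range (both limits 0) this sketch reports failure for a sentence that needs a null link; widening max_null_count lets the search continue to the next count.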

int sentence_parse(Sentence sent, Parse_Options opts);

This routine represents the heart of the program. There are several things that are done when a sentence is parsed:
1. Word expressions are extracted from the dictionary and pruned.
2. Disjuncts are built.
3. A series of pruning operations is carried out.
4. The linkages having the minimal number of null links are counted.
5. A "parse set" of linkages is built.
6. The linkages are post-processed.
The "parse set" is attached to the sentence, and this is one of the key reasons that the API is flexible and modular. All of the necessary information for building linkages is stored in the parse set. This means that other sentences can be parsed, possibly using different dictionaries and other parameters, without disturbing the information obtained from a call to sentence_parse. If another call to sentence_parse is made on the same sentence, the parsing information for the previous call is deleted.

Linkage linkage_create(int index, Sentence sent, Parse_Options opts);

This function creates the index-th linkage from the (parsed) sentence sent. Several operations can be carried out on the resulting linkage; for example it can be printed, post-processed with a different post-processor, or information on individual links can be extracted. If the parse has a conjunction, then the linkage will be made up of two or more sublinkages.

One implementation detail that may be helpful to users of the API is that internally, the API uses different memory bookkeeping for linkages and several other objects created by the user. The idea is that there is internal workspace used by the parser for carrying out its search, and external workspace for linkages, diagrams, and other objects created by the user that may persist after the parsing process has been completed. (This can be helpful for determining memory leaks in an implementation of the parser.) As shown in the example in Section 3, when the user is finished with these objects, their memory should be freed up with calls to the appropriate functions, such as linkage_delete.

5. Another Example

As another example of the API in use, we can use the constituent code to parse a sentence and then mark the prepositional phrases. With the input sentence
"This is a test of the constituent code in the API."
the output is
    +--------------------------------Xp-------------------------------+
    |              +-------------------MVp------------------+         |
    |              |            +----------Jp----------+    |         |
    |              +--Ost--+    |  +--------D*u--------+    +--Js--+  |
    +---Wd---+-Ss*b+  +-Ds-+-Mp-+  |        +-----A----+    |  +-DG+  |
    |        |     |  |    |    |  |        |          |    |  |   |  |
LEFT-WALL this.p is.v a test.n of the constituent.a code.n in the API . 

 This is a test [of the constituent code] [in the API] .
Here's the code:


#include <stdio.h>   /* printf, fprintf */
#include <string.h>  /* strcmp */
#include "link-includes.h"


/* Print out the words at the leaves of the tree,
   bracketing constituents labeled "PP" */

void print_words_with_prep_phrases_marked(CNode *n) {
    CNode * m;
    static char * spacer=" ";

    if (n == NULL) return;
    if (strcmp(n->label, "PP")==0) {
	printf("%s[", spacer);
	spacer="";
    }
    for (m=n->child; m!=NULL; m=m->next) {
	if (m->child == NULL) {
	    printf("%s%s", spacer, m->label);
	    spacer=" ";
	}
	else {
	    print_words_with_prep_phrases_marked(m);
	}
    }
    if (strcmp(n->label, "PP")==0) {
	printf("]");
    }
}

int main() {

    Dictionary    dict;
    Parse_Options opts;
    Sentence      sent;
    Linkage       linkage;
    CNode *       cn;
    char *        string;
    char *        input_string = 
       "This is a test of the constituent code in the API.";

    opts  = parse_options_create();
    dict  = dictionary_create("4.0.dict", "4.0.knowledge", 
			      "4.0.constituent-knowledge", "4.0.affix");

    sent = sentence_create(input_string, dict);
    if (sentence_parse(sent, opts)) {
	linkage = linkage_create(0, sent, opts);
	printf("%s", string = linkage_print_diagram(linkage));
	string_delete(string);
	cn = linkage_constituent_tree(linkage);
	print_words_with_prep_phrases_marked(cn);
	linkage_free_constituent_tree(cn);
	fprintf(stdout, "\n\n");
	linkage_delete(linkage);
    }
    sentence_delete(sent);

    dictionary_delete(dict);
    parse_options_delete(opts);
    return 0;
}

  

The functions used are described in the section on extracting constituent structure.

6. Reference Manual

6.1 Creating Dictionaries
6.2 Using Parse Options
6.3 Processing Sentences
6.4 Manipulating Linkages
6.5 Independent Post-Processing
6.6 Extracting Constituent Structure

6.1 Creating Dictionaries


Dictionary dictionary_create(char *dict_name, char *post_process_file_name,
                             char *constituent_knowledge_name, char *affix_name);
    

The current directory and the data directory, if it exists, are searched for the files dict_name, post_process_file_name, constituent_knowledge_name, and affix_name. The last three entries may be omitted by specifying NULL. If there is a non-empty path prefix for the dict_name variable, then the other files will be searched for in the directory specified by that path. If opening the dictionary fails, then dictionary_create returns NULL (a Dictionary is actually a pointer to a data structure).

When a Dictionary is created, all of its entries are loaded, and the post-process knowledge file is also opened. If there is an error while reading the dictionary, an lperror number and message are set.


int dictionary_delete(Dictionary dict);

Frees up all of the space used by the Dictionary, and closes the post-processor that was associated with it.

int dictionary_get_max_cost(Dictionary dict);

Returns the maximum cost (number of brackets []) that is placed on any connector in the dictionary. This is useful for designing a parsing algorithm that progresses in stages, first trying the cheap connectors.

6.2 Using Parse Options


Parse_Options  parse_options_create();

Create Parse_Options with the default settings. These include:
    verbosity		= 0;
    linkage_limit	= 10000;
    min_null_count	= 0;
    max_null_count	= 0;
    null_block		= 1;
    islands_ok		= FALSE;
    short_length	= 6;
    all_short		= FALSE;
    display_short	= TRUE;
    display_word_subscripts = TRUE;
    display_link_subscripts = TRUE;
    display_walls	= FALSE;
    display_union	= FALSE;
    allow_null		= TRUE;
    echo_on		= FALSE;
    batch_mode		= FALSE;
    panic_mode		= FALSE;
    screen_width	= 79;
    display_on		= TRUE;
    display_postscript	= FALSE;
    display_bad		= FALSE;
    display_links	= FALSE;

int parse_options_delete(Parse_Options opts);
Frees up the memory used by this data structure.

void parse_options_set_verbosity(Parse_Options opts, int verbosity);
int  parse_options_get_verbosity(Parse_Options opts);
This sets/gets the level of description printed to stderr/stdout about the parsing process. A verbosity level of 2 generates output like the following:
linkparser> !verbosity=2
verbosity set to 2
linkparser> Logorrhea, or excessive and often incoherent talkativeness or wordiness, is a social disease.
++++Finished expression pruning                   0.02 seconds
++++Built disjuncts                               0.18 seconds
++++Eliminated duplicate disjuncts                0.02 seconds
++++power pruned (gentle)                         0.17 seconds
++++pp pruning                                    0.09 seconds
++++power pruned (gentle)                         0.01 seconds
++++pp pruning                                    0.04 seconds
807 Match cost
++++Done conjunction pruning                      0.06 seconds
++++Constructed fat disjuncts                     0.06 seconds
++++Pruned fat disjuncts                          0.03 seconds
++++Eliminated duplicate disjuncts (again)        0.00 seconds
++++power pruned (ruthless)                       0.02 seconds
++++Initialized fast matcher and hash table       0.00 seconds
Total count with 0 null links:   0
++++Counted parses                                0.01 seconds
89 Match cost
++++Finished parse                                0.02 seconds
No linkages without null links
++++Finished expression pruning                   0.01 seconds
++++Built disjuncts                               0.17 seconds
++++Eliminated duplicate disjuncts                0.02 seconds
++++power pruned (gentle)                         0.17 seconds
++++pp pruning                                    0.07 seconds
++++power pruned (gentle)                         0.01 seconds
++++pp pruning                                    0.04 seconds
1283 Match cost
++++Done conjunction pruning                      0.09 seconds
++++Constructed fat disjuncts                     0.16 seconds
++++Pruned fat disjuncts                          0.11 seconds
++++Eliminated duplicate disjuncts (again)        0.00 seconds
++++power pruned (ruthless)                       0.04 seconds
++++Initialized fast matcher and hash table       0.00 seconds
Total count with 1 null links:   18
++++Counted parses                                0.08 seconds
++++Began post-processing linkages                0.03 seconds
++++Postprocessed all linkages                    0.11 seconds
6 of 14 linkages with no P.P. violations
++++Sorted all linkages                           0.01 seconds
2604 Match cost
++++Finished parse                                0.02 seconds
++++Time                                          1.87 seconds (8.90 total)
Found 18 linkages (6 with no P.P. violations) at null count 1

void parse_options_set_linkage_limit(Parse_Options opts, int linkage_limit);
int  parse_options_get_linkage_limit(Parse_Options opts);
This parameter determines the maximum number of linkages that are considered in post-processing. If more than linkage_limit linkages are found, then a random sample of linkage_limit linkages is chosen for post-processing. When this happens, a warning is displayed at verbosity levels greater than 1.

void parse_options_set_disjunct_cost(Parse_Options opts, int disjunct_cost);
int  parse_options_get_disjunct_cost(Parse_Options opts);
Determines the maximum disjunct cost used during parsing, where the cost of a disjunct is equal to the maximum cost of all of its connectors. The default is that all disjuncts, no matter what their cost, are considered.
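
As a minimal sketch of the definition above (not the parser's implementation), the cost of a disjunct can be computed as the largest cost among its connectors and compared against the limit. The function names and the plain int array of connector costs are hypothetical, and treating the limit as inclusive is an assumption.

```c
/* Cost of a disjunct: the largest cost among its connectors.
   Returns 0 for an empty list of connectors. */
int disjunct_cost(const int *connector_costs, int num_connectors)
{
    int i, max = 0;
    for (i = 0; i < num_connectors; ++i) {
        if (connector_costs[i] > max) max = connector_costs[i];
    }
    return max;
}

/* Assumption: a disjunct is kept when its cost does not exceed the
   limit (the limit is treated as inclusive here). */
int disjunct_is_considered(const int *connector_costs, int num_connectors,
                           int max_disjunct_cost)
{
    return disjunct_cost(connector_costs, num_connectors) <= max_disjunct_cost;
}
```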

void parse_options_set_min_null_count(Parse_Options opts, int null_count);
int  parse_options_get_min_null_count(Parse_Options opts);
void parse_options_set_max_null_count(Parse_Options opts, int null_count);
int  parse_options_get_max_null_count(Parse_Options opts);
These determine the minimum and maximum number of null links that a parse might have. A call to sentence_parse will find all linkages having the minimum number of null links within the range specified by this parameter in the Parse_Options.

void parse_options_set_null_block(Parse_Options opts, int null_block);
int  parse_options_get_null_block(Parse_Options opts);
This allows null links to be counted in "bunches." For example, if null_block is 4, then a linkage with 1,2,3 or 4 null links has a null cost of 1, a linkage with 5,6,7 or 8 null links has a null cost of 2, etc. (This is only in effect if islands are not allowed; see below.)
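
The arithmetic in this example amounts to a ceiling division. The helper below is a hypothetical illustration of that arithmetic only, not a function of the API; treating a null_block below 1 as 1 is an assumption made to keep the sketch safe.

```c
/* Null cost of a linkage when null links are counted in bunches of
   `null_block`: ceil(num_null_links / null_block). */
int null_cost(int num_null_links, int null_block)
{
    if (num_null_links <= 0) return 0;
    if (null_block < 1) null_block = 1;   /* assumption: guard bad input */
    return (num_null_links + null_block - 1) / null_block;
}
```

With null_block set to 4, null_cost gives 1 for 1 to 4 null links and 2 for 5 to 8, matching the example in the text. With the default null_block of 1 the null cost is simply the number of null links.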

void parse_options_set_short_length(Parse_Options opts, int short_length);
int  parse_options_get_short_length(Parse_Options opts);
The short_length parameter determines how long the links are allowed to be. The intended use of this is to speed up parsing by not considering very long links for most connectors, since they are very rarely used in a correct parse. An entry for UNLIMITED-CONNECTORS in the dictionary will specify which connectors are exempt from the length limit.

void parse_options_set_islands_ok(Parse_Options opts, int islands_ok);
int  parse_options_get_islands_ok(Parse_Options opts);
This option determines whether or not "islands" of links are allowed. For example, the following linkage has an island:
    +------Wd-----+
    |     +--Dsu--+---Ss--+-Paf-+      +--Dsu--+---Ss--+--Pa-+
    |     |       |       |     |      |       |       |     |
  ///// this sentence.n is.v false.a this sentence.n is.v true.a

void parse_options_set_max_parse_time(Parse_Options  opts, int secs);
int  parse_options_get_max_parse_time(Parse_Options opts);
Determines the approximate maximum time, in seconds, that parsing is allowed to take. After this time has expired, the parsing process is artificially forced to complete quickly by pretending that no further solutions (entries in the hash table) can be constructed. The actual parsing time might be slightly longer.

void parse_options_set_max_memory(Parse_Options  opts, int mem);
int  parse_options_get_max_memory(Parse_Options opts);
Determines the maximum memory allowed during parsing. This is used just as max_parse_time is, so that the parsing process is terminated as quickly as possible after the total memory (including that allocated to all dictionaries, etc.) exceeds the maximum allowed.

int  parse_options_timer_expired(Parse_Options opts);
int  parse_options_memory_exhausted(Parse_Options opts);
int  parse_options_resources_exhausted(Parse_Options opts);
void parse_options_reset_resources(Parse_Options opts);
These functions tell whether the timer and memory constraints were exceeded during parsing. parse_options_resources_exhausted means parse_options_memory_exhausted OR parse_options_timer_expired.

void parse_options_set_cost_model_type(Parse_Options opts, int cm);
int  parse_options_get_cost_model_type(Parse_Options opts);
The cost model type for ranking linkages, which is an index into an array of function pointers. The current code only has a single entry, but others could easily be added.

void parse_options_set_screen_width(Parse_Options opts, int val);
int  parse_options_get_screen_width(Parse_Options opts);
The width of the screen (in characters) for displaying linkages.

void parse_options_set_allow_null(Parse_Options opts, int val);
int  parse_options_get_allow_null(Parse_Options opts);
Whether or not to allow linkages to have null links.

void parse_options_set_display_walls(Parse_Options opts, int val);
int  parse_options_get_display_walls(Parse_Options opts);
Whether or not to show the wall word(s) when a linkage diagram is printed.

void parse_options_set_all_short_connectors(Parse_Options opts, int val);
int  parse_options_get_all_short_connectors(Parse_Options opts);
If true, then all connectors have length restrictions imposed on them -- they can be no farther than short_length apart. This is used when parsing in "panic" mode, for example.
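
The interaction between short_length, the UNLIMITED-CONNECTORS exemption and all_short can be summarized in a small predicate. This is a sketch of the rule as described in this section, not the parser's code; the function and parameter names are hypothetical.

```c
/* Returns 1 if a link of the given length is permitted, 0 otherwise.
   connector_is_unlimited: nonzero if the connector is listed under
                           UNLIMITED-CONNECTORS in the dictionary.
   all_short:              nonzero if every connector is length-limited,
                           which removes the exemption. */
int link_length_allowed(int length, int short_length,
                        int connector_is_unlimited, int all_short)
{
    if (connector_is_unlimited && !all_short) return 1;
    return length <= short_length;
}
```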

6.3 Processing Sentences


Sentence sentence_create(char *input_string, Dictionary dict);

This routine tokenizes the input string using the Dictionary passed as an argument. The sentence expressions are also constructed. If there is an error, NULL is returned, and an appropriate error number and message are set.

void sentence_delete(Sentence sent);

Frees up all of the storage associated with the sentence.

int sentence_parse(Sentence sent, Parse_Options opts);

This routine represents the heart of the program. There are several things that are done when a sentence is parsed:
1. Word expressions are extracted from the dictionary and pruned.
2. Disjuncts are built.
3. A series of pruning operations is carried out.
4. The linkages having the minimal number of null links are counted.
5. A "parse set" of linkages is built.
6. The linkages are post-processed.
The "parse set" is attached to the sentence, and this is one of the key reasons that the API is flexible and modular. All of the necessary information for building linkages is stored in the parse set. This means that other sentences can be parsed, possibly using different dictionaries and other parameters, without disturbing the information obtained from a call to sentence_parse. If another call to sentence_parse is made on the same sentence, the parsing information for the previous call is deleted.

int sentence_length(Sentence sent);

Returns the number of words in the tokenized sentence. The count includes the boundary words and punctuation.

char * sentence_get_word(Sentence sent, int w);

Returns the spelling of the w-th word in the sentence as it appears after tokenization.

int sentence_null_count(Sentence sent);

Returns the number of null links that were used in parsing the sentence.

int sentence_num_linkages_found(Sentence sent);
int sentence_num_valid_linkages(Sentence sent);
int sentence_num_linkages_post_processed(Sentence sent);

These return the number of linkages that the search found, the number that had no post-processing violations, and the number of linkages that were actually post-processed (which may be less than the number found because of the linkage_limit parameter).

int sentence_num_violations(Sentence sent, int i);

The number of post-processing violations that the i-th linkage had during the last call to sentence_parse.

int sentence_disjunct_cost(Sentence sent, int i);

The maximum cost of connectors used in the i-th linkage of the sentence.

6.4 Manipulating Linkages


Linkage  linkage_create(int index, Sentence sent, Parse_Options opts);
This function creates the index-th linkage from the (parsed) sentence sent. Several operations can be carried out on the resulting linkage; for example it can be printed, post-processed with a different post-processor, or information on individual links can be extracted. If the parse has a conjunction, then the linkage will be made up of two or more sublinkages.

One implementation detail that may be helpful to users of the API is that internally, the API uses different memory bookkeeping for linkages and several other objects created by the user. The idea is that there is internal workspace used by the parser for carrying out its search, and external workspace for linkages, diagrams, and other objects created by the user that may persist after the parsing process has been completed. (This can be helpful for determining memory leaks in an implementation of the parser.) As shown in the example in Section 3, when the user is finished with these objects, their memory should be freed up with calls to the appropriate functions, such as linkage_delete.


int linkage_get_num_sublinkages(Linkage linkage);
Returns the number of sublinkages for a linkage with conjunctions, 1 otherwise.

int linkage_set_current_sublinkage(Linkage linkage, int index);
After this call, all operations on the linkage will refer to the index-th sublinkage (in the case of a linkage without conjunctions, this has no effect). For example, in the linkage
      +-----------------Ss----------------+
      |     +---Js---+                    |
 +-Ds-+--Mp-+   +-Ds-+                    |     (*)
 |    |     |   |    |                    |
the dog.n with the man.n and the bone.n ran.v

      +-----------------Ss----------------+
      |     +-----------Js----------+     |
 +-Ds-+--Mp-+                 +--Ds-+     |     (**)
 |    |     |                 |     |     |
the dog.n with the man.n and the bone.n ran.v

the second diagram is obtained by first making a call to linkage_set_current_sublinkage(linkage, 1) and then linkage_print_diagram(linkage).

int linkage_compute_union(Linkage linkage);
If the linkage has a conjunction, this function combines all of the links occurring in all sublinkages -- in effect creating a "master" linkage (which may have crossing links). If the linkage has no conjunctions, computing its union has no effect.

For example, the union of the two sublinkages

              +------------------Ss-----------------+
    +----Wd---+     +----Js---+                     |
    |    +-Ds-+--Mp-+   +--Ds-+                     |
    |    |    |     |   |     |                     |
  ///// the dog.n with the bone.n and the cat.n played.v

    +-------------------Wd------------------+
    |                                  +-Ds-+---Ss--+
    |                                  |    |       |
  ///// the dog.n with the bone.n and the cat.n played.v

is this:

    +-------------------Wd------------------+
    |         +------------------Ss-----------------+
    +----Wd---+     +----Js---+             |       |
    |    +-Ds-+--Mp-+   +--Ds-+        +-Ds-+---Ss--+
    |    |    |     |   |     |        |    |       |
  ///// the dog.n with the bone.n and the cat.n played.v

The union is created as another sublinkage, thus increasing the number of sublinkages by one. To access this linkage, a call to
 linkage_set_current_sublinkage(linkage, linkage_get_num_sublinkages(linkage)-1)
is made.

int linkage_get_num_words(Linkage linkage);
The number of words in the sentence for which this is a linkage. Note that this function does not return the number of words used in the current sublinkage.

int  linkage_get_num_links(Linkage linkage);
The number of links used in the current sublinkage.

int linkage_get_link_length(Linkage linkage, int index);
int linkage_get_link_lword(Linkage linkage, int index);
int linkage_get_link_rword(Linkage linkage, int index);
The value returned by linkage_get_link_length is the number of words spanned by the index-th link of the current sublinkage. For example, in the (**) sublinkage above, the length of the Js link is 2, the length of the Mp link is 1, and the length of the Ss link is 4. The value returned by the lword function is the number of the word on the left end of the index-th link of the current sublinkage, and the rword function returns the number of the word on the right end. For example, the lword of the fifth (Ds) link of the sublinkage (**) above is equal to 7, and the rword of this link is equal to 8. No canonical ordering of the links is guaranteed.

char * linkage_print_diagram(Linkage linkage);
char * linkage_print_postscript(Linkage linkage, int mode);
char * linkage_print_links_and_domains(Linkage linkage);
These functions pretty print the linkage in various ways. The strings returned are allocated using the external parser memory, and should be freed by the user with a call to string_delete. The linkage_print_diagram function returns a pointer to a string containing the familiar graphical linkage display. The linkage_print_postscript function returns the macros needed to print out the linkage in a postscript file. For example, the linkage diagram
 +-----CC-----+
 +Sp*+  +--Xd-+--Wd--+-Sp*i+
 |   |  |     |      |     |
I.p eat , therefore I.p think.v

has the following postscript output:

[(/////)(I.p)(eat)(,)(therefore)(I.p)(think.v)]
[[0 1 0 (Wd)][1 4 1 (CC)][1 2 0 (Sp*i)][3 4 0 (Xd)][4 5 0 (Wd)][5 6 0 (Sp*i)]]
[0]

which can be used to generate a postscript figure of the linkage.

[Figure: Postscript Linkage Display]
With mode=0, the output is just the set of postscript macros shown above. With mode=1 a complete encapsulated postscript document is printed.
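
A program that consumes the mode=0 output may want to read the link list back in. The sketch below parses entries of the form [left right n (label)] with sscanf. The PsLink type and parse_ps_links function are hypothetical, not part of the API. Reading the first two numbers as the left and right word indices is inferred from the example above, and the meaning of the third number is not described here, so it is stored without interpretation.

```c
#include <stdio.h>

/* Hypothetical record for one entry of the link list. */
typedef struct {
    int  left;       /* inferred: index of the left word  */
    int  right;      /* inferred: index of the right word */
    int  third;      /* third field; meaning not documented here */
    char label[16];  /* link label, e.g. "Sp*i" */
} PsLink;

/* Parse a link list such as "[[0 1 0 (Wd)][1 4 1 (CC)]]" into out[].
   Returns the number of entries read (at most max). */
int parse_ps_links(const char *s, PsLink *out, int max)
{
    int n = 0;
    if (*s == '[') s++;                 /* skip the outer bracket */
    while (n < max) {
        int used = 0;
        int got = sscanf(s, " [%d %d %d (%15[^)])]%n",
                         &out[n].left, &out[n].right, &out[n].third,
                         out[n].label, &used);
        /* used stays 0 if the closing ")]" of the entry did not match */
        if (got != 4 || used == 0) break;
        s += used;
        n++;
    }
    return n;
}
```

Labels longer than 15 characters are not handled by this sketch; the entry is rejected and parsing stops.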

The linkage_print_links_and_domains function returns a string that lists all of the links and domain names for the current sublinkage. For example, the output for the linkage above would look like this:

           /////          RW      <---RW---->  RW        /////
 (m)       /////          Wd      <---Wd---->  Wd        I.p
 (m)       I.p            CC      <---CC---->  CC        therefore
 (m)       I.p            Sp*i    <---Sp*i-->  Sp        eat
 (m)       ,              Xd      <---Xd---->  Xd        therefore
 (m) (m)   therefore      Wd      <---Wd---->  Wd        I.p
 (m) (m)   I.p            Sp*i    <---Sp*i-->  Sp        think.v

char * linkage_get_link_label(Linkage linkage, int index);
char * linkage_get_link_llabel(Linkage linkage, int index);
char * linkage_get_link_rlabel(Linkage linkage, int index);
The label on a link in a diagram is constructed by taking the "intersection" of the left and right connectors that comprise the link. For example, in the list of links shown above, the Sp*i label on the link between the words I.p and eat is constructed from the Sp*i connector on its left word and the Sp connector on its right word. So, for this example, both linkage_get_link_label and linkage_get_link_llabel return "Sp*i", while linkage_get_link_rlabel returns "Sp" for this link.
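To make the idea of an "intersection" concrete, the following sketch combines two connector names character by character. It is an illustration only, not the parser's own routine: the function name intersect_labels is hypothetical, and the rules it applies (that '*' acts as a wildcard, that the shorter connector is treated as padded with '*', and that the two connectors are already known to match) are assumptions. The parser's actual code may handle further cases.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of forming a link label from its two connector names.
   Assumptions (not taken from the parser source): '*' is a wildcard,
   the shorter connector is treated as padded with '*', and the two
   connectors are already known to match. */
void intersect_labels(const char *left, const char *right,
                      char *out, size_t out_size)
{
    size_t llen = strlen(left);
    size_t rlen = strlen(right);
    size_t n = (llen > rlen) ? llen : rlen;
    size_t i;

    if (out_size == 0) return;
    if (n > out_size - 1) n = out_size - 1;   /* truncate to fit */

    for (i = 0; i < n; i++) {
        char l = (i < llen) ? left[i]  : '*';
        char r = (i < rlen) ? right[i] : '*';
        out[i] = (l == '*') ? r : l;   /* keep the more specific character */
    }
    out[n] = '\0';
}
```

Under these assumptions, combining the connectors "Sp*i" and "Sp" yields "Sp*i", which agrees with the label shown for the link between I.p and eat.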

int     linkage_get_link_num_domains(Linkage linkage, int index);
char ** linkage_get_link_domain_names(Linkage linkage, int index);
char *  linkage_get_violation_name(Linkage linkage);
These functions allow access to most of the domain structure extracted during post-processing. The index parameter in the first two calls specifies which link in the current sublinkage to extract the information for. In the "I eat therefore I think" example above, the link between the words therefore and I.p belongs to two "m" domains. If the linkage violated any post-processing rules, the name of the violated rule in the post-process knowledge file can be determined by a call to linkage_get_violation_name.
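As an illustration of working with the domain names of a link, the sketch below counts how many of a link's domains carry a given name. The helper count_domains_named is a hypothetical name, not an API function; its names and count arguments stand for the values returned by linkage_get_link_domain_names and linkage_get_link_num_domains for the same link.

```c
#include <string.h>

/* Count how many entries in a link's domain-name array equal `name`.
   `names` and `count` stand for the results of
   linkage_get_link_domain_names and linkage_get_link_num_domains. */
int count_domains_named(char **names, int count, const char *name)
{
    int i, hits = 0;
    for (i = 0; i < count; i++) {
        if (names[i] != NULL && strcmp(names[i], name) == 0) hits++;
    }
    return hits;
}
```

For the link between therefore and I.p in the example, whose domain names are "m" and "m", this helper would return 2 when asked about "m".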

char ** linkage_get_words(Linkage linkage);
char *  linkage_get_word(Linkage linkage, int w);
These return the array of word spellings, or an individual word spelling, for the current sublinkage. These are the "inflected" spellings, such as "dog.n". The original spellings can be obtained by calls to sentence_get_word(Sentence sent, int wordnum).

int linkage_unused_word_cost(Linkage linkage);
int linkage_disjunct_cost(Linkage linkage);
int linkage_and_cost(Linkage linkage);
int linkage_link_cost(Linkage linkage);
These functions return the various cost parameters of a linkage, which are used for sorting linkages in post-processing. For the actual meanings of these numbers, refer to the dictionary documentation and source code.

void linkage_delete(Linkage linkage);
Frees up all of the storage used for the linkage.

6.5 Independent Post-Processing



PostProcessor   post_process_open(char * name);
void            post_process_close(PostProcessor postprocessor);
post_process_open opens and parses the input pp knowledge file, and post_process_close frees it.

void linkage_post_process(Linkage linkage, PostProcessor postprocessor);
This allows an arbitrary PostProcessor to be applied to an individual linkage, even though that linkage may have been previously post-processed, in which case the earlier information is first freed.

6.6 Extracting Constituent Structure

The following simple tree data structure is used to represent a constituent parse tree:

typedef struct CNode_s CNode;
struct CNode_s {
  char  * label;
  CNode * child;
  CNode * next;
  int   start, end;
};
CNode is a standard C-style tree data structure. The children of a node are stored as a linked list, with the end of the list indicated by next==NULL. The start and end fields of a node indicate the span of the constituent, with the first word indexed by 0. Leaves are defined by the condition child==NULL. There are three basic functions to work with the constituent structure:

CNode * linkage_constituent_tree(Linkage linkage);
void    linkage_free_constituent_tree(CNode * n);
char *  linkage_print_constituent_tree(Linkage linkage, int mode);
The function linkage_constituent_tree returns a pointer to a tree; after using it, the space should be freed with a call to linkage_free_constituent_tree.
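A tree of this shape can be walked with a loop over the next pointers and a recursive call on child. The sketch below shows two such walks. The functions cnode_count_leaves and cnode_depth are hypothetical helpers, not part of the API, and the CNode definition is repeated only so that the sketch compiles on its own.

```c
#include <stddef.h>

/* Repeated from the definition above so the sketch is self-contained. */
typedef struct CNode_s CNode;
struct CNode_s {
  char  * label;
  CNode * child;
  CNode * next;
  int   start, end;
};

/* Count the leaves (nodes with child==NULL) in the subtree rooted at n
   and in the subtrees of the siblings that follow n. */
int cnode_count_leaves(const CNode *n)
{
    int total = 0;
    for (; n != NULL; n = n->next) {
        if (n->child == NULL) total++;
        else total += cnode_count_leaves(n->child);
    }
    return total;
}

/* Number of levels in the deepest subtree among n and its later
   siblings; an empty list has depth 0. */
int cnode_depth(const CNode *n)
{
    int best = 0;
    for (; n != NULL; n = n->next) {
        int d = 1 + cnode_depth(n->child);
        if (d > best) best = d;
    }
    return best;
}
```

Both helpers only read the tree, so they can be applied to the pointer returned by linkage_constituent_tree before that tree is released with linkage_free_constituent_tree.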

In the function linkage_print_constituent_tree, the parameter mode=1 specifies that the tree is displayed using the nested Lisp format, and mode=2 specifies that a flat tree is displayed using brackets. When mode=0, no constituent structure is generated and a null string is returned.

The string returned by a call to linkage_print_constituent_tree is allocated using the external parser memory, and should be freed by the user with a call to string_delete.


John Lafferty
Last modified: Fri Oct 31 09:47:55 EST 2003