Compiling the Link Grammar System under DOS
S. Candelaria de Ram, Copyright (C) Oct.97

(0) WINHOW LINKGRAMMAR:
Here's a way to INSTALL AND RUN Daniel Sleator and Davy Temperley's
Link Grammar. You start from its vanilla ANSI C source code and compile
it to run in a DOS 6 / MS-Windows 3.1 environment. The port was worked
out and tested on a 486DX with 8MB RAM, running gnumake version 3.75 and
gcc version 2.7.2, by S. Candelaria de Ram, c. Oct.97.
A LINK GRAMMAR REFERENCE CARD is at the end of this file.

(1) TOOLS:
INSTALL THE C LANGUAGE in the form of the freeware Gnu C/C++/ObjC
package as ported for Intel x86, available from www.delorie.com .
INSTALL the x86-ported Gnu program make .
INSTALL THE untgz.exe program to unpack files.
INSTALL THE pkunzip.exe program to unpack files.

(2) SET UP PLACE:
Make a directory to call the link-grammar parser from, such as by

    mkdir LinkGram
    mkdir LinkGram\grammar-.2

A subdirectory will be needed to hold the pages of the dictionary the
parser calls, such as by

    mkdir LinkGram\grammar-.2\WORDS

(3) GET PARSER CODE:
DOWNLOAD THE LINK GRAMMAR C SOURCE CODE and DICTIONARY from
ftp.cs.cmu.edu/user/sleator/link-grammar as follows:

    sourcecode from:   system-2.2.tar.gz
                       (saved as lg22srct.gz or the like)
    dictionary from:   system-2.0-for-DOS/LG2_WRDS.ZIP
                       system-2.0-for-DOS/LG2_SRC.ZIP
    makefile with 8+3 short filenames from:   [enclosed]

(4) UNPACK PARSER CODE:
In the LinkGram directory, you get the C code and the dictionary rules
out:

    DOS\LinkGram\> untgz lg22srct.gz

This puts things into a subdirectory named grammar-.2, to which you add
the dictionary cover file(s):

    DOS\LinkGram\grammar-.2\> pkunzip LG2_SRC.zip *.DIC

In the grammar-.2\WORDS subdirectory, you set out all the wordlists.
(They are named by grammatical type, shortened for DOS to 8+3
shortnames.):

    DOS\LinkGram\grammar-.2\> mkdir WORDS
    DOS\LinkGram\grammar-.2\> cd WORDS
    DOS\LinkGram\grammar-.2\WORDS\> pkunzip ..\LG2_WRDS.ZIP .

(5) COMPILE THE C CODE
In the LinkGram\grammar-.2 directory, use the shortname-edited version
of the Makefile to apply the gcc compiler so as to make x86-runnable
code. This can be done all at once, for all the pieces of sourcecode in
turn, by using the Gnu program make. (It will pick up the file makefile
by default, or you can name the shortname one makefile.DOS and tell it
make -f makefile.DOS .)

    DOS\LinkGram\grammar-.2> make

[ERRORS AND WARNINGS ARE REPORTED. They don't seem to prevent the
program from compiling or running, but they may cause a fatal General
Protection Fault crash later; see below. The errors are in PP_LEXER.C
(the second), apparently from various USES OF int FROM PP_LEXER.C.HIDE
that are inconsistent between the current PP_LEXER.FL and PP_LEXER.C ;
refer to the tail section of the Makefile (the pp_lexer.c rule that seds
a .yy. file) and see a filecompare of pp_lexer.fl, pp_lexer.c and
pp_lexer.c.hide .
UNTIL THE ERRORS ARE FIXED it is probably safest to run the link grammar
program from PLAIN DOS. Although it appears to work fine in a Windows
DOS-box (a window showing DOS), a very unusual "invalid General
Protection Fault" arose that said to shut down [the DOS box, Windows,
and] the whole computer and start anew.]

(6) RUN THE LINK GRAMMAR
The command for interactive use is, where 2.DIC is the dictionary to
use:

    LinkGram\grammar-.2\parse.exe 2.DIC

(For some testing, a shortened dictionary may be handy, like the 2A.DIC
in the [pre-]/for-DOS LG2_SRC.ZIP package.)
The other most critical commands from the parser's prompt are:

    > !help
    > !quit

To get the parser's prompt, then, you've done

    DOS...grammar-.2\> parse DICTIONARY

and the program says to input your sentences, giving a drawing of its
solution each time.

If you want to send in a whole batch of input, use batch mode. With
screen output of the diagrams:

    parse 2.DIC -batch < INPUTFILE

The format of a batch file is one "sentence" item per line.
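A batch input file in that one-item-per-line format can be prepared with
any editor. As a minimal sketch only, here is one way to generate it in
Python, which is an assumption of this example and not part of the DOS
toolchain described here. The function name make_batch_text is made up
for illustration, and the optional "!" setting lines rely on the remark
in the reference card that settings like !echo can be put into batch
files.

```python
def make_batch_text(sentences, settings=()):
    """Return batch-file text: optional "!" setting lines, then one sentence per line.

    Hypothetical helper for illustration; it is not part of the Link Grammar package.
    """
    lines = []
    for setting in settings:
        setting = setting.strip()
        # Parser commands always begin with "!", e.g. "!echo".
        if not setting.startswith("!"):
            raise ValueError("settings must start with '!': %r" % setting)
        lines.append(setting)
    for sentence in sentences:
        # Collapse runs of whitespace (including newlines) so that each
        # item stays on exactly one line; skip items that end up empty.
        flat = " ".join(sentence.split())
        if flat:
            lines.append(flat)
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(make_batch_text(
        ["In the long run, it's hard to say where word boundaries actually are.",
         "Thanks, buddy."],
        settings=["!echo"],
    ), end="")
```

The printed text can be redirected to a file and then used as INPUTFILE
in the parse command above.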
(LG2_SRC.ZIP has a couple of sample test inputfiles, REAL.BAT and
TEST-2.BAT, i.e., real.batch and test-2.0.batch, though as these are not
in DOS batch-language some suffix like .TST or .S would be clearer.)

Alternatively, the output can be saved to an output file:

    parse 2.DIC -batch < INPUTFILE > OUTPUTFILE

In fact you can save your interactive screen session to a file too, with

    parse 2.DIC > OUTPUTFILE

but the prompt is just a blinking cursor for those inputs -- you end
each item with a Return -- and in this mode you can't see the diagrams
till you've quit with

    _ !quit

(7) REFERENCE CARD:
The sourcecode is the bottom line as far as how the program works.
The Manual sets out the notation for making a dictionary (about 20 pages
to print).
The v. 2.1 guide-to-links file catalogs the word-pair relation
categories (link types) developed for English, with contrastive examples
(almost 100 pages).
Sleator's website's tech papers (in Postscript) tell about the grammar.
Practical information to work the link grammar program is spilled (by
parse.c in v.2.2; main.c in earlier versions):

----(7a) USAGE help from the plain command starting up the program:

    DOS\LinkGram\grammar-.2> parse
    Usage: DRIVE:/LinkGram/grammar-2./parse.exe dictionary-file [-batch] [-wordlimit number] [-linklimit number]

[MORE COMPLETELY, in DOS as unix ?? CHECK LOCATION OF OPTIONS -- & CAN
!VAR or !VAR=VAL be here? be in batch file? as !echo can per manual

    Usage: DRIVE:LinkGram/grammar-.2/parse.exe DictionaryCover \
           [-wordlimit number] [-linklimit number] \
           [-batch < Parsable.TXT] [> OUTPUTFILE]
]

----(7b) SUMMARY is from the builtin help command:

    > !help

and your setup options come out from doing:

    > !variables

Here's what the interface and its info actually look like (from an
output file):

    Welcome to the Link Parser -- Version 2.1 (Compiled Oct 3 1997)
    Copyright (C) 1991-1995 Daniel Sleator and Davy Temperley
    Type your sentence and press Return ("!help" for options).
    > Special commands always begin with "!".
    Command and variable names can be abbreviated.
    Here is a list of the commands:

      !quit        Exit the system
      !save        Save your changes to the dictionary
      !variables   List user-settable variables and their functions
      !help        List the commands and what they do
      !!           Print all the dictionary words matching .
                   Also print the number of disjuncts of each.
      !-           Delete all the dictionary words matching .
      !=           This indicates that a new word () is to be added
                   to the dictionary. Its definition will be the same
                   as that of , and, if appropriate, it will be added
                   to the word file of .
      !            Toggle the specified boolean variable.
      !=           Assign that value to that variable.
    >
    Variable     Controls                                       Value
    --------     --------                                       -----
    verbosity    Level of detail to give about the computation  0
    width        The width of your screen                       79
    limit        The maximum number of linkages processed       10000
    graphics     Graphical link display                         1 (On)
    multiple     Expansion of fat (conjunctive) linkages        1 (On)
    short        Reduced height display                         1 (On)
    postscript   Generating of postscript data                  0 (Off)
    links        Showing of complete link data                  0 (Off)
    bad          Showing of linkages failing postprocessing     0 (Off)
    fat          Showing of fat (conjunctive) linkages          0 (Off)
    lsubscripts  Showing of complete link labels                1 (On)
    wsubscripts  Showing of word subscripts                     1 (On)
    walls        Always show the walls                          0 (Off)
    null         Null link search                               1 (On)
    unknown      Using of the "unknown word" definition         1 (On)
    echo         Echoing of input sentence                      0 (Off)
    www          Suppression of prompt                          0 (Off)
    justone      Displaying of just one linkage                 0 (Off)

    Toggle a boolean variable as in "!links", set a variable as in "!width=100".
    > !quit

These settings, like !echo, can be put into your files of batches of
sentence-type items. The manual gives some more information.

Here's part of a parse OUTPUTFILE with DIAGRAMS; I capitalized the
sentence, but the lexer filtered that out:

    Processing sentences in batch mode --- stage 2
    Input: in the long run , it 's hard to say where word boundaries actually are .
    [Linkage diagram for this sentence; its column alignment has been
    lost. Link labels drawn, in order of appearance: Xp, Wd, CO, Jp,
    COp, B, Cs, DD, Xc, Ss*, Paf, TOt, I, MVp, AN, and on the wrapped
    continuation Spx, E. The word rows read:]

    ///// in the long.a run.v , it 's.v hard.a to say.v where word.n boundaries.n
    actually are .

    --- No linkage for (stage 2): thanks , buddy .
    2 errors.

-----------------------------------------------------------------------SC

ENCLOSED MAKEFILE (8+3 short filenames):

CC = gcc
CFLAGS = -g -O -Wuninitialized -Wall -Wno-implicit -Wno-char-subscripts

OBJS = \
	analyze-.o\
	and.o\
	build-di.o\
	count.o\
	error.o\
	extract-.o\
	fast-mat.o\
	idiom.o\
	massage.o\
	parse.o\
	post-pro.o\
	pp_lexer.o\
	print.o\
	prune.o\
	read-dic.o\
	utilitie.o\
	word-fil.o

HEADERS = \
	analyze-.h\
	count.h\
	error.h\
	extract-.h\
	fast-mat.h\
	idiom.h\
	massage.h\
	prune.h\
	post-pro.h\
	print.h\
	read-dic.h\
	utilitie.h\
	word-fil.h\
	and.h\
	build-di.h\
	header.h

parse: $(OBJS)
	$(CC) $(CFLAGS) $(OBJS) -o parse

$(OBJS): $(HEADERS)

clean:
	/bin/rm *.o *~
	echo "Project cleaned."

pp_lexer.c: pp_lexer.fl
	lex pp_lexer.fl
	mv lex.yy.c pp_lexer.tmp.c
	cat pp_lexer.tmp.c | sed "s/yy/pp_lexer_/g" > pp_lexer.c
	rm -f pp_lexer.tmp.c
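
The pp_lexer.c rule at the end of the makefile pipes the lex output
through sed "s/yy/pp_lexer_/g", which rewrites every occurrence of the
two letters yy as pp_lexer_ . As an illustration only, the same
substitution can be sketched in Python. Python is an assumption of this
example, not part of the toolchain described here, and the function name
rename_lex_symbols is made up. The names yylex and yytext in the sample
are the standard ones that lex generates.

```python
def rename_lex_symbols(source_text, prefix="pp_lexer_"):
    """Mimic sed "s/yy/pp_lexer_/g": replace every literal "yy" with the prefix.

    Hypothetical helper for illustration; the makefile itself uses sed.
    """
    # str.replace substitutes all non-overlapping occurrences, left to
    # right, which matches sed's s///g for a literal pattern like "yy".
    return source_text.replace("yy", prefix)


if __name__ == "__main__":
    sample = "int yylex(void);\nextern char *yytext;\n"
    print(rename_lex_symbols(sample), end="")
```

Because the pattern is not anchored to the start of an identifier, any
other yy in the generated file is rewritten as well, which may be worth
keeping in mind when comparing pp_lexer.c against pp_lexer.c.hide .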