Some PL/1 subroutines for natural language analysis

The purpose of this dissertation was to write and make available a small set of PL/1 computer subroutines that can be used in other computer programs attempting any kind of analysis of natural language data. The subroutines presented in the dissertation handle some of the housekeeping, that is, the jobs that must be done before analysis can begin.

Four subroutines were written and tested: a subroutine called FINDONE (find one) that isolates words in an input string of characters, and three subroutines, called the LAGADOs, that find words or word parts on lists of words or word parts. The reliability of the subroutines was tested in small testing programs and in a larger lexical diversity program that was modified to use the subroutines.

FINDONE finds graphemic words and punctuation marks in an input character string. In addition, it truncates the input string from the left so that repeated calls of the subroutine find the words in the input string in sequence. FINDONE takes as parameters the name of the input string and a name to be associated with the word found.

The three LAGADO functions search for words on lists of words. Each of the functions is designed to search a list of a certain structure. LAGADO1 searches an alphabetized list where the length of the list is known. It uses the economical binary search technique. LAGADO1 takes as parameters the name of the word searched for, the name of the list to be searched, and the length of the list to be searched.

LAGADO2 searches a list in any order that is alphabetically indexed by an indexing array. LAGADO2 takes as parameters the name of the word being searched for, the name of the list being searched, the name of the indexing array, and the length of the list being searched.

LAGADO3 searches any list that has an end-of-list symbol. LAGADO3 uses a linear search technique and looks at each element of the list being searched in order until it finds either the word being searched for or the final boundary symbol. LAGADO3 takes as parameters the name of the word searched for, the name of the list being searched, and the name of the end-of-list symbol.

Each of the LAGADO functions returns a positive value equal to the subscript of the list element that matches the input word if the input word is matched, or a negative number whose absolute value is the subscript of the location of the cell where the input word would have to be inserted into the list if the input word is not matched.

Two of the subroutines, FINDONE and LAGADO2, were tested by being incorporated into SUPRFRQ, a lexical diversity program developed from an earlier program written by Robert Wachal. An appendix includes the documented texts of the subroutines and of the lexical diversity program. In addition, the appendix includes the result of a run of SUPRFRQ on four short dialect texts collected by Charles Houck in Leeds, England.
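
The behavior described above can be illustrated with a minimal sketch. It is written in Python rather than PL/1 and is not the dissertation's code: the function names `find_one`, `lagado1`, and `lagado3`, the definition of a word as a run of characters that are neither whitespace nor ASCII punctuation, and the use of 1-based subscripts are all assumptions made for illustration. Because Python strings cannot be truncated in place, `find_one` returns the shortened string alongside the token instead of modifying its argument.

```python
import string


def find_one(text):
    """Return (token, remainder) for the first word or punctuation mark.

    Assumption: a word is a run of characters that are neither whitespace
    nor ASCII punctuation; each punctuation mark is its own token.
    """
    text = text.lstrip()
    if not text:
        return "", ""
    if text[0] in string.punctuation:
        return text[0], text[1:]
    end = 0
    while (end < len(text) and not text[end].isspace()
           and text[end] not in string.punctuation):
        end += 1
    return text[:end], text[end:]


def lagado1(word, words, length):
    """Binary search of an alphabetized list, using 1-based subscripts.

    Returns the subscript of the match, or the negative of the subscript
    where the word would have to be inserted.
    """
    lo, hi = 1, length
    while lo <= hi:
        mid = (lo + hi) // 2
        entry = words[mid - 1]  # convert 1-based subscript to Python index
        if entry == word:
            return mid
        if entry < word:
            lo = mid + 1
        else:
            hi = mid - 1
    return -lo


def lagado3(word, words, end_symbol):
    """Linear search of a list terminated by an end-of-list symbol.

    Returns the 1-based subscript of the match, or the negative of the
    subscript of the end-of-list cell, where a new word would go.
    """
    position = 1
    for entry in words:
        if entry == end_symbol:
            break
        if entry == word:
            return position
        position += 1
    return -position


if __name__ == "__main__":
    remaining = "Hello, world."
    while remaining:
        token, remaining = find_one(remaining)
        if token:
            print(token, lagado1(token.lower(), ["hello", "there", "world"], 3))
```

Subscripts are counted from 1 in this sketch because the sign convention needs it: with subscripts counted from 0, a match in the first cell and a failed search whose insertion point is the first cell would both return 0.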

Identifier oai:union.ndltd.org:BSU/oai:cardinalscholar.bsu.edu:handle/176165
Date January 1973
Creators Fink, John William
Contributors Houck, Charles L.
Source Sets Ball State University
Detected Language English
Format v, 167 leaves : ill. ; 28 cm.
Source Virtual Press
