

Language identification by statistical analysis.

Approved for public release; distribution is unlimited.

An analysis was conducted of English and Spanish text. For large samples of each language, the statistical analysis determined the independent probability of each letter and the joint probability of various letter combinations. Several methods were tested in an attempt to use these characteristics to identify the language of a short sample text. Using the joint probability of various vowel-consonant relationships together with the Kolmogorov-Smirnov goodness-of-fit test, an identification system was defined that provided a significance level of .0077 for a sample of 107 letters (approximately 21 words). The investigation also showed that the space rate, or interword structure, of each language contains a measure of intelligence and was useful in identification.
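To make the vowel-consonant approach concrete, here is a minimal Python sketch. It is not the thesis's procedure. The four categories (VV, VC, CV, CC for adjacent letters within a word), the vowel set with "y" treated as a consonant, the fixed category order, and the helper names `vc_profile`, `ks_distance` and `identify` are all hypothetical assumptions. The sketch computes only a Kolmogorov-Smirnov-style distance (the largest gap between two cumulative distributions) and picks the closest reference language; it does not reproduce the significance-level calculation behind the reported .0077.

```python
from collections import Counter
from itertools import accumulate

# Assumption: accented Spanish vowels count as vowels; 'y' is treated as a consonant.
VOWELS = set("aeiouáéíóúü")
# Assumption: fixed category order used to build the cumulative distributions.
CATEGORIES = ("VV", "VC", "CV", "CC")


def vc_profile(text: str) -> dict[str, float]:
    """Joint probability of vowel/consonant classes for adjacent letters within words."""
    counts = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]  # drop punctuation and digits
        for a, b in zip(letters, letters[1:]):
            key = ("V" if a in VOWELS else "C") + ("V" if b in VOWELS else "C")
            counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no adjacent letter pairs")
    return {cat: counts[cat] / total for cat in CATEGORIES}


def ks_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """KS-style statistic: largest absolute gap between the two cumulative distributions."""
    cum_p = accumulate(p[cat] for cat in CATEGORIES)
    cum_q = accumulate(q[cat] for cat in CATEGORIES)
    return max(abs(a - b) for a, b in zip(cum_p, cum_q))


def identify(sample: str, references: dict[str, dict[str, float]]) -> str:
    """Return the reference language whose profile is closest to the sample's profile."""
    profile = vc_profile(sample)
    return min(references, key=lambda lang: ks_distance(profile, references[lang]))
```

In use, the reference profiles would come from large samples of each language, as the abstract describes, for example `identify(sample, {"en": vc_profile(english_corpus), "es": vc_profile(spanish_corpus)})`, where `english_corpus` and `spanish_corpus` are hypothetical training texts. Because the KS statistic depends on the order of the categories, the ordering in `CATEGORIES` is itself a design assumption.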

http://hdl.handle.net/10945/17103

Identifier	oai:union.ndltd.org:nps.edu/oai:calhoun.nps.edu:10945/17103
Date	09 1900
Creators	Rau, Morton David
Contributors	Weitzman, R.A., Power, V.M.
Publisher	Monterey, California. Naval Postgraduate School
Source Sets	Naval Postgraduate School
Language	en_US
Detected Language	English
Type	Thesis
Rights	This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
