
A Hubterranean View of Syntax: An Analysis of Linguistic Form through Network Theory

Language is part of nature, and as such, certain general principles that generate the form of natural systems will also create the patterns found within linguistic form. Since network theory is one of the best theoretical frameworks for extracting general principles from diverse systems, this thesis examines how a network perspective can shed light on the characteristics and the learning of syntax. It is demonstrated that two word co-occurrence networks constructed from adult and child speech (BNC World Edition 2001; Sachs 1983; MacWhinney 2000a) exhibit three non-atomic syntactic primitives, namely the truncated power law distributions of frequency, degree and the link length between two nodes (the link representing a precedence relation). Since a power law distribution of link lengths characterises a hubterranean structure (Kasturirangan 1999), i.e. a structure that has a few highly connected nodes and many poorly connected nodes, both the adult and the child word co-occurrence networks exhibit hubterranean structure. This structure is formed by an optimisation process that minimises the link length whilst maximising connectivity (Mathias & Gopal 2001a, b). The link length in a word co-occurrence network is the storage cost of representing two adjacently co-occurring words and is inversely proportional to the transitional probability (TP) of the word pair. Adjacent words that often co-occur, i.e. have a high TP, exhibit high cohesion and tend to form chunks. These chunks are a cost-effective method of storing representations. Thus, on this view, the (multi-) power law of link lengths represents the distribution of storage costs, or cohesions, between adjacent words. Such cohesions form groupings of linguistic form known as syntactic constituents. Thus, syntactic constituency is not specific to language and is a property derived from the optimisation of the network.
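The relation between transitional probability and link length described above can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a forward TP estimated as count(w1 w2) / count(w1), and it assumes a proportionality constant of 1, so that link length is simply 1 / TP. The function names and the toy sentence are hypothetical.

```python
from collections import Counter

def transitional_probabilities(tokens):
    """Forward TP of each adjacent pair (assumed estimator):
    count(w1 w2) / count(w1 occurring as the first word of a pair)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    first_counts = Counter(tokens[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def link_lengths(tp):
    """Link length taken as 1 / TP (assumption: proportionality constant of 1)."""
    return {pair: 1.0 / p for pair, p in tp.items()}

# Toy corpus, purely illustrative.
tokens = "the dog saw the cat and the dog ran".split()
tp = transitional_probabilities(tokens)
lengths = link_lengths(tp)

# Pairs with a high TP get short links (low storage cost) and are chunk candidates.
for pair, length in sorted(lengths.items(), key=lambda item: item[1]):
    print(pair, round(tp[pair], 3), round(length, 3))
```

Under these assumptions, a pair such as "saw the", which always co-occurs in the toy data, receives the shortest possible link, while "the cat" receives a longer one because "the" is followed by several different words.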
In keeping with other systems generated by a cost constraint on the link length, it is demonstrated that both the child and adult word co-occurrence networks are not hierarchically organised in terms of degree distribution (Ravasz and Barabási 2003:1). Furthermore, both networks are disassortative, and in line with other disassortative networks, there is a correlation between degree and betweenness centrality (BC) values (Goh, Kahng and Kim 2003). In agreement with scale-free networks (Goh, Oh, Jeong, Kahng and Kim 2002), the BC values in both networks follow a power law distribution. In this thesis, a motif analysis of the two word co-occurrence networks serves as a richly detailed (non-functional) distributional analysis and reveals that the adult and child significance profiles for triad subgraphs correlate closely. Furthermore, the most significant 4-node motifs in the adult network are also the most significant in the child network. On the basis of this non-functional distributional analysis of a word co-occurrence network, it is argued that the notion of a general syntactic category is not evidenced and as such is inadmissible. Thus, non-general or construction-specific categories are preferred (in line with Croft 2001). Function words tend to be the hub words of the network (see Ferrer i Cancho and Solé 2001a), being defined and therefore identified by their high type and token frequency. These properties are useful for identifying syntactic categories since function words are traditionally associated with particular syntactic categories (see Cann 2000). Consequently, a function word, and thus a syntactic category, may be identified by the intersection of the frequency and degree power laws with their truncated tails. As a given syntactic category captures the type of words that may co-occur with the function word, the category then encourages consistency within the functional patterns in the network and reinforces the network’s (near-) optimised state.
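The idea that hub words combine a high token frequency with a high degree (type frequency) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's procedure, which identifies such words from the power law distributions and their truncated tails. Here degree is assumed to be the number of distinct adjacent neighbours regardless of direction, and the `top_n` cut-off is an arbitrary, hypothetical parameter.

```python
from collections import Counter, defaultdict

def degree_and_frequency(tokens):
    """Token frequency and degree of every word.
    Degree is assumed to be the number of distinct words adjacent to it."""
    freq = Counter(tokens)
    neighbours = defaultdict(set)
    for w1, w2 in zip(tokens, tokens[1:]):
        if w1 != w2:  # ignore self-loops from immediate repetitions
            neighbours[w1].add(w2)
            neighbours[w2].add(w1)
    degree = {w: len(neighbours[w]) for w in freq}
    return freq, degree

def hub_candidates(tokens, top_n=1):
    """Words ranked in the top_n by both token frequency and degree
    (top_n is an illustrative threshold, not taken from the thesis)."""
    freq, degree = degree_and_frequency(tokens)
    by_freq = {w for w, _ in freq.most_common(top_n)}
    by_degree = set(sorted(degree, key=degree.get, reverse=True)[:top_n])
    return by_freq & by_degree

# Toy corpus, purely illustrative.
tokens = "the dog saw the cat and the dog ran".split()
freq, degree = degree_and_frequency(tokens)
print(hub_candidates(tokens))
```

In the toy data the determiner "the" is both the most frequent word and the word with the most distinct neighbours, which matches the expectation that function words act as hubs.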
Syntax, then, on this view, is both a navigator, manoeuvring through the ever-varying sea of linguistic form, and a guide, forging an uncharted course through novel expression. There is also evidence suggesting that the hubterranean structure is found not only in the word co-occurrence network, but also within other theoretical syntactic levels. Factors affecting the choice of a verb that is generalised early relate to the formation and the characteristics of hubs. Specifically, a high (token) frequency in combination with either a high degree (type frequency) or a low storage cost points to certain verbs within the network, and these highly ‘visible’ verbs tend to be generalised early (in line with Boyd and Goldberg forthcoming). Furthermore, the optimisation process that creates hubterranean structure is implicated in the verb-construction subpart network of the adult’s linguistic knowledge, the mapping of the constructions’ form-to-meaning pairings, the construction inventory size, as well as certain strategies aiding first language learning and adult artificial language learning.

Identifier: oai:union.ndltd.org:ADTP/279418
Creators: Julie Louise Steele
Source Sets: Australasian Digital Theses Program
Detected Language: English
