1 |
Linguistically Motivated Features for CCG Realization Ranking. Rajkumar, Rajakrishnan P. 19 July 2012.
No description available.
2 |
Harmonic analysis of music using combinatory categorial grammar. Granroth-Wilding, Mark Thomas. January 2013.
Various patterns of the organization of Western tonal music exhibit hierarchical structure, among them the harmonic progressions underlying melodies and the metre underlying rhythmic patterns. Recognizing these structures is an important part of unconscious human cognitive processing of music. Since the prosody and syntax of natural languages are commonly analysed with similar hierarchical structures, it is reasonable to expect that the techniques used to identify these structures automatically in natural language might also be applied to the automatic interpretation of music. In natural language processing (NLP), analysing the syntactic structure of a sentence is a prerequisite to semantic interpretation. The analysis is made difficult by the high degree of ambiguity in even moderately long sentences. In music, a similar sort of structural analysis, with a similar degree of ambiguity, is fundamental to tasks such as key identification and score transcription. These and other tasks depend on harmonic and rhythmic analyses. There is a long history of applying linguistic analysis techniques to musical analysis. In recent years, statistical modelling, in particular in the form of probabilistic models, has become ubiquitous in NLP for large-scale practical analysis of language. The focus of the present work is the application of statistical parsing to automatic harmonic analysis of music. This thesis demonstrates that statistical parsing techniques, adapted from NLP with little modification, can be successfully applied to recovering the harmonic structure underlying music. It shows first how a type of formal grammar based on one used for linguistic syntactic processing, Combinatory Categorial Grammar (CCG), can be used to analyse the hierarchical structure of chord sequences. I introduce a formal language similar to first-order predicate logic to express the hierarchical tonal harmonic relationships between chords. The syntactic grammar formalism then serves as a mechanism to map an unstructured chord sequence onto its structured analysis. In NLP, the high degree of ambiguity of the analysis means that a parser must consider a huge number of possible structures. Chart parsing provides an efficient mechanism to explore them. Statistical models allow the parser to use information about structures seen before in a training corpus to eliminate improbable interpretations early on in the process and to rank the final analyses by plausibility. To apply the same techniques to harmonic analysis of chord sequences, a corpus of tonal jazz chord sequences annotated by hand with harmonic analyses is constructed. Two statistical parsing techniques are adapted to the present task and evaluated on their success at recovering the annotated structures. The experiments show that parsing using a statistical model of syntactic derivations is more successful than a Markovian baseline model at recovering harmonic structure. In addition, the practical technique of statistical supertagging serves to speed up parsing without any loss in accuracy. This approach to recovering harmonic structure can be extended to the analysis of performance data symbolically represented as notes. Experiments using some simple proof-of-concept extensions of the above parsing models demonstrate one probabilistic approach to this. The results reported provide a baseline for future work on the task of harmonic analysis of performances.
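As a minimal illustration of the kind of grammar this abstract describes, the Python sketch below reduces a ii-V-I chord sequence to a single tonic constituent using CCG forward application. The category names and the toy lexicon are assumptions made for illustration only; they are not the thesis grammar, which uses a richer category set, a chart parser, and statistical models to resolve ambiguity.

lexicon = {
    "C":   "T",                # tonic chord: atomic tonic category
    "G7":  ("/", "T", "T"),    # dominant: a function seeking tonic material to its right
    "Dm7": ("/", "T", "T"),    # ii chord: likewise a rightward-looking function
}

def forward_apply(left, right):
    # CCG forward application: X/Y combined with Y yields X.
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    return None

def reduce_sequence(chords):
    # Greedy right-to-left reduction; a real parser would use a chart to
    # enumerate competing analyses and a statistical model to rank them.
    categories = [lexicon[c] for c in chords]
    while len(categories) > 1:
        combined = forward_apply(categories[-2], categories[-1])
        if combined is None:
            return None
        categories[-2:] = [combined]
    return categories[0]

print(reduce_sequence(["Dm7", "G7", "C"]))   # -> 'T': the ii-V-I progression forms one tonic unit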
3 |
Transition-based combinatory categorial grammar parsing for English and Hindi. Ambati, Bharat Ram. January 2016.
Given a natural language sentence, parsing is the task of assigning it a grammatical structure, according to the rules within a particular grammar formalism. Different grammar formalisms, such as Dependency Grammar, Phrase Structure Grammar, Combinatory Categorial Grammar and Tree Adjoining Grammar, have been explored in the literature for parsing. For example, given a sentence like “John ate an apple”, parsers based on the widely used dependency grammars find grammatical relations, such as that ‘John’ is the subject and ‘apple’ is the object of the action ‘ate’. We mainly focus on Combinatory Categorial Grammar (CCG) in this thesis. We present an incremental algorithm for parsing CCG for two diverse languages: English and Hindi. English is a fixed word order, SVO (Subject-Verb-Object), morphologically simple language, whereas Hindi, though predominantly an SOV (Subject-Object-Verb) language, has free word order and is morphologically rich. Developing an incremental parser for Hindi is particularly challenging since the predicate needed to resolve dependencies comes at the end. As previously available shift-reduce CCG parsers use English CCGbank derivations, which are mostly right-branching and non-incremental, we design our algorithm based on the dependencies resolved rather than on the derivation. Our novel algorithm builds a dependency graph in parallel to the CCG derivation, which is used for revealing the unbuilt structure without backtracking. Though we use dependencies for meaning representation and CCG for parsing, our revealing technique can be applied to other meaning representations like lambda expressions and to non-CCG parsing like phrase structure parsing. Any statistical parser requires three major modules: data, a parsing algorithm and a learning algorithm. This thesis is broadly divided into three parts, each dealing with one major module of the statistical parser. In Part I, we design a novel algorithm for converting a dependency treebank to a CCGbank. Using this algorithm, we create a Hindi CCGbank with 96% coverage. We also perform a cross-formalism experiment in which we show that CCG supertags can improve widely used dependency parsers. We experiment with two popular dependency parsers (Malt and MST) for two diverse languages: English and Hindi. For both languages, CCG categories improve the overall accuracy of both parsers by around 0.3-0.5% in all experiments. For both parsers, we see larger improvements specifically on dependencies at which they are known to be weak: long-distance dependencies for Malt, and verbal arguments for MST. The result is particularly interesting in the case of the fast greedy parser (Malt), since improving its accuracy without significantly compromising speed is relevant for large-scale applications such as parsing the web. In Part II, we present a novel algorithm for incremental transition-based CCG parsing for English and Hindi. Incremental parsers have potential advantages for applications like language modeling for machine translation and speech recognition. We introduce two new actions in the shift-reduce paradigm for revealing the required information during parsing. We also analyze the impact of a beam and look-ahead for parsing. In general, using a beam and/or look-ahead gives better results than not using them. We also show that the incremental CCG parser is more useful than a non-incremental version for predicting relative sentence complexity.
Given a pair of sentences from Wikipedia and Simple Wikipedia, we build a classifier which predicts whether one sentence is simpler or more complex than the other. We show that features from a CCG parser in general, and from an incremental CCG parser in particular, are more useful than those from a chart-based phrase structure parser, in terms of both speed and accuracy. In Part III, we develop the first neural network-based training algorithm for parsing CCG. We also study the impact of neural network-based tagging models, and of greedy versus beam-search parsing, using a structured neural network model. In greedy settings, neural network models give significantly better results than the perceptron models and are also over three times faster. Using a narrow beam, the structured neural network model gives consistently better results than the basic neural network model. For English, the structured neural network gives performance similar to the structured perceptron parser, but for Hindi, the structured perceptron is still the winner.
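To make the transition-based setup concrete, the following sketch implements a bare-bones shift-reduce loop with CCG forward and backward application over the example sentence from the abstract. The lexicon and the greedy reduce-first strategy are illustrative assumptions; the parser described in the thesis additionally uses revealing actions, a dependency graph built in parallel, a beam with look-ahead, and a trained model to score actions.

lexicon = {
    "John":  "NP",
    "ate":   ("/", ("\\", "S", "NP"), "NP"),   # (S\NP)/NP: transitive verb
    "an":    ("/", "NP", "N"),                 # NP/N: determiner
    "apple": "N",
}

def combine(left, right):
    # Try forward application (X/Y Y => X), then backward application (Y X\Y => X).
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]
    return None

def parse(words):
    # Greedy shift-reduce: REDUCE whenever the top two stack items combine, else SHIFT.
    stack, buffer = [], list(words)
    while buffer or len(stack) > 1:
        if len(stack) >= 2 and (reduced := combine(stack[-2], stack[-1])) is not None:
            stack[-2:] = [reduced]                 # REDUCE
        elif buffer:
            stack.append(lexicon[buffer.pop(0)])   # SHIFT
        else:
            return None                            # no parse found
    return stack[0] if stack else None

print(parse(["John", "ate", "an", "apple"]))       # -> 'S'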
4 |
A CCG-Based Method for Training a Semantic Role Labeler in the Absence of Explicit Syntactic Training Data. Boxwell, Stephen Arthur. 19 December 2011.
No description available.
5 |
Information Structure and Discourse Modelling. Bott, Stefan Markus. 04 April 2008.
This dissertation investigates the interrelation between information structure and discourse structure. Information-structurally backgrounded material is here generally treated as being anaphoric in a very strict sense. It is argued that, apart from having more descriptive content, elements from the sentence background are not different from other types of anaphora: they are subject to the same locality restrictions and they must correspond to the same semantic types. The treatment of the sentence background as a monolithic and atomic unit is rejected. Instead it is argued that sentence backgrounds may be built up from smaller units which are linguistically realised as links and tails (in the sense of Vallduví, 1992). It is shown that links and tails play different roles with respect to the structure of discourse: linguistically realised links have to be bound by a discourse topic, while tails have to be bound by other salient referents within the discourse environment.
6 |
Wide-coverage parsing for Turkish. Çakici, Ruket. January 2009.
Wide-coverage parsing is an area that attracts much attention in natural language processing research. This is because it is the first step to many other applications in natural language understanding, such as question answering. Supervised learning using human-labelled data is currently the best-performing method. Therefore, there is great demand for annotated data. However, human annotation is very expensive, and the amount of annotated data is always much less than is needed to train well-performing parsers. This is the motivation behind making the best use of the data available. Turkish presents a challenge both because syntactically annotated Turkish data is relatively scarce and because Turkish is highly agglutinative, hence unusually sparse at the whole-word level. The METU-Sabancı Treebank is a dependency treebank of 5620 sentences with surface dependency relations and morphological analyses for words. We show that including even the crudest forms of morphological information extracted from the data boosts the performance of both generative and discriminative parsers, contrary to received opinion concerning English. We induce word-based and morpheme-based CCG grammars from the Turkish dependency treebank. We use these grammars to train a state-of-the-art CCG parser that predicts long-distance dependencies in addition to the ones that other parsers are capable of predicting. We also use the correct CCG categories as simple features in a graph-based dependency parser and show that this improves the parsing results. We show that a morpheme-based CCG lexicon for Turkish is able to solve many problems such as conflicts of semantic scope, recovering long-range dependencies, and obtaining smoother statistics from the models. CCG handles linguistic phenomena such as local and long-range dependencies more naturally and effectively than other linguistic theories, while potentially supporting semantic interpretation in parallel. Using morphological information and a morpheme-cluster-based lexicon improves the performance both quantitatively and qualitatively for Turkish. We also provide an improved version of the treebank which will be released by kind permission of METU and Sabancı.
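The contrast between a word-based and a morpheme-based lexicon can be sketched as follows. The segmentation, feature markings and categories below are illustrative assumptions rather than the treebank-induced grammar, but they show why per-morpheme entries give smoother statistics for an agglutinative language: one stem entry plus a handful of reusable suffix entries replace a separate entry for every inflected form.

# Word-based: every inflected form needs its own lexical entry.
word_lexicon = {"ev": "NP", "evde": "NP[loc]", "evlerde": "NP[loc]"}

# Morpheme-based: one stem entry plus reusable suffix entries, modelled here
# as leftward-looking functions over the category built so far.
morpheme_lexicon = {
    "ev":   "NP",                      # 'house'
    "-ler": ("\\", "NP", "NP"),        # plural suffix: NP\NP
    "-de":  ("\\", "NP[loc]", "NP"),   # locative case suffix: NP[loc]\NP
}

def backward_apply(left, right):
    # CCG backward application: Y combined with X\Y yields X.
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]
    return None

category = morpheme_lexicon["ev"]
for suffix in ["-ler", "-de"]:
    category = backward_apply(category, morpheme_lexicon[suffix])
print(category)   # -> 'NP[loc]' for ev-ler-de ('in the houses')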
7 |
Dependency based CCG derivation and application. Brewster, Joshua Blake. 21 February 2011.
This paper presents and evaluates an algorithm to translate a dependency treebank into a Combinatory Categorial Grammar (CCG) lexicon. The dependency relations between a head and a child in a dependency tree are exploited to determine how CCG categories should be derived by making a functional distinction between adjunct and argument relations. Derivations for an English (CoNLL08 shared task treebank) and for an Italian (Turin University Treebank) dependency treebank are performed, each requiring a number of preprocessing steps.
In order to determine the adequacy of the lexicons, dubbed DepEngCCG and DepItCCG, they are compared via two methods to preexisting CCG lexicons derived from similar or equivalent sources (CCGbank and TutCCG). First, a number of metrics are used to compare the state of the lexicons, including category complexity and category growth. Second, to measure the potential applicability of the lexicons in NLP tasks, the derived English CCG lexicon and CCGbank are compared in a sentiment analysis task. While the numeric measurements show promising results for the quality of the lexicons, the sentiment analysis task fails to generate a usable comparison.
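A much-simplified sketch of the head-category derivation described above: the children of a head are split into arguments, which add slashes to the head's category, and adjuncts, which would instead receive modifier categories of their own (not shown). The relation labels, base categories and obliqueness ordering below are assumptions for illustration, not the paper's full conversion algorithm.

ARGUMENT_RELATIONS = {"SBJ", "OBJ"}            # relations treated as arguments rather than adjuncts
BASE_CATEGORY = {"VBD": "S", "NN": "N", "DT": "NP/N", "NNP": "NP"}

def head_category(head_pos, children):
    # children: (relation, direction) pairs ordered from least to most oblique,
    # so the innermost argument slot belongs to the subject and the outermost
    # slash is consumed first during parsing.
    category = BASE_CATEGORY[head_pos]
    for relation, direction in children:
        if relation in ARGUMENT_RELATIONS:
            category = f"({category}/NP)" if direction == "right" else f"({category}\\NP)"
        # adjunct children are skipped here; they would get X/X or X\X categories of their own
    return category

# 'ate' in "John ate an apple": subject to the left, object to the right.
print(head_category("VBD", [("SBJ", "left"), ("OBJ", "right")]))   # -> ((S\NP)/NP)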
8 |
Mechanism and Kinetics of Catalyzed Chain Growth. Primpke, Sebastian. 17 December 2014.
No description available.
9 |
An Inverse Lambda Calculus Algorithm for Natural Language Processing. January 2010.
Natural Language Processing is a subject that combines computer science and linguistics, aiming to provide computers with the ability to understand natural language and to enable more intuitive human-computer interaction. The research community has developed ways to translate natural language into mathematical formalisms. It has not yet been shown, however, how to automatically translate different kinds of knowledge in English to distinct formal languages. Most of the recent work has the problem that the translation method targets a specific formal language or is hard to generalize. In this research, I take a first step toward overcoming this difficulty and present two algorithms which take as input two lambda-calculus expressions G and H and compute a lambda-calculus expression F. The expression F returned by the first algorithm satisfies F@G=H and, in the case of the second algorithm, we obtain G@F=H. The lambda expressions represent the meanings of words and sentences. For each formal language that one desires to use with the algorithms, the language must be defined in terms of lambda calculus. Also, some additional concepts must be included. After doing this, given a sentence, its representation, and the representations of several words in the sentence, the algorithms can be used to obtain the representations of the other words in that sentence. In this work, I define two languages and show examples of their use with the algorithms. The algorithms are illustrated along with soundness and completeness proofs, the latter with respect to typed lambda-calculus formulas up to the second order. These algorithms are a core part of a natural language semantics system that translates sentences from English to formulas in different formal languages. / Dissertation/Thesis / M.S. Computer Science 2010
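A worked micro-example of what the first algorithm computes, under the simplifying assumption that H is a ground formula and G is a constant occurring in it exactly once. The real algorithms handle typed, higher-order lambda-calculus expressions; this sketch only abstracts a first-order constant, to show what it means to find F such that F@G=H.

def invert_first_order(h, g, var="x"):
    # Given term strings h and g, with g occurring exactly once in h, return a
    # lambda term f such that beta-reducing f applied to g reproduces h.
    assert h.count(g) == 1, "this sketch only handles a single occurrence of g"
    return f"lambda {var}. {h.replace(g, var)}"

H = "likes(john, mary)"   # meaning of the sentence
G = "john"                # meaning of a known word
F = invert_first_order(H, G)
print(F)                  # -> lambda x. likes(x, mary), so that F @ john = likes(john, mary)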
10 |
Delay, Stop and Queue Estimation for Uniform and Random Traffic Arrivals at Fixed-Time Signalized Intersections. Kang, Youn-Soo. 24 April 2000.
With the introduction of different forms of adaptive and actuated signal control, there is a need for effective evaluation tools that can capture the intricacies of real-life applications. While the current state-of-the-art analytical procedures provide simple approaches for estimating delay, queue length and stops at signalized intersections, they are limited in scope. Alternatively, several microscopic simulation software packages are currently available for the evaluation of signalized intersections. The objective of this dissertation is fourfold. First, it evaluates the consistency, accuracy, limitations and scope of the alternative analytical models. Second, it evaluates the validity of micro-simulation results that emerge from the underlying car-following relationships. The validity of these models is demonstrated for idealized hypothetical examples where analytical solutions can be derived. Third, the dissertation expands the scope of current analytical models for the evaluation of oversaturated signalized intersections. Finally, the dissertation demonstrates the implications of using analytical models for the evaluation of real-life network and traffic configurations.
This dissertation compared the delay estimates from numerous models for undersaturated and oversaturated signalized intersections, considering uniform and random arrivals, in an attempt to systematically evaluate and demonstrate the assumptions and limitations of different delay estimation approaches. Specifically, the dissertation compared a theoretical vertical queuing analysis model, the queue-based models used in the 1994 and 2000 versions of the Highway Capacity Manual, the queue-based model in the 1995 Canadian Capacity Guide for Signalized Intersections, a theoretical horizontal queuing model derived from shock wave analysis, and the delay estimates produced by the INTEGRATION microscopic traffic simulation software. The results of the comparisons for uniform arrivals indicated that all delay models produced identical results under such traffic conditions, except for the estimates produced by the INTEGRATION software, which tended to estimate slightly higher delays than the other approaches. For random arrivals, the results of the comparisons indicated that the delay estimates obtained by a micro-simulation model like INTEGRATION were consistent with the delay estimates computed by the analytical approaches.
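For reference, the uniform-delay term shared in spirit by the queue-based models compared above can be sketched as below. The progression adjustment and the incremental and initial-queue delay terms are omitted, so this is an illustration of the uniform component only, not a full HCM or CCG 1995 computation, and the example numbers are invented.

def uniform_delay(cycle_s, effective_green_s, volume_vph, capacity_vph):
    # Average uniform delay per vehicle (seconds) for one fixed-time approach,
    # assuming uniform arrivals over the cycle.
    g_over_c = effective_green_s / cycle_s
    x = min(1.0, volume_vph / capacity_vph)   # v/c ratio, capped at 1.0 for this term
    return 0.5 * cycle_s * (1.0 - g_over_c) ** 2 / (1.0 - x * g_over_c)

# Example: 90 s cycle, 40 s effective green, v/c = 720/900 = 0.8
print(round(uniform_delay(90, 40, 720, 900), 1))   # -> about 21.6 s of uniform delay per vehicle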
In addition, this dissertation compared the number of stops and the maximum extent of queue estimates using analytical procedures and the INTEGRATION simulation model for both undersaturated and oversaturated signalized intersections to assess their consistency and to analyze their applicability. For the number of stops estimates, it was found that there is a general agreement between the INTEGRATION microscopic simulation model and the analytical models for undersaturated signalized intersections. Both uniform and random arrivals demonstrated consistency between the INTEGRATION model and the analytical procedures; however, at a v/c ratio of 1.0 the analytical models underestimated the number of stops. The research developed an upper limit and a proposed model for estimating the number of vehicle stops for oversaturated conditions. It was demonstrated that the current state-of-the-practice analytical models can provide stop estimates that far exceed the upper bound. On the other hand, the INTEGRATION model was found to be consistent with the upper bound and demonstrated that the number of stops converges to 2.3 as the v/c ratio tends to 2.0. For the maximum extent of queue estimates, the maximum extent of queue predicted from horizontal shock wave analysis was higher than the predictions from vertical deterministic queuing analysis. The horizontal shock wave model predicted a lower maximum extent of queue than the CCG 1995 model. For oversaturated conditions, the vertical deterministic queuing model underestimated the maximum queue length. It was found that the CCG 1995 predictions were lower than those from the horizontal shock wave model. These differences were attributed to the fact that the CCG 1995 model estimates the remaining residual queue at the end of the evaluation time. Consistency was found between the INTEGRATION model and the horizontal shock wave model predictions with respect to the maximum extent of queue for both undersaturated and oversaturated signalized intersections.
Finally, the dissertation analyzed the impact of mixed traffic conditions on vehicle delay, person delay, and the number of vehicle stops at a signalized intersection. The analysis considered approximating the mixed flow by equivalent homogeneous flows using two potential conversion factors. The first of these conversion factors was based on relative vehicle lengths, while the second was based on relative vehicle ridership. The main conclusion of the analysis was that the optimum vehicle equivalency was dependent on the background level of congestion, the transit vehicle demand, and the Measure of Effectiveness (MOE) being considered. Consequently, explicit simulation of mixed flow is required in order to capture the unique vehicle interactions that result from mixed flow. Furthermore, while homogeneous flow approximations might be effective for some demand levels, these approximations are not consistently effective. / Ph. D.