1 |
Logic for natural language analysis. Pereira, Fernando Carlos Neves. January 1982
This work investigates the use of formal logic as a practical tool for describing the syntax and semantics of a subset of English, and for building a computer program that answers data base queries expressed in that subset. To achieve an intimate connection between logical descriptions and computer programs, all the descriptions are given in the definite clause subset of the predicate calculus, which is the basis of the programming language Prolog. The logical descriptions run directly as efficient Prolog programs. Three aspects of the use of logic in natural language analysis are covered: the formal representation of syntactic rules by means of a logic-based grammar formalism, extraposition grammars; a formal semantics for the chosen English subset, appropriate for data base queries; and informal semantic and pragmatic rules that translate analysed sentences into their formal semantics. On all three aspects, the work improves and extends earlier work by Colmerauer and others, in which the use of computational logic in language analysis was first introduced.
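As a rough illustration of how a definite clause grammar doubles as a parser, the sketch below transcribes a few DCG-style rules into Python generators. Token positions play the role of Prolog's difference lists, and nested generator loops stand in for backtracking. The lexicon, the predicate names (`river`, `flows_through`) and the logical-form notation are hypothetical; the thesis itself works directly in Prolog and uses extraposition grammars, which this toy does not model.

```python
from typing import Iterator, List, Tuple

Tokens = List[str]
Parse = Tuple[int, str]  # (next token position, logical-form fragment)

# Hypothetical toy lexicon, not taken from the thesis.
NOUNS = {"river", "country"}
NAMES = {"france", "spain"}
VERBS = {("flows", "through"): "flows_through", ("borders",): "borders"}


def noun(toks: Tokens, i: int) -> Iterator[Parse]:
    # noun(N) --> [N], where N is a known common noun.
    if i < len(toks) and toks[i] in NOUNS:
        yield i + 1, toks[i]


def proper_name(toks: Tokens, i: int) -> Iterator[Parse]:
    # proper_name(O) --> [O], where O is a known name.
    if i < len(toks) and toks[i] in NAMES:
        yield i + 1, toks[i]


def verb(toks: Tokens, i: int) -> Iterator[Parse]:
    # Each multi-word verb is one alternative, like one clause of a predicate.
    for words, pred in VERBS.items():
        if tuple(toks[i:i + len(words)]) == words:
            yield i + len(words), pred


def vp(toks: Tokens, i: int) -> Iterator[Parse]:
    # vp(P) --> verb(V), proper_name(O), with P = V(X, O).
    for j, v in verb(toks, i):
        for k, o in proper_name(toks, j):
            yield k, f"{v}(X, {o})"


def query(toks: Tokens, i: int = 0) -> Iterator[Parse]:
    # query --> [which], noun(N), vp(P), yielding answer(X) :- N(X), P.
    if i < len(toks) and toks[i] == "which":
        for j, n in noun(toks, i + 1):
            for k, p in vp(toks, j):
                yield k, f"answer(X) :- {n}(X), {p}"


def parse(sentence: str) -> List[str]:
    toks = sentence.lower().split()
    # Keep only parses that consume the whole input (the S = [] case in Prolog).
    return [lf for end, lf in query(toks) if end == len(toks)]
```

For example, `parse("Which river flows through France")` returns `["answer(X) :- river(X), flows_through(X, france)"]`, a query that could then be evaluated against a data base, while an incomplete input such as `"which river borders"` yields no parse.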
|
2 |
Análise sintática para tratamento de elipse em orações coordenadas / Syntactic analysis for ellipsis handling in coordinated clauses. Maduro, Ralph Moreira. 29 June 2005
Advisor: Ariadne Maria Brito Rizzoni Carvalho / Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação
Abstract: This work investigates elliptical phenomena in natural language. We believe that some types of ellipsis can be resolved at the syntactic level, since they are subject to syntactic constraints. We have dealt with five of the major types of ellipsis found in Portuguese: Null VP, Gapping, Stripping, Sluicing and Null Complement Anaphora. We have used Island Constraints to decide on the grammaticality of the sentence. Finally, we have developed and implemented a syntactically based algorithm that recovers the elided constituents and reconstructs the elliptical clause when the syntactic constraints allow it. The linguistic data in this work are drawn primarily from Portuguese, but we believe that the results can also be applied to other languages, such as English and Spanish. / Master's degree / Master in Computer Science
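To make the idea of recovering an elided constituent concrete, the sketch below handles a single Gapping pattern: the verb of a complete first conjunct is copied into a second conjunct that has a subject and object but no verb. It is a minimal sketch under loudly stated assumptions: the part-of-speech tags (`N`, `V`, `CONJ`) are hypothetical and supplied by hand, only the flat pattern N V N CONJ N N is recognised, and no Island Constraints are checked, so it is not the algorithm developed in the dissertation.

```python
from typing import List, Optional, Tuple

# (word, tag) pairs; the tag set N / V / CONJ is a hypothetical simplification.
Tagged = List[Tuple[str, str]]


def resolve_gapping(sentence: Tagged) -> Optional[Tagged]:
    """Rebuild a gapped second conjunct by copying the first conjunct's verb.

    Toy pattern only: N V N CONJ N N  ->  second conjunct becomes N V N.
    Returns None when the input does not match this pattern.
    Island Constraints and all other ellipsis types are ignored here.
    """
    tags = [t for _, t in sentence]
    if "CONJ" not in tags:
        return None
    c = tags.index("CONJ")
    first, second = sentence[:c], sentence[c + 1:]
    # The first conjunct must be complete (subject, verb, object) and the
    # second must keep subject and object while missing its verb (the gap).
    if [t for _, t in first] != ["N", "V", "N"]:
        return None
    if [t for _, t in second] != ["N", "N"]:
        return None
    verb = first[1]
    return [second[0], verb, second[1]]
```

For the Portuguese sentence "João comeu maçãs e Maria peras" ("João ate apples and Maria pears"), the function returns the reconstructed clause "Maria comeu peras"; a sentence with no gap, such as "João comeu maçãs e Maria comeu peras", is returned as `None` because there is nothing to recover.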
|
3 |
A corpus driven computational intelligence framework for deception detection in financial text. Minhas, Saliha Z. January 2016
Financial fraud continues to spread, seemingly unchecked. The cost of fraud in the UK is estimated to be as high as £193bn a year [1]. Taking a data science perspective, which has hitherto been less explored, this thesis demonstrates how linguistic features can drive data mining algorithms to help uncover fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is compiled from the narrative sections of 102 annual reports (10-K filings) of firms formally indicted for FSF, juxtaposed with 306 non-fraud firms of similar size and industry grouping. Unlike similar studies, this thesis takes a wide-angled view and extracts a range of features of different categories from the corpus. These linguistic correlates of deception are uncovered using a variety of techniques and tools. Corpus linguistics methodology is applied to extract keywords and to examine linguistic structure. N-grams are extracted to draw out collocations. Readability measurement in financial text is advanced through new indices that probe the text at a deeper level. Cognitive and perceptual processes are also picked out. Tone, intention and liquidity are gauged using customised word lists. Linguistic ratios are derived from grammatical constructs and word categories. An attempt is also made to determine ‘what’ was said as opposed to ‘how’. Further, a new module is developed to condense synonyms into concepts. Lastly, frequency counts of keywords unearthed by a previous content analysis study of financial narrative are also used. These features then drive machine learning classification and clustering algorithms to determine whether they help discriminate a fraud firm from a non-fraud firm. The models built typically exceed 70% classification accuracy. The whole process is consolidated into a framework.
The process outlined, driven by empirical data, demonstrates in a practical way how linguistic analysis could aid fraud detection, and constitutes a unique contribution to deception detection studies.
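The feature families listed above can be illustrated with three simple stand-ins computed from raw text: word n-gram counts (for collocations), a readability score, and a lexical ratio. This is a sketch under stated assumptions, not the thesis's feature set: it uses the standard Gunning Fog index with a crude vowel-group syllable counter rather than the new indices the thesis proposes, and a type-token ratio as an example of a linguistic ratio. The function names are hypothetical; in a framework like the one described, such values would become columns fed to a classifier.

```python
import re
from collections import Counter
from typing import Dict, List

WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; digits and punctuation are dropped.
    return WORD_RE.findall(text.lower())


def ngrams(tokens: List[str], n: int) -> Counter:
    # Count each run of n consecutive tokens, e.g. bigrams when n == 2.
    return Counter(zip(*(tokens[i:] for i in range(n))))


def syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of vowels, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def gunning_fog(text: str) -> float:
    # Fog = 0.4 * (words per sentence + 100 * share of 3+ syllable words).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    complex_words = sum(1 for w in words if syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words))


def type_token_ratio(tokens: List[str]) -> float:
    # Distinct words divided by total words.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def features(text: str, top_k: int = 3) -> Dict[str, object]:
    tokens = tokenize(text)
    return {
        "fog": gunning_fog(text),
        "ttr": type_token_ratio(tokens),
        "top_bigrams": ngrams(tokens, 2).most_common(top_k),
    }
```

For instance, `features("The cat sat. The cat ran.")` reports a Fog score of about 1.2 (three words per sentence, no complex words), a type-token ratio of 4/6, and `("the", "cat")` as the most frequent bigram.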
|