161 |
Resolução de anáfora pronominal em português utilizando o algoritmo de Hobbs / Hobbs' algorithm for pronoun resolution in Portuguese
Santos, Denis Neves de Arruda 20 June 2008 (has links)
Advisor: Ariadne Maria Brito Rizzoni Carvalho / Master's dissertation (mestrado) - Universidade Estadual de Campinas, Instituto de Computação
Previous issue date: 2008 / Abstract: Anaphora is an abbreviated reference to an entity, made in the expectation that the receiver of the discourse can understand the reference. Automatic anaphora resolution may improve the performance of several natural language processing systems, such as translators, generators and summarizers. Difficulty arises when there is more than one potential referent. Research on anaphora resolution for Portuguese is still scarce compared with research on other languages, such as English. This thesis describes an adaptation to Portuguese of the syntactic algorithm proposed by Hobbs for pronominal anaphora resolution. The evaluation compared its results with those of another syntactic pronoun resolution algorithm, the Lappin and Leass algorithm; on the same Portuguese corpora, Hobbs' algorithm achieved a significant improvement. / Mestrado / Processamento de Línguas Naturais / Mestre em Ciência da Computação
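Hobbs' algorithm resolves a pronoun by searching parse trees in a characteristic order: breadth-first and left-to-right, starting in the pronoun's own sentence and moving back through preceding sentences until an agreeing noun phrase is found. Below is a much-simplified sketch of that search, assuming gold parse trees in NLTK bracketed format; the structural constraints of the full nine-step algorithm are omitted, and the tiny agreement lexicon is a hypothetical stand-in for Portuguese morphological analysis.

```python
# Simplified Hobbs-style antecedent search, not the thesis implementation.
from collections import deque
from nltk import Tree

AGREEMENT = {  # hypothetical gender/number lexicon
    "ele": {"masc", "sing"}, "ela": {"fem", "sing"},
    "Pedro": {"masc", "sing"}, "Maria": {"fem", "sing"},
    "carta": {"fem", "sing"},
}

def np_heads_breadth_first(tree):
    """Yield (NP subtree, head word) left to right, breadth first."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, Tree):
            if node.label() == "NP":
                yield node, node.leaves()[-1]  # crude head: last leaf
            queue.extend(node)  # enqueue children in left-to-right order

def resolve(pronoun, previous_trees):
    """Search preceding sentences, most recent first, for an agreeing NP."""
    features = AGREEMENT[pronoun]
    for tree in reversed(previous_trees):
        for np, head in np_heads_breadth_first(tree):
            if AGREEMENT.get(head, set()) == features:
                return head
    return None

s1 = Tree.fromstring("(S (NP (PROP Pedro)) (VP (V escreveu) (NP (DET a) (N carta))))")
print(resolve("ele", [s1]))  # -> Pedro ("carta" fails gender agreement)
```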
|
162 |
Natural Language Interfaces to Databases
Chandra, Yohan 12 1900 (has links)
Natural language interfaces to databases (NLIDB) are systems that aim to bridge the gap between the languages used by humans and computers, automatically translating natural language sentences into database queries. This thesis proposes a novel graph-based approach to NLIDB. The system starts by collecting as much information as possible from existing databases and sentences, and transforms this information into a knowledge base for the system. Given a new question, the system uses this knowledge to analyze the sentence and translate it into the corresponding database query statement. The graph-based NLIDB system uses English as the natural language, a relational database model, and SQL as the formal query language. In experiments with natural language questions run against a large database of U.S. geography facts, the system performed well compared to the state of the art in the field.
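As a rough illustration of the translation task (not the thesis's graph-based method itself), the toy sketch below matches question tokens against a hand-built lexicon over a hypothetical one-table U.S. geography schema and assembles a SQL query; the schema, synonym list, and query template are all assumptions.

```python
# Toy NL-to-SQL translation over an assumed single-table schema.
SCHEMA = {"state": ["name", "population", "area", "capital"]}
SYNONYMS = {  # hypothetical lexicon linking question words to schema nodes
    "people": ("state", "population"),
    "population": ("state", "population"),
    "big": ("state", "area"),
    "capital": ("state", "capital"),
}

def translate(question: str) -> str:
    tokens = question.rstrip("?").split()
    select_cols, table, value = [], None, None
    for tok in tokens:
        low = tok.lower()
        if low in SYNONYMS:
            table, col = SYNONYMS[low]
            if col not in select_cols:
                select_cols.append(col)
        elif tok[0].isupper() and tok != tokens[0]:
            value = tok  # crude named-entity guess: capitalized, non-initial word
    where = f" WHERE name = '{value}'" if value else ""
    return f"SELECT {', '.join(select_cols)} FROM {table}{where};"

print(translate("How many people live in Texas?"))
# -> SELECT population FROM state WHERE name = 'Texas';
```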
|
163 |
Natural language processing for research philosophies and paradigms dissertation (DFIT91)
Mawila, Ntombhimuni 28 February 2021 (has links)
Research philosophies and paradigms (RPPs) reveal researchers' assumptions and provide a systematic way in which research can be carried out effectively and appropriately. Several studies highlight the cognitive and comprehension challenges that RPP concepts pose at the postgraduate level. This study develops a natural language processing (NLP) supervised classification application that guides students in identifying the RPPs applicable to their study. Using algorithms rooted in a quantitative research approach, the study builds a corpus represented with the Bag-of-Words model to train naïve Bayes, logistic regression, and support vector machine classifiers. Computer experiments conducted to evaluate the algorithms reveal that naïve Bayes achieves the highest accuracy and precision. In practice, user testing shows the varying impact of knowledge, performance, and effort expectancy. The findings help minimize the issues postgraduates encounter in identifying research philosophies and the underlying paradigms for their studies. / Science and Technology Education / MTech. (Information Technology)
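A minimal sketch of such a Bag-of-Words naïve Bayes text classifier, using scikit-learn; the two training sentences and paradigm labels are invented stand-ins for the study's corpus of research-methodology text.

```python
# Bag-of-Words features + multinomial naive Bayes, as described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "reality is objective and measurable through quantitative observation",
    "meaning is socially constructed and explored through interviews",
]
labels = ["positivism", "interpretivism"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["themes were explored through interviews"]))
# -> ['interpretivism'] (word overlap with the second training sentence)
```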
|
164 |
Energy-Efficient Hardware Design for Machine Learning with In-Memory Computing
Zhang, Bo January 2024 (has links)
Recently, machine learning and deep neural networks (DNNs) have gained a significant amount of attention, since they have achieved human-like performance in various tasks such as image classification, recommendation, and natural language processing. As tasks get more complicated, people build bigger and deeper networks to obtain high accuracy, which challenges existing hardware to deliver fast and energy-efficient DNN computation because of the memory-wall problem. First, traditional hardware spends a significant amount of energy moving data between memory and the ALUs. Second, traditional memory blocks support only row-by-row access, which limits computation speed and energy efficiency.
In-memory computing (IMC) is one promising approach to these problems in DNN computation. It combines memory blocks with computation units to enable high computation throughput and low energy consumption. At the macro level, both digital and analog-mixed-signal (AMS) IMC macros achieve high performance in multiply-and-accumulate (MAC) computation: AMS designs offer high energy efficiency and high compute density, while digital designs offer PVT robustness and technology scalability. At the architecture level, specialized hardware accelerators that integrate these IMC macros outperform traditional accelerators in end-to-end DNN inference. Beyond IMC, other approaches also reduce energy consumption. For example, sparsity-aware training reduces arithmetic energy by adding more zeros to the weights and zero-gating the multiplication and/or addition, while weight and activation compression reduces off-chip memory access energy.
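The MAC operation these macros accelerate is just a dot product evaluated inside the memory array. The NumPy sketch below shows the bit-serial arithmetic for assumed 4-bit unsigned operands; the digit widths and array size are illustrative, not taken from the thesis.

```python
# Bit-serial MAC as an IMC macro would evaluate it: activations are fed
# one bit-plane at a time, and each plane's dot product with the stored
# weight column is accumulated with the proper shift.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.integers(0, 16, size=256)      # one stored column, 4b values
activations = rng.integers(0, 16, size=256)  # 4b inputs

acc = 0
for bit in range(4):                         # serial over activation bits
    plane = (activations >> bit) & 1         # one bit-plane on the wordlines
    acc += int(plane @ weights) << bit       # in-array sum, then shift-add

assert acc == int(activations @ weights)     # matches the full-precision MAC
print(acc)
```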
This thesis presents new circuit and architecture designs for efficient DNN inference with in-memory computing. First, it presents two SRAM-based analog-mixed-signal IMC macros: one with custom 10T1C cells for binary/ternary MAC operation, and MACC-SRAM, a multistep-accumulation capacitor-based IMC macro for 4b MAC computation that features stepwise charging and discharging, sparsity optimization, and an adder-first architecture for energy efficiency. Second, it proposes PIMCA, a programmable DNN accelerator that integrates 108 AMS IMC macros and, with its own pipeline structure and instruction set architecture, can flexibly support inference at the instruction level. Last, it implements a fully digital accelerator that integrates IMC macros supporting floating-point computation; the accelerator contains online decompression hardware to reduce the data-movement energy of weights and activations, as well as online activation compressors to reduce the activation memory footprint.
|
165 |
Developing an enriched natural language grammar for prosodically-improved concept-to-speech synthesis
Marais, Laurette 04 1900 (has links)
The need for interacting with machines using spoken natural language is growing, along with the expectation that synthetic speech in this context sound natural. Such interaction includes answering questions, where prosody plays an important role in producing natural English synthetic speech by communicating the information structure of utterances.

CCG (Combinatory Categorial Grammar) is a theoretical framework that exploits the notion that, in English, information structure, prosodic structure and syntactic structure are isomorphic. This provides a way to convert a semantic representation of an utterance into a prosodically natural spoken utterance. GF (Grammatical Framework) is a framework for writing grammars, in which abstract tree structures capture the semantic structure and concrete grammars render these structures as linearised strings. This research combines the two frameworks to develop a system that converts semantic representations of utterances into linearised strings of natural language that are marked up to inform the prosody-generating component of a speech synthesis system. / Computing / M. Sc. (Computing)
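The sketch below illustrates only the final step described above: linearising a small theme/rheme representation into a string with prosodic markup. The input format and the ToBI-style accent convention (L+H* on focused theme words, H* on focused rheme words, following Steedman's CCG analysis) are illustrative assumptions; the thesis itself implements this in GF, not Python.

```python
# Linearise (token, information-structure unit, focused) triples into a
# string with assumed ToBI-style pitch-accent and boundary-tone markup.
def linearise(words):
    """words: list of (token, unit, focused) with unit in {'theme', 'rheme'}."""
    accent = {"theme": "L+H*", "rheme": "H*"}
    boundary = {"theme": "LH%", "rheme": "LL%"}
    out, prev_unit = [], None
    for tok, unit, focused in words:
        if prev_unit and unit != prev_unit:
            out.append(boundary[prev_unit])   # close the finished unit
        out.append(f"{tok}[{accent[unit]}]" if focused else tok)
        prev_unit = unit
    out.append(boundary[prev_unit])           # close the final unit
    return " ".join(out)

# Answering "Who wrote the letter?": 'Anna' is the rheme (new information).
print(linearise([
    ("Anna", "rheme", True),
    ("wrote", "theme", False), ("the", "theme", False), ("letter", "theme", True),
]))
# -> Anna[H*] LL% wrote the letter[L+H*] LH%
```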
|
166 |
Global connectivity, information diffusion, and the role of multilingual users in user-generated content platforms
Hale, Scott A. January 2014 (has links)
Internet content and Internet users are becoming more linguistically diverse as more people speaking different languages come online and produce content on user-generated content platforms. Several platforms have emerged as truly global, with users speaking many different languages and coming from around the world. It is now possible to study human behavior on these platforms using the digital trace data the platforms make available about the content people are authoring. The network literature suggests that people cluster together by language, but also that there is a small average path length between any two people on most Internet platforms (including two speakers of different languages). If so, multilingual users may play critical roles as bridges or brokers on these platforms by connecting clusters of monolingual users across languages. The large differences in the content available in different languages online underscore the importance of such roles.

This thesis studies the roles of multilingual users and platform design on two large user-generated content platforms: Wikipedia and Twitter. It finds that language strongly structures each platform, that multilingual users do act as linguistic bridges subject to certain limitations, that the size of a language correlates with the roles its speakers play in cross-language connections, and that activity correlates with multilingualism. In contrast to the high levels of multilingualism generally reported offline in linguistics, this thesis finds relatively low levels of multilingualism on Twitter (11%) and Wikipedia (15%).

The findings have implications for both platform design and social network theory. They suggest design strategies to increase multilingualism online through the identification and promotion of multilingual starter tasks, the discovery of related other-language information, and the promotion of user choice in linguistic filtering. While weak ties have received much attention in the social networks literature, cross-language ties are often not distinguished from same-language weak ties. This thesis finds that cross-language ties are similar to same-language weak ties in that both connect distant parts of the network, have limited bandwidth, and yet transfer a non-trivial amount of information in aggregate. At the same time, cross-language ties are distinct from same-language weak ties for the purposes of information diffusion: cross-language ties are generally fewer in number, but each may convey more diverse information, given the large differences in the content available in different languages and the relative ease with which a multilingual speaker can access content in multiple languages compared to a monolingual speaker.
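A small sketch of the bridging idea in network terms: in a toy graph of two single-language clusters, a bilingual user is the only bridge, and betweenness centrality makes that role visible. The graph and user names are invented for illustration; the thesis works with Wikipedia and Twitter trace data at far larger scale.

```python
# Two language clusters joined by one bilingual broker.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("en1", "en2"), ("en2", "en3"), ("en1", "en3")])  # English cluster
G.add_edges_from([("pt1", "pt2"), ("pt2", "pt3"), ("pt1", "pt3")])  # Portuguese cluster
G.add_edges_from([("en3", "bi"), ("bi", "pt1")])  # bilingual user bridges them

centrality = nx.betweenness_centrality(G)
print(max(centrality, key=centrality.get))  # -> 'bi': all cross-language paths pass through it
```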
|
167 |
Handwriting Chinese character recognition based on quantum particle swarm optimization support vector machine
Pang, Bo January 2018 (has links)
University of Macau / Faculty of Science and Technology. / Department of Computer and Information Science
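The record carries no abstract, so the sketch below is only a generic illustration of the technique named in the title: quantum-behaved particle swarm optimization (QPSO) searching SVM hyperparameters, with scikit-learn's digits dataset standing in for handwritten Chinese characters. The update rule follows the standard QPSO formulation (Sun et al.); the swarm settings, bounds, and fitness function are all assumptions.

```python
# QPSO search over log10(C) and log10(gamma) for an RBF-kernel SVM.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)

def fitness(pos):
    C, gamma = 10.0 ** pos                     # search in log10 space
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

n, dims, iters, beta = 8, 2, 10, 0.75
low, high = np.array([-2.0, -5.0]), np.array([4.0, 1.0])  # assumed log10 bounds
x = rng.uniform(low, high, (n, dims))
pbest, pbest_fit = x.copy(), np.array([fitness(p) for p in x])

for _ in range(iters):
    gbest = pbest[pbest_fit.argmax()]
    mbest = pbest.mean(axis=0)                 # mean of personal bests
    phi = rng.uniform(size=(n, dims))
    p = phi * pbest + (1 - phi) * gbest        # per-particle local attractor
    u = rng.uniform(1e-12, 1.0, (n, dims))
    sign = rng.choice([-1, 1], size=(n, dims))
    x = np.clip(p + sign * beta * np.abs(mbest - x) * np.log(1 / u), low, high)
    fit = np.array([fitness(p_) for p_ in x])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = x[improved], fit[improved]

print("best log10(C), log10(gamma):", pbest[pbest_fit.argmax()])
print("cv accuracy: %.3f" % pbest_fit.max())
```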
|
168 |
Context-based Image Concept Detection and Annotation
Unknown Date (has links)
Scene understanding attempts to produce a textual description of the visible and latent concepts in an image that captures the real meaning of the scene. Concepts are objects, events, or relations depicted in an image. To recognize concepts, the decisions of an object detection algorithm must be refined from visual similarity to semantic compatibility; semantically relevant concepts convey the most consistent meaning of the scene.

Object detectors analyze visual properties (e.g., pixel intensities, texture, color gradient) of sub-regions of an image to identify objects. The initially assigned object names must then be examined to ensure they are compatible with each other and with the scene. By enforcing inter-object dependencies (e.g., co-occurrence, spatial, and semantic priors) and object-to-scene constraints as background information, a concept classifier predicts the most semantically consistent set of names for the discovered objects. This additional background information describing concepts is called context.

This dissertation presents a framework for context-based concept detection that uses a combination of multiple contextual relationships to refine the results of underlying feature-based object detectors and produce the most semantically compatible concepts.
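A minimal sketch of this kind of contextual refinement: per-region detector scores are combined with a co-occurrence prior so that an individually plausible but mutually incompatible label set loses to a semantically consistent one. The detections, priors, and additive scoring are invented for illustration; the dissertation's actual models are the graphical ones described below (LDA- and CRF-based extensions), not this brute-force search.

```python
# Re-score candidate label sets with a co-occurrence context prior.
from itertools import product

detections = [  # per-region candidate labels with detector confidences
    {"bed": 0.55, "boat": 0.45},
    {"pillow": 0.60, "buoy": 0.40},
]
cooccur = {  # symmetric co-occurrence prior, e.g. estimated from a scene dataset
    frozenset(["bed", "pillow"]): 0.9,
    frozenset(["boat", "buoy"]): 0.8,
}

def score(labels):
    unary = sum(det[lab] for det, lab in zip(detections, labels))
    pairwise = sum(cooccur.get(frozenset([a, b]), 0.05)
                   for i, a in enumerate(labels) for b in labels[i + 1:])
    return unary + pairwise

candidates = product(*(d.keys() for d in detections))
print(max(candidates, key=score))  # -> ('bed', 'pillow'), the consistent set
```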
In addition to their limited ability to capture semantic dependencies, object detectors are impaired by the high dimensionality of the feature space. Variation in the image (i.e., quality, pose, articulation, illumination, and occlusion) can also yield low-quality visual features that reduce the accuracy of the detected concepts.

The object detectors used in the context-based framework experiments of this study are based on state-of-the-art generative and discriminative graphical models. Graphical models make the relationships between model variables easy to describe and allow the dependencies to be precisely characterized. The generative context-based implementations are extensions of Latent Dirichlet Allocation, a leading topic-modeling approach that is very effective at reducing the dimensionality of the data. The discriminative context-based approach extends Conditional Random Fields, which allow efficient and precise model construction by specifying and including only the cases that are related to and influence it.

The dataset used for training and evaluation is MIT SUN397. The experimental results show an overall 15% increase in annotation accuracy and a 31% improvement in the semantic saliency of the annotated concepts. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2016. / FAU Electronic Theses and Dissertations Collection
|
169 |
Statistical modeling for lexical chains for automatic Chinese news story segmentation
January 2010 (has links)
Chan, Shing Kai. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 106-114). / Abstracts in English and Chinese. / Contents:
Chapter 1 Introduction: Problem Statement; Motivation for Story Segmentation; Terminologies; Thesis Goals; Thesis Organization
Chapter 2 Background Study: Coherence-based Approaches (Defining Coherence; Lexical Chaining; Cosine Similarity; Language Modeling); Feature-based Approaches (Lexical Cues; Audio Cues; Video Cues); Pros and Cons and Hybrid Approaches
Chapter 3 Experimental Corpora: The TDT2 and TDT3 Multi-language Text Corpus (Introduction; Program Particulars and Structures); Data Preprocessing (Challenges of Lexical Chain Formation on Chinese Text; Word Segmentation for Word Units Extraction; Part-of-speech Tagging for Candidate Words Extraction)
Chapter 4 Indication of Lexical Cohesiveness by Lexical Chains: Lexical Chain as a Representation of Cohesiveness (Choice of Word Relations for Lexical Chaining; Lexical Chaining by Connecting Repeated Lexical Elements); Lexical Chain as an Indicator of Story Segments (Indicators of Absence of Cohesiveness; Indicator of Continuation of Cohesiveness)
Chapter 5 Indication of Story Boundaries by Lexical Chains: Formal Definition of the Classification Procedures; Theoretical Framework for Segmentation Based on Lexical Chaining (Evaluation of Story Segmentation Accuracy; Previous Approach of Story Segmentation Based on Lexical Chaining; Statistical Framework for Story Segmentation Based on Lexical Chaining; Post Processing of Ratio for Boundary Identification); Comparing Segmentation Models
Chapter 6 Analysis of Lexical Chains Features as Boundary Indicators: Error Analysis; Window Length in the LRT Model; The Relative Importance of Each Set of Features; The Effect of Removing Timing Information
Chapter 7 Conclusions and Future Work: Contributions; Future Works (Further Extension of the Framework; Wider Applications of the Framework)
Bibliography
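At the core of the thesis is the observation that lexical chains (sequences of repeated words within a limited window) tend to end just before a story boundary and start just after one. The sketch below computes that signal on an invented toy stream; sentences are given as content words (mimicking the word segmentation and part-of-speech filtering the thesis applies to Chinese text), and the simple ended-plus-started score stands in for the thesis's statistical likelihood-ratio framework.

```python
# Build lexical chains from word repetitions, then score sentence gaps
# by how many chains end just before and start just after each gap.
sentences = [  # content words per sentence, after assumed preprocessing
    ["election", "results", "announced"],
    ["election", "winner", "speech"],
    ["voters", "welcomed", "winner"],
    ["typhoon", "approaching", "coast"],      # story change
    ["typhoon", "forced", "schools", "close"],
]
WINDOW = 2  # max sentence gap for two mentions to stay in one chain

chains, open_chain = [], {}
for i, words in enumerate(sentences):
    for w in set(words):
        c = open_chain.get(w)
        if c and i - c["end"] <= WINDOW:
            c["end"] = i                      # repetition extends the chain
        else:
            c = {"start": i, "end": i}        # otherwise open a new chain
            open_chain[w] = c
            chains.append(c)

for gap in range(1, len(sentences)):          # gap sits before sentence `gap`
    ended = sum(c["end"] == gap - 1 for c in chains)
    started = sum(c["start"] == gap for c in chains)
    print(f"boundary score at gap {gap}: {ended + started}")
# The highest score falls at gap 3, the true story boundary.
```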
|
170 |
Improvements to the complex question answering models
Imam, Md. Kaisar January 2011 (has links)
In recent years the amount of information on the web has increased dramatically. As a result, it has become a challenge for researchers to find effective ways to query and extract meaning from these large repositories. Standard document search engines address the problem by presenting the user with a ranked list of relevant documents. In most cases this is not enough, as the end-user has to go through the entire document to find the answer he is looking for. Question answering, the retrieval of answers to natural language questions from a document collection, removes this onus on the end-user by providing direct access to relevant information.

This thesis is concerned with open-domain complex question answering. Unlike simple questions, complex questions cannot be answered easily, as they often require inferencing and synthesizing information from multiple documents. Hence, we treat complex question answering as query-focused multi-document summarization. To improve complex question answering, we experimented with both empirical and machine learning approaches. We extracted several features of different types (i.e., lexical, lexical-semantic, syntactic and semantic) for each sentence in the document collection in order to measure its relevance to the user query.

We formulated the task of complex question answering in a reinforcement learning framework, which to the best of our knowledge has not been applied to this task before and has the potential to improve itself by fine-tuning the feature weights from user feedback. We also used unsupervised machine learning techniques (random walk, manifold ranking), augmenting them with semantic and syntactic information. Finally, we experimented with question decomposition: instead of trying to answer the complex question directly, we decomposed it into a set of simple questions and synthesized their answers to obtain the final result. / x, 128 leaves : ill. ; 29 cm
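A minimal sketch of the query-focused extractive step that underlies this view of complex question answering: rank sentences by similarity to the question and keep the top ones as the answer. TF-IDF cosine similarity here stands in for the richer lexical, syntactic and semantic features (and learned weights) described above; the question and sentences are invented.

```python
# Query-focused extractive summarization with TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "What caused the flooding and how did the city respond?"
sentences = [
    "Heavy rain over three days caused the river to overflow its banks.",
    "The mayor opened emergency shelters and ordered evacuations.",
    "Local teams won their weekend matches despite the weather.",
    "Officials said drainage failures worsened the flooding downtown.",
]

vec = TfidfVectorizer(stop_words="english")
matrix = vec.fit_transform([question] + sentences)
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()

top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:2]
print(" ".join(sentences[i] for i in sorted(top)))  # top sentences, document order
```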
|