31

Distributed and Multiphase Inference in Theory and Practice: Principles, Modeling, and Computation for High-Throughput Science

Blocker, Alexander Weaver 18 September 2013 (has links)
The rise of high-throughput scientific experimentation and data collection has introduced new classes of statistical and computational challenges. The technologies driving this data explosion are subject to complex new forms of measurement error, requiring sophisticated statistical approaches. Simultaneously, statistical computing must adapt to larger volumes of data and new computational environments, particularly parallel and distributed settings. This dissertation presents several computational and theoretical contributions to these challenges. In chapter 1, we consider the problem of estimating the genome-wide distribution of nucleosome positions from paired-end sequencing data. We develop a modeling approach based on nonparametric templates that controls for variability due to enzymatic digestion, and use it to construct a calibrated Bayesian method to detect local concentrations of nucleosome positions. Inference is carried out via a distributed Hamiltonian Monte Carlo (HMC) algorithm whose complexity scales linearly with the length of the genome being analyzed. We provide MPI-based implementations of the proposed methods, both stand-alone and on Amazon EC2, which can deliver inferences on an entire S. cerevisiae genome in less than one hour. In chapter 2, we present a method for absolute quantitation from LC-MS/MS proteomics experiments, built on a Bayesian model for the non-ignorable missing data mechanism induced by this technology, which combines censoring and truncation in an unusual way. We provide a scalable MCMC sampler for inference in this setting, enabling full-proteome analyses in cluster computing environments. A set of simulation studies and actual experiments demonstrates the approach's validity and utility. We close in chapter 3 by proposing a theoretical framework for the analysis of preprocessing under the banner of multiphase inference. Preprocessing forms an oft-neglected foundation for a wide range of statistical and scientific analyses. We provide initial theoretical foundations for this area, including distributed preprocessing, building upon previous work in multiple imputation, and demonstrate that multiphase inferences can, in some cases, even surpass standard single-phase estimators in efficiency and robustness. Our work suggests several paths for further research into the statistical principles underlying preprocessing. / Statistics
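For reference, here is a minimal single-chain HMC step in Python: the leapfrog integrator and Metropolis correction below are the textbook ones, while the thesis's actual sampler distributes this computation across genome segments via MPI, which is not reproduced here. The callables `log_prob` and `grad_log_prob` and the step parameters are assumptions supplied by the user.

```python
import numpy as np

def hmc_step(theta, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20, rng=None):
    """One HMC transition: leapfrog integration plus Metropolis correction."""
    rng = rng or np.random.default_rng()
    p = rng.standard_normal(theta.shape)                 # fresh Gaussian momentum
    theta_new, p_new = theta.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(theta_new)  # initial half step
    for _ in range(n_leapfrog - 1):
        theta_new += step_size * p_new
        p_new += step_size * grad_log_prob(theta_new)
    theta_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(theta_new)  # final half step
    # Accept/reject on the joint (position, momentum) log-density.
    log_alpha = (log_prob(theta_new) - 0.5 * p_new @ p_new) \
              - (log_prob(theta)     - 0.5 * p @ p)
    return theta_new if np.log(rng.uniform()) < log_alpha else theta
```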
32

Morphosyntactic Corpora and Tools for Persian

Seraji, Mojgan January 2015 (has links)
This thesis presents open source resources in the form of annotated corpora and modules for automatic morphosyntactic processing and analysis of Persian texts. More specifically, the resources consist of an improved part-of-speech tagged corpus and a dependency treebank, as well as tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing for Persian. In developing these resources and tools, two key requirements are observed: compatibility and reuse. The compatibility requirement has two parts. First, the tools in the pipeline should be compatible with each other, in the sense that the output of one tool matches the input requirements of the next. Second, the tools should be compatible with the annotated corpora and deliver the same analysis found in them. The reuse requirement means that all the components in the pipeline are developed by reusing resources, standard methods, and open source state-of-the-art tools, which is necessary to make the project feasible. Given these requirements, the thesis investigates two main research questions. The first is how we can develop morphologically and syntactically annotated corpora and tools while satisfying the requirements of compatibility and reuse. The approach taken is to accept the tokenization variations in the corpora in order to achieve robustness. The tokenization variations in Persian texts are related to the orthographic variation in writing fixed expressions, as well as various types of affixes and clitics. Since these variations are inherent properties of Persian texts, it is important that the tools in the pipeline can handle them; therefore, they should not be trained on idealized data. The second question concerns how accurately we can perform morphological and syntactic analysis for Persian by adapting and applying existing tools to the annotated corpora. The experimental evaluation of the tools shows that the sentence segmenter and tokenizer achieve an F-score close to 100%, the tagger has an accuracy of nearly 97.5%, and the parser achieves a best labeled accuracy of over 82% (with unlabeled accuracy close to 87%).
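To make the compatibility requirement concrete, here is a schematic Python sketch of the pipeline contract the abstract describes: each stage's output is exactly the next stage's input. The function names and the specific normalization rules are illustrative assumptions, not the thesis's actual tools.

```python
def normalize(text: str) -> str:
    # Unify Arabic code points with their Persian counterparts (Yeh, Kaf),
    # one common Persian normalization step; the real tool does much more.
    return text.replace("\u064A", "\u06CC").replace("\u0643", "\u06A9")

def segment(text: str) -> list[str]:
    # Naive split on the full stop; the thesis's segmenter is trained,
    # not rule-based like this stand-in.
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def tokenize(sentence: str) -> list[str]:
    # Whitespace tokenization; the thesis deliberately preserves the
    # clitic/affix tokenization variation found in real Persian text.
    return sentence.split()

def process(text: str) -> list[list[str]]:
    # The compatibility contract: each stage consumes the previous output.
    return [tokenize(s) for s in segment(normalize(text))]
```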
33

Preprocessing and Reduction for Semidefinite Programming via Facial Reduction: Theory and Practice

Cheung, Yuen-Lam 05 November 2013 (has links)
Semidefinite programming is a powerful modeling tool for a wide range of optimization and feasibility problems. Its prevalent use in practice relies on the fact that a (nearly) optimal solution of a semidefinite program can be obtained efficiently in both theory and practice, provided that the semidefinite program and its dual satisfy the Slater condition. This thesis focuses on the situation where the Slater condition (i.e., the existence of positive definite feasible solutions) does not hold for a given semidefinite program; this failure often occurs in structured semidefinite programs derived from various applications. We study the use of the facial reduction technique, originally proposed as a theoretical procedure by Borwein and Wolkowicz, as a preprocessing step for semidefinite programs. Facial reduction can be used either in an algorithmic or a theoretical sense, depending on whether the structure of the semidefinite program is known a priori. The main contribution of this thesis is threefold. First, we study the numerical issues in the implementation of facial reduction as an algorithm on semidefinite programs, and argue that each step of the facial reduction algorithm is backward stable. Second, we illustrate the theoretical importance of the facial reduction procedure in the sensitivity analysis of semidefinite programs. Finally, we illustrate the use of the facial reduction technique on several classes of structured semidefinite programs, in particular the side-chain positioning problem in protein folding.
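For orientation, one facial reduction step can be sketched in standard SDP notation (the notation here is an assumption, since the abstract fixes none): when the Slater condition fails, a certificate direction exposes a proper face containing the whole feasible region, and the problem is rewritten over that smaller face.

```latex
% Primal SDP: min <C,X> s.t. A(X) = b, X psd. If no feasible X is positive
% definite (Slater fails), there exists a nonzero y with
\[
\mathcal{A}^{*}(y) \succeq 0, \qquad \langle b, y \rangle = 0,
\qquad \mathcal{A}^{*}(y) \neq 0 .
\]
% Every feasible X then satisfies <X, A*(y)> = <A(X), y> = <b, y> = 0; since
% both matrices are psd, the range of X lies in the nullspace of A*(y), so
\[
X = V W V^{T}, \qquad W \succeq 0,
\]
% where the columns of V span that nullspace. Substituting X = V W V^T yields
% a strictly smaller SDP; repeating until no such y exists restores Slater.
```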
34

[en] AN AUTOMATIC PREPROCESSING FOR TEXT MINING IN PORTUGUESE: A COMPUTER-AIDED APPROACH / [pt] UMA ABORDAGEM DE PRÉ-PROCESSAMENTO AUTOMÁTICO PARA MINERAÇÃO DE TEXTOS EM PORTUGUÊS: SOB O ENFOQUE DA INTELIGENCIA COMPUTACIONAL

CHRISTIAN NUNES ARANHA 25 June 2007 (has links)
This doctoral thesis proposes a new preprocessing model for text mining in Portuguese, using computational intelligence techniques based on existing concepts such as neural networks, dynamical systems, and multidimensional statistics. Its goal is therefore to innovate in the preprocessing phase of text mining by proposing an automatic model for enriching textual data. The approach extends the traditional bag-of-words model, which has a more statistical emphasis, to a bag-of-lexemes model that makes greater use of the text's linguistic content in a more computational approach, yielding more efficient results. The work is complemented by the development and implementation of a text preprocessing system that automates this phase of the proposed text mining process. Although the main object of the thesis is the preprocessing stage, every step of the text mining process is covered in overview, so as to provide the basic theory needed to understand the process as a whole. Beyond presenting the theory of each stage individually, a complete run of the process is executed (data collection, indexing, preprocessing, mining, and postprocessing), using well-established models from the literature, implemented during this work, for the other stages. Finally, functionalities and applications are shown, such as document classification, information extraction, and natural language interface (NLI).
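A toy contrast between the two representations the abstract names, sketched in Python. The lemma table here is a stand-in assumption; the thesis's bag-of-lexemes model rests on a full linguistic enrichment pipeline, not a lookup table.

```python
from collections import Counter

# Assumed toy entries: three surface forms of the Portuguese verb "correr".
LEMMAS = {"corre": "correr", "correu": "correr", "correndo": "correr"}

def bag_of_words(tokens):
    return Counter(tokens)

def bag_of_lexemes(tokens):
    return Counter(LEMMAS.get(t, t) for t in tokens)

tokens = "ele correu e ela corre correndo".split()
print(bag_of_words(tokens))    # each surface form counted separately
print(bag_of_lexemes(tokens))  # the three forms collapse to one lexeme, "correr"
```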
35

Pré-processamento de dados na identificação de processos industriais. / Pre-processing data in the identification of industrial processes.

Oscar Wilfredo Rodríguez Rodríguez 01 December 2014 (has links)
This work studies the different stages of data preprocessing in system identification: filtering, normalization, and sampling. The main goal is to condition the empirical data measured by the instruments of industrial processes so that, when these data are used in system identification, mathematical models can be obtained that represent the dynamics of the real process as closely as possible. The preprocessing techniques are implemented in MATLAB R2012b and tested on the flow pilot plant installed in the Industrial Process Control Laboratory of the Department of Telecommunications and Control Engineering at the Polytechnic School of USP, as well as on simulated industrial process plants whose mathematical models are known a priori. Finally, the performance of the preprocessing stages and their influence on the fit index of the model to the real system, obtained via cross-validation, are analyzed and compared. The model parameters are obtained for infinite-step-ahead prediction.
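A hedged sketch of the three stages named above (filtering, normalization, sampling) applied to an input/output record before identification; the filter order, cutoff, and scaling choices are illustrative assumptions, not the thesis's tuned settings.

```python
import numpy as np
from scipy import signal

def preprocess(u, y, fs, cutoff_hz=5.0, order=4, decimation=1):
    """Condition measured input u and output y (sampled at fs Hz)."""
    # 1. Filtering: zero-phase low-pass to suppress measurement noise.
    b, a = signal.butter(order, cutoff_hz / (fs / 2.0), btype="low")
    u, y = signal.filtfilt(b, a, u), signal.filtfilt(b, a, y)
    # 2. Normalization: remove operating-point offsets, scale to unit std.
    u = (u - u.mean()) / u.std()
    y = (y - y.mean()) / y.std()
    # 3. Sampling: decimate if the original rate oversamples the dynamics.
    if decimation > 1:
        u, y = signal.decimate(u, decimation), signal.decimate(y, decimation)
    return u, y
```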
36

On an automatically parallel generation technique for tetrahedral meshes

Globisch, G. 30 October 1998 (has links) (PDF)
In order to prepare modern finite element analysis, a program for the efficient parallel generation of tetrahedral meshes in a wide class of three-dimensional domains having a generalized cylindric shape is presented. The applied mesh generation strategy is based on the decomposition of a 2D reference domain into simply connected subdomains; by means of their triangulations, the tetrahedral layers are built up in parallel. Adaptive grid control as well as nodal renumbering algorithms are involved. Several examples are included in the paper to demonstrate both the program's capabilities and its handling.
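The layer-building idea lends itself to a small sketch: extrude each triangle of the 2D reference triangulation one layer upward and split the resulting prism into three tetrahedra. The decomposition below is one standard scheme, offered purely as an illustration; the paper's algorithm additionally handles adaptive grid control and renumbering.

```python
def prism_to_tets(bottom, top):
    """Split the prism with bottom face (a, b, c) and top face (A, B, C)
    into three tetrahedra. Consistent diagonals on faces shared between
    neighboring prisms require a global vertex-ordering convention,
    omitted here for brevity."""
    a, b, c = bottom
    A, B, C = top
    return [(a, b, c, A), (b, c, A, B), (c, A, B, C)]

def extrude_layer(triangles, layer, nodes_per_layer):
    """Extrude every triangle of the 2D triangulation one layer upward;
    node k of layer L has global index k + L * nodes_per_layer. Each
    triangle is independent, so layers can be built in parallel."""
    tets = []
    for tri in triangles:
        bottom = tuple(v + layer * nodes_per_layer for v in tri)
        top = tuple(v + (layer + 1) * nodes_per_layer for v in tri)
        tets.extend(prism_to_tets(bottom, top))
    return tets
```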
37

Modeling and solving university timetabling / Modélisation et résolution de problèmes d’emploi du temps d’universités

Arbaoui, Taha 10 December 2014 (has links)
This thesis investigates university timetabling problems, which practitioners face every year. We propose new lower bounds, heuristic approaches, and mixed integer and constraint programming models to solve them, addressing both the exam timetabling problem and the student scheduling problem. We investigate new methods and formulations and compare them to existing approaches. For exam timetabling, we propose an improvement to an existing mixed integer programming model that makes it possible to obtain optimal solutions. Next, lower bounds, a more compact reformulation of the constraints, and a constraint programming model are proposed. For the exam timetabling problem at the Université de Technologie de Compiègne, we design a memetic approach. Finally, we present a new formulation for the student scheduling problem and investigate its performance on a set of real-world instances.
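For concreteness, the hard core of an exam timetabling model can be written as a small integer program. This toy version is an assumption for illustration, since real models in the thesis also carry rooms, capacities, and soft spread penalties; here x(e,t) = 1 iff exam e is assigned to period t, and K is the set of exam pairs sharing at least one student.

```latex
\begin{align*}
& \sum_{t \in T} x_{e,t} = 1 && \forall e \in E
    && \text{(each exam gets exactly one period)} \\
& x_{e,t} + x_{f,t} \le 1 && \forall (e,f) \in K,\; \forall t \in T
    && \text{(conflicting exams never share a period)} \\
& x_{e,t} \in \{0,1\} && \forall e \in E,\; \forall t \in T
\end{align*}
```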
38

Framework pro předzpracování dopravních dat pro zjištění semantických míst / Trajectory Data Preprocessing Framework for Discovering Semantic Locations

Ostroukh, Anna January 2018 (has links)
The aim of this thesis is to survey existing approaches to preprocessing trajectory data, with a focus on discovering semantic trajectories, and to design and develop a framework that integrates GPS sensor trajectory data with semantics. The problem with analyzing raw trajectories is that it is not as informative as analyzing trajectories that carry meaningful context. After a study of various approaches and algorithms, the thesis proceeds to the design and development of a framework that discovers semantic locations using a density-based clustering method applied to stop points in trajectories. The design and implementation of the framework were evaluated on publicly available datasets containing raw GPS records.
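A hedged sketch of the two steps the abstract names: extract stop (stay) points from a raw GPS trace, then cluster them with a density-based method (DBSCAN here, as a representative choice). The thresholds are illustrative assumptions, and the framework's semantic-labeling layer is not reproduced.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def haversine(p, q):
    """Great-circle distance in meters between (lat, lon, t) records."""
    R = 6371000.0
    la1, lo1, la2, lo2 = map(np.radians, (p[0], p[1], q[0], q[1]))
    a = (np.sin((la2 - la1) / 2) ** 2
         + np.cos(la1) * np.cos(la2) * np.sin((lo2 - lo1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))

def stay_points(trace, dist_m=200.0, min_dur_s=20 * 60):
    """trace: list of (lat, lon, t_seconds). Returns the mean position of
    every segment that stays within dist_m of its anchor for min_dur_s."""
    stays, i, n = [], 0, len(trace)
    while i < n:
        j = i + 1
        while j < n and haversine(trace[i], trace[j]) < dist_m:
            j += 1
        if trace[j - 1][2] - trace[i][2] >= min_dur_s:
            seg = np.array([(p[0], p[1]) for p in trace[i:j]])
            stays.append(seg.mean(axis=0))
        i = j
    return np.array(stays)

# Candidate semantic locations = dense clusters of stay points.
# eps is in degrees only for brevity; projecting to meters is preferable.
# labels = DBSCAN(eps=0.001, min_samples=5).fit_predict(stay_points(trace))
```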
39

Predikce povahy spamových krátkých textů textovým klasifikátorem / Machine Learning Text Classifier for Short Texts Category Prediction

Drápela, Karel January 2018 (has links)
This thesis deals with the categorization of short spam texts from SMS messages. The first part summarizes current methods for text classification and is followed by a description of several commonly used classifiers. The following chapters describe the test data analysis, the program implementation, and the results. The program is able to predict text categories based on a predefined set of classes and also to estimate classification accuracy on training data. For the two category sets that I designed, the classifier reached accuracies of 82% and 92%. Both preprocessing and feature selection had a positive impact on the resulting accuracy. It is possible to improve the accuracy further by removing the portion of samples that are difficult to classify: at 80% recall, accuracy increases by 8-10%.
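A minimal sketch of the setup described above: TF-IDF features, a linear classifier, and a confidence threshold that abstains on hard samples (the accuracy-for-recall trade mentioned in the abstract). The toy corpus, classifier choice, and threshold are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Assumed toy corpus; the thesis works with real SMS spam data.
train_texts = ["WIN a FREE prize now", "are we still on for lunch",
               "URGENT claim your reward", "see you at the meeting"]
train_labels = ["spam", "ham", "spam", "ham"]
test_texts = ["FREE entry, claim your prize", "meeting moved to 3pm"]

clf = make_pipeline(TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)

# Abstain when the winning class probability is below a threshold; accuracy
# on the remaining samples rises at the cost of recall.
for text, p in zip(test_texts, clf.predict_proba(test_texts)):
    label = clf.classes_[p.argmax()]
    verdict = label if p.max() >= 0.6 else "abstain (hard sample)"
    print(f"{text!r} -> {verdict}")
```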
40

How can a module for sentiment analysis be designed to classify tweets about covid19 / Hur kan man designa en modul inom sentimentanalys för att klassificera tweets om covid19

Ly, Denny, Saad Abdul Malik, Tamara January 2021 (has links)
Sentiment analysis of text is currently receiving increasing attention from different organizations, for a variety of reasons. Emotion mining (sentiment analysis) is an interesting subject to explore, and the research question is thus: how can a module for sentiment analysis be designed to classify tweets about Covid-19? The dataset used for this project was taken from Kaggle and preprocessed with methods such as bag-of-words and term frequency-inverse document frequency. The models are based on the following algorithms: KNN, SVM, DT, and NB; some models also combine machine learning with a lexicon. The experiments showed that the lexicon method, with an accuracy of 87%, exceeded both the machine learning methods implemented in this thesis and the experiments done by the ML community on Kaggle. This implies that the traditional lexicon approach is still a fit choice in the sentiment analysis field.
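Since the lexicon method came out ahead, here is a minimal sketch of that approach: sum word polarities from a sentiment lexicon, with a crude negation flip, and threshold the total. The tiny lexicon and the negation rule are illustrative assumptions, not the thesis's actual resources.

```python
# Assumed toy lexicon; real lexicons hold thousands of scored terms plus
# handling for emoticons, intensifiers, and hashtags.
LEXICON = {"good": 1, "great": 2, "safe": 1, "bad": -1, "fear": -2, "sick": -2}
NEGATORS = {"not", "no", "never"}

def lexicon_sentiment(tweet: str) -> str:
    score, negate = 0, False
    for word in tweet.lower().split():
        if word in NEGATORS:          # flip the polarity of the next word
            negate = True
            continue
        s = LEXICON.get(word, 0)
        score += -s if negate else s
        negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("not safe to travel during covid19"))  # -> negative
```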
