31 |
Design and implementation of a content aware image processing module on FPGA
Mudassar, Burhan Ahmad, 08 June 2015
In this thesis, we tackle the problem of designing and implementing a wireless video sensor network for a surveillance application. The goal was to design a low-power, content-aware system that is able to take an image from an image sensor, determine which blocks in the image contain important information, and encode those blocks for transmission, thus reducing the overall transmission effort. At the same time, the encoder and the preprocessor must not consume so much computation power that the utility of the system is lost.
We have implemented such a system, which uses a combination of edge detection and frame differencing to determine useful information within an image; a JPEG encoder then encodes the important blocks for transmission. An implementation on an FPGA is presented in this work. This work demonstrates that preprocessing gives a 48.6% reduction in power for a single frame while maintaining a delivery ratio above 85% for the given set of test frames.
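As an illustration of the selection step, here is a minimal NumPy/SciPy sketch of flagging blocks by combined frame differencing and edge activity. The grayscale input, the 8x8 block size, and both thresholds are illustrative assumptions, not the thesis's parameters.

```python
import numpy as np
from scipy import ndimage

def important_blocks(frame, prev_frame, block=8, diff_thresh=10.0, edge_thresh=20.0):
    """Flag blocks worth encoding in a grayscale frame: a block is kept when
    it both changed since the previous frame and contains edge activity."""
    f = frame.astype(float)
    diff = np.abs(f - prev_frame.astype(float))                           # frame differencing
    edges = np.hypot(ndimage.sobel(f, axis=0), ndimage.sobel(f, axis=1))  # edge magnitude
    h, w = f.shape                       # partial blocks at the border are ignored
    mask = np.zeros((h // block, w // block), dtype=bool)
    for i in range(h // block):
        for j in range(w // block):
            win = (slice(i * block, (i + 1) * block),
                   slice(j * block, (j + 1) * block))
            mask[i, j] = diff[win].mean() > diff_thresh and edges[win].mean() > edge_thresh
    return mask  # hand only the True blocks to the JPEG encoder
```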
|
32 |
Distributed and Multiphase Inference in Theory and Practice: Principles, Modeling, and Computation for High-Throughput Science
Blocker, Alexander Weaver, 18 September 2013
The rise of high-throughput scientific experimentation and data collection has introduced new classes of statistical and computational challenges. The technologies driving this data explosion are subject to complex new forms of measurement error, requiring sophisticated statistical approaches. Simultaneously, statistical computing must adapt to larger volumes of data and new computational environments, particularly parallel and distributed settings. This dissertation presents several computational and theoretical contributions to these challenges.

In chapter 1, we consider the problem of estimating the genome-wide distribution of nucleosome positions from paired-end sequencing data. We develop a modeling approach based on nonparametric templates that controls for variability due to enzymatic digestion, and we use this to construct a calibrated Bayesian method to detect local concentrations of nucleosome positions. Inference is carried out via a distributed Hamiltonian Monte Carlo (HMC) algorithm whose complexity scales linearly with the length of the genome being analyzed. We provide MPI-based implementations of the proposed methods, stand-alone and on Amazon EC2, which can provide inferences on an entire S. cerevisiae genome in less than 1 hour on EC2.

In chapter 2, we present a method for absolute quantitation from LC-MS/MS proteomics experiments. We develop a Bayesian model for the non-ignorable missing-data mechanism induced by this technology, which includes an unusual combination of censoring and truncation, and we provide a scalable MCMC sampler for inference in this setting, enabling full-proteome analyses using cluster computing environments. A set of simulation studies and actual experiments demonstrate the approach's validity and utility.

We close in chapter 3 by proposing a theoretical framework for the analysis of preprocessing under the banner of multiphase inference. Preprocessing forms an oft-neglected foundation for a wide range of statistical and scientific analyses. We provide some initial theoretical foundations for this area, including distributed preprocessing, building upon previous work in multiple imputation. We demonstrate that multiphase inferences can, in some cases, even surpass standard single-phase estimators in efficiency and robustness. Our work suggests several paths for further research into the statistical principles underlying preprocessing. / Statistics
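The thesis's distributed HMC is specific to its genome model; as background only, a minimal single-machine sketch of the leapfrog integrator and accept/reject step at the core of any HMC sampler might look as follows. The target is generic, supplied via log_post and grad_log_post; the distribution across machines, which the thesis provides via MPI, is omitted.

```python
import numpy as np

def leapfrog(theta, p, grad_log_post, eps, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p + 0.5 * eps * grad_log_post(theta)      # initial half step for momentum
    for _ in range(n_steps):
        theta = theta + eps * p                   # full step for position
        p = p + eps * grad_log_post(theta)        # full step for momentum
    p = p - 0.5 * eps * grad_log_post(theta)      # undo the surplus half step
    return theta, p

def hmc_step(theta, log_post, grad_log_post, eps=0.1, n_steps=20, rng=None):
    """One Metropolis-corrected HMC transition from position theta."""
    rng = rng or np.random.default_rng()
    p0 = rng.standard_normal(theta.shape)         # resample Gaussian momentum
    theta_new, p_new = leapfrog(theta, p0, grad_log_post, eps, n_steps)
    log_accept = (log_post(theta_new) - 0.5 * p_new @ p_new) \
               - (log_post(theta) - 0.5 * p0 @ p0)
    return theta_new if np.log(rng.uniform()) < log_accept else theta
```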
|
33 |
Morphosyntactic Corpora and Tools for Persian
Seraji, Mojgan, January 2015
This thesis presents open-source resources in the form of annotated corpora and modules for automatic morphosyntactic processing and analysis of Persian texts. More specifically, the resources consist of an improved part-of-speech tagged corpus and a dependency treebank, as well as tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing for Persian. In developing these resources and tools, two key requirements are observed: compatibility and reuse. The compatibility requirement has two parts. First, the tools in the pipeline should be compatible with each other, in the sense that the output of one tool satisfies the input requirements of the next. Second, the tools should be compatible with the annotated corpora and deliver the same analysis that is found in them. The reuse requirement means that all the components in the pipeline are developed by reusing resources, standard methods, and open-source state-of-the-art tools; this is necessary to make the project feasible.

Given these requirements, the thesis investigates two main research questions. The first is how we can develop morphologically and syntactically annotated corpora and tools while satisfying the requirements of compatibility and reuse. The approach taken is to accept the tokenization variations in the corpora in order to achieve robustness. The tokenization variations in Persian texts are related to the orthographic variations in the writing of fixed expressions, as well as various types of affixes and clitics. Since these variations are inherent properties of Persian texts, it is important that the tools in the pipeline can handle them; therefore, they should not be trained on idealized data. The second question concerns how accurately we can perform morphological and syntactic analysis for Persian by adapting and applying existing tools to the annotated corpora. The experimental evaluation of the tools shows that the sentence segmenter and tokenizer achieve an F-score close to 100%, the tagger has an accuracy of nearly 97.5%, and the parser achieves a best labeled accuracy of over 82% (with unlabeled accuracy close to 87%).
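The compatibility requirement can be pictured as function composition: each stage consumes exactly what the previous stage produces. The sketch below uses naive hypothetical stand-in stages, not the thesis's actual tools; the normalization comment only gestures at the kind of Persian-specific handling (e.g. zero-width non-joiner usage) a real normalizer performs.

```python
def normalize(text: str) -> str:
    # Stand-in: real Persian normalization also handles orthographic variants,
    # e.g. zero-width non-joiner (ZWNJ) usage in affixes and clitics.
    return " ".join(text.split())

def segment(text: str) -> list[str]:
    return [s.strip() for s in text.split(".") if s.strip()]  # naive sentence splitter

def tokenize(sentence: str) -> list[str]:
    return sentence.split()                                   # naive whitespace tokenizer

def pipeline(raw: str) -> list[list[str]]:
    """The output of each stage matches the next stage's input format, so a
    part-of-speech tagger and a dependency parser could consume the result."""
    return [tokenize(s) for s in segment(normalize(raw))]
```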
|
34 |
Preprocessing and Reduction for Semidefinite Programming via Facial Reduction: Theory and Practice
Cheung, Yuen-Lam, 05 November 2013
Semidefinite programming is a powerful modeling tool for a wide range of optimization and feasibility problems. Its prevalent use in practice relies on the fact that a (nearly) optimal solution of a semidefinite program can be obtained efficiently in both theory and practice, provided that the semidefinite program and its dual satisfy the Slater condition.
This thesis focuses on the situation where the Slater condition (i.e., the existence of positive definite feasible solutions) does not hold for a given semidefinite program; the failure of the Slater condition often occurs in structured semidefinite programs derived from various applications. In this thesis, we study the use of the facial reduction technique, originally proposed as a theoretical procedure by Borwein and Wolkowicz, as a preprocessing technique for semidefinite programs. Facial reduction can be used either in an algorithmic or a theoretical sense, depending on whether the structure of the semidefinite program is known a priori.
The main contribution of this thesis is threefold. First, we study the numerical issues in the implementation of facial reduction as an algorithm for semidefinite programs, and argue that each step of the facial reduction algorithm is backward stable. Second, we illustrate the theoretical importance of the facial reduction procedure for sensitivity analysis of semidefinite programs. Finally, we illustrate the use of the facial reduction technique on several classes of structured semidefinite programs, in particular the side chain positioning problem in protein folding.
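For orientation, here is one facial reduction step in generic notation (standard in the literature the thesis builds on, not necessarily the thesis's own notation). For the primal SDP $\min\{\langle C,X\rangle : \langle A_i,X\rangle = b_i,\ X \succeq 0\}$, failure of the Slater condition yields a certificate that shrinks the problem:

```latex
\exists\, y \in \mathbb{R}^m:\quad
Z := \sum_{i=1}^{m} y_i A_i \succeq 0,\qquad
b^{\mathsf T} y = 0,\qquad Z \neq 0.
% Every feasible X then satisfies <Z, X> = b^T y = 0, and since both X and Z
% are positive semidefinite this forces range(X) \subseteq null(Z). Writing
% X = V W V^T, where the columns of V span null(Z), gives an equivalent SDP
% in the smaller variable W \succeq 0; the step is repeated until Slater holds.
```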
|
35 |
[en] AN AUTOMATIC PREPROCESSING FOR TEXT MINING IN PORTUGUESE: A COMPUTER-AIDED APPROACH / [pt] UMA ABORDAGEM DE PRÉ-PROCESSAMENTO AUTOMÁTICO PARA MINERAÇÃO DE TEXTOS EM PORTUGUÊS: SOB O ENFOQUE DA INTELIGENCIA COMPUTACIONAL
CHRISTIAN NUNES ARANHA, 25 June 2007
This work presents a new preprocessing model for text mining in Portuguese using computational intelligence techniques based on existing concepts, such as neural networks, dynamical systems, and multidimensional statistics. The objective of this doctoral thesis is therefore to innovate in the preprocessing phase of text mining by proposing an automatic model for enriching textual data. The approach is presented as an extension of the traditional bag-of-words model, which has a more statistical emphasis, and proposes a bag-of-lexemes model that makes greater use of the text's linguistic content in a more computational approach, providing more efficient results. The work is complemented by the development and implementation of a text preprocessing system that automates this phase of the proposed text mining process. Although the main subject of this thesis is the preprocessing stage, every step of the text mining process is covered in overview in order to provide the basic theory needed to understand the process as a whole. Beyond presenting the theory of each stage individually, a complete run of the process is carried out (with data collection, indexing, preprocessing, mining, and post-processing), using well-established models from the literature for the other stages, all implemented in the course of this work. Finally, some functionalities and applications are shown, such as document classification, information extraction, and natural language interface (NLI).
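To make the bag-of-words versus bag-of-lexemes contrast concrete, a toy Python sketch is given below. The tiny lemma table is a stand-in assumption for the thesis's linguistic enrichment, and uses English words merely for readability.

```python
from collections import Counter

LEMMAS = {"mining": "mine", "mined": "mine", "mines": "mine", "texts": "text"}

def bag_of_words(tokens):
    return Counter(tokens)                              # counts surface forms only

def bag_of_lexemes(tokens):
    return Counter(LEMMAS.get(t, t) for t in tokens)    # collapses inflected variants

tokens = "mining mined texts text".split()
print(bag_of_words(tokens))    # 4 distinct features, each with count 1
print(bag_of_lexemes(tokens))  # 2 distinct features: 'mine' x2, 'text' x2
```

Collapsing variants of the same lexeme concentrates the evidence for each concept into one feature, which is the kind of gain the enriched representation aims for.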
|
36 |
Pré-processamento de dados na identificação de processos industriais. / Pre-processing data in the identification of industrial processes.
Oscar Wilfredo Rodríguez Rodríguez, 01 December 2014
This work studies the different stages of data preprocessing in system identification, namely filtering, normalization, and sampling. The main goal is to condition the empirical data measured by the instruments of industrial processes so that, when these data are used in system identification, one can obtain mathematical models that represent the dynamics of the real process as closely as possible. The preprocessing techniques are implemented in MATLAB 2012b and tested on the flow pilot plant installed in the Laboratory of Industrial Process Control of the Department of Telecommunications and Control Engineering at the Polytechnic School of USP, as well as on simulated industrial process plants whose mathematical models are known a priori. Finally, the performance of the preprocessing stages and their influence on the fit index of the model to the real system, obtained by the cross-validation method, are analyzed and compared. The model parameters are obtained for infinite-step-ahead predictions.
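A minimal NumPy sketch of the three stages the abstract names, applied to input/output records before identification, might look as follows; the moving-average filter, unit-variance scaling, and decimation factor are illustrative choices, not the thesis's.

```python
import numpy as np

def preprocess(u, y, decimate=2):
    """Condition raw input (u) and output (y) records for system identification."""
    k = np.ones(5) / 5.0                      # 1. filtering: moving average vs. noise
    u_f = np.convolve(u, k, mode="same")
    y_f = np.convolve(y, k, mode="same")
    u_n = (u_f - u_f.mean()) / u_f.std()      # 2. normalization: remove the operating
    y_n = (y_f - y_f.mean()) / y_f.std()      #    point and scale to unit variance
    return u_n[::decimate], y_n[::decimate]   # 3. sampling: decimate if oversampled
```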
|
37 |
On an automatically parallel generation technique for tetrahedral meshes
Globisch, G., 30 October 1998
In order to prepare modern finite element analysis, a program for the efficient parallel generation of tetrahedral meshes in a wide class of three-dimensional domains having a generalized cylindrical shape is presented. The applied mesh generation strategy is based on the decomposition of a 2D reference domain into simply connected subdomains; by means of their triangulations, the tetrahedral layers are built up in parallel. Adaptive grid control as well as nodal renumbering algorithms are involved. Several examples are included in the paper to demonstrate both the program's capabilities and its handling.
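The layer-wise construction (tetrahedral layers built over a 2D triangulation) can be illustrated by the standard split of each extruded triangular prism into three tetrahedra. The indexing convention below is an assumption for illustration, not the paper's scheme.

```python
def prism_to_tets(a, b, c, A, B, C):
    """Split the prism over bottom triangle (a, b, c) with top copy (A, B, C)
    into three tetrahedra (one standard decomposition). For a conforming mesh,
    neighbouring prisms must choose matching diagonals on shared quadrilateral
    faces, e.g. by ordering each triangle's vertices by global index."""
    return [(a, b, c, A), (b, c, A, B), (c, A, B, C)]

def extrude_layers(triangles, nodes_per_layer, n_layers):
    """Stack n_layers tetrahedral layers over a 2D triangulation; node k of
    layer L gets the global index k + L * nodes_per_layer."""
    tets = []
    for L in range(n_layers):
        lo, hi = L * nodes_per_layer, (L + 1) * nodes_per_layer
        for (a, b, c) in triangles:
            tets += prism_to_tets(a + lo, b + lo, c + lo,
                                  a + hi, b + hi, c + hi)
    return tets
```

Because each layer, and each prism within a layer, is handled independently, the loop parallelizes naturally, which is the property the parallel generator exploits.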
|
38 |
Modeling and solving university timetabling / Modélisation et résolution de problèmes d’emploi du temps d’universités
Arbaoui, Taha, 10 December 2014
This thesis investigates university timetabling problems, which arise across universities and are faced each year by practitioners. We propose new lower bounds, heuristic approaches, and mixed integer and constraint programming models to solve them. We address the exam timetabling problem and the student scheduling problem, investigating new methods and formulations and comparing them to existing approaches. For exam timetabling, we propose an improvement to an existing mixed integer programming model that makes it possible to obtain optimal solutions; next, lower bounds, a more compact reformulation of the constraints, and a constraint programming model are proposed. For the exam timetabling problem at the Université de Technologie de Compiègne, we designed a memetic approach. Finally, we present a new formulation for the student scheduling problem and investigate its performance on a set of real-world instances.
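For orientation only, the generic core of an exam timetabling MIP (the thesis's formulations are richer than this): binary variables assign exams to periods, subject to conflict and capacity constraints.

```latex
\begin{aligned}
& x_{e,p} \in \{0,1\} && \text{exam } e \text{ is held in period } p\\
& \textstyle\sum_{p} x_{e,p} = 1 \quad \forall e && \text{each exam is scheduled exactly once}\\
& x_{e,p} + x_{f,p} \le 1 \quad \forall p,\ (e,f) \in K && \text{exams sharing a student are kept apart}\\
& \textstyle\sum_{e} s_e\, x_{e,p} \le R_p \quad \forall p && s_e \text{ students per exam, seating capacity } R_p
\end{aligned}
```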
|
39 |
Framework pro předzpracování dopravních dat pro zjištění semantických míst / Trajectory Data Preprocessing Framework for Discovering Semantic Locations
Ostroukh, Anna, January 2018
The aim of this thesis is to survey existing approaches to trajectory data preprocessing, with a focus on discovering semantic trajectories, and to design and develop a framework that integrates trajectory data from GPS sensors with semantics. The problem with analyzing raw trajectories is that it is not as informative as analyzing trajectories that carry meaningful context. A study of various approaches and algorithms is followed by the design and development of a framework that discovers semantic locations by applying a density-based clustering method to the stay points of trajectories. The design and implementation of the framework were evaluated on publicly available datasets containing raw GPS records.
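The core step described here, density-based clustering of stay points, can be sketched with scikit-learn's DBSCAN. The distance/dwell thresholds and the DBSCAN parameters below are illustrative assumptions, not the thesis's settings.

```python
import numpy as np
from math import radians, sin, cos, asin, sqrt
from sklearn.cluster import DBSCAN

def haversine(p, q):
    """Great-circle distance in metres between (lat, lon, ...) records."""
    la1, lo1, la2, lo2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((la2 - la1) / 2) ** 2 + cos(la1) * cos(la2) * sin((lo2 - lo1) / 2) ** 2
    return 6371000 * 2 * asin(sqrt(a))

def stay_points(track, dist_m=100.0, min_dwell_s=300.0):
    """Collapse runs of GPS fixes (lat, lon, unix_time) that linger within
    dist_m metres for at least min_dwell_s seconds into single points."""
    pts, i = [], 0
    while i < len(track):
        j = i + 1
        while j < len(track) and haversine(track[i], track[j]) < dist_m:
            j += 1
        if track[j - 1][2] - track[i][2] >= min_dwell_s:
            pts.append(np.mean([p[:2] for p in track[i:j]], axis=0))
        i = j
    return pts

def semantic_locations(all_stay_points, eps=0.001, min_samples=5):
    """Dense clusters of stay points pooled across trajectories become candidate
    semantic locations; eps is in degrees here, a simplification (a haversine
    metric would be the more careful choice)."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(np.array(all_stay_points))
```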
|
40 |
Predikce povahy spamových krátkých textů textovým klasifikátorem / Machine Learning Text Classifier for Short Texts Category Prediction
Drápela, Karel, January 2018
This thesis deals with the categorization of short spam texts from SMS messages. The first part summarizes current methods for text classification and is followed by a description of several commonly used classifiers. The following chapters describe the test data analysis, the program implementation, and the results. The program predicts text categories based on a predefined set of classes and can also estimate classification accuracy on training data. For the two category schemes that I designed, the classifier reached accuracies of 82% and 92%. Both preprocessing and feature selection had a positive impact on the resulting accuracy, and the accuracy can be improved further by removing the portion of samples that are difficult to classify: at 80% recall, accuracy increases by 8-10%.
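The pipeline described here (preprocessing, feature selection, classification) matches a standard scikit-learn setup such as the sketch below. The specific vectorizer, chi-squared selection, naive Bayes classifier, and toy data are assumptions for illustration, not the thesis's exact choices.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# Toy stand-in for a labeled SMS dataset (1 = spam-like, 0 = ordinary).
texts = ["win a free prize now", "free cash claim your reward",
         "urgent prize waiting claim now", "cheap loans apply today",
         "winner free entry claim now",
         "are we meeting at noon", "see you at the lecture",
         "call me when you get home", "the report is due tomorrow",
         "lunch at the usual place"]
labels = [1] * 5 + [0] * 5

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),  # preprocessing
    ("select", SelectKBest(chi2, k=10)),                               # feature selection
    ("clf", MultinomialNB()),                                          # classifier
])
print(cross_val_score(pipe, texts, labels, cv=5).mean())               # estimated accuracy
```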
|