141 |
Automatic Classification of Fish in Underwater Video; Pattern Matching - Affine Invariance and Beyond
Gundam, Madhuri 15 May 2015
Underwater video is used by marine biologists to observe, identify, and quantify living marine resources. Video sequences are typically analyzed manually, which is a time-consuming and laborious process. Automating this process would save significant time and cost. This work proposes a technique for automatic fish classification in underwater video. The steps involved are background subtraction, fish region tracking, and feature-based classification. Background processing separates moving objects from their surrounding environment. Tracking associates multiple views of the same fish in consecutive frames. This step is especially important, since recognizing and classifying one or a few of the views as a species of interest may allow the whole sequence to be labeled as that species. Shape features are extracted from each object using Fourier descriptors and are presented to a nearest neighbor classifier for classification. Finally, the nearest neighbor classifier results are combined using a probabilistic-like framework to classify an entire sequence.
The majority of existing pattern matching techniques focus on affine invariance, mainly because rotation, scale, translation and shear are common image transformations. However, in some situations, other transformations may be modeled as a small deformation on top of an affine transformation. The proposed algorithm complements existing Fourier transform-based pattern matching methods in such situations. First, the spatial domain pattern is decomposed into non-overlapping concentric circular rings centered at the middle of the pattern. The Fourier transforms of the rings are computed and then mapped to the polar domain. The algorithm assumes that the individual rings are rotated with respect to each other. The variable angles of rotation provide information about the directional features of the pattern. The angle of rotation is determined starting from the Fourier transform of the outermost ring and moving inwards to the innermost ring. Two different approaches, one using a dynamic programming algorithm and the other using a greedy algorithm, are used to determine the directional features of the pattern.
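As a rough illustration of the shape-feature step described in the first part of this abstract, the following Python sketch computes Fourier descriptors of a closed contour and labels a query by its nearest neighbor. The function names, the naive DFT, the normalization by the first harmonic, and the assumption that the contour is traced counterclockwise are illustrative choices, not details taken from the thesis.

```python
import cmath
import math


def dft_coefficient(z, k):
    """k-th coefficient of a naive discrete Fourier transform of the sequence z."""
    n = len(z)
    return sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n


def fourier_descriptors(contour, n_coeffs=8):
    """Return n_coeffs normalized Fourier descriptors of a closed contour.

    contour: ordered list of (x, y) boundary points, assumed counterclockwise.
    Coefficient 0 is skipped (removes translation), magnitudes are used
    (removes rotation and start point), and everything is divided by the
    magnitude of coefficient 1 (removes scale).
    """
    if len(contour) < n_coeffs + 2:
        raise ValueError("contour has too few points for the requested descriptors")
    z = [complex(x, y) for x, y in contour]
    scale = abs(dft_coefficient(z, 1))
    return [abs(dft_coefficient(z, k)) / scale for k in range(2, 2 + n_coeffs)]


def nearest_neighbor(query, labeled):
    """Return the label of the training descriptor closest to the query.

    labeled: list of (descriptor, label) pairs.
    """
    return min(labeled, key=lambda item: math.dist(query, item[0]))[1]
```

Because the descriptors discard translation, rotation, and scale, two views of the same outline at different positions and sizes should map to nearly the same feature vector.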
|
142 |
Optimalizace rozvozu piva společnosti Heineken / Heineken Beer Distribution Optimalisation
Vršecká, Renáta January 2009
This thesis deals with a real logistics problem of the Heineken CZ company. The company draws up an itinerary for each vehicle to distribute its goods to particular customers on a daily basis. These itineraries are created manually, relying only on the skill of an experienced driver. The goal of this thesis is to find an algorithmic solution that sets optimal itineraries for all vehicles, so that the total distance, and therefore operating costs, are minimized, using only the knowledge of the distances between each pair of nodes.
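The abstract does not name the algorithm used. As a minimal sketch of how an itinerary can be built from a distance matrix alone, the Python code below applies the classic nearest-neighbor construction heuristic for a single vehicle. The function names and the single-depot setup are assumptions, and the heuristic generally yields a reasonable route rather than an optimal one.

```python
def nearest_neighbor_route(dist, depot=0):
    """Build a round trip by always driving to the closest unvisited node.

    dist: square matrix where dist[i][j] is the distance from node i to node j.
    Returns the list of visited nodes, starting and ending at the depot.
    """
    unvisited = set(range(len(dist))) - {depot}
    route = [depot]
    while unvisited:
        here = route[-1]
        # Ties are broken by the lower node index to keep the result deterministic.
        nxt = min(unvisited, key=lambda j: (dist[here][j], j))
        route.append(nxt)
        unvisited.remove(nxt)
    route.append(depot)
    return route


def route_length(dist, route):
    """Total distance travelled along consecutive nodes of the route."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))
```

A route produced this way is often used as a starting point that an improvement procedure then refines.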
|
143 |
EVALUATING SPATIAL QUERIES OVER DECLUSTERED SPATIAL DATA
Eslam A Almorshdy (6832553) 02 August 2019
<div>
<div>
<p>Due to the large volumes of spatial data, data is stored on clusters of machines
that inter-communicate to achieve a task. In such a distributed environment, communicating intermediate results among computing nodes dominates execution time.
Communication overhead is even more dominant if processing is in memory. Moreover, the way spatial data is partitioned affects overall processing cost. Various partitioning strategies influence the size of the intermediate results. Spatial data poses
the following additional challenges: 1) storage load balancing because of the skewed
distribution of spatial data over the underlying space, 2) query load imbalance due to
skewed query workload and query hotspots over both time and space, and 3) lack of
effective utilization of the computing resources. We introduce a new kNN query evaluation technique, termed BCDB, for evaluating nearest-neighbor queries (NN-queries,
for short). In contrast to clustered partitioning of spatial data, BCDB explores the
use of declustered partitioning of data to address data and query skew. BCDB uses
summaries of the underlying data and a coarse-grained index to localize processing of
the NN-query on each local node as much as possible. The coarse-grained index is locally traversed using a new uncertain version of classical distance browsing, resulting in a minimal O(√k) elements to be communicated across all processing nodes.</p>
</div>
</div>
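To make the communication cost concrete, the Python sketch below shows the naive baseline for a distributed kNN query: every partition returns its own k nearest points and a coordinator merges them. This is not BCDB and does not implement its uncertain distance browsing; the function names and the in-memory list-of-lists stand-in for cluster nodes are assumptions. The baseline ships k candidates per node, which is the overhead the abstract says BCDB reduces.

```python
import heapq
import itertools
import math


def local_knn(points, query, k):
    """k nearest points held by one node, as sorted (distance, point) pairs."""
    return heapq.nsmallest(k, ((math.dist(p, query), p) for p in points))


def global_knn(partitions, query, k):
    """Merge the sorted local answers of all nodes and keep the k closest."""
    local_answers = (local_knn(part, query, k) for part in partitions)
    merged = heapq.merge(*local_answers)  # inputs are already sorted by distance
    return [point for _, point in itertools.islice(merged, k)]
```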
|
144 |
Weighing Machine Learning Algorithms for Accounting RWISs Characteristics in METRo : A comparison of Random Forest, Deep Learning & kNN
Landmér Pedersen, Jesper January 2019
The numerical model for forecasting road conditions, the Model of the Environment and Temperature of Roads (METRo), laid the foundation for solving the energy balance and calculating the temperature evolution of roads. METRo does this by providing a numerical modelling system that makes use of Road Weather Information Stations (RWIS) and meteorological projections. While METRo includes tools for correcting errors at each station, such as regional differences or microclimates, this thesis proposes machine learning as a supplement to the METRo forecasts to account for station characteristics. Controlled experiments were conducted comparing four regression algorithms, namely recurrent and dense neural networks, random forest and k-nearest neighbour, to predict the squared deviation of METRo-forecast road surface temperatures. The results reveal that the models using the random forest algorithm yielded the most reliable predictions of METRo deviations. However, the study also shows the promise of neural networks and the possible advantage of the seasonal adjustments that the networks could offer.
|
145 |
PCA-tree: uma proposta para indexação multidimensional / PCA-Tree: a multidimensional access method proposal
Bernardina, Philipe Dalla 15 June 2007
The advent of applications demanding the representation of objects in multi-dimensional spaces fostered the development of efficient multi-dimensional access methods. Among the early applications that required multi-dimensional access methods, we can cite geo-processing systems, 3D applications and simulators. Later on, multi-dimensional access methods also became important tools in the design of classifiers, mainly those based on the nearest neighbors technique. Consequently, the dimensionality of the spaces has increased, from at most four dimensions to more than a thousand. Among the several multi-dimensional access methods, the class of approaches based on balanced tree structures with data represented in R^d has received a lot of attention. These methods are evolutions of the one-dimensional B-tree access method and inherit several of its characteristics. In this work, we present some of the access methods based on balanced trees in order to illustrate the central idea of these algorithms, and we propose and implement a new multi-dimensional access method, which we call the PCA-tree. It uses a heuristic to split nodes based on the principal component of the sample to be divided. A hyperplane whose normal is the principal component is defined as the one that splits the space represented by the node. From this basic idea we define the data structure and the algorithms for the PCA-tree, employing secondary memory management as in B-trees. Finally, we compare the performance of the PCA-tree with the performance of other methods in the cited class, and present advantages and disadvantages of the proposed access method through an analysis of experimental results.
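The node-splitting idea can be sketched in a few lines of Python. The code below estimates the principal component by power iteration on the covariance matrix and divides the points with a hyperplane whose normal is that component. Placing the hyperplane at the median projection (to get a balanced split), the starting vector, and all function names are assumptions made for illustration; the abstract does not state where the hyperplane is positioned.

```python
import math
import statistics


def principal_component(points, iters=100):
    """Return (mean, unit vector) of the dominant direction of variance.

    Uses power iteration on the sample covariance matrix. The starting vector
    is an arbitrary choice; a start exactly orthogonal to the principal
    component would not converge to it.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    return mean, v


def split_node(points):
    """Split points with a hyperplane whose normal is the principal component."""
    mean, v = principal_component(points)
    proj = [sum((p[i] - mean[i]) * v[i] for i in range(len(v))) for p in points]
    threshold = statistics.median(proj)  # assumed position of the hyperplane
    left = [p for p, s in zip(points, proj) if s <= threshold]
    right = [p for p, s in zip(points, proj) if s > threshold]
    return left, right
```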
|
146 |
Artificial intelligence and Machine learning : a diabetic readmission study
Forsman, Robin, Jönsson, Jimmy January 2019
The maturing of artificial intelligence provides great opportunities for healthcare, but also comes with new challenges. For artificial intelligence to be adequate, a comprehensive analysis of the data is necessary, along with testing the data with multiple algorithms to determine which algorithm is appropriate to use. In this study, data has been gathered on patients who were either readmitted or not readmitted to hospital within 30 days after being admitted. The data has then been analyzed and compared across different algorithms to determine the most appropriate algorithm to use.
|
147 |
Floodplain Mapping in Data-Scarce Environments Using Regionalization Techniques
Keighobad Jafarzadegan (5929811) 10 June 2019
<p>Flooding
is one of the most devastating and frequently occurring natural phenomena in
the world. Due to the adverse impacts of floods on the life and property of
humans, it is crucial to investigate the best flood modeling approaches for
delineation of floodplain areas. Conventionally, different hydrodynamic models
are used to identify the floodplain areas. However, the high computational cost,
and the dependency of these models on detailed input datasets limit their
application for large scale floodplain mapping in data-scarce regions. Recently, a new floodplain mapping method
based on a hydrogeomorphic feature, named Height Above Nearest Drainage (<i>HAND</i>),
has been proposed as a successful alternative for fast and efficient floodplain
mapping at the large scale. The overall goal of this study is to improve the
performance of the <i>HAND</i>-based method by overcoming its current limitations.
The main focus will be on extending the application of the <i>HAND</i>-based
method to data-scarce environments. To achieve this goal, regionalization
techniques are integrated with the floodplain models at the regional and
continental scales. Considering these facts, four research objectives are
established: (1) develop a regression model to create 100-year floodplain
maps at a regional scale, (2) develop a classification framework for creating
100-year floodplain maps for the Contiguous United States, (3) develop a new
version of the <i>HAND</i>-based method for creating probabilistic 100-year
floodplain maps, and (4) propose a general regionalization framework for
transferring information from data-rich basins to data-scarce environments. </p>
<p>In the
first objective, the state of North Carolina is selected as the study area, and
a regression model is developed to regionalize the available 100-year Flood
Insurance Rate Maps (FIRMs) to the data-scarce regions. The regression model is
an exponential equation with three independent variables including the average
slope, the average elevation, and the main stream slope of the watershed. The
results show that the estimated floodplains are within the expected range of
accuracy of C>0.6 and F>0.9 for the majority of watersheds located in the
mid-altitude regions, but the model overpredicts and underpredicts in the flat and
mountainous regions, respectively. </p>
<p>The
second objective of this research extends the spatial application of the <i>HAND</i>-based
method to the entire United States by proposing a new classification framework.
The proposed framework classifies the watersheds into three groups by using
seven watershed characteristics related to the topography, climate and land
use. The validation results show that the average error of floodplain maps is
around 14%, which demonstrates the reliability and robustness of the proposed
framework for continental floodplain mapping. In addition to the acceptable
accuracy, the proposed framework creates the floodplain maps for any watershed
within the United States. </p>
<p>The <i>HAND</i>-based
method is a deterministic modeling approach to floodplain mapping. In the third
objective, the probabilistic version of this method is proposed. Using a
probabilistic approach to floodplain mapping provides more informative maps. In
this study, a flat watershed in the state of Kansas is selected as the case
study, and the performance of four probabilistic functions for floodplain
mapping is compared. The results show that a linear function with one parameter
and a gamma function with two parameters are the best options for this study
area. It is also shown that the proposed probabilistic approach can reduce the
overpredictions and underpredictions made by the deterministic <i>HAND</i>-based
approach. </p>
<p>In the
fourth objective, a new regionalization framework for transferring the
calibrated environmental models to data-scarce regions is proposed. This
framework aims to improve the current similarity-based regionalization methods
by reducing the subjectivity that exists in the selection of basin descriptors.
Using this framework for the probabilistic <i>HAND</i>-based method in the
third objective, the floodplains are regionalized for a large set of watersheds
in the Central United States. The results show that “vertical component of
centroid (or latitude)” is the dominant descriptor of spatial variabilities in
the probabilistic floodplain maps. This is an interesting finding which shows
how a systematic approach can help to explore the hidden descriptors for
regionalization. It is demonstrated that using common methods, such as
correlation coefficient calculation, or stepwise regression analysis, will not
reveal the critical role of latitude on the spatial variability of floodplains.</p>
|
148 |
Fraud or Not?
Åkerblom, Thea, Thor, Tobias January 2019
This paper uses statistical learning to examine and compare three different statistical methods with the aim of predicting credit card fraud. The methods compared are Logistic Regression, K-Nearest Neighbour and Random Forest. They are applied and estimated on a data set consisting of nearly 300,000 credit card transactions to determine their performance using classification of fraud as the outcome variable. The three models all have different properties and advantages. The K-NN model performed the best in this paper but has some disadvantages, since it does not explain the data but rather only predicts the outcome accurately. Random Forest explains the variables but predicts less precisely. The Logistic Regression model seems to be unfit for this specific data set.
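The remark that K-NN predicts without explaining follows from how the method works: it stores the training transactions and votes among the closest ones, so no coefficients or variable importances are produced. A minimal Python sketch is given below; the two-feature rows, the labels, and the function name are hypothetical and are not taken from the paper's data set.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training rows.

    train: list of (feature_tuple, label) pairs.
    """
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

In practice the features would usually be standardized first, since Euclidean distance is sensitive to the scale of each variable.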
|
149 |
PCA-tree: uma proposta para indexação multidimensional / PCA-Tree: a multidimensional access method proposal
Philipe Dalla Bernardina 15 June 2007
The advent of applications demanding the representation of objects in multi-dimensional spaces fostered the development of efficient multi-dimensional access methods. Among the early applications that required multi-dimensional access methods, we can cite geo-processing systems, 3D applications and simulators. Later on, multi-dimensional access methods also became important tools in the design of classifiers, mainly those based on the nearest neighbors technique. Consequently, the dimensionality of the spaces has increased, from at most four dimensions to more than a thousand. Among the several multi-dimensional access methods, the class of approaches based on balanced tree structures with data represented in R^d has received a lot of attention. These methods are evolutions of the one-dimensional B-tree access method and inherit several of its characteristics. In this work, we present some of the access methods based on balanced trees in order to illustrate the central idea of these algorithms, and we propose and implement a new multi-dimensional access method, which we call the PCA-tree. It uses a heuristic to split nodes based on the principal component of the sample to be divided. A hyperplane whose normal is the principal component is defined as the one that splits the space represented by the node. From this basic idea we define the data structure and the algorithms for the PCA-tree, employing secondary memory management as in B-trees. Finally, we compare the performance of the PCA-tree with the performance of other methods in the cited class, and present advantages and disadvantages of the proposed access method through an analysis of experimental results.
|
150 |
Extensão do Método de Predição do Vizinho mais Próximo para o modelo Poisson misto / An Extension of Nearest Neighbors Prediction Method for mixed Poisson model
Arruda, Helder Alves 28 March 2017
Many proposals have emerged in recent years for problems involving the prediction of future observations in mixed models; however, there are few studies for cases in which it is necessary to assign random-effect values to new groups. Tamura, Giampaoli and Noma (2013) proposed a method that computes the distances between a new group and groups with known random effects based on the values of the covariates, named the Nearest Neighbors Prediction Method (NNPM), considering the mixed logistic model. The goal of this dissertation was to extend the NNPM to the mixed Poisson model, in addition to obtaining confidence intervals for the predictions. To this end, new prediction performance measures were proposed, as well as the use of the Bootstrap methodology for constructing the intervals. The prediction method was applied to two real data sets and in simulation studies. In both cases good performance was obtained. Thus, the NNPM proved to be a viable prediction method in the mixed Poisson case as well.
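One plausible reading of the NNPM idea can be sketched in Python as follows: a new group borrows the random effect of the known groups whose covariates are closest to its own, and that value enters a Poisson mean through a log link. The Euclidean distance, the averaging over k neighbors, the random-intercept form, the dictionary keys, and the function names are all assumptions made for illustration and are not taken from the dissertation.

```python
import math


def predict_random_effect(new_covariates, known_groups, k=1):
    """Average the random effects of the k known groups nearest in covariate space.

    known_groups: list of dicts with keys "covariates" (tuple) and "effect" (float).
    """
    ranked = sorted(known_groups, key=lambda g: math.dist(g["covariates"], new_covariates))
    nearest = ranked[:k]
    return sum(g["effect"] for g in nearest) / len(nearest)


def poisson_mean(x, beta, random_effect):
    """Expected count under a log link: exp(x . beta + random effect)."""
    linear = sum(xi * bi for xi, bi in zip(x, beta))
    return math.exp(linear + random_effect)
```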
|