1 |
Anomaly handling in visual analytics. Nguyen, Quyen Do. January 2008 (has links)
Thesis (M.S.)--Worcester Polytechnic Institute. / Keywords: anomaly; outlier; visualization; visual analytics. Includes bibliographical references (leaves 68-72).
|
2 |
Automated quantification of plant water transport network failure using deep learning. Naidoo, Tristan 08 March 2022 (has links)
Droughts, exacerbated by anthropogenic climate change, threaten plants through hydraulic failure. This hydraulic failure is caused by the formation of embolisms, which block water flow in a plant's xylem conduits. By tracking these failures over time, vulnerability curves (VCs) can be created. Creating these curves is laborious and time-consuming, and this study seeks to automate it; in particular, it seeks to automate the optical vulnerability (OV) method of determining hydraulic failure. To do this, embolisms need to be segmented across a sequence of images. Three fully convolutional models were considered for this task, namely U-Net, U-Net (ResNet-34), and W-Net. The sample consisted of four unique leaves, each with its own sequence of images. Using these leaves, three experiments were conducted. They considered whether a model could generalise across samples from the same leaf, across different leaves of the same species, and across different species. The results were assessed on two levels: the first considered the quality of the segmentation, and the second considered how well VCs could be constructed. Across the three experiments, the highest test precision-recall AUCs achieved were 81%, 45%, and 40%. W-Net performed the worst of the models, while U-Net and U-Net (ResNet-34) performed similarly to one another. VC reconstruction was assessed using two metrics: the normalised root mean square error (NRMSE), and the difference in Ψ50 between the true and predicted VCs, where Ψ50 is a physiological value of interest. This study found that the shape of the VCs could be reconstructed well if the model was able to recall a portion of the embolisms in every image that contained them. Moreover, it found that some images may be more important than others because of a non-linear mapping between time and water potential. VC reconstruction was satisfactory except in the third experiment. This study demonstrates that, in certain scenarios, automation of the OV method is attainable. To support wider use and further development of this work, a website was created to document the code base and explain how to interact with it. For more information please visit: https://plant-network-segmentation.readthedocs.io/.
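A rough sketch of how a VC and its Ψ50 might be derived from per-image embolism segmentations is given below. The cumulative-area formulation, the interpolation-based Ψ50 estimate, and the range-normalised NRMSE are assumptions made for illustration, not the exact procedure used in the thesis.

```python
# Hypothetical sketch: build a vulnerability curve (VC) from per-image embolism
# pixel counts, estimate Psi_50, and compare a predicted VC against the true one.
import numpy as np

def vulnerability_curve(embolised_pixels, water_potential):
    """Cumulative % of total embolised area, ordered from least to most negative potential."""
    order = np.argsort(water_potential)[::-1]                # most hydrated image first
    psi = np.asarray(water_potential, dtype=float)[order]
    cum = np.cumsum(np.asarray(embolised_pixels, dtype=float)[order])
    pct = 100.0 * cum / cum[-1]                              # normalise to 0-100 %
    return psi, pct

def psi_50(psi, pct):
    """Water potential at which 50 % of total embolism has accumulated (assumed definition)."""
    return float(np.interp(50.0, pct, psi))                  # pct is monotonically increasing

def nrmse(true_pct, pred_pct):
    """Root mean square error between two curves, normalised by the true curve's range."""
    true_pct, pred_pct = np.asarray(true_pct), np.asarray(pred_pct)
    return np.sqrt(np.mean((true_pct - pred_pct) ** 2)) / (true_pct.max() - true_pct.min())

# Toy usage with made-up pixel counts from the true masks and a model's predictions.
potentials = np.array([-0.5, -1.0, -1.5, -2.0, -2.5, -3.0])
true_counts = np.array([0, 120, 300, 900, 400, 80])
pred_counts = np.array([0, 90, 260, 950, 420, 60])
psi_t, pct_t = vulnerability_curve(true_counts, potentials)
psi_p, pct_p = vulnerability_curve(pred_counts, potentials)
print(f"Psi_50 true={psi_50(psi_t, pct_t):.2f}, pred={psi_50(psi_p, pct_p):.2f}, "
      f"NRMSE={nrmse(pct_t, pct_p):.3f}")
```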
|
3 |
Estimating Poverty from Aerial Images Using Convolutional Neural Networks Coupled with Statistical Regression Modelling. Maluleke, Vongani 30 April 2020 (has links)
Policy makers and the government rely heavily on survey data when making policy-related decisions. Collecting survey data is labour-intensive, costly and time-consuming, so it cannot be done frequently or extensively. The main aim of this research is to demonstrate how a Convolutional Neural Network (CNN) coupled with statistical regression modelling can be used to estimate poverty from aerial images supplemented with national household survey data. This provides a more frequent and automated method for updating data that can be used for policy making. The aerial poverty estimation approach is executed in two phases: an aerial classification and detection phase, and a poverty modelling phase. The aerial classification and detection phase uses a CNN to classify the aerial images into three broad geotype classes, namely urban, rural and farm. This is followed by object detection to detect three broad dwelling-type classes in the aerial images, namely brick house, traditional house, and informal settlement. A Mask Region-based Convolutional Neural Network (Mask R-CNN) with a ResNet-101 backbone is used for this task. The second phase, the poverty modelling phase, uses National Income Dynamics Study (NIDS) data to compute the Sen-Shorrocks-Thon (SST) poverty index. Regression models then relate this poverty measure to aggregated results from the aerial classification and detection phase. The study area for this research is KwaZulu-Natal (KZN), South Africa, but the approach can be extended to other South African provinces by retraining the models on data for the location in question.
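As a rough illustration of the poverty modelling phase, the sketch below computes one standard formulation of the SST index (the headcount ratio, multiplied by the mean poverty gap ratio among the poor, multiplied by one plus the Gini coefficient of the poverty gap ratios) and then fits a simple least-squares regression on aggregated dwelling-type counts. The formulation, data layout and feature names are illustrative assumptions, not necessarily the thesis' exact specification.

```python
# Hedged sketch of the poverty modelling phase: one standard decomposition of the
# Sen-Shorrocks-Thon index, followed by a toy regression on made-up area-level counts.
import numpy as np

def sst_index(income, poverty_line):
    """SST = headcount ratio * mean poverty gap ratio among the poor * (1 + Gini of gap ratios)."""
    income = np.asarray(income, dtype=float)
    gap = np.clip((poverty_line - income) / poverty_line, 0.0, 1.0)  # poverty gap ratios
    poor = gap > 0
    if not poor.any():
        return 0.0
    headcount = poor.mean()
    mean_gap_poor = gap[poor].mean()
    g = np.sort(gap)                                                 # Gini over the whole population
    n = g.size
    gini = 2.0 * np.sum(np.arange(1, n + 1) * g) / (n * g.sum()) - (n + 1) / n
    return headcount * mean_gap_poor * (1.0 + gini)

# Toy example: per-area SST regressed on aggregated dwelling-type detections (hypothetical).
rng = np.random.default_rng(0)
sst = np.array([sst_index(rng.lognormal(8, 0.7, 500), poverty_line=1200) for _ in range(10)])
counts = rng.integers(0, 200, size=(10, 3))          # brick, traditional, informal counts
X = np.column_stack([counts, np.ones(len(sst))])     # add an intercept column
coef, *_ = np.linalg.lstsq(X, sst, rcond=None)
print("regression coefficients:", coef)
```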
|
4 |
Optimising the Optimiser: Meta NeuroEvolution for Artificial Intelligence Problems. Hayes, Max Nieuwoudt 26 January 2022 (has links)
Since reinforcement learning algorithms have to fully solve a task in order to evaluate a set of hyperparameter values, conventional hyperparameter tuning methods can be highly sample-inefficient and computationally expensive. Many widely used reinforcement learning architectures originate from scientific papers that report optimal hyperparameter values but do not indicate how those values were found. To address these issues, three experiments were conducted. In the first two experiments, Bayesian Optimisation and random search are compared. In the third and final experiment, the hyperparameter values found in the second experiment are used to solve a more difficult reinforcement learning task, effectively performing hyperparameter transfer learning (later referred to as meta-transfer learning). The results of experiment 1 showed that there are certain scenarios in which Bayesian Optimisation outperforms random search for hyperparameter tuning, while the results of experiment 2 showed that as more hyperparameters are tuned simultaneously, Bayesian Optimisation consistently finds better hyperparameter values than random search. However, Bayesian Optimisation took more than twice as long as random search to find these values. Results from the third experiment indicate that hyperparameter values learned while tuning a relatively easy-to-solve reinforcement learning task (Task A) can be used to solve a more complex task (Task B). With the computing power available for this thesis, hyperparameter optimisation was feasible for the tasks in experiments 1 and 2 but not for the more complex task in experiment 3, which made the transfer of hyperparameters from Task A to the more difficult Task B highly beneficial for solving the more computationally expensive task. The purpose of this work is to explore the effectiveness of Bayesian Optimisation as a tuning algorithm for the hyperparameters of the reinforcement learning algorithm NEAT. An additional goal is the experimental use of hyperparameter value transfer between reinforcement learning tasks, referred to in this work as Meta-Transfer Learning; this is introduced and discussed in greater detail in the Introduction chapter. All code used for this work is available in the repository: https://github.com/maaxnaax/MSc_code
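The comparison at the heart of experiments 1 and 2 can be sketched as follows: both methods evaluate the same budget of hyperparameter settings, with random search sampling uniformly and Bayesian Optimisation letting a Gaussian-process surrogate propose each next trial. The two NEAT-style parameter names, the cheap stand-in objective, and the use of scikit-optimize are assumptions; in the thesis the objective would be the (negated) reward of a full NEAT run on the task.

```python
# Hedged sketch: random search vs. Bayesian Optimisation over the same search space.
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

rng = np.random.default_rng(0)

def objective(params):
    # Stand-in for "train NEAT with these hyperparameters and return the negated reward".
    mutate_rate, weight_power = params
    return (mutate_rate - 0.3) ** 2 + (weight_power - 1.5) ** 2

space = [Real(0.0, 1.0, name="mutate_rate"), Real(0.1, 5.0, name="weight_power")]

# Random search: sample uniformly from the same space and keep the best value found.
random_trials = [[rng.uniform(0.0, 1.0), rng.uniform(0.1, 5.0)] for _ in range(30)]
best_random = min(objective(p) for p in random_trials)

# Bayesian Optimisation: a Gaussian-process surrogate proposes each next trial.
result = gp_minimize(objective, space, n_calls=30, random_state=0)

print(f"random search best: {best_random:.4f}, Bayesian Optimisation best: {result.fun:.4f}")
```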
|
5 |
Modelling non-linearity in 3D shapes: A comparative study of Gaussian process morphable models and variational autoencoders for 3D shape data. Fehr, Fabio 10 February 2022 (has links)
The presence of non-linear shape variation in 3D data is known to influence the reliability of linear statistical shape models (SSMs). This problem is regularly acknowledged but disregarded, as it is assumed that linear models can adequately approximate such non-linearities. Model reliability is crucial for medical imaging and computer vision tasks; however, the non-linearity in the data is seldom considered before modelling. This study provides a framework for identifying the presence of non-linearity in 3D shape data using principal component analysis (PCA) and autoencoder (AE) shape modelling methods. Datasets identified as having linear and non-linear shape variation are then used to compare two sophisticated techniques: linear Gaussian process morphable models (GPMMs) and non-linear variational autoencoders (VAEs). Model performance is measured using generalisation, specificity and computational efficiency in training. The research showed that, given limited computational power, GPMMs achieved relative generalisation performance at least six times better than VAEs in the presence of non-linear shape variation. However, despite their simplistic training scheme, the non-linear VAEs achieved at least 18% better specificity (generative performance) on both datasets.
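The two primary evaluation metrics can be sketched as below: generalisation as the mean reconstruction error of unseen shapes, and specificity as the mean distance from shapes sampled out of the model to their nearest training shape. A PCA model stands in for the linear SSM here, and the vectorised-mesh layout (one row of concatenated vertex coordinates per shape) and the Euclidean error are assumptions for illustration.

```python
# Hedged sketch of generalisation and specificity for a shape model (PCA stand-in).
import numpy as np
from sklearn.decomposition import PCA

def generalisation(model, test_shapes):
    """Mean reconstruction error of held-out shapes projected through the model."""
    recon = model.inverse_transform(model.transform(test_shapes))
    return np.mean(np.linalg.norm(test_shapes - recon, axis=1))

def specificity(model, train_shapes, n_samples=100, rng=None):
    """Mean distance from randomly generated shapes to their nearest training shape."""
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal((n_samples, model.n_components_))
    generated = model.inverse_transform(z * np.sqrt(model.explained_variance_))
    dists = np.linalg.norm(generated[:, None, :] - train_shapes[None, :, :], axis=2)
    return dists.min(axis=1).mean()

# Toy usage with random "meshes" of 50 vertices (150 coordinates per shape).
rng = np.random.default_rng(0)
train, test = rng.normal(size=(80, 150)), rng.normal(size=(20, 150))
ssm = PCA(n_components=10).fit(train)
print(f"generalisation={generalisation(ssm, test):.3f}, specificity={specificity(ssm, train):.3f}")
```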
|
6 |
A Model-driven Visual Analytic Framework for Local Pattern Analysis. Zhao, Kaiyu 09 February 2016 (has links)
The ultimate goal of any visual analytic task is to make sense of the data and gain insights. Unfortunately, discovering useful information is becoming more challenging as data scale grows: human cognitive capacity remains constant, whereas the scale and complexity of data do not. Meanwhile, visual analytics relies heavily on human analysts in the loop, which challenges the traditional human-driven workflow. It is almost impossible to show the user every detail while diving into local regions of the data to explain the phenomena hidden there. For example, when exploring data subsets it is important to determine which partitions contain the most important information; determining a subset of features is vital before further analysis; and modeling these subsets of data locally can yield valuable findings but also introduces bias. In this work, a model-driven visual analytic framework is proposed to help identify interesting local patterns from these three aspects. This dissertation tackles the corresponding subproblems in three topics: model-driven data exploration, model-driven feature analysis and local model diagnosis. First, model-driven data exploration focuses on modeling subsets of data to identify the co-movement of time-series data within certain time partitions, an important application in domains such as medical science, finance, business and engineering. Second, model-driven feature analysis discovers important subsets of interesting features while analyzing local feature similarities. Within a financial risk dataset collected by a domain expert, we discovered that feature correlations differ greatly across data partitions (i.e., small and large companies). Third, local model diagnosis provides a tool for identifying interesting regression models in local regions of the data space, which makes it possible for analysts to model the whole data space with a set of local models while knowing their strengths and weaknesses. Together, the three tools provide an integrated solution for identifying interesting patterns within local subsets of data.
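A rough sketch of the local model diagnosis idea is given below: fit one regression per data partition and compare each local fit against a single global model, so an analyst can see where local models are strong or weak. The pandas layout, the linear regression, and R² as the quality score are illustrative assumptions rather than the framework's actual models.

```python
# Hedged sketch: per-partition ("local") regression quality vs. a single global model.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def local_model_scores(df, features, target, partition_col):
    """Fit a global model and one local model per partition; report R^2 for each."""
    global_model = LinearRegression().fit(df[features], df[target])
    rows = []
    for key, part in df.groupby(partition_col):
        local = LinearRegression().fit(part[features], part[target])
        rows.append({partition_col: key,
                     "n": len(part),
                     "local_r2": local.score(part[features], part[target]),
                     "global_r2_on_partition": global_model.score(part[features], part[target])})
    return pd.DataFrame(rows)

# Toy usage: company size as the partition, two features, one target (all made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({"size": rng.choice(["small", "large"], 200),
                   "x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["risk"] = np.where(df["size"] == "small", 2 * df["x1"], -df["x2"]) + rng.normal(0, 0.1, 200)
print(local_model_scores(df, ["x1", "x2"], "risk", "size"))
```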
|
7 |
Anomaly Handling in Visual Analytics. Nguyen, Quyen Do 23 December 2007 (has links)
"Visual analytics is an emerging field which uses visual techniques to interact with users in the analytical reasoning process. Users can choose the most appropriate representation that conveys the important content of their data by acting upon different visual displays. The data itself has many features of interest, including clusters, trends (commonalities) and anomalies. Most visualization techniques currently focus on the discovery of trends and other relations, where uncommon phenomena are treated as outliers and are either removed from the datasets or de-emphasized on the visual displays. Much less work has been done on the visual analysis of outliers, or anomalies. In this thesis, I will introduce a method to identify the different levels of “outlierness†by using interactive selection and other approaches to process outliers after detection. In one approach, the values of these outliers will be estimated from the values of their k-Nearest Neighbors and replaced to increase the consistency of the whole dataset. Other approaches will leave users with the choice of removing the outliers from the graphs or highlighting the unusual patterns on the graphs if points of interest lie in these anomalous regions. I will develop and test these anomaly handling methods within the XMDV Tool."
|
8 |
Personal Analytical Calendar. Tavakkol, Sanaz 02 May 2014 (links)
Data is all around us, everywhere we go and in every activity we do. It exists in all aspects of our everyday personal life. Making sense of these personal daily data, which leads to more self-awareness, is becoming remarkably important, as we can learn more about our habits and behavior and reflect upon this extended self-knowledge. In particular, these data can help people learn more about themselves, uncover existing patterns in their behaviors or habits, and take action towards newly developed goals. Accordingly, they can either try to improve their behaviors to gain better results and trends, or maintain existing ones. Through the interviews that I conducted, I learned that “Productivity” is one of the most important personal attributes that people are interested in monitoring, tracking and improving in their daily lives. People want to learn more about the supportive or preventive causes that affect their daily productivity, which can eventually help them improve their time-management and self-management. In this thesis, I focus on two research questions: (1) How can we design a visualization tool to help people be more engaged in understanding their daily productivity? For people to learn more about themselves, they need context about their living habits and activities, so I chose digital calendars as a platform for integrating productivity-related information, as they provide beneficial contextual information that supports many of the questions people ask about their personal data. As the next step, I had to find an effective way of representing influential factors on productivity on the calendar, which led to my second research question: (2) What combination of visual encodings will enable people to most easily identify a relationship between two different pieces of daily information rendered on a calendar? To find the best visual encoding, I considered encoding Numeric data using Saturation and Length encodings, and Nominal data using Shape encoding. I designed two types of questions: Calendar-related questions, to investigate how much the visualizations interfere with calendar-related tasks, and Visualization-related questions, to identify which visualization is faster and leads to more accurate results and better user ratings. I compared the combinations of Numeric x Numeric (Saturation x Saturation, Saturation x Length, Length x Length) and Numeric x Nominal (Shape x Length, Shape x Saturation) data encodings. My results demonstrated the following: for Calendar Task questions in the Numeric x Numeric category, Length x Length had the best overall results, and in the Numeric x Nominal category, Shape x Length was rated the best. For Visualization Task questions in the Numeric x Numeric category, Saturation x Saturation performed better overall in most cases, and in the Numeric x Nominal category, Shape x Saturation was the fastest while Shape x Length was the most accurate. These findings, along with the interviews, provided useful information for refining the visualization designs into more accurate, more user-friendly and faster visualizations that assist people in monitoring goals, trends, status, contexts, influencing factors and differences in their productivity-related personal daily data, bringing them more insight, awareness and possibly self-reflection. / Graduate / 0984 / tavakkol@uvic.ca
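One of the encoding pairs compared in the study, Saturation x Length, can be illustrated with a small sketch like the one below, where one daily value drives the saturation of a calendar cell's fill and a second value drives the length of a bar inside it. The colour choices, layout and value ranges are assumptions made purely for illustration; this is not the thesis' implementation.

```python
# Illustrative sketch of a Saturation x Length calendar cell encoding (assumed design).
import matplotlib.pyplot as plt
from matplotlib import colors, patches

def draw_cell(ax, x, y, value_a, value_b):
    # value_a in [0, 1] -> saturation of the cell background colour.
    ax.add_patch(patches.Rectangle((x, y), 1, 1,
                                   facecolor=colors.hsv_to_rgb((0.6, value_a, 0.9)),
                                   edgecolor="grey"))
    # value_b in [0, 1] -> length of a horizontal bar inside the cell.
    ax.add_patch(patches.Rectangle((x + 0.05, y + 0.05), 0.9 * value_b, 0.15, facecolor="black"))

fig, ax = plt.subplots(figsize=(7, 1.5))
for day, (a, b) in enumerate([(0.2, 0.9), (0.8, 0.4), (0.5, 0.6), (1.0, 0.1)]):
    draw_cell(ax, day, 0, a, b)
ax.set_xlim(0, 7)
ax.set_ylim(0, 1)
ax.set_aspect("equal")
ax.axis("off")
plt.show()
```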
|
10 |
Strategic Value Creation through Digital Analytics. Oliveira, Claudio Luis Cruz de 20 December 2012 (has links)
The Internet has changed competition among companies, shifting products, supply chains and even markets. Its democratization has increased the power of consumers, a change that could be seen as a threat to corporations. However, the emergent knowledge derived from Digital Analytics brings many benefits: delivering personalized services, fostering innovation and promoting a real-time dialogue with the consumer. The concept of Digital Analytics includes the measurement, collection, analysis and reporting of digital data for the purposes of understanding and optimizing business performance. This thesis aims to understand why and how Brazilian companies implement Digital Analytics in order to achieve their business goals and thus support a competitive advantage. An exploratory survey and multiple case studies make up the field research of this thesis.
|