1 |
Change detection for activity recognition — Bashir, Sulaimon A., January 2017
Activity recognition is concerned with identifying the physical state of a user at a particular point in time. The activity recognition task requires training a classification algorithm on processed sensor data from a representative population of users. The accuracy of the resulting model often degrades when classifying new instances, due to non-stationary sensor data and variations in user characteristics; the classification model therefore needs to be adapted to new user characteristics. However, existing approaches to model adaptation in activity recognition are blind: they continuously adapt the classification model at regular intervals, without specific and precise detection of an indicator of degrading model performance. This can waste the system resources dedicated to continuous adaptation. This thesis addresses the problem of detecting changes in the accuracy of an activity recognition model. The thesis develops a classifier for activity recognition that uses three statistical summaries, which can be generated from any dataset, for similarity-based classification of new samples. A weighted ensemble combination of the classification decisions from each statistical summary performs better than three existing benchmark classification algorithms. The thesis also presents change detection approaches that can detect changes in the accuracy of the underlying recognition model without access to the ground-truth label of each activity being recognised. The first approach, called 'UDetect', computes change statistics from a window of classified data and employs a statistical process control method to detect variations between the classified data and the reference data of a class. Evaluation of the approach indicates consistent detection that correlates with the error rate of the model. The second approach is a distance-based change detection technique that relies on the developed statistical summaries to compare newly classified samples and detect any drift from the original class of the activity. The implemented approach uses a distance function and a threshold parameter to detect accuracy changes in the classifier as it classifies new instances. Evaluation of this approach yields above 90% detection accuracy. Finally, a layered framework for activity recognition is proposed that uses the techniques developed in this thesis to make model adaptation informed.
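As a rough illustration of the UDetect idea, the sketch below monitors a per-window distance statistic with statistical-process-control limits; the window size, the choice of mean distance to the class centroid as the statistic, and the three-sigma limits are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def reference_limits(reference_windows, centroid):
    """Mean/std of the per-window average distance to the class centroid."""
    stats = [np.mean(np.linalg.norm(w - centroid, axis=1)) for w in reference_windows]
    mu, sigma = np.mean(stats), np.std(stats)
    return mu - 3 * sigma, mu + 3 * sigma  # three-sigma control limits (assumed)

def change_detected(window, centroid, limits):
    """Flag a change when the new window's statistic leaves the control band."""
    stat = np.mean(np.linalg.norm(window - centroid, axis=1))
    lower, upper = limits
    return stat < lower or stat > upper

# Usage: windows of sensor feature vectors classified as one activity class
rng = np.random.default_rng(0)
centroid = np.zeros(3)
ref = [rng.normal(0, 1, (50, 3)) for _ in range(20)]   # reference windows
limits = reference_limits(ref, centroid)
drifted = rng.normal(1.5, 1, (50, 3))                  # user characteristics changed
print(change_detected(drifted, centroid, limits))      # True -> adapt the model
```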
|
2 |
Irrigator Responses to Changes in Water Availability in Idaho's Snake River Plain — Chance, Eric Wilson, 18 July 2017
Understanding irrigator responses to previous changes in water availability is critical to building effective institutions that allow for efficient and resilient management of water resources in the face of potentially increasing scarcity due to climate change. Using remote sensing data, I examined irrigator responses to seasonal changes in water availability in Idaho's Snake River Plain over the past 33 years. Google Earth Engine's high-performance cloud computing and big-data processing capabilities were used to compare the performance of three spectral indices, three compositing algorithms and two sensors, for 2002 and 2007, in distinguishing between irrigated and non-irrigated parcels. We demonstrate that, on average, the seasonal-maximum algorithm yields a 60% reduction in county-scale root mean square error (RMSE) over the accepted single-date approach. We use the best-performing classification method, a binary threshold on the seasonal maximum of the Normalized Difference Moisture Index (NDMI), to identify irrigated and non-irrigated lands in Idaho's Snake River Basin for 1984-2016 using Landsat 5-8 data. The NDMI of irrigated lands was found to generally increase over time, likely as a result of changes in agricultural practices that increase crop productivity. Furthermore, we find that irrigators with rights to small areas, and those with only surface water rights, are more likely to have a major reduction (>25%) in irrigated area; conversely, those with rights to large areas and with groundwater rights are more likely to have major increases (>25%) in the extent of their irrigation. / Master of Science
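A minimal sketch of the seasonal-maximum NDMI classification, using NumPy arrays in place of Google Earth Engine; the NDMI formula follows the standard Landsat definition (NIR and SWIR1 bands), while the 0.3 threshold is an illustrative assumption rather than the value calibrated in the study.

```python
import numpy as np

def ndmi(nir, swir1):
    """Normalized Difference Moisture Index for one scene."""
    return (nir - swir1) / (nir + swir1 + 1e-9)  # epsilon avoids divide-by-zero

def irrigated_mask(scenes, threshold=0.3):
    """Seasonal-maximum compositing: a pixel is 'irrigated' when the
    maximum NDMI across all scenes in the season exceeds the threshold.
    `scenes` is a list of (nir, swir1) band pairs; the threshold is assumed."""
    seasonal_max = np.max([ndmi(nir, swir1) for nir, swir1 in scenes], axis=0)
    return seasonal_max > threshold
```

Taking the seasonal maximum rather than a single date makes the composite robust to the timing of any one cloud-free scene, which is the advantage the abstract quantifies.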
|
3 |
A system of deception and fraud detection using reliable linguistic cues including hedging, disfluencies, and repeated phrases — Humpherys, Sean L., January 2010
Given the increasing problem of fraud, crime, and national security threats, assessing credibility is a recurring research topic in Information Systems and in other disciplines. Decision support systems can help, but the success of such a system depends on reliable cues that can distinguish deceptive from truthful behavior and on a proven classification algorithm. This investigation aims to identify linguistic cues that distinguish deceivers from truthtellers, and to demonstrate how those cues can successfully classify deception and truth.

Three new datasets were gathered: 202 fraudulent and nonfraudulent financial disclosures (10-Ks); a laboratory experiment that asked twelve questions of participants who answered deceptively to some questions and truthfully to others (Cultural Interviews); and a mock crime experiment in which some participants stole a ring from an office and all participants were interviewed as to their guilt or innocence (Mock Crime). Transcribed participant responses were investigated for distinguishing cues and used for classification testing.

Disfluencies (e.g., um, uh, repeated phrases), hedging words (e.g., perhaps, may), and interjections (e.g., okay, like) are theoretically developed as potential cues to deception. Past research provides conflicting evidence regarding disfluency use and deception. Some researchers opine that deception increases cognitive load, which lowers attentional resources, which increases speech errors and thereby disfluency use (Cognitive-Load Disfluency theory). Other researchers argue against the causal link between disfluencies and speech errors, positing that disfluencies are controllable and that deceivers strategically avoid them to avoid appearing hesitant or untruthful (Suppression-Disfluency theory). A series of t-tests, repeated-measures GLMs, and nested-model design regressions disconfirm the Suppression-Disfluency theory: um, uh, and interjections are used at an increased rate by deceivers in spontaneous speech. Reverse-order questioning did not increase disfluency use. Fraudulent 10-Ks have a higher mean count of hedging words.

Statistical classifiers and machine learning algorithms are demonstrated on the three datasets. Feature reduction by backward Wald stepwise selection with logistic regression had the highest classification accuracies (69%-87%). Accuracies are compared to those of professional interviewers and of previously researched classification models; in many cases the new models demonstrate improvements. 10-Ks are classified with 69% overall accuracy.
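A sketch of the cue-based classification pipeline, assuming toy word lists and scikit-learn's logistic regression; the thesis's actual cue lexicons and its backward Wald stepwise feature reduction are not reproduced here.

```python
import re
from sklearn.linear_model import LogisticRegression

# Illustrative cue lexicons -- not the thesis's validated lists
DISFLUENCIES = {"um", "uh", "er"}
HEDGES = {"perhaps", "may", "might", "possibly"}
INTERJECTIONS = {"okay", "like", "well"}

def cue_counts(transcript):
    """Count occurrences of each cue class in a transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return [sum(t in lex for t in tokens)
            for lex in (DISFLUENCIES, HEDGES, INTERJECTIONS)]

# Toy training data: 1 = deceptive, 0 = truthful
texts = ["um uh I may have perhaps seen it, like, okay",
         "I saw the ring on the desk at nine"]
X = [cue_counts(t) for t in texts]
y = [1, 0]
model = LogisticRegression().fit(X, y)
print(model.predict([cue_counts("uh perhaps I was, um, elsewhere")]))
```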
|
4 |
Mutual k Nearest Neighbor based Classifier — Gupta, Nidhi, January 2010
No description available.
|
5 |
Classificação semiautomática de imagens de satélites e suas implicações na modelação do escoamento superficial direto em bacias urbanas / Semi-automatic classification of satellite images and their implications in modeling direct runoff in urban watersheds — Angelini Sobrinha, Lôide, 15 July 2016
Hydrological modeling, when combined with remote sensing and geoprocessing resources, becomes an important tool: it can establish different land-cover and land-use scenarios and assess their implications for urban drainage, assisting urban planning. However, no work was found in the literature that relates the rainfall-runoff model to these techniques for the purpose of evaluating image classifiers from runoff hydrographs, and this is the main objective of this thesis. To that end, three satellite images with different spatial resolutions (0.5 m, 5 m and 15 m) and three classification algorithms (Maximum Likelihood, Support Vector Machines and Object-Oriented Analysis) were combined into "classifier-image" sets for land-cover and land-use classification. The class areas of each "classifier-image" set and the Curve Number values were the main inputs to the NRCS rainfall-runoff model, which generated the runoff hydrograph for each case. The simulated hydrographs were compared with the hydrographs observed in the basin, and their representativeness was assessed with the Nash-Sutcliffe coefficient. The land-use classifications were evaluated with the Kappa index, with values from 0.58 to 0.99, and with overall accuracy, with values from 0.64 to 0.99. For the flows, the Nash-Sutcliffe coefficient was considered satisfactory (NS < 0.50) in two simulations and very good (NS > 0.75) in the others. To support decision-making, a multi-criteria analysis of the classifier-image sets was carried out, ranking the sets by performance: 1) the SVM classifier with the Landsat-8 image; 2) the MaxVer classifier with the WorldView-II image; 3) the NN classifier with the RapidEye image.
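Both headline metrics follow standard definitions; a minimal sketch (not code from the thesis):

```python
import numpy as np

def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    observed, simulated = np.asarray(observed), np.asarray(simulated)
    return 1 - np.sum((observed - simulated) ** 2) / np.sum((observed - observed.mean()) ** 2)

def kappa(confusion):
    """Cohen's Kappa from a square confusion matrix (rows: reference, cols: classified)."""
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.sum()
    po = np.trace(confusion) / n                       # observed agreement
    pe = (confusion.sum(0) @ confusion.sum(1)) / n**2  # chance agreement
    return (po - pe) / (1 - pe)
```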
|
6 |
Modelo Predictivo para el diagnóstico de la Diabetes Mellitus Tipo 2 soportado por SAP Predictive Analytics / Predictive model for the diagnosis of type 2 diabetes mellitus supported by SAP Predictive Analytics — Ordóñez Barrios, Diego Alberto; Vizcarra Infantes, Erick Raphael, 31 July 2018
This project centres on the development of a predictive model, supported by the SAP Predictive Analytics tool, for forecasting the diagnosis of type 2 diabetes mellitus. Its purpose is to define a predictive model whose implementation optimizes the process of diagnosing type 2 diabetes mellitus, while also allowing the result to indicate actions that a health-coverage institution (public or private) could take for each patient, based on medical recommendations, for the patient's benefit.

To achieve this purpose, an investigation was carried out aligning the 10 global goals set by the World Health Organization (WHO) with the 4 groups of chronic diseases of greatest economic impact. Diabetes was identified as the chronic disease with the greatest impact for Peru, owing to its growing incidence in the country, caused mainly by serious deficiencies in the daily eating and exercise habits of the Peruvian population, and to the fact that the disease spreads widely in developing countries such as Peru, where it is not adequately mitigated due to lack of prevention, lack of awareness, or reasons as varied as economic constraints. A benchmarking of Predictive Analytics tools and their available capabilities was then performed to identify which of them best supports the proposed predictive model in the identified context.
|
7 |
Klasifikační metody analýzy vrstvy nervových vláken na sítnici / Classification Methods for Retinal Nerve Fibre Layer Analysis — Zapletal, Petr, January 2010
This thesis deals with classification of the retinal nerve fibre layer. Texture features from six texture-analysis methods are used for classification. Each method computes a feature vector from the input images, and this feature vector characterizes every cluster (class). Classification is performed by three supervised learning algorithms and one unsupervised learning algorithm. The first tested algorithm is Ho-Kashyap; the second is the Bayes classifier NDDF (Normal Density Discriminant Function); the third is the k-nearest-neighbour algorithm (k-NN); and the last tested classifier is the K-means algorithm, which belongs to clustering. For completeness, three methods for selecting training patterns for the supervised learning algorithms are implemented, based on Repeated Random Subsampling Cross-Validation, K-Fold Cross-Validation and Leave-One-Out Cross-Validation. All algorithms are compared quantitatively in terms of classification error.
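A minimal sketch of pairing the k-NN classifier with K-fold cross-validation to estimate classification error, as the abstract describes; the texture-feature matrix, labels, and parameter values are placeholders, not the thesis's data.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder texture features: rows = image patches, columns = texture descriptors
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))       # six texture-analysis methods -> six features (assumed)
y = rng.integers(0, 2, size=120)    # two classes of nerve-fibre appearance (assumed)

knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=1))
print("classification error: %.3f" % (1 - scores.mean()))
```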
|
8 |
Developing Machine Learning-based Recommender System on Movie Genres Using KNN — Ezeh, Anthony, January 2023
With an overwhelming number of movies available globally, finding ones that cater to individual preferences can be a daunting task. The vast selection often leaves people feeling overwhelmed, making it challenging to pick a suitable movie, so movie service providers need to offer a recommendation system that adds value for their customers. A movie recommendation system helps by providing a process that assists customers in finding movies that match their preferences. Previous studies of recommendation systems that use Machine Learning (ML) algorithms have demonstrated that these algorithms outperform some existing recommendation methods in terms of recommendation strategy. However, there is still room for improvement, especially in scenarios where users must spend considerable time finding movies in their preferred genres; this prolonged search can give rise to problems such as data sparsity and cold start. To address these issues, we propose a machine learning-based recommender system for movie genres using the K-Nearest Neighbours (KNN) algorithm. The final system uses a slider bar on a Streamlit web app, allowing users to select their preferred movies and see recommendations for similar movies. By incorporating user preferences, the system provides personalized recommendations that are more likely to meet the user's interests. To address our research question, "How and to what extent can a machine learning-based recommender system be developed focusing on movie genres where movie popularity can be predicted based on its content?", we pursue three main research objectives. First, we investigate employing a classification algorithm to recommend movies based on genres of interest. Second, we evaluate the performance of the classification algorithm with respect to movie viewers. Third, we represent the popularity of movie genres based on content and investigate how this representation can inform the recommendation algorithm. Following an experimental strategy, we extract and pre-process a dataset of movies and their associated genre labels from Kaggle; the dataset consists of two files derived from The Movie Database (TMDB) 5000 Movie Dataset. With this dataset we develop a recommender system based on the similarity of movie genres, varying the KNN neighbourhood with a slider bar so that recommendations range from similar to diverse in genre relative to the selected movie; this approach can suggest different titles to users with diverse preferences. We evaluate the performance of the KNN classification algorithm on a user's genres of interest, measuring accuracy, precision, recall, and F1-score. Accuracy ranges from low to moderate across different values of K, indicating moderate effectiveness in predicting user preferences, while precision ranges from moderate to high, implying that the recommendations provided are accurate. Recall improves with increasing K and reaches its maximum at K=15, demonstrating the algorithm's ability to retrieve relevant recommendations. Overall, the algorithm achieves a good balance between precision and recall, with an average F1-score of 0.60, meaning it can identify relevant movies and recommend them to users accurately.

Furthermore, our results show that the popularity visualization technique using KNN is a powerful tool for analysing and understanding the popularity of different movie genres, which can inform important decisions related to marketing, distribution, and production in the movie industry. In conclusion, our machine learning-based recommender system using KNN for movie genres is a game changer: it allows users to select their preferred movies and see recommendations for similar movies via a slider bar on a Streamlit web app. If confirmed by future research, the promising findings of this thesis can pave the way for developing and incorporating other classification algorithms and features for movie recommendation and evaluation; the adjustable slider ranges on the web app already let users customize their movie preferences and receive tailored recommendations.
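A minimal sketch of the genre-similarity step, assuming a toy catalogue of binary genre vectors and scikit-learn's NearestNeighbors; in the thesis the neighbour count would come from the Streamlit slider and the vectors from the TMDB 5000 dataset.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

movies = ["Alien", "Aliens", "Notting Hill", "Heat"]
# Columns: horror, sci-fi, romance, comedy, crime (toy encoding)
X = np.array([[1, 1, 0, 0, 0],   # Alien
              [1, 1, 0, 0, 0],   # Aliens
              [0, 0, 1, 1, 0],   # Notting Hill
              [0, 0, 0, 0, 1]])  # Heat

def recommend(title, k=2):
    """Return the k movies whose genre vectors are nearest to `title`'s
    (k would be set by the slider bar in the thesis's web app)."""
    model = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = model.kneighbors(X[[movies.index(title)]])
    return [movies[i] for i in idx[0] if movies[i] != title][:k]

print(recommend("Alien"))  # -> ['Aliens', ...]
```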
|
9 |
Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation. — Malazizi, Ladan, January 2008
Toxic compounds, such as pesticides, are routinely tested against a range of aquatic, avian and mammalian species as part of the registration process. The need to reduce dependence on animal testing has led to increasing interest in alternative methods such as in silico modelling. QSAR (Quantitative Structure-Activity Relationship)-based models are already in use for predicting physicochemical properties, environmental fate, eco-toxicological effects, and specific biological endpoints for a wide range of chemicals. Data plays an important role both in modelling QSARs and in analysing the results of toxicity testing processes.

This research addresses a number of issues in predictive toxicology. The first is data quality. Although a large amount of toxicity data is available from online sources, it may contain unreliable samples and may be of low quality; its presentation may also be inconsistent across sources, which makes accessing, interpreting and comparing the information difficult. To address this issue, we began with a detailed investigation and experimental work on the DEMETRA datasets, produced by the EC-funded project DEMETRA. Based on the investigation, the experiments and the results obtained, the author identified a number of data quality criteria to provide a solution for data evaluation in the toxicology domain, and an algorithm has been proposed to assess data quality before modelling.

The second issue is missing values in toxicology datasets. The Least Square Method for a paired dataset and Serial Correlation for a single-version dataset provided solutions to the problem in two different situations, and a procedural algorithm using these two methods has been proposed to overcome the problem of missing values.

The third issue is the modelling of multi-class datasets with a severely imbalanced distribution of class samples. Imbalanced data affects the performance of classifiers during the classification process. We have shown that, as long as we understand how class members are constructed in dimensional space in each cluster, we can reform the distribution and provide more domain knowledge for the classifier.
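A sketch of the paired-dataset imputation idea, assuming a simple linear least-squares fit on compounds where both endpoints are known; the toy endpoints are invented, and the thesis pairs this method with Serial Correlation for single-version datasets.

```python
import numpy as np

def impute_paired(x, y):
    """Fill missing y values (NaN) from x using a least-squares line
    fitted on the compounds where both endpoints are known."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    known = ~np.isnan(y)
    slope, intercept = np.polyfit(x[known], y[known], deg=1)  # least squares
    y[~known] = slope * x[~known] + intercept
    return y

# Toy paired toxicity endpoints for five compounds; one value missing
endpoint_a = [1.2, 2.0, 2.9, 4.1, 5.0]
endpoint_b = [1.0, 2.1, np.nan, 4.0, 5.2]
print(impute_paired(endpoint_a, endpoint_b))
```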
|