Global ETD Search

411	Interpretation of Dimensionality Reduction with Supervised Proxies of User-defined Labels Leoni, Cristian January 2021 Research on machine learning (ML) explainability has received a lot of attention in recent times. That interest, however, has mostly focused on supervised models, while other ML fields have not received the same level of attention. Despite its usefulness in a variety of fields, unsupervised learning explainability is still an open issue. In this paper, we present a Visual Analytics framework based on eXplainable AI (XAI) methods to support the interpretation of dimensionality reduction (DR) methods. The framework provides the user with an interactive and iterative process to investigate and explain user-perceived patterns for a variety of DR methods by using XAI methods to explain a supervised method trained on the selected data. To evaluate the effectiveness of the proposed solution, we focus on two main aspects: the quality of the visualization and the quality of the explanation. This challenge is tackled using both quantitative and qualitative methods, and due to the lack of pre-existing test data, a new benchmark has been created. The quality of the visualization is established using a well-known survey-based methodology, while the quality of the explanation is evaluated using both case studies and a controlled experiment, in which the accuracy of the generated explanations is evaluated on the proposed benchmark. The results show a strong capacity of our framework to generate accurate explanations, with an accuracy of 89% in the controlled experiment. The explanations generated for the two case studies yielded very similar results when compared with pre-existing, well-known literature on ground truths. Finally, the user experiment produced high overall quality scores for all assessed aspects of the visualization.
Dimensionality Reduction Clustering Supervised Learning Explainable Artificial Intelligence XAI Dimensionality Reduction Explanation Other Engineering and Technologies not elsewhere specified
412	A Comparative Study of Reinforcement-based and Semi-classical Learning in Sensor Fusion Bodén, Johan January 2021 Reinforcement learning has proven itself very useful in certain areas, such as games. However, the approach has been seen as quite limited. Reinforcement-based learning has, for instance, not been commonly used for classification tasks, because it only receives feedback on how well it did for an action performed on a specific input. This slows the convergence rate compared to other classification approaches, which have the input and the corresponding output to train on. Nevertheless, this thesis aims to investigate whether reinforcement-based learning could successfully be employed on a classification task. Moreover, as sensor fusion is an expanding field which can, for instance, assist autonomous vehicles in understanding their surroundings, it is also interesting to see how sensor fusion, i.e., fusion between lidar and RGB images, could increase the performance in a classification task. In this thesis, a reinforcement-based learning approach is compared to a semi-classical approach. A deep Q-learning network was chosen as an example of a reinforcement learning model, and a support vector machine classifier built on top of a deep neural network was chosen as an example of a semi-classical model. In this work, these frameworks are compared with and without sensor fusion to see whether fusion improves their performance. Experiments show that the evaluated reinforcement-based learning approach underperforms the semi-classical approach in terms of metrics, mainly due to its slow learning process. However, using reinforcement-based learning to carry out a classification task could still be advantageous in some cases, as it still performs fairly well in terms of the metrics presented in this work, e.g. F1-score, or, for instance, on imbalanced datasets.
As for the impact of sensor fusion, a notable improvement can be seen: for example, when training the deep Q-learning model for 50 episodes, the F1-score increased by 0.1329. This is especially notable given that most of the lidar data used in the fusion is lost, since this work projects the 3D lidar data onto the same 2D plane as the RGB images. machine learning reinforcement learning deep Q-learning network classical learning supervised learning support vector machine deep neural network sensor fusion Computer and Information Sciences
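The projection of 3D lidar data onto the 2D image plane mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple pinhole camera model, lidar points already expressed in the camera frame, and hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`); the abstract does not say which projection the thesis actually uses.

```python
def project_points(points, fx, fy, cx, cy, width, height):
    """Project 3D points (x, y, z) in the camera frame onto the image plane.

    Assumed pinhole model: u = fx * x / z + cx, v = fy * y / z + cy.
    Points behind the camera or outside the image bounds are dropped.
    """
    pixels = []
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera, cannot be imaged
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0 <= u < width and 0 <= v < height:
            # Keep the depth z alongside the pixel so it can be fused with RGB.
            pixels.append((int(u), int(v), z))
    return pixels


# Hypothetical points and intrinsics, for illustration only.
points = [(0.0, 0.0, 10.0), (1.0, 0.5, 5.0), (0.0, 0.0, -2.0), (100.0, 0.0, 1.0)]
pixels = project_points(points, fx=500.0, fy=500.0, cx=320.0, cy=240.0,
                        width=640, height=480)
```

Every point that falls outside the camera's field of view is discarded, which is one way such a projection loses a large share of the lidar data.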
413	Proyecto “edukids” [Project “EduKids”] Chapoñan Damian, Pamela Katherine, Lozano Romero, María Delia, Olivares Li, Vanesa Isabel, Ricra Chinchayan, Edson Ronaldo, Quintana Quichca, Raquel Rosario 03 December 2021 Given the current situation in which we find ourselves, universities, institutes and schools have been forced to teach classes online. However, many of them do not have the capacity needed to teach online classes efficiently, which is to some extent detrimental to students.
As a result of this problem, the objective of our business idea is to offer an application that helps primary-level students reinforce the topics learned in virtual classes. The application, called “EduKids”, aims to enhance children's knowledge and skills through educational exercises presented according to their grade and level. As added value, it includes an entertainment game, implemented to motivate the child's learning and encourage them to keep unlocking levels. The financial indicators for this project suggest that EduKids will be profitable over time: the Net Present Value (NPV) is S/. 218,848.90, which indicates that the project is viable and will generate a high return. In addition, the Internal Rate of Return (IRR) of the investment is 113%, so it was approved by the shareholders as viable over time. Research work Virtual education School reinforcement Supervised learning
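For readers unfamiliar with the two financial indicators cited in this abstract, the following is a minimal sketch of how NPV and IRR are usually computed. The cash flows and the 10% discount rate are hypothetical; the project's actual figures are not given in the abstract.

```python
def npv(rate, cash_flows):
    """Net present value: cash_flows[t] is received at the end of period t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tol=1e-9):
    """Internal rate of return: the rate at which NPV is zero.

    Found by bisection; assumes NPV changes sign exactly once on [low, high],
    which holds for an initial outlay followed by positive inflows.
    """
    f_low = npv(low, cash_flows)
    f_high = npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while high - low > tol:
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid  # root lies in [low, mid]
        else:
            low, f_low = mid, f_mid  # root lies in [mid, high]
    return (low + high) / 2.0


# Hypothetical example: invest 1000 now, receive 600 per year for three years.
flows = [-1000.0, 600.0, 600.0, 600.0]
project_npv = npv(0.10, flows)
project_irr = irr(flows)
```

A positive NPV at the chosen discount rate and an IRR above the required return are the usual criteria behind statements such as "the project is viable".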
414	Classifying human activities through machine learning Lannge, Jakob, Majed, Ali January 2018 Classifying activities of daily life (ADL) can be used in systems that monitor people's activities for different purposes, for example in emergency systems. Machine learning can classify ADL with high accuracy, using data from wearable sensors as input. In this paper, a proof-of-concept system consisting of three different machine learning algorithms is evaluated and compared across three different datasets: one publicly available (Ugulino, et al., 2012), and two collected in this paper using an Android device. The algorithms are Multiclass Decision Forest, Multiclass Decision Jungle and Multiclass Neural Network. The two sensors used are an accelerometer and a gyroscope.
The result shows how a system can be implemented using Azure Machine Learning Studio, and how three different algorithms perform when classifying three different datasets. One algorithm achieves a higher accuracy than the machine learning model initially used with the Ugulino dataset. machine learning activity of daily life ADL supervised learning multiclass decision forest multiclass decision jungle multiclass neural network cross validation Azure Android Java gyroscope accelerometer Engineering and Technology
415	Maskininlärning: avvikelseklassificering på sekventiell sensordata. En jämförelse och utvärdering av algoritmer för att klassificera avvikelser i en miljövänlig IoT produkt med sekventiell sensordata [Machine learning: anomaly classification on sequential sensor data. A comparison and evaluation of algorithms for classifying anomalies in an environmentally friendly IoT product with sequential sensor data] Heidfors, Filip, Moltedo, Elias January 2019 A company has developed an environmentally friendly IoT device with sequential sensor data and wants to use machine learning to classify anomalies in its data. Over the years, several well-functioning classification algorithms have been developed. However, no single algorithm is optimal for every problem. The purpose of this work was therefore to investigate, compare and evaluate different classifiers within supervised machine learning to find out which classifier gives the best accuracy when classifying anomalies in the kind of IoT device that the company has developed. Through a literature review, we first identified which classifiers are commonly used and have worked well in related work for similar purposes and applications. We decided to further compare and evaluate Random Forest, Naïve Bayes and Support Vector Machines. We created a dataset of 513 examples that we used for training and evaluation of each classifier. The result showed that Random Forest had superior accuracy with 95.7%, compared to Naïve Bayes (81.5%) and Support Vector Machines (78.6%). The conclusion of this work is that Random Forest, with 95.7%, gives a high enough accuracy for the company to make good use of the machine learning model. The result also indicates that Random Forest, for this thesis's specific classification problem, is the best classifier within supervised machine learning, but that it may be possible to get even higher accuracy with other techniques such as unsupervised or semi-supervised machine learning. Machine learning Supervised learning Classifying algorithms Classifiers Random Forest Naïve bayes Support vector machine Sensor data Sequential data Engineering and Technology
416	Gaze based weakly supervised localization for image classification : application to visual recognition in a food dataset / Apprentissage faiblement supervisé basé sur le regard : application à la reconnaissance visuelle dans un ensemble de données sur l'alimentation Wang, Xin 29 September 2017 In this dissertation, we discuss how to use human gaze data to improve the performance of weakly supervised learning models in image classification. The topic is set against the background of rapidly growing information technology, in which the data to analyze is also growing dramatically. Since the amount of data that can be annotated by humans cannot keep up with the amount of data itself, current well-developed supervised learning approaches may face bottlenecks in the future. In this context, the use of weak annotations for high-performance learning methods is worthy of study. Specifically, we approach the problem from two aspects. One is to propose a more time-saving annotation, human eye-tracking gaze, as an alternative to traditional time-consuming annotations such as bounding boxes. The other is to integrate gaze annotation into a weakly supervised learning scheme for image classification. This scheme benefits from the gaze annotation for inferring the regions containing the target object. A useful property of our model is that it exploits gaze only for training, while the test phase is gaze-free. This property further reduces the demand for annotations. The two aspects are connected in our models, which achieve competitive experimental results. Weakly supervised learning Human gaze Multimodal dataset Deep learning Image classification Object localization 006.42
417	Aplicación de Data Science en la pequeña empresa, caso: Pollería Mister Pollo [Application of Data Science in a small business, case: Pollería Mister Pollo] Baldeón Maraví, Brian, Fukushima Castillo, Hugo Kenji, Ochante Quispe, Milagros Karina, Quevedo Trujillo, Haedly Victoria, Tejada Alarcón, Ernesto Rosendo 14 July 2021 The purpose of this work is to apply the knowledge and techniques taught during the three Data Science courses.
Specifically, it aims to identify and use the variables found in the business to determine a model that supports longer retention of personnel at the company Mister Pollo. In this context, the research will be supported by IBM's data science methodology, which begins with the business understanding phase to identify the organization's problem, analyzing its strengths and weaknesses, followed by the phases of data collection and preparation, analysis, interpretation, modeling and evaluation. The research approach is mixed: in the initial phase it is descriptive, which allows the importance of the variables used to be understood. In the second phase, the approach becomes predictive through the use of a supervised learning technique, in this case a decision tree model, to build a tool for evaluating longer worker retention at the restaurant. This will allow the General Manager of the company to develop an action plan to control and minimize staff turnover, considering different scenarios, profiles and needs of the company. Finally, at the conclusion of this project, the findings of the selected model will be evaluated to verify that they respond to the objectives set by the Manager of Mister Pollo in coordination with the work team. Research work Data science Supervised learning Decision tree
418	Deep learning methods for reverberant and noisy speech enhancement Zhao, Yan 15 September 2020 No description available. Computer Science Engineering Deep neural networks Supervised learning Attention Speech enhancement Speech denoising Speech dereverberation Time-frequency masking Speech intelligibility Speech quality Computational auditory scene analysis
419	Multi-site Organ Detection in CT Images using Deep Learning / Regionsoberoende organdetektion i CT-bilder med djupinlärning Jacobzon, Gustaf January 2020 When optimizing a controlled dose in radiotherapy, high-resolution spatial information about healthy organs in close proximity to the malignant cells is necessary in order to mitigate dispersion into these organs-at-risk. This information can be provided by deep volumetric segmentation networks, such as 3D U-Net. However, due to memory limitations in modern graphics processing units, it is not feasible to train a volumetric segmentation network on full image volumes, and subsampling the volume gives too coarse a segmentation. An alternative is to sample a region of interest from the image volume and train an organ-specific network. This approach requires knowledge of which region in the image volume should be sampled, which can be provided by a 3D object detection network. Typically, the detection network is also region-specific, albeit for a larger region such as the thorax, and requires human assistance in choosing the appropriate network for a certain region of the body. Instead, we propose a multi-site object detection network based on YOLOv3, trained on 43 different organs, which can operate on arbitrarily chosen axial patches of the body. Our model identifies the organs present (whole or truncated) in the image volume and can automatically sample a region from the input and feed it to the appropriate volumetric segmentation network. We train our model on four small (as few as 20 images) site-specific datasets in a weakly supervised manner in order to handle the partially unlabeled nature of site-specific datasets. Our model is able to generate organ-specific regions of interest that enclose 92% of the organs present in the test set.
Organ Detection Organs-at-risk 3D Object Detection Segmentation Deep Learning Machine Learning Weakly-supervised Learning YOLOv3 3D U-Net Electrical Engineering and Electronics
420	NONLINEAR DIFFUSIONS ON GRAPHS FOR CLUSTERING, SEMI-SUPERVISED LEARNING AND ANALYZING PREDICTIONS Meng Liu (14075697) 09 November 2022 Graph diffusion is the process of spreading information from one or a few nodes to the rest of the graph through edges. The resulting distribution of the information often implies latent structure of the graph, where more densely connected nodes can receive more signal. This makes graph diffusions a powerful tool for local clustering, which is the problem of finding a cluster or community of nodes around a given set of seeds. Most existing literature on graph diffusions for local graph clustering uses linear diffusions, as their dynamics can be fully interpreted through linear systems. They are also referred to as eigenvector, spectral, or random walk based methods. While efficient, they often have difficulty capturing the correct boundary of a target label or target cluster. In contrast, maxflow-mincut based methods, which can be thought of as 1-norm nonlinear variants of the linear diffusions, seek to "improve" or "refine" a given cluster and can often capture the boundary correctly. However, there is little literature adopting them for problems such as community detection, local graph clustering, and semi-supervised learning, due to the complexity of their formulation. We addressed these issues by performing extensive numerical experiments to demonstrate the performance of flow-based methods on graphs from various sources. We also developed an efficient LocalGraphClustering Python package that allows others to easily use these methods in their own problems. While studying these flow-based methods, we found that they cannot grow from a small seed set. Although there are hybrid procedures that incorporate ideas from both linear diffusions and flow-based methods, they have many hard-to-set parameters. To tackle these issues, we propose a simple generalization of the objective function behind linear diffusion and flow-based methods, which we call the generalized local graph min-cut problem. We further show that by involving the p-norm in this cut problem, we can develop a nonlinear diffusion procedure that can find local clusters from a small seed set and capture the correct boundary simultaneously. Our method can be thought of as a nonlinear generalization of the Andersen-Chung-Lang push procedure for efficiently approximating a personalized PageRank vector, and it is a strongly local algorithm, i.e., one whose runtime depends on the size of the output rather than the size of the graph. We also show that the p-norm cut functions improve on the standard Cheeger inequalities for linear diffusion methods. We further extend our generalized local graph min-cut problem and the corresponding diffusion solver to hypergraph-based machine learning problems. Although many methods for local graph clustering exist, there are relatively few for localized clustering in hypergraphs. Moreover, those that exist often lack the flexibility to model a general class of hypergraph cut functions or cannot scale to large problems. Our new hypergraph diffusion method, on the other hand, enables us to compute with a wide variety of cardinality-based hypergraph cut functions and still maintains the strongly local property. We also show that the clusters found by solving the new objective function satisfy a Cheeger-like quality guarantee.
Besides clustering, recent work on graph-based learning often focuses on node embeddings and graph neural networks (GNNs). Although these GNN-based methods can beat traditional ones, especially when node attribute data is available, they are challenging to understand because they are highly over-parameterized. To address this issue, we propose a novel framework that combines topological data analysis and diffusion to transform the complex prediction space into human-understandable pictures. The method can be applied to other datasets not in graph format, scales up to large datasets across different domains, and enables us to find many useful insights about the data and the model. Graph, social and multimedia data Topology topological data analysis clustering pagerank semi-supervised learning visualization neural networks diffusions graph social network hypergraph
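As background for the abstract above, here is a minimal sketch of the standard linear Andersen-Chung-Lang push procedure for approximating a personalized PageRank vector. It is not the thesis's nonlinear p-norm method and does not use the LocalGraphClustering package; the graph, `alpha` and `eps` are hypothetical choices for illustration, and the graph is assumed to have no isolated nodes.

```python
from collections import deque


def approximate_ppr(graph, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank by repeated local push steps.

    graph: dict mapping each node to a list of neighbours (undirected).
    Returns (p, r): the approximation p and the residual r. On exit every
    node v satisfies r[v] < eps * degree(v).
    """
    p = {}
    r = {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg = len(graph[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg:
            continue  # stale queue entry, residual already small enough
        # Push: keep an alpha fraction, leave half of the rest on u
        # (lazy walk), and spread the other half evenly over neighbours.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1.0 - alpha) * ru / 2.0
        share = (1.0 - alpha) * ru / (2.0 * deg)
        for v in graph[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
        if r[u] >= eps * deg:
            queue.append(u)
    return p, r


# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p, r = approximate_ppr(graph, seed=0)
# Rank nodes by degree-normalised score, as a sweep cut would.
ranked = sorted(graph, key=lambda v: p.get(v, 0.0) / len(graph[v]), reverse=True)
```

Only nodes whose residual exceeds the threshold are ever touched, which is what makes the procedure "strongly local": the work depends on the mass being spread, not on the size of the whole graph.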
