• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 4
  • 1
  • 1
  • 1
  • Tagged with
  • 9
  • 9
  • 4
  • 4
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Automatic Text Classification of Research Grant Applications / Automatisk textklassificering av forskningsbidragsansökningar

Lindqvist, Robin January 2024 (has links)
This study aims to construct a state-of-the-art classifier model and compare it against a largelanguage model. A variation of SVM called LinearSVC was utilised and the BERT model usingbert-base-uncased was used. The data, provided by the Swedish Research Council, consisted ofresearch grant applications. The research grant applications were divided into two groups, whichwere further divided into several subgroups. The subgroups represented research fields such ascomputer science and applied physics. Significant class imbalances were present, with someclasses having only a tenth of the applications of the largest class. To address these imbalances,a new dataset was created using data that had been randomly oversampled. The models weretrained and tested on their ability to correctly assign a subgroup to a research grant application.Results indicate that the BERT model outperformed the SVM model on the original dataset,but not on the balanced dataset . Furthermore, the BERT model’s performance decreased whentransitioning from the original to the balanced dataset, due to overfitting or randomness. / Denna studie har som mål att bygga en state-of-the-art klassificerar model och sedan jämföraden mot en stor språkmodel. SVM modellen var en variation av SVM vid namn LinearSVC ochför BERT användes bert-base-uncased. Data erhölls från Vetenskapsrådet och bestod av forskn-ingsbidragsansökningar. Forskningsbidragsansökningarna var uppdelade i två grupper, som varytterligare uppdelade i ett flertal undergrupper. Dessa undergrupper representerar forsknings-fält såsom datavetenskap och tillämpad fysik. I den data som användes i studien fanns storaskillnader mellan klasserna, där somliga klasser hade en tiondel av ansökningarna som de storaklasserna hade. I syfte att lösa dessa klassbalanseringsproblem skapades en datamängd somundergått slumpmässig översampling. Modellerna tränades och testades på deras förmåga attkorrekt klassificera en forskningsbidragsansökan in i rätt undergrupp. Studiens fynd visade attBERT modellen presterade bättre än SVM modellen på både den ursprungliga datamängden,dock inte på den balanserade datamängden. Tilläggas kan, BERTs prestanda sjönk vid övergångfrån den ursprungliga datamängden till den balanserade datamängden, något som antingen berorpå överanpassning eller slump.
2

Machine Learning Classification of Facial Affect Recognition Deficits after Traumatic Brain Injury for Informing Rehabilitation Needs and Progress

Syeda Iffat Naz (9746081) 07 January 2021 (has links)
A common impairment after a traumatic brain injury (TBI) is a deficit in emotional recognition, such as inferences of others’ intentions. Some researchers have found these impairments in 39\% of the TBI population. Our research information needed to make inferences about emotions and mental states comes from visually presented, nonverbal cues (e.g., facial expressions or gestures). Theory of mind (ToM) deficits after TBI are partially explained by impaired visual attention and the processing of these important cues. This research found that patients with deficits in visual processing differ from healthy controls (HCs). Furthermore, we found visual processing problems can be determined by looking at the eye tracking data developed from industry standard eye tracking hardware and software. We predicted that the eye tracking data of the overall population is correlated to the TASIT test. The visual processing of impaired (who got at least one answer wrong from TASIT questions) and unimpaired (who got all answer correctly from TASIT questions) differs significantly. We have divided the eye-tracking data into 3 second time blocks of time series data to detect the most salient individual blocks to the TASIT score. Our preliminary results suggest that we can predict the whole population's impairment using eye-tracking data with an improved f1 score from 0.54 to 0.73. For this, we developed optimized support vector machine (SVM) and random forest (RF) classifier.
3

Machine Learning Classification of Facial Affect Recognition Deficits after Traumatic Brain Injury for Informing Rehabilitation Needs and Progress

Iffat Naz, Syeda 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / A common impairment after a traumatic brain injury (TBI) is a deficit in emotional recognition, such as inferences of others’ intentions. Some researchers have found these impairments in 39\% of the TBI population. Our research information needed to make inferences about emotions and mental states comes from visually presented, nonverbal cues (e.g., facial expressions or gestures). Theory of mind (ToM) deficits after TBI are partially explained by impaired visual attention and the processing of these important cues. This research found that patients with deficits in visual processing differ from healthy controls (HCs). Furthermore, we found visual processing problems can be determined by looking at the eye tracking data developed from industry standard eye tracking hardware and software. We predicted that the eye tracking data of the overall population is correlated to the TASIT test. The visual processing of impaired (who got at least one answer wrong from TASIT questions) and unimpaired (who got all answer correctly from TASIT questions) differs significantly. We have divided the eye-tracking data into 3 second time blocks of time series data to detect the most salient individual blocks to the TASIT score. Our preliminary results suggest that we can predict the whole population's impairment using eye-tracking data with an improved f1 score from 0.54 to 0.73. For this, we developed optimized support vector machine (SVM) and random forest (RF) classifier.
4

Evaluation of Supervised Machine LearningAlgorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data

Wahab, Nor-Ul January 2018 (has links)
A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter or soot from the exhaust gas of a diesel engine. Frequently replacing DPF is a waste of resource and waiting for full utilization is risky and very costly, so, what is the optimal time/milage to change DPF? Answering this question is very difficult without knowing when the DPF is changed in a vehicle. We are finding the answer with supervised machine learning algorithms for detecting anomalies in vehicles off-board sensor data (operational data of vehicles). Filter change is considered an anomaly because it is rare as compared to normal data. Non-sequential machine learning algorithms for anomaly detection like oneclass support vector machine (OC-SVM), k-nearest neighbor (K-NN), and random forest (RF) are applied for the first time on DPF dataset. The dataset is unbalanced, and accuracy is found misleading as a performance measure for the algorithms. Precision, recall, and F1-score are found good measure for the performance of the machine learning algorithms when the data is unbalanced. RF gave highest F1-score of 0.55 than K-NN (0.52) and OCSVM (0.51). It means that RF perform better than K-NN and OC-SVM but after further investigation it is concluded that the results are not satisfactory. However, a sequential approach should have been tried which could yield better result.
5

Effekten av textaugmenteringsstrategier på träffsäkerhet, F1-värde och viktat F1-värde / The effect of text data augmentation strategies on Accuracy, F1-score, and weighted F1-score

Svedberg, Jonatan, Shmas, George January 2021 (has links)
Att utveckla en sofistikerad chatbotlösning kräver stora mängder textdata för att kunna anpassalösningen till en specifik domän. Att manuellt skapa en komplett uppsättning textdata, specialanpassat för den givna domänen och innehållandes ett stort antal varierande meningar som en människa kan tänkas yttra, är ett enormt tidskrävande arbete. För att kringgå detta tillämpas dataaugmentering för att generera mer data utifrån en mindre uppsättning redan existerande textdata. Softronic AB vill undersöka alternativa strategier för dataaugmentering med målet att eventuellt ersätta den nuvarande lösningen med en mer vetenskapligt underbyggd sådan. I detta examensarbete har prototypmodeller utvecklats för att jämföra och utvärdera effekten av olika textaugmenteringsstrategier. Resultatet av genomförda experiment med prototypmodellerna visar att augmentering genom synonymutbyten med en domänanpassad synonymordlista, presenterade märkbart förbättrade effekter på förmågan hos en NLU-modell att korrekt klassificera data, gentemot övriga utvärderade strategier. Vidare indikerar resultatet att ett samband föreligger mellan den strukturella variationsgraden av det augmenterade datat och de tillämpade språkparens semantiska likhetsgrad under tillbakaöversättningar. / Developing a sophisticated chatbot solution requires large amounts of text data to be able to adapt the solution to a specific domain. Manually creating a complete set of text data, specially adapted for the given domain, and containing a large number of varying sentences that a human conceivably can express, is an exceptionally time-consuming task. To circumvent this, data augmentation is applied to generate more data based on a smaller set of already existing text data. Softronic AB wants to investigate alternative strategies for data augmentation with the aim of possibly replacing the current solution with a more scientifically substantiated one. In this thesis, prototype models have been developed to compare and evaluate the effect of different text augmentation strategies. The results of conducted experiments with the prototype models show that augmentation through synonym swaps with a domain-adapted thesaurus, presented noticeably improved effects on the ability of an NLU-model to correctly classify data, compared to other evaluated strategies. Furthermore, the result indicates that there is a relationship between the structural degree of variation of the augmented data and the applied language pair's semantic degree of similarity during back-translations.
6

Intelligent ECG Acquisition and Processing System for Improved Sudden Cardiac Arrest (SCA) Prediction

Kota, Venkata Deepa 12 1900 (has links)
The survival rate for a suddent cardiac arrest (SCA) is incredibly low, with less than one in ten surviving; most SCAs occur outside of a hospital setting. There is a need to develop an effective and efficient system that can sense, communicate and remediate potential SCA situations on a near real-time basis. This research presents a novel Zeolite-PDMS-based optically unobtrusive flexible dry electrodes for biosignal acquisition from various subjects while at rest and in motion. Two zeolite crystals (4A and 13X) are used to fabricate the electrodes. Three different sizes and two different filler concentrations are compared to identify the better performing electrode suited for electrocardiogram (ECG) data acquisition. A low-power, low-noise amplifier with chopper modulation is designed and implemented using the standard 180nm CMOS process. A commercial off-the-shelf (COTS) based wireless system is designed for transmitting ECG signals. Further, this dissertation provides a framework for Machine Learning Classification algorithms on large, open-source Arrhythmia and SCA datasets. Supervised models with features as the input data and deep learning models with raw ECG as input are compared using different methods. The machine learning tool classifies the datasets within a few minutes, saving time and effort for the physicians. The experimental results show promising progress towards advancing the development of a wireless ECG recording system combined with efficient machine learning models that can positively impact SCA outcomes.
7

An Efficient Classification Model for Analyzing Skewed Data to Detect Frauds in the Financial Sector / Un modèle de classification efficace pour l'analyse des données déséquilibrées pour détecter les fraudes dans le secteur financier

Makki, Sara 16 December 2019 (has links)
Différents types de risques existent dans le domaine financier, tels que le financement du terrorisme, le blanchiment d’argent, la fraude de cartes de crédit, la fraude d’assurance, les risques de crédit, etc. Tout type de fraude peut entraîner des conséquences catastrophiques pour des entités telles que les banques ou les compagnies d’assurances. Ces risques financiers sont généralement détectés à l'aide des algorithmes de classification. Dans les problèmes de classification, la distribution asymétrique des classes, également connue sous le nom de déséquilibre de classe (class imbalance), est un défi très commun pour la détection des fraudes. Des approches spéciales d'exploration de données sont utilisées avec les algorithmes de classification traditionnels pour résoudre ce problème. Le problème de classes déséquilibrées se produit lorsque l'une des classes dans les données a beaucoup plus d'observations que l’autre classe. Ce problème est plus vulnérable lorsque l'on considère dans le contexte des données massives (Big Data). Les données qui sont utilisées pour construire les modèles contiennent une très petite partie de groupe minoritaire qu’on considère positifs par rapport à la classe majoritaire connue sous le nom de négatifs. Dans la plupart des cas, il est plus délicat et crucial de classer correctement le groupe minoritaire plutôt que l'autre groupe, comme la détection de la fraude, le diagnostic d’une maladie, etc. Dans ces exemples, la fraude et la maladie sont les groupes minoritaires et il est plus délicat de détecter un cas de fraude en raison de ses conséquences dangereuses qu'une situation normale. Ces proportions de classes dans les données rendent très difficile à l'algorithme d'apprentissage automatique d'apprendre les caractéristiques et les modèles du groupe minoritaire. Ces algorithmes seront biaisés vers le groupe majoritaire en raison de leurs nombreux exemples dans l'ensemble de données et apprendront à les classer beaucoup plus rapidement que l'autre groupe. Dans ce travail, nous avons développé deux approches : Une première approche ou classifieur unique basée sur les k plus proches voisins et utilise le cosinus comme mesure de similarité (Cost Sensitive Cosine Similarity K-Nearest Neighbors : CoSKNN) et une deuxième approche ou approche hybride qui combine plusieurs classifieurs uniques et fondu sur l'algorithme k-modes (K-modes Imbalanced Classification Hybrid Approach : K-MICHA). Dans l'algorithme CoSKNN, notre objectif était de résoudre le problème du déséquilibre en utilisant la mesure de cosinus et en introduisant un score sensible au coût pour la classification basée sur l'algorithme de KNN. Nous avons mené une expérience de validation comparative au cours de laquelle nous avons prouvé l'efficacité de CoSKNN en termes de taux de classification correcte et de détection des fraudes. D’autre part, K-MICHA a pour objectif de regrouper des points de données similaires en termes des résultats de classifieurs. Ensuite, calculez les probabilités de fraude dans les groupes obtenus afin de les utiliser pour détecter les fraudes de nouvelles observations. Cette approche peut être utilisée pour détecter tout type de fraude financière, lorsque des données étiquetées sont disponibles. La méthode K-MICHA est appliquée dans 3 cas : données concernant la fraude par carte de crédit, paiement mobile et assurance automobile. Dans les trois études de cas, nous comparons K-MICHA au stacking en utilisant le vote, le vote pondéré, la régression logistique et l’algorithme CART. Nous avons également comparé avec Adaboost et la forêt aléatoire. Nous prouvons l'efficacité de K-MICHA sur la base de ces expériences. Nous avons également appliqué K-MICHA dans un cadre Big Data en utilisant H2O et R. Nous avons pu traiter et analyser des ensembles de données plus volumineux en très peu de temps / There are different types of risks in financial domain such as, terrorist financing, money laundering, credit card fraudulence and insurance fraudulence that may result in catastrophic consequences for entities such as banks or insurance companies. These financial risks are usually detected using classification algorithms. In classification problems, the skewed distribution of classes also known as class imbalance, is a very common challenge in financial fraud detection, where special data mining approaches are used along with the traditional classification algorithms to tackle this issue. Imbalance class problem occurs when one of the classes have more instances than another class. This problem is more vulnerable when we consider big data context. The datasets that are used to build and train the models contain an extremely small portion of minority group also known as positives in comparison to the majority class known as negatives. In most of the cases, it’s more delicate and crucial to correctly classify the minority group rather than the other group, like fraud detection, disease diagnosis, etc. In these examples, the fraud and the disease are the minority groups and it’s more delicate to detect a fraud record because of its dangerous consequences, than a normal one. These class data proportions make it very difficult to the machine learning classifier to learn the characteristics and patterns of the minority group. These classifiers will be biased towards the majority group because of their many examples in the dataset and will learn to classify them much faster than the other group. After conducting a thorough study to investigate the challenges faced in the class imbalance cases, we found that we still can’t reach an acceptable sensitivity (i.e. good classification of minority group) without a significant decrease of accuracy. This leads to another challenge which is the choice of performance measures used to evaluate models. In these cases, this choice is not straightforward, the accuracy or sensitivity alone are misleading. We use other measures like precision-recall curve or F1 - score to evaluate this trade-off between accuracy and sensitivity. Our objective is to build an imbalanced classification model that considers the extreme class imbalance and the false alarms, in a big data framework. We developed two approaches: A Cost-Sensitive Cosine Similarity K-Nearest Neighbor (CoSKNN) as a single classifier, and a K-modes Imbalance Classification Hybrid Approach (K-MICHA) as an ensemble learning methodology. In CoSKNN, our aim was to tackle the imbalance problem by using cosine similarity as a distance metric and by introducing a cost sensitive score for the classification using the KNN algorithm. We conducted a comparative validation experiment where we prove the effectiveness of CoSKNN in terms of accuracy and fraud detection. On the other hand, the aim of K-MICHA is to cluster similar data points in terms of the classifiers outputs. Then, calculating the fraud probabilities in the obtained clusters in order to use them for detecting frauds of new transactions. This approach can be used to the detection of any type of financial fraud, where labelled data are available. At the end, we applied K-MICHA to a credit card, mobile payment and auto insurance fraud data sets. In all three case studies, we compare K-MICHA with stacking using voting, weighted voting, logistic regression and CART. We also compared with Adaboost and random forest. We prove the efficiency of K-MICHA based on these experiments
8

Klasifikace emailové komunikace / Classification of eMail Communication

Piják, Marek January 2018 (has links)
This diploma's thesis is based around creating a classifier, which will be able to recognize an email communication received by Topefekt.s.r.o on daily basis and assigning it into classification class. This project will implement some of the most commonly used classification methods including machine learning. Thesis will also include evaluation comparing all used methods.
9

3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING

Ashley S Dale (8771429) 07 January 2021 (has links)
<div> <div> <div> <p>An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real world data, F1 scores improved in four of the five classes: The average maximum F1-score of all classes and all epochs for the networks trained with synthetic data is F1∗ = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ∗ <sub>F1 </sub>= 0.015, compared to σF 1 = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the net- work was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic datatset was generated with a Variational Autoencoder then analyzed using Principle Component Analysis and Uniform Manifold Projection and Approximation (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background. </p></div></div></div>

Page generated in 0.023 seconds