  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Анализ средств для интерпретирования моделей машинного обучения при анализе табличных данных : магистерская диссертация / Analysis of tools for interpreting machine learning models when analyzing tabular data

Бабий, И. Н., Babiy, I. N. January 2023
The purpose of the work is to analyze tools for interpreting machine learning models and their practical application to interpreting model results in the analysis of tabular data. The object of study is tools for interpreting machine learning models. Research methods: theoretical analysis of the literature on the research topic; study of the documentation of machine learning libraries; classification of the methods under study; an experimental method comprising exploratory data analysis, training of machine learning models, and application of interpretation; and generalization and comparison of the data obtained. Results of the work: a review and practical guide to interpreting machine learning results for tabular data has been prepared. The final qualifying work was prepared in the Microsoft Word text editor and submitted in hard copy.
12

INVESTIGATING DATA ACQUISITION TO IMPROVE FAIRNESS OF MACHINE LEARNING MODELS

Ekta (18406989) 23 April 2024
Machine learning (ML) algorithms are increasingly being used in a variety of applications and are heavily relied upon to make decisions that impact people’s lives. ML models are often praised for their precision, yet they can discriminate against certain groups due to biased data. These biases, rooted in historical inequities, pose significant challenges in developing fair and unbiased models. Central to addressing this issue is the mitigation of biases inherent in the training data, as their presence can yield unfair and unjust outcomes when models are deployed in real-world scenarios. This study investigates the efficacy of data acquisition, i.e., one of the stages of data preparation, akin to the pre-processing bias mitigation technique. Through experimental evaluation, we showcase the effectiveness of data acquisition, where the data is acquired using data valuation techniques to enhance the fairness of machine learning models.
13

Estimation and misspecification Risks in VaR estimation / Estimation and misspecification risks in VaR evaluation

Telmoudi, Fedya 19 December 2014
In this thesis, we study the estimation of the conditional Value at Risk (VaR), taking into account estimation risk and model risk. First, we consider a two-step method for VaR estimation. The first step estimates the volatility parameter using a generalized quasi-maximum likelihood estimator (gQMLE) based on an instrumental density h. The second step estimates a quantile of the innovations from the empirical quantile of the residuals obtained in the first step. We give conditions under which the two-step estimator of the VaR is consistent and asymptotically normal. We also compare the efficiencies of the estimators obtained for various choices of the instrumental density h. When the innovation does not have density h, the first step usually gives a biased estimator of the volatility parameter, and the second step also gives a biased estimator of the quantile of the innovations. However, we show that the two errors counterbalance each other to give a consistent estimate of the VaR. We then focus on VaR estimation within the framework of GARCH models, using the gQMLE based on the class of double generalized gamma instrumental densities, which contains the Gaussian distribution. Our goal is to compare the performance of the Gaussian QMLE against that of the gQMLE. The choice of the optimal estimator depends essentially on the value of the parameter d that minimizes the asymptotic variance. We test whether this parameter is equal to 2. When the test is applied to real series of financial returns, the hypothesis that the Gaussian QMLE is optimal is generally rejected. Finally, we consider non-parametric machine learning methods for VaR estimation. These methods aim to avoid model risk because they do not rely on a specific form of the volatility. We use support vector machines for regression (SVR) based on the least squares (LS) loss function. To improve the solution of the LS-SVR model, we use the weighted LS-SVR and the fixed-size LS-SVR models. Numerical illustrations highlight the contribution of the proposed models to VaR estimation, taking into account specification and estimation risks.
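The two-step procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's estimator: step 1 here uses an EWMA volatility filter with a fixed, assumed decay `lam=0.94` as a stand-in for a GARCH model fitted by gQMLE, and the function names and simulated returns are hypothetical. Step 2 follows the abstract: take the empirical quantile of the standardized residuals and scale it by the volatility forecast.

```python
import math
import random


def ewma_volatility(returns, lam=0.94):
    """Step 1 stand-in: EWMA volatility filter (an assumption, NOT gQMLE).

    Returns a list of len(returns) + 1 volatilities. sigmas[t] is the
    volatility used for returns[t]; sigmas[-1] is the one-step-ahead forecast.
    """
    mean = sum(returns) / len(returns)
    # Initialise the recursion at the sample variance.
    variance = sum((r - mean) ** 2 for r in returns) / len(returns)
    sigmas = [math.sqrt(variance)]
    for r in returns:
        variance = lam * variance + (1.0 - lam) * r * r
        sigmas.append(math.sqrt(variance))
    return sigmas


def empirical_quantile(values, alpha):
    """Empirical alpha-quantile: the ceil(n * alpha)-th order statistic."""
    ordered = sorted(values)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]


def two_step_var(returns, alpha=0.05, lam=0.94):
    """Two-step VaR: volatility forecast times the residual quantile."""
    sigmas = ewma_volatility(returns, lam)
    # Step 2: standardise returns by their volatility (zip stops at the
    # last observed return), then take the empirical quantile.
    residuals = [r / s for r, s in zip(returns, sigmas)]
    q_alpha = empirical_quantile(residuals, alpha)
    # VaR is reported as a positive loss level.
    return -sigmas[-1] * q_alpha


# Hypothetical usage on simulated (not real) returns.
rng = random.Random(0)
simulated = [rng.gauss(0.0, 0.01) for _ in range(2000)]
var_5pct = two_step_var(simulated, alpha=0.05)
var_1pct = two_step_var(simulated, alpha=0.01)
```

Because the quantile is taken from the residuals rather than from an assumed distribution, the sketch needs no distributional assumption on the innovations in step 2, which is the feature the abstract emphasizes.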
14

Diskriminerande utfall från maskininlärningsmodeller : En kvalitativ studie av identifierade faktorer och lösningar för diskriminerande utfall

Wedin, Ebba, Eriksson, Johan January 2020
In a world where artificial intelligence and machine learning are growing and spreading in society, their impact on and consequences for people are increasing. The technology is used in services that people use every day, both privately and in commercial contexts, for example in social media and to identify fraud in the banking sector. Previous studies show that machine learning models can give discriminatory outcomes with respect to, among other things, gender and ethnicity. This study aims to investigate how, in system development projects where machine learning is used, one works to counteract discriminatory outcomes. The study examines both the factors that contribute to the emergence of discriminatory outcomes and the solutions that exist to counteract the problem. The study was conducted at a global IT consulting company. To investigate the area, a study with a qualitative research methodology was conducted. The empirical material was collected through six semi-structured interviews. All respondents who participated in the study work within the same organization, in different system development projects and with varying experience in the area. The respondents were selected through subjective selection based on their experience in the field in relation to the purpose of the study. The results of the study show that the most decisive factor for the emergence of discrimination is the training data with which the models are trained. A number of solutions to counteract discriminatory outcomes have also been identified. The results of the study differ to some extent from previous research in the field. Regarding factors, previous research and the results of the study agree that data is the decisive factor that contributes to discriminatory outcomes arising from machine learning models. The main difference among the solutions is that previous research points to more specific techniques and tools, which are used to identify or mitigate discriminatory outcomes, while the results of the study show softer values and almost no specific techniques at all. In the results of the study, for example, the individual is seen as a central part of the process instead of automatic techniques and tools. The results also reveal mixed opinions about responsibility for machine learning models and about the need for regulation in the area. The study concludes that data is the most decisive factor in discriminatory outcomes in machine learning models. The models are not discriminatory in themselves; they only reflect the training data. If the data contains discrimination, the model will learn this and ultimately give discriminatory outcomes. The underlying problem is the human being, who has created the prejudices that exist in society, which is where the training data is collected from. At the same time, the results show that humans are today a central part of the process of both counteracting and identifying discriminatory outcomes from machine learning models.
15

Sensor-based jump detection and classification with machine learning in trampoline gymnastics

Woltmann, Lucas, Hartmann, Claudio, Lehner, Wolfgang, Rausch, Paul, Ferger, Katja 22 April 2024
The task of the judge of difficulty in trampoline gymnastics is to check the elements and difficulty values entered on the competition cards and the difficulty of each element according to a numeric system. To do this, the judge must count all somersaults and twists for each jump during a routine and thus record the difficulty of the routine. This assessment can be automated with the help of inertial measurement units (IMUs) and facilitate the judges’ task during the competition. Currently, there is no known reliable method for the automated detection and recognition of the various elements to determine the difficulty of an exercise in trampoline gymnastics. Accordingly, a total of 2076 jumps and 50 different jump types were recorded over the course of several training sessions. In the first instance, 10 different jump types were used to train different machine learning (ML) models. Eight ML models were used for the automatic jump classification. Supervised learning approaches include a naive classifier, deep feedforward neural network, convolutional neural network, k‑nearest neighbors, Gaussian naive Bayes, support-vector classification, gradient boosting classifier, and stochastic gradient descent. When all classifiers were compared for accuracy, i.e., how many jumps were correctly detected by the ML model, the deep feedforward neural network and the convolutional neural network provided the best matches with 96.4 and 96.1%, respectively. The findings of this study will help to develop the automated classification of sensor-based data to support the judge and, simultaneously, for automated training logging.
16

INFLUENCE OF SAMPLE DENSITY, MODEL SELECTION, DEPTH, SPATIAL RESOLUTION, AND LAND USE ON PREDICTION ACCURACY OF SOIL PROPERTIES IN INDIANA, USA

Samira Safaee (17549649) 09 December 2023
Digital soil mapping (DSM) combines field and laboratory data with environmental factors to predict soil properties. The accuracy of these predictions depends on factors such as model selection, data quality and quantity, and landscape characteristics. In our study, we investigated the impact of sample density and the use of various environmental covariates (ECs) including slope, topographic position index, topographic wetness index, multiresolution valley bottom flatness, and multiresolution ridge top flatness, as well as the spatial resolution of these ECs, on the predictive accuracy of four predictive models: Cubist (CB), Random Forest (RF), Regression Kriging (RK), and Ordinary Kriging (OK). Our analysis was conducted at three sites in Indiana: the Purdue Agronomy Center for Research and Education (ACRE), Davis Purdue Agriculture Center (DPAC), and Southeast Purdue Agricultural Center (SEPAC). Each site had its own soil sampling design, management practices, and topographic conditions. The primary focus of this study was to predict the spatial distribution of soil properties, including soil organic matter (SOM), cation exchange capacity (CEC), and clay content, at different depths (0-10 cm, 0-15 cm, and 10-30 cm) by utilizing five environmental covariates and four spatial resolutions for the ECs (1-1.5 m, 5 m, 10 m, and 30 m).
Various evaluation metrics, including R², root mean square error (RMSE), mean square error (MSE), concordance coefficient (ρc), and bias, were used to assess prediction accuracy. Notably, the accuracy of predictions was found to be significantly influenced by the site, sample density, model type, soil property, and their interactions. Site was the largest source of variation, followed by sampling density and model type, for the predicted spatial distribution of SOM, CEC, and clay across the landscape.
The study revealed that the RF model consistently outperformed the other models, while OK performed poorly across all sites and properties because it relies only on interpolating between the points without incorporating the landscape characteristics (ECs) in the algorithm. Increasing sample density improved predictions up to a certain threshold (e.g., 66 samples at ACRE for both SOM and CEC; 58 samples for SOM and 68 samples for CEC at SEPAC), beyond which the improvements were marginal. Additionally, the study highlighted the importance of spatial resolution, with finer resolutions resulting in better prediction accuracy, especially for SOM and clay content. Overall, comparing data from the two depths (0-10 cm vs 10-30 cm) for soil property predictions, deeper soil layer data (10-30 cm) provided more accurate predictions for SOM and clay, while shallower depth data (0-10 cm) provided more accurate predictions for CEC. Finally, the finer EC resolutions of 1-1.5 m and 5 m contributed to more accurate soil property predictions than the coarser 10 m and 30 m resolutions.
In summary, this research underscores the significance of informed decisions regarding sample density, model selection, and spatial resolution in digital soil mapping. It emphasizes that the choice of predictive model is critical, with RF consistently delivering superior performance. These findings have important implications for land management and sustainable land use practices, particularly in heterogeneous landscapes and areas with varying management intensities.
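The evaluation metrics named in this abstract can be computed as in the following minimal Python sketch. The abstract does not spell out its definitions, so the sketch assumes that R² is the coefficient of determination (one minus residual over total sum of squares), that bias is the mean of predicted minus observed, and that the concordance coefficient is Lin's concordance correlation coefficient. The function name `prediction_metrics` and the sample values are illustrative and are not taken from the study.

```python
import math


def prediction_metrics(observed, predicted):
    """Return R^2, MSE, RMSE, bias and Lin's concordance coefficient."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares

    # Population (divide-by-n) variances and covariance, as in Lin's formula.
    var_obs = ss_tot / n
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted)) / n

    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "bias": sum(errors) / n,
        "ccc": 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2),
    }


# Made-up values: every prediction is one unit too high.
observed_example = [1.0, 2.0, 3.0, 4.0]
predicted_example = [2.0, 3.0, 4.0, 5.0]
example_metrics = prediction_metrics(observed_example, predicted_example)
```

In this example the predictions are perfectly correlated with the observations, yet the constant offset lowers both R² and the concordance coefficient, which is why bias is worth reporting alongside them.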
17

Exploring the Correlation Between Reading Ability and Mathematical Ability : KTH Master thesis report

Sol, Richard, Rasch, Alexander January 2023
Reading and mathematics are two essential subjects for academic success and cognitive development. Several studies show a correlation between the reading ability and mathematical ability of pupils (Korpershoek et al., 2015; Ní Ríordáin & O’Donoghue, 2009; Reikerås, 2006; Walker et al., 2008). The didactical part of this thesis presents a study investigating the correlation between reading ability and mathematical ability among pupils in upper secondary schools in Sweden. The study collaborated with Lexplore AB to use machine learning and eye-tracking to measure reading ability. Mathematical ability was measured with Mathematics 1c grades and Stockholmsprovet, a diagnostic mathematics test. Although no correlation was found, the result yields several insights about selection and measures that may improve future studies on the subject. This thesis finds that the result could have been affected by a biased selection of participants. It also suggests that the measure obtained through machine learning and eye-tracking may not fully capture the concept of reading ability as defined in previous studies. The technological part of this thesis focuses on modifying and improving the model used to calculate users’ reading ability scores. As the model’s estimate tends to plateau after the fifth year of compulsory school, the study aims to maintain the same level of progression observed before this point. Previous research indicates that silent reading, being unconstrained by vocalization, is faster than reading aloud. To address this flattening of progression, a grid search algorithm was employed to adjust hyperparameters and assign appropriate weights to silent and aloud reading. The findings emphasize that reading aloud should be prioritized in the weighted average and that the corresponding hyperparameters should be adjusted accordingly. Furthermore, gathering more data for older pupils can improve the machine learning model by accounting for individual reading strategies. Introducing different word complexity factors can also enhance the model’s performance.
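The grid search over the weighting of aloud and silent reading can be sketched in Python as follows. The abstract does not state the objective function or the model's actual hyperparameters, so this sketch assumes a single weight `w_aloud` in [0, 1] and a mean squared error against hypothetical target scores; all function names and numbers are illustrative and are not taken from the thesis.

```python
def weighted_score(aloud, silent, w_aloud):
    """Weighted average of an aloud-reading and a silent-reading score."""
    return w_aloud * aloud + (1.0 - w_aloud) * silent


def grid_search_weight(aloud_scores, silent_scores, targets, steps=10):
    """Try w_aloud = 0, 1/steps, ..., 1 and keep the lowest-loss weight.

    The loss is an assumed mean squared error between the weighted score
    and a target score for each pupil.
    """
    best_weight, best_loss = None, float("inf")
    for i in range(steps + 1):
        w = i / steps
        loss = sum(
            (weighted_score(a, s, w) - t) ** 2
            for a, s, t in zip(aloud_scores, silent_scores, targets)
        ) / len(targets)
        if loss < best_loss:
            best_weight, best_loss = w, loss
    return best_weight, best_loss


# Made-up scores; the targets are constructed with a weight of 0.7 on
# aloud reading, so the search should recover that value.
aloud_example = [10.0, 20.0, 30.0, 40.0]
silent_example = [14.0, 22.0, 38.0, 44.0]
target_example = [weighted_score(a, s, 0.7)
                  for a, s in zip(aloud_example, silent_example)]
found_weight, found_loss = grid_search_weight(
    aloud_example, silent_example, target_example
)
```

An exhaustive grid is practical here because there is only one weight to tune; with several interacting hyperparameters the number of grid points grows multiplicatively.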
