Global ETD Search

351	Bedömning av elevuppsatser genom maskininlärning / Essay Scoring for Swedish using Machine Learning
Dyremark, Johanna; Mayer, Caroline. January 2019.
Today, essay scoring makes up a large part of a teacher's workload, and there is considerable variability between different teachers' grading. This report examines what accuracy can be achieved with an automated essay scoring system for Swedish. Three machine learning models for classification are trained and tested with 5-fold cross-validation on essays from Swedish national tests: Linear Discriminant Analysis, K-Nearest Neighbour and Random Forest. Essays are classified based on 31 language- and structure-related attributes such as token-based length measures, similarity to texts of different formality levels, and use of grammar. The results show a maximum quadratic weighted kappa value of 0.4829 and grades identical to the expert's assessment in 57.53% of cases. These results were achieved by a model based on Linear Discriminant Analysis, which showed higher inter-rater reliability with expert grading than a local teacher. Despite ongoing digitalization within the Swedish educational system, a number of obstacles prevent complete automation of essay scoring, such as users' attitudes, ethical issues and the current techniques' difficulties in understanding semantics. Nevertheless, a partial integration of automatic essay scoring has the potential to effectively identify essays suitable for double grading, which can increase the consistency of large-scale tests at low cost.
Keywords: Automated essay scoring; machine learning; classification; linear discriminant analysis; k-nearest neighbour; random forest; language technology; natural language processing; Computer and Information Sciences
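
The quadratic weighted kappa reported in entry 351 measures agreement between two sets of ordinal grades, penalizing each disagreement by the squared distance between the grades. A minimal Python sketch of the standard definition is given here. The function name, the integer grade encoding 0..n_classes-1 and the sample grades are assumptions for illustration and are not taken from the thesis.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1.

    Undefined (division by zero) when both raters use one single grade.
    """
    n = len(rater_a)
    # observed[i][j]: number of essays graded i by rater A and j by rater B
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            # Agreement expected by chance from the two grade histograms
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades on a four-step scale (0 = lowest, 3 = highest)
expert_grades = [0, 1, 2, 3, 2, 1]
model_grades = [0, 1, 2, 2, 2, 0]
kappa = quadratic_weighted_kappa(expert_grades, model_grades, 4)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.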
352	Automatic Feature Extraction for Human Activity Recognition on the Edge
Cleve, Oscar; Gustafsson, Sara. January 2019.
This thesis evaluates two methods for automatic feature extraction to classify accelerometer data from periodic and sporadic human activities. The first method selects features using individual hypothesis tests, combined in this study with a correlation filter; the second uses a random forest classifier as an embedded feature selector. Both methods used the same initial pool of automatically generated time series features, and a decision tree classifier performed the human activity recognition task for both. The possibility of running the developed model on a processor with limited computing power was taken into consideration when selecting methods for evaluation. The classification results showed that the random forest method was good at prioritizing among features: with 23 features selected, it had a macro average F1 score of 0.84 and a weighted average F1 score of 0.93. The hypothesis test method, however, only had a macro average F1 score of 0.40 and a weighted average F1 score of 0.63 when using the same number of features. In addition to the classification performance, this thesis studies the potential business benefits that automation of feature extraction can bring.
Keywords: Human Activity Recognition; Automatic Feature Extraction; Automatic Feature Selection; Automated Machine Learning; Random Forest Classifier; Hypothesis Test; Computer and Information Sciences
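
Entry 352 reports both macro and weighted average F1 scores. The sketch below shows how the two averages are conventionally computed and why they can diverge when some activities are much rarer than others. The helper names, activity labels and data are invented for illustration; the thesis data is not reproduced here.

```python
def f1_per_class(y_true, y_pred):
    """Return ({label: F1}, {label: support}) over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores, support = {}, {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
        support[label] = sum(1 for t in y_true if t == label)
    return scores, support


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores, _ = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by how often each class truly occurs."""
    scores, support = f1_per_class(y_true, y_pred)
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total


# Hypothetical imbalanced data: a frequent periodic and a rare sporadic activity
y_true = ["walk"] * 8 + ["jump"] * 2
y_pred = ["walk"] * 8 + ["walk", "jump"]
```

Because the rare class is classified worse in this toy data, the macro average falls below the weighted average, which is the same direction of gap as in the reported figures.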
353	Machine Learning for Activity Recognition of Dumpers
Axelsson, Henrik; Wass, Daniel. January 2019.
The construction industry has lagged behind other industries in productivity growth. Earth-moving sites, and other operations where dumpers are used, are no exception. Such projects lack convenient and accurate solutions for utilization mapping and tracking of mass flows, both of which currently rely mainly on manual activity tracking. This study intends to provide insights into how autonomous systems for activity tracking of dumpers can contribute to productivity at earth-moving sites. Autonomous systems available on the market cannot be implemented on dumper fleets of various manufacturers and model years. This study therefore examines the possibility of using activity recognition by machine learning for a system based on smartphones mounted in the driver's cabin. Three machine learning algorithms (naive Bayes, random forest and feed-forward backpropagation neural network) are trained and tested on data collected by smartphone sensors. The conclusions are that machine learning models, particularly the neural network and random forest algorithms, trained on data from a standard smartphone, are able to estimate a dumper's activities with a high degree of certainty. Finally, a market analysis is presented, which rates the innovation opportunity for a potential end product as high.
Keywords: Civil engineering; earth-moving; dumper; machine learning; naive bayes; random forest; neural networks; smartphone sensors; accelerometer; gyroscope; Computer and Information Sciences
354	Encoding Temporal Healthcare Data for Machine Learning
Laczik, Tamás. January 2021.
This thesis contains a review of previous work in the fields of encoding sequential healthcare data and predicting graft-versus-host disease, a medical condition, based on patient history using machine learning. A new encoding of such data is proposed for machine learning purposes. The proposed encoding, called bag of binned weighted events, is a combination of two strategies proposed in previous work, called bag of binned events and bag of weighted events. An empirical experiment is designed to compare the predictive performance of the proposed encoding over various binning windows with that of the previous encodings, based on the area under the receiver operating characteristic curve (AUC) metric. The experiment is carried out on real-world healthcare data obtained from Swedish registries, using the random forest and logistic regression algorithms. After filtering the data, solving quality issues and tuning the hyperparameters of the models, final results are obtained. These results indicate that the proposed encoding strategy performs on par with, or slightly better than, the bag of weighted events, and outperforms the bag of binned events in most cases. However, the differences in metrics are small. It is also observed that the proposed encoding usually performs better with longer binning windows, which may be attributed to data noise. Future work is proposed in the form of repeating the experiment with different datasets and models, as well as changing the binning window length of the baseline algorithms.
Keywords: Machine Learning; Temporal Data; Disease Prediction; Feature Engineering; Random Forest; Logistic Regression; Computer and Information Sciences
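
The AUC metric used in entry 354 can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The function name and the toy scores are assumptions for illustration; for large datasets a rank-based implementation would be preferable to this quadratic-time loop.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    y_true holds 1 for positive cases and 0 for negative cases.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks for four patients
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, risks)
```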
355	Algorithmic Approaches to Output Prediction in a Virtual Power Plant / Algoritmiska Tillvägagångssätt för Effektprognoser i ett Virtuellt Kraftverk
Rosing, Johannes; Ekhed, Oscar. January 2023.
Virtual Power Plants (VPPs) are an emerging form of technology that allows owners of electricity-producing appliances, such as electric vehicles, to take part in a pool of producers of sustainable energy. The Swedish electricity grid owner Svenska Kraftnät hosts a platform where VPPs act as intermediaries between energy-producing customers and third-party buyers. A requirement for participating in these transactions, however, is to post a bid specifying the amount of power that a VPP can produce during a given hour at least 48 hours into the future. This is where forecasting comes into the picture. This report compares the accuracy of eight different machine learning models when tasked with forecasting power output using the same training data from an electric vehicle-based VPP. The study also examines which inferences about customer behavior can be drawn from the same data and gives strategic recommendations to VPPs based on its findings. Upon evaluating the results, it was found that deep learning models outperformed autoregressive models, which in turn outperformed Random Forest Regression and Support Vector Regression. As for customer behaviors found in the data, a small negative correlation between spot prices and delivered output was found, suggesting that customers limit their charging when spot prices are high. Further, more power is generally produced during nighttime and on weekends. The data also shows an autocorrelation with a lag of 24 hours, suggesting that charging behaviors on a given day influence charging behaviors the subsequent day.
Keywords: Autoregressive Models; Deep Learning; Machine Learning; Power Output; Random Forest Regression; Strategic Recommendations; Support Vector Regression; Virtual Power Plant; Computer and Information Sciences
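
The 24-hour autocorrelation mentioned in entry 355 can be checked on any hourly series with the usual sample autocorrelation estimator. The sketch below is illustrative only: the function name is an assumption, and the synthetic sine-wave series stands in for real charging data, which is not available here.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (0 < lag < len)."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Synthetic hourly output with a daily cycle, two weeks long (assumed data)
hourly_output = [math.sin(2 * math.pi * hour / 24) for hour in range(24 * 14)]

daily_lag = autocorrelation(hourly_output, 24)      # same hour, next day
half_day_lag = autocorrelation(hourly_output, 12)   # opposite phase of the cycle
```

A strongly positive value at lag 24 combined with a negative value at lag 12 is the signature of a daily pattern.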
356	Swedish Stock and Index Price Prediction Using Machine Learning
Wik, Henrik. January 2023.
Machine learning is a continually growing area of computer science, with applications in areas such as finance, biology, and computer vision. Some common applications are stock price prediction, data analysis of DNA expressions, and optical character recognition. This thesis uses machine learning techniques to predict prices for different stocks and indices on the Swedish stock market. These techniques are then compared to see which performs best and why. To accomplish this, we used some of the most popular models with sets of historical stock and index data. Our best-performing models are linear regression and neural networks, because they are the best at handling the big spikes in price action that occur in certain cases. However, all models are affected by overfitting, indicating that feature selection and hyperparameter optimization could be improved.
Keywords: Stock Price Prediction; Machine Learning; Time Series Analysis; Linear Regression; K-Nearest Neighbors; Random Forest; Support Vector Machines; Neural Networks; Probability Theory and Statistics
357	An Investigation and Comparison of Machine Learning Methods for Selecting Stressed Value-at-Risk Scenarios
Tennberg, Moa. January 2023.
Stressed Value-at-Risk (VaR) is a statistic used to measure an entity's exposure to market risk by evaluating possible extreme portfolio losses. Stressed VaR scenarios can be used as a metric to describe the state of the financial market and can be used to detect and counter procyclicality by allowing central clearing counterparties (CCPs) to increase margin requirements. This thesis aims to implement and evaluate machine learning methods (e.g., neural networks) for selecting stressed VaR scenarios in price return stock datasets where one liquidity day is assumed. The models are implemented to counter the procyclical effects present in NASDAQ's dual lambda method such that the selection maximises the total margin metric. Three machine learning models are implemented together with a labelling algorithm: a supervised multilayer perceptron, an unsupervised multilayer perceptron and a random forest model. The labelling algorithm employs a deviation metric to differentiate between stressed VaR and standard scenarios. The models are trained and tested using 5000 scenarios of price return values from historical stock datasets. The models are evaluated using visual results, the confusion matrix, Cohen's kappa statistic, the adjusted Rand index and the total margin metric. The total margin metric is computed using normalised profit and loss values from artificially generated portfolios. The implemented machine learning models and the labelling algorithm manage to counter the procyclical effects evident in the dual lambda method and select stressed VaR scenarios such that the selection maximises the total margin metric. The random forest model shows the most promise in classifying stressed VaR scenarios, since it manages to maximise the total margin overall.
Keywords: Value-at-Risk; Total margin; Procyclicality; Machine learning; Binary classification; Supervised learning; Unsupervised learning; Random forest; Multilayer perceptron; Computer Sciences
358	Maskininlärning med konform förutsägelse för prediktiva underhållsuppgifter i industri 4.0 / Machine Learning with Conformal Prediction for Predictive Maintenance tasks in Industry 4.0 : Data-driven Approach
Liu, Shuzhou; Mulahuko, Mpova. January 2023.
This thesis is a cooperation with Knowit, Östrand & Hansen, and Orkla. It aimed to explore the application of Machine Learning and Deep Learning models with Conformal Prediction to a predictive maintenance situation at Orkla. Predictive maintenance is essential in numerous industrial manufacturing scenarios. It can help to reduce machine downtime, improve equipment reliability, and save unnecessary costs. In this thesis, various Machine Learning and Deep Learning models, including Decision Tree, Random Forest, Support Vector Regression, Gradient Boosting, and Long Short-Term Memory (LSTM), are applied to a real-world predictive maintenance dataset. The Orkla dataset was originally planned to be used in this thesis project. However, due to the challenges encountered and time limitations, a NASA C-MAPSS dataset with a similar data structure was chosen to study how Machine Learning models could be applied to predict the remaining useful lifetime (RUL) in manufacturing. In addition, conformal prediction, a recently developed framework for measuring the prediction uncertainty of Machine Learning models, is integrated into the models for more reliable RUL prediction. The results show that both the Machine Learning and Deep Learning models with conformal prediction could predict RUL close to the true RUL, with LSTM outperforming the Machine Learning models. The conformal prediction intervals also provide informative and reliable information about the uncertainty of the predictions, which can help alert factory personnel in advance so that they can take the necessary maintenance actions. Overall, this thesis demonstrates the effectiveness of using Machine Learning and Deep Learning models with Conformal Prediction in predictive maintenance situations. Moreover, based on the modeling results for the NASA dataset, some insights are discussed on how to transfer these experiences to Orkla data for RUL prediction in the future.
Keywords: Machine Learning; Deep Learning; Uncertainty estimation; Conformal prediction; Predictive maintenance; RUL; Probabilistic predictions; Decision Tree; Random Forest; Support Vector Regression; Gradient Boosting; LSTM; Computer Sciences
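
Entry 358 does not state which conformal prediction variant was used. As one common possibility, the sketch below shows split conformal prediction for regression: absolute residuals on a held-out calibration set determine a half-width that is added around every new point prediction. The function names, the calibration numbers and the choice of absolute residuals as the nonconformity score are all assumptions made for illustration.

```python
import math


def conformal_half_width(calib_true, calib_pred, alpha=0.1):
    """Half-width q for intervals [pred - q, pred + q] with target coverage 1 - alpha.

    Uses the split conformal quantile of absolute calibration residuals and
    assumes calibration and future data are exchangeable.
    """
    residuals = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the quantile
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return residuals[rank - 1]


def predict_interval(point_prediction, half_width):
    """Symmetric prediction interval around a point prediction."""
    return point_prediction - half_width, point_prediction + half_width


# Hypothetical calibration set: true vs. predicted remaining useful lifetime (cycles)
true_rul = [1, 2, 3, 4, 5, 6, 7, 8, 9]
predicted_rul = [0, 0, 0, 0, 0, 0, 0, 0, 0]
q = conformal_half_width(true_rul, predicted_rul, alpha=0.25)
low, high = predict_interval(50.0, q)
```

The width of the resulting interval does not depend on the underlying model, which is why the same wrapper can be placed around a random forest, a gradient boosting model or an LSTM.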
359	Predicting Chronic Kidney Disease using a multimodal Machine Learning approach
Mishra, Aakruti; Puthiyandi, Navaneeth. January 2023.
Chronic Kidney Disease (CKD) is a common and dangerous health condition that requires early detection for treatment to be effective. Current diagnostic methods are time-consuming and expensive. In this research, we aim to construct a predictive model for early detection of CKD using a combination of time series and static variables. We investigate the influence of a multimodal approach by combining the predictions from multiple models that use different modalities. The ROCKET method is used for classification with time series features, whilst the Random Forest approach is employed for static data. XGBoost is used to gain information about feature importance among labs and demographics-comorbidities data. We use the MIMIC-III database, adopting various strategies to handle data and class imbalance, such as stratification, balancing techniques, and backward and forward fill for missing value imputation. The evaluation metrics for the CKD and non-CKD class labels include precision, recall, F1, and accuracy. Our findings show that aggregating time series data produces contrasting results for labs compared to vitals data. We also addressed the significance of the different demographic, comorbidity and lab event features. The findings indicate that a multimodal approach did not show significant advantages over individual models when the individual models performed suboptimally. The study also found that ethnicity is more significant than age and gender in predicting CKD. Furthermore, the study revealed some significant features from lab events and comorbidities. The study also provides some recommendations for future work to explore the potential of a multimodal approach further.
Keywords: Chronic kidney disease; Multimodal approach; ROCKET; Random Forest; XGBoost; MIMIC-III database; Data imbalance; Temporal and static modalities; Soft voting; Computer Sciences
360	Detecting Lumbar Muscle Fatigue Using Nanocomposite Strain Gauges
Billmire, Darci Ann. 26 June 2023.
Introduction: Muscle fatigue can contribute to acute flare-ups of lower back pain with associated consequences such as pain, disability, lost work time, increased healthcare utilization, and increased opioid use and potential abuse. The SPINE Sense system is a wearable device with 16 high deflection nanocomposite strain gauge sensors on kinesiology tape, which is adhered to the skin of the lower back. This device is used to correlate lumbar skin strains with the motion of the lumbar vertebrae and to phenotype lumbar spine motion. In this work it was hypothesized that the SPINE Sense device can be used to detect differences in biomechanical movements consequent to muscle fatigue. A human subject study was completed with 30 subjects who performed 14 functional movements before and after fatiguing their back muscles through the Biering-Sørensen endurance test, with the SPINE Sense device on their lower back collecting skin strain data. Various features were extracted from the strain gauge sensor data and used as inputs to a random forest classification machine learning model. The accuracy of the model was assessed under two training/validation conditions, namely a hold-out method and a leave-one-out method. The random forest classification models were able to achieve accuracies of up to 84.22% and 78.37% for the hold-out and leave-one-out methods respectively. Additionally, a system usability study was performed by presenting the device to 32 potential users of the device (clinicians and individuals with lower back pain). They received a scripted explanation of the use of the device and were then instructed to score it with the validated System Usability Score. In addition, they were given the opportunity to voice concerns, ask questions, and offer any other feedback about the design and use of the device. The average System Usability Score across all participants in the usability study was 72.03, with suggestions to improve the robustness of the electrical connections and to reduce the profile of the accompanying electronics. Feedback from the potential users was used to make more robust electrical connections and smaller wires and electronics modules. These improvements were achieved with a two-piece design: one piece contains the sensors on kinesiology tape that is directly attached to the patient, and the other contains the wires sewn into stretch fabric to create stretchable electronic connections to the device. It is concluded that a machine learning model of the data from the SPINE Sense device can classify lumbar motion with sufficient accuracy for clinical utility. It is also concluded that the device is usable and intuitive to use.
Keywords: muscle fatigue; low back pain; high deflection strain gauges; nanocomposite sensors; system usability; biomechanics sensors; machine learning; random forest classification and cross-validation; Engineering
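
The usability figure of 72.03 in entry 360 is easier to interpret with the scoring rule in mind. Assuming the instrument is the standard ten-item System Usability Scale with responses from 1 to 5 (the abstract does not spell this out), one participant's score on the 0 to 100 scale is computed as sketched below. The function name and the sample responses are hypothetical.

```python
def sus_score(responses):
    """Score one ten-item System Usability Scale questionnaire (result 0..100).

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    """
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1
        else:
            total += 5 - response
    return total * 2.5


# Hypothetical participant who answers neutrally on every item
neutral = sus_score([3] * 10)
# Hypothetical participant with the most favourable answer on every item
best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
```

A study-level figure such as the one reported would then be the mean of the individual participants' scores.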
