Global ETD Search

61	Diferenční analýza multilingválního řečového korpusu pacientů s neurodegenerativními onemocněními / Differential analysis of multilingual corpus in patients with neurodegenerative diseases
Kováč, Daniel (January 2020)
This diploma thesis focuses on the automated diagnosis of hypokinetic dysarthria, a motor speech disorder that occurs in patients with neurodegenerative diseases such as Parkinson’s disease, in a multilingual speech corpus. The automatic speech recognition approach to diagnosis is based on the acoustic analysis of speech and the subsequent use of mathematical models. The popularity of this method is on the rise due to its objectivity and the possibility of working with several languages simultaneously. The aim of this work is to find out which acoustic parameters have high discriminative power and are universal across multiple languages. To achieve this, a statistical analysis of parameterized speech tasks was performed, followed by modelling with machine learning methods. The analyses were performed for Czech, American English, Hungarian and all languages together. It was found that only some parameters enable the diagnosis of the hypokinetic disorder and are, at the same time, universal across multiple languages. The relF2SD parameter shows the best results, followed by the NST parameter. When classifying speakers of all the languages together, the model achieves an accuracy of 59 % and a sensitivity of 72 %.
62	Predicting Multimodal Rehabilitation Outcomes using Machine Learning
Cheltuitor, Alexandru; Jones-Quartey, Niklas (January 2020)
Chronic pain is a complex health issue and a major cause of disability worldwide. Although multimodal rehabilitation (MMR) has been recognized as an effective form of treatment for chronic pain, some patients do not benefit from it. If treatment outcomes could be reliably predicted, then patients who would benefit more from MMR could be prioritized over others. Machine learning has been proven capable of accurately predicting outcomes in other healthcare-related domains. Therefore, this study investigates its use to predict outcomes of MMR, using data from the Swedish Quality Registry for Pain Rehabilitation (SQRP). XGBoost regression was used for this purpose, and its predictive performance was compared to Ridge regression. Twelve models were trained on SQRP data for each algorithm in order to predict pain and quality-of-life related outcomes. The results show similar performance for both algorithms, with mean cross-validated R² values of 0.323 and 0.321 for the XGBoost and Ridge models respectively. The average root mean squared errors of 6.744 for XGBoost and 6.743 for Ridge were similar as well. Since XGBoost performed similarly to a less computationally expensive method, the use of XGBoost for MMR outcome prediction was not supported by the results of this study. However, machine learning has the potential to be more effective for this purpose, through the use of different hyperparameter values, correlation-based feature selection or other machine learning algorithms.
Keywords: Machine learning; XGBoost regression; Multimodal Rehabilitation; SQRP; chronic pain; treatment outcome prediction
Subject: Information Systems, Social aspects
63	BNPL Probability of Default Modeling Including Macroeconomic Factors: A Supervised Learning Approach
Hardin, Patrik; Ingre, Robert (January 2021)
In recent years, the Buy Now Pay Later (BNPL) consumer credit industry associated with e-commerce has been rapidly emerging as an alternative to credit cards and traditional consumer credit products. In parallel, the regulation IFRS 9 was introduced in 2018, requiring creditors to become more proactive in forecasting their Expected Credit Losses and to include the impact of macroeconomic factors. This study evaluates several methods of supervised statistical learning to model the Probability of Default (PD) for BNPL credit contracts. Furthermore, the study analyzes to what extent macroeconomic factors impact the prediction under the requirements in IFRS 9. It was carried out as a case study with the Swedish fintech firm Klarna. The results suggest that XGBoost produces the highest predictive power measured in Precision-Recall and ROC Area Under Curve, with ROC values between 0.80 and 0.91 in three modeled scenarios. Moreover, the inclusion of macroeconomic variables generally improves the Precision-Recall Area Under Curve. Real GDP growth, housing prices, and unemployment rate are frequently among the most important macroeconomic factors. The findings are in line with previous research on similar industries and contribute to the literature on PD modeling in the BNPL industry, where limited previous research was identified.
Keywords: Buy Now Pay Later; IFRS 9; Probability of Default; Expected Credit Loss; Macroeconomic factors; Machine Learning; Artificial Neural Network; XGBoost
Subject: Other Mathematics
64	Sales Forecasting by Assembly of Multiple Machine Learning Methods : A stacking approach to supervised machine learning
Falk, Anton; Holmgren, Daniel (January 2021)
Today, digitalization is a key factor for businesses to enhance growth and gain advantages and insight in their operations. Digitalization processes play key roles both in planning operations and in understanding customers, and companies are spending more and more resources in this field to gain critical insights and enhance growth. The fast-food industry is no exception: restaurants need to be highly flexible and agile in their work. As a result, there is an immense demand for knowledge and insights to help restaurants plan their daily operations, and a great need for organizations to continuously integrate new technological solutions into their existing processes. Well-implemented machine learning solutions in combination with feature engineering are likely to bring value to the existing processes. Sales forecasting, which is the main field of study in this thesis, has a vital role in planning a fast-food restaurant's operations, both for budgeting and for staffing purposes. The term fast food describes itself. With it comes a commitment to provide high-quality food and rapid service to the customers. Understaffing risks compromising either the quality of the food or the service, while overstaffing leads to low overall productivity. Generating highly reliable sales forecasts is thus vital to maximize profits and minimize operational risk. SARIMA, XGBoost and Random Forest were evaluated on training data consisting of sales numbers, business hours and categorical variables describing date and month. These models worked as base learners whose sales predictions on a specific dataset were used as training data for a Support Vector Regression (SVR) model. A stacking approach to this type of project shows satisfactory results, with a significant gain in prediction accuracy for all investigated restaurants on a 6-week aggregated timeline compared to the existing solution.
Keywords: machine learning; statistical learning; statistics; random forest; xgboost; sarima; stacking; support vector regression; svr; linear regression; sales; sales forecasting; forecasting; time series
Subject: Mathematics
65	Compression Selection for Columnar Data using Machine-Learning and Feature Engineering
Persson, Douglas; Juelsson Larsen, Ludvig (January 2023)
There is a continuously growing demand for improved solutions that provide both efficient storage and efficient retrieval of big data for analytical purposes. This thesis researches the use of machine learning together with feature engineering to recommend the most cost-effective combination of compression algorithm and encoding for columns in a columnar database management system (DBMS). The framework consists of a cost function calculated using compression time, decompression time, and compression ratio. An XGBoost machine-learning model is trained on labels provided by the cost function to recommend the most cost-effective combination for columnar data within a column- or vector-oriented DBMS. While the methods are applied on ClickHouse, one of the most popular open-source column-oriented DBMS on the market, the results are broadly applicable to column-oriented data that share data type and characteristics with IoT telemetry data. Using billions of available rows of numeric real business data obtained at Axis Communications in Lund, Sweden, a set of features is engineered to accurately describe the characteristics of a given column. The proposed framework allows for weighting the business interests (compression time, decompression time, and compression ratio) to determine the individually optimal cost-effective solution. The model reaches an accuracy of 99% on the test dataset and an accuracy of 90.1% on unseen data by leveraging data features that are predictive of the performance of compression algorithms and encodings. Following ClickHouse strategies and the most suitable practices in the field, the thesis analyses combinations of general-purpose compression algorithms and data encodings that together yield the best results in efficiently compressing the data of certain columns.
When the unweighted recommended combinations were applied to all columns, the framework was measured to increase the average compression speed by 95.46%, reducing the time to compress the columns from 31.17 seconds to 13.17 seconds. Additionally, the decompression speed was increased by 59.87%, reducing the time to decompress the columns from 2.63 seconds to 2.02 seconds, at the cost of decreasing the compression ratio by 66.05%, which increased the storage requirements by 94.9 MB. In column and vector databases, chunks of data belonging to a certain column are often stored together on a disk. Therefore, choosing the right compression algorithm can lower the storage requirements and boost database throughput.
Keywords: Machine Learning; XGBoost; Classification; Feature Engineering; Compression Algorithms; Data Encodings; Database Management System (DBMS); Column-Oriented DBMS
Subject: Computer Sciences
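
The labelling step described in entry 65 (a weighted cost over compression time, decompression time, and compression ratio, with the cheapest combination used as the training label) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the cost formula, so the weighted sum below is hypothetical, and the Python standard-library codecs stand in for the ClickHouse codec and encoding combinations the thesis actually evaluates.

```python
import bz2
import lzma
import time
import zlib

# Stand-in candidates (assumption): the thesis uses ClickHouse codecs and encodings.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def measure(data, compress, decompress):
    """Return (compression seconds, decompression seconds, compression ratio)."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != data:
        raise ValueError("codec did not round-trip the data")
    return t1 - t0, t2 - t1, len(data) / len(packed)


def cost(c_time, d_time, ratio, w_c=1.0, w_d=1.0, w_r=1.0):
    """Hypothetical weighted cost: lower is better, a higher ratio lowers the cost."""
    return w_c * c_time + w_d * d_time + w_r / ratio


def best_codec(data, **weights):
    """Score every candidate and return (cheapest codec name, all scores)."""
    scores = {
        name: cost(*measure(data, comp, decomp), **weights)
        for name, (comp, decomp) in CODECS.items()
    }
    return min(scores, key=scores.get), scores


# Made-up column of repetitive numeric values, serialized as bytes.
column = b",".join(str(i % 50).encode() for i in range(20000))

# Weighting only the compression ratio makes the outcome independent of timing noise.
label, scores = best_codec(column, w_c=0.0, w_d=0.0, w_r=1.0)
print(label, scores)
```

The returned name plays the role of the class label for one column; changing the weights shifts the label toward faster or denser codecs, which is the kind of business weighting the abstract mentions.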
66	Predicting Chronic Kidney Disease using a multimodal Machine Learning approach
Mishra, Aakruti; Puthiyandi, Navaneeth (January 2023)
Chronic Kidney Disease (CKD) is a common and dangerous health condition that requires early detection and treatment to be effective. Current diagnostic methods are time-consuming and expensive. In this research, we aim to construct a predictive model for the early detection of CKD using a combination of time series and static variables. We investigate the influence of a multimodal approach by combining the predictions from multiple models that use different modalities. The ROCKET method is used for classification on time series features, whilst the Random Forest approach is employed for static data. XGBoost is used to gain information about feature importance among labs and demographics-comorbidities data. We use the MIMIC-III database, adopting various strategies to handle data and class imbalance, such as stratification, balancing techniques, and backward and forward fill for missing value imputation. The evaluation metrics for CKD and non-CKD class labels include precision, recall, F1, and accuracy. Our findings show that aggregating time series data produces contrasting results for labs compared to vitals data. We also address the significance of the different demographic, comorbidity and lab event features. The findings indicate that a multimodal approach did not show significant advantages over individual models when the individual models performed suboptimally. The study also found that ethnicity is more significant than age and gender in predicting CKD. Furthermore, the study revealed some significant features from lab events and comorbidities. The study also provides some recommendations for future work to explore the potential of a multimodal approach further.
Keywords: Chronic kidney disease; Multimodal approach; ROCKET; Random Forest; XGBoost; MIMIC-III database; Data imbalance; Temporal and static modalities; Soft voting
Subject: Computer Sciences
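
Entry 66 combines predictions from a time series model and a static-data model and lists soft voting among its keywords. A minimal sketch of soft voting, assuming each model outputs a positive-class probability per patient, might look like this. The function names, the 0.5 threshold, and the probability values are illustrative assumptions, not taken from the thesis.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model positive-class probabilities for each sample.

    `probabilities` is a list with one inner list per model.
    """
    n_models = len(probabilities)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_samples = len(probabilities[0])
    return [
        sum(w * model[i] for w, model in zip(weights, probabilities)) / total
        for i in range(n_samples)
    ]


def predict(probabilities, threshold=0.5, weights=None):
    """Turn the averaged probabilities into 0/1 class labels."""
    return [int(p >= threshold) for p in soft_vote(probabilities, weights)]


# Made-up probabilities for three patients (hypothetical values).
time_series_model = [0.9, 0.4, 0.2]   # e.g. a ROCKET-based classifier
static_model = [0.7, 0.7, 0.1]        # e.g. a Random Forest on static data

averaged = soft_vote([time_series_model, static_model])
labels = predict([time_series_model, static_model])
print(averaged, labels)
```

Because the ensemble only averages its members, it cannot do much better than they do when both are weak, which is consistent with the finding reported in the abstract.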
67	Multivariate Time series Forecasting with applied Machine Learning on Electrical signals from High-Voltage Direct Current Equipment - Valve Cooling System
Nilsson, Carolina (January 2022)
In a sustainable society, utilizing intermittent renewable power plants is an important building block for achieving green power production. However, these sources, e.g., wind farms and solar farms, are often located far away from the place of power consumption, and their electricity generation is affected by the weather conditions in the area. Balancing power production and consumption with these sources is therefore a challenge. HVDC (High-Voltage Direct Current) technology can be used to efficiently transport electricity over long distances and is a key concept in the utilization of renewable energy sources. However, HVDC systems are sensitive to environmental effects such as elevated or dropping ambient temperatures, which can cause a forced stop in the system, e.g., when the remaining cooling capacity is low. Therefore, HVDC systems are built with high redundancy to maintain secure power transmission during seasonal changes. This thesis aimed to create a forecasting model with applied machine learning that could trend the remaining cooling capacity in an HVDC system, in order to stay aware of how much cooling capacity remains in different seasons. This can be used to optimize the power transmission during seasons when there is a surplus of cooling capacity. The machine learning pipelines were constructed in Python using Hitachi Energy’s PGML (Power Grid Machine Learning) platform. Two different forecasting models were used: LSTM (Long Short-Term Memory) and XGBoost (eXtreme Gradient Boosting). The models were trained to make a five-hour-ahead multistep prediction and were validated with several evaluation metrics.
The best performing model was XGBoost; it was therefore chosen as the final model and tested on a hold-out data set to estimate its general performance. The final model performed well on the hold-out data set, based on the scores from the evaluation metrics. Residual diagnostics were used to improve the models during training and to evaluate the final model. Future improvements are suggested at the end of the discussion in Chapter 5.
Keywords: Machine Learning; LSTM; XGBoost; Forecast; HVDC; Valve Cooling System
Subject: Other Electrical Engineering and Electronics; Computer Sciences
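
The five-hour-ahead multistep prediction in entry 67 requires turning a signal into supervised training samples before a model such as XGBoost or an LSTM can be fitted. One common way to do this is a sliding window, sketched below. The abstract does not describe the thesis's actual pipeline, so the hourly sampling, the single-signal input, and the lag length are all assumptions made for illustration (the thesis itself is multivariate).

```python
def make_windows(series, n_lags, horizon):
    """Split a series into (input window, multistep target) pairs.

    Each sample uses `n_lags` past values to predict the next `horizon` values.
    """
    samples = []
    for start in range(len(series) - n_lags - horizon + 1):
        inputs = series[start:start + n_lags]
        targets = series[start + n_lags:start + n_lags + horizon]
        samples.append((inputs, targets))
    return samples


# Made-up hourly readings (hypothetical); real inputs would be sensor signals.
hourly = list(range(10))

# Three past hours as input, five hours ahead as the multistep target.
pairs = make_windows(hourly, n_lags=3, horizon=5)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

With tree-based models, one regressor per forecast step is a typical design for multistep targets; whether the thesis did this is not stated in the abstract.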
68	Forecasting checking account balance : Using supervised machine learning
Dannelind, Martin (January 2022)
The introduction of open banking has made it possible for companies to build the next generation of applications based on transactional data, enabling economic forecasts that private individuals can use to make responsible financial decisions. This project investigated forecasting account balances using supervised learning. Seven different regression models were run on transactional data from 377 anonymised checking accounts split into subgroups. The results showed that multivariate XGBoost optimised with feature selection was the best performing forecasting model and that the subgroup with recurring income transactions was the easiest to forecast. Based on the results of this project, it can be concluded that a viable option for forecasting account balances is to split the transactional data into subgroups and forecast them separately, minimising the errors caused by certain random, infrequent and large types of transactions.
Keywords: Time series forecasting; account balance forecasting; economic prediction; Python; GRU; LSTM; RNN; XGBoost; prophet; checking account
69	Employee Turnover Prediction - A Comparative Study of Supervised Machine Learning Models
Kovvuri, Suvoj Reddy; Dommeti, Lydia Sri Divya (January 2022)
Background: In every organization, employees are an essential resource. For several reasons, employees are neglected by organizations, which leads to employee turnover. Employee turnover causes considerable losses to the organization. Using machine learning algorithms and the data in hand, a prediction of an employee’s future in an organization is made.
Objectives: The aim of this thesis is to conduct a comparison study using supervised machine learning algorithms such as Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost to predict an employee’s future in a company. The models are assessed with evaluation metrics in order to discover the most efficient model for the data in hand.
Methods: The quantitative research approach is used in this thesis, and data is analyzed using statistical analysis. The labeled data set comes from Kaggle and includes information on employees at a company. The data set is used to train the algorithms. The resulting models are evaluated on the test set using evaluation measures including Accuracy, Precision, Recall, F1 Score, and ROC curve to determine which model performs best at predicting employee turnover.
Results: Among the studied features in the data set, no feature has a significant impact on turnover. The XGBoost classifier has the best mean accuracy with 85.3%, followed by the Random Forest classifier with 83%; both outperform the other two algorithms. The XGBoost classifier has the best precision with 0.88, followed by the Random Forest classifier with 0.82. Both the Random Forest classifier and the XGBoost classifier showed a Recall score of 0.69. The XGBoost classifier had the highest F1 Score with 0.77, followed by the Random Forest classifier with 0.75. In the ROC curve, the XGBoost classifier had the highest area under the curve (AUC) with 0.88.
Conclusions: Among the four studied machine learning algorithms, Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, the XGBoost classifier performs best across the tested performance metrics. No feature was found to have a major effect on employee turnover.
Keywords: Machine Learning; Employee Turnover Prediction; Supervised Learning Models; Logistic Regression; Naive Bayes Classifier; Random Forest Classifier; XGBoost
Subject: Computer Sciences
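
The metrics reported in entry 69 are related by standard formulas: F1 is the harmonic mean of precision and recall. The short sketch below computes the metrics from binary labels and checks that the reported F1 scores agree with the reported precision and recall. The label vectors are made up for illustration; only the precision, recall, and F1 values come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = employee leaves)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check on the values reported in the abstract.
xgboost_f1 = round(f1_score(0.88, 0.69), 2)        # reported F1: 0.77
random_forest_f1 = round(f1_score(0.82, 0.69), 2)  # reported F1: 0.75

# Made-up labels (hypothetical) to show the label-based computation.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(xgboost_f1, random_forest_f1, metrics)
```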
70	Restaurant Daily Revenue Prediction : Utilizing Synthetic Time Series Data for Improved Model Performance
Jarlöv, Stella; Svensson Dahl, Anton (January 2023)
This study aims to enhance the accuracy of a demand forecasting model, XGBoost, by incorporating synthetic multivariate restaurant time series data during the training process. The research addresses the limited availability of training data by generating synthetic data using TimeGAN, a generative adversarial deep neural network tailored for time series data. A one-year daily time series dataset, comprising numerical and categorical features based on a real restaurant's sales history and supplemented by relevant external data, serves as the original data. TimeGAN learns from this dataset to create synthetic data that closely resembles the original data in terms of temporal and distributional dynamics. Statistical and visual analyses demonstrate a strong similarity between the synthetic and original data. To evaluate the usefulness of the synthetic data, an experiment is conducted in which varying lengths of synthetic data are iteratively combined with the one-year real dataset. Each iteration involves retraining the XGBoost model and assessing its accuracy for a one-week forecast using the Root Mean Square Error (RMSE). The results indicate that incorporating 6 years of synthetic data improves the model's performance by 65%. The hyperparameter configurations suggest that deeper tree structures benefit the XGBoost model when synthetic data is added. Furthermore, the model exhibits improved feature selection with an increased amount of training data. This study demonstrates that incorporating synthetic data closely resembling the original data can effectively enhance the accuracy of predictive models, particularly when training data is limited.
Keywords: demand forecasting; data augmentation; time series data; machine learning; restaurant industry; generative adversarial networks; TimeGAN; XGBoost
Subject: Computer and Information Sciences
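
Entry 70 evaluates each retrained model with RMSE on a one-week forecast and reports a 65% improvement. The sketch below shows how such a figure could be computed, assuming the improvement is the relative reduction in RMSE against the model trained on real data only; the abstract does not state this explicitly. All revenue values and forecasts are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which the error shrinks relative to the baseline (assumed definition)."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Made-up daily revenue for one week and two hypothetical forecasts.
actual = [100, 120, 90, 110, 130, 150, 140]
real_only_forecast = [a + 20 for a in actual]   # model trained on real data only
augmented_forecast = [a - 7 for a in actual]    # model trained with synthetic data added

baseline = rmse(actual, real_only_forecast)
augmented = rmse(actual, augmented_forecast)
improvement = relative_improvement(baseline, augmented)
print(baseline, augmented, improvement)
```
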
