  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Utvärdering av maskininlärningsmodeller vid konkursprediktion / Review of bankruptcy prediction using machine learning methods

Jansson, Mikaela, Ölander Gür, Katarina January 2021 (has links)
Att identifiera finansiella svårigheter vid bedömning av ett företags ekonomiska situation är väsentligt för att kreditgivare ska undvika kreditförluster. En viktig del av kreditbedömningen är att analysera sannolikheten för att ett företag kommer gå i konkurs eller inte. Att identifiera en förhöjd konkursrisk är därmed en faktor som kan hjälpa kreditgivare att fatta mer varsamma investeringsbeslut. Arbetet ämnar därför att undersöka hur väl fyra olika maskininlärningsalgoritmer kan predicera ökad risk för konkurs utifrån finansiell bolagsdata. Modellerna som används är logistisk regression, Support Vector Machine, Decision Trees och Random Forest. Då datan var obalanserad, där antalet icke-konkurser var överrepresenterat, fick modellerna tränas och testas på flera olika fördelade dataset och de slutgiltiga resultaten bygger på ett dataset som är balanserat. Modellerna utvärderades med hjälp av en förväxlingsmatris och evalueringsmåtten korrekthet, precision, täckning och F-score. Ju mer balanserad datan blev, desto bättre blev resultaten, men trots detta skiljde sig resultaten mellan modellerna. Studien visade att logistisk regression presterade sämst av samtliga modeller med ett F-score på 60%. Random Forest var den modell som hade bäst prediktiv förmåga med ett F-score på 77%. Vid studerande av särdragen visade det sig bland annat att förändring i antalet anställda, soliditet och eget kapital har en förklaringsgrad till konkurs och är något som bör tas i beaktande vid kreditbedömning. Andra faktorer, såsom vilken industri ett företag tillhör, bör även de ha en betydelse vid kreditbedömning då olika branscher tenderar att ha fler konkurser än andra. / When evaluating a company’s financial situation it is essential to identify financial distress in order for creditors to avoid credit losses. An important part of credit assessment is to analyze the probability that a company will go bankrupt or not.
Analyzing an increased risk of bankruptcy is thus a factor that can help lenders make more prudent investment decisions. Accordingly, this study aims to investigate how well four different machine learning algorithms can predict increased risk of bankruptcy based on financial company data. The models used are Logistic Regression, Support Vector Machine, Decision Trees and Random Forest. As the data was imbalanced, with non-bankruptcies overrepresented, the models were trained and tested on several differently distributed datasets, and the final results are based on a dataset that is balanced. The models were evaluated using a confusion matrix and the evaluation metrics accuracy, precision, recall and F-score. The more balanced the data was, the better the results were, but despite this the results differed between the models. The study showed that logistic regression performed the worst of all models with an F-score of 60%. Random Forest was the model with the best predictive ability with an F-score of 77%. When investigating the features, change in number of employees, equity ratio and equity turned out to have a degree of explanation for bankruptcy and should be taken into account when assessing credit. Other factors, such as which industry a company belongs to, should also be taken into account, as some industries tend to have more bankruptcies than others.
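The confusion-matrix evaluation the abstract describes can be sketched as follows. This is a toy illustration on synthetic data, not the thesis's actual dataset or pipeline; scikit-learn's `class_weight="balanced"` option is used as one stand-in for the dataset rebalancing the authors performed.

```python
# Illustrative sketch: a Random Forest classifier on imbalanced synthetic data,
# scored with the confusion-matrix metrics named in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: ~10% "bankrupt" (class 1), mimicking the
# overrepresentation of non-bankruptcies described above.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights classes during training; the thesis
# instead rebalanced the dataset itself.
clf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

print(confusion_matrix(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
print("F-score:  ", f1_score(y_test, pred))
```

On imbalanced data, precision, recall and F-score on the minority class are far more informative than accuracy, which is why a balanced dataset changed the reported results so much.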
52

Predicting Influencer Actual Reach Using Linear Regression

Khogasteh, Sam, Wiorek, Edvin January 2021 (has links)
The influencer marketing industry has seen tremendous growth in recent years, yet the effectiveness of this marketing form is still largely unexplored. This report aims to explore how various performance measures are linked to the reach of social media pages, utilizing the linear regression model. Three different data sets were collected manually or using web scraping. By splitting these data sets into training and test data we examined the degree to which the linear regression model can predict the actual reach, the page views and the weekly growth of an influencer. We concluded that there is a statistically significant correlation between multiple performance metrics of a social media page and the actual reach or the page views of that account. This study is however limited by its narrow data set and time frame, warranting future research in order to further establish the degree of this correlation. The results of this study can benefit companies in their process of selecting influencers to collaborate with, as well as determining the expected return on investment for that particular collaboration. This can in turn lead to a more efficient, authentic and transparent marketplace, and to consumers being less exposed to advertisement from misleading and malicious influencers. / Under de senaste åren har marknadsföringsindustrin med influencers växt drastiskt, ändå är effektiviteten hos denna marknadsföringsform relativt outforskad. Denna rapport avser använda linjär regression för att utforska hur olika prestationsmått är kopplade till räckvidden hos profiler på sociala medier. De olika datamängderna samlades in manuellt eller med hjälp av web scraping. Genom att dela upp datamängderna i träningsdata och testdata undersökte vi i hur hög grad den linjära regressionsmodellen kan förutsäga faktisk räckvidd, sidvisningar och profilens tillväxt under en vecka.
Vi drog slutsatsen att det finns en statistiskt signifikant korrelation mellan flera prestationsmått för en profilsida och antalet sidvisningar för det kontot. Studien är emellertid begränsad av sin datamängd och sitt tidsspann, något som motiverar framtida studier för att ytterligare etablera korrelationsgraden. Studiens resultat kan gynna företag i deras process att välja vilka influencers de vill samarbeta med, såväl som i deras process att bestämma den förväntade avkastningen för ett specifikt samarbete. Detta kan i sin tur bidra till en mer effektiv, autentisk och transparent marknad, något som också gör att konsumenten blir mindre exponerad för marknadsföring från vilseledande och illvilliga influencers.
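The train/test regression setup described above can be sketched as follows. The data and feature names (followers, likes per post, comments per post) are invented for illustration; the study's actual features and coefficients are not reproduced here.

```python
# Hypothetical sketch of a linear regression predicting "actual reach"
# from social media performance metrics, scored on held-out test data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Invented features: followers, likes per post, comments per post (scaled to [0, 1]).
X = rng.uniform(0, 1, size=(n, 3))
# Synthetic "actual reach": a linear combination of the features plus noise.
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.05, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the held-out test split
print("held-out R^2:", round(r2, 3))
```

Scoring on the held-out split, rather than the training data, is what lets the study claim the correlation generalizes beyond the collected sample.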
53

Intäktsestimering med hjälp av Maskininlärning / Company Revenue Estimation using Machine Learning

Holmäng, Arvid, von Grothusen, Axel January 2021 (has links)
Detta arbete undersöker möjligheten att estimera intäkter för företag med hjälp av maskininlärning. Datan som modellerna utgår ifrån består av punkter från bolagens balansräkningar och annan offentlig data. Eftersom frågeställningen som arbetet utreder är outforskad sedan tidigare ligger arbetets huvudsakliga fokus på att utforska vilka metoder som är mest lämpliga för uppgiften samt vilka särdrag i datasetet som har störst inverkan på modellerna. I arbetet utreds frågan med hjälp av fyra olika modeller: Random Forest regression, XGBoost, minstakvadratmetoden och Lasso. Modellerna utvärderades med kvantitativa mätetal såsom R2-värde och absoluta genomsnittliga procentuella felet (MAPE). Den algoritm och slutgiltiga modell som presterade bäst utifrån dessa mått var Random Forest regression med genomsnittligt R2-score på 0,8197 och MAPE-score på 0,3864. Denna studie drar slutsatsen att ensemblemetoder som XGBoost och Random Forest troligtvis är mer lämpliga att använda för denna typ av studier i jämförelse med enklare regressionsmodeller såsom minstakvadratmetoden och Lasso. Avslutningsvis dras slutsatsen att modellerna kan bidra till beslutsunderlaget vid utvärdering av bolag för vilka intäkterna är okända. / This work examines the possibility of estimating revenue for companies using machine learning. The data on which the models are based consists of points from the companies’ balance sheets and other public data. Since the research area is unexplored prior to this study, the main focus of this thesis is to explore which methods are most suitable for the task and which features in the dataset have the greatest impact on the models. In the study, the issue is investigated with the help of four different models: Random Forest regression, XGBoost, ordinary least squares and Lasso. The models were evaluated with quantitative measures such as R2 score and mean absolute percentage error (MAPE).
The algorithm and final model that performed best based on these measures was Random Forest regression, with an average R2 score of 0.8197 and a MAPE score of 0.3864. This study concludes that ensemble methods such as XGBoost and Random Forest are probably more suitable for this type of study compared to simpler regression models such as least squares and Lasso. In conclusion, the models can contribute to the initial financial analysis of companies for which the income is unknown.
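The evaluation described above can be sketched as follows, on synthetic data in place of the balance-sheet dataset. scikit-learn's `r2_score` and `mean_absolute_percentage_error` correspond to the two metrics the thesis reports; the feature construction is invented.

```python
# Illustrative sketch: a Random Forest regressor for revenue-like targets,
# scored with R^2 and MAPE as in the abstract above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 600
# Invented balance-sheet-like features (e.g. assets, equity, liabilities).
X = rng.uniform(1, 100, size=(n, 3))
# Synthetic "revenue" with a nonlinear interaction, where a tree ensemble
# can outperform a plain linear model.
y = X[:, 0] * X[:, 1] / 50 + X[:, 2] + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_train, y_train)
pred = model.predict(X_test)
print("R^2 :", round(r2_score(y_test, pred), 4))
print("MAPE:", round(mean_absolute_percentage_error(y_test, pred), 4))
```

The interaction term is one reason ensemble methods can beat least squares here: a linear model cannot represent the product of two features without manual feature engineering.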
54

The buzz behind the stock market : Analysis and characterization of the social media activity around the time of big stock valuation changes / Analys av mönster inom diskussionen på sociala medier vid stora fluktuationer på börsmarknaden

Envall, David, Blåberg Kristoffersson, Paul January 2022 (has links)
As the discussion of stocks on social media increases, its effect on the financial market becomes more distinct. This has led to new opportunities to influence private investors into making uninformed decisions that affect the value of stocks. This thesis aims to enable readers to distinguish patterns in social media discussion regarding stocks and thus provide an understanding of the effect it has on public opinion. By identifying significant events of big stock valuation changes and collecting corresponding stock-related data from the social media platforms Reddit and Twitter, analyses of post frequency and sentiment were performed. The results display trends of an increase in discussion on social media leading up to the occurrence of significant events and an overall increase in online interest in specific stocks after significant events have occurred. Furthermore, the overall sentiment in the discussion for both increasing and decreasing events is positive in almost every case, although the sentiment score of increasing events is higher than its counterpart. The day-to-day sentiment score during events indicates a much higher fluctuation in sentiment for Reddit compared to Twitter. However, a significant increase in score the day before an event occurs is prevalent for both. These findings imply the possibility of predicting stock valuation changes using data gathered from social media platforms.
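The two analyses the abstract names, post frequency around an event date and sentiment scoring, can be sketched with the standard library alone. The posts, dates, and word lists below are all invented for illustration; the thesis used real Reddit/Twitter data and a proper sentiment model, not this naive lexicon.

```python
# Minimal stdlib sketch: daily post frequency plus a naive word-list
# sentiment score (+1 per positive word, -1 per negative word).
from collections import Counter
from datetime import date

posts = [
    (date(2022, 1, 3), "this stock is going to the moon, great buy"),
    (date(2022, 1, 4), "terrible earnings, bad outlook, selling"),
    (date(2022, 1, 4), "great momentum, buy the dip"),
    (date(2022, 1, 5), "good recovery today"),
]

# Post frequency per day: a spike before an event is the pattern of interest.
freq = Counter(day for day, _ in posts)
print(sorted(freq.items()))

# Invented toy lexicons; real sentiment models are trained, not hand-listed.
POSITIVE = {"great", "good", "buy", "moon", "recovery"}
NEGATIVE = {"terrible", "bad", "selling"}

def sentiment(text: str) -> int:
    words = text.replace(",", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

scores = [sentiment(text) for _, text in posts]
print(scores)
```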
55

Improving the Utilization of Digital Services : Evaluating Contest-Driven Open Data Development and the Adoption of Cloud Services

Ayele, Workneh Yilma January 2018 (has links)
There is a growing interest in the utilization of digital services, such as software apps and cloud-based software services. The utilization of digital services enabled by ICT is increasing more rapidly than any other segment of world trade. The availability of open data unlocks huge market possibilities in the public and private sectors, such as manufacturing, transportation, and trade. Digital service utilization can be improved by the adoption of cloud-based software services and through open data innovation for service development. However, open data has no value unless utilized, and little is known about the development of digital services using open data. The use of contests to create awareness and call for crowd participation is vital to attract participants for digital service development. Digital innovation contests stimulate open data service development and are a common means of generating digital services based on open data. Evaluation of digital service development processes stimulated by contests, all the way to service deployment, is indispensable. In spite of this, existing evaluation models are not specifically designed to measure open data innovation contests. Additionally, the implications, opportunities and challenges of cloud-based digital services reported in the literature are not prioritized and hence not directly usable for the adoption of cloud-based digital services. Furthermore, empirical research on user implications of cloud-based digital services is missing. Therefore, the purpose of this thesis is to facilitate the utilization of digital services through the adoption of cloud-based digital services and the development of digital services using open data.
The main research question addressed in this thesis is: “How can contest-driven innovation of open data digital services be evaluated and the adoption of digital services be supported to improve the utilization of digital services?” The research approaches used are design science research, descriptive statistics, and a case study for confirming the validity of the artifacts developed. The design science approach was used to design new artifacts for evaluating open data service development stimulated by contests. Descriptive statistics were applied to two surveys: the first evaluates the implications of cloud-based digital service adoption, while the second is a longitudinal survey measuring the barriers perceived by external open data digital service developers. In this thesis, an evaluation model for digital innovation contests that stimulate service development, the Digital Innovation Contest Measurement model (DICM-model), and a Designing and Refining DICM method (DRD-method) for designing and refining the DICM-model to provide more agility are proposed. Additionally, a framework of the barriers constraining external developers of open data services is presented to better manage service deployment and enable viable service development. Organizers of open data innovation contests and project managers of digital service development are the beneficiaries of these artifacts. The DICM-model and the DRD-method are used for the evaluation of contest and post-contest deployment processes. Finally, a framework for the adoption of cloud-based digital services is presented. This framework enables requirement engineers and cloud-based digital service adoption personnel to prioritize the factors responsible for effective adoption.
The automation of ideation, which is a key process of digital service development using open data, the assessment of developer platforms to suggest ways of including evaluation of innovation, ex-post evaluation of the proposed artifacts, and the expansion of cloud-based digital service adoption from the perspective of suppliers are left for further investigation. / DSV Report Series No. 18-008
56

Implementation of ISO27001 standard in startups

Fúska, Róbert January 2022 (has links)
No description available.
57

ChatGPT’s Perception on Reddit : A Data-driven Topic Modeling Study / Reddits uppfattning av ChatGPT

Nordell, Erik, Mogren, Max January 2023 (has links)
This thesis examines the discussions on Reddit surrounding the launch of ChatGPT from late November 2022 until the end of March 2023. The objective of the study is to analyze the discussions concerning ChatGPT and how different topics have changed over time. Additionally, the thesis identifies significant events that have had an impact on the topics, and how topics vary across different subreddits. To retrieve the data for the analysis, the PushShift API was used to gather almost half a million posts concerning ChatGPT. Topic modeling was then applied using BERTopic to identify common topics discussed on Reddit and its unique subreddits. The results show several distinct topics, encompassing the technology behind ChatGPT, its societal implications, and its potential for creative utilization. Furthermore, the thesis presents a clear correlation between significant news concerning ChatGPT and the frequency of posts on Reddit. Specifically, Microsoft’s investment in OpenAI and the incorporation of the GPT engine in Bing proved to have a great influence on both the topics and the frequency of posts. We also found some discrepancies in how subreddits discuss topics: most notably, general topics tend to spread out more, both over various subreddits and over time, and are more sporadic, while specific topics tend to be more dictated by the occurrence of significant events relevant to the topic.
58

Efficient use of resources when implementing machine learning in an embedded environment

Eklöf, Johannes January 2023 (has links)
Machine learning, and in particular deep-learning models, have been in the spotlight for the last year. Particularly the release of ChatGPT caught the attention of the public. But many of the most popular models are large, with millions or billions of parameters. In parallel with this, the number of smart products constituting the Internet of Things is rapidly increasing. The need for small, resource-efficient machine-learning models can therefore be expected to increase in the coming years. This work investigates the implementation of two different models in embedded environments. The investigated models are random forests, which are straightforward and relatively easy to implement, and transformer models, which are more complex and challenging to implement. The process of training the models in a high-level language and implementing and running inference in a low-level language has been studied. It is shown that it is possible to train a transformer in Python and export it by hand to C, but that this comes with several challenges that should be taken into consideration before the approach is chosen. It is also shown that a transformer model can be successfully used for signal extraction, a new area of application. Different possible ways of optimizing the model, such as pruning and quantization, have been studied. Finally, it has been shown that a transformer model with an initial noise filter performs better than the existing hand-written code on self-generated messages, but worse on real-world data. This indicates that the training data should be improved.
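One of the optimizations mentioned above, quantization, can be sketched as follows. This is a generic post-training int8 quantization of a weight matrix, not the thesis's implementation; real embedded deployments typically quantize per layer or per channel and calibrate on representative data.

```python
# Hedged sketch: symmetric post-training quantization of float32 weights
# to int8, plus a check of the reconstruction error after dequantizing.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0, 0.5, size=(4, 4)).astype(np.float32)

# Symmetric quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(w).max() / 127.0
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Dequantize to measure the error; memory cost drops 4x (int8 vs float32),
# which is the point for resource-constrained embedded targets.
w_hat = w_q.astype(np.float32) * scale
err = np.abs(w - w_hat).max()
print("max reconstruction error:", err)
```

The worst-case rounding error is half a quantization step (`scale / 2`), which is what makes the accuracy impact of int8 inference predictable enough to budget for.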
59

Exploring Food Waste in Private Households in Skåne

Gabrielsson, Jonas, Zaki, Maria January 2022 (has links)
In 2020, 200 million children under the age of 5 were reported to be malnourished and between 720 and 811 million people around the world faced hunger. Yet, global food production has the potential to feed every human being twice the amount required. So what is happening with all that food? 1.3 billion tonnes of the global food supply is wasted every year, which accounts for one third of the food produced. In Sweden, private households account for 70% of the total food waste. Food waste has been a problem for some time now. The goal of this study is therefore to investigate reasons that contribute to this high food waste and suggest a solution or guidelines to prevent or reduce it in private households in Skåne county. To explore the topic, academic literature was reviewed and nine semi-structured interviews were conducted with the target group for this study, i.e., families living in Skåne county with children living at home and both parents working. Additionally, 103 responses were gathered through an online questionnaire from the same target group. The findings revealed that families struggled with planning properly before they entered a grocery store, which meant that they ended up buying much more than they needed. Moreover, it was revealed that people had the tendency to get sidetracked during shopping. These practices, in most instances, resulted in double-buying, over-buying, and impulsive shopping, which meant that more food went to waste without ever being consumed in the respective households. With these findings in mind, we hypothesize that online shopping has the potential to prevent food waste in private households, and we have created a design for getting more people comfortable with grocery shopping online, based on a human-centred design approach.
To conclude this thesis, we define the contributing factors of household food waste and argue that food waste can be reduced by a significant amount if people shop online and adhere to some sort of food budget to control their spending.
60

GPU Acceleration for FEM : Determining the Performance Advantage of GPU Hardware Acceleration for Finite Element Method Solvers

Björfors, Eric, Carlsson, Jesper, Ekberg, William, Liljefors, Albin, Näslund, Erik January 2024 (has links)
The finite element method is a numerical method used to solve partial differential equations. GPUs are special computing units that can perform many simple operations in parallel. If the problem is parallelisable, the GPU will often be more cost- or energy-efficient at solving it. The aim of this project is to explore how GPUs can be used to solve FEM problems. The wave equation is discretised with hat functions and Hermite polynomials. The problem is then time-integrated with Runge-Kutta 4, and three solvers are used to compute the derivatives of each step: a direct Gaussian solver, a direct solver with LU decomposition, and a conjugate gradient solver. The project found that some operations, especially linear algebra operations, were highly parallelisable and suitable for GPUs. This made the conjugate gradient solver a good choice for GPU computing. Conversely, direct solvers were harder to parallelise, as each row must be computed serially. In general, the GPU was faster for all relevant grid sizes.
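The conjugate gradient method named above can be sketched in a few lines. Its inner loop is dominated by matrix-vector products and dot products, exactly the operations that parallelise well on a GPU; this illustrative version runs on the CPU with NumPy, and the test matrix is a generic 1D stiffness-like matrix rather than the project's wave-equation system.

```python
# Minimal conjugate gradient solver for A x = b with A symmetric positive
# definite. Each iteration needs one mat-vec and a few vector ops, which is
# why the method maps well to GPU hardware.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# SPD test matrix: the classic tridiagonal (-1, 2, -1) stiffness matrix
# from a 1D finite element or finite difference discretisation.
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```

By contrast, the forward/backward substitution in a direct LU solve has a row-by-row dependency chain, which is the serial bottleneck the abstract alludes to.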
