41 |
Email Classification : An evaluation of Deep Neural Networks with Naive Bayes
Michailoff, John January 2019
Machine learning (ML) is an area of computer science that gives computers the ability to learn patterns in data without being explicitly programmed for those patterns. Neural networks in this area simulate the biological function of neurons in the brain to learn patterns in data, giving computers the ability to predict how data can be clustered. This research investigates the possibility of using neural networks to classify email, i.e. to work as an email case manager. A Deep Neural Network (DNN) consists of multiple layers of neurons connected to each other by trainable weights. The main objective of this thesis was to evaluate how three input arguments (data size, training time and neural network structure) affect the accuracy of a DNN's pattern recognition; to evaluate how the DNN performs compared to the statistical ML method Naïve Bayes (NB) in terms of prediction accuracy and complexity; and finally to assess the viability of the resulting DNN as a case manager. Results show that the accuracy of our networks improves as training time and data size increase. Testing increasingly complex network structures (larger networks of neurons with more layers) shows that overfitting becomes a problem with increased training time, i.e. accuracy decreases after a certain threshold of training time. Naïve Bayes classifiers perform worse than the DNN in terms of accuracy but have lower complexity, making NB viable on mobile platforms. We conclude that our developed prototype may work well in tandem with existing case management systems, which remains to be tested by future research.
|
42 |
Predicting SNI Codes from Company Descriptions : A Machine Learning Solution
Lindholm, Erik, Nilsson, Jonas January 2023
This study aims to develop an automated solution for assigning industry codes to businesses based on the contents of their business descriptions. The Swedish standard industrial classification (SNI) is a system used by Statistics Sweden (SCB) to categorize businesses for its statistics reports. SNI codes have so far been assigned manually by the person registering a new company, which is far from optimal. Some of the 88 main industry groups are hard to tell apart, and this often leads to incorrect assignments. Our approach to this problem was to train machine learning models using the Naive Bayes and SVM classifier algorithms and to evaluate them in an experiment. In 2019, Dahlqvist and Strandlund attempted this and reached an accuracy of 52 percent using a gradient boosting classifier, but this was considered too low for real-world implementation. Our main goal was to achieve a higher accuracy than Dahlqvist and Strandlund, and we eventually succeeded: our best-performing SVM model reached an accuracy of 60.11 percent. Like Dahlqvist and Strandlund, we concluded that the low quality of the dataset was the main obstacle to achieving higher scores. The dataset we used was severely imbalanced, and much time was spent investigating and applying oversampling and undersampling as strategies for mitigating this problem. However, we found during the testing phase that none of these strategies had any positive effect on the accuracy scores.
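The oversampling and undersampling strategies mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest random variants (duplicating minority examples, or discarding majority examples, until all classes have the same size); the abstract does not say which variants the authors used, and the descriptions and class codes below are invented placeholders, not data from the thesis.

```python
import random
from collections import Counter

def group_by_label(samples, labels):
    """Collect the samples belonging to each class label."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(samples, labels)
    target = max(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels

def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(samples, labels)
    target = min(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        out_samples.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels

# Invented toy data: seven descriptions over three hypothetical class codes.
descriptions = ["software consulting", "web development", "it support",
                "app design", "grocery retail", "clothing shop", "dairy farm"]
codes = ["A", "A", "A", "A", "B", "B", "C"]

over_x, over_y = random_oversample(descriptions, codes)
under_x, under_y = random_undersample(descriptions, codes)
print(Counter(over_y))
print(Counter(under_y))
```

Resampling of this kind should be applied to the training split only, so that the test set keeps the original class distribution. As the abstract reports, balancing the classes did not improve accuracy in the authors' experiments.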
|
43 |
Variant Detection Using Next Generation Sequencing Data
Pyon, Yoon Soo 08 March 2013
No description available.
|
44 |
Relationships Among Learning Algorithms and Tasks
Lee, Jun won 27 January 2011
Metalearning aims to obtain knowledge of the relationship between mechanisms of learning and the concrete contexts in which those mechanisms are applicable. As new mechanisms of learning are continually added to the pool of learning algorithms, the chances of encountering similar behavior among algorithms increase. Understanding the relationships among algorithms and the interactions between algorithms and tasks helps to narrow down the space of algorithms to search for a given learning task. In addition, this process helps to disclose factors contributing to the similar behavior of different algorithms. We first study general characteristics of learning tasks and their correlation with the performance of algorithms, isolating two metafeatures whose values are fairly distinguishable between easy and hard tasks. We then devise a new metafeature that measures the difficulty of a learning task independently of the performance of learning algorithms on it. Building on these preliminary results, we then investigate more formally how we might measure the behavior of algorithms at a finer-grained level than a simple dichotomy between easy and hard tasks. We prove that, among the many possible candidates, the Classifier Output Difference (COD) measure is the only one possessing the properties of a metric necessary for further use in our proposed behavior-based clustering of learning algorithms. Finally, we cluster 21 algorithms based on COD and show the value of the clustering in 1) highlighting interesting behavior similarity among algorithms, which leads us to a thorough comparison of Naive Bayes and Radial Basis Function Network learning, and 2) designing more accurate algorithm selection models, by predicting clusters rather than individual algorithms.
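As a rough illustration of the COD measure named in this abstract, the sketch below assumes the usual definition of Classifier Output Difference: the fraction of instances on which two classifiers make different predictions. The abstract itself does not spell out the formula, and the prediction lists are invented for the example.

```python
from itertools import permutations

def classifier_output_difference(preds_a, preds_b):
    """Fraction of instances on which two classifiers disagree."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have equal length")
    if not preds_a:
        raise ValueError("need at least one prediction")
    disagreements = sum(a != b for a, b in zip(preds_a, preds_b))
    return disagreements / len(preds_a)

# Invented predictions from three classifiers on the same six instances.
preds = {
    "naive_bayes":   ["x", "y", "y", "x", "y", "x"],
    "rbf_network":   ["x", "y", "x", "x", "y", "x"],
    "decision_tree": ["y", "y", "x", "y", "y", "x"],
}

cod = {
    (a, b): classifier_output_difference(preds[a], preds[b])
    for a in preds for b in preds
}

# Spot-check the metric properties on this toy data (a check, not a proof).
symmetric = all(cod[a, b] == cod[b, a] for a in preds for b in preds)
zero_on_self = all(cod[a, a] == 0.0 for a in preds)
triangle = all(
    cod[a, c] <= cod[a, b] + cod[b, c] + 1e-12
    for a, b, c in permutations(preds, 3)
)
print(cod["naive_bayes", "rbf_network"], symmetric, zero_on_self, triangle)
```

Under this definition COD is a normalized Hamming distance between prediction vectors, so the resulting pairwise distance table can be passed to any distance-based clustering method.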
|
45 |
Identifying Interesting Posts on Social Media Sites
Seethakkagari, Swathi, M.S. 21 September 2012
No description available.
|
46 |
Exploring the Noise Resilience of Combined Sturges Algorithm
Agarwal, Akrita January 2015
No description available.
|
47 |
A Massively Parallel Algorithm for Cell Classification Using CUDA
Schmidt, Samuel January 2015
No description available.
|
48 |
Using sentiment analysis to craft a narrative of the COVID-19 pandemic from the perspective of social media
Ray, Taylor Breanna 06 August 2021
Throughout the COVID-19 pandemic, people have turned to social media to share their experiences with the coronavirus and their feelings regarding subjects like social distancing, mask-wearing, COVID-19 vaccines, and other related topics. The public availability of these social media posts gives researchers the chance to gauge the consensus on an array of issues, topics, people, and entities. For the COVID-19 pandemic, this is valuable information that can prepare communities and governing bodies for future epidemics or events of a similar magnitude. However, clearly defining such a consensus can be difficult, especially if researchers want to limit the amount of bias they introduce. Sentiment analysis helps to address this need by assigning each text source to one of several distinct polarities, typically positive, neutral, or negative. While sentiment analysis can be performed entirely by hand, this becomes incredibly burdensome for projects that involve substantial amounts of data. This thesis attempts to overcome this challenge by programmatically classifying the sentiment of COVID-19 posts from 10 social media and web-based forums using a multinomial Naive Bayes classifier. The unique and contrasting qualities of the social networks analyzed provide a robust view of the public's perception of the pandemic that has not been offered before.
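The classifier named in this abstract can be sketched in a few lines. The example below is a minimal multinomial Naive Bayes with Laplace (add-one) smoothing over whitespace tokens; the tokenization, the smoothing constant, and the six training posts are assumptions made for illustration and do not reproduce the thesis's data or pipeline.

```python
import math
from collections import Counter, defaultdict

def train(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def classify(model, text, alpha=1.0):
    """Return the class with the highest log posterior.
    Words never seen in training are ignored."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    tokens = [t for t in text.lower().split() if t in vocab]
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        # log prior plus smoothed log likelihood of each token
        score = math.log(class_docs[label] / n_docs)
        for t in tokens:
            score += math.log(
                (word_counts[label][t] + alpha) / (total + alpha * len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Invented toy posts, two per polarity.
posts = [
    "vaccine rollout is great news",
    "feeling hopeful and grateful today",
    "masks are required on the bus",
    "clinic opens at nine tomorrow",
    "lockdown is awful and lonely",
    "so tired of this terrible virus",
]
polarities = ["positive", "positive", "neutral", "neutral",
              "negative", "negative"]

model = train(posts, polarities)
print(classify(model, "great hopeful news"))
print(classify(model, "awful terrible lockdown"))
print(classify(model, "masks required on the bus"))
```

Working in log space avoids numeric underflow when posts are long, and the smoothing term keeps a single unseen word from driving a class probability to zero.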
|
49 |
Social media analysis for product safety using text mining and sentiment analysis
Isa, H., Trundle, Paul R., Neagu, Daniel January 2014
The growing incidence of counterfeiting and its associated economic and health consequences necessitate the development of active surveillance systems capable of producing timely and reliable information for all stakeholders in the anti-counterfeiting fight. User-generated content from social media platforms can provide early clues about product allergies, adverse events and product counterfeiting. This paper reports work in progress with contributions including: the development of a framework for gathering and analyzing the views and experiences of users of drug and cosmetic products using machine learning, text mining and sentiment analysis; the application of the proposed framework to Facebook comments and data from Twitter for brand analysis; and a description of how to develop a product safety lexicon and training data for building a machine learning classifier for drug and cosmetic product sentiment prediction. The initial brand and product comparison results indicate the usefulness of text mining and sentiment analysis on social media data, while the use of a machine learning classifier for predicting sentiment orientation provides a useful tool for users, product manufacturers, and regulatory and enforcement agencies to monitor brand or product sentiment trends in order to act in the event of a sudden or significant rise in negative sentiment.
|
50 |
Modeling the decision making mind: Does form follow function?
Jarecki, Jana Bianca 07 December 2017
The behavioral sciences investigate human decision processes from two complementary perspectives: form and function. Questions of form concern how decision processes unfold; questions of function concern which goals the resulting behavior serves. This dissertation argues for integrating form and function.
One step towards this integration is to introduce process models from cognitive psychology into evolutionary psychology and behavioral biology, fields that mostly ask about the goals of behavior. Study 1 investigates the properties of cognitive process models. I propose the first general framework for building cognitive process models.
Study 2 investigates human category learning from the perspectives of form and function. When learning a novel categorization task, do humans follow a statistical principle that has been shown in computer science to classify robustly even when its underlying assumption is violated? Data from two learning experiments, together with cognitive modeling using a novel probabilistic learning model, show that humans start classifying by following the statistical principle of class-conditional independence of features.
Study 3 investigates risk attitudes from the perspectives of form and function. Does the information people process depend on the goals of risky behavior? I measure process and behavioral indicators in ten risk domains that represent different evolutionary goals. The results show that risk attitudes differ across domains and, in particular, that females are not universally more risk-averse than males. Further, on the process level, the valence of the aspects related to perceived risks is less related to people's risk propensities than the most frequently mentioned aspects for or against the risky behavior.
|