Global ETD Search

291	Vers une plateforme informatique pour l'expérimentation d'outils de classification [Toward a software platform for experimenting with classification tools] Bokhabrine, Ayoub January 2019 No description available. Association Class Classification Descriptor Experimentation Extraction Hierarchical Computing Itemsets K-means K-medoids Lemmatization M-confidence M-support Word Cleaning Number Tool Partitioning Platform Rule Segmentation Text Vocabulary
292	Investigating the Correlation Between Marketing Emails and Receivers Using Unsupervised Machine Learning on Limited Data : A comprehensive study using state of the art methods for text clustering and natural language processing / Undersökning av samband mellan marknadsföringsemail och dess mottagare med hjälp av oövervakad maskininlärning på begränsad data Pettersson, Christoffer January 2016 The goal of this project is to investigate any correlation between marketing emails and their receivers using machine learning and only a limited amount of initial data. The data consists of roughly 1200 emails and 98,000 receivers of these. Initially, the emails are grouped together based on their content using text clustering. The emails carry no prior labeling or categorization, which creates a need for an unsupervised learning approach that uses solely the raw text content as data. The project investigates state-of-the-art concepts like bag-of-words for calculating term importance and the gap statistic for determining an optimal number of clusters. The data is vectorized using term frequency - inverse document frequency to determine the importance of terms relative to the document and to all documents combined. An inherent problem of this approach is high dimensionality, which is reduced using latent semantic analysis in conjunction with singular value decomposition. Once the resulting clusters have been obtained, the most frequently occurring terms for each cluster are analyzed and compared. Due to the absence of initial labeling, an alternative approach is required to evaluate the clusters' validity. To do this, the receivers of all emails in each cluster who actively opened an email are collected and investigated. Each receiver has different attributes regarding their purpose of using the service and some personal information.
Once these were gathered and analyzed, the conclusion was that it is possible to find distinguishable connections between the resulting email clusters and their receivers, but only to a limited extent. The receivers from the same cluster did show similar attributes, which were distinguishable from those of the receivers of other clusters. Hence, the resulting email clusters and their receivers are specific enough to distinguish themselves from each other but too general to handle more detailed information. With more data, this could become a useful tool for determining which users of a service should receive a particular email to increase the conversion rate and thereby reach out to more relevant people based on previous trends. Machine learning Unsupervised Natural language processing nlp clustering centroid based k-means text clustering limited data email clustering lsa svd tf-idf dimensionality reduction the gap statistic Lloyd's algorithm vectorization feature extraction Computer Sciences
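The term frequency - inverse document frequency weighting described in record 292 can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `tfidf`, the toy corpus, and the unsmoothed `log(N / df)` form of the inverse document frequency are assumptions chosen for illustration, and the later dimensionality reduction and clustering steps are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            # Term frequency within the document times log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus standing in for the marketing emails.
emails = [
    "summer sale discount on shoes".split(),
    "winter sale discount on coats".split(),
    "your invoice is attached".split(),
]
weights = tfidf(emails)
```

In this toy corpus a term that occurs in only one email, such as "shoes", receives a higher weight than a term shared by two emails, such as "sale", and a term that occurs in every document receives a weight of zero.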
293	Automated image classification via unsupervised feature learning by K-means Karimy Dehkordy, Hossein 09 July 2015 Indiana University-Purdue University Indianapolis (IUPUI) / Research on image classification has grown rapidly in the field of machine learning. Many methods have already been implemented for image classification. Among all these methods, the best results have been reported by neural network-based techniques. One of the most important steps in automated image classification is feature extraction. Feature extraction includes two parts: feature construction and feature selection. Many methods for feature extraction exist, but the best ones are related to deep-learning approaches such as network-in-network or deep convolutional network algorithms. Deep learning tries to focus on the level of abstraction and find higher levels of abstraction from the previous level by stacking multiple hidden layers. The two main problems with using deep-learning approaches are the speed and the number of parameters that should be configured. Small changes or poor selection of parameters can alter the results completely or even make them worse. Tuning these parameters is usually impossible for normal users who do not have supercomputers, because one must run the algorithm and then tune the parameters according to the results obtained. Thus, this process can be very time consuming. This thesis attempts to address the speed and configuration issues found with traditional deep-network approaches. Some of the traditional methods of unsupervised learning are used to build an automated image-classification approach that takes less time both to configure and to run. Image classification K-means Unsupervised feature learning Machine learning Data structures (Computer science) Data encryption (Computer science) Image processing Artificial intelligence Optical pattern recognition Image analysis Pattern recognition systems Computer algorithms
294	An Approach To Cluster And Benchmark Regional Emergency Medical Service Agencies Kondapalli, Swetha 06 August 2020 No description available. Industrial Engineering Statistics Computer Science Emergency Medical Services Unsupervised Learning Random Forest Feature selection Clustering Benchmarking CLARANS K-means K-medoids Machine Learning Python Precision Recall Silhouette Elbow method
295	A PROBABILISTIC MACHINE LEARNING FRAMEWORK FOR CLOUD RESOURCE SELECTION ON THE CLOUD Khan, Syeduzzaman 01 January 2020 The execution of scientific applications on the Cloud comes with great flexibility, scalability, cost-effectiveness, and substantial computing power. Market-leading Cloud service providers such as Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP) offer various general-purpose, memory-intensive, and compute-intensive Cloud instances for the execution of scientific applications. The scientific community, especially small research institutions and undergraduate universities, faces many hurdles while conducting high-performance computing research in the absence of large dedicated clusters. The Cloud provides a lucrative alternative to dedicated clusters; however, the wide range of Cloud computing choices makes instance selection difficult for end-users. This thesis aims to simplify Cloud instance selection for end-users by proposing a probabilistic machine learning framework that allows users to select a suitable Cloud instance for their scientific applications. This research builds on the previously proposed A2Cloud-RF framework that recommends high-performing Cloud instances by profiling the application and the selected Cloud instances. The framework produces a set of objective scores called the A2Cloud scores, which denote the compatibility level between the application and the selected Cloud instances. When used alone, the A2Cloud scores become increasingly unwieldy with an increasing number of tested Cloud instances. Additionally, the framework only examines the raw application performance and does not consider the execution cost to guide resource selection. To improve the usability of the framework and assist with economical instance selection, this research adds two Naïve Bayes (NB) classifiers that consider both the application’s performance and execution cost.
These NB classifiers include: 1) NB with a Random Forest Classifier (RFC) and 2) a standalone NB module. Naïve Bayes with a Random Forest Classifier (RFC) augments the A2Cloud-RF framework's final instance ratings with the execution cost metric. In the training phase, the classifier builds the frequency and probability tables. The classifier recommends a Cloud instance based on the highest posterior probability for the selected application. The standalone NB classifier uses the generated A2Cloud score (an intermediate result from the A2Cloud-RF framework) and the execution cost metric to construct an NB classifier. The NB classifier forms a frequency table and probability (prior and likelihood) tables. To recommend a Cloud instance for a test application, the classifier calculates the posterior probability of each Cloud instance and recommends the instance with the highest posterior probability. This study performs the execution of eight real-world applications on 20 Cloud instances from AWS, Azure, GCP, and Linode. We train the NB classifiers using 80% of this dataset and employ the remaining 20% for testing. The testing yields more than 90% recommendation accuracy for the chosen applications and Cloud instances. Because of the imbalanced nature of the dataset and the multi-class nature of the classification, we use the confusion matrix (true positive, false positive, true negative, and false negative) and the F1 score, which is above 0.9, to describe the model performance. The final goal of this research is to make Cloud computing an accessible resource for conducting high-performance scientific executions by enabling users to select an effective Cloud instance from across multiple providers. Cloud computing Cloud resource selection K Means Machine learning Naive Bayes Random forest classifier Computer engineering Computer Engineering Computer Sciences Data Storage Systems Engineering Other Computer Engineering Other Computer Sciences
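The frequency-table, prior, likelihood and posterior steps described in record 295 can be sketched as a small categorical Naive Bayes classifier. This is an illustration under stated assumptions, not the A2Cloud implementation: the feature bands ("high"/"low" score and cost), the instance names, the function names, and the add-one smoothing are all hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Build the frequency tables for a categorical Naive Bayes classifier."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> frequency
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def posteriors(model, row):
    """Return the normalized posterior probability of each class."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total  # prior probability of the class
        for i, value in enumerate(row):
            # Likelihood with add-one smoothing (an assumption of this sketch).
            p *= (feature_counts[label][i][value] + 1) / (n + len(values[i]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: p / norm for label, p in scores.items()}


def recommend(model, row):
    """Pick the class with the highest posterior probability."""
    post = posteriors(model, row)
    return max(post, key=post.get)


# Hypothetical training data: (score band, cost band) -> recommended instance.
rows = [("high", "low"), ("high", "low"), ("high", "high"),
        ("low", "low"), ("low", "high"), ("low", "high")]
labels = ["instance_a", "instance_a", "instance_a",
          "instance_b", "instance_b", "instance_b"]
model = train_nb(rows, labels)
best = recommend(model, ("high", "low"))
```

The smoothing keeps a feature value that was never observed with a class from forcing that class's posterior to zero; whether the thesis applies any smoothing is not stated in the abstract.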
296	Traveling Salesman Problem with Single Truck and Multiple Drones for Delivery Purposes Rahmani, Hoda 23 September 2019 No description available. Engineering Industrial Engineering Drone studies drone technology drone delivery hybrid truck-drone delivery system drone route scheduling traveling salesman problem p-median problem genetic algorithms k-means clustering last-mile delivery
297	Help Document Recommendation System Vijay Kumar, Keerthi, Mary Stanly, Pinky January 2023 Help documents are important in an organization for using the technology applications licensed from a vendor. Customers and internal employees frequently use and interact with the help documents section to use the applications and learn about their new features and developments. Help documents consist of various knowledge base materials, question and answer documents and help content. In day-to-day life, customers go through these documents to set up, install or use the product. Recommending similar documents to the customers can increase customer engagement with the product and can also help them proceed without any hurdles. The main aim of this study is to build a recommendation system by exploring different machine-learning techniques to recommend the most relevant and similar help document to the user. To achieve this, the study proposes a hybrid recommendation system for help documents in which documents are recommended based on similarity of content, using content-based filtering, and on similarity between users, using collaborative filtering. Finally, the recommendations from content-based filtering and collaborative filtering are combined and ranked to form a comprehensive list of recommendations. The proposed approach is evaluated by the internal employees of the company and by external users. Our experimental results demonstrate that the proposed approach is feasible and provides an effective way to recommend help documents. Document similarity Recommender systems content-based filtering collaborative filtering Non-Negative Matrix Factorisation (NMF) cosine similarity K-means clustering Computer Sciences
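The combine-and-rank step described in record 297 can be sketched as follows. The abstract does not say how the two recommendation lists are merged, so the weighted sum, the `weight` parameter, the document names, the toy term vectors and the collaborative scores below are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_rank(content_scores, collab_scores, weight=0.5):
    """Rank documents by a weighted sum of the two score dictionaries."""
    docs = set(content_scores) | set(collab_scores)
    combined = {
        d: weight * content_scores.get(d, 0.0)
        + (1 - weight) * collab_scores.get(d, 0.0)
        for d in docs
    }
    # Highest combined score first; ties are broken by document name.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical term vectors for three help documents.
doc_vectors = {
    "install_guide": [1, 1, 1],
    "setup_faq": [1, 0, 1],
    "release_notes": [0, 0, 1],
}
current = [1, 1, 0]  # vector of the document the user is reading

# Content-based score: similarity to the document being read.
content = {name: cosine(current, vec) for name, vec in doc_vectors.items()}
# Collaborative score: hypothetical values standing in for user-based output.
collab = {"install_guide": 0.2, "setup_faq": 0.9, "release_notes": 0.4}

ranking = hybrid_rank(content, collab)
```

Setting `weight=1.0` reduces the ranking to pure content-based filtering and `weight=0.0` to pure collaborative filtering, which makes it easy to compare the hybrid list against each component on its own.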
298	Color Naming, Multidimensional Scaling, and Unique Hue Selections in English and Somali Speakers Do Not Show a Whorfian Effect Lange, Ryan January 2015 No description available. Psychology Behavioral Psychology Cognitive Psychology Experimental Psychology Linguistics color perception color cognition Whorfian linguistic relativity color term evolution Basic Color Terms unique hues multidimensional scaling color naming color categories consensus analysis gap statistic k-means procrustes perceptual modeling
299	The Myth of Incentive-Based Sales Strategies: an Empirical Analysis Contradicting Prevailing Theories using Data Mining and Hypothesis Testing Techniques Liang, Yidan (Nickia) January 2023 In recent decades, the use of incentive-based reward programs to foster customer loyalty and promote sales has become prevalent in various industries. While these strategies are widely accepted and implemented, there is a significant gap in empirical studies to ascertain their real-world effectiveness. This thesis undertakes a comprehensive examination of the effectiveness of an online business's reward program, using data from the past five years and employing data mining techniques, including the RFM (Recency, Frequency, Monetary) model and clustering algorithms; hypothesis tests are employed to further strengthen the conclusions drawn. Contrary to popular theories, the findings reveal that small incentives such as rewards did not induce significant changes in customer purchasing behavior, nor did they effectively boost sales among rewarded customers. A control group of non-rewarded top-class customers showed more robust purchasing patterns. These unexpected results challenge existing beliefs and call for a critical re-evaluation of current practices in sales promotion and customer loyalty. The research underscores the need for empirically grounded strategies, further exploration into alternative loyalty-building methods, and a recognition of the complex realities influencing customer engagement. Reward Program Reward Evaluation Data Mining RFM Customer Segmentation Clustering K-Means Hypothesis Testing One-Sided T-Test Marketing Effectiveness Incentive-Based Sales Strategy Sales Promotion Business Administration
300	Improving Knowledge of Truck Fuel Consumption Using Data Analysis Johnsen, Sofia, Felldin, Sarah January 2016 The large potential of big data and how it has brought value into various industries have been established in research. Since big data has such large potential if handled and analyzed in the right way, revealing information to support decision making in an organization, this thesis is conducted as a case study at an automotive manufacturer with access to large amounts of customer usage data of their vehicles. The reason for performing an analysis of this kind of data is based on the cornerstones of Total Quality Management with the end objective of increasing customer satisfaction with the concerned products or services. The case study includes a data analysis exploring whether and how patterns about what affects fuel consumption can be revealed from aggregated customer usage data of trucks linked to truck applications. Based on the case study, conclusions are drawn about how a company can use this type of analysis as well as how to handle the data in order to turn it into business value. The data analysis reveals properties describing truck usage using Factor Analysis and Principal Component Analysis. One property in particular is concluded to be important, as it appears in the results of both techniques. Based on these properties the trucks are clustered using k-means and Hierarchical Clustering, which reveals groups of trucks where the importance of the properties varies. Due to the homogeneity and complexity of the chosen data, the clusters of trucks cannot be linked to truck applications. This would require data that is more easily interpretable. Finally, the importance for fuel consumption in the clusters is explored using model estimation. A comparison of Principal Component Regression (PCR) and the two regularization techniques Lasso and Elastic Net is made. PCR results in poor models that are difficult to evaluate.
The two regularization techniques, however, outperform PCR, both giving a higher and very similar explained variance. The three techniques do not show obvious similarities in the models and no conclusions can therefore be drawn concerning what is important for fuel consumption. During the data analysis many problems with the data are discovered, which are linked to managerial and technical issues of big data. This means, for example, that some of the parameters of interest for the analysis cannot be used, which is likely to contribute to the inability to get consistent results in the model estimations. It is also concluded that the data was not originally intended for this type of analysis of large populations, but rather for testing and engineering purposes. Nevertheless, this type of data still contains valuable information and can be used if managed in the right way. From the case study it can be concluded that in order to use the data for more advanced analysis a big-data plan is needed at a strategic level in the organization. The plan summarizes the suggested solution for the managerial issues of the big data for the organization. This plan describes how to handle the data, how the analytic models revealing the information should be designed and the tools and organizational capabilities needed to support the people using the information. big data Total Quality Management trucks automotive fuel consumption logged vehicle data customer usage data aggregated data big-data plan big data management data analysis data mining Principal Component Analysis Factor Analysis k-means Clustering Hierarchical Clustering Principal Component Regression Regularization Lasso Elastic Net
