  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Efficient Temporal Synopsis of Social Media Streams

Abouelnagah, Younes January 2013 (has links)
Search and summarization of streaming social media, such as Twitter, requires the ongoing analysis of large volumes of data with dynamically changing characteristics. Tweets are short and repetitious -- lacking context and structure -- making it difficult to generate a coherent synopsis of events within a given time period. Although some established algorithms for frequent itemset analysis might provide an efficient foundation for synopsis generation, the unmodified application of standard methods produces a complex mass of rules, dominated by common language constructs and many trivial variations on topically related results. Moreover, these results are not necessarily specific to events within the time period of interest. To address these problems, we build upon the Linear time Closed itemset Mining (LCM) algorithm, which is particularly suited to the large and sparse vocabulary of tweets. LCM generates only closed itemsets, providing an immediate reduction in the number of trivial results. To reduce the impact of function words and common language constructs, we apply a filtering step that preserves these terms only when they may form part of a relevant collocation. To further reduce trivial results, we propose a novel strengthening of the closure condition of LCM to retain only those results that exceed a threshold of distinctiveness. Finally, we perform temporal ranking, based on information gain, to identify results that are particularly relevant to the time period of interest. We evaluate our work over a collection of tweets gathered in late 2012, exploring the efficiency and filtering characteristics of each processing step, both individually and collectively. Based on our experience, the resulting synopses from various time periods provide understandable and meaningful pictures of events within those periods, with potential application to tasks such as temporal summarization and query expansion for search.
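The closed-itemset distinction this abstract builds on can be sketched in a few lines. This is a brute-force toy, not LCM itself, and the tweet term sets are invented for illustration: an itemset is closed if no proper superset occurs in exactly the same transactions, which is why keeping only closed itemsets prunes trivial variants.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all frequent itemsets (toy scale only)."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                result[frozenset(cand)] = support
    return result

def closed_only(freq):
    """Keep an itemset only if no proper superset has the same support."""
    return {
        s: sup for s, sup in freq.items()
        if not any(s < t and freq[t] == sup for t in freq)
    }

# Hypothetical tweet term sets, purely for illustration.
tweets = [
    {"storm", "power", "outage"},
    {"storm", "power", "outage"},
    {"storm", "warning"},
    {"power", "outage"},
]
freq = frequent_itemsets(tweets, min_support=2)
closed = closed_only(freq)
```

Here seven itemsets are frequent but only three are closed, so the closed-only view already discards most of the redundant variants, which is the effect the abstract exploits.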
32

Frequent pattern analysis for decision making in big data / Dažnų sekų analizė sprendimų priėmimui labai didelėse duomenų bazėse

Pragarauskaitė, Julija 01 July 2013 (has links)
Huge amounts of digital information are stored in the world today, and the amount increases by quintillions of bytes every day. Approximate data mining algorithms are very important for dealing efficiently with such amounts of data, given the computation speed required by various real-world applications, whereas exact data mining methods tend to be slow and are best employed where precise results are of the highest importance. This thesis focuses on several data mining tasks related to the analysis of big data: frequent pattern mining and visual representation. For mining frequent patterns in big data, three novel approximate methods are proposed and evaluated on real and artificial databases: • The Random Sampling Method (RSM) creates a random sample of the original database and classifies sequences as frequent or rare based on the analysis of the random sample. A significant benefit is a theoretical estimate of the classification errors made by this method, obtained using standard statistical methods. • The Multiple Re-sampling Method (MRM) is an improved version of RSM with a re-sampling strategy that decreases the probability of incorrectly classifying sequences as frequent or rare. • The Markov Property Based Method (MPBM) relies upon the Markov property. MPBM requires reading the original database several times (a number equal to the order of the Markov process) and then calculates the empirical frequencies using the Markov property. For visual representation, online shopper behaviour data was used and analyzed... [to full text]
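The sampling idea behind RSM can be illustrated with a minimal sketch. This is not the thesis's method (which comes with theoretical error estimates); it only shows the core step of classifying candidate itemsets as frequent or rare from a random sample of the database. All names and data are assumptions for illustration.

```python
import random

def rsm_estimate(transactions, candidates, min_support, sample_frac=0.5, seed=0):
    """Classify candidate itemsets as frequent or rare using only a
    random sample of the database (a toy sketch of sampling-based
    approximate frequent pattern mining)."""
    rng = random.Random(seed)
    n = max(1, int(len(transactions) * sample_frac))
    sample = rng.sample(transactions, n)
    verdicts = {}
    for cand in candidates:
        # Relative support estimated on the sample, not the full database.
        sup = sum(1 for t in sample if set(cand) <= t) / n
        verdicts[frozenset(cand)] = sup >= min_support
    return verdicts
```

With a smaller `sample_frac` the scan gets cheaper at the cost of possible misclassification near the threshold; the re-sampling in MRM addresses exactly that risk by drawing several samples.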
34

Discovering Neglected Conditions in Software by Mining Program Dependence Graphs

CHANG, RAY-YAUNG January 2009 (has links)
No description available.
35

Himmelsk lojalitet? : En fallstudie av SAS EuroBonus inverkan på frekventa resenärer / Sky high loyalty? : A case study of SAS EuroBonus’ impact on frequent flyers

Persson Hagström, Hjalmar, Meinert, Magnus January 2024 (has links)
Airlines face the challenge of designing loyalty programs in a market where nearly all competitors do the same things to retain customers. This challenge is exacerbated by an economic climate in which airlines must balance cost reductions with offering attractive benefits to customers. With this in mind, this case study aims to analyze how an airline's loyalty program affects the loyalty of frequent travelers when choosing airlines, as well as which benefits travelers value most. The aim of the study was investigated through a survey and interviews.
The study focuses specifically on the airline Scandinavian Airlines' loyalty program EuroBonus and on which benefits its frequent travelers prioritize. The conclusions indicate that EuroBonus reinforces respondents' loyalty to SAS and that benefits are both appreciated and influence the choice of airline. However, factors such as price and availability can also affect airline choice. To increase loyalty, SAS should differentiate benefit management based on members' needs; open communication and transparency are also crucial. Concrete suggestions such as increasing available seats for bonus trips and focusing on social benefits can enhance loyalty. Future research could explore how different benefits affect different customer segments to optimize the effectiveness of loyalty programs and better understand customer needs.
36

Efficient Frequent Closed Itemset Algorithms With Applications To Stream Mining And Classification

Ranganath, B N 09 1900 (has links)
Data mining is concerned with finding valid, novel, potentially useful, and ultimately understandable abstractions in data. Frequent itemset mining is one of the important data mining approaches for finding those abstractions in the form of patterns. Frequent closed itemsets provide complete and condensed information for non-redundant association rule generation. For many applications, mining all frequent itemsets is unnecessary, and mining frequent closed itemsets is adequate. Compared to frequent itemset mining, frequent closed itemset mining generates fewer itemsets and therefore improves the efficiency and effectiveness of these tasks. Much recent research addresses closed itemset mining, but mainly for traditional databases, where multiple scans are needed and, whenever new transactions arrive, additional scans must be performed on the updated transaction database; such methods are therefore unsuitable for data stream mining. Mining frequent itemsets from data streams has many potential and broad applications. Emerging data stream applications that require association rule mining include network traffic monitoring and web click stream analysis. Unlike data in traditional static databases, data streams typically arrive continuously, at high speed, in huge volumes, and with changing data distributions. This raises new issues that must be considered when developing association rule mining techniques for stream data. Recent work on data stream mining based on the sliding window method slides the window by one transaction at a time, but when the window size is large and the support threshold is low, existing methods consume significant time and lead to a large increase in user response time. In our first work, we propose a novel algorithm, Stream-Close, based on the sliding window model, to mine frequent closed itemsets from data streams within the current sliding window.
We enhance the scalability of the algorithm by introducing several optimization techniques, such as sliding the window by multiple transactions at a time, and novel pruning techniques that lead to a considerable reduction in the number of candidate itemsets examined during closure checking. Our experimental studies show that the proposed algorithm scales well with large data sets. Still, the notion of frequent closed itemsets generates a huge number of closed itemsets in some applications. This drawback makes frequent closed itemset mining infeasible in many applications, since users cannot interpret the large volume of output (which can exceed the size of the data itself when the support threshold is low), and it may require extra applications that post-process the output of the original algorithm to reduce its size. Recent work on clustering of itemsets considers strictly either the expression (the items present in an itemset) or the support of the itemsets, or partially both, to reduce the number of itemsets. The drawback of these approaches is that, in some situations, the number of itemsets does not decrease because of their restricted view of considering either expressions or support. We therefore propose a new notion of frequent itemsets, called clustered itemsets, which considers both the expressions and the support of the itemsets in summarizing the output. We introduce a new distance measure with respect to expressions and also prove that the problem of mining clustered itemsets is NP-hard. In our second work, we propose a deterministic locality sensitive hashing based classifier using clustered itemsets. Locality sensitive hashing (LSH) is a technique for efficiently finding a nearest neighbour in high dimensional data sets.
The idea of locality sensitive hashing is to hash the points using several hash functions so that, for each function, the probability of collision is much higher for objects that are close to each other than for those that are far apart. We propose an LSH based approximate nearest neighbour classification strategy. The problem with LSH is that it chooses hash functions randomly, and evaluating a large number of hash functions can increase query time. From a classification point of view, since LSH chooses randomly from a family of hash functions, buckets may contain points belonging to other classes, which can hurt classification accuracy. To overcome these problems, we propose class association rule based hash functions, which ensure that buckets corresponding to the class association rules contain points from the same class. However, associative classification involves generating and examining a large number of candidate class association rules, so we use clustered itemsets, which reduce the number of class association rules to be examined. We also establish a formal connection between the clustering parameter (delta, used in the generation of clustered frequent itemsets) and discriminative measures such as information gain. Our experimental studies show that the proposed method achieves an increase in accuracy over the LSH based near neighbour classification strategy.
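The generic LSH-for-classification idea the abstract starts from can be sketched with random-hyperplane hashing: points on the same side of every hyperplane collide in a bucket, and a query point is labelled by the majority class of its bucket. This is only the baseline LSH strategy, not the thesis's class-association-rule hash functions; all data and names here are illustrative assumptions.

```python
import random

def make_hyperplanes(dim, n_planes, seed=7):
    """Random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

def lsh_key(point, planes):
    """Signature: which side of each hyperplane the point falls on."""
    return tuple(1 if sum(w * x for w, x in zip(p, point)) >= 0 else 0
                 for p in planes)

def build_buckets(points, labels, planes):
    """Hash labelled training points into buckets by signature."""
    buckets = {}
    for pt, lab in zip(points, labels):
        buckets.setdefault(lsh_key(pt, planes), []).append(lab)
    return buckets

def classify(point, buckets, planes, default=None):
    """Majority label of the colliding bucket, if any."""
    labs = buckets.get(lsh_key(point, planes))
    if not labs:
        return default
    return max(set(labs), key=labs.count)
```

The weakness the abstract targets is visible here: nothing stops a bucket from mixing classes, since the hyperplanes are chosen at random rather than from discriminative rules.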
37

Kundlojalitet : En studie om hur långsiktiga kundrelationer skapas på en tjänstemarknad / Customer loyalty : A study of how long-term customer relationships are created in a service market

Hyytiäinen, Josephine, Braatz, Anna January 2010 (has links)
Purpose: The study aims to describe, from a company's point of view, how strategic work with customer loyalty is done. Conclusion: One possible way to create loyalty is to exceed expectations, whether or not the customer is a member of the loyalty program. A strong brand with a good reputation can also contribute to stronger loyalty. Reward systems reward customers gradually, since they first need to collect points, or in this case miles, to be able to use them later on. The staff is often educated in service in order to interact with the customer in a professional manner. One way to interact with members of the loyalty program is through profiles on the internet. Working with loyalty programs is a way for companies to gain loyal customers.
39

Εξόρυξη και διαχείριση κανόνων συσχέτισης με χρήση τεχνικών ανάκτησης πληροφορίας / Mining and management of association rules using information retrieval techniques

Βαρσάμης, Θεόδωρος 11 June 2013 (has links)
In a world flooded with data, it becomes necessary to organize data efficiently and to process it subsequently, so as to find and retrieve information for decision making. As part of this effort, various studies have been published that aim to discover relationships among data, which can reveal previously unknown dependencies and allow the forecasting and prediction of future results and decisions. In this work we study the most widespread algorithms for mining association rules and then propose a scheme that uses inverted files as the basic structure for retrieving information from transaction databases. Our goal is the easy generation of association rules between items, based on the efficient storage and retrieval of Frequent Itemsets. We first focus on how to find and store a minimal set of transactions, exploiting the information contained in Closed Frequent Itemsets and Maximum Frequent Itemsets (MFI). Then, exploiting the information stored in the MFI at minimal computational cost, we propose the MFI-drive algorithm, which answers superset and subset queries over itemsets, as well as queries for itemsets with a predefined degree of similarity to a given set.
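The inverted-file approach the abstract describes can be sketched minimally: map each item to the ids of the stored itemsets containing it, then answer superset queries by intersecting posting lists. This is a generic sketch of the data structure, not the MFI-drive algorithm itself; the itemsets below are invented.

```python
def build_inverted_index(itemsets):
    """Map each item to the ids of the stored itemsets that contain it."""
    index = {}
    for sid, s in enumerate(itemsets):
        for item in s:
            index.setdefault(item, set()).add(sid)
    return index

def supersets_of(query, index, itemsets):
    """Stored itemsets containing every item of `query`:
    intersect the posting lists of the query's items."""
    postings = [index.get(item, set()) for item in query]
    if not postings:
        return list(range(len(itemsets)))
    return sorted(set.intersection(*postings))

def subsets_of(query, itemsets):
    """Stored itemsets fully contained in `query` (linear-scan sketch)."""
    q = set(query)
    return sorted(i for i, s in enumerate(itemsets) if set(s) <= q)
```

The superset query touches only the posting lists of the query's items, which is the efficiency argument for inverted files; the subset query is left as a naive scan here.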
40

Získávání znalostí z obchodních procesů / Business Process Mining

Skácel, Jan January 2015 (has links)
This thesis explains business process mining and its principles. A substantial part is devoted to the problem of process discovery. Further, based on the analysis of a specific manufacturing process, three methods are proposed to identify shortcomings in the process. The first discovers the manufacturing process and renders it as a graph. The second method uses a simulator of production history to find products that may have caused delays in the process; the acquired data are used to mine frequent itemsets. The third method tries to predict processing time at a selected workplace using association rules. The last two methods employ the Frequent Pattern Growth algorithm. The knowledge obtained in this thesis improves the efficiency of the manufacturing process and enables better production planning.
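The frequent-itemsets-to-association-rules step this abstract relies on can be sketched as follows. For brevity this uses brute-force support counting rather than FP-Growth (which avoids candidate enumeration via a prefix tree), and the production data is invented; rules of the form antecedent -> single consequent are kept when they clear support and confidence thresholds.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_conf):
    """Mine simple association rules (antecedent -> single consequent)
    from frequent itemsets found by brute-force counting (toy scale)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    support = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            c = sum(1 for t in transactions if set(cand) <= t)
            if c / n >= min_support:
                support[frozenset(cand)] = c / n
    rules = []
    for s, sup in support.items():
        if len(s) < 2:
            continue
        for consequent in s:
            antecedent = s - {consequent}
            # confidence = support(whole itemset) / support(antecedent)
            conf = sup / support[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, consequent, round(conf, 3)))
    return rules
```

In the prediction setting the abstract describes, the consequent would be a discretized processing-time class and the antecedent a set of product or workplace attributes.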
