1 |
Segmenting, Summarizing and Predicting Data Sequences
Chen, Liangzhe 19 June 2018
Temporal data is ubiquitous and arises in many applications. Consider the extensively studied social media website Twitter. All of its information can be associated with time stamps, and thus forms different types of data sequences: a sequence of feature values of users who retweet a message, a sequence of tweets from a certain user, or a sequence of evolving friendship networks. Mining these data sequences is an important task that reveals patterns in the sequences. It is also a very challenging task, as it usually requires different techniques for different sequences. The problem becomes even more complicated when the sequences are correlated.
In this dissertation, we study the following two types of data sequences, and we show how to carefully exploit within-sequence and across-sequence correlations to develop more effective and scalable algorithms.
1. Multi-dimensional value sequences: We study sequences of multi-dimensional values, where each value is associated with a time stamp. Such value sequences arise in many domains such as epidemiology (medical records), social media (keyword trends), etc. Our goals are: for individual sequences, to find a segmentation of the sequence to capture where the pattern changes; for multiple correlated sequences, to use the correlations between sequences to further improve our segmentation; and to automatically find explanations of the segmentation results.
2. Social media post sequences: Driven by applications from popular social media websites such as Twitter and Weibo, we study the modeling of social media post sequences. Our goal is to understand how the posts (like tweets) are generated and how we can gain understanding of the users behind these posts. For individual social media post sequences, we study a prediction problem to find the users' latent state changes over the sequence. For dependent post sequences, we analyze the social influence among users, and how it affects users in generating posts and links.
Our models and algorithms lead to useful discoveries, and they solve real problems in Epidemiology, Social Media and Critical Infrastructure Systems. Further, most of the algorithms and frameworks we propose can be extended to solve sequence mining problems in other domains as well. / Ph. D. /
Temporal data is ubiquitous and arises in many applications. Consider the extensively studied social media website Twitter. All of its information can be associated with time stamps, and thus forms different types of data sequences: a sequence of feature values of users who retweet a message, a sequence of tweets from a certain user, or a sequence of evolving friendship networks. Mining these data sequences is an important task that reveals patterns in the sequences and helps downstream tasks such as data compression and visualization. It is also a very challenging task, as it usually requires different techniques for different sequences. The problem becomes even more complicated when the sequences are correlated.
In this dissertation, we first study value sequences, where objects in the sequence are multidimensional data values, and move to text sequences, where each object in the sequence is a text document (like a tweet). For each of these data sequences, we study them either as independent individual sequences, or as a group of dependent sequences. We then show how to carefully exploit different types of correlations behind the sequences to develop more effective and scalable algorithms.
Our models and algorithms lead to useful discoveries, and they solve real problems in Epidemiology, Social Media and Critical Infrastructure Systems. Further, most of the algorithms and frameworks we propose can be extended to solve sequence mining problems in other domains as well.
|
2 |
End-user service composition from a social networks analysis perspective
MAARADJI, Abderrahmane 02 December 2011
Service composition arose from the need to make information systems more flexible and open. Service-Oriented Architecture (SOA) has become the reference architecture for applications, driven by the growth of the Web. Information systems can now expose interfaces through the Web, which has increased the number of available Web services. At the same time, with the emergence of Web 2.0, service composition has moved toward Web users with limited technical skills. These end users, often called Generation Y, participate in, create, share and comment on content through the Web. This shift is embodied in the Mashup paradigm and in Mashup editors such as Yahoo! Pipes. The paradigm has brought service composition to the end-user community, enabling users to meet their own needs, for instance by creating applications that do not yet exist. Web 2.0 has also brought a social dimension, allowing users to interact either directly through online social networks or indirectly by sharing or modifying content or adding metadata. In this context, this thesis aims to support the evolving concept of service composition. Its main contribution is the introduction of the social dimension into the process of building a composite service in environments dedicated to end users. This concept treats the activity of composing services (creating a Mashup) as a social activity, one that reveals social links between users based on their similarity in selecting and combining services. These links can serve to disseminate the expertise that users accumulate when composing services. In other words, when a user is editing a Mashup, dynamic recommendations are proposed based on frequent composition patterns and on similarity between users.
These recommendations aim to complete the initial part of the Mashup already entered by the user. The concept has been explored through two approaches: (i) step-by-step Mashup completion, which recommends a single service at each step, and (ii) full Mashup completion, which recommends the whole sequence of services that could complete the Mashup. Beyond promoting the integration of the social dimension into the service composition process, this thesis addresses a particular constraint on the recommendation system: the response-time requirements of interactive systems. To this end, we have developed robust algorithms adapted to the specifics of our problem. Since a composite service is treated as a sequence of basic services, finding similarities between users amounts to first finding frequent patterns (subsequences) and then representing them in a data structure suited to the recommendation algorithm. The proposed algorithm, FESMA, meets these requirements using the FSTREE structure, with interesting results compared to the prior art. Finally, to implement the proposed algorithms and methods, we have developed a Mashup creation framework called Social Composer (SoCo). This framework, dedicated to end users, first implements abstraction and usability requirements through a workflow-based graphical environment. It also implements all the mechanisms needed to deploy a composed service starting from an abstract description entered by the user. More importantly, SoCo has been extended with the dynamic recommendation functionality, thereby demonstrating the feasibility of the concept.
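The two completion modes described in this abstract (one service per step, or the whole remaining sequence) can be illustrated with a small sketch. This is not FESMA or the FSTREE structure, whose details the abstract does not give; it assumes a plain prefix tree that counts how often each service sequence was composed by earlier users, and the names `TrieNode`, `build_trie`, `recommend_next` and `complete_mashup`, as well as the service names in the usage note, are hypothetical.

```python
class TrieNode:
    """One node per service; count = number of past mashups sharing this prefix."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_trie(mashups):
    """Build a counted prefix tree from past mashups (each a list of service names)."""
    root = TrieNode()
    for services in mashups:
        node = root
        for service in services:
            node = node.children.setdefault(service, TrieNode())
            node.count += 1
    return root


def _walk(root, prefix):
    """Follow prefix from the root; return the reached node or None if unseen."""
    node = root
    for service in prefix:
        node = node.children.get(service)
        if node is None:
            return None
    return node


def recommend_next(root, prefix):
    """Step-by-step completion: most frequent service seen after this prefix."""
    node = _walk(root, prefix)
    if node is None or not node.children:
        return None
    # sorted() makes ties deterministic (alphabetically first wins).
    return max(sorted(node.children), key=lambda s: node.children[s].count)


def complete_mashup(root, prefix):
    """Full completion: repeat the step-by-step recommendation until none is left."""
    completion = []
    current = list(prefix)
    while True:
        nxt = recommend_next(root, current)
        if nxt is None:
            return completion
        completion.append(nxt)
        current.append(nxt)
```

For instance, with the past mashups `["rss", "filter", "map"]` (twice), `["rss", "filter", "email"]` and `["geo", "map"]`, a user who has placed only `"rss"` would be offered `"filter"` as the next step, or `["filter", "map"]` as a full completion. A lookup costs time proportional to the prefix length, which is one plausible way to respect the interactive response-time constraint the abstract mentions.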
|
3 |
Frequent sequence mining on longitudinal data: Segregation of Swedish employees
Hietala, Isak January 2015
This thesis is based on longitudinal data of the Swedish population provided by Statistics Sweden and is conducted on behalf of the Institute for Analytical Sociology. The focus is on investigating the effectiveness of a frequent sequence mining method called constrained Sequential PAttern Discovery using Equivalence classes (cSPADE). The method is applied to data on segregation within workplaces, specifically the reasons for Swedish employees moving to more segregated workplaces. The thesis found that no unique pattern of age, gender, education, unemployment, income, workplace size or foreignness index explains why a Swedish employee moves to a more segregated workplace. In evaluating the algorithm, it was found that either the number of observations needs to be smaller or the algorithm needs to be altered to reduce the processing time for this specific data set.
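To make the idea of a constrained frequent sequence concrete, the sketch below counts the support of an ordered pattern across a set of sequences, with an optional maximum gap between consecutive matched items. It is only an illustration of support counting under a gap constraint, not the cSPADE algorithm itself (which works on vertical id-lists and equivalence classes); the function names and the career-state labels in the usage note are hypothetical.

```python
def occurs(sequence, pattern, max_gap=None):
    """Return True if pattern occurs in order within sequence.

    If max_gap is given, consecutive matched items must be at most
    max_gap positions apart in the sequence.
    """
    if not pattern:
        return True
    # Positions where the pattern prefix matched so far can end.
    ends = [i for i, item in enumerate(sequence) if item == pattern[0]]
    for symbol in pattern[1:]:
        next_ends = []
        for j, item in enumerate(sequence):
            if item != symbol:
                continue
            # Keep j if some earlier end position is close enough to extend.
            if any(e < j and (max_gap is None or j - e <= max_gap) for e in ends):
                next_ends.append(j)
        ends = next_ends
        if not ends:
            return False
    return bool(ends)


def support(sequences, pattern, max_gap=None):
    """Fraction of sequences that contain the pattern under the gap constraint."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if occurs(seq, pattern, max_gap))
    return hits / len(sequences)
```

With three made-up sequences, `["school", "unemployed", "job_A", "job_B"]`, `["school", "job_A", "job_B"]` and `["job_A", "school", "job_B"]`, the pattern `["school", "job_B"]` has support 1.0 without a constraint and 2/3 with `max_gap=2`. Each support query scans every sequence, which hints at why processing time grows with the number of observations.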
|
4 |
Event Sequence Identification and Deep Learning Classification for Anomaly Detection and Predication on High-Performance Computing Systems
Li, Zongze 12 1900
High-performance computing (HPC) systems continue to grow in both scale and complexity. These large-scale, heterogeneous systems generate tens of millions of log messages every day. Effective log analysis for understanding system behaviors and identifying system anomalies and failures is highly challenging. Existing log analysis approaches process messages line by line. They are not effective at discovering subtle behavior patterns and their transitions, and thus may overlook some critical anomalies. In this dissertation research, I propose a system log event block detection (SLEBD) method that accurately and automatically extracts the log messages belonging to a component or system event into an event block (EB). At the event level, we can discover new event patterns, the evolution of system behavior, and the interactions among different system components. For finding critical event sequences, existing sequence mining methods are mostly based on the Apriori algorithm, which is compute-intensive and runs for a long time. I develop a novel topology-aware sequence mining (TSM) algorithm that efficiently generates sequence patterns from the extracted event block lists. I also train a long short-term memory (LSTM) model to cluster the sequences that precede specific events. With the generated sequence patterns and the trained LSTM model, we can predict whether an event is going to occur normally or not. To accelerate such predictions, I propose a design flow that converts recurrent neural network (RNN) designs into register-transfer level (RTL) implementations deployed on FPGAs. Thanks to their high parallelism and low power consumption, FPGAs achieve greater speedup and better energy efficiency than CPUs and GPUs in our experiments.
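The contrast between line-by-line processing and event-level analysis can be illustrated with a toy grouping step. The sketch below is not the SLEBD method, which the abstract does not specify; it simply assumes that messages from the same component that arrive within a fixed time gap belong to one event block. The `EventBlock` type, the `group_into_blocks` function, the `max_gap` threshold and the log tuple layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EventBlock:
    """A run of log messages from one component, close together in time."""
    component: str
    start: float
    end: float
    messages: list = field(default_factory=list)


def group_into_blocks(log, max_gap=5.0):
    """Group (timestamp, component, message) tuples into event blocks.

    Assumes log is sorted by timestamp. A message joins the latest block of
    its component unless more than max_gap seconds have passed since that
    block's last message, in which case a new block is started.
    """
    latest = {}   # component -> most recent block for that component
    blocks = []   # all blocks, in order of their first message
    for timestamp, component, message in log:
        block = latest.get(component)
        if block is None or timestamp - block.end > max_gap:
            block = EventBlock(component, timestamp, timestamp)
            latest[component] = block
            blocks.append(block)
        block.end = timestamp
        block.messages.append(message)
    return blocks
```

The resulting list of blocks, ordered by start time, is the kind of event-level sequence on which a sequence mining or LSTM stage could then operate, instead of on individual log lines.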
|
5 |
End-user service composition from a social networks analysis perspective / La composition de service pour les utilisateurs finaux, basée sur l'analyse des réseaux sociaux
Maaradji, Abderrahmane 02 December 2011
The service paradigm is omnipresent in the new information and communication technologies, so much so that one speaks of a science of services. Web services are defined within the framework of service-oriented architectures (SOA), which distinguish the service provider, the service registry, and the service consumer. This distinction makes it possible to create new services by composing existing ones. However, service composition mainly benefits experienced users such as software developers, because it requires a high level of technical skill. By contrast, the current trend reflected in the emergence of Web 2.0 aims to let Web users create their own services through Mashup environments, or collaborate and capitalize on knowledge through social networks and social media. We believe there is great potential to "democratize" service composition in such contexts. The emergence of Web 2.0, based on paradigms such as user-generated content (UGC, Mashups) and the social Web, is an interesting opportunity to improve end users' productivity in creating services and to speed up their creative process by capitalizing on the knowledge generated by all users. In this context, this thesis aims to support the evolution of the concept of service composition through significant contributions. The main contribution is the introduction of the social dimension into the process of building a composite service in environments dedicated to end users. This concept treats the activity of composing services (creating a Mashup) as a social activity. This activity reveals social links between users based on their similarity in choosing and combining services.
These links make it possible to disseminate service composition expertise. In other words, based on frequent composition patterns and on similarity between users, dynamic recommendations are offered to a user who is editing a Mashup. These recommendations aim to complete the first part of the Mashup already put in place by the user. The concept has been explored through (i) step-by-step Mashup completion, recommending a single service at each step, and (ii) full Mashup completion, recommending the complete sequence of services that could complete it. Beyond introducing the social dimension into the service composition process, this thesis addresses a particular constraint on the recommendation system, linked to the response-time requirements of interactive systems. To this end, we have developed robust algorithms adapted to the specifics of our problem. Since a composite service is treated as a sequence of services, finding similarities between users amounts to first finding frequent patterns and then representing them in a data structure suited to the recommendation algorithm. The proposed algorithm, FESMA, meets these requirements by relying on the FSTREE structure and offers interesting results compared to the prior art. Finally, to implement the proposed algorithms and methods, we have developed a Mashup creation environment called 'Social Composer' (SoCo). This environment, dedicated to end users, meets usability criteria by relying on a graphical workflow. It also implements all the mechanisms needed to deploy the composed service from an abstract description entered by the user. In addition, SoCo has been extended with the dynamic recommendation functionality, demonstrating the feasibility of the concept.
|
6 |
Abstraction et comparaison de traces d'exécution pour l'analyse d'applications multimédias embarquées / Abstraction and comparison of execution traces for analysis of embedded multimedia applications
Kamdem Kengne, Christiane 05 December 2014
The SoC-Trace project aims to develop a set of methods and tools based on execution traces of multicore embedded applications, in order to meet the growing needs for observability and 'debuggability' in industry. In particular, the project targets the development of new analysis methods built on data analysis techniques such as probabilistic analysis, data mining and data aggregation. These methods should enable the automatic identification of anomalies, the analysis of complex correlations and dependencies between the components of an embedded application, and control over the large volume of traces, which can now exceed a gigabyte. The aim of the thesis is to provide a high-level, semantics-based representation of the information contained in traces. The first step is to develop an efficient tool for comparing traces and to define a semantic distance suited to traces; the second is to analyze and interpret the results of trace comparisons based on the defined distance.
|
7 |
對於閱讀的感興趣程度與眼動特徵關係之研究 / The Research on the Relationship between Interesting Degree of Reading and Eye Movement Features
王加元, Wang, Jia Yuan Unknown Date
Much research has examined the relationship between eye movement trajectories and human cognition, including comprehension and degree of interest; eye movements during reading are the most frequently discussed and studied topic. The purpose of this research is to investigate whether a relationship exists between a reader's eye movements during reading and the reader's degree of interest. / Rather than analyzing the eye movement data on each area of interest (AOI), as is usual in eye movement analysis, this research treats the eye movement data as sequences and applies data mining methods to find the segments of the eye movement sequence whose features discriminate between degrees of interest. / Through this analysis of eye movement trajectories, we hope the results can in future be applied in information retrieval as an effective form of implicit feedback, improving the performance of existing information retrieval.
|