Global ETD Search

91	Human Activity Analytics Based on Mobility and Social Media Data Paraskevopoulos, Pavlos January 2017 (has links) The development of social networks such as Twitter, Facebook and Google+ allows users to share their beliefs, feelings, or observations with their circles of friends. Based on these data, a range of applications and techniques has been developed, aiming to provide a better quality of life to the users. Nevertheless, the quality of results of geolocation-aware applications is significantly restricted by the tiny percentage of social media data that is geotagged (2% for Twitter). Hence, increasing this percentage is an important and challenging problem. Moreover, information extracted from social media data can be complemented by the analysis of mobile phone usage data, in order to provide further insights on human activity patterns. In this thesis, we present a novel method for analyzing and geolocalizing non-geotagged Twitter posts. The proposed method is the first to do so at the fine grain of city neighborhoods, while being both effective and time efficient. Our method is based on the extraction of representative keywords for each candidate location, as well as the analysis of the tweet volume time series. We also describe a system built on top of our method, which geolocalizes tweets and allows users to visually examine the results and their evolution over time. Our system allows the user to get a better idea of how the activity of a particular location changes and which keywords are the most important, as well as to geolocalize individual tweets of interest. Moreover, we study the activity and mobility characteristics of the users that post geotagged tweets and compare the mobility of users who attended the event with a random set of users. Interestingly, the results of this analysis indicate that a very small number of users (i.e., fewer than 35 users in this study) is able to represent the mobility patterns present in the entire dataset.
Finally, we study the call activity and mobility patterns, clustering the observed behaviors that exhibited similar characteristics, and characterizing the anomalous behaviors. We analyzed a Call Detail Record (CDR) dataset, containing (aggregated) information on the calls among mobile phones. Employing density-based algorithms and statistical analysis, we developed a framework that identifies abnormal locations, as well as abnormal time intervals. The results of this work can be used for early identification of exceptional situations, monitoring the effects of important events in urban and transportation planning, and others. Settore INF/01 - Informatica
92	Video Scene Understanding: Semantic-based representation, Temporal Variation Modeling, Multi-Task Learning Rostamzadeh, Negar January 2017 (has links) One of the major research topics in computer vision is automatic video scene understanding, where the ultimate goal is to build artificial intelligence systems comparable with humans in understanding video contents. Automatic video scene understanding covers many applications including (i) semantic functional complex scene categorization, (ii) human body-pose estimation in videos, (iii) human fine-grained daily living action recognition, and (iv) video retrieval and genre recognition. In this thesis, we introduce computer vision and pattern analysis techniques that outperform the state of the art of the above-mentioned applications on some publicly available datasets. Our major research contributions towards automatic video scene understanding are (i) introducing an efficient approach to combine low and high-level information content of videos, (ii) modeling temporal variation of frame-based descriptors in videos, and (iii) proposing a multitask learning framework to leverage the huge amount of unlabeled videos. The first category covers a method for enriching visual words that contain local motion information but lack information about the cause of the motion. Our proposed approach embeds the source of a generated motion in video descriptors and hence induces some semantic information in the employed visual words in the pattern analysis task. Our approach is validated on traffic scene analysis as well as human body pose estimation applications. When employing an already-trained off-the-shelf model over an unseen dataset, the accuracy of the model usually drops significantly.
We present an approach that considers low-level cues, such as the optical flow in the foreground of a video, to make an already-trained, off-the-shelf pictorial deformable model work well on body pose estimation for an unseen dataset. The second category covers methods that induce temporal variation information to video descriptors. Many video descriptors are based on global video representations, where frame-based descriptors are combined into a unified video descriptor without preserving much of the temporal information content. To include the temporal information content in video descriptors, we introduce a descriptor, namely, the Hard and Soft Cluster Encoding. The descriptor includes how similar frames are distributed over a video timespan. We show that our approach yields significant improvements on the human fine-grained daily living action recognition task. The third category includes a novel Multi-Task Clustering (MTC) approach to leverage the information of unlabeled videos. Our proposed method targets the human fine-grained daily living action recognition application. People tend to perform similar activities in similar environments. Therefore, a proper clustering approach could determine patterns of fine-grained activities during some learning process. Rather than clustering the data of each individual separately, our proposed MTC approach captures more generic patterns across users over the training data and hence leads to remarkable recognition rates. Finally, we discuss opportunities for future applications of our research and conclude with a summary of our contributions to video understanding. Settore INF/01 - Informatica
93	Speech Adaptation Modeling for Statistical Machine Translation Ruiz, Nicholas January 2017 (has links) Spoken language translation (SLT) exists within one of the most challenging intersections of speech and natural language processing. While machine translation (MT) has demonstrated its effectiveness on the translation of textual data, the translation of spoken language remains a challenge, largely due to the mismatch between the training conditions of MT and the noisy signal that is output by an automatic speech recognition (ASR) system. In the interchange between ASR and MT, errors propagated from noisy speech recognition outputs may become compounded, rendering the speech translation unintelligible. Additionally, aspects such as stylistic differences between written and spoken registers can lead to the generation of inadequate translations. This scenario is predominantly caused by a mismatch between the training conditions of ASR and MT. Due to the lack of training data that couples speech audio with translated transcripts, MT systems in the SLT pipeline must rely predominantly on textual data that does not represent well the characteristics of spoken language. Likewise, independence assumptions between sentences result in ASR and MT systems that do not yield consistent outputs. In this thesis, we develop techniques to overcome the mismatch between speech and textual data by improving the robustness of the MT system. Our work can be divided into three parts. First, we analyze the effects that the difference between spoken and written registers has on SLT quality. We additionally introduce a data analysis methodology to measure the impact of ASR errors on translation quality.
Secondly, we propose several approaches to improve the MT component's tolerance of noisy ASR outputs: by adapting its models based on the bilingual statistics of each sentence's neighboring context, and through the introduction of a process by which textual resources can be transformed into synthetic ASR data to use when training a speech-centric MT system. In particular, we focus on the translation from spoken English to French and German -- the two parent languages of English -- and demonstrate that information about the types and frequency of ASR errors can improve the robustness of machine translation for SLT. Finally, we introduce and motivate several challenges in spoken language translation with neural machine translation models that are specific to their modeling architecture. Settore INF/01 - Informatica
94	Machine Learning for Investigating Post-Transcriptional Regulation of Gene Expression Corrado, Gianluca January 2017 (has links) RNA binding proteins (RBPs) and non-coding RNAs (ncRNAs) are key actors in post-transcriptional gene regulation. By being able to bind messenger RNA (mRNA) they modulate many regulatory processes. In recent years, the increasing interest in this level of regulation has favored the development of many NGS-based experimental techniques to detect RNA-protein interactions, and the consequent release of a considerable amount of interaction data on a growing number of eukaryotic RBPs. Despite the continuous advances in the experimental procedures, these techniques are still far from fully uncovering, on their own, the global RNA-protein interaction system. For instance, the available interaction data still covers a small fraction (less than 10%) of the known human RBPs. Moreover, experimentally determined interactions are often noisy and cell-line dependent. Importantly, obtaining genome-wide experimental evidence of combinatorial interactions of RBPs is still an experimental challenge. Machine learning approaches are able to learn from the data and generalize the information contained in them. This might give useful insights to help the investigation of post-transcriptional regulation. In this work, three machine learning contributions are proposed. They aim at addressing the three above-mentioned shortcomings of the experimental techniques, to help researchers unveil some as yet uncharacterized aspects of post-transcriptional gene regulation. The first contribution is RNAcommender, a tool capable of suggesting RNA targets for unexplored RBPs at a genome-wide level. RNAcommender is a recommender system that propagates the available interaction data, considering biologically relevant aspects of the RNA-protein interactions, such as protein domains and RNA predicted secondary structure.
The second contribution is ProtScan, a tool that models RNA-protein interactions at single-nucleotide resolution. Learning models from experimentally determined interactions makes it possible to denoise the data and to make predictions of the RBP binding preferences in conditions that are different from those of the experiment. The third and last contribution is PTRcombiner, a tool that unveils the combinatorial aspects of post-transcriptional gene regulation. It extracts clusters of mRNA co-regulators from the interaction annotations, and it automatically provides a biological analysis that might supply a functional characterization of the set of mRNAs targeted by a cluster of co-regulators, as well as of the binding dynamics of different RBPs belonging to the same cluster. Settore INF/01 - Informatica
95	Corrective Evolution of Adaptable Process Models Sirbu, Adina Iulia January 2013 (has links) Modeling business processes is a complex and time-consuming task, which can be simplified by allowing process instances to be structurally adapted at runtime, based on context (e.g., by adding or deleting activities). The process model then no longer needs to include a handling procedure for every exception that can occur. Instead, it only needs to include the assumptions under which a successful execution is guaranteed. If a design-time assumption is violated, the exception handling procedure matching the context is selected at runtime. However, if runtime structural adaptation is allowed, the process model may later need to be updated based on the logs of adapted process instances. Evolving the process model is necessary if adapting at run-time is too costly, or if certain adaptations fail and should be avoided. An issue that is insufficiently addressed in the previous work on process evolution is how to evolve a process model and also ensure that the evolved process model continues to achieve the goal of the original model. We refer to the problem of evolving a process model based on selected instance adaptations, such that the evolved model satisfies the goal of the original model, as corrective evolution. Automated techniques for solving the corrective evolution problem are necessary for two reasons. First, the more complex a process model is, the more difficult it is to be changed manually. Second, there is a need to verify that the evolved model satisfies the original goal. To develop automated techniques, we first formalize the problem of corrective evolution. Since we use a graph-based representation of processes, a key element in our formal model is the notion of trace. When plugging an instance adaptation at a particular point in the process model, there can be multiple paths in the model for reaching this point. 
Each of these paths is uniquely identified by a trace, i.e., a recording of the activities executed up to that point. Depending on traces, an instance adaptation can be used to correct the process model in three different ways. A correction is strict if the adaptation should be plugged in on a precise trace, relaxed if on all traces, and relaxed with conditions if on a subset of all traces. The choice is driven by competing concerns: the evolved model should not introduce untested behavior, but it should also remain understandable. Using our formal model, we develop automated techniques for solving the corrective evolution problem in two cases. The first case is also the most restrictive, when all corrections are strict. This case does not require verification, since the process model and adaptations are assumed to satisfy the goal, as long as the adaptations are applied on the corresponding traces. The second case is when corrections are either strict or relaxed. This second case requires verification, and for this reason we develop an automated technique based on planning. We implemented the two automated techniques as tools, which are integrated into a common toolkit. We used this toolkit to evaluate the tradeoffs between applying strict and relaxed corrections on a scenario built on a real event log. Settore INF/01 - Informatica
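The three kinds of correction described in entry 95 can be illustrated with a minimal sketch. It is not the thesis's formal model or toolkit: the `Correction` class, its field names, and the activity names are hypothetical, and representing a strict correction as a set holding exactly one trace is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# A trace is the sequence of activities executed up to the plug-in point.
Trace = Tuple[str, ...]


@dataclass(frozen=True)
class Correction:
    """Hypothetical record of one instance adaptation used as a correction."""
    point: str                                  # activity where the adaptation is plugged in
    adaptation: str                             # description of the adaptation
    traces: Optional[FrozenSet[Trace]] = None   # None means "all traces" (assumption)

    def kind(self) -> str:
        # Relaxed: applies on every trace reaching the point.
        if self.traces is None:
            return "relaxed"
        # Strict: applies on one precise trace (modelled here as a singleton set).
        if len(self.traces) == 1:
            return "strict"
        # Otherwise it applies on a chosen subset of traces.
        return "relaxed with conditions"

    def applies_to(self, trace: Trace) -> bool:
        return self.traces is None or trace in self.traces


strict = Correction("ship", "add customs check", frozenset({("order", "pay")}))
relaxed = Correction("ship", "add customs check")

print(strict.kind(), strict.applies_to(("order", "invoice")))    # strict correction, other trace
print(relaxed.kind(), relaxed.applies_to(("order", "invoice")))  # relaxed correction, any trace
```

In this sketch a strict correction leaves every other path to the same point untouched, which reflects the abstract's concern about not introducing untested behavior, while a relaxed correction keeps the model simpler at the cost of requiring verification.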
96	Distributed Contact and Identity Management Hume Llamosas, Alethia Graciela January 2014 (has links) Contact management is a twofold problem involving a local and a global level, where the separation between them is rather fuzzy. Locally, users need to deal with contact management, which refers to a local need to store, organize, keep up to date, and find information that will allow them to contact or reach other people, organizations, etc. Globally, users deal with identity management, which refers to peers having multiple identities (i.e., profiles) and the need to stay in control of them. In other words, they should be able to manage what information is shared and with whom. We believe many existing applications try to deal with this problem looking only at the data level and without analyzing the underlying complexity. Our approach focuses on the complex social relations and interactions between users, identifying three main subproblems: (i) management of identity, (ii) search, and (iii) privacy. The solution we propose concentrates on the models that are needed to address these problems. In particular, we propose a Distributed Contact Management System (DCM System) that: (i) models and represents the knowledge of peers about physical or abstract objects through the notion of entities that can be of different types (e.g., locations, people, events, facilities, organizations, etc.) and are described by a set of attributes; (ii) by representing contacts as entities, allows peers to locally organize their contacts taking into consideration the semantics of the contact’s characteristics; and (iii) by describing peers as entities, allows them to manage their different identities in the network, by sharing different views of themselves (showing possibly different information) with different people.
The contributions of this thesis are: (i) the definition of a reference architecture that allows dealing with the diversity in relation to the partial view that peers have of the world, (ii) an approach to search entities based on identifiers, (iii) an approach to search entities based on descriptions, and (iv) the definition of the DCM system that instantiates the previously mentioned approaches and architecture to address concrete usage scenarios. Settore INF/01 - Informatica
97	Multimodal Recognition of Social Behaviors and Personality Traits in Small Group Interaction Lepri, Bruno January 2009 (has links) In recent years, the automatic analysis of human behaviour has been attracting an increasing amount of attention from researchers because of its important applicative aspects and its intrinsic scientific interest. In many technological fields (pervasive and ubiquitous computing, multimodal interaction, ambient assisted living and assisted cognition, computer supported collaborative work, user modelling, automatic visual surveillance, etc.) the awareness is emerging that systems can provide better and more appropriate services to people only if they can understand much more than they presently do about users’ attitudes, preferences, personality, etc., as well as about what people are doing, the activities they have previously been engaged in, etc. At the same time, progress on sensors, sensor networking, computer vision, audio analysis and speech recognition is making available the building blocks for automatic behavioural analysis. Multimodal analysis (the joint consideration of several perceptual channels) is a powerful tool to extract large and varied amounts of information from the acoustical and visual scene and from other sensing devices (e.g., RFIDs, on-body accelerometers, etc.). In this thesis, we consider small group meetings as a challenging example and case study of real life situations in which the multimodal analysis of social signals can be used to extract relevant information about the group and about individuals. In particular, we show how the same type of social signals can be used to reconstruct apparently disparate and diverse aspects of social and individual life ranging from the functional roles played by the participants in a meeting, to static characteristics of individuals (personality traits) and behavioural outcomes (task performance). Settore INF/01 - Informatica
98	Distributed Identity Management Pane Fernandez, Juan Ignacio January 2012 (has links) Semantics is a local and a global problem at the same time. Local because it is in the minds of people who have personal interpretations, and global because we need to reach a common understanding by sharing and aligning these personal interpretations. As opposed to current state-of-the-art approaches based on a two-layer architecture (local and global), we deal with this problem by designing a general three-layer architecture rooted in the personal, social, and universal levels. The new intermediate social level acts as a global level for the personal level, where semantics is managed around communities focusing on specific domains, and as local for the universal level as it only deals with one part of universal knowledge. For any of these layers there are three main components of knowledge that help us encode the semantics at the right granularity. These are: i) Concrete knowledge, which allows us to achieve semantic compatibility at the level of entities, the things we want to talk about; ii) Schematic knowledge, which defines the structure and methods of the entities; and iii) Background knowledge, which enables compatibility at the language level used to describe and structure entities. The contribution of this work is threefold: i) the definition of a general architecture for managing semantics of entities, ii) the development of components of the system based on the architecture; these are structure preserving semantic matching and sense induction algorithms, and iii) the evaluation of these components with the creation of new gold standard datasets. Settore INF/01 - Informatica
99	Energy Adaptive Infrastructure for Sustainable Cloud Data Centres Dupont, Corentin January 2016 (has links) With rising concerns about the environment, ICT equipment has been pointed out as a major and ever-rising source of energy consumption and pollution. Among ICT equipment, data centres obviously play a major role with the rise of the Cloud computing paradigm. In recent years, researchers have focused on reducing the energy consumption of data centres. Furthermore, future environmentally friendly data centres are also expected to prioritize the usage of renewable energies over brown energies. However, managing the energy consumption within a data centre is challenging because data centres are complex facilities that support a huge variety of hardware, computing styles and SLAs. Those may evolve through time as user requirements can change rapidly. Furthermore, unlike non-renewable energy sources, the availability of renewable energies is very volatile and time dependent: e.g. solar power is obtainable only during the day, and is subject to variations due to meteorological conditions. The goal in this case is to shift the workload of running applications according to the forecasted availability of renewable energy. In this thesis we propose a flexible framework called Plug4Green, able to reduce the energy consumption of a Cloud data centre. Plug4Green is based on the Constraint Programming paradigm, allowing it to take into account a great number of constraints regarding energy, hardware and SLAs in data centres. We also propose the concept of an energy adaptive software controller (EASC), able to augment the usage of renewable energies in data centres. The EASC supports two kinds of applications: service-oriented and task-oriented applications; and two kinds of computing environments: Infrastructure as a Service and Platform as a Service.
We evaluated our solutions in several trials executed in the testbeds of Milan and Trento, Italy. Results show that Plug4Green was able to reduce the power consumption by 27% in the Milan trial, while the EASC was able to augment the renewable energy percentage by 7.07pp in the Trento trial. Settore INF/01 - Informatica
100	Effective Analysis, Characterization, and Detection of Malicious Activities on the Web Eshete, Birhanu Mekuria January 2013 (has links) The Web has evolved from a handful of static web pages to billions of dynamic and interactive web pages. This evolution has positively transformed the paradigm of communication, trading, and collaboration for the benefit of humanity. However, these invaluable benefits of the Web are shadowed by cyber-criminals who use the Web as a medium to perform malicious activities motivated by illegitimate benefits. Cyber-criminals often lure victims to visit malicious web pages, exploit vulnerabilities on victims’ devices, and then launch attacks that could lead to: stealing invaluable credentials of victims, downloading and installation of malware on victims’ devices, or complete compromise of victims’ devices to mount future attacks. While the current state of the art in detecting malicious web pages is promising, it is still limited in addressing the following three problems. First, for the sake of focused detection of certain classes of malicious web pages, existing techniques are limited to partial analysis and characterization of attack payloads. Secondly, attacker-motivated and benign evolution of web page artifacts have challenged the resilience of existing detection techniques. The third problem is the prevalence and evolution of Exploit Kits used in spreading web-borne malware. In this dissertation, we present the approaches and the tools we developed to address these problems. To address the partial analysis and characterization of attack payloads, we propose a holistic and lightweight approach that combines static analysis and minimalistic emulation to analyze and detect malicious web pages.
This approach leverages features from URL structure, HTML content, JavaScript executed on the client, and reputation of URLs on social networking websites to train multiple models, which are then used in a confidence-weighted majority vote classifier to detect unknown web pages. Evaluation of the approach on a large corpus of web pages shows that the approach not only is precise enough in detecting malicious web pages with very low false signals but also performs detection with a minimal performance penalty. To address the evolution of web page artifacts, we propose an evolution-aware approach that tunes detection models in line with the evolution of web page artifacts. Our approach takes advantage of evolutionary searching and optimization using a Genetic Algorithm to decide the best combination of features and learning algorithms, i.e., models, as a function of detection accuracy and false signals. Evaluation of our approach suggests that it reduces false negatives by about 10% on a fairly large testing corpus of web pages. To tackle the prevalence of Exploit Kits on the Web, we first analyze source code and runtime behavior of several Exploit Kits in a contained setting. In addition, we analyze the behavior of live Exploit Kits on the Web in a contained environment. Combining the analysis results, we characterize Exploit Kits with respect to their attack-centric and self-defense behaviors. Based on these behaviors, we draw distinguishing features to train classifiers used to detect URLs that are hosted by Exploit Kits. The evaluation of our classifiers on an independent testing dataset shows that our approach is effective in precisely detecting malicious URLs linked with Exploit Kits with very low false positives. Settore INF/01 - Informatica
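Entry 100 mentions a confidence-weighted majority vote over several trained models. The sketch below shows one common reading of that idea, in which each model's vote counts in proportion to its confidence. It is an illustration under assumptions, not the thesis's implementation: the function name, the `(label, confidence)` input format, and the example scores are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def confidence_weighted_vote(predictions: Iterable[Tuple[str, float]]) -> str:
    """Return the label whose summed confidence across models is largest.

    `predictions` holds one (label, confidence) pair per model; this input
    format is an assumption made for the sketch.
    """
    totals = defaultdict(float)
    for label, confidence in predictions:
        totals[label] += confidence
    if not totals:
        raise ValueError("at least one prediction is required")
    # Iterate labels in sorted order so that exact ties resolve deterministically.
    return max(sorted(totals), key=lambda label: totals[label])


# Two low-confidence "benign" votes against one high-confidence "malicious" vote.
votes = [("malicious", 0.95), ("benign", 0.40), ("benign", 0.45)]
print(confidence_weighted_vote(votes))
```

With these made-up scores, a plain majority vote would pick "benign" (two votes to one), while the weighted vote picks "malicious" because 0.95 exceeds 0.40 + 0.45.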
