Global ETD Search

141	Effective Analysis, Characterization, and Detection of Malicious Activities on the Web. Eshete, Birhanu Mekuria. January 2013. The Web has evolved from a handful of static web pages to billions of dynamic and interactive web pages. This evolution has positively transformed the paradigm of communication, trading, and collaboration for the benefit of humanity. However, these invaluable benefits of the Web are overshadowed by cyber-criminals who use the Web as a medium for malicious activities motivated by illegitimate gain. Cyber-criminals often lure victims to visit malicious web pages, exploit vulnerabilities on the victims' devices, and then launch attacks that can lead to the theft of valuable credentials, the download and installation of malware on victims' devices, or the complete compromise of victims' devices to mount future attacks. While the current state of the art in detecting malicious web pages is promising, it is still limited in addressing the following three problems. First, because they focus on detecting certain classes of malicious web pages, existing techniques are limited to partial analysis and characterization of attack payloads. Second, both attacker-motivated and benign evolution of web page artifacts have challenged the resilience of existing detection techniques. Third, the Exploit Kits used to spread web-borne malware are prevalent and keep evolving. In this dissertation, we present the approaches and the tools we developed to address these problems. To address the partial analysis and characterization of attack payloads, we propose a holistic and lightweight approach that combines static analysis and minimalistic emulation to analyze and detect malicious web pages.
This approach leverages features from URL structure, HTML content, JavaScript executed on the client, and the reputation of URLs on social networking websites to train multiple models, which are then combined in a confidence-weighted majority vote classifier to classify unknown web pages. Evaluation on a large corpus of web pages shows that the approach detects malicious web pages precisely, with very few false signals, and at a minimal performance penalty. To address the evolution of web page artifacts, we propose an evolution-aware approach that tunes detection models in line with the evolution of web page artifacts. Our approach uses evolutionary search and optimization with a Genetic Algorithm to decide the best combination of features and learning algorithms, i.e., models, as a function of detection accuracy and false signals. Evaluation of our approach suggests that it reduces false negatives by about 10% on a fairly large testing corpus of web pages. To tackle the prevalence of Exploit Kits on the Web, we first analyze the source code and runtime behavior of several Exploit Kits in a contained setting. In addition, we analyze the behavior of live Exploit Kits on the Web in a contained environment. Combining the analysis results, we characterize Exploit Kits with respect to their attack-centric and self-defense behaviors. From these behaviors, we derive distinguishing features to train classifiers that detect URLs hosted by Exploit Kits. Evaluation of our classifiers on an independent testing dataset shows that our approach precisely detects malicious URLs linked with Exploit Kits, with very few false positives. Sector INF/01 - Computer Science
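The abstract does not say how the per-feature-family models are weighted, so the following is only a minimal sketch of a confidence-weighted majority vote, assuming each model (for example, hypothetical URL, HTML, JavaScript, and social-reputation models) returns a label together with a confidence score in [0, 1]. Each vote counts with its confidence as weight, and the label with the largest accumulated weight wins.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_majority_vote(votes: Iterable[Tuple[str, float]]) -> str:
    """Return the label with the largest summed confidence.

    `votes` holds one (label, confidence) pair per model; confidences are
    assumed to lie in [0, 1]. Ties go to the alphabetically first label so
    the result is deterministic.
    """
    totals: Dict[str, float] = defaultdict(float)
    for label, confidence in votes:
        totals[label] += confidence          # each model votes with its confidence
    if not totals:
        raise ValueError("at least one vote is required")
    # max() keeps the first maximal element of the sorted labels on ties
    return max(sorted(totals), key=lambda label: totals[label])


if __name__ == "__main__":
    # Hypothetical outputs of four feature-family models for one web page
    votes = [("malicious", 0.9), ("benign", 0.6), ("benign", 0.55), ("malicious", 0.4)]
    print(weighted_majority_vote(votes))
```

With these votes the function returns `"malicious"` (1.3 versus 1.15) even though the raw vote count is tied at two each, which is the point of weighting by confidence.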
142	Efficient Automated Security Analysis of Complex Authorization Policies. Truong, Anh. January 2015. Access control is becoming increasingly important for today's ubiquitous systems. Sophisticated security requirements must be enforced by authorization policies for increasingly large and complex applications. As a consequence, designers need to understand such policies and ensure that they meet the desired security constraints, while administrators must maintain them so as to comply with the evolving needs of systems and applications. These tasks are greatly complicated by the expressiveness and size of the authorization policies. It is thus necessary to provide policy designers and administrators with automated analysis techniques capable of foreseeing whether, and under what conditions, security properties may be violated. Some such techniques have already been proposed in the literature for Role-Based Access Control (RBAC) policies. RBAC is a security model for access control that has been widely adopted in real-world applications. Although RBAC simplifies the design and management of policies, modifying RBAC policies in complex organizations is a difficult and error-prone activity due to the limited expressiveness of the basic RBAC model. For this reason, RBAC has been extended in several directions, such as Administrative RBAC (ARBAC) and Temporal RBAC (TRBAC), to accommodate various needs arising in the real world. This dissertation presents our research efforts to find the best trade-off between scalability and expressiveness in the design and benchmarking of analysis techniques for authorization policies.
We review the state of the art of automated analysis for authorization policies, identify limitations of available techniques, and then describe our approach, which builds on recently developed symbolic model checking techniques using Satisfiability Modulo Theories (SMT) solving (for expressiveness) and carefully tuned heuristics (for scalability). In particular, we present the implementation of these techniques for the automated analysis of ARBAC and ATRBAC policies and discuss extensive experiments showing that the proposed approach is superior to other state-of-the-art analysis techniques. Finally, we discuss directions for extensions. Sector INF/01 - Computer Science
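To make the ARBAC analysis problem concrete, the following sketch checks user-role reachability: can a user, starting from an initial role set, acquire a goal role through administrative `can_assign` and `can_revoke` actions? It is deliberately a naive explicit-state breadth-first search, not the SMT-based symbolic model checking described above. It assumes that administrators holding the required admin roles are always available, so administrative preconditions are dropped; the rule encoding (`CanAssign` with positive and negative preconditions) is a hypothetical simplification, and temporal constraints (as in ATRBAC) are ignored.

```python
from collections import deque
from typing import FrozenSet, Iterable, NamedTuple


class CanAssign(NamedTuple):
    positive: FrozenSet[str]   # roles the user must already hold
    negative: FrozenSet[str]   # roles the user must not hold
    target: str                # role granted when the rule fires


def role_reachable(initial: Iterable[str], goal: str,
                   can_assign: Iterable[CanAssign], can_revoke: Iterable[str]) -> bool:
    """Breadth-first search over the user's role sets (explicit-state, exponential)."""
    rules = list(can_assign)
    revocable = set(can_revoke)
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    while queue:
        roles = queue.popleft()
        if goal in roles:
            return True
        # Assignments whose preconditions hold in the current role set
        successors = [roles | {r.target} for r in rules
                      if r.positive <= roles and not (r.negative & roles)
                      and r.target not in roles]
        # Revocations of roles the user currently holds
        successors += [roles - {role} for role in revocable if role in roles]
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For instance, with hypothetical rules "anyone may become `employee`" and "`employee` without `intern` may become `manager`", an `intern` can reach `manager` only if `intern` is revocable. The explored state space can grow exponentially with the number of roles, which illustrates the scalability concern that the symbolic techniques and heuristics are meant to address.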
143	On Efficient Algorithms for Stochastic Simulation of Biochemical Reaction Systems. Vo, Hong Thanh. January 2013. Computational techniques provide invaluable tools for developing a quantitative understanding of the complexity of biological systems. The knowledge of the biological system under study is formalized in a precise form by a model. A simulation algorithm realizes the dynamic interactions encoded in the model. The simulation can uncover biological implications and suggest further predictive experiments. Several successful approaches with different levels of detail have been introduced to deal with various biological pathways, including regulatory networks, metabolic pathways, and signaling pathways. The stochastic simulation algorithm (SSA), in particular, is an exact method to realize the time evolution of a well-mixed biochemical reaction network. It takes the inherent randomness of biological reactions and the discrete nature of the molecular species involved into account when sampling reaction events. SSA is useful for reaction networks with low populations of molecular species, especially key species. When such species are involved in reactions, the macroscopic response can be significantly affected, both quantitatively and qualitatively. Even though the underlying assumptions of SSA are clearly simplifications of real biological networks, it has been shown to be capable of reproducing the stochastic effects in biological behaviour. Essentially, SSA uses a Monte Carlo simulation technique to realize the temporal behaviour of a biochemical network. A search procedure randomly selects the next reaction to fire, with probability proportional to its propensity. The fired reaction leads the system to a new configuration. In this new configuration, reactions have to update their propensities to reflect the changes. In this thesis we investigate new algorithms for improving the performance of SSA.
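The Monte Carlo loop described in this entry corresponds to Gillespie's direct method. The following is a minimal sketch, assuming a model given as one stoichiometry vector and one propensity function per reaction; the reversible isomerization A <-> B in the example and its rate constants are hypothetical. The `select_linear` helper is the linear search whose cost the tree-based search targets, and all propensities are recomputed after every firing (no dependency graph), which is the update cost the later contributions aim to reduce.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Propensity = Callable[[Sequence[int]], float]


def select_linear(a: Sequence[float], r: float) -> int:
    """Linear search: smallest j whose cumulative propensity exceeds r (0 <= r < sum(a))."""
    acc = 0.0
    for j, aj in enumerate(a):
        acc += aj
        if r < acc:
            return j
    # Floating-point guard: fall back to the last reaction with nonzero propensity
    return max(j for j, aj in enumerate(a) if aj > 0.0)


def ssa_direct(x0: Sequence[int], stoich: Sequence[Sequence[int]],
               propensities: Sequence[Propensity], t_end: float,
               rng: random.Random) -> List[Tuple[float, Tuple[int, ...]]]:
    """Gillespie direct method; returns the (time, state) pairs after each firing."""
    t = 0.0
    x = list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = [f(x) for f in propensities]          # full recomputation every step
        a0 = sum(a)
        if a0 <= 0.0:
            break                                  # no reaction can fire any more
        tau = -math.log(1.0 - rng.random()) / a0   # waiting time ~ Exp(a0)
        if t + tau > t_end:
            break
        j = select_linear(a, rng.random() * a0)    # reaction j w.p. a[j] / a0
        x = [xi + s for xi, s in zip(x, stoich[j])]
        t += tau
        trajectory.append((t, tuple(x)))
    return trajectory


if __name__ == "__main__":
    # Hypothetical model: A -> B with rate 1.0 * A, B -> A with rate 0.5 * B
    stoich = [(-1, 1), (1, -1)]
    props = [lambda x: 1.0 * x[0], lambda x: 0.5 * x[1]]
    traj = ssa_direct((100, 0), stoich, props, 5.0, random.Random(42))
    print(len(traj) - 1, "firings, final state", traj[-1][1])
```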
First, we study the application of tree-based search to improve the selection of the next reaction firing, and devise a solution to optimize the average search length. We prove that with a tree-based search the performance of SSA can be significantly improved, reducing the search from linear to logarithmic time complexity. We combine this idea with others from the literature and compare the performance of our algorithm with previous ones. Our experiments show that our algorithm is faster, especially on large models. Second, we focus on reducing the cost of propensity updates. Although the computational cost of evaluating one reaction propensity is small, the cumulative cost over a large number of reactions accounts for a significant portion of the simulation time. Typical experiments show that propensity updates account for 65% to 85%, and in some special cases up to 99%, of the total simulation time, even when a dependency graph is applied. Moreover, the kinetics is sometimes modeled with a complex propensity formula, further increasing the cost of propensity updates. We propose a new exact simulation algorithm, called RSSA (Rejection-based SSA), to reduce the cost of propensity updates. RSSA uses an over-approximation of propensities to select a reaction firing, and the exact propensity value is evaluated only when needed. Thus, propensity updates are postponed and collapsed as much as possible. We show through experiments that our algorithm significantly reduces the number of propensity updates and hence substantially improves the simulation time. Third, we extend our study to reaction-diffusion processes, where the simulation must explicitly account for the diffusion of species in space. Compartment-based reaction-diffusion simulation divides the space into subvolumes such that each subvolume is well-mixed. The diffusion of a species between subvolumes is modelled as an additional unimolecular reaction.
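The rejection-based selection idea of RSSA can be sketched as a thinning step on top of the direct method. The assumptions here are ours and not necessarily RSSA's exact formulation: each species count is bounded by a fluctuation interval of relative width `delta` (a hypothetical parameter), propensities are nondecreasing in every species count (as for mass-action kinetics) so that evaluating them at the interval's ends gives lower and upper bounds, and the bounds are recomputed only when the state leaves the interval. A candidate is drawn from the upper bounds and accepted cheaply if a uniform draw falls below its lower bound; only otherwise is its exact propensity evaluated.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Propensity = Callable[[Sequence[int]], float]


def fluctuation_interval(x: Sequence[int], delta: float) -> Tuple[List[int], List[int]]:
    """Integer box [low, high] around state x."""
    low = [max(0, math.floor(v * (1.0 - delta))) for v in x]
    high = [math.ceil(v * (1.0 + delta)) for v in x]
    return low, high


def pick(weights: Sequence[float], r: float) -> int:
    """Smallest j whose cumulative weight exceeds r (linear search)."""
    acc = 0.0
    for j, w in enumerate(weights):
        acc += w
        if r < acc:
            return j
    return max(j for j, w in enumerate(weights) if w > 0.0)  # floating-point guard


def rssa(x0: Sequence[int], stoich: Sequence[Sequence[int]],
         propensities: Sequence[Propensity], t_end: float,
         rng: random.Random, delta: float = 0.1) -> Tuple[List[int], int, int]:
    """Return (final state, accepted firings, exact propensity evaluations)."""
    x, t, fired, exact_evals = list(x0), 0.0, 0, 0
    low, high = fluctuation_interval(x, delta)
    a_low = [f(low) for f in propensities]    # valid lower bounds while x is in the box
    a_high = [f(high) for f in propensities]  # valid upper bounds while x is in the box
    while True:
        a0_high = sum(a_high)
        if a0_high <= 0.0:
            break                              # every true propensity is zero
        # Every trial, accepted or rejected, consumes an Exp(a0_high) waiting time.
        t += -math.log(1.0 - rng.random()) / a0_high
        if t > t_end:
            break
        j = pick(a_high, rng.random() * a0_high)   # candidate from the upper bounds
        u = rng.random() * a_high[j]
        if u >= a_low[j]:                          # cheap lower-bound test inconclusive
            exact_evals += 1
            if u >= propensities[j](x):            # rejection: state unchanged
                continue
        x = [xi + s for xi, s in zip(x, stoich[j])]
        fired += 1
        if any(not lo <= xi <= hi for xi, lo, hi in zip(x, low, high)):
            low, high = fluctuation_interval(x, delta)  # state left the box: rebuild bounds
            a_low = [f(low) for f in propensities]
            a_high = [f(high) for f in propensities]
    return x, fired, exact_evals
```

Because the upper bounds dominate the true propensities while the state stays inside the box, this thinning samples trajectories from the same distribution as the direct method; the `exact_evals` counter shows how rarely the exact propensity has to be computed.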
We propose a new algorithm, called Rejection-based Reaction Diffusion (RRD), to efficiently simulate such reaction-diffusion systems. RRD combines the tree-based search and the idea of RSSA to select the next reaction firing in a subvolume. The main advantage of RRD over previous algorithms is that the selection of both the subvolume and the reaction uses only the over-approximation of propensities. We prove the correctness of RRD and experimentally show its performance improvement over other compartment-based approaches in the literature. Finally, we focus on the statistical analysis of a target event by stochastic simulation. A direct application of SSA is to generate trajectories and count the number of successful ones. Rare events, which occur only with a very small probability, make this approach infeasible, since a prohibitively large number of trajectories would need to be generated before the estimate becomes reasonably accurate. We propose a new method, called splitting SSA (sSSA), to improve the accuracy and efficiency of stochastic simulation for this problem. Essentially, sSSA is a biased simulation that encourages the system to evolve toward the target event, yet in a way that allows one to recover an unbiased probability estimate. We compare the performance and accuracy of sSSA and SSA in several concrete scenarios. Experimental results show that sSSA is more efficient than the naive SSA approach. Sector INF/01 - Computer Science
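The tree-based search mentioned in this entry (and reused by RRD) can be sketched with a complete binary sum tree: leaves hold reaction propensities and each internal node holds the sum of its children, so both selecting a reaction and updating one propensity take O(log M) steps for M reactions. This generic layout is an assumption; the thesis additionally optimizes the average search length, which the sketch does not attempt.

```python
from typing import List, Sequence


class SumTree:
    """Complete binary tree over propensities (leaves) and their partial sums."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.size = 1
        while self.size < len(weights):
            self.size *= 2                      # pad the leaf level to a power of two
        self.tree: List[float] = [0.0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = w
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def total(self) -> float:
        return self.tree[1]                     # root holds the total propensity a0

    def update(self, index: int, weight: float) -> None:
        """Set one propensity and refresh the sums on its root-to-leaf path."""
        node = self.size + index
        self.tree[node] = weight
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def search(self, r: float) -> int:
        """Smallest index j whose cumulative weight exceeds r, for 0 <= r < total()."""
        node = 1
        while node < self.size:
            left = self.tree[2 * node]
            if r < left:
                node = 2 * node                 # target lies in the left subtree
            else:
                r -= left                       # skip the left subtree's mass
                node = 2 * node + 1
        return node - self.size
```

After a reaction fires, only the leaves of reactions whose propensities changed need an `update` call, each of which touches a single root-to-leaf path.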
144	Formal failure analyses for effective fault management: an aerospace perspective. Bittner, Benjamin. January 2016. The possibility of failures is a reality that all modern complex engineering systems need to deal with. In this dissertation we consider two techniques to analyze the nature and impact of faults on system dynamics, which is fundamental to managing them reliably. Timed failure propagation analysis studies how, and how fast, faults propagate through the physical and logical parts of a system. We develop formal techniques to validate and automatically generate representations of such behavior from a more detailed model of the system under analysis. Diagnosability analysis studies the impact of faults on observable parameters and tries to understand whether the presence of faults can be inferred from the observations within a useful time frame. We extend a recently developed framework for specifying diagnosis requirements, develop efficient algorithms to assess diagnosability under a fixed set of observables, and propose an automated technique to select optimal subsets of observables. The techniques have been implemented and evaluated on realistic models and case studies developed in collaboration with engineers from the European Space Agency, demonstrating the practicality of the contributions. Sector INF/01 - Computer Science
145	Automatic Population of Structured Knowledge Bases via Natural Language Processing. Fossati, Marco. January 2017. The Web has evolved into a huge mine of knowledge carved in different forms, the predominant one still being the free-text document. This motivates the need for Intelligent Web-reading Agents: hypothetically, they would skim through corpora of disparate Web sources and generate meaningful structured assertions to fuel Knowledge Bases (KBs). Ultimately, comprehensive KBs, like Wikidata and DBpedia, play a fundamental role in coping with the issue of information overload. In light of this vision, this thesis describes a set of systems based on Natural Language Processing (NLP), which take unstructured or semi-structured information sources as input and produce machine-readable statements for a target KB. We implement four main research contributions: (1) a one-step methodology for crowdsourcing Frame Semantics annotation; (2) an NLP technique implementing the above contribution to perform N-ary Relation Extraction from Wikipedia, thus enriching the target KB with properties; (3) a taxonomy learning strategy to produce an intuitive and exhaustive class hierarchy from the Wikipedia category graph, thus augmenting the target KB with classes; (4) a recommender system that leverages a KB network to yield atypical suggestions with detailed explanations, serving as a proof of work for real-world end users. The outcomes are incorporated into the Italian DBpedia chapter and can be queried through its public endpoint and/or downloaded as standalone data dumps. Sector INF/01 - Computer Science
146	Nomos 3: legal compliance of software requirements. Ingolfo, Silvia. January 2015. Laws and regulations increasingly impact the design and development of software systems, as legislators around the world attempt to control the impact of software on social and private life. Software systems need to be designed from the beginning in a law-aware fashion to ensure compliance with applicable laws. Moreover, they need to evolve over time as new laws pass and existing ones are amended. In this interdisciplinary field, many challenges remain open. For any given norm, there are alternative ways for a system-to-be to comply with it. Moreover, revising some requirements or adding new ones can have an important impact on which norms apply. To complicate matters, there is a sizeable knowledge gap between technical and legal experts, and this hampers requirements analysts in dealing with the problem on their own. This thesis proposes to use conceptual models of law and requirements to help requirements engineers address these problems by answering questions such as "Given this set of requirements, which norms are applicable?", "Which norms are complied with?", and "What are the alternative ways I can use to comply with a norm?". The thesis proposes the Nomos 3 framework, which includes a modeling language for law and requirements, reasoning support for Nomos 3 models, and a systematic process for establishing compliance. The proposed framework is evaluated by means of illustrative case studies, a scalability study of the reasoning mechanism, and other specific studies intended to assess the effectiveness of the proposed concepts, tools, and process. Sector INF/01 - Computer Science
147	Desiree - a Refinement Calculus for Requirements Engineering. Li, Fenglin. January 2016. The requirements elicited from stakeholders suffer from various afflictions, including informality, incompleteness, ambiguity, vagueness, inconsistency, and more. It is the task of requirements engineering (RE) processes to derive from them an eligible (formal, complete enough, unambiguous, consistent, measurable, satisfiable, modifiable, and traceable) requirements specification that truly captures stakeholder needs. We propose Desiree, a refinement calculus for systematically transforming stakeholder requirements into an eligible specification. The core of the calculus is a rich set of requirements operators that iteratively transform stakeholder requirements by strengthening or weakening them, thereby reducing incompleteness, removing ambiguity and vagueness, and eliminating unattainability and conflicts, turning them into an eligible specification. The framework also includes an ontology for modeling and classifying requirements, a description-based language for representing requirements, and a systematic method for applying the concepts and operators in order to engineer an eligible specification from stakeholder requirements. In addition, we define the semantics of the requirements concepts and operators, and develop a graphical modeling tool that supports the entire framework. To evaluate our proposal, we conducted a series of empirical evaluations: an ontology evaluation by classifying a large public requirements set, a language evaluation by rewriting that large set of requirements using our description-based syntax, a method evaluation through a realistic case study, and an evaluation of the entire framework through three controlled experiments.
The results of our evaluations show that our ontology, language, and method are adequate for capturing requirements in practice, and offer strong evidence that, with sufficient training, our framework indeed helps people conduct more effective requirements engineering. Sector INF/01 - Computer Science
148	Extraction and Exploitation of User Goals and Intentions for Querying and Recommendation. Papadimitriou, Dimitra. January 2017. Users often find themselves in situations where they need to make selections from very large collections of items. These items may be digital artifacts, e.g., web pages or forum posts, or digital representations of real-world objects, e.g., products or people. There are a great many techniques for assisting users in making such selections. However, the plethora of systems and the size of the item collections make it challenging to provide users with the items that truly meet their standards in terms of interestingness and usefulness. We address the problem of providing items of interest to users, either in response to explicit user requests or in the form of recommendations, by exploiting a factor that has so far been poorly investigated in information systems: goals. These are the goals for which items are intended, i.e., the goals for which items have been generated or produced, and the goals that may lead the user to “consume” them, i.e., the goals that the user is willing to fulfill. Moreover, what is provided need not be a plain item: it may also be an interaction with items or an action that the user may be interested in performing. In this dissertation, we provide the required background and framework for exploiting goals in building better data management systems. Within this context, we study three different problems. First, we address the problem of finding posts of interest (related posts) given a post-query in forums within user communities. Forum posts consist of segments, each serving a different goal that the author had in mind to communicate to the reader through the text. Therefore, plain content comparisons often fail to retrieve posts of interest, or they retrieve posts that, despite similar content, are not related to the post-query.
Instead, we have developed a goal-aware matching approach that applies content similarity over intention-based segmentations, i.e., over segments intended for different communication goals, to perform more effective comparisons. Second, we address the goal-aware recommendation problem. Unlike the post-matching mechanism described above, this problem does not depend on domain-specific characteristics; thus, the approach can be applied to any domain. The goal-aware mechanisms we have developed handle the diverse goals that the user can fulfill by first recognizing the intended user goals, then deciding the priorities among them, and finally quantifying the benefit of each item. Last but not least, we address the problem of building a goal implementation set from texts in which users describe how they managed to fulfill certain goals in their real life. We have applied our technique to textual descriptions from a goal-setting site. For each solution, we have designed, implemented, and extensively evaluated models, algorithms, and techniques that deal with all the individual tasks required by a goal-aware approach: the identification and extraction of goal-related information in the examined data sources, the modeling of the derived information, the matching of the user's request or previous activity to the goal model elements, and finally the exploitation of this matching in forming the system's response. The goal-aware techniques were found to retrieve items that traditional techniques would not have considered, giving the user a different and more complete view of the item collection. Moreover, the scalability of the techniques and the efficient structures and indexes that we use to store and retrieve the items alongside the goal-related data allow us to meet the requirements of modern online systems. Sector INF/01 - Computer Science
149	Technologies for Supporting Social Participation with a focus on Intergenerational Interactions. Jara Laconich, Juan José. January 2016. Loneliness increases mortality risk by 50% and is one of the main causes of depression. Several factors, such as living far away from family, limited mobility due to physical problems, or being unable to use communication technologies, increase the likelihood of feeling lonely, especially in later life. We propose Lifeshare, a system for intergenerational communication that facilitates connecting people, enabling them to participate in each other's lives either actively (synchronous interactions) or passively (asynchronous interactions). Current proposals for intergenerational communication do not address the lack of time to share and the lack of topics to talk about that young people usually experience when interacting with their older relatives. Our proposal addresses these problems by implementing a sharing method that requires no effort from the young and by automatically enhancing the shared information. Furthermore, we translated our experience with the evaluation of our proposal into design recommendations that extend the current literature on design guidelines for applications for older adults. Sector INF/01 - Computer Science
150	Cross-Domain and Cross-Language Porting of Shallow Parsing. Stepanov, Evgeny. January 2014. English has long been the main focus of attention of the Natural Language Processing (NLP) community. As a result, there are significantly more annotated linguistic resources in English than in any other language. Consequently, data-driven tools for automatic text or speech processing are developed mainly for English. Developing similar corpora and tools for other languages is an important issue; however, it requires a significant amount of effort. Recently, Statistical Machine Translation (SMT) techniques and parallel corpora have been used to transfer annotations from resource-rich languages to resource-poor languages for a variety of NLP tasks, including part-of-speech tagging, noun phrase chunking, dependency parsing, and textual entailment. This cross-language NLP paradigm relies on the solution of the following sub-problems:
- Data-driven NLP techniques are very sensitive to differences between training and testing conditions. Different domains, such as financial newswire and biomedical publications, have different distributions of NLP task-specific properties; thus, domain adaptation of the source-language tools, either by developing models with good cross-domain performance or by tuning them to the target domain, is critical.
- Another difference between training and testing conditions arises with cross-genre applications, such as written text (monologues) versus spontaneous dialog data. Properties of written text, such as punctuation and the notion of a sentence, are not present in transcriptions of spoken conversations. Thus, style-adaptation techniques that cover a wider range of genres are critical as well.
- The basis of cross-language porting is parallel corpora. Unfortunately, parallel corpora are scarce. Thus, generating or retrieving parallel corpora between the languages of interest is important. Additionally, these parallel corpora are most often not in the domains of interest; consequently, cross-language porting should be augmented with SMT domain adaptation techniques.
- Language distance plays an important role within the paradigm, since for closely related language pairs (e.g., the Romance languages Italian and Spanish) the range of linguistic phenomena to consider is significantly smaller than for distantly related pairs (e.g., Italian and Turkish). The developed cross-language techniques should be applicable under both conditions.
In this thesis we address these sub-problems on two complex NLP tasks, Discourse Parsing and Spoken Language Understanding, both cast as token-level shallow parsing. Penn Discourse Treebank (PDTB)-style discourse parsing is applied across domains, and we contribute feature-level domain adaptation techniques for the task. Additionally, we explore PDTB-style discourse parsing on Italian dialog data and report on the challenges. The problems of parallel corpus creation, language style adaptation, SMT domain adaptation, and language distance are addressed on the task of cross-language porting of Spoken Language Understanding. The thesis contributes language-style and domain adaptation techniques for the machine translation of spoken conversations, using off-the-shelf systems such as Google Translate as well as SMT systems trained on both out-of-domain and in-domain parallel data. We demonstrate that the techniques are beneficial for both close and distant language pairs. We propose methodologies for creating parallel spoken conversation corpora via professional translation services that take speech phenomena such as disfluencies into account. Additionally, we explore semantic annotation transfer using automatic SMT methods and crowdsourcing.
For the latter, we propose a computational methodology to obtain a corpus of acceptable quality without target-language references and despite low worker agreement. Sector INF/01 - Computer Science
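The annotation transfer discussed in this entry can be illustrated by projecting token-level concept labels through a word alignment between a source sentence and its translation. This is a generic sketch, not the thesis's specific method: it assumes IOB-style tags, an alignment given as (source index, target index) pairs produced by some external aligner, and a first-aligned-label-wins policy. It re-derives the `B-`/`I-` prefixes on the target side, so adjacent target spans carrying the same concept are merged.

```python
from typing import List, Optional, Sequence, Tuple


def project_labels(source_labels: Sequence[str],
                   alignment: Sequence[Tuple[int, int]],
                   target_len: int) -> List[str]:
    """Project IOB concept tags from source tokens onto aligned target tokens."""
    concepts: List[Optional[str]] = [None] * target_len
    for s, t in alignment:
        tag = source_labels[s]
        if tag != "O" and concepts[t] is None:     # first aligned label wins
            concepts[t] = tag.split("-", 1)[1]     # drop the B-/I- prefix
    projected: List[str] = []
    previous: Optional[str] = None
    for concept in concepts:                       # rebuild IOB on the target order
        if concept is None:
            projected.append("O")                  # unaligned or unlabeled token
        elif concept == previous:
            projected.append("I-" + concept)
        else:
            projected.append("B-" + concept)
        previous = concept
    return projected
```

For a hypothetical pair, English "New York flights" tagged `B-city I-city O` and Italian "voli per New York" with alignment `[(2, 0), (0, 2), (1, 3)]`, the projection yields `O O B-city I-city`, correctly handling the word reordering.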
