531
Detecting opinion spam and fake news using n-gram analysis and semantic similarity. Ahmed, Hadeer, 14 November 2017 (has links)
In recent years, deceptive content such as fake news and fake reviews, also known as opinion spam, has increasingly become a dangerous prospect for online users. Fake reviews affect consumers and stores alike. Furthermore, the problem of fake news gained attention in 2016, especially in the aftermath of the US presidential election. Fake reviews and fake news are closely related phenomena, as both consist of writing and spreading false information or beliefs. The opinion spam problem was formulated for the first time only a few years ago, but it has quickly become a growing research area due to the abundance of user-generated content. It is now easy for anyone to write fake reviews or fake news on the web. The biggest challenge is the lack of an efficient way to tell the difference between a real review and a fake one; even humans are often unable to tell the difference. In this thesis, we developed an n-gram model to automatically detect fake content, with a focus on fake reviews and fake news. We studied and compared two different feature extraction techniques and six machine learning classification techniques. Furthermore, we investigated the impact of keystroke features on the accuracy of the n-gram model. We also applied semantic similarity metrics to detect near-duplicated content. Experimental evaluation of the proposed models using existing public datasets and a newly introduced fake news dataset indicates improved performance compared to the state of the art. / Graduate
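For illustration, the sketch below shows one way the n-gram approach described above could be assembled: TF-IDF-weighted word unigrams and bigrams feeding a linear SVM, one of the classifier families such a study typically compares. It assumes scikit-learn is available; the example texts, labels and classifier choice are placeholders, not the thesis's datasets or final model.

```python
# Minimal sketch: n-gram features + a linear classifier for fake-content detection.
# The documents and labels are invented placeholders (0 = truthful, 1 = deceptive).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = [
    "great hotel, friendly staff, would stay again",
    "this product changed my life, buy ten of them now!!",
    "the council approved the budget after a public hearing",
    "shocking secret the government does not want you to know",
]
labels = [0, 1, 0, 1]

# Word unigrams and bigrams weighted by TF-IDF, one possible n-gram feature set.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

clf = LinearSVC()          # one of several classifiers that could be swapped in
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["unbelievable trick doctors do not want you to know"])))
```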
532
Amélioration de la maîtrise des risques dans les projets par l'utilisation des mécanismes de retour d'expérience / Improving risk management in projects using lesson learning mechanisms. Manotas Niño, Vanessa Patricia, 26 September 2017 (has links)
To improve risk analysis in projects and reinforce its effectiveness, project managers should reuse the experience and good practices acquired in previous projects. Lessons learned constitute an important source of knowledge for reducing levels of uncertainty, and therefore risk, in projects. Lesson-learning methodologies have thus become widely recognized in many companies. However, these companies have often been content to collect information at the end of a project, assuming this would be enough to generate the knowledge needed to improve their performance. Unfortunately, capturing these experiences is traditionally a static step at project closure that merely records a few events remembered by the experts involved. Moreover, the captured information is difficult to reuse directly in a new risk analysis.
Our work focused on developing a method to improve the project risk management process by using a lessons-learned system, and thereby to contribute to a logic of continuous improvement. The proposed method is based on a knowledge-exploitation system that supports key competencies such as problem solving, collective decision making, reflection, learning and foresight. The model draws on three central fields: project management, risk management and lessons learned. The originality of our work lies in explicitly integrating a continuous lessons-learned mechanism in order to improve the performance of the risk management process in projects. We propose a knowledge-modeling approach oriented toward lessons learned and define a model that characterizes projects, risks and experiences for the identification, capture and exploitation of knowledge. Modeling these elements also provides a structure that allows a faster reading of a project and its processes. We therefore developed a model that represents the key elements used in the risk management process, facilitating both the capture of experiences and the search for similar past experiences, which in turn makes it possible to standardize and improve the risk management approach.
For the retrieval of past experiences, we define, on the one hand, an algorithm for finding similar experiences based on comparisons between directed labeled graphs. A correspondence is established between two graphs (graph 1 being the context of the current project and graph 2 the context of a past project recorded as an experience) through a pairwise factorization of the affinity matrix that decouples the structure of similar nodes and edges (Zhou and De la Torre, 2012). On the other hand, we define an algorithm, implemented with a genetic algorithm, that finds the optimal correspondence between these graphs so that the sum of node and edge compatibilities is maximized. Finally, we propose an approach for exploiting similar past experiences: in this way, we obtain the set of risks associated with these similar objects in order to feed the decision-support system used in managing the project.
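As a rough illustration of the graph-matching step, the sketch below matches two small directed labeled graphs with a genetic algorithm that maximizes summed node- and edge-label compatibility. It is a simplified stand-in for the factorized graph-matching formulation cited above; the graphs, the node_sim scores and the GA parameters are all invented for the example.

```python
# Hypothetical sketch: GA-based matching of two directed labeled graphs.
import random

nodes1 = ["budget", "supplier", "deadline"]          # context of the current project
nodes2 = ["cost", "vendor", "milestone", "scope"]    # context of a past project (experience)
edges1 = {("budget", "supplier"): "constrains"}
edges2 = {("cost", "vendor"): "constrains", ("scope", "milestone"): "drives"}

def node_sim(a, b):
    # Placeholder label similarity; a real system would use a semantic measure.
    pairs = {("budget", "cost"): 0.9, ("supplier", "vendor"): 0.8, ("deadline", "milestone"): 0.7}
    return pairs.get((a, b), 0.1)

def fitness(mapping):
    # mapping[i] is the index in nodes2 assigned to nodes1[i]
    score = sum(node_sim(a, nodes2[mapping[i]]) for i, a in enumerate(nodes1))
    for (u, v), lbl in edges1.items():
        mu, mv = nodes2[mapping[nodes1.index(u)]], nodes2[mapping[nodes1.index(v)]]
        if edges2.get((mu, mv)) == lbl:
            score += 1.0                              # reward preserved edge labels
    return score

def random_mapping():
    return random.sample(range(len(nodes2)), len(nodes1))

population = [random_mapping() for _ in range(30)]
for _ in range(100):                                  # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    children = []
    for _ in range(20):
        p = random.choice(parents)[:]
        i, j = random.sample(range(len(p)), 2)        # mutation: swap two assignments
        p[i], p[j] = p[j], p[i]
        children.append(p)
    population = parents + children

best = max(population, key=fitness)
print({a: nodes2[best[i]] for i, a in enumerate(nodes1)})
```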
533
An Application of Dimension Reduction for Intention Groups in Reddit. Sun, Xuebo; Wang, Yudan, January 2016 (has links)
Reddit (www.reddit.com) is a social news platform for sharing and exchanging information. The amount of data, in terms of both observations and dimensions, is enormous, because a large number of users express knowledge about all aspects of their own lives by publishing comments. While it is easy for a human being to understand Reddit comments on an individual basis, extracting insights from them computationally is a tremendous challenge. In this thesis, we seek an algorithm-driven approach to analyze both the unique Reddit data structure and the relations among comment authors based on their shared features. We explore the various types of communication between two people with common characteristics and build a communication model that characterizes the potential relationship between two users via their messages. We then seek a dimensionality reduction methodology that can merge users with similar behavior into the same groups. Along the way, we develop a computer program to collect data, define attributes based on the communication model, and apply a rule-based group merging algorithm. We then evaluate the results to show the effectiveness of this methodology. Our results show reasonable success in producing user groups that have recognizable characteristics and share similar intentions.
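The following sketch illustrates the kind of rule-based group merging described above: users are represented by a few communication attributes, and two groups are merged when every cross-pair of members is sufficiently similar. The attributes, the matching rule and the 0.8 threshold are assumptions made for the example, not the thesis's actual communication model.

```python
# Hypothetical sketch: rule-based merging of users into groups by attribute similarity.
user_attrs = {
    "u1": {"asks_questions": 1, "gives_advice": 0, "topic_gaming": 1},
    "u2": {"asks_questions": 1, "gives_advice": 0, "topic_gaming": 1},
    "u3": {"asks_questions": 0, "gives_advice": 1, "topic_gaming": 0},
}

def similarity(a, b):
    # Fraction of attributes on which two users agree (a simple matching rule).
    keys = a.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys)

groups = [[u] for u in user_attrs]          # start with singleton groups
merged = True
while merged:
    merged = False
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            # Rule: merge two groups if every cross-pair of members is similar enough.
            if all(similarity(user_attrs[x], user_attrs[y]) >= 0.8
                   for x in groups[i] for y in groups[j]):
                groups[i].extend(groups[j])
                del groups[j]
                merged = True
                break
        if merged:
            break

print(groups)   # e.g. [['u1', 'u2'], ['u3']]
```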
534
Synthesis Techniques for Sub-threshold Leakage and NBTI Optimization in Digital VLSI Systems. Pendyala, Shilpa, 19 November 2015 (has links)
Rising power demands and costs motivate us to explore low-power solutions in electronics. In nanometer Complementary Metal Oxide Semiconductor (CMOS) processes with low threshold voltages and thin gate oxides, subthreshold leakage power dominates the total power of a circuit. As technology scales, Negative Bias Temperature Instability (NBTI) has emerged as a major reliability-limiting mechanism. It causes a threshold voltage shift which, over time, results in circuit performance degradation. Hence, leakage power and NBTI degradation are two key challenges in the deep sub-micron regime.
In this dissertation, an interval propagation technique based on interval arithmetic is introduced as an effective leakage optimization technique for high-level circuits with little overhead. The concept of self-similarity from fractal theory is adopted, for the first time in VLSI research, to handle the large design space. Although some leakage and NBTI co-optimization techniques exist in the literature, our vector cycling approach combined with a backtracking algorithm achieves better results on ISCAS85 benchmarks. We did not find any previous work on NBTI optimization of finite state machines (FSMs); techniques for NBTI optimization of FSMs are therefore also introduced in this dissertation, and substantial NBTI improvement is reported.
Input vector control has been shown to be an effective technique to minimize subthreshold leakage. Applying an appropriate minimum leakage vector (MLV) to each register transfer level (RTL) module instance puts it in a low-leakage state, but with significant area overhead. For each module, via Monte Carlo simulation, we identify a set of MLV intervals such that the maximum leakage is within (say) 10% of the lowest leakage points. As the module bit width increases, exhaustive simulation to find the low-leakage vector is not feasible. Further, we need to search the entire input space uniformly to obtain as many low-leakage intervals as possible. Based on empirical observations, we observed self-similarity in the leakage distribution of adder/multiplier modules when the input space is partitioned into smaller cells. This property enables a uniform search for low-leakage vectors over the entire input space. Moreover, the characterization time increases linearly with the module size, so the technique scales to higher bit-width modules with acceptable characterization time. We can reduce the area overhead (in some cases to zero) by choosing primary input (PI) MLVs such that the resulting inputs to internal nodes are also MLVs; otherwise, control points can be inserted. Based on interval arithmetic, given a DFG, we propose a heuristic with several variations for PI MLV identification with minimal control points. Experimental results for DSP filters simulated in 16nm technology demonstrate leakage savings of 93.8% with no area overhead, compared to existing work.
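A minimal sketch of the Monte Carlo characterization step is given below: random input vectors are sampled, a leakage estimate is computed for each, and every vector within 10% of the best leakage found is kept as a candidate MLV. The leakage() function is a placeholder; in practice the figure would come from transistor-level (e.g. SPICE) simulation of the module.

```python
# Hypothetical sketch: Monte Carlo identification of low-leakage input vectors for a module.
import random

BITS = 16
def leakage(vector):
    # Placeholder cost; a real flow would query a circuit-level leakage estimate.
    return sum(((vector >> i) & 1) * (1.0 + 0.1 * (i % 3)) for i in range(BITS))

samples = [(leakage(v), v) for v in (random.getrandbits(BITS) for _ in range(5000))]
best_leak = min(l for l, _ in samples)
# Keep every sampled vector whose leakage is within 10% of the best one found.
low_leakage_vectors = [v for l, v in samples if l <= 1.10 * best_leak]
print(f"best leakage {best_leak:.2f}, {len(low_leakage_vectors)} candidate MLVs")
```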
Input vector control can also be adopted to reduce NBTI degradation as well as leakage in CMOS circuits. Prior work has shown that the minimum leakage vector of a circuit is not necessarily NBTI-friendly. To achieve NBTI and leakage co-optimization, we propose an input vector cycling technique which applies different sub-optimal low-leakage vectors to the primary inputs at regular intervals. A co-optimal input vector for a given circuit is obtained using a simulated annealing (SA) technique. For a given input vector, a set of critical-path PMOS transistors is under stress. A second input vector is obtained using a backtracking algorithm such that most of these critical-path PMOS transistors are put into recovery mode. When a co-optimized input vector is assigned to the primary inputs, stressed critical-path nodes with high delay contribution are set to recovery; logic 1 is back-propagated from these nodes to the primary inputs to obtain the second input vector. The two vectors are alternated at regular time intervals. Because the intersection of the two transistor sets is minimized, the total stress is distributed evenly between them, alleviating the overall stress on critical-path transistors and thereby reducing NBTI delay degradation. For ISCAS85 benchmarks, an average improvement of 5.3% in performance degradation is achieved at 3.3% leakage overhead with NBTI-leakage co-optimization plus backtracking, compared to using co-optimization alone. A 10.5% average NBTI improvement is obtained compared to a circuit driven by the minimum-leakage input vector, at 18% average leakage overhead. Also, an average NBTI improvement of 2.13% is obtained together with a 6.77% leakage improvement compared to a circuit driven by the minimum-NBTI vector. Vector cycling is thus shown to be more effective than static input vector control in mitigating NBTI.
Several works in the literature have proposed optimal state encoding techniques for delay, leakage, and dynamic power optimization. In this work, we propose, for the first time, NBTI optimization based on state code optimization. We propose an SA-based state code assignment algorithm that minimizes NBTI degradation in the synthesized circuit. A PMOS transistor that is switched ON for a long period of time suffers delay degradation due to NBTI. In combinational circuits, an NBTI-friendly input vector that stresses the fewest PMOS transistors on the critical path can therefore be applied. For sequential circuits, the state code can significantly influence the ON/OFF mode of PMOS transistors in the controller implementation, so we focus on state encoding. As the problem is computationally intractable, we concentrate on encoding states with high state probability. The following SA moves are employed: (a) code swap; and (b) code modification by flipping bits. Experiments with LGSYNTH93 benchmarks resulted in an average 18.6% improvement in NBTI degradation, with area and power improvements of 5.5% and 4.6%, respectively.
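The sketch below illustrates SA-based state code assignment with the two moves named above, code swap and bit flip. The nbti_cost() function, the state probabilities and the cooling schedule are placeholders standing in for the NBTI estimate of the synthesized controller.

```python
# Hypothetical sketch: simulated annealing over state codes (moves: code swap, bit flip).
import math, random

states = ["S0", "S1", "S2", "S3"]
CODE_BITS = 3
codes = {s: i for i, s in enumerate(states)}          # initial encoding

def nbti_cost(codes):
    # Placeholder: penalize many 1-bits in high-probability states (illustrative only).
    prob = {"S0": 0.5, "S1": 0.3, "S2": 0.15, "S3": 0.05}
    return sum(prob[s] * bin(c).count("1") for s, c in codes.items())

def neighbor(codes):
    new = dict(codes)
    if random.random() < 0.5:                          # move (a): swap two state codes
        a, b = random.sample(states, 2)
        new[a], new[b] = new[b], new[a]
    else:                                              # move (b): flip one bit of one code
        s = random.choice(states)
        cand = new[s] ^ (1 << random.randrange(CODE_BITS))
        if cand not in new.values():                   # keep the encoding injective
            new[s] = cand
    return new

T, cost = 1.0, nbti_cost(codes)
for step in range(2000):
    cand = neighbor(codes)
    c = nbti_cost(cand)
    if c < cost or random.random() < math.exp((cost - c) / T):
        codes, cost = cand, c
    T *= 0.999                                         # geometric cooling

print(codes, round(cost, 3))
```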
535
Ähnlichkeitsmessung von ausgewählten Datentypen in Datenbanksystemen zur Berechnung des Grades der Anonymisierung / Similarity measurement of selected data types in database systems for computing the degree of anonymization. Heinrich, Jan-Philipp; Neise, Carsten; Müller, Andreas, 21 February 2018 (has links) (PDF)
A mathematical model for computing deviations between different data types in relational database systems is introduced and tested. The basis of this model is similarity measures for the various data types.
We first examine the data types relevant to this work. For these data types, we then define an algebra that forms the basis for computing the degree of anonymization θ.
The model is intended to measure the degree of anonymization, above all of personal data, between test and production data. This measurement is useful in the context of the EU GDPR coming into force in May 2018 and is meant to help identify personal data with a high degree of similarity.
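A minimal sketch of the idea follows, assuming one illustrative similarity measure per data type (string, numeric, date) and a simple average to obtain θ; the actual algebra defined in the paper is not reproduced here.

```python
# Hypothetical sketch: per-data-type similarity between production and test values, aggregated
# into an anonymization degree theta (1 = fully dissimilar, 0 = identical). Metrics are illustrative.
from difflib import SequenceMatcher
from datetime import date

def sim_string(a, b):
    return SequenceMatcher(None, a, b).ratio()

def sim_number(a, b, scale):
    return max(0.0, 1.0 - abs(a - b) / scale)

def sim_date(a, b, scale_days=3650):
    return max(0.0, 1.0 - abs((a - b).days) / scale_days)

prod = {"name": "Erika Mustermann", "salary": 52000, "birthdate": date(1980, 5, 17)}
test = {"name": "E. Musterfrau", "salary": 51000, "birthdate": date(1981, 1, 1)}

sims = [
    sim_string(prod["name"], test["name"]),
    sim_number(prod["salary"], test["salary"], scale=100000),
    sim_date(prod["birthdate"], test["birthdate"]),
]
theta = 1.0 - sum(sims) / len(sims)     # average dissimilarity across columns
print(f"theta = {theta:.2f}")
```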
536
An experimental investigation of the relation between learning and separability in spatial representations. Eriksson, Louise, January 2001 (has links)
One way of modeling human knowledge is by using multidimensional spaces, in which an object is represented as a point in the space and the distances among the points reflect the similarities among the represented objects. The distances are measured with some metric, commonly an instance of the Minkowski metric. The instances differ in the magnitude of the so-called r-parameter; the instances most commonly mentioned in the literature are those where r equals 1, 2 and infinity. Cognitive scientists have found that different metrics are suited to describing different combinations of dimensions, and from these findings an important distinction between integral and separable dimensions has been drawn (Garner, 1974). Separable dimensions, e.g. size and form, are best described by the city-block metric, where r equals 1, whereas integral dimensions, such as the color dimensions, are best described by the Euclidean metric, where r equals 2. Developmental psychologists have formulated the hypothesis that small children perceive many dimensional combinations as integral whereas adults perceive the same combinations as separable; thus, there seems to be a shift towards increasing separability with age or maturity. Earlier experiments show the same phenomenon in adult short-term learning with novel stimuli: the stimuli were first perceived as rather integral and then turned more separable, as indicated by the Minkowski r. This indicates a shift towards increasing separability with familiarity or skill. This dissertation aims at investigating the generality of this phenomenon. Five similarity-rating experiments are conducted, in which the best-fitting metric for the first half of each session is compared to that for the last half. If the Minkowski r is lower for the last half than for the first half, this is taken to indicate increasing separability. The conclusion is that the phenomenon of increasing separability during short-term learning cannot be found in these experiments, at least not given the operational definition of increasing separability as a decreasing Minkowski r. An alternative definition of increasing separability is suggested, in which an r-value retreating from 2.0 indicates increasing separability, i.e. when the r-value of the best-fitting metric for the last half of a similarity-rating session is further away from 2.0 than for the first half.
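For concreteness, the short sketch below evaluates the Minkowski distance between two stimulus points for r = 1 (city-block), r = 2 (Euclidean) and r approaching infinity; the coordinates are arbitrary example values.

```python
# Minkowski distance for different r values on two 2-D stimulus points.
def minkowski(x, y, r):
    if r == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

x, y = (1.0, 4.0), (3.0, 1.0)      # e.g. coordinates on a size and a form dimension
for r in (1, 2, float("inf")):
    print(r, round(minkowski(x, y, r), 3))   # 5.0, 3.606, 3.0
```

In a similarity-rating study, the r of the best-fitting metric is estimated by fitting such distances to the observed similarity ratings.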
537
Grouping Biological Data. Rundqvist, David, January 2006 (has links)
Today, scientists in various biomedical fields rely on biological data sources in their research. Large amounts of information concerning, for instance, genes, proteins and diseases are publicly available on the internet and are used daily for acquiring knowledge. Typically, biological data is spread across multiple sources, which has led to heterogeneity and redundancy. This thesis suggests grouping as one way of computationally managing biological data. A conceptual model for this purpose is presented, which takes properties specific to biological data into account. The model defines sub-tasks and key issues where multiple solutions are possible, and describes which approaches to these have been used in earlier work. Further, an implementation of this model is described, along with test cases which show that the model is indeed useful. Since the use of ontologies is relatively new in the management of biological data, the main focus of the thesis is on how semantic similarity of ontological annotations can be used for grouping. The results of the test cases show, for example, that the implementation of the model, using Gene Ontology, is capable of producing groups of data entries with similar molecular functions.
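The sketch below illustrates one simple way such grouping could work: each entry's ontology terms are expanded to their ancestor sets, and entries are grouped when the Jaccard similarity of those sets exceeds a threshold. The toy ontology, annotations and 0.5 threshold are invented for the example; the thesis's implementation uses Gene Ontology and its own similarity measure, which may differ.

```python
# Hypothetical sketch: grouping entries by semantic similarity of ontology annotations.
parents = {                      # child -> parents in a toy ontology DAG
    "kinase": ["catalytic"], "phosphatase": ["catalytic"],
    "catalytic": ["molecular_function"], "binding": ["molecular_function"],
}

def ancestors(term):
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def term_set_similarity(terms_a, terms_b):
    a = set().union(*(ancestors(t) for t in terms_a))
    b = set().union(*(ancestors(t) for t in terms_b))
    return len(a & b) / len(a | b)

annotations = {"geneA": ["kinase"], "geneB": ["phosphatase"], "geneC": ["binding"]}
groups = []
for gene, terms in annotations.items():
    for group in groups:
        if all(term_set_similarity(terms, annotations[g]) >= 0.5 for g in group):
            group.append(gene)
            break
    else:
        groups.append([gene])
print(groups)   # geneA and geneB share the 'catalytic' lineage and end up grouped
```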
538
Combinational Watermarking for Medical Images. Chakravarthy Chinna Narayana Swamy, Thrilok, 01 January 2015 (has links)
Digitization of medical data has become a very important part of the modern healthcare system. Data can be transmitted easily at any time to anywhere in the world over the Internet to obtain the best possible diagnosis for a patient. This digitized medical data must be protected at all times to preserve doctor-patient confidentiality, and watermarking can be used as an effective tool to achieve this.
In this research project, image watermarking is performed in both the spatial domain and the frequency domain to embed a shared image together with medical image data and patient data, which includes the patient identification number.
For the proposed system, the Structural Similarity index (SSIM) is used to measure the quality of the watermarking process instead of the Peak Signal to Noise Ratio (PSNR), since SSIM takes into account the visual perception of the images, whereas PSNR relies only on intensity levels. The system response under ideal conditions, as well as under the influence of noise, was measured and the results were analyzed.
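As a small illustration of the quality measures discussed above, the sketch below embeds a toy watermark in the least significant bit of a random image and reports both SSIM and PSNR using scikit-image. The random image, the LSB embedding and the library choice are assumptions for the example, not the project's actual embedding scheme.

```python
# Hypothetical sketch: comparing a watermarked image against the original with SSIM and PSNR.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(256, 256)).astype(np.uint8)

# Toy spatial-domain embedding: overwrite the least significant bit with watermark bits.
watermark_bits = rng.integers(0, 2, size=original.shape).astype(np.uint8)
watermarked = ((original & 0xFE) | watermark_bits).astype(np.uint8)

ssim = structural_similarity(original, watermarked, data_range=255)
psnr = peak_signal_noise_ratio(original, watermarked, data_range=255)
print(f"SSIM = {ssim:.4f}, PSNR = {psnr:.2f} dB")
```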
539
Detección de Anomalías en Procesos Industriales Usando Modelos Basados en Similitud / Detection of Anomalies in Industrial Processes Using Similarity-Based Models. León Olivares, Alejandro Samir, January 2012 (has links)
Anomaly detection in industrial processes is a high-impact topic that has been analyzed and studied in various areas of engineering and research. Most of the detection methods currently available make it possible to study the irregularities found in the history of a process, helping to extract significant (and sometimes critical) information in a wide variety of applications, and thereby becoming a fundamental and integral part of schemes for reducing both human and economic costs in contemporary industry.
The general objective of this work is to develop and implement a modular anomaly detection approach, applicable to multivariate industrial processes and founded on the analysis of residuals generated from non-parametric similarity-based models (SBM). The tool consists mainly of a system for the automatic generation of SBM models, a methodology for the study of events, and a statistical detection algorithm.
This work is part of a joint collaboration project between the companies CONTAC, INGENIEROS LTDA. and ENDESA-CHILE. Thanks to this collaboration, it has been possible to evaluate the proposed system using operating data from a combined-cycle thermoelectric plant belonging to the latter company.
Comparisons of the performance of the implemented modeling system show that the algorithm is able to generate a more appropriate representation of the process: the error obtained with the SBM modeling technique is close to 25% of the error obtained with a model that is linear in the parameters.
In addition, the event-study methodology correctly identifies the variables that do not contribute to the detection of a particular event, as well as the variables that are most significant for that purpose, reducing the number of variables analyzed and, with it, the computational requirements of online operation.
Validation of the results delivered by the developed anomaly detection method shows that the use of non-parametric SBM-type models, in combination with the event-study methodology and the statistical detection algorithm, is effective in generating alarms and detecting the anomalies studied.
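A minimal sketch of the similarity-based modeling (SBM) idea is shown below: the expected state is reconstructed as a similarity-weighted combination of memorized healthy exemplars, and an anomaly is flagged when the residual norm exceeds a threshold. The synthetic data, Gaussian kernel and threshold are illustrative; the thesis's statistical detection algorithm is more elaborate.

```python
# Hypothetical sketch: a minimal similarity-based model (SBM) with residual-based detection.
import numpy as np

rng = np.random.default_rng(1)
exemplars = rng.normal(size=(50, 3))           # memorized healthy operating states (50 x 3 vars)

def sbm_estimate(x, exemplars, bandwidth=0.5):
    # Gaussian kernel similarity between the query state and each exemplar.
    d2 = np.sum((exemplars - x) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return (w @ exemplars) / np.sum(w)         # similarity-weighted reconstruction

threshold = 1.5
for x in [exemplars[10] + 0.05, np.array([4.0, -4.0, 4.0])]:   # near-normal vs anomalous query
    residual = np.linalg.norm(x - sbm_estimate(x, exemplars))
    print("anomaly" if residual > threshold else "normal", round(residual, 3))
```

The residual, rather than the raw measurement, is what the statistical detector monitors: a healthy state is well reconstructed from its similar exemplars, while an anomalous one is not.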
540
Comparison and Tracking Methods for Interactive Visualization of Topological Structures in Scalar Fields. Saikia, Himangshu, January 2017 (has links)
Scalar fields occur quite commonly in several application areas, in both static and time-dependent forms. A proper visualization of scalar fields therefore needs to be equipped with tools to extract and focus on important features of the data. Similarity detection and pattern search techniques in scalar fields present a useful way of visualizing important features: these features are isolated and visualized independently, or all patterns similar to a given search pattern are shown. Topological features are ideal for isolating meaningful patterns in the data set and creating intuitive feature descriptors. The merge tree is one such topological feature whose characteristics are ideally suited for this purpose. Subtrees of merge trees segment the data into hierarchical, topologically defined regions. This kind of feature-based segmentation is more intelligent than purely data-based segmentations involving windows or bounding volumes. In this thesis, we explore several techniques that use subtrees of merge trees as features in scalar field data. First, we discuss static scalar fields and devise techniques to compare features - topologically segmented regions given by the subtrees of the merge tree - against each other. Second, we delve into time-dependent scalar fields and extend the idea of feature comparison to spatio-temporal features. In the process, we also develop a novel approach to track features in time-dependent data that considers the entire global network of likely feature associations between consecutive time steps. The highlight of this thesis is the interactivity that these feature-based techniques enable, thanks to the real-time computation speed of our algorithms. Our techniques are implemented in the open-source visualization framework Inviwo and have been published in several peer-reviewed conferences and journals. / QC 20171020
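To make the central data structure concrete, the sketch below builds the (augmented) merge tree of a small 1-D scalar field with a union-find sweep: values are processed in increasing order, each local minimum starts a branch, regular vertices extend branches, and each saddle joins branches. The field values are arbitrary; real data would be 2-D or 3-D, and the thesis's algorithms operate on subtrees of such trees.

```python
# Hypothetical sketch: augmented merge (join) tree of a 1-D scalar field via union-find.
field = [3.0, 1.0, 4.0, 0.5, 2.0, 5.0, 1.5, 2.5]

order = sorted(range(len(field)), key=lambda i: field[i])
parent_uf = list(range(len(field)))          # union-find over already-swept vertices
def find(i):
    while parent_uf[i] != i:
        parent_uf[i] = parent_uf[parent_uf[i]]
        i = parent_uf[i]
    return i

swept = set()
branch_root = {}                             # component representative -> current branch vertex
tree_edges = []                              # (child vertex, parent vertex) in the merge tree
for v in order:
    swept.add(v)
    branch_root[v] = v                       # v may start a new branch (a local minimum)
    for n in (v - 1, v + 1):                 # 1-D neighbors
        if n in swept:
            rv, rn = find(v), find(n)
            if rv != rn:
                # v connects two components: attach their current branches to v.
                for r in (rv, rn):
                    if branch_root[r] != v:
                        tree_edges.append((branch_root[r], v))
                parent_uf[rn] = rv
                branch_root[rv] = v

print(tree_edges)   # minima at indices 3, 1, 6; saddle at index 2; root at the global maximum 5
```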