171

Conditional-entropy metrics for feature selection

Bancarz, Iain January 2005 (has links)
We examine the task of feature selection, which is a method of forming simplified descriptions of complex data for use in probabilistic classifiers. Feature selection typically requires a numerical measure or metric of the desirability of a given set of features. The thesis considers a number of existing metrics, with particular attention to those based on entropy and other quantities derived from information theory. A useful new perspective on feature selection is provided by the concepts of partitioning and encoding of data by a feature set. The ideas of partitioning and encoding, together with the theoretical shortcomings of existing metrics, motivate a new class of feature selection metrics based on conditional entropy. The simplest of the new metrics is referred to as expected partition entropy or EPE. Performances of the new and existing metrics are compared by experiments with a simplified form of part-of-speech tagging and with classification of Reuters news stories by topic. In order to conduct the experiments, a new class of accelerated feature selection search algorithms is introduced; a member of this class is found to provide significantly increased speed with minimal loss in performance, as measured by feature selection metrics and accuracy on test data. The comparative performance of existing metrics is also analysed, giving rise to a new general conjecture regarding the wrapper class of metrics. Each wrapper is inherently tied to a specific type of classifier. The experimental results support the idea that a wrapper selects feature sets which perform well in conjunction with its own particular classifier, but this good performance cannot be expected to carry over to other types of model. The new metrics introduced in this thesis prove to have substantial advantages over a representative selection of other feature selection mechanisms: Mutual information, frequency-based cutoff, the Koller-Sahami information loss measure, and two different types of wrapper method. Feature selection using the new metrics easily outperforms other filter-based methods such as mutual information; additionally, our approach attains comparable performance to a wrapper method, but at a fraction of the computational expense. Finally, members of the new class of metrics succeed in a case where the Koller-Sahami metric fails to provide a meaningful criterion for feature selection.
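The abstract does not give the exact form of expected partition entropy, so the following is only a rough sketch, under stated assumptions, of the general idea it builds on: scoring a candidate feature set by the conditional entropy of the class label given the partition that the features induce on the training data (lower is better). The function names and data layout are illustrative, not the thesis's actual implementation.

```python
from collections import Counter, defaultdict
from math import log2

def conditional_entropy(samples, feature_set):
    """H(class | partition induced by feature_set), in bits.

    `samples` is a list of (feature_dict, label) pairs; a candidate feature
    set partitions the data by the tuple of its feature values. Lower values
    mean the partition says more about the class label.
    """
    cells = defaultdict(list)
    for features, label in samples:
        key = tuple(features.get(f) for f in sorted(feature_set))
        cells[key].append(label)

    n = len(samples)
    h = 0.0
    for labels in cells.values():
        p_cell = len(labels) / n
        counts = Counter(labels)
        h_cell = -sum((c / len(labels)) * log2(c / len(labels))
                      for c in counts.values())
        h += p_cell * h_cell
    return h

def greedy_select(samples, candidate_features, k):
    """Greedy forward search: repeatedly add the feature that most reduces
    the conditional entropy of the label given the induced partition."""
    selected = set()
    for _ in range(k):
        best = min(candidate_features - selected,
                   key=lambda f: conditional_entropy(samples, selected | {f}))
        selected.add(best)
    return selected
```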
172

Sample entropy and random forests: a methodology for anomaly-based intrusion detection and classification of low-bandwidth malware attacks

Hyla, Bret M. 09 1900 (has links)
Sample Entropy examines changes in the normal distribution of network traffic to identify anomalies. Normalized Information examines the overall probability distribution in a data set. Random Forests is a supervised learning algorithm which is efficient at classifying highly imbalanced data. Anomalies are exceedingly rare compared to the overall volume of network traffic. The combination of these methods enables low-bandwidth anomalies to be identified easily in high-bandwidth network traffic. Using only low-dimensional network information allows for near real-time identification of anomalies. The data set was drawn from the 1999 DARPA intrusion detection evaluation data set. The experiments compare a baseline f-score to the observed entropy and normalized information of the network. Anomalies that are disguised in network flow analysis were detected. Random Forests proves capable of classifying anomalies using the sample entropy and normalized information. Our experiment divided the data set into five-minute time slices and found that the sample entropy and normalized information metrics were successful in classifying bad traffic with a recall of .99 and an f-score of .50, which was 185% better than our baseline.
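As a rough illustration of the feature pipeline the abstract describes (not the thesis's actual code), the sketch below computes the entropy and normalized information of a traffic attribute within one five-minute slice and feeds those low-dimensional features to a Random Forest. The field names, slice layout and parameters are assumptions made for the example.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def entropy_and_normalized(values):
    """Shannon entropy (bits) of the empirical distribution of `values`
    observed in one time slice, plus its normalized form H / log2(k)."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    h = -np.sum(p * np.log2(p))
    h_max = np.log2(len(counts)) if len(counts) > 1 else 1.0
    return h, h / h_max

def slice_features(slices):
    """One row per five-minute slice: entropy and normalized information of a
    few header fields (hypothetical field names used here)."""
    rows = []
    for s in slices:                      # s: dict mapping field -> list of observed values
        row = []
        for field in ("src_ip", "dst_port"):
            h, h_norm = entropy_and_normalized(s[field])
            row.extend([h, h_norm])
        rows.append(row)
    return np.array(rows)

# Random Forests copes reasonably well with the extreme imbalance between
# attack and background slices; class weighting helps further.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced")
# clf.fit(slice_features(training_slices), training_labels)   # hypothetical data
```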
173

DAY FOLDER

Martyn, Raewyn 24 April 2013 (has links)
Provisional or unfinished images, forms and actions can sustain their status by continuing to change. This can resist programmed experience of their state, and shift their relationship as images within time. The sub-aesthetics of the unfinished and entropic can alter our understanding of where and how images are formed and located within time. My paintings each exist within their own emergent systems of time, structure and productive disorder. This thesis discusses these ideas in relation to DAY FOLDER and other work made during my MFA studies.
174

Komprese pseudonáhodných posloupností / Compressing pseudorandom sequences

Vald, Denis January 2011 (has links)
Generators of pseudorandom sequences are widely used objects, not least because of their application in stream ciphers. One way to improve resistance to different types of attack is to compress the generated sequence in order to remove redundant information that might otherwise lead to an attack against the generator. In this work we explore, from a wider perspective, the theoretical foundations developed thus far for compressing pseudorandom sequences. Using this general view we examine some known attacks against pseudorandom generators and look for ways to resist such attacks.
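Purely as an illustration of the redundancy idea (the thesis's compression acts on the structure of the generated sequence itself, not on a general-purpose compressor), the sketch below uses the zlib compression ratio as a crude indicator of exploitable structure in a generator's output; the weak keystream is a made-up example.

```python
import zlib
import secrets

def compression_ratio(data: bytes) -> float:
    """Compressed size / original size; values noticeably below 1.0 indicate
    redundancy that a distinguisher (or attacker) could exploit."""
    return len(zlib.compress(data, 9)) / len(data)

# A deliberately weak "keystream": a short repeating pattern.
weak = b"\x13\x37\xc0\xde" * 4096
# A strong reference: an OS-level cryptographic generator.
strong = secrets.token_bytes(len(weak))

print("weak  :", compression_ratio(weak))    # well below 1.0
print("strong:", compression_ratio(strong))  # close to 1.0 (compressor overhead aside)
```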
175

An implementation methodology and software tool for an entropy-based engineering model for evolving systems

Behnke, Matthew J. 06 1900 (has links)
Approved for public release; distribution is unlimited. / This thesis presents a practical method for calculating and representing entropy-based metrics for a set of bibliographic records evolving over time, in support of Dr. Michael Saboe's dissertation research, which addressed the ability to measure software technology transfer. The implementation of the analysis methodology for determining the information-temperature of evolving datasets containing bibliographic records is described. The information-temperature metric is based on information entropy and relates the maximum complexity of a system to its current complexity. Implementing the analysis methodology required data mining techniques to prepare the datasets. Additionally, since the information-temperature metric derived from Saboe's work was a new and emerging concept, the data analysis methodology had to be refined several times in order to obtain the desired results. An iterative software development paradigm was used to write the application in three iterations using Visual Basic. By the end of the implementation the data analysis process had become systematized, allowing the steps for computing the temperature of datasets to be outlined; it is estimated that the learning curve of the analysis can be reduced by 50 percent through integration and packaging of the analysis functions into a stand-alone application with an intuitive user interface. / Civilian, United States Army
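The exact formula for Saboe's information-temperature is not given in the abstract, so the following is only an illustrative stand-in, assuming the metric behaves like observed entropy relative to the maximum possible entropy of a snapshot's term distribution; the snapshot data are invented for the example.

```python
from collections import Counter
from math import log2

def information_temperature(terms):
    """Illustrative proxy: Shannon entropy of the term distribution divided by
    the maximum entropy for the same vocabulary size. The actual metric from
    Saboe's dissertation may be defined differently."""
    counts = Counter(terms)
    n = sum(counts.values())
    h = -sum((c / n) * log2(c / n) for c in counts.values())
    h_max = log2(len(counts)) if len(counts) > 1 else 1.0
    return h / h_max

# Track the proxy over yearly snapshots of an evolving set of records.
snapshots = {
    2001: ["entropy", "software", "reuse", "entropy"],
    2002: ["entropy", "software", "reuse", "metrics", "transfer"],
}
for year, terms in sorted(snapshots.items()):
    print(year, round(information_temperature(terms), 3))
```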
176

Entropy in Two American Road Narratives

Deskin, Sean 17 December 2010 (has links)
Tony Tanner's book City of Words analyzes American literature from 1950 to 1970; in the chapter entitled "Everything Running Down" the theme of entropy, the second law of thermodynamics, is explored and revealed to be a common motif within many works of American literature. Tanner's analysis does not specifically address the presence of entropy within the genre of the American road narrative; reading "Everything Running Down" alongside Kris Lackey's analysis of American road narratives in his book RoadFrames, however, makes apparent the presence of entropy and how it is applied within the American road narrative. Although Jack Kerouac's On the Road and Cormac McCarthy's The Road were published nearly fifty years apart and are seemingly disparate texts, these two texts reveal a thematic use of entropy which connects them in an ongoing dialogue within the genre of the American road narrative.
177

Development of a data processing toolkit for the analysis of next-generation sequencing data generated using the primer ID approach

Labuschagne, Jan Phillipus Lourens January 2018 (has links)
Philosophiae Doctor - PhD / Sequencing an HIV quasispecies with next-generation sequencing technologies yields a dataset with significant amplification bias and errors resulting from both the PCR and sequencing steps. Both the amplification bias and the sequencing error can be reduced by labelling each cDNA (generated during the reverse transcription of the viral RNA to DNA prior to PCR) with a random sequence tag called a Primer ID (PID). Processing PID data requires additional computational steps, presenting a barrier to the uptake of this method. MotifBinner is an R package designed to handle PID data with a focus on resolving potential problems in the dataset. MotifBinner groups sequences into bins by their PID tags, identifies and removes false unique bins produced by sequencing errors in the PID tags, and removes outlier sequences from within a bin. MotifBinner produces a consensus sequence for each bin, as well as a detailed report for the dataset, detailing the number of sequences per bin, the number of outlying sequences per bin, rates of chimerism, the number of degenerate letters in the final consensus sequences and the most divergent consensus sequences (potential contaminants). We characterized the ability of the PID approach to reduce the effect of sequencing error, to detect minority variants in viral quasispecies and to reduce the rates of PCR-induced recombination. We produced reference samples with known variants at known frequencies to study the effectiveness of increasing PCR elongation time, decreasing the number of PCR cycles, and sample partitioning, by means of dPCR (droplet PCR), on PCR-induced recombination. After sequencing these artificial samples with the PID approach, each consensus sequence was compared to the known variants. There are complex relationships between the sample preparation protocol and the characteristics of the resulting dataset. We produce a set of recommendations that can be used to inform the sample preparation that is most useful for a particular study. The AMP trial infuses HIV-negative patients with the VRC01 antibody and monitors for HIV infections. Accurately timing the infection event and reconstructing the founder viruses of these infections are critical for relating infection risk to antibody titer and homology between the founder virus and antibody binding sites. Dr. Paul Edlefsen at the Fred Hutch Cancer Research Institute developed a pipeline that performs infection timing and founder reconstruction. Here, we document a portion of the pipeline, produce detailed tests for that portion of the pipeline and investigate the robustness of some of the tools used in the pipeline to violations of their assumptions.
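MotifBinner itself is an R package and does considerably more (outlier removal within bins, chimera and contaminant checks, reporting). The Python fragment below is only a minimal sketch of the core binning-and-consensus idea; the PID position, tag length and bin-size threshold are arbitrary assumptions for illustration.

```python
from collections import Counter, defaultdict

def bin_by_pid(reads, pid_length=9, min_bin_size=3):
    """Group reads by their Primer ID tag (assumed here to be the last
    `pid_length` bases of each read) and drop small bins, which typically
    arise from sequencing errors in the tag itself (false unique bins)."""
    bins = defaultdict(list)
    for read in reads:
        pid, insert = read[-pid_length:], read[:-pid_length]
        bins[pid].append(insert)
    return {pid: seqs for pid, seqs in bins.items() if len(seqs) >= min_bin_size}

def consensus(seqs):
    """Per-position majority consensus; assumes aligned, equal-length sequences.
    Real tools also flag ties and remove outlier reads before this step."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

# consensus_per_bin = {pid: consensus(seqs)
#                      for pid, seqs in bin_by_pid(reads).items()}   # hypothetical reads
```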
178

Entropia aplicada ao reconhecimento de padrões em imagens / Entropy applied to pattern recognition in images

Assirati, Lucas 23 July 2014 (has links)
This work studies the use of entropy as a tool for pattern recognition in images. Entropy is a concept used in thermodynamics to measure the degree of organization of a system. However, the concept can be extended to other areas of knowledge. Its adoption in information theory and, consequently, in pattern recognition was introduced by Shannon in the paper entitled "A Mathematical Theory of Communication", published in 1948. In this master's thesis, the classical Boltzmann-Gibbs-Shannon entropy, the generalized Tsallis entropy and its variants (multi-scale analysis, multiple q index, and feature selection) are studied and applied to pattern recognition in images. Using databases well known in the literature, we performed comparative studies between the techniques. The results show that the Tsallis entropy, through multi-scale analysis and the multiple q index, has a great advantage over the classical Boltzmann-Gibbs-Shannon entropy. Practical applications of this study are proposed in order to demonstrate the potential of the method.
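The abstract gives no formulas, so the sketch below only illustrates the underlying quantity under stated assumptions: the generalized Tsallis entropy of an image's grey-level histogram evaluated at several q values, which in the q → 1 limit reduces to the Boltzmann-Gibbs-Shannon form. It is not the thesis's actual feature-extraction pipeline (which also involves multi-scale analysis and feature selection).

```python
import numpy as np

def tsallis_entropy(p, q):
    """Generalized Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1);
    as q -> 1 it recovers the usual Boltzmann-Gibbs-Shannon form."""
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return -np.sum(p * np.log(p))
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def multi_q_signature(image, qs=(0.1, 0.5, 1.0, 2.0, 4.0)):
    """Feature vector of Tsallis entropies of the grey-level histogram for
    several q values -- a crude stand-in for the multi-scale, multiple-q
    descriptors investigated in the thesis."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()
    return np.array([tsallis_entropy(p, q) for q in qs])

# Example with a synthetic 8-bit "image"; a real experiment would train a
# classifier on these signatures for every image in the database.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
print(multi_q_signature(img))
```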
179

Estudo in silico de centros geradores de padrão: arquiteturas mínimas de funcionamento e fluxo interno de informação / In silico study of central pattern generators: minimal architectures for operation and internal information flux

Santos, Breno Teixeira 26 April 2013 (has links)
The study of central pattern generators (CPGs) is an excellent example of the limitations of the reductionist method when trying to explain more global behaviour. We do not thereby relegate the detailed description of biophysical and molecular mechanisms to ostracism. Quite the contrary, we appropriate a subset of those concepts, in the form of the Hodgkin & Huxley model, to build a small-scale computational simulation system for neural networks capable of performing two measurements: one of the complexity of the information generated and circulating within the network, and one of the energy consumption of the neural cells. With this, we hope to answer the following question: is there some mechanism, some basic principle in oscillating networks, capable of mapping a minimum of an external physical quantity onto some other minimum internal to the network? Everything indicates that the answer is affirmative. We present such a minimization point, together with a formalism, still under development, that accounts for the results.
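For context, a minimal sketch of the kind of building block such a simulator rests on: the textbook single-compartment Hodgkin-Huxley neuron with squid-axon parameters, integrated by forward Euler. The thesis builds networks of such units and measures informational complexity as well; the sodium-charge tally used here as a metabolic proxy is an assumption for illustration, not the thesis's actual energy metric.

```python
import numpy as np

# Classic Hodgkin-Huxley single-compartment neuron (squid-axon parameters).
C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3          # uF/cm^2, mS/cm^2
ENa, EK, EL = 50.0, -77.0, -54.387               # mV

def a_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def b_m(V): return 4.0 * np.exp(-(V + 65.0) / 18.0)
def a_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def b_h(V): return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def a_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def b_n(V): return 0.125 * np.exp(-(V + 65.0) / 80.0)

def simulate(I_ext=10.0, t_max=50.0, dt=0.01):
    V, m, h, n = -65.0, 0.05, 0.6, 0.32
    na_charge = 0.0                              # crude metabolic proxy (uC/cm^2)
    trace = []
    for _ in range(int(t_max / dt)):
        I_Na = gNa * m**3 * h * (V - ENa)
        I_K = gK * n**4 * (V - EK)
        I_L = gL * (V - EL)
        V += dt * (I_ext - I_Na - I_K - I_L) / C
        m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
        h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
        n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
        na_charge += abs(I_Na) * dt              # sodium charge the pump must undo
        trace.append(V)
    return np.array(trace), na_charge

voltage, cost = simulate()
print("spikes fired:", int(np.sum((voltage[1:] >= 0) & (voltage[:-1] < 0))))
print("Na+ charge moved (metabolic proxy):", round(cost, 1))
```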
180

Aspectos da transmissão e processamento de informação em canto de anuros / Features of transmission and processing of information in anuran vocalization

Rodrigues, Vitor Hugo 04 March 2009 (has links)
Biological systems interact with their environment in many ways, and these interactions can ultimately be regarded as the transmission of messages. The transmission of messages by sound in animals is a well-studied subject because it is an important means of communication used by many groups. The objective of the present work is to verify, through the lens of Shannon's information theory, whether the central nervous system minimizes entropy generation when processing communicative sound signals. The temporal organization of the intervals between successive vocalizations was the subject of analysis, using recordings of three anuran species: Scinax perpusillus, Hypsiboas faber and Hypsiboas pardalis. Non-linear analyses were performed with tools such as volumetric approximate entropy, power spectrum analysis and the Poincaré plot in order to characterize the variability of the signals emitted by a single animal as well as of the signals exchanged among individuals of the same species. The time series analyzed are highly variable, with the intervals between vocalizations adequately described by Lévy distributions. The time series of the interactions among individuals also show high variability, though lower than that of single individuals. Given the neural control exerted over vocalization in anurans, the results suggest that this variability optimizes the process of sound emission, reducing the probability that vocalizations of different individuals overlap. The reduction in variability of the interactions among individuals may indicate a reduction of entropy generation in the communication process of these animals.
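As a small sketch of the style of analysis described (standard approximate entropy rather than the volumetric variant used in the thesis, and with made-up call onset times), the fragment below scores the regularity of the intervals between successive vocalizations.

```python
import numpy as np

def approximate_entropy(series, m=2, r=None):
    """Approximate entropy (Pincus) of a 1-D series: lower values indicate
    more regular, more predictable interval patterns."""
    x = np.asarray(series, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)
    def phi(m):
        n = len(x) - m + 1
        emb = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between all pairs of embedded vectors
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        c = np.sum(d <= r, axis=1) / n
        return np.mean(np.log(c))
    return phi(m) - phi(m + 1)

# Hypothetical call onset times (seconds) from one recording; the analysis
# operates on the intervals between successive vocalizations.
onsets = np.array([0.00, 0.91, 1.70, 2.95, 3.40, 4.88, 5.30, 6.75, 7.10, 8.60])
intervals = np.diff(onsets)
print("ApEn of inter-call intervals:", round(approximate_entropy(intervals), 3))
```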
