71

Assessing, Modifying, and Combining Data Fields from the Virginia Office of the Chief Medical Examiner (OCME) Dataset and the Virginia Department of Forensic Science (DFS) Datasets in Order to Compare Concentrations of Selected Drugs

Herrin, Amy Elizabeth 01 January 2006 (has links)
The Medical Examiner of Virginia (ME) dataset and the Virginia Department of Forensic Science Driving Under the Influence of Drugs (DUI) datasets were used to determine whether people have the potential to develop tolerances to diphenhydramine, cocaine, oxycodone, hydrocodone, methadone, and morphine. These datasets covered the years 2000-2004 and were used to compare the concentrations of these six drugs between people who died from a drug-related cause of death (involving the drug of interest) and people who were stopped for driving under the influence. Three drug pattern groups were created to divide each of the six drug-specific datasets in order to compare concentrations among individuals with the drug alone, the drug and ethanol, or polypharmacy (multiple drugs). An ANOVA model was used to determine whether there was an interaction effect between the source dataset (ME or DUI) and the drug pattern groups. For diphenhydramine and cocaine the interaction was statistically significant, but for the other drugs it was not. The other four drug-specific datasets showed that the DUI and ME groups were statistically significantly different from each other, and all of those datasets except for methadone showed a statistically significant difference between at least two drug pattern groups. That all of these datasets showed differences between the ME and DUI sources did not, by itself, provide sufficient evidence to suggest the development of tolerance to each of the six drugs. One exception was methadone, because 14 individuals had what is defined as a "clinical 'lethal' blood concentration"; these individuals provide some evidence for the possibility of developing tolerance. The main outcomes of this study include suggested changes to the ME and DUI datasets with regard to the way data are kept and collected. Several problems with the fields of these datasets arose before the analysis began and had to be corrected. Some of the suggested changes are currently being considered at the Virginia Office of the Chief Medical Examiner as it begins to restructure its database.
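To make the statistical setup concrete, the following is a minimal sketch (not taken from the thesis) of a two-way ANOVA with an interaction term between data source and drug pattern group, using the statsmodels package; the column names `concentration`, `source`, and `pattern`, and the toy values, are illustrative assumptions.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Illustrative records: drug concentration, source dataset (ME or DUI),
# and drug pattern group (drug alone, drug + ethanol, polypharmacy).
df = pd.DataFrame({
    "concentration": [0.12, 0.35, 0.08, 0.50, 0.22, 0.41, 0.15, 0.60],
    "source":        ["ME", "ME", "DUI", "ME", "DUI", "ME", "DUI", "ME"],
    "pattern":       ["alone", "ethanol", "alone", "poly",
                      "ethanol", "poly", "alone", "ethanol"],
})

# Fit an OLS model with main effects and a source x pattern interaction,
# then produce an ANOVA table to test whether the interaction is significant.
model = smf.ols("concentration ~ C(source) * C(pattern)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
```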
72

Geração de imagens artificiais e quantização aplicadas a problemas de classificação / Artificial images generation and quantization applied to classification problems

Thumé, Gabriela Salvador 29 April 2016 (has links)
Each image can be represented by a combination of several features, such as the colour-intensity histogram or texture properties. Those features compose a multidimensional vector that represents the image. Commonly this vector is given as input to a pattern classification method that, after learning from many examples, builds a decision model. The literature suggests that careful image preparation — through acquisition, preprocessing, and segmentation — can significantly impact classification. Besides the lack of image treatment before feature extraction, class imbalance is also an obstacle to satisfactory classification. Images have characteristics that can be exploited to improve the description of the objects of interest and therefore their classification. Possible improvements include: reducing the number of image intensities before feature extraction, instead of applying quantization methods to the already extracted feature vector; and generating images from the original ones in order to balance datasets in which the number of examples per class is uneven. This dissertation therefore proposes to improve image classification by applying image processing methods before feature extraction, specifically analyzing the influence of dataset balancing and quantization on classification. The study also analyzes the visualization of the feature space after artificial image generation and after interpolation of the features extracted from the original images (SMOTE), compared with the original space, with emphasis on the importance of class rebalancing. The results indicate that quantization simplifies the images before feature extraction and subsequent dimensionality reduction, producing more compact vectors, and that rebalancing image classes through artificial image generation can improve classification relative to the original dataset and to methods applied in the space of already extracted features.
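As a rough illustration of the kind of preprocessing discussed here (not the dissertation's actual pipeline), the sketch below quantizes a grayscale image to a smaller number of intensity levels before extracting a simple histogram feature; the level count of 16 and the random "image" are arbitrary assumptions.

```python
import numpy as np

def quantize(image: np.ndarray, levels: int = 16) -> np.ndarray:
    """Reduce an 8-bit grayscale image to `levels` intensity values."""
    step = 256 // levels
    return (image // step) * step

def histogram_feature(image: np.ndarray, levels: int = 16) -> np.ndarray:
    """Normalized intensity histogram of the quantized image."""
    quantized = quantize(image, levels)
    hist, _ = np.histogram(quantized, bins=levels, range=(0, 256))
    return hist / hist.sum()

# Toy example: a random array stands in for a real image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
feature = histogram_feature(img)
print(feature.shape)  # (16,) -- a compact feature vector
```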
73

Some applications of statistical phylogenetics : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Biomathematics at Massey University

Schliep, Klaus Peter January 2009 (has links)
The increasing availability of molecular data means that phylogenetic studies nowadays often use datasets which combine a large number of loci for many different species. This leads to a trade-off. On the one hand, more complex models are preferred to account for heterogeneity in evolutionary processes. On the other hand, simple models that answer the biological questions of interest, are easy to interpret, and can be computed in reasonable time are favoured. This thesis focuses on four cases of phylogenetic analysis which arise from this conflict.
- It is shown that edge weight estimates can be non-identifiable if the data are simulated under a mixture model. Even if the underlying process is known, estimation and interpretation may be difficult due to the high variance of the parameters of interest.
- Partition models are commonly used to account for heterogeneity in datasets. Novel methods are presented here which allow grouping of genes under similar evolutionary constraints. A dataset containing 14 chloroplast genes from 19 anciently diverged species is used to find groups of co-evolving genes. The prospects and limitations of such methods are discussed.
- Penalised likelihood estimation is a useful tool for improving the performance of models and allowing for variable selection. A novel approach is presented that uses pairwise dissimilarities to visualise the data as a network. It is further shown how penalised likelihood can be used to decrease the variance of parameter estimates for mixture and partition models, allowing a more reliable analysis. Estimates for the variance and the expected number of parameters of penalised likelihood estimates are derived.
- Tree shape statistics are used to describe speciation events in macroevolution. A new tree shape statistic is introduced and the biases of different clustering methods on tree shape statistics are discussed.
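As context for the tree shape statistics mentioned in the last point, here is a minimal sketch of one classical imbalance measure, the Colless index (not the new statistic introduced in the thesis), computed on a small binary tree represented as nested tuples.

```python
def leaf_count(tree):
    """Number of leaves in a binary tree given as nested 2-tuples or a leaf label."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)

def colless_index(tree):
    """Sum over internal nodes of |#leaves(left) - #leaves(right)|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    imbalance = abs(leaf_count(left) - leaf_count(right))
    return imbalance + colless_index(left) + colless_index(right)

# A caterpillar tree on four taxa is maximally imbalanced: (((A,B),C),D).
caterpillar = ((("A", "B"), "C"), "D")
balanced = (("A", "B"), ("C", "D"))
print(colless_index(caterpillar))  # 3
print(colless_index(balanced))     # 0
```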
74

Digitizing the Parthenon using 3D Scanning : Managing Huge Datasets

Lundgren, Therese January 2004 (has links)
Digitizing objects and environments from the real world has become an important part of creating realistic computer graphics. Through the use of structured lighting and laser time-of-flight measurements, the capture of geometric models is now a common process. The result is visualizations in which viewers gain new possibilities for both visual and intellectual experiences.

This thesis presents the reconstruction of the Parthenon temple and its environment in Athens, Greece, using a 3D laser-scanning technique.

In order to reconstruct a realistic model using 3D scanning techniques, there are various phases in which the acquired datasets have to be processed. The data have to be organized, registered, and integrated, in addition to pre- and post-processing. This thesis describes the development of a suitable and efficient data processing pipeline for the given data.

The approach differs from previous scanning projects in that this large-scale object is digitized at very high resolution. In particular, the issue of managing and processing huge datasets is described.

Finally, the processing of the datasets in the different phases and the resulting 3D model of the Parthenon are presented and evaluated.
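To illustrate one core step of the registration phase mentioned above — a sketch only, using the standard SVD-based Kabsch/Procrustes solution rather than the pipeline developed in the thesis — the following computes the rigid transform that best aligns two point clouds with known point-to-point correspondences.

```python
import numpy as np

def best_rigid_transform(source: np.ndarray, target: np.ndarray):
    """Return rotation R and translation t minimizing ||R @ p + t - q|| over
    corresponding points, assuming source[i] matches target[i] (shape (N, 3))."""
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    src_centered = source - src_centroid
    tgt_centered = target - tgt_centroid

    # Cross-covariance matrix and its SVD give the optimal rotation.
    H = src_centered.T @ tgt_centered
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = tgt_centroid - R @ src_centroid
    return R, t

# Toy check: rotate and shift a small cloud, then recover the transform.
rng = np.random.default_rng(1)
cloud = rng.random((100, 3))
angle = np.pi / 6
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
moved = cloud @ R_true.T + np.array([0.5, -0.2, 1.0])
R_est, t_est = best_rigid_transform(cloud, moved)
print(np.allclose(R_est, R_true, atol=1e-6))  # True
```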
75

Out-of-Core Multi-Resolution Volume Rendering of Large Data Sets

Lundell, Fredrik January 2011 (has links)
A modality device can today capture high-resolution volumetric data sets, and as the data resolution increases, so do the challenges of processing volumetric data through a visualization pipeline. Standard volume rendering pipelines often use a graphics processing unit (GPU) to accelerate rendering performance by taking advantage of the parallel architecture of such devices. Unfortunately, graphics cards have limited amounts of video memory (VRAM), causing a bottleneck in a standard pipeline. Multi-resolution techniques can be used to efficiently modify the rendering pipeline, allowing sub-domains within the volume to be represented at different resolutions. The active resolution distribution is temporarily stored in VRAM for rendering, while the inactive parts are stored on secondary memory layers such as system RAM or disk. The active resolution set can be optimized to produce high-quality renders while minimizing the amount of storage required. This is done with a dynamic compression scheme that optimizes visual quality by evaluating user-input data. The optimized resolution of each sub-domain is then streamed on demand to VRAM from the secondary memory layers. Rendering a multi-resolution data set requires some extra care at the boundaries between sub-domains. To avoid artifacts, an intrablock interpolation (II) sampling scheme capable of creating smooth transitions between sub-domains at arbitrary resolutions can be used. The result is a highly optimized rendering pipeline, complemented with a preprocessing pipeline, together capable of rendering large volumetric data sets in real time.
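A toy sketch of the resolution-selection idea — an assumption for illustration, not the thesis's actual compression scheme: each block offers several resolution levels with known memory cost and an importance score, and a greedy pass upgrades the most important blocks one level at a time until a VRAM budget is exhausted.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    importance: float    # e.g., derived from the transfer function or view
    level_costs: list    # memory cost (MB) per resolution level, coarse to fine
    level: int = 0       # start at the coarsest level

def assign_resolutions(blocks, vram_budget_mb):
    """Greedily upgrade the most important blocks while the budget allows."""
    used = sum(b.level_costs[b.level] for b in blocks)
    upgraded = True
    while upgraded:
        upgraded = False
        for b in sorted(blocks, key=lambda b: b.importance, reverse=True):
            if b.level + 1 < len(b.level_costs):
                extra = b.level_costs[b.level + 1] - b.level_costs[b.level]
                if used + extra <= vram_budget_mb:
                    b.level += 1
                    used += extra
                    upgraded = True
    return {b.name: b.level for b in blocks}, used

blocks = [
    Block("skull",  importance=0.9, level_costs=[2, 8, 32]),
    Block("air",    importance=0.1, level_costs=[2, 8, 32]),
    Block("tissue", importance=0.6, level_costs=[2, 8, 32]),
]
levels, used = assign_resolutions(blocks, vram_budget_mb=64)
print(levels, used)  # the most important block gets the finest level that fits
```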
76

Geração de imagens artificiais e quantização aplicadas a problemas de classificação / Artificial images generation and quantization applied to classification problems

Gabriela Salvador Thumé 29 April 2016 (has links)
Each image can be represented by a combination of several features, such as the colour-intensity histogram or texture properties. Those features compose a multidimensional vector that represents the image. Commonly this vector is given as input to a pattern classification method that, after learning from many examples, builds a decision model. The literature suggests that careful image preparation — through acquisition, preprocessing, and segmentation — can significantly impact classification. Besides the lack of image treatment before feature extraction, class imbalance is also an obstacle to satisfactory classification. Images have characteristics that can be exploited to improve the description of the objects of interest and therefore their classification. Possible improvements include: reducing the number of image intensities before feature extraction, instead of applying quantization methods to the already extracted feature vector; and generating images from the original ones in order to balance datasets in which the number of examples per class is uneven. This dissertation therefore proposes to improve image classification by applying image processing methods before feature extraction, specifically analyzing the influence of dataset balancing and quantization on classification. The study also analyzes the visualization of the feature space after artificial image generation and after interpolation of the features extracted from the original images (SMOTE), compared with the original space, with emphasis on the importance of class rebalancing. The results indicate that quantization simplifies the images before feature extraction and subsequent dimensionality reduction, producing more compact vectors, and that rebalancing image classes through artificial image generation can improve classification relative to the original dataset and to methods applied in the space of already extracted features.
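Since this record repeats the previous abstract, a complementary sketch is given here of the feature-space alternative it compares against: SMOTE-style interpolation between a minority-class sample and one of its nearest same-class neighbours. This is a minimal illustration under assumed data, not the exact algorithm evaluated in the dissertation.

```python
import numpy as np

def smote_like_samples(minority: np.ndarray, n_new: int, k: int = 3,
                       rng=np.random.default_rng(0)) -> np.ndarray:
    """Create n_new synthetic feature vectors by interpolating each chosen
    sample toward one of its k nearest minority-class neighbours."""
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # Distances from sample i to every other minority sample.
        dists = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]
        j = rng.choice(neighbours)
        lam = rng.random()   # interpolation factor in [0, 1)
        new_points.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(new_points)

minority_features = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2]])
synthetic = smote_like_samples(minority_features, n_new=5)
print(synthetic.shape)  # (5, 2)
```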
77

Digitizing the Parthenon using 3D Scanning : Managing Huge Datasets

Lundgren, Therese January 2004 (has links)
Digitizing objects and environments from the real world has become an important part of creating realistic computer graphics. Through the use of structured lighting and laser time-of-flight measurements, the capture of geometric models is now a common process. The result is visualizations in which viewers gain new possibilities for both visual and intellectual experiences. This thesis presents the reconstruction of the Parthenon temple and its environment in Athens, Greece, using a 3D laser-scanning technique. In order to reconstruct a realistic model using 3D scanning techniques, there are various phases in which the acquired datasets have to be processed. The data have to be organized, registered, and integrated, in addition to pre- and post-processing. This thesis describes the development of a suitable and efficient data processing pipeline for the given data. The approach differs from previous scanning projects in that this large-scale object is digitized at very high resolution. In particular, the issue of managing and processing huge datasets is described. Finally, the processing of the datasets in the different phases and the resulting 3D model of the Parthenon are presented and evaluated.
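As a complement to the registration sketch given after the earlier copy of this record, the following illustrates the "huge datasets" aspect: streaming a large point cloud from disk in chunks and keeping one representative point per voxel, so the full dataset never has to reside in memory. The file name, binary layout, chunk size, and voxel size are illustrative assumptions.

```python
import numpy as np

def voxel_downsample_streaming(path: str, voxel_size: float = 0.05,
                               chunk_points: int = 1_000_000) -> np.ndarray:
    """Read float32 XYZ triples from a raw binary file chunk by chunk and keep
    the first point seen in each voxel, bounding memory use by the voxel grid."""
    representatives = {}
    with open(path, "rb") as f:
        while True:
            buf = np.fromfile(f, dtype=np.float32, count=chunk_points * 3)
            if buf.size == 0:
                break
            points = buf[: buf.size - buf.size % 3].reshape(-1, 3)
            keys = np.floor(points / voxel_size).astype(np.int64)
            for key, point in zip(map(tuple, keys), points):
                representatives.setdefault(key, point)
    return np.array(list(representatives.values()))

# Hypothetical usage on a multi-gigabyte scan dump:
# downsampled = voxel_downsample_streaming("parthenon_scan.xyz.bin", voxel_size=0.02)
# print(downsampled.shape)
```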
78

A qualitative study: how Solution Snippets are presented in Stack Overflow and how those Solution Snippets need to be adapted for reuse

Weeraddana, Nimmi Rashinika 22 March 2022 (has links)
Researchers use datasets of Question-Solution pairs to train machine learning models, such as source code generation models. A Question-Solution pair contains two parts: a programming question and its corresponding Solution Snippet. A Solution Snippet is source code that solves a programming question. These datasets of Question-Solution pairs can be extracted from a number of different platforms. In this research, I study how Question-Solution pairs are extracted from Stack Overflow (SO). There are two limitations of datasets of Question-Solution pairs extracted from SO: (1) according to the authors of these datasets, some Question-Solution pairs contain Solution Snippets that do not solve the question correctly, and (2) these datasets do not contain information on how Solution Snippets need to be reused, and such information would enhance the reusability of Solution Snippets. These limitations could adversely affect the quality of the code generated by machine learning models. In this research, I conducted a qualitative study to categorize the various ways Solution Snippets are presented in SO answers, as well as how Solution Snippets can be adapted for reuse. By doing so, I identified eight categories of how Solution Snippets are presented in SO answers and five categories of how Solution Snippets could be adapted. Based on these results, I identified several potential reasons why it is not easy to create datasets of Question-Solution pairs. The first categorization shows that finding the correct location of the Solution Snippet is challenging when there are several code blocks within the answer to a question; the researcher must then identify which code within that block is the Solution Snippet. The second categorization shows that most Solution Snippets appear difficult to adapt for reuse, and that how a Solution Snippet should be adapted is not explicitly stated in it. These insights shed light on creating better-quality datasets from questions and answers posted on Stack Overflow. / Graduate
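To make the extraction step concrete, here is a small sketch — an illustration, not the thesis's methodology — of pulling candidate Solution Snippets out of a Stack Overflow answer body, which the Stack Exchange data dump stores as HTML with code inside `<pre><code>` blocks.

```python
import html
import re

CODE_BLOCK_RE = re.compile(r"<pre[^>]*><code>(.*?)</code></pre>", re.DOTALL)

def extract_code_blocks(answer_body_html: str) -> list:
    """Return the code blocks of an answer, unescaped from HTML entities."""
    return [html.unescape(block) for block in CODE_BLOCK_RE.findall(answer_body_html)]

answer = """<p>You can reverse a list like this:</p>
<pre><code>xs = [1, 2, 3]
print(xs[::-1])
</code></pre>
<p>or in place:</p>
<pre><code>xs.reverse()
</code></pre>"""

for i, snippet in enumerate(extract_code_blocks(answer), start=1):
    print(f"--- candidate Solution Snippet {i} ---")
    print(snippet)
```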
79

Text simplification in Swedish using transformer-based neural networks / Textförenkling på Svenska med transformer-baserade neurala nätverk

Söderberg, Samuel January 2023 (has links)
Text simplification involves modifying text to make it easier to read by replacing complex words, altering sentence structure, and/or removing unnecessary information. It can be used to make text accessible to a larger audience. While research on text simplification in Swedish exists, the use of neural networks in the field is limited. Neural networks require large-scale, high-quality datasets, but such datasets are scarce for text simplification in Swedish. This study investigates acquiring datasets through paraphrase mining from web snapshots and through translation of existing English text simplification datasets into Swedish, and aims to assess the performance of neural network models trained on the acquired datasets.
Three datasets of complex-to-simple sequence pairs were created: one by mining paraphrases from web data, another by translating a dataset from English to Swedish, and a third by combining the mined and translated datasets into one. These datasets were then used to fine-tune a BART neural network model pre-trained on large amounts of Swedish data. Evaluation was conducted through manual examination and categorization of output, and through automated assessment using the SARI and LIX metrics. Two different test sets were evaluated, one translated from English and one manually constructed from Swedish texts. The automatic evaluation produced SARI scores close to, but not as high as, comparable research on text simplification in English. In terms of LIX scores, the models perform on par with or better than existing research on automatic text simplification in Swedish. The manual evaluation revealed that the model trained on the mined paraphrases generally produced short sequences with many alterations compared to the original, while the model trained on the translated dataset often produced unchanged sequences or sequences with few alterations. However, the model trained on the mined dataset produced many more unusable sequences, either with corrupted Swedish or with altered meaning, than the model trained on the translated dataset. The model trained on the combined dataset reached a middle ground in these two respects, producing fewer unusable sequences than the model trained on the mined dataset and fewer unchanged sequences than the model trained on the translated dataset. Many sequences were successfully simplified by the three models, but the manual evaluation revealed that a significant portion of the generated sequences remained unchanged or unusable, highlighting the need for further research, exploration of methods, and tool refinement.
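For the readability side of the evaluation, the LIX measure mentioned above has a simple standard definition: average sentence length plus the percentage of words longer than six characters. A minimal sketch follows; the tokenization rules and the example sentences are simplified assumptions.

```python
import re

def lix(text: str) -> float:
    """LIX readability: words per sentence + 100 * long words (>6 chars) / words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return 0.0
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

original = "Myndigheten ansvarar för genomförandet av den nationella strategin."
simplified = "Myndigheten ska se till att den nationella planen blir av."
print(round(lix(original), 1), round(lix(simplified), 1))  # lower LIX reads easier
```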
80

Identifying Units on a WiFi Based on Their Physical Properties

Nyström, Jonatan January 2019 (has links)
This project aims to classify different units on a wireless network by using their frequency response, with the purpose of increasing security when communicating over WiFi. We use a convolutional neural network to find symmetries in the frequency responses recorded from two different units. We used two pre-recorded sets of data which contained the same units but were collected at two different locations. The project achieves an accuracy of 99.987%, with a CNN of five hidden layers, when training and testing on one dataset. When training the neural network on one set and testing it on the second set, we achieve results below 54.12% for identifying the units. We conclude that the amount of data needed to achieve sufficiently high accuracy is too large for this method to be a practical solution for non-stationary units.
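A minimal sketch of the kind of classifier described — the architecture and input length are assumptions, not the network from the report: a small 1D convolutional network in PyTorch that maps a frequency-response vector to one of two unit identities.

```python
import torch
import torch.nn as nn

class FrequencyResponseCNN(nn.Module):
    """Tiny 1D CNN over a frequency response of assumed length 256."""
    def __init__(self, n_classes: int = 2, input_len: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (input_len // 4), 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, 1, input_len); output is class logits.
        return self.classifier(self.features(x))

model = FrequencyResponseCNN()
dummy_batch = torch.randn(8, 1, 256)   # 8 synthetic frequency responses
logits = model(dummy_batch)
print(logits.shape)                    # torch.Size([8, 2])
```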
