21

Design, implementation, and evaluation of node placement and data reduction algorithms for large scale wireless networks

Mehta, Hardik 01 December 2003 (has links)
No description available.
22

High performance latent Dirichlet allocation for text mining

Liu, Zelong January 2013 (has links)
Latent Dirichlet Allocation (LDA), a fully generative probabilistic model, is a three-level hierarchical Bayesian model. LDA computes the latent topic structure of the data and obtains the significant information of documents. However, traditional LDA has several limitations in practical applications. LDA cannot be used directly for classification because it is an unsupervised learning model; it needs to be embedded into appropriate classification algorithms. As a generative model, LDA may also generate latent topics in categories to which the target documents do not belong, producing deviations in computation and reducing classification accuracy. The number of topics in LDA greatly influences the learning of the model parameters. Noise samples in the training data also affect the final text classification result, and the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to deal with huge amounts of data, balancing computing loads in a computer cluster poses another challenge. This thesis presents a text classification method which combines the LDA model and the Support Vector Machine (SVM) classification algorithm for improved classification accuracy while reducing the dimensionality of the datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a noise data reduction scheme to process noisy data; even when the noise ratio in the training data set is large, the scheme maintains a high level of classification accuracy. Finally, the thesis parallelizes LDA using the MapReduce model, the de facto computing standard for supporting data intensive applications. A genetic algorithm based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster in which the computers have a variety of computing resources in terms of CPU speed, memory space and hard disk space.
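The thesis' own implementation is not reproduced here, but a minimal, hedged sketch of the LDA-plus-SVM pipeline it describes can be assembled from scikit-learn; the 20 Newsgroups corpus, the topic count of 20 and the SVM parameters below are illustrative assumptions, not the author's settings.

```python
# Hedged sketch: LDA topic features feeding an SVM classifier.
# scikit-learn components stand in for the thesis' own implementation;
# corpus, topic count and SVM parameters are illustrative assumptions.
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

news = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X_train, X_test, y_train, y_test = train_test_split(
    news.data, news.target, test_size=0.2, random_state=0)

pipeline = make_pipeline(
    CountVectorizer(max_features=5000, stop_words="english"),   # bag-of-words counts
    LatentDirichletAllocation(n_components=20, random_state=0), # documents -> topic mixtures
    SVC(kernel="rbf", C=1.0),                                   # classify in the topic space
)
pipeline.fit(X_train, y_train)
print("classification accuracy:", pipeline.score(X_test, y_test))
```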
23

IMPLEMENTING SOFTWARE PROCESS IMPROVEMENTS IN THE T&E COMMUNITY

Posey, Chlotia 10 1900 (has links)
International Telemetering Conference Proceedings / October 23-26, 2000 / Town & Country Hotel and Conference Center, San Diego, California / The Capability Maturity Model (CMM) developed by the Software Engineering Institute is widely promoted as a method to help decrease the volume of error-riddled and late software projects. Because of the projected benefits, the 96th Communications Group/SC (SC) at Eglin Air Force Base began an intensive software process improvement effort in late 1997. This effort was rewarded in September 1999 when the group achieved a CMM Level 2 software rating on its first attempt. As of December 1999, 68% of assessed organizations remained at Level 1 after their first or second assessment. The SC success was not only obtained on the first attempt, but also 11 months ahead of the industry standard. The Level 2 rating was accomplished in the volatile environment needed to support the test and evaluation mission, an environment that includes frequent requirement changes, short-notice modifications, and externally driven schedules. One reason this milestone was possible was the close and direct involvement of management. This paper presents additional factors in implementing a successful software process improvement effort.
24

A PC Database and GUI for Telemetry Data Reduction

Reinsmith, Lee, Surber, Steven 10 1900 (has links)
International Telemetering Conference Proceedings / October 25-28, 1999 / Riviera Hotel and Convention Center, Las Vegas, Nevada / The Telemetry Definition and Processing (TDAP II) application is a PC-based software tool that meets the varied needs - both now and into the 21st century - of instrumentation engineers, data analysts, test engineers, and project personnel in the Test and Evaluation (T&E) community. TDAP II uses state-of-the-art commercial software technology that includes a Microsoft Access 97™ database and a Microsoft Visual Basic™ Graphical User Interface (GUI) for users to view and navigate the database. Developed by the Test and Analysis Division of the 96th Communications Group for the tenants of the Air Armament Center (AAC), Eglin AFB, Florida, TDAP II provides a centralized repository for both aircraft and weapons instrumentation descriptions and telemetry EU conversion calibrations. Operating in a client/server environment, TDAP II can be used effectively on a small or large network, as well as on either a classified or unclassified intranet or the Internet. This paper describes the components and design of the application, along with the operational flexibility and varied uses resulting from the chosen commercial software technology.
25

Ground-based near-infrared remote sounding of ice giant clouds and methane

Tice, Dane Steven January 2014 (has links)
The ice giants, Uranus and Neptune, are the two outermost planets in our solar system. With only one satellite flyby each in the late 1980s, the ice giants are arguably the least understood of the planets orbiting the Sun. A better understanding of these planets' atmospheres will not only help satisfy our natural scientific curiosity about these distant spheres of gas, but might also provide insight into the dynamics and meteorology of our own planet's atmosphere. Two new ground-based, near-infrared datasets of the ice giants are studied. Both datasets cover a portion of the electromagnetic spectrum that provides good constraint on the size of small scattering particles in the atmospheres' clouds and haze layers, and the broad spectral coverage of both telescopes allows characterisation of these small particles over a wide range of wavelengths. Both datasets also cover the 825 nm collision-induced hydrogen-absorption feature, allowing us to disentangle the latitudinal variation of CH4 abundance from the height and vertical extent of clouds in the upper troposphere. A two-cloud model is successfully fitted to IRTF SpeX Uranus data, parameterising both clouds with base altitude, fractional scale height, and total opacity. An optically thick, vertically thin cloud with a base pressure of 1.6 bar, tallest in the midlatitudes, shows a strong preference for scattering particles of 1.35 μm radii. Above this cloud lies an optically thin, vertically extended haze extending upward from 1.0 bar and consistent with particles of 0.10 μm radii. An equatorial enrichment of methane abundance and a lower cloud of constant vertical thickness were shown to exist using two independent methods of analysis. The second dataset, from Palomar SWIFT, covers three different latitude regions.
26

An Empirical Approach to Evaluating Sufficient Similarity: Utilization of Euclidean Distance As A Similarity Measure

Marshall, Scott 27 May 2010 (has links)
Individuals are exposed to chemical mixtures while carrying out everyday tasks, with unknown risk associated with exposure. Given the number of resulting mixtures, it is not economically feasible to identify or characterize all possible mixtures. When complete dose-response data are not available on a (candidate) mixture of concern, EPA guidelines define a similar mixture based on chemical composition, component proportions and expert biological judgment (EPA, 1986, 2000). Recent work in this literature by Feder et al. (2009) evaluates sufficient similarity in exposure to disinfection by-products of water purification using multivariate statistical techniques and traditional hypothesis testing. The work of Stork et al. (2008) introduced the idea of sufficient similarity in dose-response (making a connection between exposure and effect); they developed methods to evaluate sufficient similarity of a fully characterized reference mixture, with dose-response data available, and a candidate mixture with only mixing proportions available. A limitation of the approach is that the two mixtures must contain the same components. It is of interest to determine whether a fully characterized reference mixture (representative of the random process) is sufficiently similar in dose-response to a candidate mixture resulting from that random process. Four similarity measures based on Euclidean distance are developed to aid in the evaluation of sufficient similarity in dose-response, allowing the mixtures to be subsets of each other. If a reference and candidate mixture are concluded to be sufficiently similar in dose-response, inference about the candidate mixture can be based on the reference mixture. An example is presented demonstrating that the benchmark dose (BMD) of the reference mixture can be used as a surrogate measure of BMD for the candidate mixture when the two mixtures are determined to be sufficiently similar in dose-response. Guidelines are developed that enable the researcher to evaluate the performance of the proposed similarity measures.
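As a hedged illustration of the general idea (the thesis' four measures themselves are not reproduced), the sketch below computes a normalized Euclidean distance between a reference and a candidate dose-response curve evaluated on a common dose grid; the four-parameter logistic curves, dose grid and threshold interpretation are hypothetical.

```python
# Hedged sketch: a Euclidean-distance similarity measure between a reference and a
# candidate dose-response curve on a common dose grid (hypothetical logistic curves;
# not the four measures actually developed in the thesis).
import numpy as np

def logistic_response(dose, lower, upper, ed50, slope):
    """Four-parameter logistic dose-response curve."""
    return lower + (upper - lower) / (1.0 + (dose / ed50) ** (-slope))

doses = np.linspace(0.1, 10.0, 50)                         # common dose grid
reference = logistic_response(doses, 0.0, 1.0, 2.0, 1.5)   # fully characterized mixture
candidate = logistic_response(doses, 0.0, 0.95, 2.3, 1.4)  # candidate mixture

distance = np.linalg.norm(reference - candidate) / np.sqrt(doses.size)
print(f"normalized Euclidean distance: {distance:.4f}")
# A distance below a pre-chosen threshold would support treating the two mixtures
# as sufficiently similar in dose-response.
```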
27

Improving Prediction Accuracy for WSN Data Reduction by Applying Multivariate Spatio-Temporal Correlation

Carlos Giovanni Nunes de Carvalho 23 March 2012 (has links)
Fundação de Amparo à Pesquisa do Estado do Piauí / Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, sensor devices must run simple mechanisms due to their constrained resources, which may cause unwanted errors and reduce accuracy. This work proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSNs). Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher degree of correlation among the variables gathered in the field than with the time variable, an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is more accurate. In addition, the proposed solution outperforms some current solutions by about 50% in humidity prediction and about 21% in light prediction.
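A minimal sketch of the comparison described above, contrasting simple (time-only) and multiple linear regression for predicting one sensed variable from others; the synthetic humidity, temperature and light readings and the train/test split are illustrative assumptions, not the thesis' field data.

```python
# Hedged sketch: simple (time-only) vs. multiple linear regression for predicting a
# sensed variable, mirroring the comparison above. Synthetic sensor readings only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)                                      # sampling instants
temperature = 20 + 5 * np.sin(t / 30) + rng.normal(0, 0.3, t.size)
light = 400 + 100 * np.sin(t / 30 + 0.5) + rng.normal(0, 10, t.size)
humidity = 80 - 1.5 * temperature + 0.01 * light + rng.normal(0, 0.5, t.size)

train, test = slice(0, 150), slice(150, 200)
features = np.column_stack([t, temperature, light])                  # correlated variables

simple = LinearRegression().fit(t[train, None], humidity[train])     # time only
multiple = LinearRegression().fit(features[train], humidity[train])  # multivariate

print("simple LR MAE:  ", mean_absolute_error(humidity[test], simple.predict(t[test, None])))
print("multiple LR MAE:", mean_absolute_error(humidity[test], multiple.predict(features[test])))
```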
28

A Contribution To Modern Data Reduction Techniques And Their Applications By Applied Mathematics And Statistical Learning

Sakarya, Hatice 01 January 2010 (has links) (PDF)
High-dimensional data arise in many areas, from digital image processing, gene expression microarrays and neuronal population activities to financial time series. Dimensionality reduction - extracting low-dimensional structure from high-dimensional data - is a key problem in fields such as information processing, machine learning, data mining, information retrieval and pattern recognition, where a variety of data reduction techniques are used. This thesis surveys modern data reduction techniques, representing the state of the art in theory, methods and applications, and introduces the underlying mathematical language. This requires special care concerning questions such as how to interpret discrete structures as manifolds, how to identify their structure in preparation for dimension reduction, and how to address the complexity of the algorithmic methods. Special emphasis is placed on the Principal Component Analysis, Locally Linear Embedding and Isomap algorithms. These algorithms have been studied by a research group from Vilnius, Lithuania, by Zeev Volkovich of the Software Engineering Department, ORT Braude College of Engineering, Karmiel, and by others. The main purpose of this study is to compare the results of the three algorithms, focusing on both the quality of the results and the running time.
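A hedged sketch of the comparison described above, applying the three highlighted algorithms to a synthetic 3-D manifold with scikit-learn; the S-curve dataset, neighbour counts and target dimension are illustrative assumptions, and the timings correspond to the duration aspect of the comparison.

```python
# Hedged sketch: PCA, LLE and Isomap applied to a synthetic 3-D manifold;
# dataset and parameters are illustrative, not those used in the thesis.
import time
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, _ = make_s_curve(n_samples=1500, random_state=0)

reducers = {
    "PCA": PCA(n_components=2),
    "LLE": LocallyLinearEmbedding(n_neighbors=12, n_components=2),
    "Isomap": Isomap(n_neighbors=12, n_components=2),
}
for name, reducer in reducers.items():
    start = time.perf_counter()
    embedding = reducer.fit_transform(X)            # project to 2-D
    elapsed = time.perf_counter() - start
    print(f"{name:6s} embedding shape {embedding.shape}, {elapsed:.2f} s")
```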
29

A Similarity-based Data Reduction Approach

Ouyang, Jeng 07 September 2009 (has links)
Finding an efficient data reduction method for large-scale problems is an imperative task. In this work, we propose a similarity-based self-constructing fuzzy clustering algorithm to sample instances for the classification task. Instances that are similar to each other are grouped into the same cluster. When all the instances have been fed in, a number of clusters are formed automatically. The statistical mean of each cluster is then regarded as representing all the instances covered by that cluster. This approach has two advantages: it is faster and uses less storage memory, and the number of new representative instances need not be specified in advance by the user. Experiments on real-world datasets show that our method runs faster and obtains a better reduction rate than other methods.
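The self-constructing fuzzy clustering algorithm itself is not reproduced here; as a hedged illustration of the same reduction idea (cluster the instances, then keep each cluster's mean as a representative), the sketch below uses scikit-learn's MeanShift, which likewise does not require the number of clusters in advance. The synthetic blobs and bandwidth are illustrative assumptions.

```python
# Hedged sketch of the reduction idea: cluster the instances, then keep each
# cluster's mean as a representative. MeanShift stands in for the thesis'
# similarity-based self-constructing fuzzy clustering; like it, MeanShift does
# not need the number of clusters specified in advance. Data are synthetic.
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=2000, centers=5, cluster_std=1.0, random_state=0)

clusterer = MeanShift(bandwidth=2.0).fit(X)
representatives = clusterer.cluster_centers_        # one mean vector per cluster

print("original instances:  ", len(X))
print("representative means:", len(representatives))
print("reduction rate:       {:.3f}".format(1 - len(representatives) / len(X)))
```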
30

The quantification and visualisation of human flourishing.

Henley, Lisa January 2015 (has links)
Economic indicators such as GDP have been main indicators of human progress since the first half of the last century. There is concern that continuing to measure our progress and/or wellbeing using measures that encourage consumption on a planet with limited resources may not be ideal. Alternative measures of human progress have a top-down approach, in which the creators decide what the measure will contain. This work defines a 'bottom-up' methodology, an example of measuring human progress that doesn't require manual data reduction. The technique allows visual overlay of other 'factors' that users may feel are particularly important. I designed and wrote a genetic algorithm which, in conjunction with regression analysis, was used to select the 'most important' variables from a large range of variables loosely associated with the topic. This approach could be applied in many areas where there are a lot of data from which an analyst must choose. Next I designed and wrote a genetic algorithm to explore the evolution of a spectral clustering solution over time. Additionally, I designed and wrote a genetic algorithm with a multi-faceted fitness function which I used to select the most appropriate clustering procedure from a range of hierarchical agglomerative methods. Evolving the algorithm over time was not successful in this instance, but the approach holds a lot of promise as an alternative to 'scoring' new data based on an original solution, and as a method for using procedural options other than those an analyst might normally select. The final solution allowed the number of clusters to evolve over time with a fixed clustering method and variable selection. Profiling with various external data sources gave consistent and interesting interpretations of the clusters.
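A hedged sketch of the "genetic algorithm plus regression analysis" variable-selection step described above: a small GA evolves boolean inclusion masks and scores them with a penalised R². The synthetic data, operators and fitness penalty are illustrative assumptions, not the thesis' own design.

```python
# Hedged sketch: a small genetic algorithm selecting regression variables by
# maximising a penalised R^2 (synthetic data; illustrative operators only).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 3 * X[:, 3] + 0.5 * X[:, 7] + rng.normal(size=n)  # 3 informative columns

def fitness(mask):
    """Penalised goodness of fit for the variables switched on in `mask`."""
    if not mask.any():
        return -np.inf
    pred = LinearRegression().fit(X[:, mask], y).predict(X[:, mask])
    return r2_score(y, pred) - 0.01 * mask.sum()     # penalise large variable sets

pop = rng.random((40, p)) < 0.5                       # boolean inclusion masks
for _ in range(50):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)][-20:]           # keep the fitter half
    cuts = rng.integers(1, p, size=20)
    children = np.array([np.concatenate([parents[rng.integers(20)][:c],
                                         parents[rng.integers(20)][c:]])
                         for c in cuts])              # one-point crossover
    children ^= rng.random(children.shape) < 0.02     # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected variable indices:", np.flatnonzero(best))
```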
