About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Models for Univariate and Multivariate Analysis of Longitudinal and Clustered Data

Luo, Dandan Unknown Date
No description available.
142

Scalable Embeddings for Kernel Clustering on MapReduce

Elgohary, Ahmed 14 February 2014 (has links)
There is an increasing demand from businesses and industries to make the best use of their data. Clustering is a powerful tool for discovering natural groupings in data. The k-means algorithm is the most commonly used data clustering method, having gained popularity for its effectiveness on various data sets and ease of implementation on different computing architectures. It assumes, however, that data are available in an attribute-value format, and that each data instance can be represented as a vector in a feature space where the algorithm can be applied. These assumptions are impractical for real data, and they hinder the use of complex data structures in real-world clustering applications. Kernel k-means is an effective data clustering method that extends the k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally expensive, as it requires the complete kernel matrix to be calculated and stored. Further, the kernelized nature of the algorithm hinders the parallelization of its computations on modern distributed-computing infrastructures. This thesis defines a family of kernel-based low-dimensional embeddings that allows kernel k-means to be scaled on MapReduce via an efficient and unified parallelization strategy. Three practical low-dimensional embedding methods that adhere to this definition of the embedding family are then proposed. Combining the proposed parallelization strategy with any of the three embedding methods constitutes a complete, scalable, and efficient MapReduce algorithm for kernel k-means. The efficiency and scalability of the presented algorithms are demonstrated analytically and empirically.
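The abstract does not spell out the thesis's embedding constructions, so purely as an illustration, here is a minimal Python sketch of one embedding in this general spirit: a Nyström-style landmark embedding whose inner products approximate kernel values, after which ordinary k-means (which parallelizes naturally on MapReduce) can stand in for kernel k-means. All names and parameters below are illustrative assumptions, not taken from the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def rbf_kernel(A, B, gamma=1.0):
    """Pairwise RBF kernel values between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom_embedding(X, n_landmarks=50, gamma=1.0, seed=0):
    """Embed each point so that inner products in the embedded space
    approximate kernel values: Z = C W^{-1/2}, where C is the data-landmark
    kernel block and W is the landmark-landmark kernel block."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    L = X[idx]                          # sampled landmark points
    W = rbf_kernel(L, L, gamma)
    C = rbf_kernel(X, L, gamma)
    vals, vecs = np.linalg.eigh(W)
    vals = np.maximum(vals, 1e-12)      # guard against round-off negatives
    return (C @ vecs) / np.sqrt(vals)   # Z Z^T approximates the full kernel matrix

X = np.random.default_rng(1).normal(size=(1000, 8))
Z = nystrom_embedding(X, n_landmarks=64, gamma=0.5)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Z)  # plain k-means on Z
```

Because the embedding is low-dimensional, the k-means step reduces to the standard, easily parallelized computation over plain vectors, which is what makes this family of embeddings attractive in a MapReduce setting.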
143

The development and application of informatics-based systems for the analysis of the human transcriptome.

Kelso, Janet January 2003 (has links)
Despite the fact that the sequence of the human genome is now complete, it has become clear that the elucidation of the transcriptome is more complicated than previously expected. There is mounting evidence for unexpected and previously underestimated phenomena such as alternative splicing in the transcriptome. As a result, the identification of novel transcripts arising from the genome continues. Furthermore, as the volume of transcript data grows it is becoming increasingly difficult to integrate expression information which is from different sources, is stored in disparate locations, and is described using differing terminologies. Determining the function of translated transcripts also remains a complex task. Information about the expression profile – the location and timing of transcript expression – provides evidence that can be used in understanding the role of the expressed transcript in the organ or tissue under study, or in the developmental pathways or disease phenotype observed.

In this dissertation I present novel computational approaches with direct biological applications to two distinct but increasingly important areas of gene expression research. The first addresses the detection and characterisation of alternatively spliced transcripts. The second is the construction of a hierarchical controlled vocabulary for gene expression data and the annotation of expression libraries with controlled terms from the hierarchies. In the final chapter the biological questions that can be approached, and the discoveries that can be made using these systems, are illustrated with a view to demonstrating how the application of informatics can both enable and accelerate biological insight into the human transcriptome.
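To make the hierarchical-vocabulary idea concrete, here is a small illustrative Python sketch; the terms and library identifiers are invented, not drawn from the dissertation's actual vocabulary. Annotating each expression library with its most specific term and indexing it under every broader term lets a query for a general term retrieve libraries annotated at finer granularity.

```python
from collections import defaultdict

# Toy hierarchy: child term -> parent term (terms are illustrative only).
PARENT = {
    "cerebellum": "brain",
    "cortex": "brain",
    "brain": "nervous system",
    "spinal cord": "nervous system",
}

def ancestors(term):
    """Yield the term itself and every broader term above it."""
    while term is not None:
        yield term
        term = PARENT.get(term)

# Each expression library is annotated with its most specific known term.
library_terms = {"lib_0421": "cerebellum", "lib_0877": "spinal cord"}

# Index libraries under every ancestor so broad queries match specific annotations.
index = defaultdict(set)
for lib, term in library_terms.items():
    for t in ancestors(term):
        index[t].add(lib)

print(index["brain"])           # {'lib_0421'} -- cerebellum rolls up to brain
print(index["nervous system"])  # both libraries
```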
144

NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA

Al-Naymat, Ghazi January 2009 (has links)
Doctor of Philosophy (PhD) / Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain. These techniques include association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets that are embedded in continuous geographic space. We have therefore proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships – especially negative relationships – in the patterns. These relationships can be obtained only from the maximal clique patterns, which have not been exploited before. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mining and querying large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a predefined time. One approach to processing a “flock query” is to map ST data into high-dimensional space and to reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures rapidly deteriorates in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series, which always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the existence of similarity and/or correlation between the time series: the more similar the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. By applying the SparseDTW algorithm we discover an interesting pattern, “pairs trading”, in a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) covering 1980 to 2002.
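For reference, the recurrence that SparseDTW optimizes is the classic dynamic-programming DTW, sketched below in Python. The sparse-matrix bookkeeping that gives SparseDTW its space savings is not reproduced here; this baseline materializes the full O(nm) matrix.

```python
import numpy as np

def dtw_distance(s, t):
    """Classic dynamic-programming DTW: O(len(s) * len(t)) time and space.
    SparseDTW computes the same optimal value while only materialising
    cells of this matrix near similar regions of the two series."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# Two series with the same shape but different pacing align with zero cost.
print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 2, 3]))  # -> 0.0
```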
145

Meta-learning: strategies, implementations, and evaluations for algorithm selection /

Köpf, Christian Rudolf. January 2006 (has links)
Doctoral dissertation--Universität Ulm, 2005. / Bibliography: p. 227-248.
146

Compressing data cube in parallel OLAP systems /

Liang, Boyong, January 1900 (has links)
Thesis (M.C.S.) - Carleton University, 2005. / Includes bibliographical references (p. 88-93). Also available in electronic format on the Internet.
147

Klassifikation, Konzeption und Anwendung medizinischer Data Dictionaries /

Bürkle, Thomas. January 2001 (has links)
Habilitation thesis--Universität Gießen, 2001.
148

Adaptive classification of scarcely labeled and evolving data streams /

Masud, Mohammad Mehedy. January 2009 (has links)
Thesis. / Includes vita. Includes bibliographical references (leaves 136-146).
149

Master data management maturity model for the success of MDM initiatives in the microfinance sector in Peru

Vásquez, Daniel, Kukurelo, Romina, Raymundo, Carlos, Dominguez, Francisco, Moguerza, Javier 04 1900 (has links)
The full text of this work is not available in the UPC Academic Repository owing to restrictions imposed by the publisher. / The microfinance sector plays a strategic role, since it facilitates the integration of all social classes into sustained economic growth. Against this backdrop, the exponential growth of data resulting from the transactions and operations these companies carry out on a daily basis has become an imminent concern. Appropriate management of this data is therefore necessary; otherwise, the lack of valuable, high-quality information for decision-making and process improvement results in a competitive disadvantage. Master Data Management (MDM) offers a new approach to data management, reducing the gap between the business and technology perspectives. In this regard, it is important that organizations have the ability to implement a data management model for MDM. This paper proposes a Master Data Management maturity model for the microfinance sector, which frames a series of formal requirements and criteria providing an objective diagnosis, with the aim of improving processes until entities reach the desired maturity levels. The model was implemented using information from Peruvian microfinance organizations. Finally, after validation of the proposed model, it was shown to serve as a means of identifying an organization's maturity level, helping Master Data Management initiatives succeed. / Peer reviewed
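As a loose illustration of how a maturity model turns assessment criteria into a diagnosis, here is a toy Python sketch; the domain names and the weakest-domain aggregation rule are assumptions made for illustration, not the paper's actual criteria.

```python
# Toy sketch of a maturity diagnosis. Domain names and the aggregation
# rule are illustrative assumptions, not the paper's actual model.
DOMAINS = {"data governance", "data quality", "architecture", "processes"}

def maturity_level(scores):
    """Map per-domain assessment scores (1-5) to an overall maturity level.
    The weakest domain caps the diagnosis, so one lagging area shows up."""
    assert set(scores) == DOMAINS, "score every domain exactly once"
    return min(scores.values())

level = maturity_level({
    "data governance": 3, "data quality": 2, "architecture": 4, "processes": 3,
})
print(level)  # -> 2: data quality holds the organization at level 2
```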
150

Quality data extraction methodology based on the labeling of coffee leaves with nutritional deficiencies

Jungbluth, Adolfo, Yeng, Jon Li 04 1900 (has links)
The full text of this work is not available in the UPC Academic Repository owing to restrictions imposed by the publisher. / Detecting nutritional deficiencies in coffee leaves is a task often undertaken manually by field experts known as agronomists. The process they follow to carry out this task is based on observing the different characteristics of the coffee leaves while relying on their own experience. Visual fatigue and human error in this empirical approach cause leaves to be incorrectly labeled, affecting the quality of the data obtained. In this context, different crowdsourcing approaches can be applied to enhance the quality of the extracted data. These approaches separately propose the use of voting systems, association rule filters and evolutive learning. In this paper, we extend the use of association rule filters and the evolutive approach by combining them in a methodology that enhances data quality while guiding users through the main stages of data extraction tasks. Moreover, our methodology proposes a reward component to engage users and keep them motivated during the crowdsourcing tasks. Applying the proposed methodology in a case study on Peruvian coffee leaves yielded a dataset with 93.33% accuracy, built from 30 instances collected by 8 experts and evaluated by 2 agronomic engineers with a background in coffee leaves. This accuracy was higher than that of independently implementing the evolutive feedback strategy (86.67%) or an empirical approach (70%) under the same conditions. / Peer reviewed
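As a minimal illustration of the voting-system ingredient mentioned above, a majority-vote aggregation of expert labels might look like the Python sketch below. The paper's full methodology layers association rule filters, evolutive feedback and a reward component on top, none of which are reproduced here; the labels and identifiers are invented.

```python
from collections import Counter

def aggregate_labels(votes_per_leaf):
    """Majority-vote aggregation of expert labels, a common first stage
    in crowdsourced data-quality pipelines."""
    consensus = {}
    for leaf_id, votes in votes_per_leaf.items():
        label, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        consensus[leaf_id] = (label, agreement)  # keep agreement as confidence
    return consensus

votes = {
    "leaf_01": ["N-deficient", "N-deficient", "K-deficient"],
    "leaf_02": ["healthy", "healthy", "healthy"],
}
print(aggregate_labels(votes))
# {'leaf_01': ('N-deficient', 0.666...), 'leaf_02': ('healthy', 1.0)}
```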
