141.
Personal Information Environment: A Framework for Managing Personal Files across a Set of Devices / MOHAMMAD, ATIF. 06 August 2009 (has links)
Advances in computing over the last three decades have introduced many devices into our daily lives, including personal computers, laptops, cellular devices, and more. The data we need for our processing tasks is scattered among these devices. A Personal Information Environment makes all of the data scattered across the devices associated with an individual user available as a single collection. Data recharging is a technique that uses data replication to achieve a Personal Information Environment for an individual user.
In this thesis, we propose a data recharging scheme for an individual user's Personal Information Environment. We study the data availability to a user by simulating the data recharging algorithm. This data recharging approach is achieved with a master-slave data replication technique. / Thesis (Master, Computing) -- Queen's University, 2009-08-06 00:18:00.19
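The thesis's own data recharging algorithm is not reproduced in the abstract; the master-slave idea it builds on can be sketched as follows. All class and method names here are illustrative, not taken from the thesis.

```python
# Minimal sketch of master-slave file replication for a Personal
# Information Environment: one master holds the authoritative copy of
# each file, and each device ("slave") periodically recharges its
# replica by pulling only the files that changed since its last sync.

class Master:
    """Authoritative copies, each tagged with a version counter."""
    def __init__(self):
        self.files = {}  # path -> (version, content)

    def write(self, path, content):
        version = self.files.get(path, (0, None))[0] + 1
        self.files[path] = (version, content)

    def changes_since(self, known_versions):
        """Files whose version is newer than the slave's view."""
        return {p: v for p, v in self.files.items()
                if v[0] > known_versions.get(p, 0)}

class Slave:
    """A device that recharges its replica from the master."""
    def __init__(self):
        self.files = {}  # path -> (version, content)

    def recharge(self, master):
        known = {p: v[0] for p, v in self.files.items()}
        self.files.update(master.changes_since(known))

master = Master()
master.write("notes.txt", "draft 1")
laptop = Slave()
laptop.recharge(master)           # laptop now holds draft 1
master.write("notes.txt", "draft 2")
laptop.recharge(master)           # only the changed file is re-sent
print(laptop.files["notes.txt"])  # (2, 'draft 2')
```

Writes go only to the master, so slaves never conflict with each other; the cost is that a device can read stale data between recharges.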
142.
Models for Univariate and Multivariate Analysis of Longitudinal and Clustered Data / Luo, Dandan. Unknown Date
No description available.
143.
Scalable Embeddings for Kernel Clustering on MapReduce / Elgohary, Ahmed. 14 February 2014 (has links)
There is an increasing demand from businesses and industries to make the best use of their data. Clustering is a powerful tool for discovering natural groupings in data. The k-means algorithm is the most commonly used data clustering method, having gained popularity for its effectiveness on various data sets and its ease of implementation on different computing architectures. It assumes, however, that data are available in an attribute-value format and that each data instance can be represented as a vector in a feature space to which the algorithm can be applied. These assumptions are impractical for much real data, and they hinder the use of complex data structures in real-world clustering applications.
Kernel k-means is an effective data clustering method that extends the k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very expensive, as it requires the complete kernel matrix to be computed and stored. Further, the kernelized nature of the algorithm hinders the parallelization of its computations on modern distributed-computing infrastructures. This thesis defines a family of kernel-based low-dimensional embeddings that allows kernel k-means to be scaled on MapReduce via an efficient and unified parallelization strategy. Three practical low-dimensional embedding methods that adhere to this definition are then proposed. Combining the proposed parallelization strategy with any of the three embedding methods yields a complete, scalable, and efficient MapReduce algorithm for kernel k-means. The efficiency and scalability of the presented algorithms are demonstrated analytically and empirically.
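The thesis's specific embedding methods are not detailed in the abstract. One standard kernel-based low-dimensional embedding of the kind described, sketched here purely as an assumed illustration, is the Nyström approximation: each point is mapped to a short vector whose inner products approximate the kernel, after which plain (and easily parallelized) k-means can run on the embeddings.

```python
import numpy as np

def nystrom_embedding(X, kernel, m, rng):
    """Embed n points into m dimensions so that E @ E.T approximates the
    full n x n kernel matrix, using m randomly chosen landmark points."""
    n = len(X)
    landmarks = X[rng.choice(n, size=m, replace=False)]
    W = kernel(landmarks, landmarks)   # m x m kernel among landmarks
    C = kernel(X, landmarks)           # n x m kernel to landmarks
    # W^{-1/2} via eigendecomposition (W is symmetric PSD)
    vals, vecs = np.linalg.eigh(W)
    vals = np.maximum(vals, 1e-12)     # guard against tiny eigenvalues
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return C @ W_inv_sqrt              # n x m embedding

def rbf(A, B, gamma=1.0):
    """Gaussian (RBF) kernel between two point sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),   # two synthetic clusters
               rng.normal(3, 0.3, (50, 2))])
E = nystrom_embedding(X, rbf, m=10, rng=rng)
print(E.shape)  # (100, 10)
```

Because the embedding is computed row by row from `C`, it maps naturally onto a MapReduce job: each mapper embeds its partition of the data independently once the small `m x m` factor is broadcast.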
144.
The development and application of informatics-based systems for the analysis of the human transcriptome / Kelso, Janet. January 2003 (has links)
Despite the fact that the sequence of the human genome is now complete, it has become clear that elucidating the transcriptome is more complicated than previously expected. There is mounting evidence for unexpected and previously underestimated phenomena in the transcriptome, such as alternative splicing. As a result, the identification of novel transcripts arising from the genome continues. Furthermore, as the volume of transcript data grows, it is becoming increasingly difficult to integrate expression information that comes from different sources, is stored in disparate locations, and is described using differing terminologies. Determining the function of translated transcripts also remains a complex task. Information about the expression profile – the location and timing of transcript expression – provides evidence that can be used to understand the role of the expressed transcript in the organ or tissue under study, or in the developmental pathways or disease phenotypes observed.

In this dissertation I present novel computational approaches with direct biological applications to two distinct but increasingly important areas of gene expression research. The first addresses the detection and characterisation of alternatively spliced transcripts. The second is the construction of a hierarchical controlled vocabulary for gene expression data and the annotation of expression libraries with controlled terms from the hierarchies. In the final chapter, the biological questions that can be approached, and the discoveries that can be made using these systems, are illustrated with a view to demonstrating how the application of informatics can both enable and accelerate biological insight into the human transcriptome.
145.
NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA / Al-Naymat, Ghazi. January 2009 (has links)
Doctor of Philosophy (PhD) / Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns based on the requirements of the domain. These techniques include association rule mining, classification, cluster analysis, and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST), and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets, which are embedded in continuous geographic space. We have therefore proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships – especially negative relationships – in the patterns. These relationships can be obtained only from the maximal clique patterns, which had not been exploited before. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mining and querying large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a predefined time.
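The clique-enumeration core behind the co-location mining described above can be illustrated with the classic Bron-Kerbosch method over a proximity graph; GridClique itself adds a grid-based decomposition for scale, which this sketch (with hypothetical data) does not reproduce.

```python
# Build a proximity graph over 2D points, then enumerate its maximal
# cliques: groups of points that are all pairwise within distance d,
# usable as the "raw transactions" for co-location rule mining.

def proximity_graph(points, d):
    """Adjacency sets: two points are neighbours if within distance d."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= d * d:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of all maximal cliques."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:            # r cannot be extended: maximal
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    expand(set(), set(adj), set())
    return cliques

# Three mutually close points, plus one point near only the third.
pts = [(0, 0), (1, 0), (0, 1), (0, 2.5)]
cliques = maximal_cliques(proximity_graph(pts, d=1.6))
print(sorted(sorted(c) for c in cliques))  # [[0, 1, 2], [2, 3]]
```

Restricting attention to maximal cliques avoids emitting every sub-clique as a separate pattern, which is what makes them usable as transactions downstream.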
One approach to processing a “flock query” is to map the ST data into high-dimensional space and reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures deteriorates rapidly in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection, and we present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series, one that always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the similarity and/or correlation between the time series: the greater the similarity between the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. By applying the SparseDTW algorithm we discover an interesting pattern, “pairs trading”, in a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) from 1980 to 2002.
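For reference, the quantity SparseDTW computes is the standard DTW distance; the classic full-matrix recurrence is sketched below (SparseDTW's sparse cell-filling strategy itself is not reproduced here).

```python
import numpy as np

def dtw(s, t):
    """Classic O(len(s) * len(t)) dynamic time warping distance.
    SparseDTW computes the same optimal value while filling only the
    cells it needs, exploiting similarity between the two series."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0 -- the warp absorbs the repeat
```

The full matrix costs O(nm) space, which is exactly what a space-efficient variant like SparseDTW is designed to avoid when the series are similar.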
146.
Meta-learning: strategies, implementations, and evaluations for algorithm selection / Köpf, Christian Rudolf. January 2006 (has links)
University dissertation, Ulm, 2005. / Bibliography pp. 227-248.
147.
Compressing data cube in parallel OLAP systems / Liang, Boyong. January 1900 (has links)
Thesis (M.C.S.) - Carleton University, 2005. / Includes bibliographical references (p. 88-93). Also available in electronic format on the Internet.
148.
Klassifikation, Konzeption und Anwendung medizinischer Data Dictionaries [Classification, design, and application of medical data dictionaries] / Bürkle, Thomas. January 2001 (has links)
Universität Gießen, habilitation thesis, 2001.
149.
Adaptive classification of scarcely labeled and evolving data streams / Masud, Mohammad Mehedy. January 2009 (has links)
Thesis. / Includes vita. Includes bibliographical references (leaves 136-146).
150.
Master data management maturity model for the successful of MDM initiatives in the microfinance sector in Peru / Vásquez, Daniel; Kukurelo, Romina; Raymundo, Carlos; Dominguez, Francisco; Moguerza, Javier. 04 1900 (has links)
The full text of this work is not available in the UPC Academic Repository because of restrictions imposed by the publisher. / The microfinance sector plays a strategic role, since it facilitates the integration and development of all social classes into sustained economic growth. At the same time, the exponential growth of the data generated by the transactions and operations these companies carry out daily has become a pressing issue. Appropriate management of this data is therefore necessary; otherwise, the lack of valuable, high-quality information for decision-making and process improvement results in a competitive disadvantage. Master Data Management (MDM) offers a new approach to data management, reducing the gap between the business perspective and the technology perspective. In this regard, it is important that an organization have the ability to implement a data management model for Master Data Management. This paper proposes a Master Data Management maturity model for the microfinance sector, which frames a series of formal requirements and criteria that provide an objective diagnosis, with the aim of improving processes until entities reach the desired maturity levels. The model was implemented using information from Peruvian microfinance organizations. Finally, after validation, the proposed model was shown to serve as a means of identifying an organization's maturity level and of helping Master Data Management initiatives succeed. / Peer-reviewed
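The paper's actual requirements, criteria, and thresholds are not reproduced in the abstract; the general shape of a maturity diagnosis can be sketched as below. All domain names, scores, and level labels here are hypothetical.

```python
# Illustrative sketch of a maturity-level diagnosis: score each formal
# criterion, average the scores, and map the result onto discrete
# maturity levels. The criteria and thresholds are invented for this
# example, not taken from the paper's model.

LEVELS = ["Initial", "Managed", "Defined", "Quantified", "Optimized"]

def maturity_level(criterion_scores):
    """criterion_scores: dict mapping criterion name -> score in [1, 5].
    Returns the discrete maturity level for the averaged score."""
    avg = sum(criterion_scores.values()) / len(criterion_scores)
    return LEVELS[min(int(avg) - 1, len(LEVELS) - 1)]

assessment = {
    "data_governance": 3,
    "data_quality": 2,
    "architecture": 4,
    "stewardship": 3,
}
print(maturity_level(assessment))  # Defined (average score 3.0)
```

Discretizing into named levels is what lets the diagnosis double as a roadmap: each level above the current one lists the criteria an entity must raise to progress.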