Global ETD Search

191	Editing and segmenting display files for color graphics Mitchell, Sharlene Kay January 2010 (has links) Typescript (photocopy). / Digitized by Kansas Correctional Industries Computer graphics Data editing Electronic data processing
192	Mining and Managing Neighbor-Based Patterns in Data Streams Yang, Di 09 January 2012 (has links) The current data-intensive world is continuously producing huge volumes of live streaming data through various kinds of electronic devices, such as sensor networks, smart phones, GPS and RFID systems. To understand these data sources and thus better leverage them to serve human society, the demands for mining complex patterns from these high speed data streams have significantly increased in a broad range of application domains, such as financial analysis, social network analysis, credit fraud detection, and moving object monitoring. In this dissertation, we present a framework to tackle the mining and management problem for the family of neighbor-based patterns in data streams, which covers a broad range of popular pattern types, including clusters, outliers, k-nearest neighbors and others. First, we study the problem of efficiently executing single neighbor-based pattern mining queries. We propose a general optimization principle for incremental pattern maintenance in data streams, called "Predicted Views". This general optimization principle exploits the "predictability" of sliding window semantics to eliminate both the computational and storage effort needed for handling the expiration of stream objects, which usually constitutes the most expensive operations for incremental pattern maintenance. Second, the problem of multiple query optimization for neighbor-based pattern mining queries is analyzed, which aims to efficiently execute a heavy workload of neighbor-based pattern mining queries using shared execution strategies. We present an integrated pattern maintenance strategy to represent and incrementally maintain the patterns identified by queries with different query parameters within a single compact structure. Our solution realizes fully shared execution of multiple queries with arbitrary parameter settings. 
Third, the problem of summarization and matching for neighbor-based patterns is examined. To solve this problem, we first propose a summarization format for each pattern type. Then, we present computation strategies, which efficiently summarize the neighbor-based patterns either during or after the online pattern extraction process. Lastly, to compare patterns extracted on different time horizons of the stream, we design an efficient matching mechanism to identify similar patterns in the stream history for any given pattern of interest to an analyst. Our comprehensive experimental studies, using both synthetic and real data from the domains of stock trades and moving object monitoring, demonstrate the superiority of our proposed strategies over alternative methods in both effectiveness and efficiency. Algorithm Streaming Data Query Processing Data Mining
193	Reconstructing gene regulatory networks with new datasets. / CUHK electronic theses & dissertations collection January 2013 (has links) The competing endogenous RNA (ceRNA) hypothesis has recently become one of the hottest topics in bioinformatics research. Four papers related to the ceRNA hypothesis were published in the same 2011 issue of Cell, a top journal in the life sciences. Most studies related to the ceRNA hypothesis have validated it with different individual examples, without a large-scale and comprehensive analysis. / In my Master of Philosophy study, a novel concept, called the miRNA Target Bicluster (MTB), is introduced to model the ceRNA hypothesis. The MTBs are identified computationally from validated and/or predicted miRNA-mRNA interaction pairs. The MTB models were tested with mRNA and miRNA expression data from the GENCODE Project. Statistically significant miRNA-mRNA anti-correlation, mRNA-mRNA correlation and miRNA-miRNA correlation in expression data are found, verifying the correlation relations among mRNAs and miRNAs stated in the ceRNA hypothesis with large-scale data support. Moreover, a novel large-scale functional enrichment analysis is performed, and the mRNAs selected by the MTBs are found to be biologically relevant. 
In addition, some new target prediction algorithms are suggested as another application of the MTBs. Overall, the concept of MTB defines simple and robust modules from the complex and noisy miRNA-mRNA network, suggesting ways for systems biology analyses of miRNA-mediated regulation. / Detailed summary in vernacular field only. / Yip, Kit Sang Danny. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves [117]-126). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Contributions --- p.1 / Chapter 1.2 --- Thesis Outline --- p.2 / Chapter 2 --- Background --- p.3 / Chapter 2.1 --- Bioinformatics --- p.3 / Chapter 2.2 --- Biological Background --- p.7 / Chapter 2.2.1 --- The Central Dogma of Molecular Biology 
--- p.7 / Chapter 2.2.2 --- RNAs --- p.8 / Chapter 2.2.3 --- Competing Endogenous RNA (ceRNA) hypothesis --- p.9 / Chapter 2.2.4 --- Biological Considerations in Functional Enrichment Analysis --- p.11 / Chapter 2.3 --- Computational Background --- p.12 / Chapter 2.3.1 --- miRNA Genomic Annotation Prediction --- p.13 / Chapter 2.3.2 --- miRNA Target Interaction Prediction --- p.14 / Chapter 2.3.3 --- Applying Computational Algorithms on Related Problems --- p.16 / Chapter 2.3.4 --- Algorithms in Functional Enrichment Analysis --- p.16 / Chapter 2.4 --- Experiments and Data --- p.17 / Chapter 2.4.1 --- miRNA Target Interactions --- p.17 / Chapter 2.4.2 --- Expression Data --- p.18 / Chapter 2.4.3 --- Annotation Datasets --- p.19 / Chapter 2.5 --- Research Motivations --- p.20 / Chapter 3 --- Definitions of miRNA Target Biclusters (MTB) --- p.22 / Chapter 3.1 --- Representations --- p.22 / Chapter 3.1.1 --- Binary Association Matrix Representation --- p.23 / Chapter 3.1.2 --- Bipartite Graph Representation --- p.23 / Chapter 3.1.3 --- Mathematical Representation --- p.24 / Chapter 3.2 --- Concept of MTB --- p.24 / Chapter 3.2.1 --- MTB Restrictive Type (Type R) --- p.27 / Chapter 3.2.2 --- MTB Restrictive Type on miRNA (Type Rmi) --- p.31 / Chapter 3.2.3 --- MTB Restrictive Type on mRNA (Type Rm) --- p.34 / Chapter 3.2.4 --- MTB Restrictive and General Type (Type Rgen) --- p.37 / Chapter 3.2.5 --- MTB Loose Type (Type L) --- p.44 / Chapter 3.2.6 --- MTB Loose Type but restricts on miRNA (Type Lmi) --- p.47 / Chapter 3.2.7 --- MTB Loose Type but restricts on mRNA (Type Lm) --- p.50 / Chapter 3.2.8 --- MTB Loose and General Type (Type Lgen) --- p.53 / Chapter 3.2.9 --- A General Definition on all Eight Types --- p.58 / Chapter 3.2.10 --- Discussions --- p.60 / Chapter 4 --- MTB Workflow in Checking Correlation Relations --- p.61 / Chapter 4.1 --- MTB Workflow in Checking Correlation Relations --- p.61 / Chapter 4.1.1 --- MTB Identification --- p.62 / Chapter 4.1.2 --- 
Correlation Coefficients --- p.63 / Chapter 4.1.3 --- Scoring Scheme --- p.64 / Chapter 4.1.4 --- Background Construction --- p.65 / Chapter 4.1.5 --- Wilcoxon Rank-sum Test --- p.66 / Chapter 4.1.6 --- Preliminary Studies --- p.67 / Chapter 4.2 --- miRNA-mRNA Anti-correlation in Expression Data --- p.68 / Chapter 4.2.1 --- Interaction Datasets --- p.69 / Chapter 4.2.2 --- Expression Datasets --- p.72 / Chapter 4.2.3 --- Independence of the Choices of Datasets --- p.73 / Chapter 4.2.4 --- Independence of the Types of MTBs --- p.76 / Chapter 4.2.5 --- Independence of the Choices of Correlation Coefficients --- p.78 / Chapter 4.2.6 --- Dependence on the Way to Score --- p.79 / Chapter 4.2.7 --- Independence of the Way to Construct Background --- p.81 / Chapter 4.2.8 --- Independence of Natural Bias in Datasets --- p.82 / Chapter 4.3 --- mRNA-mRNA Correlation in Expression Data --- p.84 / Chapter 4.3.1 --- Variations in the Analysis --- p.85 / Chapter 4.3.2 --- Discussions --- p.87 / Chapter 4.4 --- miRNA-miRNA Correlation in Expression Data --- p.88 / Chapter 4.4.1 --- Variations in the Analysis --- p.89 / Chapter 4.4.2 --- Discussions --- p.92 / Chapter 5 --- Target Prediction Aided by MTB --- p.94 / Chapter 5.1 --- Workflow in Target Prediction --- p.94 / Chapter 5.2 --- Contingency Table Approach --- p.96 / Chapter 5.2.1 --- One-tailed Hypothesis Testing --- p.97 / Chapter 5.3 --- Ranked List Approach --- p.98 / Chapter 5.3.1 --- Wilcoxon Signed Rank Test --- p.99 / Chapter 5.4 --- Results and Discussions --- p.99 / Chapter 6 --- Large-scale Functional Enrichment Analysis --- p.102 / Chapter 6.1 --- Principles in Functional Enrichment Analysis --- p.102 / Chapter 6.1.1 --- Annotation Files --- p.104 / Chapter 6.1.2 --- Functional Enrichment Analysis on a gene set --- p.105 / Chapter 6.1.3 --- Functional Enrichment Analysis on many gene sets --- p.106 / Chapter 6.2 --- Results and Discussions --- p.107 / Chapter 7 --- Future Perspectives and Conclusions --- p.112 / 
Chapter 7.1 --- Applying MTB definition on other problems --- p.112 / Chapter 7.2 --- Matrix Definitions and Optimization Problems --- p.113 / Chapter 7.3 --- Non-binary association matrix problem settings --- p.114 / Chapter 7.4 --- Limitations --- p.114 / Chapter 7.5 --- Conclusions --- p.116 / Bibliography --- p.117 / Chapter A --- Publications --- p.127 / Chapter A.1 --- Publications --- p.127 RNA--Data processing Proteins--Analysis--Data processing
194	Materializing views in data warehouse: an efficient approach to OLAP. January 2003 (has links) Gou Gang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 83-87). / Abstracts in English and Chinese. / Acknowledgement --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Data Warehouse and OLAP --- p.4 / Chapter 1.2 --- Computational Model: Dependent Lattice --- p.10 / Chapter 1.3 --- Materialized View Selection --- p.12 / Chapter 1.3.1 --- Materialized View Selection under a Disk-Space Constraint --- p.13 / Chapter 1.3.2 --- Materialized View Selection under a Maintenance-Time Constraint --- p.16 / Chapter 1.4 --- Main Contributions --- p.21 / Chapter 2 --- A* Search: View Selection under a Disk-Space Constraint --- p.24 / Chapter 2.1 --- The Weakness of Greedy Algorithms --- p.25 / Chapter 2.2 --- A*-algorithm --- p.29 / Chapter 2.2.1 --- An Estimation Function --- p.36 / Chapter 2.2.2 --- Pruning Feasible Subtrees --- p.38 / Chapter 2.2.3 --- Approaching the Optimal Solution from Two Directions --- p.41 / Chapter 2.2.4 --- NIBS Order: Accelerating Convergence --- p.43 / Chapter 2.2.5 --- Sliding Techniques: Eliminating Redundant H-Computation --- p.45 / Chapter 2.2.6 --- Examples --- p.50 / Chapter 2.3 --- Experiment Results --- p.54 / Chapter 2.3.1 --- Analysis of Experiment Results --- p.55 / Chapter 2.3.2 --- Computing for a Series of S Constraints --- p.60 / Chapter 2.4 --- Conclusions --- p.62 / Chapter 3 --- Randomized Search: View Selection under a Maintenance-Time Constraint --- p.64 / Chapter 3.1 --- Non-monotonic Property --- p.65 / Chapter 3.2 --- A Stochastic-Ranking-Based Evolutionary Algorithm --- p.67 / Chapter 3.2.1 --- A Basic Evolutionary Algorithm --- p.68 / Chapter 3.2.2 --- The Weakness of the rg-Method --- p.69 / Chapter 3.2.3 --- Stochastic Ranking: a Novel Constraint Handling Technique --- p.70 / Chapter 3.2.4 --- View Selection Using the Stochastic-Ranking-Based 
Evolutionary Algorithm --- p.72 / Chapter 3.3 --- Conclusions --- p.74 / Chapter 4 --- Conclusions --- p.75 / Chapter 4.1 --- Thesis Review --- p.76 / Chapter 4.2 --- Future Work --- p.78 / Chapter A --- My Publications for This Thesis --- p.81 / Bibliography --- p.83 OLAP technology Data mining Data warehousing
195	A study of two problems in data mining: anomaly monitoring and privacy preservation. January 2008 (has links) Bu, Yingyi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2008. / Includes bibliographical references (leaves 89-94). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.v / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Anomaly Monitoring --- p.1 / Chapter 1.2 --- Privacy Preservation --- p.5 / Chapter 1.2.1 --- Motivation --- p.7 / Chapter 1.2.2 --- Contribution --- p.12 / Chapter 2 --- Anomaly Monitoring --- p.16 / Chapter 2.1 --- Problem Statement --- p.16 / Chapter 2.2 --- A Preliminary Solution: Simple Pruning --- p.19 / Chapter 2.3 --- Efficient Monitoring by Local Clusters --- p.21 / Chapter 2.3.1 --- Incremental Local Clustering --- p.22 / Chapter 2.3.2 --- Batch Monitoring by Cluster Join --- p.24 / Chapter 2.3.3 --- Cost Analysis and Optimization --- p.28 / Chapter 2.4 --- Piecewise Index and Query Reschedule --- p.31 / Chapter 2.4.1 --- Piecewise VP-trees --- p.32 / Chapter 2.4.2 --- Candidate Rescheduling --- p.35 / Chapter 2.4.3 --- Cost Analysis --- p.36 / Chapter 2.5 --- Upper Bound Lemma: For Dynamic Time Warping Distance --- p.37 / Chapter 2.6 --- Experimental Evaluations --- p.39 / Chapter 2.6.1 --- Effectiveness --- p.40 / Chapter 2.6.2 --- Efficiency --- p.46 / Chapter 2.7 --- Related Work --- p.49 / Chapter 3 --- Privacy Preservation --- p.52 / Chapter 3.1 --- Problem Definition --- p.52 / Chapter 3.2 --- HD-Composition --- p.58 / Chapter 3.2.1 --- Role-based Partition --- p.59 / Chapter 3.2.2 --- Cohort-based Partition --- p.61 / Chapter 3.2.3 --- Privacy Guarantee --- p.70 / Chapter 3.2.4 --- Refinement of HD-composition --- p.75 / Chapter 3.2.5 --- Anonymization Algorithm --- p.76 / Chapter 3.3 --- Experiments --- p.77 / Chapter 3.3.1 --- Failures of Conventional Generalizations --- p.78 / Chapter 3.3.2 --- Evaluations of HD-Composition --- p.79 / Chapter 3.4 --- Related Work --- p.85 / 
Chapter 4 --- Conclusions --- p.87 / Bibliography --- p.89 Data mining Cluster analysis Data protection
196	Depth-based object segmentation and tracking from multi-view video. / 基于深度的多视角视频物体分割与追踪 / CUHK electronic theses & dissertations collection / Ji yu shen du de duo shi jiao shi pin wu ti fen ge yu zhui zong January 2011 (has links) Zhang, Qian. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 97-111). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. Digital video--Data processing Optical data processing
197	Learning from large data : Bias, variance, sampling, and learning curves Brain, Damien, mikewood@deakin.edu.au January 2003 (has links) One of the fundamental machine learning tasks is that of predictive classification. Given that organisations collect an ever increasing amount of data, predictive classification methods must be able to effectively and efficiently handle large amounts of data. However, it is understood that present requirements push existing algorithms to, and sometimes beyond, their limits since many classification prediction algorithms were designed when currently common data set sizes were beyond imagination. This has led to a significant amount of research into ways of making classification learning algorithms more effective and efficient. Although substantial progress has been made, a number of key questions have not been answered. This dissertation investigates two of these key questions. The first is whether different types of algorithms to those currently employed are required when using large data sets. This is answered by analysis of the way in which the bias plus variance decomposition of predictive classification error changes as training set size is increased. Experiments find that larger training sets require different types of algorithms to those currently used. Some insight into the characteristics of suitable algorithms is provided, and this may provide some direction for the development of future classification prediction algorithms which are specifically designed for use with large data sets. The second question investigated is that of the role of sampling in machine learning with large data sets. Sampling has long been used as a means of avoiding the need to scale up algorithms to suit the size of the data set by scaling down the size of the data sets to suit the algorithm. However, the costs of performing sampling have not been widely explored. 
Two popular sampling methods are compared with learning from all available data in terms of predictive accuracy, model complexity, and execution time. The comparison shows that sub-sampling generally produces models with accuracy close to, and sometimes greater than, that obtainable from learning with all available data. This result suggests that it may be possible to develop algorithms that take advantage of the sub-sampling methodology to reduce the time required to infer a model while sacrificing little if any accuracy. Methods of improving effective and efficient learning via sampling are also investigated, and new sampling methodologies are proposed. These methodologies include using a varying proportion of instances to determine the next inference step and using a statistical calculation at each inference step to determine sufficient sample size. Experiments show that using a statistical calculation of sample size can not only substantially reduce execution time but can do so with only a small loss, and occasional gain, in accuracy. One of the common uses of sampling is in the construction of learning curves. Learning curves are often used to attempt to determine the optimal training size which will maximally reduce execution time while not being detrimental to accuracy. An analysis of the performance of methods for detection of convergence of learning curves is performed, with the focus of the analysis on methods that calculate the gradient of the tangent to the curve. Given that such methods can be susceptible to local accuracy plateaus, an investigation into the frequency of local plateaus is also performed. It is shown that local accuracy plateaus are a common occurrence, and that ensuring a small loss of accuracy often results in greater computational cost than learning from all available data. 
These results cast doubt over the applicability of gradient of tangent methods for detecting convergence, and of the viability of learning curves for reducing execution time in general. Data mining database searching algorithms data processing
198	Multisensor data fusion Filippidis, Arthur. January 1993 (has links) (PDF) Bibliography: leaves 149-152. Multisensor data fusion Tracking radar Data processing
199	MicroSoar : a high speed microstructure profiling system May, Glenn H. 10 September 1997 (has links) As ocean ecosystems continue to deteriorate in the face of human-induced pressures, marine management professionals are increasingly being urged to predict the impacts of various activities on ocean ecosystems. Many ecosystem interactions are still not adequately understood, so managers often turn to scientists to provide data and analysis on impacts resulting from specific actions. One important physical ocean process in need of more empirical data is microscale turbulence. Because it is responsible for mixing across isopycnal surfaces in stratified waters, turbulence is important in many physical, chemical and biological processes in the ocean. An elementary description of turbulence and mixing is presented along with a summary of the role of turbulence in marine ecosystems. In order to be of use to scientists, turbulence must be measured over large areas of the ocean. This paper presents a discussion of techniques for measuring turbulence. Measurements of turbulence are specialized and costly. A new microstructure data acquisition system was developed to acquire microstructure data eight times faster than present methods allow. The design details of the high-speed microstructure data acquisition system called MicroSoar are presented along with some preliminary data obtained from its deployment on actual cruises. / Graduation date: 1998 Oceanic mixing -- Data processing Turbulence -- Data processing
200	A linear programming and sampling approach to the cutting-order problem Hamilton, Evan D. 15 November 2000 (has links) In the context of forest products, a cutting order is a list of dimension parts along with demanded quantities. The cutting-order problem is to minimize the total cost of filling the cutting order from a given lumber grade (or grades). Lumber of a given grade is supplied to the production line in a random sequence, and each board is cut in a way that maximizes the total value of dimension parts produced, based on a value (or price) specified for each dimension part. Hence, the problem boils down to specifying suitable dimension-part prices for each board to be cut. The method we propose is adapted from Gilmore and Gomory's linear programming approach to the cutting stock problem. The main differences are the use of a random sample to construct the linear program and the use of prices rather than cutting patterns to specify a solution. The primary result of this thesis is that the expected cost of filling an order under the proposed method is approximately equal to the minimum possible expected cost, in the sense that the ratio (expected cost divided by the minimum expected cost) approaches one as the size of the order (e.g., in board feet) and the size of the random sample grow large. A secondary result is a lower bound on the minimum possible expected cost. The actual minimum is usually impractical to calculate, but the lower bound can be used in computer simulations to provide an absolute standard against which to compare costs. It applies only to independent sequences, whereas the convergence property above applies to a large class of dependent sequences, called alpha-mixing sequences. Experimental results (in the form of computer simulations) suggest that the proposed method is capable of attaining nearly minimal expected costs in moderately large orders. 
The main drawbacks are that the method is computationally expensive and of questionable value in smaller orders. / Graduation date: 2001 Lumbering -- Data processing Sawmills -- Data processing
