601. Accelerated circuit simulation via Faber series and hierarchical matrix techniques / Li, Ying-chi, 李應賜 / January 2013
This dissertation presents two techniques to accelerate time-domain transient circuit simulation and circuit thermal analysis. The matrix exponential method is one of the state-of-the-art methods for million-order time-domain circuit simulations due to its explicit nature and global stability. The matrix exponential is commonly computed by Krylov subspace methods, which become inefficient when the circuit is stiff, namely when the time constants of the circuit differ by several orders of magnitude. The truncated Faber series is suitable for accurate evaluation of the matrix exponential even under a highly stiff system matrix arising from practical circuits. Experiments have shown that the proposed approach is globally stable, highly accurate and parallelizable, and avoids the excessive memory storage demanded by Krylov subspace methods. Another major issue in circuit simulation is thermal analysis. The use of the hierarchical matrix (H-matrix) in an efficient finite-element-based (FE-based) direct solver for both steady-state and transient thermal analyses of three-dimensional integrated circuits (3D ICs) is proposed. The H-matrix provides a data-sparse way to approximate the matrices and their inverses with almost linear space and time complexity. This also holds for FE-based transient analysis of the parabolic thermal partial differential equations (PDEs). Specifically, the stiffness matrix from FE-based steady-state and transient thermal analysis can be represented by an H-matrix without approximation, and its inverse and Cholesky factors can be evaluated in H-matrix form with controlled accuracy. This thesis shows that the memory and time complexities of the solver are bounded by O(k_1 N log N) and O(k_1^2 N log^2 N), respectively, for very-large-scale thermal systems, where k_1 is a small quantity determined by accuracy requirements and N is the number of unknowns in the system. Numerical results validate the predicted theoretical scalability and demonstrate the effectiveness of the proposed method. / published_or_final_version / Electrical and Electronic Engineering / Master / Master of Philosophy
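To illustrate the role of the matrix exponential in explicit transient simulation, the following minimal sketch advances a linear system x' = Ax by one step by evaluating exp(hA)v with a truncated polynomial series. It uses a plain truncated Taylor series as a stand-in for illustration only; the thesis evaluates the series with Faber polynomials, which remain accurate when A is stiff. The toy matrix and step size are made up.

```python
import numpy as np

def expm_multiply_truncated(A, v, h, terms=20):
    """Approximate exp(h*A) @ v with a truncated Taylor series.

    A Faber-series evaluation (as used in the thesis) replaces the monomials
    (hA)^k / k! with Faber polynomials tailored to the spectrum of A, which
    keeps the truncation accurate for stiff system matrices.
    """
    result = v.astype(float)
    term = v.astype(float)
    for k in range(1, terms):
        term = (h / k) * (A @ term)     # next series term: (hA)^k v / k!
        result += term
    return result

# Toy 2x2 system with well-separated time constants (a mildly stiff example).
A = np.array([[-1.0, 0.0],
              [0.0, -100.0]])
x0 = np.array([1.0, 1.0])
h = 0.01
print(expm_multiply_truncated(A, x0, h))   # one explicit step x(h) = exp(hA) x(0)
print(np.exp(np.diag(A) * h))              # exact answer for this diagonal example
```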
602. Systematic review: the return on investment of EHR implementation and associated key factors leading to positive return-on-investment / Tse, Pui-yin, Fiona, 謝佩妍 / January 2013
Background: Implementations of national electronic health records (EHRs) are underway worldwide as a core objective of eHealth strategies. It is widely believed that EHR implementation may lead to considerable financial savings. This paper aimed to conduct a systematic review to assess the return on investment (ROI) of EHR implementation and to identify the areas with the greatest potential for positive ROI, to inform ongoing deliberation on the continuous development of EHR.
Methodology: An inclusive search string was developed to identify English-language papers published between January 2003 and June 2013. Only studies meeting the following criteria were included: 1) primary study; 2) involves a computerized system with an electronic health record; and 3) includes some form of economic evaluation. Critical appraisal was undertaken and articles of higher quality were selected. Hard ROI and soft ROI, as defined for EHR implementation, were adopted as outcome metrics to examine both the tangible and intangible returns of EHR implementation.
Results: A total of 18 articles were examined for data extraction and synthesis. Most of the available evidence came from pre-post evaluations or cross-sectional analyses without uniform standards for reporting. Findings of 56% of the articles indicated cost savings after EHR implementation, while 17% indicated a loss in total revenue. The remaining articles concluded that there was no association between cost reduction and EHR implementation. Among the defined hard ROI metrics, most studies reported a positive effect on resource reduction. Some authors argued that the saved resources were reallocated to other initiatives, resulting in negligible cost savings. According to the selected literature, evidence showed that EHR was able to achieve the defined soft ROI, especially in improving the care process, but the overall outcome was subject to individual practice. Authors of 12 of the 18 articles identified factors leading to positive returns and provided recommendations for successful EHR implementation. Besides adopting helpful EHR functions and promoting practice change, additional incentives for quality improvement and performance benchmarking should be considered. The organizations and EHR systems studied in the articles examined were vastly different; it would be desirable for a controlled study adopting EHRs with uniform standards to be performed to evaluate the ROI in different clinical settings.
Conclusions: The benefits of EHR are not guaranteed; realizing them requires changes in practice and substantial effort. Healthcare organizations have to equip themselves to implement the new technology and to exploit its use for better clinical outcomes. / published_or_final_version / Public Health / Master / Master of Public Health
603. Efficient methods for improving the sensitivity and accuracy of RNA alignments and structure prediction / Li, Yaoman, 李耀满 / January 2013
RNA plays an important role in molecular biology, and RNA sequence comparison is an important method for analyzing gene expression. Because aligning RNA reads needs to handle gaps, mutations, poly-A tails, etc., it is much more difficult than aligning other sequences. In this thesis, we study RNA-Seq alignment tools and existing gene information databases, and how to improve alignment accuracy and predict RNA secondary structure.
Known gene information databases contain a large amount of reliable gene information that has already been discovered. We also note that most DNA alignment tools are well developed: they run much faster than existing RNA-Seq alignment tools and have higher sensitivity and accuracy. Combining these observations, we present a method to align RNA-Seq data using DNA alignment tools together with a known gene information database: the DNA aligner performs the alignment against known transcripts, and the gene information is then used to convert the alignment to genome coordinates, as sketched below.
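As a toy illustration of the conversion step just described (not the thesis implementation), the sketch below maps a position on a known transcript back to a genome coordinate using the transcript's exon structure; the exon layout and positions are invented, and a real converter would also handle strand, CIGAR strings and reads spanning exon junctions.

```python
def transcript_to_genome(pos, exons):
    """Map a 0-based transcript coordinate to a genome coordinate.

    `exons` is a list of (genome_start, genome_end) pairs, 0-based half-open,
    listed in transcript order on the forward strand.
    """
    offset = pos
    for start, end in exons:
        exon_length = end - start
        if offset < exon_length:
            return start + offset
        offset -= exon_length
    raise ValueError("position lies beyond the end of the transcript")

# Hypothetical transcript built from two exons on the genome.
exons = [(1000, 1100), (1500, 1600)]      # 100 bp + 100 bp
print(transcript_to_genome(50, exons))    # 1050, inside the first exon
print(transcript_to_genome(120, exons))   # 1520, inside the second exon
```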
Although the gene information database is updated daily, there are still many genes and alternative splicings that have not yet been discovered. If our RNA alignment tool relied only on the known gene database, many reads that come from unknown genes or alternative splicings could not be aligned. Thus, we present a combinational method that covers potential alternative splicing junction sites. Combined with the original gene database, the new alignment tool covers most of the alignments reported by other RNA-Seq alignment tools.
Recently, many RNA-Seq alignment tools have been developed that are more powerful and faster than the previous generation. However, RNA read alignment is much more complicated than other sequence alignment, and the alignments reported by some RNA-Seq alignment tools have low accuracy. We present a simple and efficient filtering method based on the quality scores of the reads that removes most low-accuracy alignments.
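A minimal sketch of such a quality-score filter, under the assumption of Phred+33-encoded quality strings and an illustrative threshold of 20 (neither value is taken from the thesis):

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred quality of a read, assuming Phred+33 ASCII encoding."""
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)

def filter_alignments(alignments, min_quality=20.0):
    """Keep alignments whose read has a sufficiently high mean quality.

    `alignments` is an iterable of (read_id, quality_string) pairs; a real
    filter would take these fields from SAM/BAM records.
    """
    return [(rid, q) for rid, q in alignments if mean_phred(q) >= min_quality]

reads = [("r1", "IIIIIIIIII"),    # Phred 40 across the read
         ("r2", "!!!!!!!!!!")]    # Phred 0 across the read
print(filter_alignments(reads))   # only r1 survives the filter
```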
Finally, we present an RNA secondary structure prediction method that can predict pseudoknots (a type of RNA secondary structure) with high sensitivity and specificity. / published_or_final_version / Computer Science / Master / Master of Philosophy
604. Budget-limited data disambiguation / Yang, Xuan, 楊譞 / January 2013
The problem of data ambiguity exists in a wide range of applications. In this thesis, we study “cost-aware” methods to alleviate the data ambiguity problems in uncertain databases and social-tagging data.
In database applications, ambiguous (or uncertain) data may originate from data integration and the measurement errors of devices. These ambiguous data are maintained by uncertain databases. In many situations, it is possible to “clean”, or remove, ambiguities from these databases. For example, the GPS location of a user is inexact due to measurement error, but context information (e.g., what a user is doing) can be used to reduce the imprecision of the location value. In practice, a cleaning activity often involves a cost, may fail and may not remove all ambiguities. Moreover, statistical information about how likely database entities are to be cleaned successfully may not be precisely known. We model the above aspects with the uncertain database cleaning problem, which requires us to make sensible decisions in selecting entities to clean in order to maximize the amount of ambiguous information removed under a limited budget. To solve this problem, we propose the Explore-Exploit (or EE) algorithm, which gathers valuable information during the cleaning process to determine how the remaining cleaning budget should be invested. We also study how to fine-tune the parameters of EE in order to achieve optimal cleaning effectiveness.
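The sketch below is a simplified toy version of the explore-exploit idea, not the EE algorithm itself: part of the budget is spent probing entities to estimate how likely cleaning is to succeed, and the rest is spent on the entities with the best estimated payoff per unit cost. The costs, success probabilities and ambiguity scores are invented.

```python
import random

def explore_exploit_clean(entities, budget, explore_fraction=0.3, seed=0):
    """Toy budgeted cleaning with an explore phase and an exploit phase.

    `entities` maps an id to (cost, true_success_prob, ambiguity_removed).
    The true probability is hidden from the strategy; it is only used to
    simulate whether a cleaning attempt succeeds. A successfully cleaned
    entity leaves the pool.
    """
    rng = random.Random(seed)
    attempts = {e: 1 for e in entities}      # optimistic prior, avoids div-by-zero
    successes = {e: 1 for e in entities}
    pool = set(entities)
    removed, spent = 0.0, 0.0

    def attempt(e):
        nonlocal removed, spent
        cost, p, gain = entities[e]
        spent += cost
        attempts[e] += 1
        if rng.random() < p:                 # cleaning may fail
            successes[e] += 1
            removed += gain
            pool.discard(e)

    def payoff(e):                           # estimated ambiguity removed per unit cost
        cost, _, gain = entities[e]
        return (successes[e] / attempts[e]) * gain / cost

    # Explore: cycle over entities to learn rough success rates cheaply.
    order, i = list(entities), 0
    while pool and spent + min(entities[e][0] for e in pool) <= explore_fraction * budget:
        e = order[i % len(order)]
        i += 1
        if e in pool and spent + entities[e][0] <= explore_fraction * budget:
            attempt(e)

    # Exploit: repeatedly clean the affordable entity with the best estimated payoff.
    while pool:
        affordable = [e for e in pool if spent + entities[e][0] <= budget]
        if not affordable:
            break
        attempt(max(affordable, key=payoff))
    return removed

entities = {             # entity id -> (cost, success probability, ambiguity removed)
    "e1": (1.0, 0.9, 5.0),
    "e2": (1.0, 0.2, 5.0),
    "e3": (2.0, 0.8, 4.0),
}
print(explore_exploit_clean(entities, budget=10.0))
```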
Social tagging data capture web users' textual annotations, called tags, for resources (e.g., webpages and photos). Since tags are given by casual users, they often contain noise (e.g., misspelled words) and may not cover all the aspects of each resource. In this thesis, we design a metric to systematically measure the tagging quality of each resource based on the tags it has received. We propose an incentive-based tagging framework in order to improve tagging quality. The main idea is to award users some incentive for giving (relevant) tags to resources. The challenge is how to allocate incentives to a large set of resources so as to maximize the improvement of their tagging quality under a limited budget. To solve this problem, we propose a few efficient incentive allocation strategies. Experiments show that our best strategy provides resources with a close-to-optimal gain in tagging quality.
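A minimal sketch of one plausible allocation strategy (not necessarily one of the strategies proposed in the thesis): greedily give each unit of incentive to the resource with the largest expected marginal gain in tagging quality, under a diminishing-returns model that is assumed here purely for illustration.

```python
import heapq

def allocate_incentives(quality, budget):
    """Greedy unit-by-unit incentive allocation.

    `quality` maps a resource id to its current tagging-quality score in [0, 1].
    The marginal gain of one more incentive unit is modeled as a fixed fraction
    of the remaining quality gap -- a made-up diminishing-returns model used
    only to make the greedy rule concrete.
    """
    gain_rate = 0.5
    # Max-heap ordered by the marginal gain of the next unit for each resource.
    heap = [(-(1.0 - q) * gain_rate, r) for r, q in quality.items()]
    heapq.heapify(heap)
    allocation = {r: 0 for r in quality}
    quality = dict(quality)

    for _ in range(budget):
        neg_gain, r = heapq.heappop(heap)
        quality[r] += -neg_gain               # apply the expected improvement
        allocation[r] += 1
        heapq.heappush(heap, (-(1.0 - quality[r]) * gain_rate, r))
    return allocation, quality

alloc, q = allocate_incentives({"photo1": 0.2, "page7": 0.8}, budget=5)
print(alloc)   # most units go to the low-quality resource first
print(q)
```

Unit-by-unit greedy allocation of this kind is optimal whenever each resource's quality gain is a concave function of the incentive it receives, which is the assumption built into the toy model above.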
To summarize, we study the problem of budget-limited data disambiguation for uncertain databases and social tagging data: given a set of objects (entities from uncertain databases or web resources), how can we make sensible decisions about which object to “disambiguate” (perform a cleaning activity on the entity or ask a user to tag the resource) in order to maximize the amount of ambiguous information reduced under a limited budget? / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
605. Algorithms for evolving graph analysis / Ren, Chenghui, 任成會 / January 2014
In many applications, entities and their relationships are represented by graphs. Examples include social networks (users and friendship), the WWW (web pages and hyperlinks) and bibliographic networks (authors and co-authorship). In a dynamic world, information changes and so the graphs representing the information evolve with time. For example, a Facebook link between two friends is established, or a hyperlink is added to a web page. We propose that historical graph-structured data be archived for analytical processing. We call a historical evolving graph sequence an EGS.
We study the problem of efficient query processing on an EGS, which has many applications in evolving graph analysis. To solve the problem, we propose a solution framework called FVF and a cluster-based LU decomposition algorithm called CLUDE, both of which evaluate queries efficiently to support EGS analysis.
The Find-Verify-and-Fix (FVF) framework applies to a wide range of queries. We demonstrate how some important graph measures, including shortest-path distance, closeness centrality and graph centrality, can be efficiently computed from EGSs using FVF. Since an EGS generally contains numerous large graphs, we also discuss several compact storage models that support our FVF framework. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in EGS query processing.
A graph can be conveniently modeled by a matrix, from which various quantitative measures are derived, such as PageRank, SALSA, Personalized PageRank and Random Walk with Restart. To compute these measures, linear systems of the form Ax = b, where A is a matrix that captures a graph's structure, need to be solved. To facilitate solving the linear system, the matrix A is often decomposed into two triangular matrices (L and U). In a dynamic world, the graph changes with time and so does the matrix A that represents it. We consider a sequence of evolving graphs and its associated sequence of evolving matrices. We study how LU decomposition should be done over the sequence so that (1) the decomposition is efficient and (2) the resulting LU matrices best preserve the sparsity of the matrices A (i.e., the number of extra non-zero entries introduced in L and U is minimized). We propose a cluster-based algorithm, CLUDE, for solving the problem. Through an experimental study, we show that CLUDE is about an order of magnitude faster than the traditional incremental update algorithm. The number of extra non-zero entries introduced by CLUDE is also about an order of magnitude smaller than that of the traditional algorithm. CLUDE is thus an efficient algorithm for LU decomposition that produces high-quality LU matrices over an evolving matrix sequence. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
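As a concrete instance of the linear systems mentioned above, the sketch below solves a Random Walk with Restart system (I - cW)x = (1 - c)e with a sparse LU factorization: L and U are computed once and reused for every restart vector. The small graph and the restart parameter are invented; the thesis's CLUDE addresses the harder question of computing such factors across a whole sequence of evolving matrices while keeping L and U sparse.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy directed graph; column u of W holds the transition probabilities out of node u.
edges = [(0, 1), (1, 2), (2, 0), (2, 1), (3, 2), (0, 3)]
n = 4
W = np.zeros((n, n))
for u, v in edges:
    W[v, u] = 1.0
W /= W.sum(axis=0)                     # make each column sum to 1

c = 0.85                               # continuation probability of the walk
system = sp.csc_matrix(np.eye(n) - c * W)
lu = splu(system)                      # factor A = LU once ...

for source in range(n):
    e = np.zeros(n)
    e[source] = 1.0
    scores = lu.solve((1 - c) * e)     # ... and reuse L and U for every restart vector
    print(source, np.round(scores, 3))
```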
606. Competitive online job scheduling algorithms under different energy management models / Chan, Sze-hang, 陳思行 / January 2013
Online flow-time scheduling is a fundamental problem in computer science and has been extensively studied for years. It is about how to design a scheduler to serve computer jobs with unpredictable arrival times and varying sizes and priorities so as to minimize the total flow time (better understood as response time) of the jobs. It has many applications, most notably in the operation of server farms. As energy has become an important issue, the design of the scheduler also has to take power management into consideration, for example, how to scale the speed of the processors dynamically. The two objectives are in tension, as one would prefer lower processor speeds to save energy, yet a good quality of service must be retained. In this thesis, I study a few scheduling problems for energy and flow time in depth and give new algorithms to tackle them. The competitiveness of our algorithms is guaranteed by worst-case mathematical analysis against the best possible or hypothetical solutions.
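To make this tension concrete, the toy sketch below runs a single unit-size job at different constant speeds under a cubic power function of the kind mentioned in the next paragraph (the exact model and objective analyzed in the thesis may differ; the numbers are purely illustrative) and prints the resulting flow time, energy and their sum.

```python
def flow_time_and_energy(job_size, speed, alpha=3):
    """Flow time and energy of one job run at a constant speed.

    Power is modeled as speed**alpha; energy = power * running time.
    """
    running_time = job_size / speed
    energy = speed ** alpha * running_time
    return running_time, energy

for speed in (0.5, 1.0, 2.0, 4.0):
    flow, energy = flow_time_and_energy(job_size=1.0, speed=speed)
    print(f"speed={speed:4.1f}  flow time={flow:5.2f}  "
          f"energy={energy:6.2f}  flow+energy={flow + energy:6.2f}")
```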
In the speed scaling model, the power of a processor increases with its speed according to a certain function (e.g., a cubic function of speed). Among all online scheduling problems with speed scaling, the nonclairvoyant setting (in which the size of a job is not known during its execution) with arbitrary priorities is perhaps the most challenging. This thesis gives the first competitive algorithm called WLAPS for this setting.
In reality, it is not uncommon that during peak-load periods, some (low-priority) users have their jobs rejected by the servers. This motivated me to study more complicated scheduling algorithms that can strike a good balance among speed scaling, flow time and rejection penalty. Two new algorithms, UPUW and HDFAC, for different models of rejection penalty have been proposed and analyzed.
Last, but perhaps most interesting, we study power management in a large server farm environment in which the primary energy-saving mechanism is to put some processors to sleep. Two new algorithms, POOL and SATA, have been designed to tackle jobs that cannot and can migrate among the processors, respectively. They are integrated algorithms that consider speed scaling, job scheduling and processor sleep management together to optimize energy usage and flow time simultaneously. These algorithms are again proven mathematically to be competitive even in the worst case. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
607. Workflows for identifying differentially expressed small RNAs and detection of low copy repeats in human / Liu, Xuan, 刘璇 / January 2014
With the rapid development of next-generation sequencing (NGS) technology, we are able to investigate various biological problems, including genome and transcriptome sequencing, genomic structural variation and the mechanisms of regulatory small RNAs. An enormous number of associated computational methods have been proposed to study these biological problems using NGS reads at low cost in money and time. Regulatory small RNAs and genomic structural variations are the two main problems that we have studied.
In the area of regulatory small RNAs, various computational tools have been designed, ranging from small RNA prediction to target prediction. Regulatory small RNAs play essential roles in plants and bacteria, such as in responses to environmental stresses. We focus on sRNAs that act by base pairing with their target mRNAs through complementarity. A comprehensive analysis workflow that integrates sRNA-Seq and RNA-Seq analysis and generates a regulatory network had not yet been designed. Thus, we proposed and implemented two small RNA analysis workflows, for plants and bacteria respectively.
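As a toy illustration of the base-pairing principle behind such workflows (not the workflows themselves), the sketch below scans an mRNA for sites perfectly complementary to a candidate sRNA seed; real target prediction also scores G:U pairs, bulges and hybridization energy. The sequences are invented.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def find_target_sites(srna_seed, mrna):
    """Return positions in the mRNA that pair perfectly with the sRNA seed.

    Antisense pairing means a target site equals the reverse complement
    of the seed region.
    """
    site = reverse_complement(srna_seed)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]

srna = "ACCUGAUU"                       # hypothetical seed region of an sRNA
mrna = "GGGAAUCAGGUCCCAAUCAGGUGGG"      # hypothetical mRNA fragment
print(find_target_sites(srna, mrna))    # 0-based start positions of target sites
```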
In the area of genomic structural variation (SV), two types of disease-related SVs have been investigated: complex low copy repeats (LCRs, also termed segmental duplications) and tandem duplications (TDs). LCRs provide the structural basis for forming combinations of other SVs, which may in turn lead to serious genetic diseases, and TDs of specific regions have been reported in patients. Locating LCRs and TDs in the human genome can help researchers further interrogate the mechanisms of the related diseases. Therefore, we proposed two computational methods to predict novel LCRs and TDs in the human genome. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
608. Binning and annotation for metagenomic next-generation sequencing reads / Wang, Yi, 王毅 / January 2014
The development of next-generation sequencing technology enables us to obtain a vast number of short reads from metagenomic samples. In metagenomic samples, the reads from different species are mixed together. So, metagenomic binning has been introduced to cluster reads from the same or closely related species and metagenomic annotation is introduced to predict the taxonomic information of each read. Both metagenomic binning and annotation are critical steps in downstream analysis. This thesis discusses the difficulties of these two computational problems and proposes two algorithmic methods, MetaCluster 5.0 and MetaAnnotator, as solutions.
There are six major challenges in metagenomic binning: (1) the lack of reference genomes; (2) uneven abundance ratios; (3) short read lengths; (4) a large number of species; (5) the existence of species with extremely low abundance; and (6) recovering low-abundance species. To solve these problems, I propose a two-round binning method, MetaCluster 5.0. The improvement achieved by MetaCluster 5.0 is based on three major observations. First, the short q-mer (length-q substrings of the sequence, with q = 4, 5) frequency distributions of individual sufficiently long fragments sampled from the same genome are more similar than those sampled from different genomes. Second, sufficiently long w-mers (length-w substrings of the sequence, with w ≈ 30) are usually unique within each individual genome. Third, the k-mer (length-k substrings of the sequence, with k ≈ 16) frequencies from the reads of a species are usually linearly proportional to the species' abundance.
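The first observation can be made concrete with the sketch below, which computes normalized 4-mer frequency vectors for long fragments and compares them with a simple Euclidean distance; the fragments are toy sequences, and MetaCluster 5.0 combines this signal with the w-mer and k-mer observations above rather than using it alone.

```python
from itertools import product
import math

BASES = "ACGT"
QMERS = ["".join(p) for p in product(BASES, repeat=4)]   # all 256 4-mers

def qmer_frequencies(fragment, q=4):
    """Normalized q-mer frequency vector of a DNA fragment."""
    counts = {m: 0 for m in QMERS}
    for i in range(len(fragment) - q + 1):
        mer = fragment[i:i + q]
        if mer in counts:                 # skip q-mers containing non-ACGT characters
            counts[mer] += 1
    total = max(sum(counts.values()), 1)
    return [counts[m] / total for m in QMERS]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy fragments: two GC-rich ones and one AT-rich one.
frag_a = "GCGCGGCCGCGCGGCGCCGG" * 10
frag_b = "GGCCGCGGCGCGCCGGGCGC" * 10
frag_c = "ATATAATTATATAATATTAA" * 10

fa, fb, fc = map(qmer_frequencies, (frag_a, frag_b, frag_c))
print(euclidean(fa, fb))   # small: similar composition, likely the same bin
print(euclidean(fa, fc))   # large: different composition, different bins
```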
The metagenomic annotation methods in the literature often suffer from five major drawbacks: (1) they are unable to annotate many reads; (2) they give less precise annotation for reads and more incorrect annotation for contigs; (3) they do not deal well with novel clades that have limited reference genomes; (4) their performance is affected by variable genome sequence similarities between different clades; and (5) they have high time complexity. In this thesis, a novel tool, MetaAnnotator, is proposed to tackle these problems. MetaAnnotator makes four major contributions. Firstly, instead of annotating reads/contigs independently, a cluster of reads/contigs is annotated as a whole. Secondly, multiple reference databases are integrated. Thirdly, for each individual clade, quadratic discriminant analysis is applied to capture the similarities between reference sequences in the clade. Fourthly, instead of using alignment tools, MetaAnnotator performs annotation using exact k-mer matching, which is more efficient.
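A heavily simplified sketch of the exact k-mer matching idea (the real MetaAnnotator also clusters reads/contigs, integrates multiple reference databases and applies a per-clade quadratic discriminant model): build a k-mer-to-clade index from reference sequences and annotate a query by a majority vote over its uniquely matching k-mers. The reference sequences and the choice k = 8 are toy values.

```python
from collections import Counter, defaultdict

def build_kmer_index(references, k=8):
    """Map each k-mer to the set of clades whose references contain it."""
    index = defaultdict(set)
    for clade, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(clade)
    return index

def annotate(query, index, k=8):
    """Vote over exact k-mer matches; ignore k-mers shared by several clades."""
    votes = Counter()
    for i in range(len(query) - k + 1):
        clades = index.get(query[i:i + k], set())
        if len(clades) == 1:
            votes[next(iter(clades))] += 1
    return votes.most_common(1)[0][0] if votes else "unassigned"

references = {
    "clade_A": "ACGTACGTGGCCTTAAGGCCACGTACGT",
    "clade_B": "TTTTAAAACCCCGGGGTTTTAAAACCCC",
}
index = build_kmer_index(references)
print(annotate("GGCCTTAAGGCCACGT", index))   # expected: clade_A
print(annotate("CCCCGGGGTTTTAAAA", index))   # expected: clade_B
```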
Experiments on both simulated datasets and real datasets show that MetaCluster 5.0 and MetaAnnotator outperform existing tools with higher accuracy as well as less time and space cost. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
609. Discovering meta-paths in large knowledge bases / Meng, Changping, 蒙昌平 / January 2014
A knowledge base, such as Yago or DBpedia, can be modeled as a large graph with nodes and edges annotated with class and relationship labels. Recent work has studied how to make use of these rich information sources. In particular, meta-paths, which represent sequences of node classes and edge types between two nodes in a knowledge base, have been proposed for such tasks as information retrieval, decision making, and product recommendation. Current methods assume meta-paths are found by domain experts. However, in a large and complex knowledge base, retrieving meta-paths manually can be tedious and difficult. We thus study how to discover meta-paths automatically. Specifically, users are asked to provide example pairs of nodes that exhibit high proximity. We then investigate how to generate meta-paths that can best explain the relationship between these node pairs. Since this problem is computationally intractable, we propose a greedy algorithm to select the most relevant meta-paths. We also present a data structure to enable efficient execution of this algorithm. We further incorporate hierarchical relationships among node classes in our solutions. Finally, we propose an effective similarity join algorithm in order to generate more node pairs using these meta-paths. Extensive experiments on real knowledge bases show that our approach captures important meta-paths in an efficient and scalable manner. / published_or_final_version / Computer Science / Master / Master of Philosophy
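A minimal sketch of the enumeration step underlying such an approach (not the thesis algorithm): a bounded breadth-first search collects the meta-paths, i.e. alternating sequences of node classes and edge types, that connect an example node pair; a greedy selector would then keep the meta-paths that best explain many example pairs. The tiny labeled graph is invented.

```python
from collections import deque

def meta_paths(graph, classes, source, target, max_len=3):
    """Enumerate meta-paths (class, edge type, class, ...) from source to target.

    `graph[u]` is a list of (edge_type, v) pairs; `classes[u]` is the node class.
    BFS over node paths of at most `max_len` edges, collecting label sequences.
    """
    found = set()
    queue = deque([(source, (classes[source],))])
    while queue:
        node, labels = queue.popleft()
        if node == target and len(labels) > 1:
            found.add(labels)
        if (len(labels) - 1) // 2 >= max_len:
            continue                       # length bound reached, stop expanding
        for edge_type, nxt in graph.get(node, []):
            queue.append((nxt, labels + (edge_type, classes[nxt])))
    return found

# Tiny knowledge base: two people connected through the films they worked on.
classes = {"p1": "Person", "p2": "Person", "f1": "Film", "f2": "Film"}
graph = {
    "p1": [("actedIn", "f1"), ("directed", "f2")],
    "p2": [("actedIn", "f1"), ("actedIn", "f2")],
    "f1": [("actedIn_rev", "p1"), ("actedIn_rev", "p2")],
    "f2": [("directed_rev", "p1"), ("actedIn_rev", "p2")],
}
for path in meta_paths(graph, classes, "p1", "p2", max_len=2):
    print(" -> ".join(path))
```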
610. Practical Delaunay triangulation algorithms for surface reconstruction and related problems / Choi, Sunghee / 28 August 2008
Not available / text