  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
431

A task allocation protocol for real-time financial data mining system.

January 2003
Lam Lui-fuk. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 75-76). / Abstracts in English and Chinese. / ABSTRACT --- p.I / ABSTRACT (IN CHINESE) --- p.II / ACKNOWLEDGEMENT --- p.III / TABLE OF CONTENTS --- p.IV / LIST OF FIGURES --- p.VIII / LIST OF ABBREVIATIONS --- p.X / Chapter CHAPTER 1 --- INTRODUCTION --- p.1 / Chapter 1.1 --- Introduction --- p.1 / Chapter 1.2. --- Motivation and Research Objective --- p.3 / Chapter 1.3. --- Organization of the Dissertation --- p.3 / Chapter CHAPTER 2 --- BACKGROUND STUDIES --- p.5 / Chapter 2.1 --- The Contract Net Protocol --- p.5 / Chapter 2.2 --- Two-tier software architectures --- p.8 / Chapter 2.3 --- Three-tier software architecture --- p.9 / Chapter CHAPTER 3 --- SYSTEM ARCHITECTURE --- p.12 / Chapter 3.1 --- Introduction --- p.12 / Chapter 3.2 --- System Architecture Overview --- p.12 / Chapter 3.2.1 --- Client Layer --- p.13 / Chapter 3.2.2 --- Middle Layer --- p.13 / Chapter 3.2.3 --- Back-end Layer --- p.14 / Chapter 3.3 --- Advantages of the System Architecture --- p.14 / Chapter 3.3.1 --- Separate the presentation components, business logic and data storage --- p.14 / Chapter 3.3.2 --- Provide a central-computing platform for user using different computing platforms --- p.15 / Chapter 3.3.3 --- Improve system capacity --- p.15 / Chapter 3.3.4 --- Enable distributed computing --- p.16 /
Chapter CHAPTER 4. --- SOFTWARE ARCHITECTURE --- p.17 / Chapter 4.1 --- Introduction --- p.17 / Chapter 4.2 --- Descriptions of Middle Layer Server Side Software Components --- p.17 / Chapter 4.2.1 --- Data Cache --- p.18 / Chapter 4.2.2 --- Functions Library --- p.18 / Chapter 4.2.3 --- Communicator --- p.18 / Chapter 4.2.4 --- Planner Module --- p.19 / Chapter 4.2.5 --- Scheduler module --- p.19 / Chapter 4.2.6 --- Execution Module --- p.20 / Chapter 4.3 --- Overview the Execution of Service Request inside Server --- p.20 / Chapter 4.4 --- Descriptions of Client layer Software Components --- p.21 / Chapter 4.4.1 --- Graphical User Interface --- p.22 / Chapter 4.5 --- Overview of Task Execution in Advanced Client's Application --- p.23 / Chapter 4.6 --- The possible usages of task allocation protocol --- p.24 / Chapter 4.6.1 --- Chart Drawing --- p.25 / Chapter 4.6.2 --- Compute user-defined technical analysis indicator --- p.25 / Chapter 4.6.3 --- Unbalance loading --- p.26 / Chapter 4.6.4 --- Large number of small data mining V.S. small number of large data mining --- p.26 / Chapter 4.7 --- Summary --- p.27 / Chapter CHAPTER 5. --- THE CONTRACT NET PROTOCOL FOR TASK ALLOCATION --- p.28 / Chapter 5.1 --- Introduction --- p.28 / Chapter 5.2 --- The FIPA Contract Net Interaction Protocol --- p.28 / Chapter 5.2.1 --- Introduction to the FIPA Contract Net Interaction Protocol --- p.28 / Chapter 5.2.2 --- Strengths of the FIPA Contract Net Interaction Protocol for our system --- p.30 / Chapter 5.2.3 --- Weakness of the FIPA Contractor Net Interaction Protocol for our system --- p.32 / Chapter 5.3 --- The Modified Contract Net Protocol --- p.33 / Chapter 5.4 --- The Implementation of the Modified Contract Net Protocol --- p.39 / Chapter 5.5 --- Summary --- p.46 /
Chapter CHAPTER 6. --- A CLIENT AS SERVER MODEL USING MCNP FOR TASK ALLOCATION --- p.48 / Chapter 6.1 --- Introduction --- p.48 / Chapter 6.2 --- The CASS System Model --- p.48 / Chapter 6.3 --- The analytical model of the CASS system --- p.51 / Chapter 6.4 --- Performance Analysis of the CASS System --- p.55 / Chapter 6.5 --- Performance Simulation --- p.62 / Chapter 6.6 --- An Extension of the Load-Balancing Algorithm for Non-Uniform Client's Service Time Distribution --- p.68 / Chapter 6.7 --- Summary --- p.69 / Chapter CHAPTER 7. --- CONCLUSION AND FUTURE WORK --- p.71 / Chapter 7.1 --- Conclusion --- p.71 / Chapter 7.2 --- Future Work --- p.73 / BIBLIOGRAPHY --- p.75
432

Efficient and effective outlier detection.

January 2003
by Chiu Lai Mei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 142-149). / Abstracts in English and Chinese. / Abstract --- p.ii / Acknowledgement --- p.vi / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Outlier Analysis --- p.2 / Chapter 1.2 --- Problem Statement --- p.4 / Chapter 1.2.1 --- Binary Property of Outlier --- p.4 / Chapter 1.2.2 --- Overlapping Clusters with Different Densities --- p.4 / Chapter 1.2.3 --- Large Datasets --- p.5 / Chapter 1.2.4 --- High Dimensional Datasets --- p.6 / Chapter 1.3 --- Contributions --- p.8 / Chapter 2 --- Related Work in Outlier Detection --- p.10 / Chapter 2.1 --- Outlier Detection --- p.11 / Chapter 2.1.1 --- Clustering-Based Methods --- p.11 / Chapter 2.1.2 --- Distance-Based Methods --- p.14 / Chapter 2.1.3 --- Density-Based Methods --- p.18 / Chapter 2.1.4 --- Deviation-Based Methods --- p.22 / Chapter 2.2 --- Breakthrough Outlier Notion: Degree of Outlier-ness --- p.25 / Chapter 2.2.1 --- LOF: Local Outlier Factor --- p.26 / Chapter 2.2.2 --- Definitions --- p.26 / Chapter 2.2.3 --- Properties --- p.29 / Chapter 2.2.4 --- Algorithm --- p.30 / Chapter 2.2.5 --- Time Complexity --- p.31 / Chapter 2.2.6 --- LOF of High Dimensional Data --- p.31 / Chapter 3 --- LOF': Formula with Intuitive Meaning --- p.33 / Chapter 3.1 --- Definition of LOF' --- p.33 / Chapter 3.2 --- Properties --- p.34 / Chapter 3.3 --- Time Complexity --- p.37 / Chapter 4 --- LOF" for Detecting Small Groups of Outliers --- p.39 / Chapter 4.1 --- Definition of LOF" --- p.40 / Chapter 4.2 --- Properties --- p.41 / Chapter 4.3 --- Time Complexity --- p.44 / Chapter 5 --- GridLOF for Pruning Reasonable Portions from Datasets --- p.46 / Chapter 5.1 --- GridLOF Algorithm --- p.47 / Chapter 5.2 --- Determine Values of Input Parameters --- p.51 / Chapter 5.2.1 --- Number of Intervals w --- p.51 / Chapter 5.2.2 --- Threshold Value σ --- p.52 / Chapter 5.3 --- Advantages --- p.53 /
Chapter 5.4 --- Time Complexity --- p.55 / Chapter 6 --- SOF: Efficient Outlier Detection for High Dimensional Data --- p.57 / Chapter 6.1 --- Motivation --- p.57 / Chapter 6.2 --- Notations and Definitions --- p.59 / Chapter 6.3 --- SOF: Subspace Outlier Factor --- p.62 / Chapter 6.3.1 --- Formal Definition of SOF --- p.62 / Chapter 6.3.2 --- Properties of SOF --- p.67 / Chapter 6.4 --- SOF-Algorithm: the Overall Framework --- p.73 / Chapter 6.5 --- Identify Associated Subspaces of Clusters in SOF-Algorithm --- p.74 / Chapter 6.5.1 --- Technical Details in Phase I --- p.76 / Chapter 6.6 --- Technical Details in Phase II and Phase III --- p.88 / Chapter 6.6.1 --- Identify Outliers --- p.88 / Chapter 6.6.2 --- Subspace Quantization --- p.90 / Chapter 6.6.3 --- X-Tree Index Structure --- p.91 / Chapter 6.6.4 --- Compute GSOF and SOF --- p.95 / Chapter 6.6.5 --- Assign SO Values --- p.95 / Chapter 6.6.6 --- Multi-threads Programming --- p.96 / Chapter 6.7 --- Time Complexity --- p.97 / Chapter 6.8 --- Strength of SOF-Algorithm --- p.99 / Chapter 7 --- Experiments on LOF', LOF" and GridLOF --- p.102 / Chapter 7.1 --- Datasets Used --- p.103 / Chapter 7.2 --- LOF' --- p.103 / Chapter 7.3 --- LOF" --- p.109 / Chapter 7.4 --- GridLOF --- p.114 / Chapter 8 --- Empirical Results of SOF --- p.121 / Chapter 8.1 --- Synthetic Data Generation --- p.121 / Chapter 8.2 --- Experimental Setup --- p.124 / Chapter 8.3 --- Performance Measure --- p.124 / Chapter 8.3.1 --- Quality Measurement --- p.127 / Chapter 8.3.2 --- Scalability of SOF-Algorithm --- p.136 / Chapter 8.3.3 --- Effect of Parameters on SOF-Algorithm --- p.139 / Chapter 9 --- Conclusion --- p.140 / Bibliography --- p.142 / Publication --- p.149
433

Writer identification using wavelet, contourlet and statistical models

He, Zhenyu 01 January 2006
No description available.
434

Clustering of categorical and numerical data without knowing cluster number

Jia, Hong 01 January 2013
No description available.
435

Analýza intranetu společnosti Sprinx Systems, a.s. a návrhy na jeho zlepšení. / Analysis of the intranet Sprinx Systems, a.s. and suggestions for its improvement

Perná, Lucie January 2011
This diploma thesis is devoted to a literature review of intranet topics and an analysis of the current state of the corporate intranet of Sprinx Systems, a.s. An intranet can be useful for small and large companies alike. A well-functioning intranet preserves a company's know-how and helps its users in their work. The first part of the thesis is a literature review covering, for example, the historical development of intranets, their basic functions, the creation of an intranet plan, and the planning of intranet content. The second part analyses the current intranet, drawing on the author's own experience, structured interviews with managers, and a questionnaire survey, and then evaluates the findings and recommends proposals for improving the company's intranet in the future.
436

Un nouvel horizon pour la recommandation : intégration de la dimension spatiale dans l'aide à la décision / A new horizon for the recommendation : integration of spatial dimensions to aid decision making

Chulyadyo, Rajani 19 October 2016
Nowadays it is very common to represent a system in terms of relationships between objects. One common application of such relational data is the recommender system (RS), which usually deals with the relationships between users and items. Probabilistic Relational Models (PRMs) can be a good choice for modeling probabilistic dependencies between such objects. A growing trend in recommender systems is to add a spatial dimension to these objects and to make recommendations that consider the location of users and/or items.
This thesis deals with the little-explored intersection of three related fields: Probabilistic Relational Models (a method for learning probabilistic models from relational data), spatial data (often used in relational settings), and recommender systems (which deal with relational data). The first contribution of this thesis concerns the overlap between PRMs and recommender systems. We propose a PRM-based personalized recommender system that can make recommendations from user queries in cold-start systems without user profiles. Our second contribution addresses the problem of integrating spatial information into a PRM.
437

Indexing Linked Data

Conicov, Andrei January 2012
The fast evolution of the World Wide Web has made it possible to publish a huge number of linked documents. Each such document represents a valuable piece of information. Linked Data is the term used to describe a method of exposing and connecting such documents. Even though this method is still in an experimental phase, it is already hard to process all existing data sources, and the most obvious solution is to index them. The study addresses the question of how to design an index capable of operating with millions of such entries. It analyses existing projects and describes an index that may fulfill the requirements. The prototype implementation and the test results provided offer additional information about the structure and effectiveness of the index.
438

Measuring academic performance of students in Higher Education using data mining techniques

Alsuwaiket, Mohammed January 2018
Educational Data Mining (EDM) is a developing discipline concerned with extending the classical Data Mining (DM) methods and developing new methods for exploring the data that originate from educational systems. It aims to use those methods to achieve a logical understanding of students and of the educational environment they should have for better learning. These data are characterized by their large size and randomness, which can make it difficult for educators to extract knowledge from them. Additionally, knowledge extracted from data by counting the occurrence of certain events is not always reliable, since the counting process sometimes does not take into consideration other factors and parameters that could affect the extracted knowledge. Student attendance in Higher Education has always been dealt with in a classical way, i.e. educators rely on counting the occurrences of attendance or absence and build their knowledge about students, as well as modules, on this count. This method is neither credible nor does it necessarily provide a real indication of a student's performance. On the other hand, the choice of an effective student assessment method is an issue of interest in Higher Education. Various studies (Romero, et al., 2010) have shown that students tend to get higher marks when assessed through coursework-based assessment methods (modules that are either fully assessed through coursework or assessed through a mixture of coursework and examinations) than when assessed by examination alone. A large number of EDM studies have pre-processed data through the conventional Data Mining processes, including the data preparation process, but they use transcript data as it stands, without looking at the weighting of examination and coursework results, which could affect prediction accuracy.
This thesis explores the above problems and tries to formulate the extracted knowledge in a way that guarantees achieving accurate and credible results. Student attendance data, gathered from the educational system, were first cleaned in order to remove any randomness and noise, then various attributes were studied so as to highlight the most significant ones that affect the real attendance of students. The next step was to derive an equation that measures the Student Attendance's Credibility (SAC), considering the attributes chosen in the previous step. The reliability of the newly developed measure was then evaluated in order to examine its consistency. In terms of transcript data, this thesis proposes a different data preparation process, investigating more than 230,000 student records in order to prepare students' marks based on the assessment methods of enrolled modules. The data have been processed through different stages in order to extract a categorical factor through which students' module marks are refined during the data preparation process. The results of this work show that students' final marks should not be isolated from the nature of the enrolled module's assessment methods; rather, they must be investigated thoroughly and considered during EDM's data pre-processing phases. More generally, it is concluded that educational data should not be prepared in the same way as existing data, owing to differences such as the sources of data, applications, and the types of errors in them. Therefore, an attribute, the Coursework Assessment Ratio (CAR), is proposed in order to take the different modules' assessment methods into account while preparing student transcript data. The effect of CAR and SAC on the prediction process using data mining classification techniques such as Random Forest, Artificial Neural Networks and k-Nearest Neighbors has been investigated.
The results were generated by applying the DM techniques to our data set and evaluated by measuring the statistical differences between the Classification Accuracy (CA) and Root Mean Square Error (RMSE) of all models. A comprehensive evaluation has been carried out for all results in the experiments to compare the results of all DM techniques, and it has been found that Random Forest (RF) has the highest CA and lowest RMSE. The importance of SAC and CAR in increasing the prediction accuracy is demonstrated in Chapter 5. Finally, the results have been compared with previous studies that predicted students' final marks based on students' marks at earlier stages of their study. The comparisons have taken into consideration similar data and attributes, first excluding average CAR and SAC and then including them, and measuring the prediction accuracy in both cases. The aim of this comparison is to ensure that the new preparation process stage will positively affect the final results.
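As a rough illustration of the two evaluation measures named in this abstract, the sketch below computes Classification Accuracy and Root Mean Square Error for a set of predictions. The function names and the sample values are hypothetical and are not taken from the thesis; the abstract does not say how the author computed either measure, so this only shows the standard textbook definitions.

```python
import math


def classification_accuracy(actual, predicted):
    """Return the fraction of predicted labels that equal the actual labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def root_mean_square_error(actual, predicted):
    """Return the square root of the mean squared difference between two numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical grade bands and numeric marks, for illustration only.
bands_actual = ["A", "B", "C", "A"]
bands_predicted = ["A", "B", "A", "A"]
marks_actual = [60, 70, 80]
marks_predicted = [62, 70, 77]

ca = classification_accuracy(bands_actual, bands_predicted)
rmse = root_mean_square_error(marks_actual, marks_predicted)
```

A higher CA and a lower RMSE both indicate a better model, which is the sense in which the abstract ranks Random Forest first.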
439

Conditional Differential Expression for Biomarker Discovery In High-throughput Cancer Data

Wang, Dao Sen 15 February 2019
Biomarkers have important clinical uses as diagnostic, prognostic, and predictive tools for cancer therapy. However, translation from biomarkers claimed in literature to clinical use has been traditionally poor. Importantly, clinical covariates have been shown to be important factors in biomarker discovery in small-scale studies. Yet, traditional differential gene expression analysis for expression biomarkers ignores covariates, which are only accounted for later, if at all. We conjecture that covariate-sensitive biomarker identification should lead to the discovery of more robust and true biomarkers as confounding effects are considered. Here we examine gene expression in more than 750 breast invasive ductal carcinoma cases from The Cancer Genome Atlas (TCGA-BRCA) in the form of RNA-Seq data. Specifically, we focus on differential gene expression with respect to understanding HER2, ER, and PR biology – the three key receptors in breast cancer. We explore methods of differential expression analysis, including non-parametric Mann-Whitney-Wilcoxon analysis, generalized linear models with covariates, and a novel categorical method for covariates. We tested the influence of common patient characteristics, such as age and race, and clinical covariates such as HER2, ER, and PR receptor statuses. More importantly, we show that inclusion of a correlated covariate (e.g. PR status as a covariate in ER analysis) substantially changes the list of differentially expressed genes, removing many likely false positives and revealing genes obscured by the covariate. Incorporation of relevant covariates in differential gene expression analysis holds strong biological importance with respect to biomarker discovery and may be the next step towards better translation of biomarkers to clinical use.
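The non-parametric Mann-Whitney-Wilcoxon analysis mentioned in this abstract can be sketched as follows. This is a minimal illustration of the U statistic only, assuming one gene's expression values split into two groups (for example, receptor-positive and receptor-negative cases); the function names and the sample values are hypothetical, the p-value computation is omitted, and nothing here reflects the author's actual pipeline or the covariate-aware methods the thesis proposes.

```python
def average_ranks(values):
    """Assign 1-based ranks to values, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b), the Mann-Whitney U statistics of the two samples."""
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical expression values for one gene in two patient groups.
u_a, u_b = mann_whitney_u([1.2, 2.4, 3.1], [4.0, 5.5, 6.3])
```

A U value near zero for one group means its values rank almost entirely below the other group's. Because the test compares only two groups on one variable, it cannot adjust for covariates such as PR status, which is the limitation the abstract contrasts with generalized linear models.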
440

Blog content mining: topic identification and evolution extraction.

January 2009
Ng, Kuan Kit. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 92-100). / Abstract also in Chinese. / Abstract --- p.i / Acknowledgement --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Blog Overview --- p.2 / Chapter 1.2 --- Motivation --- p.4 / Chapter 1.2.1 --- Blog Mining --- p.5 / Chapter 1.2.2 --- Topic Detection and Tracking --- p.8 / Chapter 1.3 --- Objectives and Contributions --- p.9 / Chapter 1.4 --- Proposed Methodology --- p.11 / Chapter 2 --- Related Work --- p.13 / Chapter 2.1 --- Web Document Clustering --- p.13 / Chapter 2.2 --- Document Clustering with Temporal Information --- p.15 / Chapter 2.3 --- Blog Mining --- p.17 / Chapter 3 --- Feature Extraction and Selection --- p.20 / Chapter 3.1 --- Blog Extraction and Content Cleaning --- p.21 / Chapter 3.1.1 --- Blog Parsing and Structure Identification --- p.22 / Chapter 3.1.2 --- Stop-word Removal --- p.24 / Chapter 3.1.3 --- Word Stemming --- p.25 / Chapter 3.1.4 --- Heuristic Content Cleaning and Multiword Grouping --- p.25 / Chapter 3.2 --- Feature Selection --- p.26 / Chapter 3.2.1 --- Term Frequency Inverse Document Frequency --- p.27 / Chapter 3.2.2 --- Term Contribution --- p.29 / Chapter 4 --- Blog Topic Extraction --- p.31 / Chapter 4.1 --- Requirements of Document Clustering --- p.32 / Chapter 4.1.1 --- Vector Space Modeling --- p.32 / Chapter 4.1.2 --- Similarity Measurement --- p.33 / Chapter 4.2 --- Document Clustering --- p.34 / Chapter 4.2.1 --- Partitional Clustering --- p.36 / Chapter 4.2.2 --- Hierarchial Clustering --- p.37 / Chapter 4.2.3 --- Density-Based Clustering --- p.38 / Chapter 4.3 --- Proposed Concept Clustering --- p.40 / Chapter 4.3.1 --- Semantic Distance between Concepts --- p.43 / Chapter 4.3.2 --- Bounded Density-Based Clustering --- p.47 / Chapter 4.3.3 --- Document Assignment with Topic Clusters --- p.57 / Chapter 4.4 --- Discussion --- p.58 / Chapter 5 --- Blog Topic Evolution --- p.61 / 
Chapter 5.1 --- Topic Evolution Graph --- p.61 / Chapter 5.2 --- Topic Evolution --- p.64 / Chapter 6 --- Experimental Result --- p.69 / Chapter 6.1 --- Evaluation of Topic Cluster --- p.70 / Chapter 6.1.1 --- Evaluation Criteria --- p.70 / Chapter 6.1.2 --- Evaluation Result --- p.73 / Chapter 6.2 --- Evaluation of Topic Evolution --- p.79 / Chapter 6.2.1 --- Results of Topic Evolution Graph --- p.80 / Chapter 6.2.2 --- Evaluation Criteria --- p.82 / Chapter 6.2.3 --- Evaluation of Topic Evolution --- p.83 / Chapter 6.2.4 --- Case Study --- p.84 / Chapter 7 --- Conclusions and Future Work --- p.88 / Chapter 7.1 --- Conclusions --- p.88 / Chapter 7.2 --- Future Work --- p.90 / Bibliography --- p.92 / Chapter A --- Stop Word List --- p.101 / Chapter B --- Feature Selection Comparison --- p.104 / Chapter C --- Topic Evolution --- p.106 / Chapter D --- Topic Cluster --- p.108
