81 |
Location Knowledge Discovery from User Activities / ユーザアクティビティからの場所に関する知識発見
Zhuang, Chenyi 25 September 2017
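Kyoto University / 0048 / Doctorate by coursework (new system) / Doctor of Informatics / Kō No. 20737 / Informatics Doctorate No. 651 / 新制||情||112 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 吉川 正俊, Professor 石田 亨, Professor 美濃 導彦, Associate Professor 馬 強 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM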
|
82 |
A Data Analytic Methodology for Materials Informatics
AbuOmar, Osama Yousef 17 May 2014
A data analytic methodology for materials informatics is proposed, in which data mining techniques are applied to datasets from a particular domain in order to discover and model patterns, trends, and behavior related to that domain. As a case study, an information mining tool is developed for vapor-grown carbon nanofiber (VGCNF)/vinyl ester (VE) nanocomposites. Formulation and processing factors (VGCNF type, use of a dispersing agent, mixing method, and VGCNF weight fraction) and testing temperature were used as inputs, and the storage modulus, loss modulus, and tan delta were selected as outputs or responses. The data mining and knowledge discovery techniques included self-organizing maps (SOMs) and clustering. The SOMs showed that temperature had the most significant effect on the output responses, followed by VGCNF weight fraction. A clustering technique, the fuzzy C-means (FCM) algorithm, was also applied to discover patterns in nanocomposite behavior after principal component analysis (PCA) had been used for dimensionality reduction. In particular, these techniques separated the nanocomposite specimens into different clusters based on temperature and tan delta features and placed the neat VE specimens in separate clusters. In addition, an artificial neural network (ANN) model was used to explore the VGCNF/VE dataset. The ANN was able to predict/model the VGCNF/VE responses with minimal mean square error (MSE) using the resubstitution and 3-fold cross-validation (CV) techniques. Furthermore, the proposed methodology was employed to acquire new information about mechanical and physical patterns and trends, not only for the viscoelastic properties of VGCNF/VE nanocomposites but also for their flexural and impact strength properties.
Formulation and processing factors (curing environment, use or absence of dispersing agent, mixing method, VGCNF fiber loading, VGCNF type, high shear mixing time, sonication time) and testing temperature were utilized as inputs and the true ultimate strength, true yield strength, engineering elastic modulus, engineering ultimate strength, flexural modulus, flexural strength, storage modulus, loss modulus, and tan delta were selected as outputs. This work highlights the significance and utility of data mining and knowledge discovery techniques in the context of materials informatics.
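As an illustration of the clustering step named in the abstract, the following is a minimal sketch of fuzzy C-means in plain Python. It is not the thesis's implementation: the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the stopping tolerance, and the six two-dimensional points (standing in for specimens after dimensionality reduction) are all assumptions made up for this example.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means. Returns (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j;
    each row sums to 1. All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([w / total for w in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means weighted by membership raised to the power m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append(
                [sum(w * p[k] for w, p in zip(weights, points)) / total
                 for k in range(dim)]
            )

        # Membership update based on squared distances to every center.
        new_u = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: full membership there.
                hit = d2.index(0.0)
                row = [1.0 if j == hit else 0.0 for j in range(n_clusters)]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)

        # Stop once no membership value moved by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Made-up 2-D scores for six specimens (two visibly separate groups).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
hard_labels = [row.index(max(row)) for row in memberships]
```

Unlike hard clustering, each specimen keeps a graded membership in every cluster; `hard_labels` simply picks the largest one when a single assignment is needed.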
|
83 |
METABOLIC NETWORK-BASED ANALYSES OF OMICS DATA
Cicek, A. Ercument 23 August 2013
No description available.
|
84 |
The Impact of Data Imputation Methodologies on Knowledge Discovery
Brown, Marvin Lane 26 November 2008
No description available.
|
85 |
UNSUPERVISED DATA MINING BY RECURSIVE PARTITIONING
HE, AIJING 16 September 2002
No description available.
|
86 |
Robust and Efficient Feature Selection for High-Dimensional Datasets
Mo, Dengyao 19 April 2011
No description available.
|
87 |
PATTERN EXTRACTION USING A CONTEXT DEPENDENT MEASURE OF DIVERGENCE AND ITS VALIDATION
TEMBE, WAIBHAV DEEPAK 11 October 2001
No description available.
|
88 |
Extracting, Representing and Mining Semantic Metadata from Text: Facilitating Knowledge Discovery in Biomedicine
Ramakrishnan, Cartic 26 September 2008
No description available.
|
89 |
Estimating the Importance of Terrorists in a Terror Network
Elhajj, Ahmad, Elsheikh, A., Addam, O., Alzohbi, M., Zarour, O., Aksaç, A., Öztürk, O., Özyer, T., Ridley, Mick J., Alhajj, R. January 2013
no / While criminals may start their activities at the individual level, the same is generally not true for terrorists, who are mostly organized in well-established networks. The effectiveness of a terror network can be gauged from many factors, including the volume of activities accomplished by its members, the members' ability to hide, and the ability of the network to grow and to maintain its influence even after the loss of some members, including leaders. Social network analysis, data mining, and machine learning techniques can play an important role in measuring the effectiveness of a network in general and of a terror network in particular, and they underpin the work presented in this chapter. We present a framework that employs clustering, frequent pattern mining, and some social network analysis measures to determine the effectiveness of a network. The clustering and frequent pattern mining techniques start with the adjacency matrix of the network. For clustering, we treat each row of the matrix as an object and each column as a feature. Thus the features of a network member are his/her direct neighbors. For weighted networks, we retain the link weights. For frequent pattern mining, we consider each row of the adjacency matrix as a transaction and each column as an item. Further, we map entries onto a 0/1 scale such that every entry whose value is greater than zero is assigned the value one; all other entries keep the value zero. This way we can apply frequent pattern mining algorithms to determine the most influential members in a network as well as the effect of removing some members or even links between members of a network. We also investigate the effect of adding some links between members. The aim is to study how the various members in the network change roles as the network evolves. This is measured by applying some social network analysis measures to the network at each stage of its development.
We report some interesting results related to two benchmark networks: the first is the 9/11 network and the second is the Madrid bombing network.
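The row-as-transaction encoding described above can be sketched as follows. This is a hedged illustration only: the five-member weighted network, the helper names `to_transactions` and `frequent_itemsets`, and the brute-force support counting (used here in place of a real frequent pattern mining algorithm such as Apriori) are assumptions, not the chapter's code or data.

```python
from itertools import combinations

# Hypothetical weighted, undirected network of five members (made-up data).
names = ["A", "B", "C", "D", "E"]
adjacency = [
    [0, 2, 1, 3, 0],  # A
    [2, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 1],  # C
    [3, 0, 1, 0, 0],  # D
    [0, 0, 1, 0, 0],  # E
]


def to_transactions(adjacency, names):
    """Map every positive entry to 1 and read each row as a transaction.

    The items of a transaction are the names of that member's direct neighbors.
    """
    return [
        {names[j] for j, weight in enumerate(row) if weight > 0}
        for row in adjacency
    ]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count support for all itemsets up to max_size by brute force.

    Fine for a toy network; a real study would use a proper mining algorithm.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[candidate] = support
    return result


transactions = to_transactions(adjacency, names)
frequent = frequent_itemsets(transactions, min_support=2)
```

In this encoding the support of a single item equals the number of members who list that person as a direct neighbor, and a frequent pair identifies two members who share several neighbors. Removing a row and column and re-running the count gives a simple way to probe the effect of losing a member.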
|
90 |
Automatic Question Answering and Knowledge Discovery from Electronic Health Records
Wang, Ping 25 August 2021
Electronic Health Records (EHR) data contain comprehensive longitudinal patient information, which is usually stored in databases in the form of either multi-relational structured tables or unstructured texts, e.g., clinical notes. EHR data provide a useful resource to assist doctors' decision making; however, they also present many unique challenges that limit the efficient use of this valuable information, such as large data volume, heterogeneous and dynamic information, medical term abbreviations, and noise caused by misspelled words.
This dissertation focuses on the development and evaluation of advanced machine learning algorithms to solve the following research questions: (1) How to seek answers from EHR for clinical activity related questions posed in human language without the assistance of database and natural language processing (NLP) domain experts, (2) How to discover underlying relationships of different events and entities in structured tabular EHRs, and (3) How to predict when a medical event will occur and estimate its probability based on previous medical information of patients.
First, to automatically retrieve answers for natural language questions from the structured tables in EHR, we study the question-to-SQL generation task by generating the corresponding SQL query of the input question. We propose a translation-edit model driven by a language generation module and an editing module for the SQL query generation task. This model helps automatically translate clinical activity related questions to SQL queries, so that the doctors only need to provide their questions in natural language to get the answers they need. We also create a large-scale dataset for question answering on tabular EHR to simulate a more realistic setting. Our performance evaluation shows that the proposed model is effective in handling the unique challenges about clinical terminologies, such as abbreviations and misspelled words.
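To make the question-to-SQL setting concrete, here is a small sketch using Python's built-in `sqlite3` and `difflib` modules. It does not reproduce the dissertation's translation-edit model: the `prescriptions` table, the sample question, the hard-coded draft query standing in for a generation module, and the string-similarity `edit_value` helper standing in for an editing step are all hypothetical.

```python
import difflib
import sqlite3

# Hypothetical miniature EHR table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prescriptions (patient_id INTEGER, drug TEXT, dose_mg REAL)"
)
conn.executemany(
    "INSERT INTO prescriptions VALUES (?, ?, ?)",
    [(1, "aspirin", 81.0), (1, "metformin", 500.0), (2, "aspirin", 325.0)],
)

# A clinical question containing a misspelled drug name.
question = "how many patients were given asprin"

# Stand-in for a generation module's output: a query skeleton plus the
# condition value copied verbatim from the question.
draft_sql = "SELECT COUNT(DISTINCT patient_id) FROM prescriptions WHERE drug = ?"
draft_value = "asprin"


def edit_value(conn, table, column, value):
    """Replace a condition value with the most similar value stored in the column.

    table and column are trusted identifiers chosen by the caller, not user input.
    If nothing is similar enough, the value is returned unchanged.
    """
    known = [row[0] for row in conn.execute(f"SELECT DISTINCT {column} FROM {table}")]
    matches = difflib.get_close_matches(value, known, n=1, cutoff=0.6)
    return matches[0] if matches else value


fixed_value = edit_value(conn, "prescriptions", "drug", draft_value)
answer = conn.execute(draft_sql, (fixed_value,)).fetchone()[0]
```

Without the correction step the draft query would match no rows, which shows why misspellings and abbreviations in the condition values matter even when the query structure is right.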
Second, to automatically identify answers for natural language questions from unstructured clinical notes in EHR, we propose to achieve this goal by querying a knowledge base constructed based on fine-grained document-level expert annotations of clinical records for various NLP tasks. We first create a dataset for clinical knowledge base question answering with two sets: clinical knowledge base and question-answer pairs. An attention-based aspect-level reasoning model is developed and evaluated on the new dataset. Our experimental analysis shows that it is effective in identifying answers and also allows us to analyze the impact of different answer aspects in predicting correct answers.
Third, we focus on discovering underlying relationships of different entities (e.g., patient, disease, medication, and treatment) in tabular EHR, which can be formulated as a link prediction problem in the graph domain. We develop a self-supervised learning framework for better representation learning of entities across a large corpus and also consider local contextual information for the downstream link prediction task. We demonstrate the effectiveness, interpretability, and scalability of the proposed model on the healthcare network built from tabular EHR. It is also successfully applied to solve link prediction problems in a variety of domains, such as e-commerce, social networks, and academic networks.
Finally, to dynamically predict the occurrence of multiple correlated medical events, we formulate the problem as a temporal (multiple time-points) and multi-task learning problem using tensor representation. We propose an algorithm to jointly and dynamically predict several survival problems at each time point and optimize it with the Alternating Direction Method of Multipliers (ADMM) algorithm. The model allows us to consider both the dependencies between different tasks and the correlations of each task at different time points. We evaluate the proposed model on two real-world applications and demonstrate its effectiveness and interpretability. / Doctor of Philosophy / Healthcare is an important part of our lives. Due to recent advances in data collection and storage techniques, a large amount of medical information is generated and stored in Electronic Health Records (EHR). By comprehensively documenting the longitudinal medical history of a large patient cohort, EHR data form a fundamental resource for assisting doctors' decision making, including the optimization of treatments for patients and the selection of patients for clinical trials. However, EHR data also present a number of unique challenges, such as (i) large-scale and dynamic data, (ii) heterogeneity of medical information, and (iii) medical term abbreviation. It is difficult for doctors to effectively utilize such complex data collected in a typical clinical practice. Therefore, it is imperative to develop advanced methods that support efficient use of EHR and further benefit doctors in their clinical decision making.
This dissertation focuses on automatically retrieving useful medical information, analyzing complex relationships of medical entities, and detecting future medical outcomes from EHR data. In order to retrieve information from EHR efficiently, we develop deep learning based algorithms that can automatically answer various clinical questions on structured and unstructured EHR data. These algorithms can help us understand more about the challenges in retrieving information from different data types in EHR. We also build a clinical knowledge graph based on EHR and link the distributed medical information and further perform the link prediction task, which allows us to analyze the complex underlying relationships of various medical entities. In addition, we propose a temporal multi-task survival analysis method to dynamically predict multiple medical events at the same time and identify the most important factors leading to the future medical events. By handling these unique challenges in EHR and developing suitable approaches, we hope to improve the efficiency of information retrieval and predictive modeling in healthcare.
|