141 |
Schema Matching and Data Extraction over HTML Tables. Tao, Cui. 16 September 2003.
Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem for the case of mostly structured data in the form of HTML tables, based on document-independent extraction ontologies. The solution entails elements of table location and table understanding, data integration, and wrapper creation. Table location and understanding allow us to locate the table of interest, recognize attributes and values, pair attributes with values, and form records. Data-integration techniques allow us to match source records with a target schema. Ontologically specified wrappers allow us to extract data from source records into a target schema. Experimental results show that we can successfully map data of interest from source HTML tables with unknown structure to a given target database schema. We can thus "directly" query source data with unknown structure through a known target schema.
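To make the record-mapping step concrete, here is a minimal Python sketch of matching attribute-value pairs recognized in a source HTML table against a known target schema via synonym lists. The schema, synonym table, and field names are invented for illustration and are not the document-independent extraction ontologies developed in the thesis.

# Hypothetical mapping of source attributes to a target schema via synonym lists.
# All names below are illustrative assumptions, not the thesis's ontology format.
TARGET_SCHEMA = {
    "make":  {"make", "manufacturer", "brand"},
    "model": {"model"},
    "year":  {"year", "model year"},
    "price": {"price", "asking price", "cost"},
}

def map_record(source_record):
    """Map one extracted record (source attribute -> value) onto the target schema."""
    target = {}
    for src_attr, value in source_record.items():
        key = src_attr.strip().lower()
        for tgt_attr, synonyms in TARGET_SCHEMA.items():
            if key in synonyms:
                target[tgt_attr] = value
                break
    return target

# A row extracted from an HTML table whose column names were not known in advance.
row = {"Manufacturer": "Toyota", "Model": "Corolla", "Model Year": "2001", "Asking Price": "$4,500"}
print(map_record(row))  # {'make': 'Toyota', 'model': 'Corolla', 'year': '2001', 'price': '$4,500'}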
|
142 |
Putting it all together: Geophysical data integration. Kvamme, Kenneth L., Ernenwein, Eileen G., Menzer, Jeremy G. 01 January 2018.
The integration of information from multiple geophysical and other prospection surveys of archaeological sites and regions leads to a richer and more complete understanding of subsurface content, structure, and physical relationships. Such fusions of information occur within a single geophysical data set or between two or more geophysical and other prospection sources in one, two, or three dimensions. An absolute requirement is the accurate coregistration of all information to the same coordinate space. Data integrations occur at two levels. At the feature level, discrete objects that denote archaeological features are defined, usually subjectively, through the manual digitization of features interpreted in the data, although there is growing interest in automated feature identification and extraction. At the pixel level, distributional issues of skewness and outliers, high levels of noise that obfuscate targets of interest, and a lack of correlation between largely independent dimensions must be confronted. Nevertheless, successful fusions occur using computer graphic methods, simple arithmetic combinations, and advanced multivariate methods, including principal components analysis and supervised and unsupervised classifications. Four case studies are presented that illustrate some of these approaches and offer advancement into new domains.
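As one hedged illustration of the pixel-level fusion described above, the Python sketch below rank-transforms several coregistered rasters (to blunt skewness and outliers) and combines them with principal components analysis. The array names, the rank transform, and the synthetic data are assumptions made for the example, not the case-study workflows from the chapter.

# Illustrative pixel-level fusion of coregistered geophysical rasters
# (e.g., magnetometry, GPR amplitude, resistivity) on a common grid.
import numpy as np

def fuse_pca(rasters, n_components=1):
    """rasters: list of 2-D arrays on the same grid; returns fused component images."""
    h, w = rasters[0].shape
    X = np.column_stack([r.ravel() for r in rasters]).astype(float)
    # Rank-transform each survey to reduce the effect of skewness and outliers.
    X = np.argsort(np.argsort(X, axis=0), axis=0) / (X.shape[0] - 1)
    X = X - X.mean(axis=0)
    # Principal components via SVD of the pixel-by-sensor matrix.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    return [scores[:, k].reshape(h, w) for k in range(n_components)]

# Example with synthetic, coregistered 100 x 100 surveys.
rng = np.random.default_rng(0)
mag, gpr, res = (rng.normal(size=(100, 100)) for _ in range(3))
fused = fuse_pca([mag, gpr, res], n_components=1)[0]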
|
143 |
Datenintegration: die Echtzeit-Lagedarstellung aus der Leitstelle für Mittel der nichtpolizeilichen Gefahrenabwehr (Data integration: real-time situational display from the dispatch center for resources of non-police emergency services). Sachtler, Clamor. 02 April 2024.
This thesis describes the design and implementation of a digital real-time situational display for resources of non-police emergency services in the environment of the Integrierte Regionalleitstelle Leipzig (Integrated Regional Dispatch Center).
The goal is to design an automatic, self-populating solution for data integration and presentation that provides comprehensive, needs-oriented information without requiring direct input from dispatch center staff.
The data already available in the dispatch center is to be used to visualize, among other things, the utilization and availability of resources as well as operational hotspots, and to make this information available to the user groups as needed so that tactical decisions can be based on it.
A qualitative and quantitative survey was conducted to determine the needs and requirements. The results confirmed the fundamental need, with differing requirements and priorities among the various user groups. The thesis compares several solution approaches and presents the development of a prototype for a real-time situational display.
The applicability and limitations of the developed prototype were assessed through practical trials with direct feedback and a check of the correctness of the data. The evaluation showed that, even in its current form, the prototype delivers a very accurate situational picture and meets users' expectations.
Table of contents:
List of abbreviations
1 Overview
1.1 Objectives
1.2 Methodological approach
2 Analysis of the initial situation
2.1 The Integrierte Regionalleitstelle Leipzig
2.2 Technical architecture of the IRLS
2.3 Situational display
2.3.1 Situational display in the context of non-police emergency services
2.3.2 Situational display in the context of the dispatch center
2.3.3 Situational display in a broader context
3 Requirements
3.1 Determining the optimal information needs
3.2 Conducting a survey
3.2.1 Fire service
3.2.2 Emergency medical services
3.2.3 Civil protection
3.2.4 Summary of results
3.3 Comparison of solution approaches
3.3.1 Calling the dispatch center
3.3.2 Providing the computer-aided dispatch system to users
3.3.3 Visiting the dispatch center in person
3.3.4 LvS Display
3.3.5 MobiKat
4 Development of a prototype for real-time situational display
4.1 Solution approach
4.2 Implementation
4.2.1 Tools
4.2.2 Data and data sources
4.2.3 Approach
4.3 Data protection criteria
4.4 Plausibility
5 Evaluation of the prototype in the dispatch center and emergency response environment
5.1 Use in command staff work
5.2 Resource analysis
5.2.1 HLF
5.2.2 RTW
5.3 Conclusions
5.4 Edge cases
5.5 Critique of the model
6 Conclusion and outlook
List of figures
List of tables
List of code listings
Bibliography
Appendix
|
144 |
Bioinformatics Approaches to Heterogeneous Omic Data Integration. Guan, Xiaowei. 27 August 2012.
No description available.
|
145 |
Shoreline Mapping with Integrated HSI-DEM using Active Contour Method. Sukcharoenpong, Anuchit. 30 December 2014.
No description available.
|
146 |
Intelligent Data Mining on Large-scale Heterogeneous Datasets and its Application in Computational Biology. Wu, Chao. 10 October 2014.
No description available.
|
147 |
Knowledge Based Topology Discovery and Geo-localization. Shelke, Yuri Rajendra. 27 September 2010.
No description available.
|
148 |
SEEDEEP: A System for Exploring and Querying Deep Web Data Sources. Wang, Fan. 27 September 2010.
No description available.
|
149 |
Statistical Methods for Data Integration and Disease Classification. Islam, Mohammad. 11 1900.
Classifying individuals into binary disease categories can be challenging due to complex relationships across different exposures of interest. In this thesis, we investigate three different approaches for disease classification using multiple biomarkers. First, we consider combining information from literature reviews and the INTERHEART data set to identify the thresholds of ApoB, ApoA1, and the ratio of these two biomarkers for classifying individuals at risk of developing myocardial infarction. We develop a Bayesian estimation procedure for this purpose that utilizes the conditional probability distribution of these biomarkers. This method is more flexible than the standard logistic regression approach and allows us to identify a precise threshold for these biomarkers. Second, we consider the problem of disease classification using two dependent biomarkers. Independently identified thresholds usually lead to conflicting classifications for some individuals. We develop and describe a method for determining the joint threshold of two dependent biomarkers for disease classification, based on a joint probability distribution function constructed through copulas. This method allows researchers to classify individuals at risk of developing the disease without ambiguity. Third, we consider the problem of classifying an outcome using gene and miRNA expression data sets. Linear principal component analysis (PCA) is a widely used approach to reduce the dimension of such data sets before classification, but many authors suggest using kernel PCA instead. Using real and simulated data sets, we compare the two approaches and assess the performance of the resulting components for genetic data integration and outcome classification. We conclude that reducing dimensions using linear PCA followed by a logistic regression model for classification is acceptable for this purpose. We also observe that integrating information from multiple data sets using either approach leads to better outcome classification. / Thesis / Doctor of Philosophy (PhD)
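A minimal sketch of the third approach, assuming synthetic data and arbitrary component counts: each expression data set is reduced with linear PCA fitted on the training samples, the components are concatenated, and a logistic regression model classifies the outcome. The variable names and settings are illustrative, not the thesis's actual configuration.

# Linear PCA per data set, concatenated components, logistic regression classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 200
y = rng.integers(0, 2, size=n)                          # binary disease outcome
genes = rng.normal(size=(n, 5000)) + y[:, None] * 0.2   # synthetic gene expression
mirna = rng.normal(size=(n, 300)) + y[:, None] * 0.1    # synthetic miRNA expression

# Split first, then fit PCA on the training samples only to avoid leakage.
idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)
pca_g = PCA(n_components=10).fit(genes[idx_tr])
pca_m = PCA(n_components=5).fit(mirna[idx_tr])
Z_tr = np.hstack([pca_g.transform(genes[idx_tr]), pca_m.transform(mirna[idx_tr])])
Z_te = np.hstack([pca_g.transform(genes[idx_te]), pca_m.transform(mirna[idx_te])])

clf = LogisticRegression(max_iter=1000).fit(Z_tr, y[idx_tr])
print("AUC:", roc_auc_score(y[idx_te], clf.predict_proba(Z_te)[:, 1]))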
|
150 |
Module-based Analysis of Biological Data for Network Inference and Biomarker Discovery. Zhang, Yuji. 25 August 2010.
Systems biology comprises the global, integrated analysis of large-scale data encoding different levels of biological information, with the aim of obtaining global insight into cellular networks. Several studies have unveiled the modular and hierarchical organization inherent in these networks. In this dissertation, we propose and develop innovative systems approaches to integrate multi-source biological data in a modular manner for network inference and biomarker discovery in complex diseases such as breast cancer.
The first part of the dissertation focuses on gene module identification in gene expression data. Clustering algorithms are the most popular way to identify gene modules and have been widely applied to gene expression data. To evaluate clustering algorithms from a biological point of view, we propose a figure of merit based on the Kullback-Leibler divergence between cluster membership and known Gene Ontology attributes. Several benchmark expression-based gene clustering algorithms are compared using the proposed method with different parameter settings. Applications to diverse public time-course gene expression data demonstrate that fuzzy c-means clustering is superior to the other clustering methods with regard to the enrichment of clusters for biological functions. These results contribute to the evaluation of clustering outcomes and the estimation of optimal clustering partitions.
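One plausible form of such a figure of merit is sketched below: for each cluster, the distribution of Gene Ontology categories among its members is compared with the genome-wide background distribution by Kullback-Leibler divergence, and the divergences are averaged with cluster-size weights. This is an assumed reading for illustration, not the dissertation's exact definition.

# Cluster-size-weighted average of D_KL(cluster GO distribution || background).
import numpy as np

def clustering_figure_of_merit(labels, go_terms, eps=1e-9):
    """labels: cluster label per gene; go_terms: one GO category per gene."""
    labels, go_terms = np.asarray(labels), np.asarray(go_terms)
    categories = np.unique(go_terms)
    background = np.array([(go_terms == c).mean() for c in categories]) + eps
    fom, n = 0.0, len(labels)
    for k in np.unique(labels):
        members = go_terms[labels == k]
        p = np.array([(members == c).mean() for c in categories]) + eps
        kl = np.sum(p * np.log(p / background))
        fom += (len(members) / n) * kl
    return fom  # larger values indicate clusters more enriched for specific GO terms

# Toy example: two clusters that separate two functional categories fairly well.
labels   = [0, 0, 0, 0, 1, 1, 1, 1]
go_terms = ["cell cycle", "cell cycle", "cell cycle", "metabolism",
            "metabolism", "metabolism", "metabolism", "cell cycle"]
print(clustering_figure_of_merit(labels, go_terms))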
The second part of the dissertation presents a hybrid computational intelligence method to infer gene regulatory modules. We explore the combined advantages of the nonlinear and dynamic properties of neural networks and the global search capabilities of a hybrid genetic algorithm and particle swarm optimization method to infer network interactions at the modular level.
The proposed computational framework is tested on two biological processes: the yeast cell cycle and the human HeLa cancer cell cycle. The identified gene regulatory modules were evaluated using several validation strategies: (1) gene set enrichment analysis to evaluate the gene modules derived from clustering results; (2) binding site enrichment analysis to determine enrichment of the gene modules for the cognate binding sites of their predicted transcription factors; (3) comparison with previously reported results in the literature to confirm the inferred regulations.
The proposed framework could help biologists predict the components of the gene regulatory modules in which any candidate gene is involved. Such predictions can then be used to design a more streamlined experimental approach for biological validation. Understanding the dynamics of these gene regulatory modules will shed light on the related regulatory processes.

Driven by the fact that complex diseases such as cancer are "diseases of pathways", we extended the module concept to biomarker discovery in cancer research. In the third part of the dissertation, we explore the combined advantages of molecular interaction networks and gene expression profiles to identify biomarkers in cancer research. The reliability of conventional gene biomarkers has been challenged by the biological heterogeneity and noise within and across patients. We present a module-based biomarker discovery approach that integrates interaction network topology and high-throughput gene expression data to identify markers not as individual genes but as modules. To select biomarker sets that are reliable across different studies, a hybrid method combining group feature selection with ensemble feature selection is proposed. First, a group feature selection method is used to extract the modules (subnetworks) with discriminative power between disease groups. Then, an ensemble feature selection method is used to select the optimal biomarker sets, in which a double-validation strategy is applied. The ensemble method combines features selected from multiple classifications with varied data subsampling to increase the reliability and classification accuracy of the final selected biomarker set. Results from four breast cancer studies demonstrate the superiority of the module biomarkers identified by the proposed approach: they achieve higher accuracy and are more reliable across data sets with the same clinical design.

Based on these experimental results, we believe that the proposed systems approaches provide meaningful solutions for discovering cellular regulatory processes and improving the understanding of disease mechanisms. These computational approaches were developed primarily for the analysis of high-throughput genomic data; nevertheless, they can also be extended to high-throughput proteomics and metabolomics data. / Ph. D.
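To illustrate the module-level, subsampling-based selection idea in a hedged way, the sketch below scores each candidate module (a set of member genes) by its mean expression, ranks modules by a two-sample t-statistic on repeated random subsamples, and keeps the modules selected most often. The module definitions, activity score, and thresholds are assumptions for the example, not the dissertation's exact procedure.

# Frequency-based selection of discriminative modules over random subsamples.
import numpy as np
from scipy import stats

def select_modules(expr, y, modules, n_rounds=50, top_k=3, seed=0):
    """expr: samples x genes; y: binary labels; modules: dict name -> gene indices."""
    rng = np.random.default_rng(seed)
    counts = {name: 0 for name in modules}
    n = expr.shape[0]
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(0.8 * n), replace=False)   # subsample the samples
        scores = {}
        for name, genes in modules.items():
            activity = expr[np.ix_(idx, genes)].mean(axis=1)     # module activity score
            t, _ = stats.ttest_ind(activity[y[idx] == 0], activity[y[idx] == 1])
            scores[name] = abs(t)
        for name in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            counts[name] += 1
    return sorted(counts, key=counts.get, reverse=True)          # most stable modules first

# Toy data: 60 samples, 100 genes; module "m0" carries the group difference.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
expr = rng.normal(size=(60, 100))
expr[y == 1, :10] += 1.0
modules = {"m0": list(range(10)), "m1": list(range(10, 20)), "m2": list(range(20, 30))}
print(select_modules(expr, y, modules))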
|