About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
91

Communicating the Pixel: A Strategy for Guiding the Use of Remotely-Sensed Habitat Data in Coral Reef Management

Newman, Candace (28 August 2008)
Over the last decade, coral reef remote sensing research has focused on habitat map development. Advances in field methods, in the spatial and spectral resolution of remote sensing data, and in algorithm development have led to more detailed map categories and to heightened map accuracy. Studies have provided guidance for practitioners in areas such as imagery selection, algorithm application, and class selection methods, but the product has remained relatively unchanged – a habitat map showing the spatial distribution of a range of substrate classes, classified primarily on the basis of their spectral signature. However, the application of such a product in a management context has not been elaborated by the remote sensing community. The research described in this thesis addresses the challenges that arise when remotely-sensed coral reef information is applied in a coral reef management environment. In such an environment, the coral reef manager asks: "What can the map do to help me?", while the remote sensing scientist asks: "What type of information do you need?". The research described here aims to reconcile these two points of view by answering the research question of this thesis: how can coral reef remotely-sensed information address stakeholder-specific coral reef management objectives? This question was answered through the development of a four-stage strategy. The strategy includes: 1) developing a traditional habitat map, 2) investigating stakeholder receptivity to the habitat map, 3) linking stakeholder interests with habitat data, and 4) illustrating the linked habitat data in what we term a management map. The strategy was applied on Bunaken Island, Indonesia, and involved the collection of both qualitative and quantitative data sets. The research was relevant to the communities on Bunaken Island, as they are directly responsible for the management of the coral reef resources surrounding the island and regularly plan and implement coral reef management projects. The effectiveness of the four-stage strategy was evaluated in a framework that compares potential and actual uses of habitat maps and management maps in coral reef management projects. It was shown that management maps are superior to habitat maps for a wide range of management purposes. This research has provided two main contributions to the field of coral reef remote sensing and management: the first is the four-stage strategy that results in the development of management maps, and the second is the framework for evaluating the effectiveness of the management maps. This research seeks to traverse the gap between producers and users of coral reef remotely-sensed information. The recommendations made from this research address coral reef management procedures, action research, and cross-cultural communication. Each recommendation is founded on collaboration between scientist and manager; such collaboration is crucial for the successful application of remotely-sensed information to management.
93

Data Integration Over Horizontally Partitioned Databases In Service-oriented Data Grids

Sonmez Sunercan, Hatice Kevser (01 September 2010)
Information integration over distributed and heterogeneous resources has been challenging in many respects: coping with various kinds of heterogeneity, including data model, platform, and access interfaces; coping with various forms of data distribution and maintenance policies; scalability; performance; security and trust; reliability and resilience; legal issues; and so on. Each of these dimensions deserves a separate thread of research effort. The challenge most relevant to the work presented in this thesis is coping with various forms of data distribution and maintenance policies. This thesis aims to provide a service-oriented data integration solution over data Grids for cases where distributed data sources are partitioned with overlapping sections of various proportions. This is an interesting variation which combines both replicated and partitioned data within the same data management framework. Thus, the data management infrastructure has to deal with specific challenges regarding the identification, access and aggregation of partitioned data with varying proportions of overlapping sections. To provide a solution, we have extended OGSA-DAI DQP, a well-known service-oriented data access and integration middleware with distributed query processing facilities, by incorporating a UnionPartitions operator into its algebra in order to cope with various unusual forms of horizontally partitioned databases. As a result, our solution extends OGSA-DAI DQP in two respects: (1) a new operator type is added to the algebra to perform a specialized union of the partitions with different characteristics; (2) the OGSA-DAI DQP Federation Description is extended with additional metadata to facilitate the successful execution of the newly introduced operator.
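To illustrate the kind of specialized union the abstract describes, the following minimal sketch performs a duplicate-eliminating merge over horizontal partitions whose sections overlap. It is a generic illustration, not the actual OGSA-DAI DQP UnionPartitions operator: the record layout, the single primary-key attribute, and the in-memory representation are assumptions made only for this example.

```python
# Sketch of a duplicate-eliminating union over overlapping horizontal
# partitions. Generic illustration only; not the OGSA-DAI DQP operator.

from typing import Dict, Iterable, List


def union_partitions(partitions: Iterable[Iterable[Dict]], key: str) -> List[Dict]:
    """Merge horizontally partitioned record sets, collapsing overlaps.

    A record that appears in more than one partition (same value for `key`)
    is emitted only once.
    """
    seen = set()
    merged: List[Dict] = []
    for partition in partitions:
        for record in partition:
            k = record[key]
            if k not in seen:          # overlap detected via the primary key
                seen.add(k)
                merged.append(record)
    return merged


if __name__ == "__main__":
    # Two partitions with an overlapping section (id 2 appears in both).
    p1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    p2 = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
    print(union_partitions([p1, p2], key="id"))   # -> ids 1, 2, 3
```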
94

Adaptive windows for duplicate detection

Draisbach, Uwe; Naumann, Felix; Szott, Sascha; Wonneberg, Oliver (January 2012)
Duplicate detection is the task of identifying all groups of records within a data set that represent the same real-world entity. This task is difficult because (i) representations of the same entity might differ slightly, so some similarity measure must be defined to compare pairs of records, and (ii) data sets might be so large that a pair-wise comparison of all records is infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare record pairs only within each partition. One well-known approach is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data, comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition behind such adaptive windows is that there might be regions of high similarity suggesting a larger window size and regions of lower similarity suggesting a smaller window size. We propose and thoroughly evaluate several adaptation strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons).
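The following sketch illustrates a Sorted Neighborhood Method whose window size varies with the data. The adaptation rule used here (widen the window while a duplicate is still found at its far edge, otherwise fall back to the minimum size) is a simplified assumption for illustration and is not necessarily one of the strategies evaluated in the paper; the sort key and the similarity function are likewise placeholders.

```python
# Adaptive Sorted Neighborhood sketch: sort by a key, slide a window,
# and grow/shrink the window depending on where duplicates are found.

from typing import Callable, List, Set, Tuple


def adaptive_snm(
    records: List[str],
    key: Callable[[str], str],
    similar: Callable[[str, str], bool],
    w_min: int = 3,
    w_max: int = 10,
) -> Set[Tuple[int, int]]:
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    duplicates: Set[Tuple[int, int]] = set()
    w = w_min
    for pos, i in enumerate(order):
        hit_at_edge = False
        for offset in range(1, w):
            if pos + offset >= len(order):
                break
            j = order[pos + offset]
            if similar(records[i], records[j]):
                duplicates.add(tuple(sorted((i, j))))
                if offset == w - 1:
                    hit_at_edge = True
        # adapt: widen in regions of high similarity, shrink otherwise
        w = min(w + 1, w_max) if hit_at_edge else w_min
    return duplicates


if __name__ == "__main__":
    data = ["Jon Smith", "John Smith", "John Smyth", "Mary Jones", "M. Jones"]
    sim = lambda a, b: sum(x == y for x, y in zip(a, b)) / max(len(a), len(b)) > 0.6
    print(adaptive_snm(data, key=lambda r: r.lower(), similar=sim))
```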
95

Integration of dynamic data into reservoir description using streamline approaches

He, Zhong (15 November 2004)
Integration of dynamic data is critical for reliable reservoir description and has been an outstanding challenge for the petroleum industry. This work develops practical dynamic data integration techniques using streamline approaches to condition static geological models to various kinds of dynamic data, including two-phase production history, interference pressure observations and primary production data. The proposed techniques are computationally efficient and robust, and thus well-suited for large-scale field applications. We can account for realistic field conditions, such as gravity, and for changing field conditions arising from infill drilling, pattern conversion, recompletion, etc., during the integration of two-phase production data. Our approach is fast and exhibits rapid convergence even when the initial model is far from the solution. The power and practical applicability of the proposed techniques are demonstrated with a variety of field examples. To integrate two-phase production data, a travel-time inversion analogous to seismic inversion is adopted. We extend the method via a 'generalized travel-time' inversion to ensure matching of the entire production response rather than just a single time point, while retaining most of the quasi-linear property of travel-time inversion. To integrate the interference pressure data, we propose an alternating procedure of travel-time inversion and peak amplitude inversion or pressure inversion to improve the overall matching of the pressure response. A key component of the proposed techniques is the efficient computation of the sensitivities of dynamic responses with respect to reservoir parameters. These sensitivities are calculated analytically using a single forward simulation. Thus, our methods can be orders of magnitude faster than finite-difference based numerical approaches that require multiple forward simulations. The streamline approach has also been extended to identify reservoir compartmentalization and flow barriers using primary production data in conjunction with decline type-curve analysis. The streamline 'diffusive' time of flight provides an effective way to calculate the drainage volume in 3D heterogeneous reservoirs. Flow barriers and reservoir compartmentalization are inferred by matching the drainage volumes obtained from the streamline-based calculation and from decline type-curve analysis. The proposed approach is well-suited for application in the early stages of field development with limited well data and has been illustrated using a field example from the Gulf of Mexico.
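The sketch below illustrates only the data-matching idea behind travel-time and generalized travel-time inversion: the simulated production response is compared with the observed one either at a single arrival time or, for the generalized form, through the shift that best aligns the whole curves. The synthetic water-cut curves, the 0.5 breakthrough threshold, and the cross-correlation formulation are illustrative assumptions; the analytic streamline sensitivities and the inversion loop described in the abstract are omitted.

```python
# Illustrative sketch only: travel-time style matching of production responses.
# Streamline-based sensitivities and the actual inversion are not shown.

import numpy as np


def arrival_time(t, response, threshold=0.5):
    """Time at which the response first reaches the threshold
    (e.g. water breakthrough), via linear interpolation."""
    idx = int(np.argmax(response >= threshold))
    if idx == 0:
        return float(t[0])
    t0, t1 = t[idx - 1], t[idx]
    r0, r1 = response[idx - 1], response[idx]
    return float(t0 + (threshold - r0) * (t1 - t0) / (r1 - r0))


def best_time_shift(t, observed, simulated):
    """Shift (in time units) maximizing the cross-correlation of the
    mean-removed responses -- a 'whole curve' style match."""
    dt = t[1] - t[0]
    lags = np.arange(-len(t) + 1, len(t)) * dt
    xcorr = np.correlate(observed - observed.mean(),
                         simulated - simulated.mean(), mode="full")
    return float(lags[np.argmax(xcorr)])


if __name__ == "__main__":
    t = np.linspace(0.0, 1000.0, 501)                  # days
    obs = 1.0 / (1.0 + np.exp(-(t - 400.0) / 40.0))    # observed water cut
    sim = 1.0 / (1.0 + np.exp(-(t - 300.0) / 40.0))    # simulated (too early)
    print("breakthrough mismatch:", arrival_time(t, obs) - arrival_time(t, sim))
    print("best whole-curve shift:", best_time_shift(t, obs, sim))
```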
96

XAP Integration

Zhu, Mingjie; Liu, Qinghua (January 2006)
This bachelor thesis presents the XAP tool integration project. Apart from presenting a survey of integration techniques that covers integration models and CASE tool models, we conduct a comparison of these models and reason about their applicability in the XAP setting. We then apply this survey to the XAP tool integration project: integrating three tools into one IDE at the data level. In this IDE, the user can create a new project and use the three tools freely within the newly created project; the database among them is shared.
97

UAB „GNT Lietuva" duomenų integravimo posistemio reinžinerija / Reengineering of data integration subsystem in JSC "GNT Lietuva"

Kungytė, Indrė (13 August 2010)
An analysis of data integration processes, methods, and DI technologies and tools was carried out to find the option best suited to the company. The MS SQL Server 2005 SSIS service, Oracle, and the MS SQL Server 2000 DTS function then in use at the company were examined. Data integration tools are based on the ETL approach: processes that allow companies to move data from various sources, change its format, and load it into other databases, data centers, analysis repositories, or other operational systems to support business processes. The data integration subsystem of the GNT company was reengineered using the newly selected technology. Reengineering is the process of analyzing and modifying an existing system, performed when the system needs to be restructured. The data integration process for product information, originally built with DTS, was selected, improved, and migrated to the new technology. An analysis of the data integration processes operating in the company revealed differing levels of data detail and the possibility of grouping the data into categories. This finding was generalized as a pattern, named the "Common Data Separation Pattern". An experiment showed that the process runs faster when the pattern is applied, and the pattern has been successfully reused when reengineering other data integration processes. / The research area of this thesis covers various data extraction, transfer and integration methods and technologies; the main object of the research is the process of transferring data from remote subsidiaries and integrating it into the one central database currently active in JSC "GNT Lietuva". The goal of this research is to move data integration (DI) processes into a new technological environment and upgrade them without interrupting the active daily DI process, ultimately creating a flexible data integration model (pattern) which could be reused in the future. The following tasks were carried out in order to achieve this goal: analysis of reengineering and data integration principles as well as new integration technologies; investigating their adaptability to the current DI processes and their improvement; implementing integration solutions and experimenting to verify the efficiency of the new DI processes; and, finally, construction of a flexible integration solution. The final solution was formalized as a data integration pattern. Conclusions drawn from the experiment carried out in JSC "GNT Lietuva" indicate that practical application of the pattern reduced the overall duration of the DI process by 45.4%, whilst the additional application of the SSIS technology resulted in a duration decrease of 81.99%. The data integration process became more flexible, and new data sources can now be easily incorporated.
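As a minimal illustration of the kind of ETL step the abstract refers to, the sketch below extracts product rows from two notional subsidiary feeds, transforms them into one format, and loads them into a single central table. The source layouts, the decimal-comma and date fixes, and the SQLite target are assumptions made only for this example; they do not reproduce the actual GNT DTS or SSIS packages.

```python
# Minimal ETL sketch: extract -> transform (normalize formats) -> load.
# Illustrative assumptions only; not the actual GNT DTS/SSIS processes.

import sqlite3

# "extract": rows as delivered by two remote subsidiaries
source_a = [("P-001", "19,99", "2010-08-01"), ("P-002", "5,50", "2010-08-02")]
source_b = [("P-003", "7.25", "02/08/2010"), ("P-004", "12.00", "03/08/2010")]


def transform(row):
    """Normalize one product row: decimal comma -> point, dd/mm/yyyy -> ISO."""
    product_id, price, date = row
    price = float(price.replace(",", "."))
    if "/" in date:
        day, month, year = date.split("/")
        date = f"{year}-{month}-{day}"
    return product_id, price, date


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT PRIMARY KEY, price REAL, updated TEXT)")

# "load": insert the normalized rows into the central database
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [transform(r) for r in source_a + source_b])

print(conn.execute("SELECT * FROM products ORDER BY id").fetchall())
```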
98

A Data Mining Framework for Automatic Online Customer Lead Generation

Rahman, Md Abdur (23 March 2012)
Customer lead generation is a crucial and challenging task for online real estate service providers. The business model of an online real estate service differs from typical B2B or B2C e-commerce because the service acts as a broker between real estate companies and potential home buyers. Currently, there is no suitable automatic customer lead generation system available for online real estate service providers. This thesis aims to develop a systematic solution framework for automatic customer lead generation for online real estate service providers. The framework includes data modeling, data integration from multiple online web data streams, and data mining and system evaluation for lead pattern discovery and lead prediction. Extensive experiments were conducted based on a case study. The results demonstrate that the proposed approach enables online real estate service providers to analyze lead data and to automatically generate targeted customer leads.
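As a hedged sketch of the lead-prediction step described above, the example below scores visitors as potential leads from a few behavioral features using logistic regression. The features, the synthetic training data, and the choice of classifier are assumptions for illustration only; the thesis defines its own data model over multiple online web data streams and evaluates its framework on a real case study.

```python
# Illustrative lead-scoring sketch with synthetic data; not the thesis framework.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# assumed features per visitor: [listings viewed, searches run, contact-form visits]
n = 500
X = rng.poisson(lam=[3.0, 2.0, 0.3], size=(n, 3)).astype(float)

# synthetic ground truth: more engagement -> more likely to become a lead
logits = 0.4 * X[:, 0] + 0.3 * X[:, 1] + 2.0 * X[:, 2] - 3.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, y)

new_visitors = np.array([[1, 0, 0], [8, 5, 2]], dtype=float)
print(model.predict_proba(new_visitors)[:, 1])  # estimated lead probability per visitor
```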
99

Identifying protein complexes and disease genes from biomolecular networks

November 2014
With advances in high-throughput measurement techniques, large-scale biological data, such as protein-protein interaction (PPI) data, gene expression data, gene-disease association data, cellular pathway data, and so on, have been and will continue to be produced. These data contain insightful information for understanding the mechanisms of biological systems and have proved useful for developing new methods in disease diagnosis, disease treatment and drug design. This study focuses on two main research topics: (1) identifying protein complexes and (2) identifying disease genes from biomolecular networks. Firstly, protein complexes are groups of proteins that interact with each other at the same time and place within living cells. They are molecular entities that carry out cellular processes. The identification of protein complexes plays a primary role in understanding the organization of proteins and the mechanisms of biological systems. Many previous algorithms are designed based on the assumption that protein complexes are densely connected sub-graphs in PPI networks. In this research, a dense sub-graph detection algorithm is first developed following this assumption by using clique seeds and graph entropy. Although the proposed algorithm generates a large number of reasonable predictions and its F-score is better than that of many previous algorithms, it still cannot identify many known protein complexes. We then analyze the characteristics of known yeast protein complexes and find that not all of them exhibit dense structures in PPI networks. Many have a star-like structure, a special case of the core-attachment structure that cannot be identified by many previous core-attachment-structure-based algorithms. To increase the prediction accuracy of protein complex identification, a multiple-topological-structure-based algorithm is proposed to identify protein complexes from PPI networks. Four single-topological-structure-based algorithms are first employed to detect raw predictions with clique, dense, core-attachment and star-like structures, respectively. A merging and trimming step is then adopted to generate final predictions based on topological information or GO annotations of the predictions. A comprehensive review of the identification of protein complexes, from static PPI networks to dynamic PPI networks, is also given in this study.

Secondly, genetic diseases often involve the dysfunction of multiple genes. Various types of evidence have shown that similar disease genes tend to lie close to one another in various biomolecular networks. The identification of disease genes via multiple data integration is indispensable for understanding the genetic mechanisms of many genetic diseases. However, the number of known disease genes related to similar genetic diseases is often small, and it is not easy to capture the intricate gene-disease associations from such a small number of known samples. Moreover, different kinds of biological data are heterogeneous, and no widely accepted criterion is available to standardize them to the same scale. In this study, a flexible and reliable multiple data integration algorithm is first proposed to identify disease genes based on the theory of Markov random fields (MRF) and the method of Bayesian analysis. A novel global-characteristic-based parameter estimation method and an improved Gibbs sampling strategy are introduced, so that the proposed algorithm can tune the parameters of different data sources automatically. However, the Markov property of the proposed algorithm means that it only considers information from direct neighbors to formulate the relationships among genes, ignoring the contribution of indirect neighbors in biomolecular networks. To overcome this drawback, a kernel-based MRF algorithm is further proposed to take advantage of the global characteristics of biological data via graph kernels. The kernel-based MRF algorithm generates better predictions than many previous disease gene identification algorithms in terms of the area under the receiver operating characteristic curve (AUC score). However, it is very time-consuming, since the Gibbs sampling process of the algorithm has to maintain a long Markov chain for every single gene. Finally, to reduce the computational time of the MRF-based algorithm, a fast, high-performance logistic-regression-based algorithm is developed for identifying disease genes from biomolecular networks. Numerical experiments show that the proposed algorithm outperforms many existing methods in terms of both AUC score and running time. To summarize, this study has developed several computational algorithms for identifying protein complexes and disease genes from biomolecular networks; these algorithms outperform many existing algorithms in the literature.
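To make the MRF idea concrete, the sketch below runs Gibbs sampling on a toy auto-logistic Markov random field over a small gene network: known disease genes are clamped to label 1, each unlabeled gene is resampled conditioned only on its direct neighbors (the Markov property discussed above), and the posterior frequency of label 1 ranks the candidates. The toy network, the fixed parameters theta0 and theta1, and the chain lengths are illustrative assumptions; the thesis estimates the parameters from data and integrates multiple data sources.

```python
# Toy auto-logistic MRF over a gene network, sampled with Gibbs sampling.
# Parameters and network are assumptions for illustration only.

import math
import random


def gibbs_disease_genes(adj, known, theta0=-2.0, theta1=1.2,
                        burn_in=500, samples=2000, seed=0):
    rng = random.Random(seed)
    genes = list(adj)
    labels = {g: (1 if g in known else rng.randint(0, 1)) for g in genes}
    counts = {g: 0 for g in genes}
    for it in range(burn_in + samples):
        for g in genes:
            if g in known:
                continue                      # clamp known disease genes to 1
            # conditional depends only on direct neighbors (Markov property)
            s = sum(labels[n] for n in adj[g])
            p1 = 1.0 / (1.0 + math.exp(-(theta0 + theta1 * s)))
            labels[g] = 1 if rng.random() < p1 else 0
        if it >= burn_in:
            for g in genes:
                counts[g] += labels[g]
    # posterior frequency of label 1 for each candidate gene
    return {g: counts[g] / samples for g in genes if g not in known}


if __name__ == "__main__":
    # tiny toy PPI network: B and C sit next to the known disease gene A
    adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
    print(gibbs_disease_genes(adj, known={"A"}))
```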
100

Record Linkage for Web Data

Hassanzadeh, Oktie (15 August 2013)
Record linkage refers to the task of finding and linking records (in a single database or in a set of data sources) that refer to the same entity. Automating the record linkage process is a challenging problem and has been the topic of extensive research for many years. However, the changing nature of the linkage process and the growing size of data sources create new challenges for this task. This thesis studies the record linkage problem for Web data sources. Our hypothesis is that a generic and extensible set of linkage algorithms, combined within an easy-to-use framework that allows these algorithms to be tailored and combined, can be used to effectively link large collections of Web data from different domains. To this end, we first present a framework for record linkage over relational data, motivated by the fact that many Web data sources are powered by relational database engines. This framework is based on a declarative specification of the linkage requirements by the user and allows linking records in many real-world scenarios. We present algorithms for translating these requirements into queries that can run over a relational data source, potentially using a semantic knowledge base to enhance the accuracy of link discovery. Effective specification of requirements for linking records across multiple data sources requires understanding the schema of each source, identifying attributes that can be used for linkage, and finding their corresponding attributes in other sources. Schema or attribute matching is often done with the goal of aligning schemas, so attributes are matched if they play semantically related roles in their schemas. In contrast, we seek to find attributes that can be used to link records between data sources, which we refer to as linkage points. In this thesis, we define the notion of linkage points and present the first linkage point discovery algorithms. We then address the novel problem of how to publish Web data in a way that facilitates record linkage. We hypothesize that careful use of existing, curated Web sources (their data and structure) can guide the creation of conceptual models for semi-structured Web data that in turn facilitate record linkage with these curated sources. Our solution is an end-to-end framework for data transformation and publication, which includes novel algorithms for identifying entity types and their relationships in semi-structured Web data. A highlight of this thesis is showcasing the application of the proposed algorithms and frameworks in real applications and publishing the results as high-quality data sources on the Web.
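As a small illustration of the linkage-point idea, the sketch below ranks attribute pairs from two sources by the overlap of their normalized value sets, so attributes that are useful for linking records (rather than merely schema-related ones) surface first. The toy records, the normalization, and the Jaccard threshold are assumptions made for the example; the thesis presents dedicated linkage point discovery algorithms for large semi-structured Web sources.

```python
# Toy linkage-point discovery: rank attribute pairs by value-set overlap.

from itertools import product
from typing import Dict, List


def normalize(value) -> str:
    return str(value).strip().lower()


def linkage_points(src1: List[Dict], src2: List[Dict], min_jaccard: float = 0.3):
    attrs1 = {a for r in src1 for a in r}
    attrs2 = {a for r in src2 for a in r}
    candidates = []
    for a1, a2 in product(attrs1, attrs2):
        v1 = {normalize(r[a1]) for r in src1 if a1 in r}
        v2 = {normalize(r[a2]) for r in src2 if a2 in r}
        if not v1 or not v2:
            continue
        jaccard = len(v1 & v2) / len(v1 | v2)   # overlap of the value sets
        if jaccard >= min_jaccard:
            candidates.append((a1, a2, round(jaccard, 2)))
    return sorted(candidates, key=lambda c: -c[2])


if __name__ == "__main__":
    movies = [{"title": "Heat", "year": 1995},
              {"title": "Alien", "year": 1979},
              {"title": "Brazil", "year": 1985}]
    films = [{"name": "heat", "released": "1995-12-15"},
             {"name": "Alien", "released": "1979-05-25"},
             {"name": "Blade Runner", "released": "1982-06-25"}]
    # ("title", "name") surfaces as the linkage point; "year"/"released"
    # do not align because their value formats differ.
    print(linkage_points(movies, films))
```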
