About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
441

Automatic lexicon acquisition from encyclopedia.

January 2007
Lo, Ka Kan.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 97-104).
Abstracts in English and Chinese.

Contents:
Chapter 1: Introduction (p.1)
  1.1 Motivation (p.3)
  1.2 New paradigm in language learning (p.5)
  1.3 Semantic Relations (p.7)
  1.4 Contribution of this thesis (p.9)
Chapter 2: Related Work (p.13)
  2.1 Theoretical Linguistics (p.13)
    2.1.1 Overview (p.13)
    2.1.2 Analysis (p.15)
  2.2 Computational Linguistics - General Learning (p.17)
  2.3 Computational Linguistics - HPSG Lexical Acquisition (p.20)
  2.4 Learning approach (p.22)
Chapter 3: Background (p.25)
  3.1 Modeling primitives (p.26)
    3.1.1 Feature Structure (p.26)
    3.1.2 Word (p.28)
    3.1.3 Phrase (p.35)
    3.1.4 Clause (p.36)
  3.2 Wikipedia Resource (p.38)
    3.2.1 Encyclopedia Text (p.40)
  3.3 Semantic Relations (p.40)
Chapter 4: Learning Framework - Syntactic and Semantic (p.46)
  4.1 Type feature scoring function (p.48)
  4.2 Confidence score of lexical entry (p.50)
  4.3 Specialization and Generalization (p.52)
    4.3.1 Further Processing (p.54)
    4.3.2 Algorithm Outline (p.54)
    4.3.3 Algorithm Analysis (p.55)
  4.4 Semantic Information (p.57)
    4.4.1 Extraction (p.58)
    4.4.2 Induction (p.60)
    4.4.3 Generalization (p.63)
  4.5 Extension with new text documents (p.65)
  4.6 Integrating the syntactic and semantic acquisition framework (p.65)
Chapter 5: Evaluation (p.68)
  5.1 Evaluation Metric - English Resource Grammar (p.68)
    5.1.1 English Resource Grammar (p.69)
  5.2 Experiments (p.71)
    5.2.1 Tasks (p.71)
    5.2.2 Evaluation Measures (p.77)
    5.2.3 Methodologies (p.78)
    5.2.4 Corpus Preparation (p.79)
    5.2.5 Results (p.81)
  5.3 Result Analysis (p.85)
Chapter 6: Conclusions (p.95)
Bibliography (p.97)
442

Privacy preserving data publishing: an expected gain model with negative association immunity. / CUHK electronic theses & dissertations collection

January 2012
Privacy preserving is an important issue in many applications, especially applications that involve humans. In privacy preserving data publishing (PPDP), we study how to publish a database that contains data records of individuals, so that the privacy of those individuals is preserved while the published database still contains useful information for research or data analysis.

This thesis focuses on privacy models and algorithms in PPDP. We first propose an expected gain model to define whether privacy is preserved when publishing a database. The expected gain model satisfies the six axioms for quantifying private information proposed in this thesis, where the sixth axiom considers human factors from the viewpoint of social psychology. In addition, it considers the amount of advantage gained by an adversary by exploiting the private information deduced from a published database. Hence, the model reflects the reality that the adversary uses such an advantage to earn a profit, which is not considered by other existing privacy models. Then, we propose an algorithm to generate published databases that satisfy the expected gain model. Experiments on real datasets are conducted to show that the proposed algorithm is feasible for real applications. After that, we propose a value suppression framework to make published databases immune to negative association, which is a kind of background/foreground knowledge attack. Experiments are conducted to show that negative association immunity can be achieved by suppressing only a few percent of sensitive values on average. Finally, we investigate PPDP in a non-centralized environment, in which two or more data holders generate their own different but related published databases. We propose a non-centralized distinct l-diversity requirement as the privacy model and an algorithm to generate published databases for this requirement. Experiments are conducted to show that the proposed algorithm is feasible for real applications.

Cheong, Chi Hong.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 186-193).
Electronic reproduction. Hong Kong: Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.

Contents:
Abstract (p.i)
Acknowledgement (p.iv)
Chapter 1: Introduction (p.1)
  1.1 Background (p.1)
  1.2 Thesis Contributions and Organization (p.2)
  1.3 Other Related Areas (p.5)
    1.3.1 Privacy Preserving Data Mining (p.5)
    1.3.2 Partition-Based Approach vs. Differential Privacy Approach (p.5)
Chapter 2: Expected Gain Model (p.7)
  2.1 Introduction (p.8)
    2.1.1 Background and Motivation (p.8)
    2.1.2 Contributions (p.11)
  2.2 Table Models (p.12)
    2.2.1 Private Table (p.12)
    2.2.2 Published Table (p.13)
  2.3 Private Information Model (p.14)
    2.3.1 Proposition (p.14)
    2.3.2 Private Information and Private Probability (p.15)
    2.3.3 Public Information and Public Probability (p.18)
    2.3.4 Axioms in Quantifying Private Information (p.20)
  2.4 Quantifying Private Information (p.34)
    2.4.1 Expected Gain of a Fair Guessing Game (p.34)
    2.4.2 Analysis (p.41)
  2.5 Tuning the Importance of Opposite Information (p.48)
  2.6 Conclusions (p.53)
Chapter 3: Generalized Expected Gain Model (p.56)
  3.1 Introduction (p.58)
  3.2 Table Models (p.60)
    3.2.1 Private Table (p.62)
    3.2.2 Published Table (p.62)
  3.3 Expected Gain Model (p.63)
    3.3.1 Random Variable and Probability Distribution (p.64)
    3.3.2 Public Information (p.64)
    3.3.3 Private Information (p.65)
    3.3.4 Expected Gain Model (p.66)
  3.4 Generalization Algorithm (p.75)
    3.4.1 Generalization Property and Subset Property (p.75)
    3.4.2 Modified Version of Incognito (p.78)
  3.5 Related Work (p.80)
    3.5.1 k-Anonymity (p.80)
    3.5.2 l-Diversity (p.81)
    3.5.3 Confidence Bounding (p.83)
    3.5.4 t-Closeness (p.84)
  3.6 Experiments (p.85)
    3.6.1 Experiment Set 1: Average/Max/Min Expected Gain (p.85)
    3.6.2 Experiment Set 2: Expected Gain Distribution (p.90)
    3.6.3 Experiment Set 3: Modified Version of Incognito (p.95)
  3.7 Conclusions (p.99)
Chapter 4: Negative Association Immunity (p.100)
  4.1 Introduction (p.100)
  4.2 Related Work (p.104)
  4.3 Negative Association Immunity and Value Suppression (p.107)
    4.3.1 Negative Association (p.108)
    4.3.2 Negative Association Immunity (p.111)
    4.3.3 Achieving Negative Association Immunity by Value Suppression (p.114)
  4.4 Local Search Algorithm (p.123)
  4.5 Experiments (p.125)
    4.5.1 Settings (p.125)
    4.5.2 Results and Discussions (p.128)
  4.6 Conclusions (p.129)
Chapter 5: Non-Centralized Distinct l-Diversity (p.130)
  5.1 Introduction (p.130)
  5.2 Related Work (p.138)
  5.3 Table Models (p.140)
    5.3.1 Private Tables (p.140)
    5.3.2 Published Tables (p.141)
  5.4 Private Information Deduced from Multiple Published Tables (p.143)
    5.4.1 Private Information Deduced by Simple Counting on Each Published Table (p.143)
    5.4.2 Private Information Deduced from Multiple Published Tables (p.145)
    5.4.3 Probabilistic Table (p.156)
  5.5 Non-Centralized Distinct l-Diversity and Algorithm (p.158)
    5.5.1 Non-Centralized Distinct l-Diversity (p.159)
    5.5.2 Algorithm (p.165)
    5.5.3 Theorems (p.171)
  5.6 Experiments (p.174)
    5.6.1 Settings (p.174)
    5.6.2 Metrics (p.176)
    5.6.3 Results and Discussions (p.179)
  5.7 Conclusions (p.181)
Chapter 6: Conclusions (p.183)
Bibliography (p.186)
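For reference, the distinct l-diversity requirement that chapter 5 generalizes can be stated in a few lines of code. The Python sketch below checks the standard centralized criterion on a single published table; it illustrates the underlying definition only, not the thesis's non-centralized algorithm, and the column names and records are invented for the example.

```python
from collections import defaultdict

def distinct_l_diverse(records, quasi_ids, sensitive, l):
    """Distinct l-diversity: every equivalence class (records sharing the
    same quasi-identifier values) must contain at least l distinct
    sensitive values."""
    classes = defaultdict(set)
    for rec in records:
        key = tuple(rec[a] for a in quasi_ids)
        classes[key].add(rec[sensitive])
    return all(len(vals) >= l for vals in classes.values())

# Invented published table: quasi-identifiers are generalized age and zip.
table = [
    {"age": "20-30", "zip": "440**", "disease": "flu"},
    {"age": "20-30", "zip": "440**", "disease": "asthma"},
    {"age": "30-40", "zip": "448**", "disease": "flu"},
    {"age": "30-40", "zip": "448**", "disease": "diabetes"},
]
print(distinct_l_diverse(table, ["age", "zip"], "disease", l=2))  # True
```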
443

Algorithmes exacts et approchés pour des problèmes d'ordonnancement et de placement / Exact and approximation algorithms for scheduling and placement problems

Kacem, Fadi 27 June 2012
In this thesis, we focus on solving combinatorial optimization problems, which we study in two parts. First, we study optimization problems arising from scheduling a set of tasks on computing machines, where we seek to minimize the total energy consumed by these machines while maintaining an acceptable quality of service. Second, we address two classical optimization problems: a scheduling problem on parallel machines with communication delays, and a problem of placing data in graphs that model peer-to-peer networks, where the goal is to minimize the total cost of data access.
444

Localisation multi-hypothèses pour l'aide à la conduite : conception d'un filtre "réactif-coopératif" / Multi-assumptions localization for driving assistance : design of a "reactive-cooperative" filter

Ahmed Bacha, Adda Redouane 01 December 2014
"When we use information from one source, it's plagiarism; when we use information from many, it's information fusion."

This work presents an innovative collaborative data fusion approach for ego-vehicle localization, called the Optimized Kalman Particle Swarm (OKPS): a data fusion and optimized filtering method. Data fusion is performed using data from a low-cost GPS, an INS, an odometer, and a steering wheel angle encoder. This work shows that the approach is both more appropriate and more efficient for vehicle ego-localization under degraded sensor performance and in highly nonlinear situations.

The most widely used vehicle localization methods are the Bayesian approaches represented by the Extended Kalman Filter (EKF) and its variants (UKF, DD1, DD2). The Bayesian methods suffer from sensitivity to noise and from instability in highly nonlinear cases. Proposed to overcome the limitations of the Bayesian methods, multi-hypothesis (particle-based) approaches are also used for ego-vehicle localization. Inspired by Monte Carlo simulation methods, the Particle Filter (PF) has performance that depends strongly on computational resources. Taking advantage of existing localization techniques and integrating the benefits of metaheuristic optimization, the OKPS is designed to cope with the highly nonlinear dynamics of vehicles, sensor noise, and real-time requirements. For ego-vehicle localization, especially for highly dynamic on-road maneuvers, a filter needs to be robust and reactive at the same time. The OKPS filter is a new cooperative-reactive localization algorithm inspired by the dynamic Particle Swarm Optimization (PSO) metaheuristic. It combines the advantages of PSO and of two other filters: the Particle Filter (PF) and the Extended Kalman Filter (EKF). The OKPS is tested using real data collected with a vehicle equipped with embedded sensors. Its performance is compared with the EKF, the PF, and the Swarm Particle Filter (SPF). The SPF is an interesting hybrid particle-based filter combining the advantages of PSO and particle filtering; it represents the first step of the OKPS development. The results show the efficiency of the OKPS for a highly dynamic driving scenario with damaged and/or low-quality GPS data.
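As background for the particle-based half of this design, the following Python sketch shows one generic bootstrap particle filter step for a 2-D vehicle position. It is illustration only: the OKPS adds a PSO-inspired cooperative move among particles that is not shown here, and the motion model and noise values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, control, gps_fix,
                         motion_noise, meas_noise):
    """One predict/update/resample cycle of a bootstrap particle filter.
    particles: (N, 2) position hypotheses; weights: (N,) normalized."""
    # Predict: propagate every hypothesis through the motion model plus noise.
    particles = particles + control + rng.normal(0.0, motion_noise, particles.shape)
    # Update: reweight by the likelihood of the GPS fix under Gaussian noise.
    d2 = np.sum((particles - gps_fix) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_noise ** 2)
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses (weight degeneracy).
    if 1.0 / np.sum(weights ** 2) < len(particles) / 2:
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# One step with 500 hypotheses (all values invented).
p = rng.normal(0.0, 1.0, (500, 2))
w = np.full(500, 1.0 / 500)
p, w = particle_filter_step(p, w, control=np.array([1.0, 0.0]),
                            gps_fix=np.array([1.2, -0.1]),
                            motion_noise=0.3, meas_noise=2.0)
```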
445

Automatic web resource compilation using data mining

Escudeiro, Nuno Filipe Fonseca Vasconcelos January 2004
Master's thesis in Data Analysis and Decision Support Systems (Análise de Dados e Sistemas de Apoio à Decisão), Faculdade de Economia, Universidade do Porto, 2004.
446

The impact of computers in architectural practice

Laplante, Marc A. (Marc Arthur) January 1989
No description available.
447

Some techniques for the enhancement of electromagnetic data for mineral exploration.

Sykes, Michael P. January 2000
The usefulness of electromagnetic (EM) methods for mineral exploration is severely restricted by the presence of a conductive overburden. Approximately 80% of the Australian continent is covered by regolith that contains some of the most conductive clays on Earth. As a result, frequency-domain methods are only effective for near-surface investigations, and time-domain methods, which are capable of deeper exploration, require the measurement of very small, late-time signals. Both methods suffer from the fact that the currents in the conductive Earth layers contribute a large portion of the total measured signal and may mask the signal from a conductive target. In the search for non-layered structures, this form of geological noise is the greatest impediment to the success of EM surveys in conductive terrains. Over the years a range of data acquisition and processing techniques have been used in an effort to enhance the response of the non-layered target and thereby increase the likelihood of its detection.

The combined use of a variety of survey configurations to assist exploration and interpretation is not new and is practiced regularly. The active nature of EM exploration means that the measured response is determined to a large degree by the way in which the Earth is energised. Geological structures produce different responses to different stimuli. In this work, two new methods of data combination are used to transform the measured data into a residual quantity that enhances the signature of non-layered geological structures. Based on the concept of data redundancy and tested using the results of numerical modelling, the new combinations greatly increase the signal-to-noise ratio for targets located in a conductive environment by reducing the layered-Earth contribution. The data combinations have application to frequency-domain and time-domain EM surveys, and simple interpretive rules can be applied to the residuals to extract geological parameters useful in exploration. The new methods make use of inductive loop sources and can therefore also be applied to airborne surveys.

Airborne surveys present special difficulties due to the data acquisition procedures commonly used. Flight-line related artefacts such as herringbones detract from the appearance of maps and make boundary definition more difficult. A new procedure, based on the Radon transform, is used to remove herringbones from airborne EM maps and locate the conductive boundaries correctly, making interpretation easier and more reliable. In addition, selective filtering of the Radon transform data enables the enhancement or attenuation of specific linear features in the map to emphasise features of interest. Comparison of the Radon transform procedures with the more conventional Fourier transform methods shows the Radon transform processing to be more versatile and less prone to distorting the features in a map.

The procedures developed in this work are applied to field data with good results.
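As a rough illustration of selective filtering in the Radon domain, the Python sketch below attenuates one band of projection angles in a map's sinogram and inverts the result with scikit-image. It is a generic directional filter under invented parameters, not the thesis's herringbone-removal procedure, and the mapping between a feature's orientation and its Radon angle should be verified for the data at hand.

```python
import numpy as np
from skimage.transform import radon, iradon

def attenuate_direction(image, angle_deg, halfwidth=2.0, factor=0.1):
    """Scale down the Radon-transform columns in a narrow band of projection
    angles, then reconstruct; linear features whose energy concentrates in
    that band are suppressed, while everything else passes largely unchanged."""
    theta = np.arange(180.0)
    sinogram = radon(image, theta=theta)
    # Angular distance modulo 180 degrees selects the band to attenuate.
    dist = np.abs((theta - angle_deg + 90.0) % 180.0 - 90.0)
    sinogram[:, dist <= halfwidth] *= factor
    return iradon(sinogram, theta=theta, filter_name="ramp")

# Synthetic stand-in for a striped map (the matching angle is an assumption
# under skimage's projection convention and may need adjusting on real data).
img = np.zeros((128, 128))
img[60:68, :] = 1.0
cleaned = attenuate_direction(img, angle_deg=90.0)
```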
448

Embedding constraints into association rules mining

Kutty, Sangeetha Unknown Date
Mining frequent patterns from large databases plays a vital role in many data mining tasks and has a broad range of applications. Most previously proposed algorithms have been designed for one specific type of dataset, making them unsuitable for a range of datasets. A few techniques have been suggested to improve the performance of these association rule mining algorithms. However, those algorithms do not support a high level of user interaction, relying only on the classic support and confidence metrics for expressing user requirements. On the other hand, techniques exist that focus on improving the level of user interaction at the cost of performance.

In this work, we propose a new algorithm, FOLD-growth with Constraints (FGC), which not only provides user interaction but also improves performance over existing popular algorithms. It embeds user-defined constraints into a pre-processing structure to generate constraint-satisfying itemsets and uses this result to build a highly compact data structure. Interestingly, the constraint-embedding technique makes existing pattern growth methods not only efficient but also highly effective over a range of datasets, irrespective of their data distribution. The technique also supports conjunctions of different types of commonly used constraints.
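As background for the constraint-embedding idea, the Python sketch below pushes a user constraint into a plain Apriori-style levelwise search. This is not the FGC algorithm (which embeds constraints into a pre-processing structure and uses pattern growth); it only shows where a constraint check can prune the search, which is sound when the constraint is anti-monotone, as in the invented example.

```python
def apriori_with_constraint(transactions, min_support, constraint):
    """Levelwise frequent-itemset mining with a constraint checked during
    candidate generation, so failing itemsets are never extended."""
    def support(itemset):
        return sum(itemset <= t for t in transactions)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if constraint(frozenset([i])) and support(frozenset([i])) >= min_support]
    frequent = list(level)
    while level:
        # Join step: combine itemsets that differ in exactly one item.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if constraint(c) and support(c) >= min_support]
        frequent.extend(level)
    return frequent

# Invented anti-monotone constraint: itemsets drawn only from {a, b, c}.
txns = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
print(apriori_with_constraint(txns, min_support=2,
                              constraint=lambda s: s <= frozenset("abc")))
```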
449

A framework in support of structural monitoring by real time kinematic GPS and multisensor data

Ogaja, Clement. Surveying & Spatial Information Systems, Faculty of Engineering, UNSW. January 2002
Due to structural damage from earthquakes and strong winds, engineers and scientists have focused on performance-based design methods and on sensors that directly measure relative displacements. Among the monitoring methods being considered are those using Global Positioning System (GPS) technology. However, as the technical feasibility of using GPS for recording relative displacements has been (and is still being) proven, the challenge for users is to determine how to make use of the relative displacements being recorded. This thesis proposes a mathematical framework that supports the use of RTK-GPS and multisensor data for structural monitoring. Its main contributions are as follows:

(a) Most of the emerging GPS-based structural monitoring systems consist of GPS receiver arrays (dozens or hundreds deployed on a structure), and the integrity of the GPS data generated by such systems must be addressed. Based on this recognition, a methodology for integrity monitoring using a data redundancy approach has been proposed and tested for a multi-antenna measurement environment. The benefit of this approach is that it verifies the reliability of both the measuring instruments and the processed data, unlike existing methods, which verify only the reliability of the processed data.

(b) For real-time structural monitoring applications, high-frequency data must be generated. A methodology is proposed that can extract, in real time, deformation parameters from high-frequency RTK measurements. The methodology is tested and shown to be effective for determining the amplitude and frequency of structural dynamics, and is therefore suitable for the dynamic monitoring of towers, tall buildings and long-span suspension bridges.

(c) In the overall effort of deformation analysis, large quantities of observations are required, both of causative phenomena (e.g., wind velocity, temperature, pressure) and of response effects (e.g., accelerations, coordinate displacements, tilt, strain). One of the problems to be circumvented is dealing with the excess data generated by process automation and the large number of instruments employed. This research proposes a methodology based on multivariate statistical process control, whose benefit is that excess data generated on-line are reduced while a timely analysis of the GPS data is maintained (since they give direct coordinate results).

Based on the above contributions, a demonstrator software system was designed and implemented for the Windows operating system. Tests of the system with datasets from UNSW experiments, the Calgary Tower monitoring experiment in Canada, the Xiamen Bank Building monitoring experiment in China, and the Republic Plaza Building monitoring experiment in Singapore have shown good results.
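Point (b) amounts to recovering the amplitude and frequency of structural motion from a displacement time series. The Python sketch below shows a minimal spectral version of that step on a simulated 10 Hz series standing in for real RTK data; the thesis's real-time methodology is more involved than a single FFT.

```python
import numpy as np

def dominant_motion(displacement, sample_rate_hz):
    """Return (frequency in Hz, amplitude in input units) of the strongest
    oscillation in a displacement series, via a Hann-windowed FFT."""
    x = displacement - displacement.mean()      # drop the static offset
    window = np.hanning(len(x))
    spectrum = np.fft.rfft(x * window)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    k = np.argmax(np.abs(spectrum[1:])) + 1     # skip the DC bin
    amplitude = 2.0 * np.abs(spectrum[k]) / window.sum()  # coherent gain
    return freqs[k], amplitude

# Simulated tower sway: 12 mm at 0.5 Hz, sampled at 10 Hz, plus 3 mm noise.
t = np.arange(0.0, 60.0, 0.1)
series = 0.012 * np.sin(2 * np.pi * 0.5 * t) + np.random.normal(0.0, 0.003, t.size)
freq, amp = dominant_motion(series, sample_rate_hz=10.0)
print(f"{freq:.2f} Hz, {amp * 1000:.1f} mm")
```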
450

Insights into gene interactions using computational methods for literature and sequence resources

Dameh, Mustafa. January 2008
At the beginning of this century many sequencing projects were finalised. As a result, an overwhelming amount of literature and sequence data has become available to biologists via online bioinformatics databases. These data have led to a better understanding of many organisms and have helped identify genes. However, there is still much to learn about the functions and interactions of genes. This thesis is concerned with predicting gene interactions using two main online resources: biomedical literature and sequence data. The biomedical literature is used to explore and refine a text mining method, known as the "co-occurrence method", for predicting gene interactions. The sequence data are used in an analysis to predict an upper bound on the number of genes involved in gene interactions.

The co-occurrence method of text mining was extensively explored in this thesis. The effects of certain computational parameters on the relevance of documents in which two genes co-occur were critically examined. The results showed that some computational parameters do have an impact on the outcome of the co-occurrence method and, if taken into consideration, can lead to better identification of documents that describe gene interactions. To explore the co-occurrence method, a prototype system was developed; as a result, it contains unique functions that are not present in currently available text mining systems.

Sequence data were used to predict an upper bound on the number of genes involved in gene interactions within a tissue. A novel approach was undertaken that analyses SAGE and EST sequence libraries using ecological estimation methods. The approach shows that the species accumulation theory used in ecology can be applied to tag libraries (SAGE or EST) to predict an upper bound on the number of mRNA transcript species in a tissue. The novel computational analysis provided in this study can be used to extend the body of knowledge and insights relating to gene interactions and, hence, provide a better understanding of genes and their functions.
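One representative tool from the ecological estimation toolkit referred to above is the Chao1 richness estimator, sketched below in Python on an invented tag library. The thesis applies species-accumulation theory; Chao1 is shown only as a simple nonparametric estimate of how many distinct transcript species a library implies, not necessarily the estimator used in the study.

```python
from collections import Counter

def chao1(tag_counts):
    """Chao1 estimate of total species richness from abundance counts:
    S_obs + f1^2 / (2 * f2), where f1 and f2 are the numbers of species
    observed exactly once and exactly twice."""
    s_obs = len(tag_counts)
    f1 = sum(1 for c in tag_counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in tag_counts.values() if c == 2)  # doubletons
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2.0              # bias-corrected form
    return s_obs + f1 * f1 / (2.0 * f2)

# Invented SAGE tag library: counts of each distinct tag observed.
library = Counter({"TAGA": 5, "CCGT": 1, "GGAT": 1, "TTAC": 2, "ACGT": 2})
print(chao1(library))  # 5 + 2*2 / (2*2) = 6.0
```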
