  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Concept Based Knowledge Discovery From Biomedical Literature

Radovanovic, Aleksandar January 2009 (has links)
Philosophiae Doctor - PhD / The advancement of biomedical research and the continuous growth of scientific literature available in electronic form call for innovative methods and tools for information management, knowledge discovery, and data integration. Many biomedical fields such as genomics, proteomics, metabolomics, and genetics, and emerging disciplines like systems biology and conceptual biology, require synergy between experimental, computational, data mining and text mining technologies. The large amount of biomedical information available in various repositories, such as the US National Library of Medicine Bibliographic Database, emerges as a potential source of textual data for knowledge discovery. Text mining, the application of natural language processing and machine learning technologies to problems of knowledge discovery, is one of the most challenging fields in bioinformatics. This thesis introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, reveal interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
52

Geospatial integrated urban flood mapping and vulnerability assessment

Islam, MD Tazmul, 08 December 2023 (has links) (PDF)
Natural disasters like flooding have always been a major problem for countries around the world, but as the global climate changes and urban populations keep growing, the threat of flooding has become much worse. Even though many studies have been conducted on flood mapping and vulnerability assessment in urban areas, this research addresses a significant knowledge gap in the domain. First, a flood depth estimation approach was used to address the overestimation of urban flood mapping areas derived from Sentinel-1 images. Ten different combinations of the two initial VH and VV polarizations were used to rapidly and accurately map urban floods within the open-source Google Earth Engine platform using four different methods. The inclusion of flood depth improved the accuracy of these methods by 7% on average. Next, we focused our research on finding out who is most at risk in the floodplain areas. Minority communities, such as African Americans, face greater difficulties as a result of socioeconomic constraints. We therefore analyzed spatial and temporal changes in demographic patterns (race) in five southern cities in the US. We found that in the majority of these cities the minority population within the floodplain has increased over the past two decades, with the exception of Charleston, South Carolina, where the white population has increased while the minority population has decreased. Building upon these insights, we included additional socio-economic and demographic variables in our analysis to obtain a more holistic view of vulnerable populations in two of these cities (Jackson and Birmingham). Due to high autocorrelation between explanatory variables, we used Principal Component Analysis (PCA) along with global and local regression techniques to find out how much of the vulnerability these variables can explain.
According to our findings, the spatial components play a significant role in explaining vulnerabilities in greater detail. The findings of this research can serve as an important resource for policymakers, urban planners, and emergency response agencies to make informed decisions in future events and enhance overall resilience.
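The PCA-plus-regression step described above can be sketched minimally. The data, variable names, and coefficients below are synthetic illustrations, not the study's dataset or results:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic explanatory variables with strong collinearity, mimicking
# correlated socio-economic indicators (illustrative only).
n = 200
income = rng.normal(50, 10, n)
poverty = -0.9 * income + rng.normal(0, 2, n)   # nearly collinear with income
age = rng.normal(40, 12, n)
X = np.column_stack([income, poverty, age])
y = 0.5 * income - 0.3 * poverty + 0.1 * age + rng.normal(0, 1, n)

# PCA: center the data, then project onto the leading eigenvectors
# of the covariance matrix to obtain decorrelated components.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]               # sort by descending variance
scores = Xc @ eigvecs[:, order[:2]]             # keep the top 2 components

# Ordinary least squares on the decorrelated component scores.
A = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = 1 - np.sum((y - A @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))
```

Local regression variants (e.g. geographically weighted regression) replace the single global fit with one fit per location, but the PCA step that removes the collinearity works the same way.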
53

Image Classification for Remote Sensing Using Data-Mining Techniques

Alam, Mohammad Tanveer 11 August 2011 (has links)
No description available.
54

Transforming user data into user value by novel mining techniques for extraction of web content, structure and usage patterns : the development and evaluation of new Web mining methods that enhance information retrieval and improve the understanding of users' Web behavior in websites and social blogs

Ammari, Ahmad N. January 2010 (has links)
The rapid growth of the World Wide Web in the last decade has made it the largest publicly accessible data source in the world and one of the most significant and influential information revolutions of modern times. The influence of the Web has impacted almost every aspect of human life and activity, causing paradigm shifts and transformational changes in business, governance, and education. Moreover, the rapid evolution of Web 2.0 and the Social Web in the past few years, including social blogs and friendship networking sites, has dramatically transformed the Web from a raw environment for information consumption into a dynamic and rich platform for information production and sharing worldwide. However, this growth and transformation have resulted in an uncontrollable explosion and abundance of textual content, creating a serious challenge for any user trying to find and retrieve the relevant information they truly seek. Finding a relevant Web page within a website easily and efficiently has become very difficult. This has created challenges for researchers to develop new mining techniques that improve the user experience on the Web, and for organizations to understand the true informational interests and needs of their customers so that they can improve their targeted services accordingly, providing the products, services and information that truly match the requirements of every online customer. With these challenges in mind, Web mining aims to extract hidden patterns and discover useful knowledge from Web page contents, Web hyperlinks, and Web usage logs.
Based on the primary kinds of Web data used in the mining process, Web mining tasks can be categorized into three main types: Web content mining, which extracts knowledge from Web page contents using text mining techniques; Web structure mining, which extracts patterns from the hyperlinks that represent the structure of the website; and Web usage mining, which mines users' Web navigational patterns from Web server logs that record the page accesses made by every user, representing the interactional activities between the users and the Web pages of a website. The main goal of this thesis is to contribute toward addressing the challenges that have resulted from the information explosion and overload on the Web by proposing and developing novel Web mining-based approaches. Toward this goal, the thesis presents, analyzes, and evaluates three major contributions. First, the development of an integrated Web structure and usage mining approach that recommends a collection of hyperlinks to be placed on the homepage of a website for its visitors. Second, the development of an integrated Web content and usage mining approach to improve the understanding of users' Web behavior and discover user group interests in a website. Third, the development of a supervised classification model based on recent Social Web concepts, such as Tag Clouds, in order to improve the retrieval of relevant articles and posts from Web social blogs.
55

Classifying Pairwise Object Interactions: A Trajectory Analytics Approach

Janmohammadi, Siamak 05 1900 (has links)
Widely deployed surveillance cameras produce a huge amount of video data, and the technology for recording the motion of a moving object in the form of trajectory data keeps improving. With the proliferation of location-enabled devices, ongoing growth in smartphone penetration, and advances in image processing techniques, tracking moving objects has become increasingly reliable. In this work, we explore domain-independent qualitative and quantitative features in raw trajectory (spatio-temporal) data from videos captured by a fixed single wide-angle-view camera sensor in outdoor areas. We study the efficacy of those features in classifying four basic high-level actions by employing two supervised learning algorithms, and we show how each feature affects the learning algorithms' overall accuracy, both as a single factor and confounded with others.
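The kind of domain-independent pairwise trajectory feature the abstract mentions can be sketched as follows. The feature definitions and the toy trajectories are illustrative assumptions, not the thesis's actual feature set:

```python
import numpy as np

def pairwise_features(traj_a, traj_b):
    """Features for a pair of trajectories, each a (T, 2) array of x, y
    positions sampled at a fixed rate.  Returns the mean distance, the
    mean closing speed (negative while the objects approach each other),
    and a qualitative label derived from it."""
    a, b = np.asarray(traj_a, float), np.asarray(traj_b, float)
    dist = np.linalg.norm(a - b, axis=1)   # distance at each frame
    closing = np.diff(dist)                # per-frame change in distance
    label = "approaching" if closing.mean() < 0 else "receding"
    return dist.mean(), closing.mean(), label

# One object moves right along the x axis; the other stands still ahead of it.
a = np.column_stack([np.arange(10.0), np.zeros(10)])
b = np.tile([20.0, 0.0], (10, 1))
print(pairwise_features(a, b)[2])   # -> approaching
```

Feature vectors of this kind, one per trajectory pair, can then be fed directly to any supervised classifier.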
56

Apprentissage de vote de majorité pour la classification supervisée et l'adaptation de domaine : Approches PAC Bayésiennes et combinaison de similarités / Majority vote learning for supervised classification and domain adaptation: PAC-Bayesian approaches and similarity combination

Morvant, Emilie 18 September 2013 (has links)
Many applications make use of machine learning methods able to take into account different information sources (e.g. sound, images, text) by combining several models or descriptors. This thesis proposes a series of theoretically founded contributions dealing with two main issues for such methods: (i) How to embed available a priori information? (ii) How to adapt a model to new data following a distribution different from the learning data distribution? This last issue is known as domain adaptation (DA). A first series of contributions studies the problem of learning a majority vote over a set of voters for supervised classification in the PAC-Bayesian context, which allows one to consider an a priori on the voters. Our first contribution extends an algorithm minimizing the error of the majority vote in binary classification by allowing the use of an a priori expressed as an aligned distribution over the voters. The second theoretically analyzes the interest of minimizing the operator norm of the confusion matrix of the votes in the multiclass setting. Our second series of contributions deals with DA for binary classification. The third result combines (epsilon, gamma, tau)-Good similarity functions to infer a new projection space that brings the learning and test distributions closer by minimizing a DA bound. Finally, we propose a PAC-Bayesian analysis of DA based on a divergence between distributions. This analysis allows us to derive guarantees for learning majority votes in a DA context and to design an algorithm, specialized to linear classifiers, that minimizes our bound.
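The central object of the analysis above, a rho-weighted majority vote over a set of voters, can be sketched in a few lines. The stump voters and posterior weights below are invented for illustration:

```python
import numpy as np

def majority_vote(votes, rho):
    """rho-weighted majority vote over binary voters.

    votes : (n_voters, n_samples) array with entries in {-1, +1}
    rho   : (n_voters,) posterior weights, non-negative and summing to 1
    Returns the sign of the rho-weighted average vote for each sample."""
    rho = np.asarray(rho, float)
    assert np.all(rho >= 0) and abs(rho.sum() - 1.0) < 1e-9
    return np.sign(rho @ np.asarray(votes, float))

# Three decision stumps on a 1-D toy problem: vote +1 when x > threshold.
x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
stumps = np.array([np.where(x > t, 1, -1) for t in (-1.5, 0.0, 2.0)])
rho = np.array([0.2, 0.5, 0.3])    # posterior distribution over the voters
print(majority_vote(stumps, rho))  # -> [-1. -1.  1.  1.  1.]
```

The PAC-Bayesian bounds the abstract refers to control the risk of exactly this kind of vote as a function of the chosen posterior rho and an a priori distribution over the voters.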
57

An XML document representation method based on structure and content : application in technical document classification

Chagheri, Samaneh 27 September 2012 (has links)
The rapid growth in the number of documents stored electronically presents a challenge for automatic document classification. Traditional classification systems treat documents as plain text; however, documents are becoming more and more structured. For example, XML is the best-known and most widely used standard for structured document representation. Such documents include supplementary information on content organization, represented by elements such as titles, sections, and captions. We propose an approach to structured document classification based on both the document's logical structure and its content, in order to take into account the information present in the logical structure. Our approach extends the traditional document representation model, the Vector Space Model (VSM). We have integrated structural information into all phases of document representation construction: feature extraction, feature selection, and feature weighting. Our second contribution applies our generic approach to a real domain: technical documentation. We use our proposition to classify technical documents stored electronically at CONTINEW, a company specialized in technical document audits. These documents are in legacy formats in which the logical structure is inaccessible. We therefore propose a document understanding approach to extract the documents' logical structure from their presentation layout. A heterogeneous collection of documents in different physical presentations and formats is thus transformed into a homogeneous XML collection sharing the same logical structure. This contribution is based on a supervised learning approach in which each logical element is described by its physical characteristics. 
Our proposal therefore supports the whole document processing workflow, from the document's original format through to its classification. In our system, SVM is used as the classification algorithm.
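A minimal sketch of a structure-weighted Vector Space Model followed by SVM classification, in the spirit of the approach described above. The documents, element weights, and class labels are invented, and the weighting scheme is a simplification for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC

TITLE_WEIGHT, BODY_WEIGHT = 3.0, 1.0   # illustrative structural weights

# Toy "structured documents": terms occurring in the title element count
# more than terms occurring in the body element.
docs = [
    {"title": "pump maintenance", "body": "replace the seal and bearing", "label": "mechanical"},
    {"title": "bearing failure", "body": "vibration analysis of the pump", "label": "mechanical"},
    {"title": "circuit diagram", "body": "voltage and current ratings", "label": "electrical"},
    {"title": "wiring manual", "body": "fuse and relay circuit layout", "label": "electrical"},
]

vocab = sorted({w for d in docs for w in (d["title"] + " " + d["body"]).split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(title, body):
    """Term-frequency vector whose weights depend on the logical element."""
    v = np.zeros(len(vocab))
    for w in title.split():
        if w in index:
            v[index[w]] += TITLE_WEIGHT
    for w in body.split():
        if w in index:
            v[index[w]] += BODY_WEIGHT
    return v

X = np.array([vectorize(d["title"], d["body"]) for d in docs])
y = [d["label"] for d in docs]

clf = LinearSVC().fit(X, y)                       # linear SVM classifier
print(clf.predict([vectorize("pump bearing", "seal wear")]))
```

In the thesis's setting the element weights would come from the XML logical structure recovered by the document understanding step, rather than being fixed constants.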
58

Caractérisation de tissus cutanés superficiels hypertrophiques par spectroscopie multimodalité in vivo : instrumentation, extraction et classification de données multidimensionnelle / Characterization of hypertrophic scar tissues by multimodal spectroscopy in vivo : Instrumentation, Extraction and Classification of multidimensional data

Liu, Honghui 18 April 2012 (has links)
This research aims at developing and validating a multimodal system combining diffuse reflectance spectroscopy and autofluorescence spectroscopy for characterizing hypertrophic scar tissues in vivo. The work relies on three axes. The first part concerns the instrumentation: the development of an automatic system suitable for multimodal spectroscopic measurement in vivo. 
A series of calibration procedures was carried out to characterize the system and ensure the reliability of the measurement results. The second part presents a preclinical study on an animal model (rabbit ear). An experimental protocol was implemented to create hypertrophic scars from which spectra could be collected for analysis. The third part deals with the classification of the spectra obtained. It provides a series of algorithmic methods for denoising and correcting the measured spectra, automatically extracting interpretable spectral features, and selecting an optimal feature subset for effective classification. The classification results achieved with three different classifiers (k-NN, LDA and ANN) show the feasibility of using bimodal spectroscopy to characterize this type of skin lesion. Furthermore, the features selected by our method indicate that hypertrophic scarring may involve a change in tissue structure and a variation in the concentration of porphyrins embedded in the epidermis.
59

Contribution à la détection et à l'analyse des signaux EEG épileptiques : débruitage et séparation de sources / Contribution to the detection and analysis of epileptic EEG signals : denoising and source separation

Romo Vazquez, Rebeca del Carmen 24 February 2010 (has links)
The main goal of this research is the pre-processing of electroencephalographic (EEG) signals. 
More precisely, we aim to develop a methodology to obtain a "clean" EEG through the identification and elimination of extra-cerebral artefacts (ocular movements, eye blinks, cardiac and muscular activity) and noise. After identification, the artefacts and noise must be eliminated with a minimal loss of cerebral information, as this information is potentially useful to the analysis (visual or automatic) and therefore to the medical diagnosis. Several pre-processing steps are needed to accomplish this: separation and identification of the artefact sources, elimination of measurement noise, and reconstruction of the "clean" EEG. Through a blind source separation (BSS) approach, the first step separates the EEG signals into informative cerebral sources and extra-cerebral artefact sources to be eliminated. Once the sources are separated, the second step classifies and eliminates the identified artefact sources; this step relies on supervised classification. The EEG is reconstructed from the informative sources only, and the measurement noise is finally eliminated using a wavelet denoising approach. A methodology ensuring an optimal integration of these three techniques (BSS, supervised classification, and wavelet denoising) is the main contribution of this thesis. The methodology, together with the results obtained on a large base of real EEG signals (ictal and inter-ictal), was submitted to detailed medical expertise, which validates the proposed approach.
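The wavelet-denoising step can be illustrated with a single-level Haar transform and soft thresholding. The signal, noise level, and threshold below are illustrative assumptions, not the thesis's actual pipeline:

```python
import numpy as np

def haar_denoise(signal, threshold):
    """Single-level Haar wavelet denoising by soft thresholding.

    Splits the signal into approximation and detail coefficients,
    shrinks the details toward zero, and reconstructs."""
    x = np.asarray(signal, float)
    assert len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)          # low-pass coefficients
    detail = (even - odd) / np.sqrt(2)          # high-pass coefficients
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)                      # inverse Haar transform
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)               # slow oscillatory component
noisy = clean + rng.normal(0, 0.3, t.size)      # added measurement noise
denoised = haar_denoise(noisy, threshold=0.3)
print(np.mean((noisy - clean) ** 2) > np.mean((denoised - clean) ** 2))
```

In practice EEG pipelines use multi-level decompositions with smoother wavelets and data-driven thresholds, but the shrink-the-details principle is the same.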
60

Apprentissage automatique pour la détection de relations d'affaire / Machine learning for the detection of business relations

Capo-Chichi, Grâce Prudencia 04 1900 (has links)
No description available.
