About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Ensemble Stream Model for Data-Cleaning in Sensor Networks

Iyer, Vasanth 16 October 2013 (has links)
Ensemble stream modeling and data cleaning are sensor information processing systems with different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble with the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for a bush or natural forest-fire event, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since non-descriptive features do not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from the sensor streams. Second is the process of feature induction, which cross-validates attributes with single or multiple target variables to minimize training error. We use the F-measure, which combines precision and recall, to determine the false-alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is performed at each node's split to select the best attribute. The ensemble stream model approach proved to improve when complicated features were used with a simpler tree classifier. The ensemble framework for data cleaning, and the enhancements to quantify quality of fit (30% spatial, 10% temporal, and 90% mobility reduction), led to the formation of quality-labeled streams for sensor-enabled applications, which further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
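To make the metric concrete, here is a minimal sketch (our illustration, not code from the thesis) of the F-measure computed from raw detection counts; the helper name and the example counts are hypothetical.

```python
# Minimal sketch: F-measure from true positives, false positives and false
# negatives. The counts below are invented, not results from the thesis.
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F_beta score; beta = 1 weighs precision and recall equally."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: 40 correctly flagged fire events, 10 false alarms, 5 missed events.
print(f_measure(tp=40, fp=10, fn=5))  # ~0.842
```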
232

How accuracy of estimated glottal flow waveforms affects spoofed speech detection performance

Deivard, Johannes January 2020 (has links)
In the domain of automatic speaker verification, one of the challenges is to keep malevolent people out of the system. One way to do this is to create algorithms that detect spoofed speech. There are several types of spoofed speech and several ways to detect them, one of which is to look at the glottal flow waveform (GFW) of a speech signal. This waveform is often estimated using glottal inverse filtering (GIF), since special invasive equipment is required to record the ground-truth GFW. To the author's knowledge, no research has been done on the correlation between GFW accuracy and spoofed speech detection (SSD) performance. This thesis tries to find out whether that correlation exists. First, the performance of different GIF methods is evaluated; then simple SSD machine learning (ML) models are trained and evaluated on their macro average precision. The ML models use different datasets composed of parametrized GFWs estimated with the GIF methods from the previous step. Results from these tasks are then combined in order to spot any correlations. The evaluations of the different methods showed that they created GFWs of varying accuracy. The machine learning models also showed varying performance depending on the type of dataset used. However, when combining the results, no obvious correlations between GFW accuracy and SSD performance were detected. This suggests that the overall accuracy of a GFW is not a substantial factor in the performance of machine learning-based SSD algorithms.
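For reference, the snippet below (our assumption, not the thesis code) shows how the macro average precision named above is computed with scikit-learn; the toy labels are invented.

```python
# Macro-averaged precision over bona-fide/spoofed classes (toy labels).
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # 0 = bona fide, 1 = spoofed
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# 'macro' averages the per-class precisions, weighting both classes equally
# regardless of class imbalance.
print(precision_score(y_true, y_pred, average="macro"))  # 0.75
```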
233

Cartography of protein-protein interfaces and search for druggable cavities

Da Silva, Franck 23 September 2016 (has links)
Protein-protein interfaces are at the heart of many physiological mechanisms of living cells. Characterizing them at the molecular level is therefore crucial in drug discovery. We propose here new methods for analyzing protein-protein interfaces of pharmaceutical interest. Our automated protocol detects the biologically relevant interfaces within Protein Data Bank structures in order to define interaction zones with pharmacological potential, druggable cavities, ligands present at the interface, and pharmacophores derived directly from the cavities. Our method provides a state of the art of the available structural information about protein-protein interfaces and predicts potential new targets for drug candidates.
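The geometric core of interface detection can be sketched as follows (a minimal illustration on our part, not the authors' protocol): two residues from different chains are in contact when any pair of their atoms lies within a distance cutoff. The coordinates, residue names, and the 5 Å threshold are toy assumptions; a real pipeline would read atoms from Protein Data Bank files.

```python
# Flag residue pairs from two chains as interface contacts when the minimum
# inter-atomic distance falls below a cutoff (toy data, hypothetical residues).
import numpy as np

CUTOFF = 5.0  # Angstroms; an assumed heavy-atom contact threshold

def interface_pairs(chain_a: dict, chain_b: dict, cutoff: float = CUTOFF):
    """chain_a/chain_b map a residue id to an (n_atoms, 3) coordinate array."""
    pairs = []
    for res_a, xyz_a in chain_a.items():
        for res_b, xyz_b in chain_b.items():
            # All pairwise atom-atom distances between the two residues.
            d = np.linalg.norm(xyz_a[:, None, :] - xyz_b[None, :, :], axis=-1)
            if d.min() < cutoff:
                pairs.append((res_a, res_b))
    return pairs

a = {"ALA12": np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])}
b = {"GLY45": np.array([[4.0, 1.0, 0.0]]),
     "LYS80": np.array([[30.0, 0.0, 0.0]])}
print(interface_pairs(a, b))  # [('ALA12', 'GLY45')]
```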
234

Detection, Localization and Recognition of Traffic Signs

Svoboda, Tomáš January 2011 (has links)
This master's thesis deals with the localization, detection and recognition of traffic signs. It analyses ways of selecting areas in which traffic signs may occur, then describes the properties of different kinds of features used for traffic sign recognition, focusing on features based on histograms of oriented gradients. Possible classifiers are discussed, above all the cascade of support vector machines used in the resulting system. A description of the system implementation and of data sets for 5 types of traffic signs is part of this thesis. Many experiments were performed with the created system, with very good results. New datasets containing about 13,500 images were acquired from approximately 9 hours of processed video sequences.
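A minimal sketch of the pairing described above, HOG descriptors fed to a linear support vector machine, might look as follows; the 32x32 crops, HOG parameters, and random toy data are our assumptions, not the thesis configuration.

```python
# HOG features + linear SVM on toy stand-ins for 32x32 grayscale sign crops.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
images = rng.random((20, 32, 32))      # invented data, not real sign images
labels = rng.integers(0, 2, size=20)   # 1 = sign, 0 = background (toy)

features = np.array([
    hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for img in images
])

clf = LinearSVC().fit(features, labels)
print(clf.predict(features[:3]))
```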
235

Extraction of medical knowledge from clinical reports and chest x-rays using machine learning techniques

Bustos, Aurelia 19 June 2019 (has links)
This thesis addresses the extraction of medical knowledge from clinical text using deep learning techniques. In particular, the proposed methods focus on cancer clinical trial protocols and chest x-ray reports. The main results are a proof of concept of the capability of machine learning methods to discern which statements are regarded as inclusion or exclusion criteria in short free-text clinical notes, and a large-scale chest x-ray image dataset labeled with radiological findings, diagnoses and anatomic locations. Clinical trials provide the evidence needed to determine the safety and effectiveness of new medical treatments. These trials are the basis for clinical practice guidelines and greatly assist clinicians in their daily practice when making decisions regarding treatment. However, the eligibility criteria used in oncology trials are too restrictive. Patients are often excluded on the basis of comorbidity, past or concomitant treatments, or the fact that they are over a certain age, and the patients that are selected therefore do not mimic clinical practice. This signifies that the results obtained in clinical trials cannot be extrapolated to patients whose clinical profiles were excluded from the clinical trial protocols. The efficacy and safety of new treatments for patients with these characteristics are not, therefore, defined. Given the clinical characteristics of particular patients, their type of cancer and the intended treatment, discovering whether or not they are represented in the corpus of available clinical trials requires the manual review of numerous eligibility criteria, which is impracticable for clinicians on a daily basis. In this thesis, a large medical corpus comprising all cancer clinical trial protocols published by competent authorities in the last 18 years was used to extract medical knowledge in order to help automatically learn patients' eligibility for these trials. For this, a model was built to automatically predict whether short clinical statements are considered inclusion or exclusion criteria. A method based on deep neural networks was trained on a dataset of 6 million short free-texts to classify them as eligible or not eligible, using pretrained word embeddings as inputs. The semantic reasoning of the word-embedding representations obtained was also analyzed; the representations were able to identify equivalent treatments for a type of tumor by analogy with the drugs used to treat other tumors. Results show that representation learning using deep neural networks can be successfully leveraged to extract medical knowledge from clinical trial protocols and potentially assist practitioners when prescribing treatments. The second main task addressed in this thesis relates to knowledge extraction from medical reports associated with radiographs. Conventional radiology remains the most performed technique in radiodiagnosis services, with a percentage close to 75% (Radiología Médica, 2010). In particular, chest x-ray is the most common medical imaging exam, with over 35 million taken every year in the US alone (Kamel et al., 2017). Chest x-rays allow for inexpensive screening of several pathologies, including masses, pulmonary nodules, effusions, cardiac abnormalities and pneumothorax.
For this task, all the chest x-rays that had been interpreted and reported by radiologists at the Hospital Universitario de San Juan (Alicante) from January 2009 to December 2017 were used to build a novel large-scale dataset in which each high-resolution radiograph is labeled with its corresponding metadata, radiological findings and pathologies. This dataset, named PadChest, includes more than 160,000 images obtained from 67,000 patients, covering six different position views and additional information on image acquisition and patient demography. The free-text reports, written in Spanish by radiologists, were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and mapped onto standard Unified Medical Language System (UMLS) terminology. For this, a subset of the reports (27%) was manually annotated by trained physicians, whereas the remaining set was automatically labeled with deep supervised learning methods using attention mechanisms fed with the text reports. The labels generated were then validated on an independent test set, achieving a 0.93 micro-F1 score. To the best of our knowledge, this is one of the largest public chest x-ray databases suitable for training supervised models on radiographs, and also the first to contain radiographic reports in Spanish. The PadChest dataset can be downloaded on request from http://bimcv.cipf.es/bimcv-projects/padchest/. PadChest is intended for training image classifiers based on deep learning techniques to extract medical knowledge from chest x-rays. It is essential that automatic radiology reporting methods be integrated, in a clinically validated manner, into radiologists' workflows in order to help specialists improve their efficiency and enable safer and more actionable reporting. Computer vision methods capable of identifying both the large spectrum of thoracic abnormalities and normality need to be trained on large-scale, comprehensively labeled x-ray datasets such as PadChest. The development of these computer vision tools, once clinically validated, could serve to fulfill a broad range of unmet needs. Beyond implementing and obtaining results for both clinical trials and chest x-rays, this thesis studies the nature of health data, the novelty of applying deep learning methods to obtain large-scale labeled medical datasets, and the relevance of their applications in medical research, which have contributed to their extramural diffusion and worldwide reach. This thesis describes this journey so that the reader is guided across multiple disciplines, from engineering to medicine and on to ethical considerations in artificial intelligence applied to medicine.
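To illustrate the reported validation figure, the snippet below (our example, not the PadChest code) computes a micro-averaged F1 score over multi-label annotations with scikit-learn; the tiny label matrices are invented.

```python
# Micro-averaged F1 over multi-label findings (rows = reports, columns =
# findings such as "cardiomegaly" or "effusion"; all values invented).
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])

# Micro-averaging pools the label-wise counts, so frequent findings dominate.
print(f1_score(y_true, y_pred, average="micro"))  # ~0.833
```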
236

Machine Learning for Exploring State Space Structure in Genetic Regulatory Networks

Thomas, Rodney H. 01 January 2018 (has links)
Genetic regulatory networks (GRNs) offer a useful model for clinical biology. Specifically, such networks capture interactions among genes, proteins, and other metabolic factors. Unfortunately, it is difficult to understand and predict the behavior of networks that are of realistic size and complexity. In this dissertation, behavior refers to the trajectory of a state, through a series of state transitions over time, to an attractor in the network. This project assumes asynchronous Boolean networks, implying that a state may transition to more than one attractor. The goal of this project is to efficiently identify a network's set of attractors and to predict the likelihood with which an arbitrary state leads to each of the network's attractors. These probabilities are represented using a fuzzy membership vector. Predicting fuzzy membership vectors using machine learning techniques may address the intractability posed by networks of realistic size and complexity, and modeling and simulation can provide the necessary training sets for the machine learning methods. The experiments comprise several GRNs, each represented by a set of output classes. These classes consist of thresholds τ and ¬τ, where τ = [τ_low, τ_high]; a state s belongs to class τ if the probability of its transitioning to attractor A lies in the range [τ_low, τ_high], and otherwise it belongs to class ¬τ. Finally, each machine learning classifier was trained with the training sets that were previously collected. The objective is to explore methods to discover patterns for meaningful classification of states in realistically complex regulatory networks. The research design took a GRN and a machine learning method as input and produced the output class ⟨A, τ⟩ and its negation ¬⟨A, τ⟩. For each GRN, attractors were identified, data was collected by sampling each state to create fuzzy membership vectors, and machine learning methods were trained to predict whether or not a state is in a healthy attractor. For the T-LGL network, SVMs had the highest accuracy (between 93.6% and 96.9%) and precision (between 94.59% and 97.87%), while naive Bayesian classifiers had the highest recall (between 94.71% and 97.78%). This study showed that all experiments were highly significant, with p-value < 0.0001. The contribution of this research is to help clinical biologists submit genetic states and obtain an initial result on their outcomes. For future work, the implementation could use other machine learning classifiers, such as XGBoost or deep learning methods; another suggestion is to develop methods that improve the performance of state-transition sampling so that larger training sets can be collected.
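The sampling step can be sketched as follows (our toy example, not the dissertation's code or networks): simulate many asynchronous trajectories from a start state and count which attractor each one reaches.

```python
# Estimate a fuzzy membership vector by simulating asynchronous updates.
import random

# Toy two-gene network: gene 0 copies gene 1 and gene 1 copies gene 0, so the
# point attractors are (0, 0) and (1, 1). All of this is invented for
# illustration; real GRNs are far larger.
RULES = [lambda s: s[1],   # next value of gene 0
         lambda s: s[0]]   # next value of gene 1
ATTRACTORS = {(0, 0), (1, 1)}

def membership(state, runs=1000, max_steps=50, seed=0):
    """Fraction of sampled trajectories from `state` ending in each attractor."""
    rng, counts = random.Random(seed), {}
    for _ in range(runs):
        s = list(state)
        for _ in range(max_steps):
            if tuple(s) in ATTRACTORS:
                break
            gene = rng.randrange(len(s))     # asynchronous: one gene at a time
            s[gene] = int(RULES[gene](s))
        counts[tuple(s)] = counts.get(tuple(s), 0) + 1
    return {a: c / runs for a, c in counts.items()}

# From (1, 0) the first gene updated decides the attractor, so both estimated
# memberships should be close to 0.5.
print(membership((1, 0)))
```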
237

Word Sense Disambiguation

Kraus, Michal January 2008 (has links)
The master's thesis deals with word sense disambiguation for Czech words. The reader is informed about the task's history, and the algorithms used are introduced: the naive Bayes classifier, the AdaBoost classifier, the maximum entropy method, and decision trees are described in this thesis, and the methods used are clearly demonstrated. The next parts of the thesis describe the data used. The last part of the thesis describes the results reached, and some ideas for improving the system are given at the end.
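As an illustration of one of the listed methods (our sketch, not the thesis implementation), a naive Bayes classifier can disambiguate a word from its bag-of-words context; the example uses invented English contexts for the ambiguous word "bank", whereas the thesis works with Czech data.

```python
# Naive Bayes word sense disambiguation from bag-of-words context features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

contexts = [
    "deposit money at the bank branch",
    "the bank raised interest rates",
    "fishing on the river bank",
    "the muddy bank of the stream",
]
senses = ["finance", "finance", "river", "river"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(contexts, senses)
print(model.predict(["interest rates at the bank"]))  # ['finance']
```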
238

Extreme Heat Event Risk Map Creation Using a Rule-Based Classification Approach

Simmons, Kenneth Rulon 19 March 2012 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / During a 2011 summer dominated by headlines about an earthquake and a hurricane along the East Coast, extreme heat that silently killed scores of Americans largely went unnoticed by the media and public. However, despite a violent spasm of tornadic activity that claimed over 500 lives during the spring of the same year, heat-related mortality annually ranks as the top cause of death incident to weather. Two major data groups used in researching vulnerability to extreme heat events (EHEs) are socioeconomic indicators of risk and factors incident to urban living environments. Socioeconomic determinants such as household income levels, age, race, and others can be analyzed in a geographic information system (GIS) when formatted as vector data, while environmental factors such as land surface temperature are often measured via raster data retrieved from satellite sensors. The current research sought to combine the insights of both types of data in a comprehensive examination of heat susceptibility using knowledge-based classification. The use of knowledge classifiers is a non-parametric approach involving the creation of decision trees that seek to classify units of analysis by whether they meet specific rules defining the phenomenon being studied. In this extreme heat vulnerability study, data relevant to the deadly July 1995 heat wave in Chicago's Cook County were incorporated into decision trees for 13 different experimental conditions. Populations vulnerable to heat were identified in five of the 13 conditions, with predominantly low-income African-American communities being particularly at risk. Implications of the results of this study are given, along with directions for future research on extreme heat event vulnerability.
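The rule-based idea can be sketched as follows (a hypothetical simplification, not the study's actual decision trees): a census tract is flagged as heat-vulnerable only when it satisfies every rule on socioeconomic and surface-temperature indicators. The thresholds below are invented; a real study would calibrate them to local data.

```python
# Rule-based classification of census tracts for extreme heat vulnerability.
from dataclasses import dataclass

@dataclass
class Tract:
    median_income: float      # USD
    pct_over_65: float        # percent of residents aged 65 or older
    land_surface_temp: float  # degrees Celsius, satellite-derived

RULES = [
    lambda t: t.median_income < 30_000,
    lambda t: t.pct_over_65 > 15.0,
    lambda t: t.land_surface_temp > 35.0,
]

def is_vulnerable(tract: Tract) -> bool:
    """A tract is vulnerable only if every rule in the chain fires."""
    return all(rule(tract) for rule in RULES)

print(is_vulnerable(Tract(24_000, 21.0, 38.5)))  # True
print(is_vulnerable(Tract(52_000, 21.0, 38.5)))  # False
```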
239

Improving Artist Content Matching with Stacking: A comparison of meta-level learners for stacked generalization

Magnússon, Fannar January 2018 (has links)
Using automatic methods to assign incoming tracks and albums from multiple sources to artist entities in a digital rights management company, where no universal artist identifier is available and artist names can be ambiguous, is a challenging problem. In this work we propose to use stacked generalization to combine the predictions of heterogeneous classifiers for improved quality of artist content matching on two datasets from a digital rights management company. We compare the performance of a non-linear meta-level learner against a linear meta-level learner for the stacked generalization on the two datasets, as well as on eight additional datasets, to see how well our results generalize. We conduct experiments and evaluate how the different meta-level learners perform when using the base learners' class probabilities, or a combination of the base learners' class probabilities and the original input features, as meta-features. Our results indicate that stacking with a non-linear meta-level learner can improve predictions on the artist chooser problem. Furthermore, they indicate that when using a linear meta-level learner for stacked generalization, the base learners' class probabilities work best as meta-features, while a combination of the base learners' class probabilities and the original input features works best with a non-linear meta-level learner. Among all the evaluated stacking approaches, stacking with a non-linear meta-level learner, using a combination of the base learners' class probabilities and the original input features as meta-features, performs best in our experiments over the ten evaluation datasets.
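A minimal sketch of the setup described above (assumed, not the thesis pipeline) can be written with scikit-learn's StackingClassifier: heterogeneous base learners, a non-linear meta-level learner, and passthrough=True so the meta-features combine the base learners' class probabilities with the original input features. The synthetic data stands in for the artist-matching datasets.

```python
# Stacked generalization with a non-linear meta-level learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=MLPClassifier(max_iter=2000, random_state=0),  # non-linear
    stack_method="predict_proba",  # base learners' class probabilities
    passthrough=True,              # also pass the original input features
)
print(stack.fit(X, y).score(X, y))
```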
240

A Deep-Learning Approach to Evaluating the Navigability of Off-Road Terrain from 3-D Imaging

Pech, Thomas Joel 30 August 2017 (has links)
No description available.
