261 |
Free-text Informed Duplicate Detection of COVID-19 Vaccine Adverse Event Reports
Turesson, Erik January 2022
To increase medicine safety, researchers use adverse event reports to assess causal relationships between drugs and suspected adverse reactions. VigiBase, the world's largest database of such reports, collects data from numerous sources, introducing the risk of several records referring to the same case. These duplicates negatively affect the quality of data and its analysis. Thus, efforts should be made to detect and clean them automatically. Today, VigiBase holds more than 3.8 million COVID-19 vaccine adverse event reports, making deduplication a challenging problem for existing solutions employed in VigiBase. This thesis project explores methods for this task, explicitly focusing on records with a COVID-19 vaccine. We implement Jaccard similarity, TF-IDF, and BERT to leverage the abundance of information contained in the free-text narratives of the reports. Mean-pooling is applied to create sentence embeddings from word embeddings produced by a pre-trained SapBERT model fine-tuned to maximise the cosine similarity between narratives of duplicate reports. Narrative similarity is quantified by the cosine similarity between sentence embeddings. We apply a Gradient Boosted Decision Tree (GBDT) model for classifying report pairs as duplicates or non-duplicates. For a more calibrated model, logistic regression fine-tunes the leaf values of the GBDT. In addition, the model successfully implements a ruleset to find reports whose narratives mention a unique identifier of its duplicate. The best performing model achieves 73.3% recall and zero false positives on a controlled testing dataset for an F1-score of 84.6%, vastly outperforming VigiBase’s previously implemented model's F1-score of 60.1%. Further, when manually annotated by three reviewers, it reached an average 87% precision when fully deduplicating 11756 reports amongst records relating to hearing disorders.
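The similarity measures and the reported F1-score can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the whitespace tokenisation, and the toy vectors are assumptions made for illustration only. The last function checks the reported figures: zero false positives means a precision of 1.0, which combined with 73.3% recall gives an F1-score of about 84.6%.

```python
import math


def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity between the sets of lowercase tokens of two texts."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty narratives are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def cosine(u, v) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy narratives (invented for illustration, not taken from VigiBase).
similarity = jaccard("fever after dose", "fever after second dose")

# Zero false positives implies precision = 1.0; recall is 73.3%.
reported_f1 = f1_score(1.0, 0.733)
```

In a real pipeline the token vectors passed to `mean_pool` would come from the language model; here any list of equal-length numeric lists works.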
|
262 |
Propuesta de sistema web para la optimización de búsqueda y selección de proveedores a través de georreferenciación y árboles de decisión en el sector de organización de eventos / Proposal for a web system to optimize the search and selection of suppliers through georeferencing and decision trees in the event organization sector
Mundaca Retuerto, Laura Jeanette, Mundaca Retuerto, Leslie Carol 14 December 2021
This thesis studies Fantastibox S.A.C., a company in the event organization industry that handles events such as birthdays, graduations, and corporate events, as well as the rental of instant photo booths.
The main motivation of the project is to address the company's central problem: long search times and poor choices of suppliers. These result from handling information manually and from the considerable time needed to apply the supplier selection criteria (price, location, years of experience in the service) required to guarantee a successful event.
This leads to lost business opportunities and low customer loyalty. In addition, the company does not adequately track the suppliers it has worked with in the past, which results in failed hires.
After identifying the problem and analyzing the company's processes, the project sets as its objective the implementation of a web system that automates the supplier search process and drives prices down through reverse auctions. Decision trees are also proposed for choosing the best suppliers for each service requested by the client.
The proposal is aligned with the company's objective of responding to more customer requests regardless of the date of the event. / Thesis
|
263 |
Machine Learning for Exploring State Space Structure in Genetic Regulatory Networks
Thomas, Rodney H. 01 January 2018
Genetic regulatory networks (GRN) offer a useful model for clinical biology. Specifically, such networks capture interactions among genes, proteins, and other metabolic factors. Unfortunately, it is difficult to understand and predict the behavior of networks that are of realistic size and complexity. In this dissertation, behavior refers to the trajectory of a state, through a series of state transitions over time, to an attractor in the network. This project assumes asynchronous Boolean networks, implying that a state may transition to more than one attractor. The goal of this project is to efficiently identify a network's set of attractors and to predict the likelihood with which an arbitrary state leads to each of the network’s attractors. These probabilities will be represented using a fuzzy membership vector.
Predicting fuzzy membership vectors using machine learning techniques may address the intractability posed by networks of realistic size and complexity. Modeling and simulation can be used to provide the necessary training sets for machine learning methods to predict fuzzy membership vectors. The experiments comprise several GRNs, each represented by a set of output classes. These classes consist of thresholds τ and ¬τ, where τ = [τ_low, τ_high]; a state s belongs to class τ if the probability of its transitioning to an attractor lies in the range [τ_low, τ_high]; otherwise it belongs to class ¬τ. Finally, each machine learning classifier was trained with the training sets that were previously collected. The objective is to explore methods to discover patterns for meaningful classification of states in realistically complex regulatory networks.
The research design took a GRN and a machine learning method as input and produced output class <A_τ> and its negation ¬<A_τ>. For each GRN, attractors were identified, data was collected by sampling each state to create fuzzy membership vectors, and machine learning methods were trained to predict whether a state is in a healthy attractor or not. For T-LGL, SVMs had the highest prediction accuracy (between 93.6% and 96.9%) and precision (between 94.59% and 97.87%). However, naive Bayesian classifiers had the highest recall (between 94.71% and 97.78%). All experiments were highly significant, with p-value < 0.0001. The contribution of this research is to let clinical biologists submit genetic states and get an initial result on their outcomes. For future work, this implementation could use other machine learning classifiers such as xgboost or deep learning methods. Another suggestion is to develop methods that improve the performance of state transitions, allowing larger training sets to be sampled.
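The sampling idea in this abstract can be sketched on a toy example. Everything below is an assumption for illustration, not the dissertation's code: the two-node network, the restriction to fixed-point attractors, and all function names are hypothetical. Under asynchronous updating one randomly chosen node is updated per step, so the state (0, 1) can end in either attractor, and the fraction of sampled trajectories reaching each attractor estimates the fuzzy membership vector.

```python
import random

# Hypothetical toy network: each of the two nodes copies the other node.
RULES = (
    lambda state: state[1],  # next value of node 0
    lambda state: state[0],  # next value of node 1
)


def is_fixed_point(state) -> bool:
    """A state is a fixed point if no node would change when updated."""
    return all(rule(state) == state[i] for i, rule in enumerate(RULES))


def run_to_attractor(state, rng, max_steps=1000):
    """Update one random node per step until a fixed point is reached."""
    state = tuple(state)
    for _ in range(max_steps):
        if is_fixed_point(state):
            return state
        i = rng.randrange(len(RULES))
        state = state[:i] + (RULES[i](state),) + state[i + 1:]
    return None  # no fixed point reached within max_steps


def membership_vector(state, attractors, samples=2000, seed=0):
    """Estimate the probability of reaching each attractor by sampling."""
    rng = random.Random(seed)
    counts = {attractor: 0 for attractor in attractors}
    for _ in range(samples):
        end = run_to_attractor(state, rng)
        if end in counts:
            counts[end] += 1
    return [counts[attractor] / samples for attractor in attractors]


def in_class_tau(probability, tau_low, tau_high) -> bool:
    """Class tau if the probability lies in [tau_low, tau_high], else not-tau."""
    return tau_low <= probability <= tau_high


ATTRACTORS = [(0, 0), (1, 1)]
vector = membership_vector((0, 1), ATTRACTORS)
```

For this toy network the estimate for (0, 1) should be close to [0.5, 0.5]. Networks with cyclic or complex attractors would need a different attractor test than `is_fixed_point`.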
|
264 |
Zjednoznačňování slovních významů / Word Sense Disambiguation
Kraus, Michal January 2008
This master's thesis deals with word sense disambiguation for Czech. It informs the reader about the history of the task and introduces the algorithms used: the naive Bayes classifier, the AdaBoost classifier, the maximum entropy method, and decision trees. The methods used are clearly demonstrated. The following parts describe the data used, and the last part presents the results achieved. The thesis closes with some ideas for improving the system.
|
265 |
Development of Adaptive Computational Algorithms for Manned and Unmanned Flight Safety
Elkin, Colin P. January 2018
No description available.
|
266 |
Electric utility planning methods for the design of one shot stability controls
Naghsh Nilchi, Maryam 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Reliability of the wide-area power system is becoming a greater concern as the power grid is growing. Delivering electric power from the most economical source through fewest and shortest transmission lines to customers frequently increases the stress on the system and prevents it from maintaining its stability. Events like loss of transmission equipment and phase to ground faults can force the system to cross its stability limits by causing the generators to lose their synchronism. Therefore, a helpful solution is detection of these dynamic events and prediction of instability.
Decision Trees (DTs) were used as a pattern recognition tool in this thesis. Based on training data, the DTs generated rules for detecting events, predicting loss of synchronism, and selecting a stabilizing control. To evaluate the accuracy of these rules, they were applied to testing data sets.
To train the DTs in this thesis, direct system measurements such as generator rotor angles and bus voltage angles were used, together with calculated indices such as the rate of change of bus angles, the Integral Square Bus Angle (ISBA), and the gradient of ISBA.
The initial method of this thesis included a response-based DT for instability prediction only. In this method, the time and location of the events were unknown, and the one shot control was applied when instability was predicted. The control took the form of fast power changes on four different buses. Further, an event detection DT was combined with the instability prediction, such that the data samples of each case were checked against the event detection DT rules. In cases where an event was detected, control was applied upon prediction of instability.
Later in the research, it was found that different control cases could differ in the number of cases they stabilize. Therefore, a third DT was trained to select between two different control cases to improve the effectiveness of the methodology.
It was learned through internship at Midwest Independent Transmission Operators (MISO) that post-event steady-state analysis is necessary for better understanding the effect of the faults on the power system. Hence, this study was included in this research.
|
267 |
GNSS Position Error Estimated by Machine Learning Techniques with Environmental Information Input / GNSS Positionsfelestimering genom Maskinlärningstekniker med Indata om Kringliggande Miljö
Kuratomi, Alejandro January 2019
In Intelligent Transport Systems (ITS), specifically in autonomous driving operations, accurate vehicle localization is essential for safe operations. The localization accuracy depends on both position and positioning error estimates. Technologies aiming to improve positioning error estimation are required and are currently being researched. This project investigated machine learning algorithms applied to positioning error estimation by assessing relevant information obtained from a GNSS receiver and adding environmental information coming from a camera mounted on a radio controlled vehicle testing platform. The research was done in two stages. The first stage consisted of training and testing the machine learning algorithms on existing GNSS data from Waysure's database, collected in tests run in 2016, which did not consider the environment surrounding the GNSS receiver used during the tests. The second stage consisted of training and testing the machine learning algorithms on GNSS data from new test runs carried out in May 2019, which include the environment surrounding the GNSS receiver used. The results of both stages are compared. The most relevant features, as identified by the decision tree algorithm, are presented. This report concludes that there is no statistical evidence indicating that the tested environmental input from the camera could improve positioning error estimation accuracy with the built machine learning models.
|
268 |
Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease. / Knowledge Discovery och Data mining med hjälp av demografiska och kliniska data för att diagnostisera hjärtsjukdomar.
Fernandez Sanchez, Javier January 2018
Cardiovascular disease (CVD) is the leading cause of morbidity, mortality, premature death and reduced quality of life for the citizens of the EU. It has been reported that CVD represents a major economic load on health care systems in terms of hospitalizations, rehabilitation services, physician visits and medication. Applying data mining techniques to clinical data has become an interesting tool to prevent, diagnose or treat CVD. In this thesis, Knowledge Discovery and Data Mining (KDD) was employed to analyse clinical and demographic data, which could be used to diagnose coronary artery disease (CAD). The exploratory data analysis (EDA) showed that elderly female patients with a higher level of cholesterol, maximum achieved heart rate and ST-depression are more prone to be diagnosed with heart disease. Furthermore, patients with atypical angina are more likely to be elderly, with a slightly higher level of cholesterol and maximum achieved heart rate than asymptomatic chest pain patients. Moreover, patients with exercise induced angina had lower values of maximum achieved heart rate than those who do not experience it. We could verify that patients who experience exercise induced angina and asymptomatic chest pain are more likely to be diagnosed with heart disease. In addition, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Tree, Bagging and Boosting methods were evaluated by adopting a stratified 10-fold cross-validation approach. The learning models provided an average F-score of 78-83% and a mean AUC of 85-88%. Among all the models, the highest score is given by Radial Basis Function Kernel Support Vector Machines (RBF-SVM), achieving an F-score of 82.5% ± 4.7% and an AUC of 87.6% ± 5.8%. Our research confirmed that data mining techniques can support physicians in their interpretations of heart disease diagnosis in addition to clinical and demographic characteristics of patients.
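The stratified 10-fold split mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the function name, the round-robin assignment, and the toy label list are hypothetical. The point of stratification is that every fold keeps roughly the class proportions of the full data set, so per-fold scores (and the reported mean ± spread) are comparable.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Shuffle within each class, then deal indices out to folds round-robin.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy labels (invented): 30 patients with heart disease, 70 without.
toy_labels = [1] * 30 + [0] * 70
toy_folds = stratified_folds(toy_labels, k=10)
```

Each fold then serves once as the test set while the other nine are used for training; averaging the ten scores gives figures of the "mean ± deviation" form quoted above.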
|
269 |
Anticipating bankruptcies among companies with abnormal credit risk behaviour : A case study adopting a GBDT model for small Swedish companies / Förutseende av konkurser bland företag med avvikande kreditrisks beteende : En fallstudie som använder en GBDT-modell för små svenska företag
Heinke, Simon January 2022
The field of bankruptcy prediction has attracted notably more interest in recent years, and Machine Learning (ML) techniques have been essential to developing more sophisticated models. Previous studies within bankruptcy prediction have not evaluated how well ML techniques adapt to data sets of companies with higher credit risks. This study introduces a binary decision rule for identifying companies with higher credit risks (abnormal companies). Two categories of abnormal companies are explored among small Swedish limited companies, based on the activity of: (1) abnormal credit risk analysis ("AC", herein) and (2) abnormal payment remarks ("AP", herein). Companies not fulfilling the abnormality criteria are considered normal ("NL", herein). The abnormal companies showed a significantly higher risk of future payment defaults than NL companies. Previous studies have mainly used financial features for bankruptcy prediction. This study evaluates the contribution of different feature categories: (1) financial, (2) qualitative, (3) performed credit risk analysis, and (4) payment remarks. Implementing a Light Gradient Boosting Machine (LightGBM), the study shows that bankruptcies are easier to anticipate among abnormal companies than among NL companies or all companies (full data set). LightGBM predicted bankruptcies with an average Area Under the Precision Recall Curve (AUCPR) of 45.92% and 61.97% for the AC and AP data sets, respectively. This performance is 6.13 to 27.65 percentage points higher than the AUCPR achieved on the NL and full data sets. The SHapley Additive exPlanations (SHAP) values indicate that financial features are the most critical category. However, qualitative features contribute strongly to anticipating bankruptcies among the NL companies and in the full data set. The features of performed credit risk analysis and payment remarks are primarily useful for the AC and AP data sets.
Finally, the study opens the field of bankruptcy prediction to two questions: (1) whether bankruptcies among companies with other forms of credit risk can be anticipated with even higher predictive performance and (2) whether other qualitative features bring even better predictive performance to bankruptcy prediction.
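The AUCPR metric used in this abstract can be illustrated with a small sketch. Average precision is one common estimator of the area under the precision-recall curve; whether the thesis computes AUCPR this way is an assumption, and the function name and toy data are hypothetical. The metric rewards rankings that place the rare positive class (bankruptcies) near the top.

```python
def average_precision(y_true, scores):
    """Average precision: mean of the precision at the rank of each positive.

    Samples are ranked by descending score. Ties keep their input order,
    which is a simplification compared with library implementations.
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positives")

    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


# Toy example (invented): positives ranked 1st and 3rd out of four companies.
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Here the two positives are found at ranks 1 and 3, so the result is (1/1 + 2/3) / 2, about 0.833.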
|
270 |
Use of Adaptive Mobile Applications to Improve Mindfulness
Boshoff, Wiehan 08 June 2018
No description available.
|