81 |
以文件分類技術預測股價趨勢 / Predicting Trends of Stock Prices with Text Classification Techniques. 陳俊達, Chen, Jiun-da. Unknown Date (has links)
Movements in stock prices are the aggregate outcome of the decisions of many different investors in the securities market. The factors that drive these movements are numerous and complex, and news is one of them: news events are not only the main medium through which investors learn about a listed company's operations, but also one of the main factors that lead investors to set or revise their trading strategies. A stock's closing price level also hints at aggregate demand and supply; if the closing price is higher than the previous close, demand was stronger than supply on that trading day, and otherwise it was weaker. Correctly predicting an individual stock's closing price level in advance is therefore valuable, because the appropriate strategy (buy or sell) can be chosen while the current price is still below or above the previous close.
This thesis proposes a system, built on news documents, that uses text mining and classification techniques to predict whether an individual stock's closing price will rise or fall on the same trading day in the Taiwan stock market. Three classification models are proposed and evaluated: a naïve Bayes model, a k-nearest-neighbors model, and a hybrid model. Three sets of experiments, comparing classifier performance, the depth of the news training data, and the breadth of the news training data, assess the system's predictive performance. The results show that the proposed models remedy a common weakness in related work, namely high overall accuracy combined with very uneven per-class performance, and that they perform better than the NewsCATS system for the "UP" and "DOWN" categories, the classes that matter most for investor profit.
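The abstract names the classifiers but not their implementation. The following is a minimal illustrative sketch, under assumed data and an assumed hybrid rule (fall back to kNN when the naïve Bayes posterior is below a 0.6 threshold), of training a naïve Bayes and a k-nearest-neighbors text classifier on labeled news documents; it is not the thesis's actual pipeline.

```python
# Illustrative sketch only: predicting same-day closing-price direction from news text.
# The documents, labels ("UP"/"DOWN"), and hybrid rule are assumptions, not the thesis's design.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

news_docs = ["company reports record quarterly profit", "regulator fines firm over accounting"]
labels = np.array(["UP", "DOWN"])  # hypothetical training labels

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(news_docs)

nb = MultinomialNB().fit(X, labels)
knn = KNeighborsClassifier(n_neighbors=1).fit(X, labels)  # k=1 only because the toy corpus is tiny

def hybrid_predict(texts, threshold=0.6):
    """Use naïve Bayes when it is confident; otherwise defer to kNN (one possible hybrid rule)."""
    Xt = vectorizer.transform(texts)
    nb_proba = nb.predict_proba(Xt)
    nb_pred, knn_pred = nb.predict(Xt), knn.predict(Xt)
    confident = nb_proba.max(axis=1) >= threshold
    return np.where(confident, nb_pred, knn_pred)

print(hybrid_predict(["firm announces profit growth"]))
```

Bag-of-words TF-IDF features are a common choice for news text; the thesis may weight or select terms differently.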
|
82 |
應用共變異矩陣描述子及半監督式學習於行人偵測 / Semi-supervised learning for pedestrian detection with covariance matrix feature. 黃靈威, Huang, Ling Wei. Unknown Date (has links)
Pedestrian detection is an important yet challenging problem in object classification: body poses are flexible, clothing is loose and varied, and illumination changes constantly, all of which make recognition difficult. In this thesis, we employ the covariance matrix descriptor and propose an on-line learning classifier that combines a naïve Bayes classifier with a cascade support vector machine (SVM) to improve the precision and recall of pedestrian detection in still images.
Experimental results show that the on-line learning strategy improves precision and recall by up to 14% on data sets with difficult recognition conditions. Furthermore, even under the same initial training conditions, the method outperforms HOG + AdaBoost on the USC Pedestrian Detection Test Set, the INRIA Person dataset, and the Penn-Fudan Database for Pedestrian Detection and Segmentation.
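The covariance matrix descriptor is only named in the abstract. The sketch below shows one common way to build such a descriptor, following the general region-covariance idea: per-pixel features (coordinates, intensity, first and second derivatives) are computed over a detection window, and the window is summarized by the covariance matrix of those features. The exact feature set is an assumption, not necessarily the one used in the thesis.

```python
# Illustrative sketch: a region covariance descriptor for an image patch.
# The per-pixel feature set (x, y, intensity, |Ix|, |Iy|, |Ixx|, |Iyy|) is an assumption;
# the thesis may use a different combination.
import numpy as np

def covariance_descriptor(patch):
    """Return the d x d covariance matrix of per-pixel features for a grayscale patch."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ix = np.gradient(patch, axis=1)
    iy = np.gradient(patch, axis=0)
    ixx = np.gradient(ix, axis=1)
    iyy = np.gradient(iy, axis=0)
    # Stack features into an (h*w) x d matrix, one row per pixel.
    feats = np.stack([xs, ys, patch, np.abs(ix), np.abs(iy), np.abs(ixx), np.abs(iyy)], axis=-1)
    feats = feats.reshape(-1, feats.shape[-1])
    return np.cov(feats, rowvar=False)

patch = np.random.rand(64, 32)  # stand-in for a pedestrian-sized detection window
C = covariance_descriptor(patch)
print(C.shape)  # (7, 7): compact and fixed-size regardless of window size
```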
|
83 |
Avaliação da distorção harmônica total de tensão no ponto de acoplamento comum industrial usando o processo KDD baseado em medição / Evaluation of total voltage harmonic distortion at the industrial point of common coupling using the measurement-based KDD process. OLIVEIRA, Edson Farias de. 27 March 2018 (has links)
In recent decades, the manufacturing industry has introduced ever faster and more energy-efficient products for residential, commercial, and industrial use. Because these loads are nonlinear, they have contributed significantly to rising levels of voltage harmonic distortion driven by harmonic currents, as reflected in the Power Quality indicators of the Brazilian electricity distribution system. The steady increase in distortion levels, especially at the point of common coupling (PCC), is a growing concern for utilities and consumers alike, because of the power-quality problems it causes both in supply and in consumer installations, and it has motivated numerous studies. To contribute to this subject, this thesis proposes a procedure based on the Knowledge Discovery in Databases (KDD) process to identify the loads that most affect voltage harmonic distortion at the PCC. The proposed methodology uses computational intelligence and data mining techniques to analyze data collected by power-quality meters installed at the main loads and at the consumer's PCC, and thereby to establish the correlation between the harmonic currents of the nonlinear loads and the harmonic distortion at the PCC. The procedure consists of analyzing the loads and the layout of the site where the methodology is applied, choosing and installing the power-quality meters, and applying the complete KDD process, including data collection, selection, cleaning, integration, transformation and reduction, mining, interpretation, and evaluation. The Decision Tree and Naïve Bayes data mining techniques were applied, and several algorithms were tested in search of the one giving the most significant results for this type of analysis, as presented in the results. The results show that the KDD process is applicable to the analysis of total voltage harmonic distortion at the point of common coupling, and the thesis contributes a complete description of each step of the process. Different data-balancing ratios, training/test splits, and scenarios over different analysis shifts were compared, with good performance, which suggests the approach can be applied to other types of consumers and to energy distribution companies. For the chosen application and across the different scenarios, the most impactful load for the collected data set was the seventh current harmonic of the air-conditioning units.
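As a rough illustration of the mining step described above, the sketch below trains the two named techniques (Decision Tree and Naïve Bayes) to relate per-load harmonic currents to a high/low voltage-THD class at the point of common coupling. The column names, values, and the 5% THD threshold are hypothetical and only stand in for the measured data.

```python
# Illustrative sketch of the mining step: relating per-load harmonic currents to a
# high/low THD class at the point of common coupling (PCC).
# The column names and the 5% THD threshold are hypothetical, not taken from the thesis.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Each row: one measurement interval from the power-quality meters.
df = pd.DataFrame({
    "ac_h7_current": [1.2, 3.4, 0.8, 4.1],      # 7th-harmonic current of air-conditioning units (A)
    "motors_h5_current": [0.5, 0.7, 0.4, 0.9],  # 5th-harmonic current of motor loads (A)
    "pcc_thd_voltage": [2.1, 6.3, 1.8, 7.0],    # measured voltage THD at the PCC (%)
})
X = df[["ac_h7_current", "motors_h5_current"]]
y = (df["pcc_thd_voltage"] > 5.0).astype(int)   # 1 = THD above an assumed 5% limit

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)
for model in (DecisionTreeClassifier(random_state=0), GaussianNB()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))
```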
|
84 |
Les chambres de leucoréduction sont une nouvelle source de cellules pour la génération de lignées de lymphocytes T en immunothérapie / Leukoreduction chambers are a new source of cells for generating T-lymphocyte lines for immunotherapy. Boudreau, Gabrielle. 10 1900 (has links)
No description available.
|
85 |
Redes probabilísticas de K-dependência para problemas de classificação binária / K-dependence probabilistic networks for binary classification problems. Souza, Anderson Luiz de. 28 February 2012 (has links)
Universidade Federal de Sao Carlos / Classification consists in discovering prediction rules that support planning and decision-making; it is an indispensable tool and a widely discussed topic in the literature. Credit risk rating is a special case of classification in which the goal is to identify good and bad paying customers using binary classification methods. In financial and other application settings, several techniques can be used, such as discriminant analysis, probit analysis, logistic regression, and neural networks. The technique of probabilistic networks, also known as Bayesian networks, has proved to be a practical and convenient classification method with successful applications in several areas. In this work, we present the application of probabilistic networks to classification, specifically the technique called K-dependence Bayesian networks (KDB networks), and compare its performance with conventional techniques in the contexts of credit scoring and medical diagnosis. As results, we present applications of the technique to real and artificial data sets, with its performance supported by the bagging procedure.
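scikit-learn does not implement K-dependence Bayesian (KDB) classifiers, so the sketch below only reproduces the shape of the comparison described above, using a bagged naïve Bayes classifier (the K = 0 special case of KDB) against logistic regression on synthetic binary "credit-scoring-style" data. The data set and all settings are assumptions.

```python
# Illustrative sketch of the comparison setup, using a bagged naïve Bayes as a stand-in
# for KDB, which scikit-learn does not implement. Data and settings are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # synthetic "good/bad payer" data

models = {
    "bagged naive Bayes": BaggingClassifier(GaussianNB(), n_estimators=25, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```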
|
86 |
Communication des organisations caritatives : processus socio-cognitifs dans la production et la réception : approches qualitative et expérimentale / The communication of charity organizations : socio-cognitive processes in the production and the reception : qualitative and experimental approaches. Bernard, Pascal. 26 November 2015 (has links)
Charities regularly ask millions of individuals for money to fund their work in the field. These media campaigns are a major stake: their purpose is to call for donations in order to raise the funds that allow charities to sustain their actions and to remain both financially and politically independent. However, no research in the literature so far has examined the socio-cognitive processes involved in this type of communication. Combining a qualitative and an experimental methodology within a multidisciplinary theoretical framework, drawing in particular on psychosocial models of reception, persuasive communication, and binding (commitment-based) communication, this dissertation pursues a twofold objective: to better understand the production and the reception processes of charity fundraising communication. In an action-research perspective, and given the human stakes involved, we also propose avenues for increasing the efficiency of these communication campaigns.
|
87 |
Machine Learning for Exploring State Space Structure in Genetic Regulatory Networks. Thomas, Rodney H. 01 January 2018
Genetic regulatory networks (GRN) offer a useful model for clinical biology. Specifically, such networks capture interactions among genes, proteins, and other metabolic factors. Unfortunately, it is difficult to understand and predict the behavior of networks that are of realistic size and complexity. In this dissertation, behavior refers to the trajectory of a state, through a series of state transitions over time, to an attractor in the network. This project assumes asynchronous Boolean networks, implying that a state may transition to more than one attractor. The goal of this project is to efficiently identify a network's set of attractors and to predict the likelihood with which an arbitrary state leads to each of the network’s attractors. These probabilities will be represented using a fuzzy membership vector.
Predicting fuzzy membership vectors using machine learning techniques may address the intractability posed by networks of realistic size and complexity. Modeling and simulation can be used to provide the necessary training sets for machine learning methods to predict fuzzy membership vectors. The experiments comprise several GRNs, each represented by a set of output classes. These classes consist of thresholds τ and ¬τ, where τ = [τlow, τhigh]; a state s belongs to class τ if the probability of its transitioning to a given attractor lies in the range [τlow, τhigh]; otherwise it belongs to class ¬τ. Finally, each machine learning classifier was trained with the training sets that were previously collected. The objective is to explore methods to discover patterns for meaningful classification of states in realistically complex regulatory networks.
The research design took a GRN and a machine learning method as input and produced the output class ⟨Aτ⟩ and its negation ¬⟨Aτ⟩. For each GRN, attractors were identified, data were collected by sampling each state to create fuzzy membership vectors, and machine learning methods were trained to predict whether a state is in a healthy attractor or not. For T-LGL, SVMs had the highest accuracy (between 93.6% and 96.9%) and precision (between 94.59% and 97.87%), while naive Bayesian classifiers had the highest recall (between 94.71% and 97.78%). All experiments were highly significant, with p-value < 0.0001. This research contributes a way for clinical biologists to submit genetic states and obtain an initial prediction of their outcomes. Future work could use other machine learning classifiers, such as XGBoost or deep learning methods; another suggestion is to develop methods that improve the performance of state-transition sampling so that larger training sets can be collected.
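As a concrete illustration of the sampling-and-thresholding design described above, the sketch below simulates random asynchronous updates of a tiny Boolean network to estimate how often each start state reaches a chosen attractor (one component of a fuzzy membership vector), labels states by whether that estimate falls in an assumed interval τ = [0.5, 1.0], and trains an SVM on the state vectors. The three-gene network, its update rules, the "healthy" attractor, and the thresholds are invented for illustration.

```python
# Illustrative sketch: estimate fuzzy attractor-membership values for an asynchronous
# Boolean network by Monte Carlo simulation, then train an SVM on thresholded labels.
# The 3-gene network, its update rules, and tau = [0.5, 1.0] are assumptions.
import itertools, random
from sklearn.svm import SVC

rules = {  # hypothetical Boolean update rules for genes (g0, g1, g2)
    0: lambda s: s[1] and not s[2],
    1: lambda s: s[0],
    2: lambda s: not s[0],
}

def simulate(state, steps=50):
    """Apply random asynchronous updates and return the final state (assumed near an attractor)."""
    s = list(state)
    for _ in range(steps):
        g = random.randrange(3)          # asynchronous: update one randomly chosen gene
        s[g] = int(rules[g](s))
    return tuple(s)

random.seed(0)
states = list(itertools.product([0, 1], repeat=3))
target = (0, 0, 1)                        # assumed "healthy" attractor (a fixed point of the rules above)
X, y = [], []
for state in states:
    runs = [simulate(state) for _ in range(200)]
    membership = runs.count(target) / len(runs)   # one component of the fuzzy membership vector
    X.append(list(state))
    y.append(int(0.5 <= membership <= 1.0))       # class tau vs. not-tau
clf = SVC().fit(X, y)
print(clf.predict([[1, 0, 0]]))
```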
|
88 |
Desarrollo de nuevos marcadores y clasificadores de bajo coste computacional para identificar afecciones cardiacas en registros ECG / Development of new low-computational-cost markers and classifiers to identify cardiac conditions in ECG recordings. Jiménez Serrano, Santiago. 07 September 2023 (has links)
Cardiovascular diseases are one of the leading causes of mortality and morbidity worldwide. Among the most common arrhythmias in adults, atrial fibrillation (AF) stands out, with a very significant growth trend, especially among the elderly and people with obesity. At the other extreme is arrhythmogenic cardiomyopathy (ACM), considered a rare disease with a prevalence of 1:2000-5000 but with a strong impact among direct relatives, a cause of sudden cardiac death (SCD), and difficult to diagnose clinically. Beyond AF and ACM, a wide variety of pathologies arise from dysfunctions in the electrical activation and conduction of the heart.
For all of them, the electrocardiogram (ECG) remains the first and foremost clinical diagnostic technique and a fundamental screening tool that is relatively inexpensive and widely accessible. However, accurate diagnosis from ECG interpretation requires experienced physicians; it consumes resources and time and is subject to inter-observer variability.
For the most common cardiac conditions, reliable automatic diagnosis, using either 12 leads or a reduced or single set of leads, remains a challenge. This is especially relevant given the increasingly widespread use of portable or wearable devices, which are attracting great interest for early and preventive detection of heart disease and typically record a reduced number of ECG leads. Their massive use gives them great potential for screening and monitoring different conditions in a wide variety of scenarios, even though they record lower-quality signals than equipment certified for clinical use. The main challenge with these devices is finding an adequate balance between sensitivity and specificity when detecting potentially pathological heart rhythms. Consequently, it is essential to design and implement accurate algorithms, suitable for mobile or portable devices, capable of detecting different cardiac conditions in ECG recordings.
For less common cardiac conditions such as ACM, it is necessary to increase detection sensitivity during the intra-family screenings carried out after an SCD. To this end, disease-specific biomarkers obtained through ECG signal processing techniques, together with classification models that use them, can be explored, thereby helping to reduce the number of sudden deaths.
Based on the above, this thesis studies diagnostic possibilities based on machine learning and classification techniques in two main scenarios. The first addresses the detection of AF, as well as a wide range of other common cardiac pathologies, for which we propose and validate several classification models with low computational cost, using extensive open-access databases and emphasizing single-lead approaches, since these are the most common in mobile and smart devices. The second scenario focuses on detecting ACM from the standard 12-lead ECG, for which we propose and validate new biomarkers and classification models that aim to increase the sensitivity of the intra-family screenings carried out after an SCD. For this task, we used a specific database from the Familial Cardiopathies Unit of the Hospital Universitario y Politécnico La Fe de València. / Jiménez Serrano, S. (2023). Desarrollo de nuevos marcadores y clasificadores de bajo coste computacional para identificar afecciones cardiacas en registros ECG [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/196826
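As an illustration of what a low-computational-cost, single-lead classifier can look like, the sketch below builds an AF-style detector from RR-interval irregularity features and logistic regression. These features and this model are common choices in the AF-detection literature; they are not the markers or classifiers proposed in the thesis, and the training data are synthetic.

```python
# Illustrative sketch only: a low-cost, single-lead style AF detector built from
# RR-interval irregularity features and logistic regression. Features, data, and
# model are common choices in the AF literature, not the thesis's actual method.
import numpy as np
from sklearn.linear_model import LogisticRegression

def rr_features(rr_intervals):
    """Simple irregularity features from a sequence of RR intervals (in seconds)."""
    rr = np.asarray(rr_intervals, dtype=float)
    diffs = np.diff(rr)
    rmssd = np.sqrt(np.mean(diffs ** 2))          # root mean square of successive differences
    cv = np.std(rr) / np.mean(rr)                 # coefficient of variation
    pnn50 = np.mean(np.abs(diffs) > 0.05)         # fraction of successive differences > 50 ms
    return [rmssd, cv, pnn50]

# Hypothetical training data: regular (sinus) vs. irregular (AF-like) RR sequences.
rng = np.random.default_rng(0)
sinus = [0.8 + 0.02 * rng.standard_normal(60) for _ in range(50)]
af_like = [0.8 + 0.20 * rng.standard_normal(60) for _ in range(50)]
X = np.array([rr_features(s) for s in sinus + af_like])
y = np.array([0] * 50 + [1] * 50)                 # 0 = normal rhythm, 1 = AF-like

clf = LogisticRegression().fit(X, y)              # tiny model, cheap enough for wearables
print(clf.predict([rr_features(0.8 + 0.25 * rng.standard_normal(60))]))
```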
|
89 |
Geo-Locating Tweets with Latent Location Information. Lee, Sunshin. 13 February 2017
As part of our work on the NSF funded Integrated Digital Event Archiving and Library (IDEAL) project and the Global Event and Trend Archive Research (GETAR) project, we collected over 1.4 billion tweets using over 1,000 keywords, key phrases, mentions, or hashtags, starting from 2009. Since many tweets talk about events (with useful location information), such as natural disasters, emergencies, and accidents, it is important to geo-locate those tweets whenever possible.
Due to possible location ambiguity, finding a tweet's location is often challenging. Many distinct places share the same geoname; e.g., "Greenville" matches 50 different locations in the U.S.A. Frequently, explicit location information in tweets, such as mentioned geonames, is insufficient, because tweets are often brief and incomplete; due to the 140-character limit, they carry only a small fraction of the full location information of an event. Location indicative words (LIWs) may carry latent location information; for example, "Water main break near White House" contains no geonames, but the key phrase 'White House' ties it to the location "1600 Pennsylvania Ave NW, Washington, DC 20500 USA".
To disambiguate tweet locations, we first extracted geospatial named entities (geonames) and predicted implicit state (e.g., Virginia or California) information from entities using machine learning algorithms including Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest (RF). Implicit state information helps reduce ambiguity. We also studied how location information of events is expressed in tweets and how latent location indicative information can help to geo-locate tweets. We then used a machine learning (ML) approach to predict the implicit state using geonames and LIWs.
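The sketch below illustrates the state-prediction step just described: geonames and location indicative words extracted from a tweet become features for a classifier that predicts the implicit U.S. state. The tiny training set, its labels, and the tokenization are assumptions; the dissertation uses the Stanford NER and far larger collections.

```python
# Illustrative sketch of the state-prediction step: geonames extracted from a tweet and
# location-indicative words (LIWs) become features for a classifier that predicts the
# implicit U.S. state. Training examples, labels, and tokenizer are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each training example: geonames + LIWs found in a tweet, joined into one string.
features = [
    "greenville pothole main_street",
    "blacksburg water_main_break drillfield",
    "white_house water_main_break pennsylvania_ave",
]
states = ["SC", "VA", "DC"]  # hypothetical state labels for the examples above

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(features, states)
print(model.predict(["pothole near greenville courthouse"]))
```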
We conducted experiments with tweets (e.g., about potholes), and found significant improvement in disambiguating tweet locations using an ML algorithm along with the Stanford NER. Adding state information predicted by our classifiers increased the ability to find the state-level geo-location unambiguously by up to 80%. We also studied over 6 million tweets (three mid-size and two large collections about water main breaks, sinkholes, potholes, car crashes, and car accidents), covering 17 months. We found that up to 91.1% of tweets have at least one type of location information (geo-coordinates or geonames) or LIWs. We also demonstrated that in most cases adding LIWs helps geo-locate tweets with less ambiguity using a geo-coding API. Finally, we conducted additional experiments with the five different tweet collections, and found significant improvement in disambiguating tweet locations using an ML approach with geonames and all LIWs present in the tweet texts as features. / Ph. D.
Due to possible location ambiguity, finding a tweet’s location often is challenging. Many distinct places have the same geoname, e.g., “Greenville” matches 50 different locations in the U.S.A. Frequently, in tweets, explicit location information, like geonames mentioned, is insufficient, because tweets are often brief and incomplete. They have a small fraction of the full location information of an event due to the 140 character limitation. Location indicative words (LIWs) may include latent location information, for example, “Water main break near White House” does not have any geonames but it is related to a location “1600 Pennsylvania Ave NW, Washington, DC 20500 USA” indicated by the key phrase ‘White House’.
To disambiguate tweet locations, we first extracted geonames, and then predicted implicit state (e.g., Virginia or California) information from entities using machine learning (ML) algorithms (wherein computers learn from examples what state is appropriate). Implicit state information helps reduce ambiguity. We also studied how location information of events is expressed in tweets and how latent location indicative information can help to geo-locate tweets. We then used a ML approach to predict the implicit state using geonames and LIWs.
We conducted experiments with tweets (e.g., about potholes), and found significant improvement in disambiguating tweet locations using a ML algorithm along with the Stanford Named Entity Recognizer. Adding state information predicted by our classifiers increased the ability to find the state-level geo-location unambiguously by up to 80%. We also studied over 6 million tweets (in three mid-size and two big collections, about water main breaks, sinkholes, potholes, car crashes, and car accidents), covering 17 months. We found that up to 91.1% of tweets have at least one type of location information (geocoordinates or geonames), or LIWs. We also demonstrated that in most cases adding LIWs helps geo-locate tweets with less ambiguity using a geo-coding Web application (that converts addresses into geographic coordinates). Finally, we conducted additional experiments with the five different tweet collections, and found significant improvement in disambiguating tweet locations using a ML approach wherein the features considered are the geonames and all LIWs that are present in the tweet texts.
|
90 |
Analyse par apprentissage automatique des réponses fMRI du cortex auditif à des modulations spectro-temporelles / Machine learning analysis of auditory cortex fMRI responses to spectro-temporal modulations. Bouchard, Lysiane. 12 1900 (has links)
The application of linear machine learning classifiers to the analysis of brain imaging (fMRI) data has led to several interesting breakthroughs in recent years. These classifiers linearly combine voxel responses to detect and categorize different brain states, allowing a more agnostic analysis than conventional fMRI methods, which systematically treat weak and distributed patterns as unwanted noise. In this project, we use such classifiers to validate a hypothesis about the encoding of sounds in the human brain. More precisely, we attempt to locate neurons in the primary auditory cortex that are tuned to the spectral and temporal modulations present in sounds, using fMRI recordings of subjects listening to 49 different spectro-temporal modulations. Analyzing fMRI data with linear classifiers is not yet standard practice in this field, and a long-term objective of this project is the development of new machine learning algorithms specialized for neuroimaging data. For these reasons, an important part of the experiments is devoted to studying the behaviour of the classifiers. We focus on three standard linear classifiers: the linear support vector machine, regularized logistic regression, and the Gaussian naïve Bayes model with shared variances.
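The sketch below compares the three linear classifiers named above on a synthetic trials-by-voxels response matrix. The data, labels, and hyperparameters are assumptions, and scikit-learn's GaussianNB estimates per-class variances, so it is only a close stand-in for the shared-variance Gaussian naïve Bayes model described in the abstract; real fMRI analyses also require preprocessing and cross-validation schemes that respect runs and subjects.

```python
# Illustrative sketch: comparing three linear classifiers on a synthetic "voxel response"
# matrix (trials x voxels). Data shape, labels, and the planted signal are assumptions.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 98, 500                   # e.g., 2 presentations of each of 49 modulations
X = rng.standard_normal((n_trials, n_voxels))  # stand-in for per-trial voxel responses
y = rng.integers(0, 2, size=n_trials)          # hypothetical binary condition labels
X[y == 1, :20] += 0.5                          # plant a weak, distributed signal in 20 voxels

classifiers = {
    "linear SVM": LinearSVC(),
    "regularized logistic regression": LogisticRegression(C=1.0, max_iter=1000),
    "Gaussian naive Bayes (sklearn variant, per-class variances)": GaussianNB(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```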
|