Global ETD Search

591	Investigation of peptide nucleic acid fluorescence in situ hybridization for diagnosis of ventilator-associated pneumonia in bronchoalveolar lavage specimens Phillips, Aaron M. 03 January 2014 (has links) Indiana University-Purdue University Indianapolis (IUPUI) QuickFISH PNA FISH VAP BAL ventilator-associated pneumonia bronchoalveolar lavage AdvanDx diagnosis Pneumonia -- Prevention Bronchoalveolar lavage -- Research Enteral feeding Artificial respiration -- Complications Airway extubation Nosocomial infections -- Transmission Peptides -- Biotechnology Nucleic acid probes -- Research Gastric acid -- Pathophysiology
592	Novel statistical approaches to text classification, machine translation and computer-assisted translation Civera Saiz, Jorge 04 July 2008 (has links) Esta tesis presenta diversas contribuciones en los campos de la clasificación automática de texto, traducción automática y traducción asistida por ordenador bajo el marco estadístico. En clasificación automática de texto, se propone una nueva aplicación llamada clasificación de texto bilingüe junto con una serie de modelos orientados a capturar dicha información bilingüe. Con tal fin se presentan dos aproximaciones a esta aplicación; la primera de ellas se basa en una asunción naive que contempla la independencia entre las dos lenguas involucradas, mientras que la segunda, más sofisticada, considera la existencia de una correlación entre palabras en diferentes lenguas. La primera aproximación dió lugar al desarrollo de cinco modelos basados en modelos de unigrama y modelos de n-gramas suavizados. Estos modelos fueron evaluados en tres tareas de complejidad creciente, siendo la más compleja de estas tareas analizada desde el punto de vista de un sistema de ayuda a la indexación de documentos. La segunda aproximación se caracteriza por modelos de traducción capaces de capturar correlación entre palabras en diferentes lenguas. En nuestro caso, el modelo de traducción elegido fue el modelo M1 junto con un modelo de unigramas. Este modelo fue evaluado en dos de las tareas más simples superando la aproximación naive, que asume la independencia entre palabras en differentes lenguas procedentes de textos bilingües. En traducción automática, los modelos estadísticos de traducción basados en palabras M1, M2 y HMM son extendidos bajo el marco de la modelización mediante mixturas, con el objetivo de definir modelos de traducción dependientes del contexto. Asimismo se extiende un algoritmo iterativo de búsqueda basado en programación dinámica, originalmente diseñado para el modelo M2, para el caso de mixturas de modelos M2. Este algoritmo de búsqueda n / Civera Saiz, J. (2008). Novel statistical approaches to text classification, machine translation and computer-assisted translation [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2502 Mixture modelling Em algorithm Bilingual text classification Machine-aided indexing Statistical machine translation Stochastic finite-state transducer Computer-assisted translation Word-based translation models N-gram language models LENGUAJES Y SISTEMAS INFORMATICOS 120304 - Inteligencia artificial 120317 - Informática
593	Contribution à la théorie des ondelettes : application à la turbulence des plasmas de bord de Tokamak et à la mesure dimensionnelle de cibles / Contribution to the wavelet theory : Application to edge plasma turbulence in tokamaks and to dimensional measurement of targets Scipioni, Angel 19 November 2010 (has links) La nécessaire représentation en échelle du monde nous amène à expliquer pourquoi la théorie des ondelettes en constitue le formalisme le mieux adapté. Ses performances sont comparées à d'autres outils : la méthode des étendues normalisées (R/S) et la méthode par décomposition empirique modale (EMD).La grande diversité des bases analysantes de la théorie des ondelettes nous conduit à proposer une approche à caractère morphologique de l'analyse. L'exposé est organisé en trois parties.Le premier chapitre est dédié aux éléments constitutifs de la théorie des ondelettes. Un lien surprenant est établi entre la notion de récurrence et l'analyse en échelle (polynômes de Daubechies) via le triangle de Pascal. Une expression analytique générale des coefficients des filtres de Daubechies à partir des racines des polynômes est ensuite proposée.Le deuxième chapitre constitue le premier domaine d'application. Il concerne les plasmas de bord des réacteurs de fusion de type tokamak. Nous exposons comment, pour la première fois sur des signaux expérimentaux, le coefficient de Hurst a pu être mesuré à partir d'un estimateur des moindres carrés à ondelettes. Nous détaillons ensuite, à partir de processus de type mouvement brownien fractionnaire (fBm), la manière dont nous avons établi un modèle (de synthèse) original reproduisant parfaitement la statistique mixte fBm et fGn qui caractérise un plasma de bord. Enfin, nous explicitons les raisons nous ayant amené à constater l'absence de lien existant entre des valeurs élevées du coefficient d'Hurst et de supposées longues corrélations.Le troisième chapitre est relatif au second domaine d'application. Il a été l'occasion de mettre en évidence comment le bien-fondé d'une approche morphologique couplée à une analyse en échelle nous ont permis d'extraire l'information relative à la taille, dans un écho rétrodiffusé d'une cible immergée et insonifiée par une onde ultrasonore / The necessary scale-based representation of the world leads us to explain why the wavelet theory is the best suited formalism. Its performances are compared to other tools: R/S analysis and empirical modal decomposition method (EMD). The great diversity of analyzing bases of wavelet theory leads us to propose a morphological approach of the analysis. The study is organized into three parts. The first chapter is dedicated to the constituent elements of wavelet theory. Then we will show the surprising link existing between recurrence concept and scale analysis (Daubechies polynomials) by using Pascal's triangle. A general analytical expression of Daubechies' filter coefficients is then proposed from the polynomial roots. The second chapter is the first application domain. It involves edge plasmas of tokamak fusion reactors. We will describe how, for the first time on experimental signals, the Hurst coefficient has been measured by a wavelet-based estimator. We will detail from fbm-like processes (fractional Brownian motion), how we have established an original model perfectly reproducing fBm and fGn joint statistics that characterizes magnetized plasmas. Finally, we will point out the reasons that show the lack of link between high values of the Hurst coefficient and possible long correlations. The third chapter is dedicated to the second application domain which is relative to the backscattered echo analysis of an immersed target insonified by an ultrasonic plane wave. We will explain how a morphological approach associated to a scale analysis can extract the diameter information Ondelettes Principe d'incertitude de Heisenberg Pavage temps-Fréquence Moments Régularité Support compact Fonction d'échelle Fonction d'ondelette Approximations Détails Résolution Filtre QMF Coefficients de Daubechies Analyse Reconstruction Précurseur Algorithme de Mallat Convolution Décimation Symétrisation Orthogonalisation Gram Schmidt Filtres frontières Complexité Décomposition modale empirique Méthode R/S Analyse des fluctuations redressées Fractal Auto-Similarité Plasma Bruit Gaussien fractionnaire Mouvement Brownien fractionnaire Ultrason Wavelets Heisenberg uncertainty principle Time-Frequency tiles Vanishing moments Regularity Compact support Scaling function Wavelet function Approximations Details Resolution QMF filters Daubechies coefficients Analysis Reconstruction Precursor Mallat algorithm Convolution Decimation Mirroring Orthogonalization Gram Schmidt Boundary filters Complexity Empirical Modal Decomposition R/S method Detrended fluctuations Analysis Fractal Self-Similarity Plasma Fractional Gaussian noise Fractional Brownian motion Ultrasound 530.44 515.243 3
594	Random parameters in learning: advantages and guarantees Evzenie Coupkova (18396918) 22 April 2024 (has links) <p dir="ltr">The generalization error of a classifier is related to the complexity of the set of functions among which the classifier is chosen. We study a family of low-complexity classifiers consisting of thresholding a random one-dimensional feature. The feature is obtained by projecting the data on a random line after embedding it into a higher-dimensional space parametrized by monomials of order up to k. More specifically, the extended data is projected n-times and the best classifier among those n, based on its performance on training data, is chosen. </p><p dir="ltr">We show that this type of classifier is extremely flexible, as it is likely to approximate, to an arbitrary precision, any continuous function on a compact set as well as any Boolean function on a compact set that splits the support into measurable subsets. In particular, given full knowledge of the class conditional densities, the error of these low-complexity classifiers would converge to the optimal (Bayes) error as k and n go to infinity. On the other hand, if only a training dataset is given, we show that the classifiers will perfectly classify all the training points as k and n go to infinity. </p><p dir="ltr">We also bound the generalization error of our random classifiers. In general, our bounds are better than those for any classifier with VC dimension greater than O(ln(n)). In particular, our bounds imply that, unless the number of projections n is extremely large, there is a significant advantageous gap between the generalization error of the random projection approach and that of a linear classifier in the extended space. Asymptotically, as the number of samples approaches infinity, the gap persists for any such n. Thus, there is a potentially large gain in generalization properties by selecting parameters at random, rather than optimization. </p><p dir="ltr">Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers that yield less than a given loss. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much. </p><p dir="ltr">We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio of the classification of normally distributed classes using an affine classifier. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family and we provide guarantees that such an estimation is close to the true value of the Rashomon ratio.</p> Pattern recognition Deep learning Probability theory Statistical data science random classifiers rashomon ratio approximation error generalization error classifier complexity sample complexity random projection neural network Polynomial expansion modified Gram matrix epsilon-cover chaining covering number Bayes error reducible error epsilon-net random parameters generalizability classification growth function VC dimension projection of the data labels training error test error
595	The research on chinese text multi-label classification / Avancée en classification multi-labels de textes en langue chinoise / 中文文本多标签分类研究 Wei, Zhihua 07 May 2010 (has links) Text Classification (TC) which is an important field in information technology has many valuable applications. When facing the sea of information resources, the objects of TC are more complicated and diversity. The researches in pursuit of effective and practical TC technology are fairly challenging. More and more researchers regard that multi-label TC is more suited for many applications. This thesis analyses the difficulties and problems in multi-label TC and Chinese text representation based on a mass of algorithms for single-label TC and multi-label TC. Aiming at high dimensionality in feature space, sparse distribution in text representation and poor performance of multi-label classifier, this thesis will bring forward corresponding algorithms from different angles.Focusing on the problem of dimensionality “disaster” when Chinese texts are represented by using n-grams, two-step feature selection algorithm is constructed. The method combines filtering rare features within class and selecting discriminative features across classes. Moreover, the proper value of “n”, the strategy of feature weight and the correlation among features are discussed based on variety of experiments. Some useful conclusions are contributed to the research of n-gram representation in Chinese texts.In a view of the disadvantage in Latent Dirichlet Allocation (LDA) model, that is, arbitrarily revising the variable in smooth process, a new strategy for smoothing based on Tolerance Rough Set (TRS) is put forward. It constructs tolerant class in global vocabulary database firstly and then assigns value for out-of-vocabulary (oov) word in each class according to tolerant class.In order to improve performance of multi-label classifier and degrade computing complexity, a new TC method based on LDA model is applied for Chinese text representation. It extracts topics statistically from texts and then texts are represented by using the topic vector. It shows competitive performance both in English and in Chinese corpus.To enhance the performance of classifiers in multi-label TC, a compound classification framework is raised. It partitions the text space by computing the upper approximation and lower approximation. This algorithm decomposes a multi-label TC problem into several single-label TCs and several multi-label TCs which have less labels than original problem. That is, an unknown text should be classified by single-label classifier when it is partitioned into lower approximation space of some class. Otherwise, it should be classified by corresponding multi-label classifier.An application system TJ-MLWC (Tongji Multi-label Web Classifier) was designed. It could call the result from Search Engines directly and classify these results real-time using improved Naïve Bayes classifier. This makes the browse process more conveniently for users. Users could locate the texts interested immediately according to the class information given by TJ-MLWC. / La thèse est centrée sur la Classification de texte, domaine en pleine expansion, avec de nombreuses applications actuelles et potentielles. Les apports principaux de la thèse portent sur deux points : Les spécificités du codage et du traitement automatique de la langue chinoise : mots pouvant être composés de un, deux ou trois caractères ; absence de séparation typographique entre les mots ; grand nombre d’ordres possibles entre les mots d’une phrase ; tout ceci aboutissant à des problèmes difficiles d’ambiguïté. La solution du codage en «n-grams »(suite de n=1, ou 2 ou 3 caractères) est particulièrement adaptée à la langue chinoise, car elle est rapide et ne nécessite pas les étapes préalables de reconnaissance des mots à l’aide d’un dictionnaire, ni leur séparation. La classification multi-labels, c'est-à-dire quand chaque individus peut être affecté à une ou plusieurs classes. Dans le cas des textes, on cherche des classes qui correspondent à des thèmes (topics) ; un même texte pouvant être rattaché à un ou plusieurs thème. Cette approche multilabel est plus générale : un même patient peut être atteint de plusieurs pathologies ; une même entreprise peut être active dans plusieurs secteurs industriels ou de services. La thèse analyse ces problèmes et tente de leur apporter des solutions, d’abord pour les classifieurs unilabels, puis multi-labels. Parmi les difficultés, la définition des variables caractérisant les textes, leur grand nombre, le traitement des tableaux creux (beaucoup de zéros dans la matrice croisant les textes et les descripteurs), et les performances relativement mauvaises des classifieurs multi-classes habituels. / 文本分类是信息科学中一个重要而且富有实际应用价值的研究领域。随着文本分类处理内容日趋复杂化和多元化，分类目标也逐渐多样化，研究有效的、切合实际应用需求的文本分类技术成为一个很有挑战性的任务，对多标签分类的研究应运而生。本文在对大量的单标签和多标签文本分类算法进行分析和研究的基础上，针对文本表示中特征高维问题、数据稀疏问题和多标签分类中分类复杂度高而精度低的问题，从不同的角度尝试运用粗糙集理论加以解决，提出了相应的算法，主要包括：针对n-gram作为中文文本特征时带来的维数灾难问题，提出了两步特征选择的方法，即去除类内稀有特征和类间特征选择相结合的方法，并就n-gram作为特征时的n值选取、特征权重的选择和特征相关性等问题在大规模中文语料库上进行了大量的实验，得出一些有用的结论。针对文本分类中运用高维特征表示文本带来的分类效率低，开销大等问题，提出了基于LDA模型的多标签文本分类算法，利用LDA模型提取的主题作为文本特征，构建高效的分类器。在PT3多标签分类转换方法下，该分类算法在中英文数据集上都表现出很好的效果，与目前公认最好的多标签分类方法效果相当。针对LDA模型现有平滑策略的随意性和武断性的缺点，提出了基于容差粗糙集的LDA语言模型平滑策略。该平滑策略首先在全局词表上构造词的容差类，再根据容差类中词的频率为每类文档的未登录词赋予平滑值。在中英文、平衡和不平衡语料库上的大量实验都表明该平滑方法显著提高了LDA模型的分类性能，在不平衡语料库上的提高尤其明显。针对多标签分类中分类复杂度高而精度低的问题，提出了一种基于可变精度粗糙集的复合多标签文本分类框架，该框架通过可变精度粗糙集方法划分文本特征空间，进而将多标签分类问题分解为若干个两类单标签分类问题和若干个标签数减少了的多标签分类问题。即，当一篇未知文本被划分到某一类文本的下近似区域时，可以直接用简单的单标签文本分类器判断其类别；当未知文本被划分在边界域时，则采用相应区域的多标签分类器进行分类。实验表明，这种分类框架下，分类的精确度和算法效率都有较大的提高。本文还设计和实现了一个基于多标签分类的网页搜索结果可视化系统（MLWC），该系统能够直接调用搜索引擎返回的搜索结果，并采用改进的Naïve Bayes多标签分类算法实现实时的搜索结果分类，使用户可以快速地定位搜索结果中感兴趣的文本。 La Classification de texte N-grams Codage de la texte chiniose La classification multi-labels Latent Dirichlet Model L’ensembles approximatifs Assouplissement Le corpus de textes chinois multi-labels Chinese text classification Text representation Multi-label classification Rough Set Latent Dirichlet Allocation (LDA) Classification method Smoothing model Chinese text multi-label corpus 中文文本分类文本表示多标签分类 N-gram 粗糙集隐含狄利克雷分配分类器设计同济多标签网页分类系统中文文本多标签语料库
596	Mining of Textual Data from the Web for Speech Recognition / Mining of Textual Data from the Web for Speech Recognition Kubalík, Jakub January 2010 (has links) Prvotním cílem tohoto projektu bylo prostudovat problematiku jazykového modelování pro rozpoznávání řeči a techniky pro získávání textových dat z Webu. Text představuje základní techniky rozpoznávání řeči a detailněji popisuje jazykové modely založené na statistických metodách. Zvláště se práce zabývá kriterii pro vyhodnocení kvality jazykových modelů a systémů pro rozpoznávání řeči. Text dále popisuje modely a techniky dolování dat, zvláště vyhledávání informací. Dále jsou představeny problémy spojené se získávání dat z webu, a v kontrastu s tím je představen vyhledávač Google. Součástí projektu byl návrh a implementace systému pro získávání textu z webu, jehož detailnímu popisu je věnována náležitá pozornost. Nicméně, hlavním cílem práce bylo ověřit, zda data získaná z Webu mohou mít nějaký přínos pro rozpoznávání řeči. Popsané techniky se tak snaží najít optimální způsob, jak data získaná z Webu použít pro zlepšení ukázkových jazykových modelů, ale i modelů nasazených v reálných rozpoznávacích systémech.

Page generated in 0.0455 seconds