281

Efficient multi-class object detection with a hierarchy of classes / Détection efficace des objets multi-classes avec une hiérarchie des classes

Odabai Fard, Seyed Hamidreza 20 November 2015 (has links)
Dans cet article, nous présentons une nouvelle approche de détection multi-classes basée sur un parcours hiérarchique de classifieurs appris simultanément. Pour plus de robustesse et de rapidité, nous proposons d’utiliser un arbre de classes d’objets. Notre modèle de détection est appris en combinant les contraintes de tri et de classification dans un seul problème d’optimisation. Notre formulation convexe permet d’utiliser un algorithme de recherche pour accélérer le temps d’exécution. Nous avons mené des évaluations de notre algorithme sur les benchmarks PASCAL VOC (2007 et 2010). Comparé à l’approche un-contre-tous, notre méthode améliore les performances pour 20 classes et gagne 10x en vitesse. / Recent years have witnessed a competition in autonomous navigation for vehicles boosted by the advances in computer vision. The on-board cameras are capable of understanding the semantic content of the environment. A core component of such a system is to localize and classify objects in urban scenes, hence the need for multi-class object detection systems. Designing such an efficient system is a challenging and active research area. These algorithms find applications in autonomous driving, object search in images and video surveillance. The number of object classes varies depending on the task. Datasets for object detection started out containing only one class, e.g. the popular INRIA Person dataset. Nowadays, we witness an expansion of the datasets in terms of both training data and number of object classes. This thesis proposes a solution to efficiently learn a multi-class object detector. The task of such a system is to localize all instances of the target object classes in an input image. We distinguish between three major efficiency criteria. First, the detection performance measures the accuracy of detection. Second, we strive for low execution times at run-time. Third, we address the scalability of our novel detection framework: the two previous criteria should scale suitably with the number of input classes, and the training algorithm has to take a reasonable amount of time when learning from these larger datasets. Although single-class object detection has seen considerable improvement over the years, it remains a challenge to create algorithms that work well with any number of classes. Most works on this subject extend single-class detectors to handle multiple classes, but remain hardly flexible with respect to new object descriptors. Moreover, they do not consider all three criteria at the same time. Others use a more traditional approach, iteratively executing a single-class detector for each target class, which scales linearly in both training time and run-time. To tackle these challenges, we present a novel framework in which, for an input patch during detection, the closest class is ranked highest and background labels are rejected as negative samples. The detection goal is to find the highest-scoring class. To this end, we derive a convex problem formulation that combines ranking and classification constraints. The accuracy of the system is improved by hierarchically arranging the classes into a tree of classifiers. The leaf nodes represent the individual classes, and the intermediate nodes, called super-classes, recursively group these classes together. The super-classes benefit from the knowledge shared by their descendant classes. All these classifiers are learned in a joint optimization problem along with the previously mentioned constraints.
The increased number of classifiers is prohibitive for rapid execution times. The formulation of the detection goal naturally allows the use of an adapted tree traversal algorithm that progressively searches for the best class while rejecting background samples early in the detection process, consequently reducing the system’s run-time. Our system balances detection performance and speed-up. We further experimented with feature reduction to decrease the overhead of applying the high-level classifiers in the tree. The framework is transparent to the object descriptor used; we implemented the histogram of oriented gradients and the deformable part model, both introduced in [Felzenszwalb et al., 2010a]. The capabilities of our system are demonstrated on two challenging datasets containing different object categories that are not necessarily semantically related. We evaluate both the detection performance with different numbers of classes and the scalability with respect to run-time. Our experiments show that this framework fulfills the requirements of a multi-class object detector and highlights the advantages of structuring class-level knowledge.
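
A minimal sketch of the tree-traversal idea described above, assuming a toy class hierarchy and randomly initialized linear scorers (the thesis learns all node classifiers jointly under ranking and classification constraints; the greedy descent and the reject threshold below are only illustrative):

```python
import numpy as np

class Node:
    """A node in the class hierarchy: leaves are object classes,
    inner nodes are super-classes with their own linear scorer."""
    def __init__(self, name, w, b, children=()):
        self.name, self.w, self.b = name, w, b
        self.children = list(children)

def traverse(node, x, reject_thresh=0.0):
    """Coarse-to-fine search: descend into the best-scoring child and reject
    the patch as background as soon as a node scores below the threshold."""
    score = float(node.w @ x + node.b)
    if score < reject_thresh:
        return None, score                     # early rejection: background
    if not node.children:
        return node.name, score                # reached a leaf class
    best = max(node.children, key=lambda c: float(c.w @ x + c.b))
    return traverse(best, x, reject_thresh)

# Toy usage with random weights standing in for jointly learned classifiers.
d = 16
rng = np.random.default_rng(0)
leaves = [Node(n, rng.normal(size=d), 0.0) for n in ("car", "bus", "person")]
root = Node("root", rng.normal(size=d), 0.0,
            [Node("vehicle", rng.normal(size=d), 0.0, leaves[:2]), leaves[2]])
print(traverse(root, rng.normal(size=d)))
```
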
282

Adaptation des techniques actuelles de scoring aux besoins d'une institution de crédit : le CFCAL-Banque / Adaptation of current scoring techniques to the needs of a credit institution : the Crédit Foncier et Communal d'Alsace et de Lorraine (CFCAL-banque)

Kouassi, Komlan Prosper 26 July 2013 (has links)
Les institutions financières sont, dans l’exercice de leurs fonctions, confrontées à divers risques, entre autres le risque de crédit, le risque de marché et le risque opérationnel. L’instabilité de ces facteurs fragilise ces institutions et les rend vulnérables aux risques financiers qu’elles doivent, pour leur survie, être à même d’identifier, analyser, quantifier et gérer convenablement. Parmi ces risques, celui lié au crédit est le plus redouté par les banques compte tenu de sa capacité à générer une crise systémique. La probabilité de passage d’un individu d’un état non risqué à un état risqué est ainsi au cœur de nombreuses questions économiques. Dans les institutions de crédit, cette problématique se traduit par la probabilité qu’un emprunteur passe d’un état de "bon risque" à un état de "mauvais risque". Pour cette quantification, les institutions de crédit recourent de plus en plus à des modèles de credit-scoring. Cette thèse porte sur les techniques actuelles de credit-scoring adaptées aux besoins d’une institution de crédit, le CFCAL-banque, spécialisé dans les prêts garantis par hypothèques. Nous présentons en particulier deux modèles non paramétriques (SVM et GAM) dont nous comparons les performances en termes de classification avec celles du modèle logit traditionnellement utilisé dans les banques. Nos résultats montrent que les SVM sont plus performants si l’on s’intéresse uniquement à la capacité de prévision globale. Ils exhibent toutefois des sensibilités inférieures à celles des modèles logit et GAM. En d’autres termes, ils prévoient moins bien les emprunteurs défaillants. Dans l’état actuel de nos recherches, nous préconisons les modèles GAM qui ont certes une capacité de prévision globale moindre que les SVM, mais qui donnent des sensibilités, des spécificités et des performances de prévision plus équilibrées. En mettant en lumière des modèles ciblés de scoring de crédit, en les appliquant sur des données réelles de crédits hypothécaires, et en les confrontant au travers de leurs performances de classification, cette thèse apporte une contribution empirique à la recherche relative aux modèles de credit-scoring. / In the exercise of their functions, financial institutions face a variety of risks, such as credit, market and operational risk. These risks are not only related to the nature of the activities they perform, but also depend on predictable external factors. The instability of these factors makes these institutions vulnerable to financial risks that they must, for their survival, appropriately identify, analyze, quantify and manage. Among these risks, credit risk is the most prominent due to its ability to generate a systemic crisis. The probability for an individual to switch from a riskless to a risky state is thus central to many economic issues. In credit institutions, this problem is reflected in the probability that a borrower switches from a state of “good risk” to a state of “bad risk”. For this quantification, banks increasingly rely on credit-scoring models. This thesis focuses on current credit-scoring techniques tailored to the needs of a credit institution, the CFCAL-banque, which specializes in mortgage-secured loans. We present in particular two nonparametric models (SVM and GAM) and compare their classification performance to that of the logit model traditionally used in banks. Our results show that SVM models are more effective if we focus only on global prediction performance. However, SVM models give lower sensitivities than the logit and GAM models.
In other words, the predictions of SVM models for defaulting borrowers are not as satisfactory as those of logit or GAM models. In the present state of our research, even though GAM models have lower global prediction capabilities, we recommend them because they give more balanced sensitivities, specificities and prediction performance. This thesis does not exhaustively cover scoring techniques for credit risk management. By highlighting targeted credit-scoring models, adapting and applying them to real mortgage data, and comparing their classification performance, it provides an empirical and methodological contribution to research on scoring models for credit risk management.
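
A minimal sketch of the kind of comparison the abstract describes, using scikit-learn's logistic regression and SVM on synthetic data (the CFCAL portfolio is not public, and the GAM model, which would need an extra library such as pygam, is omitted):

```python
# Compare a logit model and an SVM on an imbalanced synthetic credit dataset,
# reporting the sensitivity/specificity trade-off discussed in the abstract.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=2000, n_features=15, weights=[0.85],
                           random_state=0)          # 1 = defaulting borrower
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, model in [("logit", LogisticRegression(max_iter=1000)),
                    ("SVM", SVC(kernel="rbf", C=1.0, gamma="scale"))]:
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
    print(f"{name}: accuracy={(tp + tn) / len(y_te):.3f} "
          f"sensitivity={tp / (tp + fn):.3f} specificity={tn / (tn + fp):.3f}")
```
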
283

[en] SEMANTIC ROLE-LABELING FOR PORTUGUESE / [pt] ANOTADOR DE PAPEIS SEMÂNTICOS PARA PORTUGUÊS

ARTHUR BELTRAO CASTILHO NETO 23 June 2017 (has links)
[pt] A anotação de papeis semânticos (APS) é uma importante tarefa do processamento de linguagem natural (PLN), que possibilita estabelecer uma relação de significado entre os eventos descritos em uma sentença e seus participantes. Dessa forma, tem o potencial de melhorar o desempenho de inúmeros outros sistemas, tais como: tradução automática, correção ortográfica, extração e recuperação de informações e sistemas de perguntas e respostas, uma vez que reduz as ambiguidades existentes no texto de entrada. A grande maioria dos sistemas de APS publicados no mundo realiza a tarefa empregando técnicas de aprendizado supervisionado e, para obter melhores resultados, usam corpora manualmente revisados de tamanho considerável. No caso do Brasil, o recurso lexical que possui anotações semânticas (Propbank.br) é muito menor. Por isso, nos últimos anos, foram feitas tentativas de melhorar esse resultado utilizando técnicas de aprendizado semisupervisionado ou não-supervisionado. Embora esses trabalhos tenham contribuido direta e indiretamente para a área de PLN, não foram capazes de superar o desempenho dos sistemas puramente supervisionados. Este trabalho apresenta uma abordagem ao problema de anotação de papéis semânticos no idioma português. Utilizamos aprendizado supervisionado sobre um conjunto de 114 atributos categóricos e empregando duas técnicas de regularização de domínio, combinadas para reduzir o número de atributos binários em 96 por cento. O modelo gerado usa uma support vector machine com solver L2-loss dual support vector classification e é testado na base PropBank.br, apresentando desempenho ligeiramente superior ao estado-da-arte. O sistema é avaliado empiricamente pelo script oficial da CoNLL 2005 Shared Task, obtendo 82,17 por cento de precisão, 82,88 por cento de cobertura e 82,52 por cento de F1 ao passo que o estado-da-arte anterior atinge 83,0 por cento de precisão, 81,7 por cento de cobertura e 82,3 por cento de F1. / [en] Semantic role-labeling (SRL) is an important natural language processing (NLP) task that establishes meaningful relationships between the events described in a given sentence and their participants. It can therefore potentially improve the performance of a large number of NLP systems, such as automatic translation, spell correction, information extraction and retrieval, and question answering, as it decreases ambiguity in the input text. The vast majority of SRL systems reported so far employ supervised learning techniques to perform the task and, for better results, use large, manually reviewed corpora. The Brazilian semantic-role-labeled lexical resource (Propbank.br) is much smaller. Hence, in recent years, attempts have been made to improve performance using semi-supervised and unsupervised learning. Even though those studies made several direct and indirect contributions to NLP, they were not able to outperform purely supervised systems. This work presents an approach to the SRL task for the Portuguese language using supervised learning over a set of 114 categorical features. Over those, we apply a combination of two domain regularization methods to cut the number of binary features by 96 percent. We test an SVM model (L2-loss dual support vector classification) on the PropBank.Br dataset, achieving results slightly better than the state of the art. We empirically evaluate the system using the official CoNLL 2005 Shared Task script, obtaining 82.17 percent precision, 82.88 percent recall and 82.52 percent F1.
The previous state-of-the-art Portuguese SRL system scores 83.0 percent precision, 81.7 percent recall and 82.3 percent F1.
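
A minimal sketch of the classifier family named in the abstract (an L2-loss, i.e. squared hinge, dual SVM over one-hot-encoded categorical features via scikit-learn's LinearSVC), with hypothetical features and labels standing in for the 114-feature SRL pipeline:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical token-level feature dicts (the real system uses 114 features).
train_feats = [{"lemma": "dar", "pos": "V", "dep": "root"},
               {"lemma": "livro", "pos": "N", "dep": "dobj"},
               {"lemma": "Maria", "pos": "PROP", "dep": "nsubj"}]
train_labels = ["V", "A1", "A0"]

# One-hot encode the categorical features, then fit the L2-loss dual SVM.
srl = make_pipeline(DictVectorizer(sparse=True),
                    LinearSVC(loss="squared_hinge", dual=True, C=1.0))
srl.fit(train_feats, train_labels)
print(srl.predict([{"lemma": "Pedro", "pos": "PROP", "dep": "nsubj"}]))
```
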
284

Estudo da previsão de propriedades do biodiesel utilizando espectros de infravermelho e calibração multivariada / Study of prediction of biodiesel properties using infrared spectra and multivariate calibration

Camilla Lima Cunha 25 February 2014 (has links)
O biodiesel tem sido amplamente utilizado como uma fonte de energia renovável, que contribui para a diminuição de demanda por diesel mineral. Portanto, existem várias propriedades que devem ser monitoradas, a fim de produzir e distribuir biodiesel com a qualidade exigida. Neste trabalho, as propriedades físicas do biodiesel, tais como massa específica, índice de refração e ponto de entupimento de filtro a frio foram medidas e associadas a espectrometria no infravermelho próximo (NIR) e espectrometria no infravermelho médio (Mid-IR) utilizando ferramentas quimiométricas. Os métodos de regressão por mínimos quadrados parciais (PLS), regressão de mínimos quadrados parciais por intervalos (iPLS), e regressão por máquinas de vetor de suporte (SVM) com seleção de variáveis por Algoritmo Genético (GA) foram utilizadas para modelar as propriedades mencionadas. As amostras de biodiesel foram sintetizadas a partir de diferentes fontes, tais como canola, girassol, milho e soja. Amostras adicionais de biodiesel foram adquiridas de um fornecedor da região sul do Brasil. Em primeiro lugar, o pré-processamento de correção de linha de base foi usado para normalizar os dados espectrais de NIR, seguidos de outros tipos de pré-processamentos que foram aplicados, tais como centralização dos dados na média, 1 derivada e variação de padrão normal. O melhor resultado para a previsão do ponto de entupimento de filtro a frio foi utilizando os espectros de Mid-IR e o método de regressão GA-SVM, com alto coeficiente de determinação da previsão, R2Pred=0,96 e baixo valor da Raiz Quadrada do Erro Médio Quadrático da previsão, RMSEP (C)= 0,6. Para o modelo de previsão da massa específica, o melhor resultado foi obtido utilizando os espectros de Mid-IR e regressão por PLS, com R2Pred=0,98 e RMSEP (g/cm3)= 0,0002. Quanto ao modelo de previsão para o índice de refração, o melhor resultado foi obtido utilizando os espectros de Mid-IR e regressão por PLS, com excelente R2Pred=0,98 e RMSEP= 0,0001. Para esses conjuntos de dados, o PLS e o SVM demonstraram sua robustez, apresentando-se como ferramentas úteis para a previsão das propriedades do biodiesel estudadas / Biodiesel has been widely used as a renewable energy source, which contributes to decreasing the demand for mineral diesel. Therefore, there are several properties that must be monitored in order to produce and distribute biodiesel with the required quality. In this work, biodiesel physical properties such as specific mass, refractive index and cold filter plugging point were measured and associated with near-infrared (NIR) and mid-infrared (Mid-IR) spectra using chemometric tools. Partial Least Squares regression (PLS), interval Partial Least Squares regression (iPLS) and Support Vector Machine regression (SVM) with variable selection by Genetic Algorithm (GA) were used to model the aforementioned properties. The biodiesel samples were synthesized from different sources, such as canola, sunflower, corn and soybean. Additional biodiesel samples were purchased from a supplier in the South Region of Brazil. Firstly, baseline correction was used to normalize the NIR spectral data; then other preprocessing methods were applied, such as mean centering, the first derivative and standard normal variate.
The best result for predicting the cold filter plugging point was obtained using the Mid-IR spectra and the GA-SVM regression method, with a high coefficient of determination of prediction, R2Pred = 0.94, and a low Root Mean Square Error of Prediction, RMSEP (°C) = 0.7. For the specific mass prediction model, the best result was obtained using the Mid-IR spectra and PLS regression, with R2Pred = 0.98 and RMSEP (g/cm3) = 0.0002. As for the prediction model for the refractive index, the best result was also obtained using the Mid-IR spectra and PLS regression, with R2Pred = 0.98 and RMSEP = 0.0001. For these datasets, the PLS and SVM models demonstrated their robustness, proving to be useful tools for predicting the biodiesel properties studied.
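
A minimal sketch of the PLS modelling step, assuming synthetic spectra in place of the measured Mid-IR data, and reporting the coefficient of determination of prediction and RMSEP as in the abstract:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 400))                  # 120 samples x 400 wavenumbers
y = X[:, 50] * 0.002 + X[:, 200] * 0.001 + 0.88  # toy "specific mass" response

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)
y_hat = pls.predict(X_te).ravel()
print(f"R2Pred = {r2_score(y_te, y_hat):.3f}, "
      f"RMSEP = {mean_squared_error(y_te, y_hat) ** 0.5:.4f}")
```
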
285

Stochastic density ratio estimation and its application to feature selection / Estimação estocástica da razão de densidades e sua aplicação em seleção de atributos

Ígor Assis Braga 23 October 2014 (has links)
The estimation of the ratio of two probability densities is an important statistical tool in supervised machine learning. In this work, we introduce new methods of density ratio estimation based on the solution of a multidimensional integral equation involving cumulative distribution functions. The resulting methods use the novel V -matrix, a concept that does not appear in previous density ratio estimation methods. Experiments demonstrate the good potential of this new approach against previous methods. Mutual Information - MI - estimation is a key component in feature selection and essentially depends on density ratio estimation. Using one of the methods of density ratio estimation proposed in this work, we derive a new estimator - VMI - and compare it experimentally to previously proposed MI estimators. Experiments conducted solely on mutual information estimation show that VMI compares favorably to previous estimators. Experiments applying MI estimation to feature selection in classification tasks evidence that better MI estimation leads to better feature selection performance. Parameter selection greatly impacts the classification accuracy of the kernel-based Support Vector Machines - SVM. However, this step is often overlooked in experimental comparisons, for it is time consuming and requires familiarity with the inner workings of SVM. In this work, we propose procedures for SVM parameter selection which are economic in their running time. In addition, we propose the use of a non-linear kernel function - the min kernel - that can be applied to both low- and high-dimensional cases without adding another parameter to the selection process. The combination of the proposed parameter selection procedures and the min kernel yields a convenient way of economically extracting good classification performance from SVM. The Regularized Least Squares - RLS - regression method is another kernel method that depends on proper selection of its parameters. When training data is scarce, traditional parameter selection often leads to poor regression estimation. In order to mitigate this issue, we explore a kernel that is less susceptible to overfitting - the additive INK-splines kernel. Then, we consider alternative parameter selection methods to cross-validation that have been shown to perform well for other regression methods. Experiments conducted on real-world datasets show that the additive INK-splines kernel outperforms both the RBF and the previously proposed multiplicative INK-splines kernel. They also show that the alternative parameter selection procedures fail to consistently improve performance. Still, we find that the Finite Prediction Error method with the additive INK-splines kernel performs comparably to cross-validation. / A estimação da razão entre duas densidades de probabilidade é uma importante ferramenta no aprendizado de máquina supervisionado. Neste trabalho, novos métodos de estimação da razão de densidades são propostos baseados na solução de uma equação integral multidimensional. Os métodos resultantes usam o conceito de matriz-V , o qual não aparece em métodos anteriores de estimação da razão de densidades. Experimentos demonstram o bom potencial da nova abordagem com relação a métodos anteriores. A estimação da Informação Mútua - IM - é um componente importante em seleção de atributos e depende essencialmente da estimação da razão de densidades. 
Usando o método de estimação da razão de densidades proposto neste trabalho, um novo estimador - VMI - é proposto e comparado experimentalmente a estimadores de IM anteriores. Experimentos conduzidos na estimação de IM mostram que VMI atinge melhor desempenho na estimação do que métodos anteriores. Experimentos que aplicam estimação de IM em seleção de atributos para classificação evidenciam que uma melhor estimação de IM leva as melhorias na seleção de atributos. A tarefa de seleção de parâmetros impacta fortemente o classificador baseado em kernel Support Vector Machines - SVM. Contudo, esse passo é frequentemente deixado de lado em avaliações experimentais, pois costuma consumir tempo computacional e requerer familiaridade com as engrenagens de SVM. Neste trabalho, procedimentos de seleção de parâmetros para SVM são propostos de tal forma a serem econômicos em gasto de tempo computacional. Além disso, o uso de um kernel não linear - o chamado kernel min - é proposto de tal forma que possa ser aplicado a casos de baixa e alta dimensionalidade e sem adicionar um outro parâmetro a ser selecionado. A combinação dos procedimentos de seleção de parâmetros propostos com o kernel min produz uma maneira conveniente de se extrair economicamente um classificador SVM com boa performance. O método de regressão Regularized Least Squares - RLS - é um outro método baseado em kernel que depende de uma seleção de parâmetros adequada. Quando dados de treinamento são escassos, uma seleção de parâmetros tradicional em RLS frequentemente leva a uma estimação ruim da função de regressão. Para aliviar esse problema, é explorado neste trabalho um kernel menos suscetível a superajuste - o kernel INK-splines aditivo. Após, são explorados métodos de seleção de parâmetros alternativos à validação cruzada e que obtiveram bom desempenho em outros métodos de regressão. Experimentos conduzidos em conjuntos de dados reais mostram que o kernel INK-splines aditivo tem desempenho superior ao kernel RBF e ao kernel INK-splines multiplicativo previamente proposto. Os experimentos também mostram que os procedimentos alternativos de seleção de parâmetros considerados não melhoram consistentemente o desempenho. Ainda assim, o método Finite Prediction Error com o kernel INK-splines aditivo possui desempenho comparável à validação cruzada.
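
For illustration, a standard alternative estimator (not the V-matrix method proposed in the thesis) frames density ratio estimation as probabilistic classification between samples drawn from the two densities:

```python
# Estimate the ratio p(x)/q(x) by training a probabilistic classifier to
# separate samples of p (label 1) from samples of q (label 0); with equal
# sample sizes, r(x) ~= P(y=1|x) / P(y=0|x).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
x_p = rng.normal(loc=0.0, scale=1.0, size=(2000, 1))   # samples from p
x_q = rng.normal(loc=0.5, scale=1.2, size=(2000, 1))   # samples from q

X = np.vstack([x_p, x_q])
y = np.r_[np.ones(len(x_p)), np.zeros(len(x_q))]
clf = LogisticRegression().fit(X, y)

prob = clf.predict_proba(np.array([[0.0]]))[0]
print("estimated density ratio p/q at x=0:", prob[1] / prob[0])
```
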
286

Detecção e classificação de arritmias em eletrocardiogramas usando transformadas wavelets, máquinas de vetores de suporte e rede Bayesiana / Detection and classification of arrhythmias in electrocardiograms using wavelet transforms, support vector machines and a Bayesian network

Rodrigues, Luiz Carlos Ferreira 02 March 2012 (has links)
Cardiopathies are currently, according to the Ministério da Saúde, the second biggest cause of mortality among Brazilians, behind only cerebrovascular diseases. The motivation for the work presented here is the identification and classification of cardiopathies registered in electrocardiogram (ECG) exams, such as premature contractions, branch blocks, tachycardias and other rhythm disturbances. Due to its easy application and low cost, the ECG is one of the resources most commonly used by researchers and health professionals in the assessment of cardiac conditions. The computational application developed in this study relies on Wavelet Transforms for the digital processing of the ECG signal, on the extraction of morphological, dynamic and spectral characteristics of the signal cycles, and on the submission of these characteristics to two Support Vector Machines (SVMs). The outputs of these two SVMs are combined as input to a Bayesian network for the identification and classification of the cardiopathies. The morphological characteristics of each cycle are extracted through Principal Component Analysis (PCA), the spectral characteristics are extracted from the Wavelet Transform coefficients of the signal, and the dynamic characteristics are defined by the intervals between the global maxima of the cycles. For the development, testing and validation of the application we used the MIT-BIH Arrhythmia Database, made available by the Massachusetts Institute of Technology (MIT). At the end of this work we demonstrate that the application is able to recognize and classify 8 types of heartbeats in ECG records, with a mean accuracy above 95.0%. / As cardiopatias são atualmente, segundo o Ministério da Saúde, a segunda maior causa de mortalidade entre brasileiros, ficando atrás apenas das doenças cerebrovasculares. A motivação do trabalho aqui apresentado é a identificação e classificação de cardiopatias registradas em exames de Eletrocardiograma, o ECG, tais como contrações prematuras, bloqueio de ramos, taquicardias e outros distúrbios de ritmo. Devido a sua fácil aplicação e baixo custo, o ECG é um dos recursos mais largamente utilizados por pesquisadores e profissionais da saúde na avaliação da saúde do coração. A aplicação computacional desenvolvida neste estudo concentra-se no uso de Transformadas Wavelets para o processamento digital dos sinais de ECG, na extração das características morfológicas, dinâmicas e espectrais de ciclos do sinal e na submissão dessas características a duas Máquinas de Vetores de Suporte (SVM). Os resultados das SVM's são combinadas em uma Rede Bayesiana para a identificação e classificação das cardiopatias. As características morfológicas de cada ciclo do sinal são extraídas através de Análise de Componentes Principais (PCA), as características espectrais são extraídas através da decomposição do sinal em coeficientes de Transformadas Wavelets enquanto as características dinâmicas são definidas pelos intervalos entre o máximo global de cada ciclo. Para desenvolvimento, testes e validação da aplicação foi utilizado o Banco de Arritmias MIT-BIH, disponibilizado pelo Massachusetts Institute of Technology (MIT).
Neste trabalho demonstramos que a aplicação desenvolvida é capaz de reconhecer e classificar 8 tipos de batimentos cardíacos em registros de ECG, com uma acurácia média total de classificação superior a 95,0%.
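
A minimal sketch of the feature flow described in the abstract (wavelet coefficients per beat, PCA for dimensionality reduction, an SVM classifier), with random signals standing in for MIT-BIH beats and the Bayesian-network fusion stage omitted:

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
beats = rng.normal(size=(200, 256))        # 200 fake beats, 256 samples each
labels = rng.integers(0, 2, size=200)      # 0 = normal, 1 = premature beat

def wavelet_features(beat):
    coeffs = pywt.wavedec(beat, "db4", level=4)   # approximation + detail bands
    return np.concatenate(coeffs)

X = np.array([wavelet_features(b) for b in beats])
clf = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```
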
287

Descritor de movimento baseado em tensor e histograma de gradientes / Motion descriptor based on tensor and histogram of gradients

Perez, Eder de Almeida 24 August 2012 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / O reconhecimento de padrões de movimentos tem se tornado um campo de pesquisa muito atrativo nos últimos anos devido, entre outros fatores, à grande massificação de dados em vídeos e a tendência na criação de interfaces homem-máquina que utilizam expressões faciais e corporais. Esse campo pode ser considerado um dos requisitos chave para análise e entendimento de vídeos. Neste trabalho é proposto um descritor de movimentos baseado em tensores de 2a ordem e histogramas de gradientes (HOG - Histogram of Oriented Gradients). O cálculo do descritor é rápido, simples e eficaz. Além disso, nenhum aprendizado prévio é necessário sendo que a adição de novas classes de movimentos ou novos vídeos não necessita de mudanças ou que se recalculem os descritores já existentes. Cada quadro do vídeo é particionado e em cada partição calcula-se o histograma de gradientes no espaço e no tempo. A partir daí calcula-se o tensor do quadro e o descritor final é formado por uma série de tensores de cada quadro. O descritor criado é avaliado classificando-se as bases de vídeos KTH e Hollywood2, utilizadas na literatura atual, com um classificador Máquina Vetor Suporte (SVM). Os resultados obtidos na base KTH são próximos aos descritores do estado da arte que utilizam informação local do vídeo. Os resultados obtidos na base Hollywood2 não superam o estado da arte, mas são próximos o suficiente para concluirmos que o método proposto é eficaz. Apesar de a literatura apresentar descritores que possuem resultados superiores na classificação, suas abordagens são complexas e de alto custo computacional. / Motion pattern recognition has become a very attractive research field in recent years due, among other factors, to the large amount of video data and the trend toward human-machine interfaces that use facial and body expressions. This field can be considered one of the key requirements for video analysis and understanding. This thesis proposes a motion descriptor based on second-order tensors and histograms of oriented gradients. The calculation of the descriptor is fast, simple and effective. Furthermore, no prior learning is required, and adding new motion classes or new videos does not require changing or recomputing the existing descriptors. Each video frame is partitioned into a grid and the histogram of oriented gradients is computed, in space and time, in each cell. From these, the frame tensor is computed, and the final descriptor is formed by the series of frame tensors. The descriptor is evaluated by classifying the KTH and Hollywood2 video datasets, used in the current literature, with a Support Vector Machine (SVM) classifier. The results obtained on the KTH dataset are very close to those of state-of-the-art descriptors that use local information from the video.
The results obtained on the Hollywood2 dataset do not surpass the state of the art, but are close enough to conclude that the proposed method is effective. Although the literature presents descriptors with superior classification results, their approaches are complex and computationally expensive.
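
A minimal sketch of the descriptor idea (per-cell orientation histograms aggregated into a per-frame second-order, outer-product tensor and accumulated over frames), with illustrative grid and bin sizes that are not the thesis settings:

```python
import numpy as np

def frame_hog(frame, grid=(4, 4), bins=8):
    """Concatenated, L2-normalized orientation histograms over a spatial grid."""
    gy, gx = np.gradient(frame.astype(float))
    mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx) % np.pi
    hs, ws = frame.shape[0] // grid[0], frame.shape[1] // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            m = mag[i*hs:(i+1)*hs, j*ws:(j+1)*ws].ravel()
            a = ang[i*hs:(i+1)*hs, j*ws:(j+1)*ws].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-8)

def video_descriptor(frames):
    dim = frame_hog(frames[0]).size
    t = np.zeros((dim, dim))
    for f in frames:
        v = frame_hog(f)
        t += np.outer(v, v)                     # accumulate 2nd-order tensor
    return t[np.triu_indices_from(t)]           # symmetric: keep upper triangle

frames = np.random.default_rng(4).random((10, 64, 64))   # fake 10-frame clip
print(video_descriptor(frames).shape)
```
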
288

公司財務困境機率之評估—Logistic-SVM模型之應用 / The Evaluation of Companies' Probability of Financial Distress—The Application of Logistic-SVM Model

羅子欣, Luo, Zi Xin Unknown Date (has links)
近年來，在中國大陸市場有大量公司進行掛牌上市的同時，也有越來越多的公司出現債務逾期甚至是違約的情況。考慮到目前中國經濟增速放緩，處在轉型發展的複雜階段，銀行信貸等資金供應鏈需要謹慎評估企業出現財務困境的風險。但是我們發現金融機構在平常管理信貸業務的時候會盲目地看重高額利潤的回報而忽略借款者潛在的財務危機，而且投資人在進行投資分析的時候往往也會忽略企業的財務狀況而使自己遭受損失，因此從企業的財務狀況入手對其進行財務困境機率的評估有非常重大的現實意義。 本文通過對企業財務指標進行相關分析以構建公司財務困境機率評估模型。本文選取了不良貸款率最高的製造業作為研究對象，將2015年滬深兩地的124家上市製造業公司的財務資料作為訓練樣本，將2014年120家上市公司的財務資料作為檢驗樣本，將交易所特別處理公司劃分為非正常組公司，其餘為正常組。本文通過篩選得出23個財務指標作為研究變數，引入了 Logistic 模型與 SVM 模型，針對單一模型的預測結果在準確率和穩定性方面不理想的問題引入了基於 Logistic 模型、SVM 模型的組合模型，並用檢驗樣本進行了四個模型的相關實證分析，比較了四個模型之間的準確度。 對四個模型進行實證分析的結果表明：Logistic模型穩健性好、可解釋性強、建模過程簡單易操作，但分類精度略低於 SVM 模型；SVM雖然分類精度高，但缺乏可解釋性和穩定性，且建模過程依賴專家知識和經驗；Logistic -SVM 組合模型則兼具其優點，預測精確度較單一模型均有提高，而且研究發現異態並行結構優於串型結構。通過本文建立的模型可以計算出企業的陷入財務困境的機率，有效評估企業的違約風險，進而為相關金融機構和投資者提供放款或投資的判斷依據。 / At present, more and more companies are being listed on the Chinese mainland market; at the same time, more and more companies are falling overdue on their debts or even defaulting. Given the slowdown in China's economic growth and the complex stage of transformation and development, suppliers of funds such as bank credit need to carefully assess the risk of companies falling into financial distress. However, we find that, in the ordinary management of their credit business, financial institutions blindly value high profit returns while ignoring borrowers' potential financial crises, and investors often overlook a company's financial situation in their investment analysis and suffer losses as a result. It is therefore of great practical significance to assess the probability of financial distress starting from a company's financial status. This paper builds a model for assessing the probability of corporate financial distress by analyzing companies' financial indicators. It selects manufacturing, the industry with the highest non-performing loan ratio, as the research object. The financial data of 124 manufacturing companies listed in Shanghai and Shenzhen in 2015 are taken as the training sample and the financial data of 120 listed companies in 2014 as the test sample; companies under special treatment (ST) by the exchanges are assigned to the non-normal group and the rest to the normal group. From the 2015 financial data, 23 financial indicators were screened out as research variables and a comprehensive analysis was carried out. The Logistic model and the SVM model were introduced, and, to address the unsatisfactory accuracy and stability of each single model, combined models based on the Logistic and SVM models were introduced as well. Finally, an empirical analysis of the four models is carried out using the 2014 sample of listed companies, and the accuracy of the four models is compared. The empirical results show that the Logistic regression model imposes no strict assumptions on the data and offers better stability and interpretability, but its classification accuracy is slightly lower than that of the SVM model; the SVM model has higher classification accuracy, but lacks interpretability and stability, and its modeling process depends on expert knowledge and experience. To balance the stability of the Logistic model and the accuracy of the SVM model, this paper introduces combined models based on both. The analysis shows that the prediction accuracy of the combined model is higher than that of either single model, and that the parallel Logistic-SVM combination achieves higher prediction accuracy than the sequential one.
The model established in this paper can calculate the probability that an enterprise falls into financial distress, effectively assess a company's risk of financial distress, and thus provide a basis for the relevant financial institutions and investors to make lending or investment decisions.
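
A minimal sketch of a parallel Logistic-SVM combination via soft voting on synthetic data (the hyper-parameters and the 23 synthetic indicators are placeholders, not the thesis's variables):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=23, weights=[0.8],
                           random_state=0)   # 23 indicators, 1 = distressed

logit = LogisticRegression(max_iter=1000)
svm = SVC(kernel="rbf", probability=True)    # probabilities needed for soft voting
combo = VotingClassifier([("logit", logit), ("svm", svm)], voting="soft")

for name, model in [("logit", logit), ("svm", svm), ("combined", combo)]:
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```
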
289

De l'usage de la sémantique dans la classification supervisée de textes : application au domaine médical / On the use of semantics in supervised text classification : application in the medical domain

Albitar, Shereen 12 December 2013 (has links)
Cette thèse porte sur l’impact de l’usage de la sémantique dans le processus de la classification supervisée de textes. Cet impact est évalué au travers d’une étude expérimentale sur des documents issus du domaine médical et en utilisant UMLS (Unified Medical Language System) en tant que ressource sémantique. Cette évaluation est faite selon quatre scénarii expérimentaux d’ajout de sémantique à plusieurs niveaux du processus de classification. Le premier scénario correspond à la conceptualisation où le texte est enrichi avant indexation par des concepts correspondant dans UMLS ; le deuxième et le troisième scénario concernent l’enrichissement des vecteurs représentant les textes après indexation dans un sac de concepts (BOC – bag of concepts) par des concepts similaires. Enfin le dernier scénario utilise la sémantique au niveau de la prédiction des classes, où les concepts ainsi que les relations entre eux, sont impliqués dans la prise de décision. Le premier scénario est testé en utilisant trois des méthodes de classification: Rocchio, NB et SVM. Les trois autres scénarii sont uniquement testés en utilisant Rocchio qui est le mieux à même d’accueillir les modifications nécessaires. Au travers de ces différentes expérimentations nous avons tout d’abord montré que des améliorations significatives pouvaient être obtenues avec la conceptualisation du texte avant l’indexation. Ensuite, à partir de représentations vectorielles conceptualisées, nous avons constaté des améliorations plus modérées avec d’une part l’enrichissement sémantique de cette représentation vectorielle après indexation, et d’autre part l’usage de mesures de similarité sémantique en prédiction. / The main interest of this research is the effect of using semantics in the process of supervised text classification. This effect is evaluated through an experimental study on documents from the medical domain, using the UMLS (Unified Medical Language System) as a semantic resource. The evaluation follows four scenarios involving semantics at different steps of the classification process: the first scenario incorporates a conceptualization step in which the text is enriched with corresponding concepts from UMLS; the second and third scenarios concern enriching the vectors that represent texts as bags of concepts (BOC) with similar concepts; the last scenario uses semantics during class prediction, where concepts, as well as the relations between them, are involved in decision making. We test the first scenario using three popular classification techniques: Rocchio, NB and SVM. We choose Rocchio for the other scenarios because of its extendibility with semantics. Experimental results demonstrate significant improvement in classification performance when using conceptualization before indexing. Moderate improvements are reported when using the conceptualized text representation with semantic enrichment after indexing, or with text-to-text semantic similarity measures for prediction.
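
A minimal sketch of the first scenario (conceptualization before indexing), with a tiny hand-made concept dictionary standing in for UMLS (the concept IDs below are illustrative) and Rocchio approximated by a nearest-centroid classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import NearestCentroid   # Rocchio-style centroid classifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

CONCEPTS = {"myocardial infarction": "C0027051", "heart attack": "C0027051",
            "diabetes": "C0011849"}              # hypothetical UMLS-like mapping

def conceptualize(text):
    """Append the concept ID of every matched term to the raw text."""
    ids = [cui for term, cui in CONCEPTS.items() if term in text.lower()]
    return text + " " + " ".join(ids)

docs = ["Patient admitted after a heart attack.",
        "Type 2 diabetes managed with metformin.",
        "Myocardial infarction confirmed by ECG."]
labels = ["cardio", "endocrine", "cardio"]

for clf in (NearestCentroid(), MultinomialNB(), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit([conceptualize(d) for d in docs], labels)
    print(type(clf).__name__, model.predict([conceptualize("suspected heart attack")]))
```
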
290

Automatická klasifikace spánkových fází / Automatic sleep scoring

Schwanzer, Miroslav January 2019 (has links)
This master's thesis deals with the classification of sleep stages on the basis of polysomnographic signals. Analysis and feature extraction were performed on several signals, in both the time domain and the frequency domain. EEG, EOG and EMG signals were used for feature extraction. K-NN, SVM and artificial neural network models were selected for classification. Classification accuracy differs depending on the method used and on how the sleep stages are split. The best results were achieved for classification among the stages Wake, REM and N3 using the neural network; in this case the accuracy was 93.1 %.
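
A minimal sketch of such a pipeline, assuming band-power features per 30 s epoch and random signals in place of real polysomnographic recordings:

```python
import numpy as np
from scipy.signal import welch
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

FS = 100                                            # sampling rate (Hz), assumed
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(epoch):
    """Sum the Welch power spectral density over each frequency band."""
    freqs, psd = welch(epoch, fs=FS, nperseg=FS * 4)
    return [psd[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS.values()]

rng = np.random.default_rng(5)
epochs = rng.normal(size=(300, FS * 30))            # 300 fake 30 s epochs
stages = rng.choice(["Wake", "REM", "N3"], size=300)

X = np.array([band_powers(e) for e in epochs])
for clf in (KNeighborsClassifier(5), SVC(), MLPClassifier(max_iter=500)):
    print(f"{type(clf).__name__}: {clf.fit(X, stages).score(X, stages):.3f}")
```
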
