31

Uma comparação da aplicação de métodos computacionais de classificação de dados aplicados ao consumo de cinema no Brasil / A comparison of the application of data classification computational methods to the consumption of film at theaters in Brazil

Nathalia Nieuwenhoff 13 April 2017 (has links)
Machine learning techniques for data classification or categorization are increasingly used to extract information and patterns from voluminous databases in a variety of application areas. At the same time, applying these computational methods to identify patterns in, and classify, data on the consumption of information goods is a complex task, since consumption decisions reflect individual preferences and depend on a combination of individual characteristics and cultural, economic, and social variables, both segregated and grouped; it also remains a little-explored topic in the Brazilian market.
In this context, this work carried out an experimental study applying the Knowledge Discovery in Databases (KDD) process, including its data selection and data mining steps, to a binary classification problem: distinguishing Brazilian individuals who do and do not consume an information good, namely films at movie theaters, using microdata from the 2008-2009 Brazilian Household Budget Survey (POF) conducted by the Brazilian Institute of Geography and Statistics (IBGE). The experimental study resulted in a comparative analysis of two supervised machine learning techniques for data classification: Naïve Bayes (NB) and the Support Vector Machine (SVM). A systematic review conducted to identify studies applying machine learning techniques to the classification and identification of consumption patterns indicates that their use in this context is not yet a mature, well-developed research topic, since none of the papers analyzed addressed it. The results of the comparative analysis suggest that the choice of machine learning algorithm for data classification depends directly on factors such as: (i) the importance of each class for the problem under study; (ii) the balance between classes; and (iii) the universe of attributes considered, in terms of both their number and their importance to the classifier. In addition, the attributes selected by the Information Gain feature selection algorithm suggest that the decision to consume culture, and more specifically the information good studied here, films at movie theaters, is strongly related to individuals' income, educational level, and preferences for cultural goods.
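The comparison the abstract describes can be sketched in a few lines. The snippet below is a hypothetical stand-in rather than the thesis's actual pipeline: scikit-learn's mutual-information scorer plays the role of Information Gain, and synthetic data replaces the POF microdata, but the shape of the experiment is the same: select attributes, then compare Naïve Bayes against an SVM on an imbalanced binary consumption problem.

```python
# Hedged sketch of the NB-vs-SVM comparison; all data and parameters are
# illustrative stand-ins for the POF survey microdata used in the thesis.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: consumers (minority class) vs. non-consumers.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Information-gain-style attribute selection, then each classifier in turn.
for name, clf in [("Naive Bayes", GaussianNB()),
                  ("SVM", SVC(kernel="rbf", class_weight="balanced"))]:
    model = make_pipeline(StandardScaler(),
                          SelectKBest(mutual_info_classif, k=10), clf)
    model.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, model.predict(X_te)))
```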
32

Utilização do algoritmo de aprendizado de máquinas para monitoramento de falhas em estruturas inteligentes / Use of a machine learning algorithm for failure monitoring in smart structures

Guimarães, Ana Paula Alves January 2016 (has links)
Advisor: Vicente Lopes Junior / Abstract: Structural health monitoring (SHM) has been studied extensively because it enables systems that can identify damage at an early stage and thus avoid serious future losses. Ideally, such systems require minimal human interference, and systems built around the concept of learning can operate autonomously. These properties make machine learning algorithms an excellent choice for identifying, locating, and assessing damage, with the potential for highly accurate results and minimal error rates. This work focuses on using the support vector machine algorithm to monitor structural condition and thereby improve accuracy in detecting the presence or absence of damage, reducing error rates through machine learning approaches and enabling an intelligent, efficient monitoring system. The LIBSVM library was used to analyze and validate the proposed approach, making it feasible to train on and classify the data for damage identification and to locate the damage in the structure. The damage identification and localization results were quite satisfactory. / Master's
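As a rough illustration of the classification step, scikit-learn's SVC (a wrapper around the LIBSVM library that the thesis uses) can be trained to separate healthy from damaged structural states. The feature vectors below are synthetic placeholders, not the thesis's measurements.

```python
# Hedged sketch: binary damage classification with an SVM; the "response
# features" are randomly generated stand-ins for real sensor-derived features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(100, 8))   # baseline structural response
damaged = rng.normal(0.6, 1.2, size=(100, 8))   # shifted response under damage
X = np.vstack([healthy, damaged])
y = np.array([0] * 100 + [1] * 100)             # 0 = healthy, 1 = damaged

clf = SVC(kernel="rbf", C=10.0, gamma="scale")
print(cross_val_score(clf, X, y, cv=5).mean())  # mean classification accuracy
```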
33

Evaluation von Signaleigenschaften zur Lokalisierung von Einschlägen mit Piezokeramischen Sensoren / Evaluation of signal properties for the localization of impacts with piezoceramic sensors

Böhle, André 16 July 2019 (has links)
Smart components are of increasing interest in research and industry owing to their wide range of possible applications. One example is a current project of the Federal Cluster of Excellence MERGE, which is concerned with the development of a center console intended to serve as a control element in a motor vehicle and to trigger actions in response to touch. To enable this functionality, the electrical signals generated by piezoceramic sensors must be evaluated with respect to the location of the impact. To this end, various signal properties are examined for their suitability using a support vector machine. The results show that impact localization based on the energy content of the signals is feasible, but that it has limitations in practical use.
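A minimal sketch of the energy-based localization idea, with synthetic signals standing in for the piezoceramic measurements and hypothetical region labels: the energy of each sensor channel becomes a feature, and a support vector machine predicts the impact region.

```python
# Hedged sketch: per-channel signal energy as features for SVM-based impact
# localization. Signals and region labels are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC

def energy_features(signals):
    """Per-channel energy: sum of squared samples along the time axis."""
    return np.sum(signals ** 2, axis=-1)

rng = np.random.default_rng(1)
n_events, n_sensors, n_samples = 200, 4, 512
signals = rng.normal(size=(n_events, n_sensors, n_samples))
regions = rng.integers(0, n_sensors, size=n_events)   # impact region labels
for i, r in enumerate(regions):
    signals[i, r] *= 3.0          # sensor nearest the impact carries more energy

X = energy_features(signals)      # shape: (n_events, n_sensors)
clf = SVC(kernel="rbf").fit(X[:150], regions[:150])
print((clf.predict(X[150:]) == regions[150:]).mean())  # held-out accuracy
```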
34

An Efficient Ranking and Classification Method for Linear Functions, Kernel Functions, Decision Trees, and Ensemble Methods

Glass, Jesse Miller January 2020 (has links)
Structural algorithms incorporate the interdependence of outputs into the prediction, the loss, or both. Frank-Wolfe optimization of pairwise losses and Gaussian conditional random fields (GCRF) for multivariate output regression are two such structural algorithms. Pairwise losses are standard 0-1 classification surrogate losses applied to pairs of features and outputs, yielding improved ranking performance (area under the ROC curve, average precision, and F1 score) at the cost of increased learning complexity. In this dissertation, it is proven that the balanced-loss 0-1 SVM and the pairwise SVM have the same dual loss, and that the dual coefficient domain of the pairwise SVM is a subdomain of that of the balanced-loss 0-1 SVM with bias. This provides a theoretical advance in the understanding of pairwise losses, which we exploit to develop a novel ranking algorithm that is fast and memory-efficient and achieves state-of-the-art ranking-metric performance across eight benchmark data sets. Various practical advances are also made in multivariate output regression: the learning time for Gaussian conditional random fields is greatly reduced, and the parameter domain is expanded to enable repulsion between outputs. Lastly, a novel multivariate regression model is presented that keeps the desirable elements of the GCRF and infuses them into a local regression model, improving mean squared error and reducing learning complexity. / Computer and Information Science
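For concreteness, the pairwise loss at the heart of the dissertation can be written down directly: the standard hinge surrogate applied to every (positive, negative) pair of scores from a linear model, the AUC-oriented loss whose dual the dissertation relates to the balanced 0-1 SVM. This is a generic sketch, not the dissertation's Frank-Wolfe implementation.

```python
# Hedged sketch of the pairwise hinge loss for a linear scorer.
import numpy as np

def pairwise_hinge_loss(w, X_pos, X_neg):
    """Mean hinge loss over all (positive, negative) score pairs."""
    s_pos = X_pos @ w                             # positive-example scores
    s_neg = X_neg @ w                             # negative-example scores
    margins = s_pos[:, None] - s_neg[None, :]     # all pairwise differences
    return np.maximum(0.0, 1.0 - margins).mean()

rng = np.random.default_rng(0)
w = rng.normal(size=5)
print(pairwise_hinge_loss(w, rng.normal(1, 1, (30, 5)),
                             rng.normal(-1, 1, (40, 5))))
```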
35

Learning via Query Synthesis

Alabdulmohsin, Ibrahim Mansour 07 May 2017 (has links)
Active learning is a subfield of machine learning that has been used successfully in many applications. One of its main branches is query synthesis, in which the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has generally focused on finite concept classes or toy problems, with limited extension to real-world applications. In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The first is a maximum-determinant convex optimization method, while the second is a Markovian method that relies on Khachiyan's classical update formulas for solving linear programs. The common theme of these methods is to construct an ellipsoidal approximation of the version space and then synthesize queries via spectral decomposition. I also describe how these algorithms can be extended to other settings, such as pool-based active learning. Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can render query synthesis algorithms ineffective is to implement a randomized response. To this end, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distribution provides reliable predictions with high probability. This algorithm is then justified both theoretically and empirically. A second approach is to use a non-parametric classification method, such as similarity-based classification. I argue that learning via empirical kernel maps, also commonly referred to as the 1-norm Support Vector Machine (SVM) or Linear Programming (LP) SVM, is the best method for handling indefinite similarities; the advantages of this method are established both theoretically and empirically.
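The spectral query-synthesis idea can be caricatured in a few lines: summarize the version space of halfspaces consistent with the answers so far by an ellipsoid (here the covariance of rejection-sampled hypotheses, a deliberate simplification of the thesis's convex-optimization machinery) and synthesize the next query along the ellipsoid's longest axis, where uncertainty is greatest.

```python
# Hedged sketch: learn a halfspace by synthesizing queries along the longest
# axis of an ellipsoidal summary of the version space. Rejection sampling
# replaces the thesis's maximum-determinant / Markovian constructions.
import numpy as np

rng = np.random.default_rng(0)
d = 3
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)

queries, labels = [], []
for _ in range(8):
    # Sample unit-norm hypotheses consistent with all answers so far.
    W = rng.normal(size=(50_000, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for x, y in zip(queries, labels):
        W = W[np.sign(W @ x) == y]
    if len(W) < d + 1:                    # version space sample too small
        break
    eigvals, eigvecs = np.linalg.eigh(np.cov(W.T))
    x_next = eigvecs[:, -1]               # longest axis = most informative query
    queries.append(x_next)
    labels.append(np.sign(w_true @ x_next))

w_hat = W.mean(axis=0)                    # estimate from surviving hypotheses
w_hat /= np.linalg.norm(w_hat)
print("alignment with true halfspace:", abs(w_hat @ w_true))
```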
36

A machine learning approach to fundraising success in higher education

Ye, Liang 01 May 2017 (has links)
New donor acquisition and current donor promotion are the two major programs in fundraising for higher education, and developing proper targeting strategies plays an important role in both. This thesis presents machine learning solutions as targeting strategies for both programs, based on alumni data readily available in almost any institution. The targeting strategy for new donor acquisition is modeled as a donor identification problem. The Gaussian naïve Bayes, random forest, and support vector machine algorithms are used and evaluated. The test results show that, given enough training samples, all three algorithms can distinguish donors from rejectors well, and big donors are identified more often than others. While there is a trade-off between the cost of soliciting candidates and the success of donor acquisition, the results show that in a practical scenario where the models are properly used as the targeting strategy, more than 85% of new donors and more than 90% of new big donors can be acquired when only 40% of the candidates are solicited. The targeting strategy for donor promotion is modeled as a machine learning problem of predicting promising donors, i.e., those who will upgrade their pledge. The Gaussian naïve Bayes, random forest, and support vector machine algorithms are tested, and the results show that all three can distinguish promising donors from non-promising donors (those who will not upgrade their pledge). When age information is known, the best model produces an overall accuracy of 97% on the test set. The results show that in a practical scenario where the models are properly used as the targeting strategy, more than 85% of promising donors can be acquired when only 26% of the candidates are solicited. / Graduate / liangye714@gmail.com
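A sketch of how such a targeting strategy works in practice, with a generic classifier and synthetic data standing in for the institution's alumni records: score every candidate with an estimated donation probability, then solicit only the top fraction of the ranking.

```python
# Hedged sketch of probability-ranked solicitation; data and model choices
# are illustrative, not the thesis's tuned configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.85, 0.15], random_state=0)  # few donors
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]        # estimated donation probability

budget = int(0.40 * len(X_te))                  # solicit the top 40% only
solicited = np.argsort(scores)[::-1][:budget]
captured = y_te[solicited].sum() / y_te.sum()   # fraction of donors reached
print(f"donors captured by soliciting 40%: {captured:.0%}")
```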
37

Direct L2 Support Vector Machine

Zigic, Ljiljana 01 January 2016 (has links)
This dissertation introduces a novel model for solving the L2 support vector machine, dubbed the Direct L2 Support Vector Machine (DL2 SVM). DL2 SVM is a new classification model that transforms the SVM's underlying quadratic programming problem into a system of linear equations with nonnegativity constraints: the system's matrix is symmetric positive definite, and the solution vector must be nonnegative. Furthermore, this dissertation introduces a novel algorithm, dubbed the Non-Negative Iterative Single Data Algorithm (NN ISDA), which solves DL2 SVM's constrained system of equations. This solver shows significant speedup over several other state-of-the-art algorithms, and the training-time improvement comes at no cost; in other words, accuracy is kept at the same level. All experiments supporting this claim were conducted on various datasets within a strict double cross-validation scheme. DL2 SVM solved with NN ISDA trains faster on both medium and large datasets. In addition to the comprehensive DL2 SVM model, three of its variants are introduced and derived, and three different solvers for DL2 SVM's system of linear equations with nonnegativity constraints are implemented, presented, and compared.
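The core computational object, a symmetric positive definite system solved under nonnegativity constraints, can be illustrated with SciPy's generic NNLS solver. This is a stand-in for, not an implementation of, NN ISDA; the matrix below is simply an RBF kernel plus a ridge term rather than the exact DL2 SVM system.

```python
# Hedged sketch: nonnegative solution of an SPD linear system via SciPy's
# NNLS, standing in for the thesis's NN ISDA solver and DL2 SVM system.
import numpy as np
from scipy.optimize import nnls
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

K = rbf_kernel(X) + np.eye(50) / 2.0   # SPD matrix: kernel plus ridge term
b = np.ones(50)                        # placeholder right-hand side
alpha, residual = nnls(K, b)           # solve K @ alpha ≈ b with alpha >= 0
print(bool((alpha >= 0).all()), residual)
```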
38

Mining Aspects through Cluster Analysis Using Support Vector Machines and Genetic Algorithms

Hacoupian, Yourik 01 January 2013 (has links)
The main purpose of object-oriented programming is to use encapsulation to reduce the amount of coupling within each object; however, object-oriented programming has some weaknesses in this area. To address this shortcoming, researchers have proposed an approach known as aspect-oriented programming (AOP). AOP is intended to reduce the amount of tangled code within an application by grouping similar functions into an aspect. To demonstrate the power of AOP, it is necessary to extract aspect candidates from current object-oriented applications, and many different approaches have been proposed to accomplish this task. One such approach utilizes vector-based clustering to identify possible aspect candidates. In this study, two different types of vectors are applied to two different vector-based clustering techniques. Each method in a software system S is represented by a d-dimensional vector; these vectors take into account the fan-in values of the methods as well as the number of calls made to individual methods within the classes of S. A semi-supervised clustering approach known as Support Vector Clustering is then applied to the vectors, along with an improved K-means clustering approach based on genetic algorithms. The results obtained from these two approaches are evaluated using standard aspect-mining metrics. In addition to introducing two new clustering-based approaches to aspect mining, this research investigates how effectively the metrics currently used in aspect mining evaluate a given vector-based approach. Many of these metrics are singleton metrics, which evaluate an approach by considering only one aspect of a clustering technique. This study introduces two sets of metrics that combine these singleton measures: the iDIV metric combines the diversity of a partition (DIV), the intra-cluster distance of a partition (IntraD), and the percentage of methods analyzed (PAM) to measure the overall effectiveness of the diversity of the partitions, while the iDISP metric combines the dispersion of crosscutting concerns (DISP) with the inter-cluster distance of a partition (InterD) and the PAM value to measure the quality of the clusters formed by a given method. Lastly, the oDIV and oDISP metrics introduced here take into account the complexity of the algorithms in relation to the DIV and DISP values. By comparing the values obtained for each approach, this study identifies the best-performing method with respect to these metrics.
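The vector representation is easy to make concrete. In the sketch below, each hypothetical method is encoded as its fan-in value plus per-class call counts, and plain k-means stands in for the GA-improved k-means and Support Vector Clustering used in the study; high fan-in utility methods separate out as aspect candidates.

```python
# Hedged sketch: clustering method vectors (fan-in + per-class call counts)
# to surface crosscutting-concern candidates. Vectors are invented examples.
import numpy as np
from sklearn.cluster import KMeans

# Columns: [fan-in, calls from class A, calls from class B, calls from class C]
method_vectors = np.array([
    [12, 4, 5, 3],   # logging-style utility, called from many classes
    [11, 3, 4, 4],   # another widely called utility
    [ 1, 1, 0, 0],   # ordinary business-logic methods
    [ 2, 0, 2, 0],
    [ 1, 0, 0, 1],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(method_vectors)
print(km.labels_)    # high fan-in methods cluster together: aspect candidates
```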
39

Distributed Support Vector Machine Learning

Armond, Kenneth C., Jr. 07 August 2008 (has links)
Support Vector Machines (SVMs) are used for a growing number of applications. A fundamental constraint on SVM learning is the management of the training set, because the number of computations grows as the square of the training-set size. Typically, training sets of 1,000 (500 positives and 500 negatives, for example) can be managed on a PC without hard-drive thrashing; training sets of 10,000, however, simply cannot be managed with PC-based resources. For this reason, most SVM implementations must rely on some kind of chunking process that trains on parts of the data at a time (10 chunks of 1,000, for example, to learn the 10,000). Sequential and multi-threaded chunking methods provide a way to run the SVM on large datasets while retaining accuracy. The multi-threaded distributed SVM described in this thesis is implemented using Java RMI and was developed to run on a network of multi-core/multi-processor computers.
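The sequential variant of the chunking idea is simple to sketch: train on one chunk at a time, carrying forward only the support vectors so the working set stays small. The Python loop below illustrates the concept only; the thesis's system is a multi-threaded Java RMI implementation.

```python
# Hedged sketch of sequential chunked SVM training on 10,000 samples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=10_000, n_features=10, random_state=0)

X_work = np.empty((0, X.shape[1]))
y_work = np.empty(0, dtype=y.dtype)
for X_chunk, y_chunk in zip(np.array_split(X, 10), np.array_split(y, 10)):
    X_work = np.vstack([X_work, X_chunk])   # current chunk + kept support vectors
    y_work = np.concatenate([y_work, y_chunk])
    clf = SVC(kernel="rbf").fit(X_work, y_work)
    X_work = X_work[clf.support_]           # retain only the support vectors
    y_work = y_work[clf.support_]

print(len(X_work), "support vectors retained from 10,000 samples")
```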
40

A Semi-Supervised Information Extraction Framework for Large Redundant Corpora

Normand, Eric 19 December 2008 (has links)
The vast majority of text freely available on the Internet is not in a form that computers can understand. There have been numerous approaches to automatically extract information from human-readable sources. The most successful attempts rely on vast training sets of data; others have succeeded in extracting restricted subsets of the available information. These approaches have limited use and require domain knowledge to be coded into the application. The current thesis proposes a novel framework for Information Extraction. From large sets of documents, the system develops statistical models of the data the user wishes to query, which generally avoids the limitations and complexity of most Information Extraction systems. The framework uses a semi-supervised approach to minimize human input and eliminates the need for external Named Entity Recognition systems by relying on freely available databases. The final result is a query-answering system that extracts information from large corpora with a high degree of accuracy.
