1 |
Diversity and Efficiency: An Unexpected Result
Johnson, Joseph Smith, 01 May 2017
Empirical evidence shows that ensembles with adequate levels of pairwise diversity among a set of accurate member algorithms significantly outperform any of the individual algorithms. As a result, several diversity measures have been developed for use in optimizing ensembles. We show that diversity measures that properly combine the diversity space in an additive and multiplicative manner not only result in ensembles whose accuracy is comparable to that of the naive ensemble built by choosing the most accurate learners, but also result in ensembles that are significantly more efficient than such naive ensembles. In addition to diversity measures found in the literature, we submit two measures of diversity that span the diversity space in unique ways. Each of these measures considers not only the diversity of ratings between a pair of algorithms, but also how this diversity relates to the target values.
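As a rough illustration of a pairwise diversity measure that also looks at the target values, the sketch below computes the classic disagreement measure from the ensemble literature: the fraction of instances on which exactly one of two learners is correct. It is not one of the two measures proposed in the thesis, and the function names, learner names, and toy labels are all assumptions made for the example.

```python
from itertools import combinations


def disagreement(pred_a, pred_b, targets):
    """Fraction of instances on which exactly one of the two learners is correct."""
    if not (len(pred_a) == len(pred_b) == len(targets)) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    differing = sum(
        (a == t) != (b == t) for a, b, t in zip(pred_a, pred_b, targets)
    )
    return differing / len(targets)


def mean_pairwise_diversity(predictions, targets):
    """Average disagreement over all pairs of learners in `predictions` (a dict)."""
    pairs = list(combinations(sorted(predictions), 2))
    return sum(
        disagreement(predictions[i], predictions[j], targets) for i, j in pairs
    ) / len(pairs)


# Hypothetical toy data: three learners, six instances.
targets = [1, 0, 1, 1, 0, 0]
predictions = {
    "tree": [1, 0, 1, 0, 0, 1],
    "knn": [1, 1, 1, 1, 0, 0],
    "bayes": [0, 0, 1, 1, 0, 1],
}

print(disagreement(predictions["tree"], predictions["knn"], targets))
print(mean_pairwise_diversity(predictions, targets))
```

Because the measure compares each prediction with the target before comparing the two learners, two learners that make the same mistakes count as less diverse than two learners that err on different instances.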
|
2 |
Agent optimization by means of genetic programming
Šmíd, Jakub, January 2012
This thesis deals with the problem of choosing the most suitable agent for a new data mining task not yet seen by the agents. A metric is proposed on the space of data mining tasks, and similar tasks are identified based on this metric. This set of similar tasks is passed as input to a program evolved by means of genetic programming. The program estimates the agents' performance on the new task in terms of both time and error. A JADE agent is implemented that provides an interface allowing other agents to obtain the estimation results in real time.
|
3 |
Meta-aprendizagem aplicada à classificação de dados de expressão gênica / Meta-learning applied to gene expression data classification
Souza, Bruno Feres de, 26 October 2010
Among the most common applications involving microarrays is the classification of tissue samples, which is essential for correctly identifying the occurrence of cancer and its type. This classification is carried out with the aid of machine learning algorithms. Choosing the most suitable algorithm for a given problem is not trivial. This doctoral thesis studies the use of meta-learning as a viable solution. The experimental results confirmed the success of the approach using a standard framework for characterizing the data and constructing the recommendation. Improvements were then sought in these two aspects. First, a new set of meta-attributes based on cluster validation indices was proposed. Next, the kNN method for ranking construction was extended to weight the influence of the nearest neighbors. In the context of meta-regression, SVMs were introduced to estimate the performance of classification algorithms. Decision trees were also employed to construct the algorithm recommendation. Because of their inferior performance, an ensemble of trees was employed, which greatly improved the quality of the results.
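The idea of weighting the nearest neighbors when building a kNN ranking can be sketched as follows. This is only an illustrative sketch under stated assumptions: the inverse-distance weights, the Euclidean distance over meta-attributes, the `weighted_knn_ranking` name, and the toy meta-base are all hypothetical and are not taken from the thesis.

```python
import math


def weighted_knn_ranking(query, meta_base, k=3, eps=1e-9):
    """Rank algorithms for `query` from its k nearest datasets.

    `meta_base` is a list of (meta_attributes, ranks) pairs, where `ranks`
    maps an algorithm name to its rank on that dataset (1 = best).
    Closer neighbours get a larger weight (assumed here to be 1/distance).
    """
    scored = [(math.dist(query, feats), ranks) for feats, ranks in meta_base]
    scored.sort(key=lambda item: item[0])
    neighbours = scored[:k]
    weights = [1.0 / (d + eps) for d, _ in neighbours]
    total = sum(weights)
    algorithms = neighbours[0][1].keys()
    mean_rank = {
        algo: sum(w * ranks[algo] for w, (_, ranks) in zip(weights, neighbours))
        / total
        for algo in algorithms
    }
    # A lower weighted mean rank means a stronger recommendation.
    return sorted(mean_rank, key=mean_rank.get)


# Hypothetical meta-base: two meta-attributes per dataset, three algorithms.
meta_base = [
    ((0.0, 0.0), {"svm": 1, "knn": 2, "tree": 3}),
    ((1.0, 0.0), {"svm": 2, "knn": 1, "tree": 3}),
    ((5.0, 5.0), {"svm": 3, "knn": 2, "tree": 1}),
]

print(weighted_knn_ranking((0.1, 0.0), meta_base, k=2))
print(weighted_knn_ranking((0.9, 0.0), meta_base, k=2))
```

With unweighted neighbors the two queries above would receive the same ranking, since they share the same two nearest datasets; the weights let the closer dataset dominate.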
|
4 |
Meta-aprendizado aplicado a fluxos contínuos de dados / Metalearning for algorithm selection in data streams
Rossi, Andre Luís Debiaso, 19 December 2013
Machine learning algorithms are widely employed to induce models for knowledge discovery in data sets. Since most of these algorithms assume that the data are generated by a stationary distribution, a model is induced only once and is then applied indefinitely to predict the label of new data. However, many current applications, such as transportation management systems and sensor network monitoring, generate data streams that can change over time. Consequently, the effectiveness of the algorithm chosen for these problems may deteriorate, or other algorithms may become more suitable for the characteristics of the new data. This thesis proposes a metalearning-based method for managing the learning process in dynamic data stream environments, with the aim of improving the general predictive performance of the learning system. This method, named MetaStream, regularly selects the most promising algorithm for the arriving data according to the characteristics of those data and past experience. The proposed method employs machine learning techniques to generate metaknowledge, which relates the characteristics extracted from the data at different points in time to the predictive performance of the algorithms. The measures used to extract relevant information from the data include those commonly used in conventional metalearning across different data sets, adapted to the particularities of the stream setting, and measures from related areas that consider, for example, the order in which the data arrive.
We evaluate MetaStream on three real data stream problems and six different learning algorithms. The results show the applicability of MetaStream and its capability to improve the general predictive performance of the learning system compared to a baseline method for the majority of the problems investigated. It should be noted that a combination of models proved superior to MetaStream for two of the data sets. We therefore analyzed the main factors that may have influenced the observed results and indicate possible improvements to the proposed method.
|
5 |
Abordagem de construção de arquitetura homogênea para comitês via meta-aprendizagem
Parente, Regina Rosa, 21 May 2012
In everyday life we are constantly performing actions. Two of these actions are frequent and of great importance: classifying (sorting into classes) and making decisions. When we encounter problems with a relatively high degree of complexity, we tend to seek other opinions, usually from people who have some knowledge of, or are as far as possible experts in, the problem domain in question, so that they can help us in the decision-making process. In both classification and decision making, we are guided by the characteristics involved in the specific problem. The characterization of a set of objects is part of the decision-making process in general. In Machine Learning, this classification is performed by a learning algorithm and the characterization is applied to databases. Classification algorithms can be employed individually or in the form of committees of machines. Choosing the best methods to use in building a committee is a very arduous task. This work investigates meta-learning techniques for selecting the best configuration parameters of homogeneous committees for application to various classification problems. These parameters are the base classifier, the architecture, and the size of the architecture. We investigated nine types of inducers as candidates for the base classifier, two methods of architecture generation, and three groups of average architecture size (small, medium, and large). In response to weak performance in the meta-learning process, dimensionality reduction techniques were applied to the metabases and six new criteria for average architecture size were established. Five classification methods are investigated as meta-learners in the process of choosing the best parameters of a homogeneous committee.
|
7 |
Joint discourses or disjointed courses: A study on learning in upper secondary school.
Molander, Bengt-Olov, January 1997
The main purpose of the present study is to investigate whether learning and ways of understanding subject content and structure differ between successful and less successful students (i.e. in terms of their grade point average) in upper secondary school. A second issue is whether different subjects and disciplines (i.e. science on the one hand and humanities/social sciences on the other) make different demands on students. Data were gathered through interviews with a total of 36 students in two classes at two periods of their schooling. Additional data were gathered from interviews with teachers in the two classes and a sample of the tests given to the classes. Both classes receive instruction in science as well as humanities/social sciences, but in one class (N) the emphasis is on science whereas in the other (S) the emphasis is on humanities/social sciences. A common characteristic of successful students is that they adjust to the teacher’s way of structuring the subject by means of a deep approach and pronounced cue-seeking. They also play a dominant role in classroom communication. Less successful students more frequently use a surface or procedural approach to learning, are less sensitive to cues, do not adjust to the structure of subjects as presented by the teachers and do not participate to the same extent in classroom communication. The characteristics of successful students are very stable over time. As for the less successful students, there is a difference between N- and S-students. A majority of the S-students who use a surface approach in the first year change towards a deep approach later in their schooling, whereas the procedural approach of N-students is stable.
It is concluded that the stability shown by the successful students can be explained by the fact that their deep approach reflects their understanding that subject structure may vary, and that cue-seeking for these students signifies an awareness of and subsequent adjustment to the particular structure presented by the teachers. By understanding the structure according to teachers’ intentions, successful students are able to participate in classroom communication, eventually establishing a joint discourse. The difference between S- and N-students in how their approach to learning changes could be interpreted in light of differences in subject structure and instruction between subjects. In humanities/social sciences, classroom communication and the presentation of alternative interpretations of subject matter play a prominent role in instruction, and students who initially use a surface approach might get guidance towards alternative ways of understanding the subject matter and subject structure. In the science subjects in the N-programme, the presentation of alternative interpretations is not as common. These subjects also have a hierarchical structure, and understanding the fundamentals is a prerequisite for understanding later topics. For the students who initially use a surface approach in these hierarchically ordered subjects, learning becomes a matter of memorising more and more disconnected facts in what might seem to be disjointed courses.
|
8 |
Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging
Olorunnimbe, Muhammed, January 2015
In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications such as network intrusion detection, fraud detection, and financial forecasting, amongst others. In this setting, it is crucial to create data mining algorithms that are able to seamlessly adapt to temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The models produced by such algorithms should not only be highly accurate and able to adapt swiftly to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization. This is especially relevant when we aim to build personalized, near-instant models in a Big Data setting.
This research work focuses on mining a data stream with concept drift using an online bagging method, with consideration of memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, in which the models of multiple classifiers are combined into an ensemble and which has been very successful in building accurate models on data streams. However, little work has been done to explore the interplay between accuracy, efficiency and utility. This research focuses on this issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift in order to vary the ensemble size during the data mining process. We aim to minimize memory usage while maintaining highly accurate models with high utility.
We evaluated our method on a number of benchmark datasets and compared our results against the state of the art. Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy relative to the time and memory invested. We aimed to achieve a high ROI without compromising the accuracy of the result. Our experimental results indicate that we achieved this goal.
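For readers unfamiliar with online bagging, the sketch below shows the standard scheme in which each arriving example is presented to every base model k times, with k drawn from a Poisson(1) distribution. It uses a fixed ensemble size and does not reproduce the adaptive sizing or the ROI evaluation described in the abstract. The `NearestCentroid` base learner, the class and function names, and the toy stream are assumptions chosen to keep the example self-contained.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class NearestCentroid:
    """Tiny incremental base learner: one running mean per class."""

    def __init__(self):
        self.sums = {}
        self.counts = Counter()

    def learn(self, x, y):
        total = self.sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            total[i] += value
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None  # nothing seen yet

        def squared_distance(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=squared_distance)


class OnlineBagging:
    """Fixed-size online bagging ensemble with majority voting."""

    def __init__(self, n_models=5, seed=0):
        self.rng = random.Random(seed)
        self.models = [NearestCentroid() for _ in range(n_models)]

    def learn(self, x, y):
        # Each model sees the example k ~ Poisson(1) times (possibly zero).
        for model in self.models:
            for _ in range(poisson1(self.rng)):
                model.learn(x, y)

    def predict(self, x):
        votes = Counter(model.predict(x) for model in self.models)
        votes.pop(None, None)  # ignore models that have not learned anything
        return votes.most_common(1)[0][0] if votes else None


# Hypothetical two-class stream with well-separated clusters.
stream = []
for i in range(40):
    offset = (i % 3) * 0.1
    stream.append(((offset, 0.0), "a"))
    stream.append(((10.0 + offset, 10.0), "b"))

ensemble = OnlineBagging(n_models=5, seed=0)
for x, y in stream:
    ensemble.learn(x, y)

print(ensemble.predict((0.5, 0.5)), ensemble.predict((9.0, 9.0)))
```

In a sketch like this, memory use grows linearly with `n_models`, which is the quantity an adaptive method would vary in response to drift.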
|
10 |
Metody výpočetní inteligence pro metaučení / Computational Intelligence Methods in Metalearning
Šmíd, Jakub, January 2016
This thesis focuses on the algorithm selection problem, in which the goal is to recommend machine learning algorithms for a new dataset. The idea behind solving this problem is that algorithms perform similarly on similar datasets. The usual approach is to base the similarity measure on a fixed vector of metafeatures extracted from each dataset. However, as the number of attributes varies among datasets, we may be losing important information. Herein, we propose a family of algorithms able to handle even non-propositional representations of datasets. Our methods use the idea of attribute assignment, which builds the distance measure between datasets as a sum of distances given by the optimal assignment and an attribute distance measure. Furthermore, we prove that under certain conditions the resulting dataset distance is guaranteed to be a metric. We carry out a series of metalearning experiments on data extracted from the OpenML repository. We build the attribute distance using Genetic Algorithms, Genetic Programming and several regularization techniques such as multi-objectivization, coevolution, and bootstrapping. The experiments indicate that the resulting dataset distance can be successfully applied to the algorithm selection problem. Although we use the proposed distance measures exclusively...
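The attribute-assignment idea can be illustrated with a small sketch: each dataset is described by one descriptor per attribute, attributes of the smaller dataset are matched to attributes of the larger one so that the summed attribute distance is minimal, and leftover attributes incur a penalty. Everything specific here is an assumption for illustration: the Euclidean attribute distance stands in for the evolved distance of the thesis, the `unmatched_penalty` term and the function names are hypothetical, and the brute-force search is only practical for a handful of attributes.

```python
import math
from itertools import permutations


def attribute_distance(a, b):
    """Placeholder attribute distance: Euclidean distance between descriptors."""
    return math.dist(a, b)


def dataset_distance(attrs_a, attrs_b, unmatched_penalty=1.0):
    """Distance between two datasets via an optimal attribute assignment.

    Every attribute of the smaller dataset is assigned to a distinct
    attribute of the larger one; the cheapest assignment is found by
    exhaustive search. Unassigned attributes each add `unmatched_penalty`.
    """
    small, large = sorted((attrs_a, attrs_b), key=len)
    best = min(
        sum(attribute_distance(s, large[j]) for s, j in zip(small, chosen))
        for chosen in permutations(range(len(large)), len(small))
    )
    return best + unmatched_penalty * (len(large) - len(small))


# Hypothetical descriptors: (mean, standard deviation) of each attribute.
dataset_a = [(0.0, 1.0), (5.0, 5.0)]
dataset_b = [(5.0, 5.0), (0.0, 1.0), (9.0, 9.0)]

print(dataset_distance(dataset_a, dataset_b))
print(dataset_distance([(0.0, 0.0)], [(3.0, 4.0)]))
```

For larger attribute sets a polynomial-time assignment solver, for example SciPy's `linear_sum_assignment`, would be the usual replacement for the exhaustive search.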
|