Global ETD Search

101	Αλγόριθμοι μηχανικής μάθησης σε πολυεπεξεργαστικά περιβάλλοντα / Machine learning algorithms in multiprocessing environments Στεργίου, Κώστας 27 April 2015 (has links) The aim of this thesis is to study machine learning algorithms in multiprocessing environments. Recent advances in parallel and distributed processing have brought a real revolution in computer construction. However, although hardware evolves at a rapid pace, the corresponding development of software lags considerably. As a result, ordinary users are given great processing power that remains unused, because applications cannot exploit it to a satisfactory degree. Until now, algorithms in the field of artificial intelligence have been developed according to the classic patterns of the functional method or, at best, with object-oriented programming techniques. In either case, the algorithm runs on a single processor, with its instructions executed serially. Over the last decade, applications and environments were developed that made it easier to run many different algorithms and methods through a common user interface (e.g. Weka, R, Matlab). This approach helped spread algorithms and methods that had previously been very hard to run for people without sufficient programming experience. On the other hand, it added another layer of complexity to the methods developed for these environments, which resulted in slower algorithms with more restrictions imposed by the mechanisms of the host application. The measurements were carried out using decision tree algorithms. 
This category of machine learning methods is an excellent candidate for porting to new multiprocessor platforms, since it consists of iterative procedures that do not need to wait for one another in order to run. In this thesis, an application was developed that accepts one or more decision tree algorithms as arguments, sets their parameters, and then executes them in parallel. The chosen methods were originally implemented within the Weka environment; to run them in parallel, however, they had to be embedded in another application that could create many different instances. The application we developed allows algorithms to be executed in parallel, but it is far from being able to optimize them so that they run at the maximum possible speed. This is because, even outside the environment for which they were developed, their code uses the generic and abstract techniques required for integration into Weka. Nevertheless, we managed to run many different versions of the algorithms in a fraction of the time that would be needed to run all of these variants within Weka. We observed that the effect of the parameters on the execution time of the methods is not particularly significant. In contrast, the size of the data changes the execution time considerably, although the relationship between execution time and data size is not linear. Better design of the methods could bring a dramatic speed-up in execution time. 006.331 Machine learning Multi processing Threads Java Random forest
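As a rough illustration of the idea in entry 101 (several independently parameterised tree learners launched at the same time), the sketch below uses Python's standard `concurrent.futures` module rather than the Java threads and Weka classifiers named in the record. The `train_stump` learner, its `min_leaf` parameter, and the toy data are hypothetical stand-ins chosen for this example; they are not taken from the thesis.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy labelled data: (feature value, class label). Purely illustrative.
DATA = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]


def train_stump(min_leaf):
    """Fit a one-split decision stump; `min_leaf` is a hypothetical parameter.

    Returns (min_leaf, threshold, training_errors) for the best split that
    leaves at least `min_leaf` samples on each side.
    """
    xs = sorted(x for x, _ in DATA)
    best = None
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        threshold = (xs[i - 1] + xs[i]) / 2
        # Predict class 1 above the threshold, class 0 otherwise.
        errors = sum((x > threshold) != bool(y) for x, y in DATA)
        if best is None or errors < best[2]:
            best = (min_leaf, threshold, errors)
    return best


def run_in_parallel(param_values, workers=4):
    """Train one stump per parameter value, each submitted to a worker thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as param_values.
        return list(pool.map(train_stump, param_values))


results = run_in_parallel([1, 2, 3])
```

Because each task is independent, no synchronisation between workers is needed, which mirrors the abstract's point that the procedures do not wait for one another. In CPython, CPU-bound work may be better served by `ProcessPoolExecutor`, since threads share the interpreter lock.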
102	A Universal Islanding Detection Technique for Distributed Generation Using Pattern Recognition Faqhruldin, Omar 22 August 2013 (has links) In the past, distribution systems were characterized by unidirectional power flow from the main power generation units to consumers. With changes in power system regulation and increasing incentives for integrating renewable energy sources, Distributed Generation (DG) has become an important component of modern distribution systems. However, when a portion of the system is energized by one or more DG units and is disconnected from the grid, this portion becomes islanded and might cause several operational and safety issues. Therefore, an accurate and fast islanding detection technique is needed to avoid these issues, as per IEEE Standard 1547-2003 [1]. Islanding detection techniques depend on the type of DG connected to the system and can achieve accurate results when only one type of DG is used in the system. Thus, a major challenge is to design a universal islanding technique that detects islanding accurately and in a timely manner for different DG types and for multiple DG units in the system. This thesis introduces an efficient universal islanding detection method that can be applied to both inverter-based DG and synchronous-based DG. The proposed method relies on extracting a group of features from measurements of the voltage and frequency at the Point of Common Coupling (PCC) of the targeted island. The Random Forest (RF) classification technique is used to distinguish between islanding and non-islanding situations, with the goals of achieving a zero Non-Detection Zone (NDZ), the region in which islanding detection techniques fail to detect islanding, and of avoiding nuisance DG tripping during non-islanding conditions. The accuracy of the proposed technique is evaluated using a cross-validation technique. 
The methodology of the proposed islanding detection technique is shown to have a zero NDZ, 98% accuracy, and fast response when applied to both types of DGs. Finally, four other classifiers are compared with the Random Forest classifier, and the RF technique proved to be the most efficient approach for islanding detection. Distributed Generation Pattern Recognition DG Random Forest Inverter DG Synchronous DG Electrical and Computer Engineering
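Entry 102 states only that accuracy was evaluated "using a cross-validation technique". A minimal sketch of one common variant, k-fold cross-validation, is given below; the number of folds, the contiguous (unshuffled) split, and the `fit`/`predict` callables are assumptions made for illustration and are not details from the thesis.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def cross_validated_accuracy(features, labels, fit, predict, k=5):
    """Average held-out accuracy of a classifier over k folds.

    `fit(train_features, train_labels)` returns a model and
    `predict(model, feature)` returns a label; both are caller-supplied.
    """
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, features[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

Each sample is held out exactly once, so the averaged score reflects performance on data the classifier did not see during fitting.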
103	Modelling soil bulk density using data-mining and expert knowledge Taalab, Khaled Paul January 2013 (has links) Data about the spatial variation of soil attributes is required to address a great number of environmental issues, such as improving water quality, flood mitigation, and determining the effects of the terrestrial carbon cycle. The need for a continuum of soils data is problematic, as soil attributes can only be observed at a limited number of locations, beyond which prediction is required. There is, however, a disparity between the way in which much of the existing information about soil is recorded and the format in which the data is required. There are two primary methods of representing the variation in soil properties: as a set of distinct classes or as a continuum. The former is how the variation in soils has been recorded historically by the soil survey, whereas the latter is how soils data is typically required. One solution to this issue is to use a soil-landscape modelling approach, which relates the soil to the wider landscape (including topography, land-use, geology and climatic conditions) using a statistical model. In this study, the soil-landscape modelling approach has been applied to the prediction of soil bulk density (Db). The original contribution to knowledge of the study is demonstrating that producing a continuous surface of Db using a soil-landscape modelling approach is a viable alternative to the ‘classification’ approach which is most frequently used. The benefit of this method is shown in relation to the prediction of soil carbon stocks, which can be predicted more accurately and with less uncertainty. The second part of this study concerns the inclusion of expert knowledge within the soil-landscape modelling approach. The statistical modelling approaches used to predict Db are data driven, hence it is difficult to interpret the processes which the model represents. 
In this study, expert knowledge is used to predict Db within a Bayesian network modelling framework, which structures knowledge in terms of probability. This approach creates models which can be more easily interpreted and consequently facilitate knowledge discovery; it also provides a method for expert knowledge to be used as a proxy for empirical data. The contribution to knowledge of this section of the study is twofold: first, that Bayesian networks can be used as tools for data-mining to predict a continuous soil attribute such as Db; and second, that in lieu of data, expert knowledge can be used to accurately predict landscape-scale trends in the variation of Db using a Bayesian modelling approach. 631.4
104	Assessing Ponderosa Pine (Pinus ponderosa) Suitable Habitat throughout Arizona in Response to Future Climate Models January 2011 (has links) abstract: The species distribution model DISTRIB was used to model and map potential suitable habitat of ponderosa pine throughout Arizona under current and six future climate scenarios. Importance Values for each climate scenario were estimated from 24 predictor variables consisting of climate, elevation, soil, and vegetation data within a 4 km grid cell. Two emission scenarios, A2 (high concentration) and B1 (low concentration), and three climate models (the Parallel Climate Model, the Geophysical Fluid Dynamics Laboratory model, and the HadleyCM3) were used to capture the potential variability among future climates and provide a range of responses from ponderosa pine. Summary tables for federal and state managed lands show the potential change in suitable habitat under the different climate scenarios, while an analysis of three elevational regions explores the potential shift of habitat upslope. According to the climate scenarios, mean annual temperature in Arizona could increase by 3.5% while annual precipitation could decrease by 36% over this century. Results of the DISTRIB model indicate that, in response to the projected changes in climate, suitable habitat for ponderosa pine could increase by 13% throughout the state under the HadleyCM3 high scenario or decrease by 1.1% under the average of the three low scenarios. However, the spatial variability of climate changes will result in gains and losses among the ecoregions and the federal and state managed lands. Therefore, alternative practices may need to be considered to limit the loss of suitable habitat in areas identified by the models. / Dissertation/Thesis / M.S. Applied Biological Sciences 2011 Ecology Forestry Natural Resource Management Climate change Random Forest regression Range shifts Species Distribution Modeling
105	Presente e futuro da análise de dados de fatores associados à soroprevalência da diarreia viral bovina / Present and future of data analysis of associated factors to seroprevalence of bovine viral diarrhea Machado, Gustavo January 2016 (has links) The bovine viral diarrhea virus (BVDV) causes one of the most important diseases of cattle in terms of economic and social costs, since it is widely disseminated in the dairy cattle population. The objectives were to estimate the herd-level prevalence and investigate factors associated with antibody levels in bulk tank milk through a cross-sectional study, and to discuss and compare different modeling techniques: traditional ones such as regression and those less commonly used for this purpose, such as machine learning (ML) methods like Random Forest. 
The cross-sectional study was conducted in the state of Rio Grande do Sul to estimate the prevalence of reproductive diseases based on bulk tank milk samples, from a total population of 81,307 herds. A total of 388 bulk tank milk samples were collected, and an epidemiological questionnaire was applied on each selected farm. The prevalence of positive farms was 23.9% (95% CI: 19.8-28.1). Poisson regression analysis identified the following factors associated with BVDV: routine use of rectal examination for pregnancy diagnosis (Prevalence Ratio [PR] = 2.73, 95% CI: 1.87-3.98), direct contact between animals (contact over the fence of neighboring farms) (PR = 1.63, 95% CI: 1.13-2.95), and farms that did not use artificial insemination (PR = 2.07, 95% CI: 1.38-3.09). The Random Forest technique identified a dependence of BVDV occurrence on artificial insemination when carried out by the farm owner or foreman and on the number of neighbors who also keep cattle; in agreement with the regression results, it also identified a dependence on routine rectal examination for pregnancy. BVDV is spread across the state and, should the government wish to launch a disease control program, its measures should focus mainly on better conditions and care in reproduction. Moreover, the contribution of this study goes beyond the traditional analyses in veterinary epidemiology, mainly because of the good results obtained with the ML approach in this cross-sectional study. Finally, the use of more advanced statistical techniques helped to better elucidate the factors possibly involved in the occurrence of BVDV in the state's dairy herds. Viral diarrhea : Cattle Seroprevalence Veterinary epidemiology Dairy cattle BVDV Epidemiology Bulk tank milk Regression Random forest
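The prevalence estimate in entry 105 (23.9% of 388 samples, 95% CI 19.8 to 28.1) can be roughly checked with a normal-approximation (Wald) interval. The abstract does not say which interval method was used, so the choice of the Wald formula and the critical value z = 1.96 are assumptions made for this sketch.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Figures reported in the abstract: 23.9% prevalence among 388 samples.
low, high = wald_interval(0.239, 388)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

The result lands within about 0.1 percentage point of the reported bounds; the small gap at the lower bound may reflect rounding of the reported prevalence or a different interval method, which the abstract does not specify.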
106	O que há por trás das diferenças individuais? Perfis comportamentais e fisiológicos em Betta splendens / What lies behind individual differences? Behavioral and physiological profiles in Betta splendens Andrade, Priscilla Valessa de Castro 28 April 2017 (has links) In response to environmental changes, individuals show different strategies for coping with varied external stimuli. The different behaviors comprise the different phenotypes that compose a population. These differences can be explained by endogenous changes, such as hormonal secretion. For instance, hormones modulate reproductive behaviors and cognitive processes. To characterize individual differences in a population, the present study tested the relationship between behavioral and hormonal profiles in a group of male fighting fish, Betta splendens. A group of 86 males was observed for bubble nest construction, agonistic displays in conspecific contests, and performance in a spatial learning protocol. After that, plasma cortisol and testosterone levels were measured. An innovative and elegant statistical procedure was applied to the data set to separate the animals into groups according to their nest-building behavior (k-means test) and then to show which behavioral and physiological parameters best explain the groups' profiles (Random Forest and Classification Tree). Our results point to three distinct profiles: nest builders (nests of 30.74 ± 9.84 cm²), intermediates (nests of 13.57 ± 4.23 cm²) and non-builders (nests of 2.17 ± 2.25 cm²). These groups showed marked differences in agonistic and learning behavior, as well as in hormone levels. Cortisol was the main predictor identified by the Random Forest test for separating individuals into the different groups: nest builders and intermediates showed lower levels of cortisol, while non-builders presented the highest basal cortisol values. The second most important predictor was learning performance, which separated the intermediates from the nest builders (the faster learners), followed by basal testosterone levels and agonistic behavior displays. While testosterone levels were not significant in explaining behavioral differences, they seem to be related to the construction profile. 
Our findings show that different profiles invest differently in reproduction and that cortisol negatively influences nesting behavior and learning. In summary, our data suggest that different profiles in a population are determined by both hormonal and behavioral responses, and these differences confer flexibility on the population, allowing the presence of animals that invest more in reproduction while others show defense and aggression as the dominant expressed trait. CNPQ::CIENCIAS BIOLOGICAS: PSICOBIOLOGIA Behavioral profile Cortisol Testosterone Fighting fish Learning Random forest Classification tree
108	Alternative Methods via Random Forest to Identify Interactions in a General Framework and Variable Importance in the Context of Value-Added Models January 2013 (has links) abstract: This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAMs teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that are not dependent on a given model by introducing two variable importance measures (VIMs), the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees in a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained using three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model was misspecified. The second study develops two novel interaction measures. These measures could be used within but are not restricted to the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. 
In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions. / Dissertation/Thesis / Ph.D. Statistics 2013 Statistics Data Mining Interactions Random Forest Statistical Learning Value Added Models Variable Importance
109	A Visual Analytics Based Decision Support Methodology For Evaluating Low Energy Building Design Alternatives January 2013 (has links) abstract: The ability to design high performance buildings has acquired great importance in recent years due to numerous federal, societal and environmental initiatives. However, this endeavor is much more demanding in terms of designer expertise and time. It requires a whole new level of synergy between automated performance prediction and the human capabilities to perceive, evaluate and ultimately select a suitable solution. While performance prediction can be highly automated through the use of computers, performance evaluation cannot, unless it is with respect to a single criterion. The need to address multi-criteria requirements makes it more valuable for a designer to know the "latitude" or "degrees of freedom" available in changing certain design variables while achieving preset criteria such as energy performance, life cycle cost, environmental impacts etc. This requirement can be met by a decision support framework based on near-optimal "satisficing" as opposed to purely optimal decision making techniques. Currently, such a comprehensive design framework is lacking, which is the basis for undertaking this research. The primary objective of this research is to facilitate a complementary relationship between designers and computers for Multi-Criterion Decision Making (MCDM) during high performance building design. It is based on the application of Monte Carlo approaches to create a database of solutions using deterministic whole building energy simulations, along with data mining methods to rank variable importance and reduce the dimensionality of the problem. 
A novel interactive visualization approach is then proposed which uses regression-based models to show dynamically how varying these important variables affects the multiple criteria, while providing a visual range or band of variation of the different design parameters. The MCDM process has been incorporated into an alternative methodology for high performance building design referred to as Visual Analytics based Decision Support Methodology [VADSM]. VADSM is envisioned to be most useful during the conceptual and early design performance modeling stages by providing a set of potential solutions that can be analyzed further for final design selection. The proposed methodology can be used for new building design synthesis as well as evaluation of retrofits and operational deficiencies in existing buildings. / Dissertation/Thesis / M.S. Architecture 2013 Architecture Engineering Energy Building Design High Performance Random Forest Satisficing Visual Analytics
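The combination described in entry 109, Monte Carlo sampling of design variables followed by "satisficing" against preset criteria, can be sketched as follows. Everything specific here is hypothetical: the `simulate` function is a made-up linear stand-in for a whole-building energy simulation, and the variable names `insulation` and `glazing_ratio` and the thresholds are invented for the example.

```python
import random


def simulate(design):
    """Hypothetical stand-in for a whole-building energy simulation.

    Returns (annual_energy, life_cycle_cost) in arbitrary units.
    """
    energy = 100 - 40 * design["insulation"] + 30 * design["glazing_ratio"]
    cost = 50 + 60 * design["insulation"] + 10 * design["glazing_ratio"]
    return energy, cost


def satisficing_ranges(n_samples, max_energy, max_cost, seed=0):
    """Sample designs at random and keep every one that meets both criteria.

    Returns the number of acceptable designs and, per design variable, the
    (min, max) range observed among them: the designer's "latitude".
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        design = {"insulation": rng.random(), "glazing_ratio": rng.random()}
        energy, cost = simulate(design)
        if energy <= max_energy and cost <= max_cost:
            accepted.append(design)
    if not accepted:
        return 0, {}
    ranges = {
        name: (min(d[name] for d in accepted), max(d[name] for d in accepted))
        for name in ("insulation", "glazing_ratio")
    }
    return len(accepted), ranges


count, latitude = satisficing_ranges(2000, max_energy=85, max_cost=95)
```

Rather than returning a single optimum, the function reports the band of each variable over all acceptable designs, which is the kind of "range or band of variation" the abstract says the visualization presents.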
110	Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification Halmann, Marju January 2017 (has links) Filtering and automatically replying to emails is of interest to many, but it is hard because of the complexity of language and the dependence on background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier can be used for the more general email classification task and how it compares to other existing email classifiers. The comparison is based on a literature study and on empirical experimentation using two real-life datasets. First, a literature study is performed to gain insight into the accuracy of other available email classifiers. Second, the proposed model's accuracy is explored through experimentation. The literature study shows that the accuracy of more general email classifiers differs greatly across user sets. The proposed model's accuracy is within the reported range, although in its lower part, which indicates that the proposed model performs poorly compared to other classifiers. On average, classifier performance improves by 15 percentage points with additional information. This indicates that LDA combined with a Random Forest classifier is promising; however, future studies are needed to explore the model and ways to further increase its accuracy. Email mining Latent Dirichlet Allocation Random Forest classification Computer Sciences Datavetenskap (datalogi)
