  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Application of Machine Learning Techniques for Real-time Classification of Sensor Array Data

Li, Sichu 15 May 2009
There is a significant need to identify approaches for classifying chemical sensor array data with high success rates that would enhance sensor detection capabilities. The present study attempts to fill this need by investigating six machine learning methods to classify a dataset collected using a chemical sensor array: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Classification and Regression Trees (CART), Random Forest (RF), Naïve Bayes Classifier (NB), and Principal Component Regression (PCR). A total of 10 predictors that are associated with the response from 10 sensor channels are used to train and test the classifiers. A training dataset of 4 classes containing 136 samples is used to build the classifiers, and a dataset of 4 classes with 56 samples is used for testing. The results generated with the six different methods are compared and discussed. The RF, CART, and KNN are found to have success rates greater than 90%, and to outperform the other methods.
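The protocol described in this abstract (fit a classifier on a training set, then report the success rate on a separate test set) can be sketched for one of the six methods, k-nearest neighbours. This is a minimal illustration in plain Python on made-up two-feature data, not the thesis's sensor data or code; the names `knn_predict` and `success_rate` are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def success_rate(train, test, k=3):
    """Fraction of held-out samples whose predicted class matches the true class."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Toy data (assumed for illustration): two well-separated classes in 2-D.
train = [((0.0, 0.1), "A"), ((0.2, 0.0), "A"), ((0.1, 0.3), "A"),
         ((1.0, 1.1), "B"), ((0.9, 1.0), "B"), ((1.2, 0.8), "B")]
test = [((0.1, 0.1), "A"), ((1.1, 1.0), "B"),
        ((0.0, 0.2), "A"), ((0.95, 0.9), "B")]

print(success_rate(train, test, k=3))
```

The same `success_rate` loop would apply to any of the other classifiers once they expose a comparable predict function, which is what makes a side-by-side comparison on one held-out set straightforward.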
82

Do people actually listen to ads in podcasts? : A study about how machine learning can be used to gain insight in listening behaviour

Hane, Sara, Angergård, Madeleine January 2019
Today, listening to podcasts is a common way of consuming media, and listeners have been shown to be much more receptive to advertising when addressed in a podcast than through radio. This study was performed at Acast, an audio-on-demand and podcast platform that hosts, monetizes, and distributes podcasts globally. The goal of the study was to use machine learning to obtain a credible estimate of how listeners outside the application tend to respond when exposed to ads in podcasts. The study includes a number of machine learning models, such as Random Forest, Logistic Regression, Neural Networks and kNN. It was shown that machine learning could be applied to obtain a credible estimate of how ads are received outside the Acast application, based on data collected from the application. Of the models included in the study, Random Forest proved to be the best performing for this problem. Please note that the results presented in the report are based on a mix of real and simulated data.
83

Development of a predictive model of first-year student dropout at a higher education institution

Galleguillos Aguilar, Matías January 2018
Thesis for the degree of Electrical Civil Engineer / In Chile, access to higher education has grown significantly over the last 30 years. This growth has been accompanied by an increase in university dropout, which is particularly high among first-year students. The problem carries substantial costs of various kinds for both students and universities, and dropout has become one of the most important metrics used to accredit institutions. The Universidad de las Américas (UDLA) has faced a high dropout rate, which in 2013 contributed significantly to the loss of its accreditation, making it a priority issue to resolve. A plan was therefore devised to help the students most likely to drop out. UDLA currently has no automated system that classifies students based on analysis of data on their behavior; it has only a rule-based system built from university staff members' knowledge of dropout, which consequently has a high error rate. In the latest study published by the Servicio de Información de Educación Superior on first-year student retention, built with data on students who enrolled in 2016, the Universidad de las Américas ranks 47th out of 58 universities. Developing a system able to identify students at risk of dropping out therefore remains a priority for the institution. The objective of this work is to develop a system that provides a dropout risk index for each first-year student. To that end, the assignment of risk is framed as a classification problem and approached with computational intelligence tools.
To solve the problem, the semester was divided into stages and a model was trained for each one. The accuracy of the first model, at 70.1% correct predictions, was lower than that of similar studies addressing the same problem at other universities around the world. Each stage's model gave better results than the previous stage's; the end-of-semester model performed best, reaching 82.5% accuracy, which is similar to other work.
84

DEVELOPMENT AND DEPLOYMENT OF A FIELD BASED SOIL MAPPING TOOL USING A COMPARATIVE EVALUATION OF GEOSTATISTICS AND MACHINE LEARNING

Jeff Fiechter (7046756) 13 August 2019
Soil property variability is a large component of the overall environmental variability that Precision Agriculture practices seek to address. Thus, the creation of accurate field soil maps from field soil samples is of utmost importance to practitioners of Precision Agriculture, as understanding and characterizing variability is the first step in addressing it. Today, growers often interpolate their soil maps in a “black-box” fashion, and there is a need for an easy-to-use, accurate method of interpolation. In this study, current interpolation practices are examined as a benchmark, a Random Forest (RF) based prediction framework utilizes public data to aid predictions, and the RF framework is exposed via a webtool. A high-density (0.20 ha/sample) field soil sample dataset provides 28 training points and 82 validation points to be used as a case study. In the prediction of soil percent organic matter (OM), the grid and ordinary kriging interpolations both had higher Mean Absolute Error (MAE) scores than a field average prediction, though the difference was not statistically significant at the 5% significance level. An RF framework interpolation utilizing a high-resolution (1.52 m) DEM and distances to known points as the feature set had a significantly lower MAE score than the field average, grid, and ordinary kriging interpolations. The results suggest that, for the study site, the RF framework performed better than the field average, grid-based, and ordinary kriging interpolation methods.
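Two ingredients named in this abstract, the MAE score against a field-average baseline and a feature set built from distances to known sample points, can be sketched in plain Python. The coordinates and organic matter values below are invented for illustration, and the function names are hypothetical; this is not the thesis's framework, which also uses a DEM and a Random Forest model.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def field_average_prediction(train_values, n_points):
    """Baseline: predict the mean of the training samples everywhere in the field."""
    mean = sum(train_values) / len(train_values)
    return [mean] * n_points

def distance_features(point, known_points):
    """One feature per known sample: Euclidean distance from `point` to it."""
    return [math.dist(point, known) for known in known_points]

# Assumed toy data: (x, y) positions and percent organic matter (OM).
train_xy = [(0.0, 0.0), (3.0, 4.0)]
train_om = [2.0, 4.0]
valid_xy = [(0.0, 4.0), (3.0, 0.0)]
valid_om = [2.5, 3.9]

baseline = field_average_prediction(train_om, len(valid_om))
print(mean_absolute_error(valid_om, baseline))
print(distance_features(valid_xy[0], train_xy))
```

Any candidate interpolation can be scored with the same `mean_absolute_error` call on the validation points, so the field average acts as the reference that a more elaborate method has to beat.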
85

Predicting the Unobserved : A statistical analysis of missing data techniques for binary classification

Säfström, Stella January 2019
The aim of the thesis is to investigate how the classification performance of random forest and logistic regression differs, given an imbalanced data set with MCAR missing data. The performance is measured in terms of accuracy and sensitivity. Two analyses are performed: one with a simulated data set and one application using data from the Swedish population registries. The simulated data set is constructed to have the same 1:5 class imbalance. The missing values are handled using three different techniques: complete case analysis, predictive mean matching and mean imputation. The thesis concludes that logistic regression and random forest are on average equally accurate, with some instances of random forest outperforming logistic regression. Logistic regression consistently outperforms random forest with regard to sensitivity. This implies that logistic regression may be the best option for studies where the goal is to accurately predict outcomes in the minority class. None of the missing data techniques stood out in terms of performance.
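The reason sensitivity is reported alongside accuracy on a 1:5 imbalanced data set, and what two of the missing data techniques do mechanically, can be illustrated with a short plain-Python sketch. The data are invented and the helper names are hypothetical; predictive mean matching is omitted because it needs a fitted regression model.

```python
def accuracy(y_true, y_pred):
    """Share of all observations classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: share of actual positives predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

def mean_impute(column):
    """Mean imputation: replace each missing value (None) with the observed mean."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def complete_cases(rows):
    """Complete case analysis: keep only rows without any missing value."""
    return [row for row in rows if None not in row]

# Assumed toy labels with a 1:5 imbalance: 2 minority (1) vs 10 majority (0).
y_true = [1, 1] + [0] * 10
always_majority = [0] * 12  # a classifier that never predicts the minority class

print(accuracy(y_true, always_majority))     # high despite missing every positive
print(sensitivity(y_true, always_majority))  # 0.0
print(mean_impute([1.0, None, 3.0]))
print(complete_cases([(1.0, 2.0), (None, 3.0)]))
```

A classifier that ignores the minority class still scores 10/12 on accuracy here, which is why two models can look equally accurate while differing clearly in sensitivity.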
86

Multi-Output Random Forests

Linusson, Henrik January 2013
The Random Forests ensemble predictor has proven to be well suited for solving a multitude of different prediction problems. In this thesis, we propose an extension to the Random Forest framework that allows Random Forests to be constructed for multi-output decision problems with arbitrary combinations of classification and regression responses, with the goal of increasing predictive performance for such multi-output problems. We show that our method for combining decision tasks within the same decision tree reduces prediction error for most tasks compared to single-output decision trees based on the same node impurity metrics, and provide a comparison of different methods for combining such metrics. / Program: Master's programme in informatics
87

Hybrid models for predictive modeling created with genetic programming.

Johansson, Fredrik, Lindgren, Markus January 2013
There is today a great need to classify large amounts of data efficiently. Predictive modeling is an area of data mining in which predictions are made based on previous experience. These predictions are then presented in a model. The trade-off between interpretability and accuracy describes how accurate models are often opaque, whereas transparent models often have lower accuracy. This is a problem, since there is a need for models that are both accurate and interpretable. This study shows how to create a model whose accuracy is on par with that of an opaque model but which at the same time has higher interpretability. Two algorithms are presented for producing a hybrid model based on decision trees in which an implementation of Random Forest is handled as alternative leaf nodes. Controlled experiments and statistical tests were carried out to measure the hybrid model's accuracy against that of J48 and Random Forest. Accuracy was also measured against decision trees generated by the genetic programming implemented in the G-REX framework. The results show that the hybrid model can achieve accuracy comparable to Random Forest while the ordinary prediction leaves handle on average 39.21% of the instances. The hybrid model presented in the study is thus more interpretable than Random Forest without any significant difference in accuracy. / Program: Systems architecture programme
88

Evaluation of infrared QCL, Synchrotron and bench-top sources for cell imaging in aqueous media

Zhang, Zhe January 2017
Live cell imaging with FTIR spectroscopy offers a high-throughput, non-destructive and label-free method to study cells in vivo, which has significant advantages in the fields of cancer diagnosis and drug screening. However, because of the strong absorbance of water, the use of infrared spectroscopy in these fields remains underdeveloped. This project demonstrates a novel method to perform IR imaging of cells in solution. A novel water correction method, which avoids using the water combination band, is proposed. A buffer reference spectrum and a cell reference spectrum were introduced to fit the contribution based on protein bands. This method was implemented on three types of IR spectrometers, namely a conventional FTIR spectrometer, a synchrotron-based FTIR spectrometer and a quantum cascade laser (QCL) microscope. To date, most live cell imaging carried out with IR sources has utilised synchrotron radiation. Recently, a new bench-top system, the QCL microscope, has been developed. It incorporates four tunable QCL laser sources covering the wavenumber range 900-1800 cm⁻¹, which are many orders of magnitude brighter than conventional sources. The proposed water correction method is capable of processing the data recorded by all three types of IR spectrometers. Three prostate cancer cell lines were employed to evaluate the water correction method and the performance of the three spectrometers in imaging cells in solution. The obtained spectra were analysed with multivariate analysis (PCA and PC-LDA), which shows good separation between cell lines. The data were also examined with the Random Forest algorithm to build a classifier, and the diagnostic capability of the water-corrected spectra was demonstrated.
89

Present and future of data analysis of associated factors to seroprevalence of bovine viral diarrhea

Machado, Gustavo January 2016
The bovine viral diarrhea virus (BVDV) causes one of the most important diseases of cattle in terms of economic and social costs, since it is widely disseminated in the dairy cattle population. The objectives of this work were to estimate the herd-level prevalence and to investigate factors associated with antibody levels in bulk tank milk through a cross-sectional study, and to discuss and compare different modeling techniques: traditional ones such as regression, and ones less commonly used for this purpose, such as machine learning (ML) methods like Random Forest. The cross-sectional study was conducted in the state of Rio Grande do Sul to estimate the prevalence of reproductive diseases based on bulk tank milk samples, from a total population of 81,307 herds. A total of 388 bulk tank milk samples were collected, and an epidemiological questionnaire was applied on each selected farm. The prevalence of positive farms was 23.9% (95% CI: 19.8-28.1). Poisson regression analysis identified the following factors associated with BVDV: routine use of rectal examination for pregnancy diagnosis (prevalence ratio [PR] = 2.73, 95% CI: 1.87-3.98), direct contact between animals over the fences of neighboring farms (PR = 1.63, 95% CI: 1.13-2.95), and farms that did not use artificial insemination (PR = 2.07, 95% CI: 1.38-3.09). The Random Forest technique identified a dependence of BVDV occurrence on artificial insemination when carried out by the farm owner or foreman, on the number of neighbors who also raise cattle and, in agreement with the regression results, on routine rectal examination for pregnancy. BVDV is distributed across the state and, should the government be interested in launching a disease control program, it can be based on these findings, with measures focusing mainly on better conditions and care in reproduction. The contribution of this study goes beyond the traditional analyses carried out in veterinary epidemiology, mainly because of the good results obtained with the ML approach in this cross-sectional study. Finally, the use of more advanced statistical techniques helped to better elucidate the factors possibly involved in the occurrence of BVDV in the state's dairy herds.
90

Digital mapping of physical attributes of soils using ALOS/PALSAR images

BERNINI, Thiago Andrade 26 February 2016 (has links)
CAPES / The survey and analysis of the spatial distribution of soil attributes using geostatistical tools are essential for agricultural land to be used according to soil capability. Synthetic aperture radar (SAR) images have great potential for soil moisture estimation, and these sensors can therefore assist in mapping the physical and physical-hydric properties of soils. The overall objective of this study was to evaluate the potential of ALOS/PALSAR radar (microwave) images for identifying soils in an area of the Botucatu Formation dominated by sandy and medium-textured soils, in the municipality of Mineiros, Goiás State, Brazil. The area covers approximately 946 hectares, with relief ranging from flat to gently undulating, and its geology consists basically of sandstones of the Botucatu Formation. In the present study, 84 points were sampled for calibration and 25 points for validation, collected at depths of 0-20 cm and 60-80 cm. The soil samples were analyzed to determine sand, silt, clay, field capacity (CC), permanent wilting point (PMP) and total available water (AD). ALOS/PALSAR radar images from five dates and different polarizations were acquired, totaling 14 images, and were processed for geometric and radiometric correction using a DEM. Covariates of terrain attributes were also generated: elevation (ELEV), slope (DECLIV), relative slope position (PR-DECL), vertical distance to the drainage channel (DVCD), LS factor (FACTOR-LS) and Euclidean distance (D-EUCL). Soil attributes were predicted using the Random Forest (RF) and Random Forest Kriging (RFK) methods, with the radar images and terrain attributes as predictive covariates. Processing of the ALOS/PALSAR radar images enabled the geometric and radiometric corrections, transforming the data into the backscatter coefficient (σ°) in units of dB, corrected by the digital elevation model (MDE). The acquired images represented a broad range of σ° between the different dates. The soils of the study area are predominantly sandy, with most of the sampled points classified as Neossolos Quartzarênicos (Entisols), followed by Latossolos (Oxisols). The RF models used to predict the physical and physical-hydric attributes of the soils allowed the contribution of the predictive covariates to be analyzed. The terrain attributes with the greatest influence on the prediction of the studied attributes are related to elevation. The images of 3 May 2009 (HH1, VV1, HV1 and VH1) and 26 September 2010 (HH3 and HV3), obtained in drier periods, had the best correlations with the soil attributes. The semivariograms of the residuals of the RF prediction models showed greater spatial dependence in the 60-80 cm layer. Combining kriging with the RF model improved the prediction of sand, clay, CC and PMP. Using ALOS/PALSAR radar images and terrain attributes as covariates in RFK models showed potential to estimate the physical (sand and clay) and physical-hydric (CC and PMP) attributes, which can assist in mapping soils associated with the parent materials of the Botucatu Formation.
