1 |
A Psychometric Analysis of the Precalculus Concept Assessment
Jones, Brian Lindley, 02 April 2021
The purpose of this study was to examine the psychometric properties of the Precalculus Concept Assessment (PCA), a 25-item multiple-choice instrument designed to assess student reasoning abilities and understanding of foundational calculus concepts (Carlson et al., 2010). When this study was conducted, the extant research on the PCA and the PCA Taxonomy lacked in-depth investigations of the instruments' psychometric properties. Most notable was the lack of studies into the validity of the internal structure of PCA response data implied by the PCA Taxonomy. This study specifically investigated the psychometric properties of the three reasoning constructs found in the PCA Taxonomy, namely Process View of Function (R1), Covariational Reasoning (R2), and Computational Abilities (R3). Confirmatory Factor Analysis (CFA) was conducted using a total of 3,018 pretest administrations of the PCA. These data were collected in select College Algebra and Precalculus sections at a large private university in the Mountain West and one public university in the Phoenix metropolitan area. Results showed that the three hypothesized reasoning factors were highly correlated. Rival statistical models were evaluated to explain the relationship between the three reasoning constructs. The bifactor model was the best-fitting model and successfully partitioned the variance between a general reasoning ability factor and two specific reasoning ability factors. The general factor was dominant, accounting for 76% of the variance and 91% of the reliability. The omegaHS values were low, indicating that this model does not serve as a reliable measure of the two specific factors. PCA response data were retrofitted to diagnostic classification models (DCMs) to evaluate the extent to which individual mastery profiles could be generated to classify individuals as masters or non-masters of the three reasoning constructs. The retrofitting of PCA data to DCMs was unsuccessful.
High attribute correlations and other model deficiencies limit the confidence with which these particular models could estimate student mastery. The results of this study have several key implications for future researchers and practitioners using the PCA. Researchers interested in using PCA scores in predictive models should use the General Reasoning Ability factor from the respecified bifactor model or the single-factor model in conjunction with structural equation modeling techniques. Practitioners using the PCA should avoid using PCA subscores for reasoning abilities and continue to follow the recommended practice of reporting a simple sum score (i.e., unit-weighted composite score).
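Variance and reliability shares of this kind are usually derived from standardized bifactor loadings. The following is a minimal sketch, assuming an orthogonal bifactor model with standardized loadings; the loadings, the item-to-factor assignment, and the function name `bifactor_indices` are hypothetical and are not taken from the study.

```python
def bifactor_indices(general, specific):
    """Compute ECV, omega, omegaH and omegaHS from standardized loadings.

    general  -- list of general-factor loadings, one per item
    specific -- dict mapping a specific-factor name to {item_index: loading}
    Assumes an orthogonal bifactor model, so each item's residual variance
    is 1 minus the sum of its squared loadings.
    """
    n_items = len(general)
    # Specific loading of each item (0.0 if it loads only on the general factor).
    spec_of_item = [0.0] * n_items
    for loadings in specific.values():
        for i, lam in loadings.items():
            spec_of_item[i] = lam
    residual = [1.0 - g * g - s * s for g, s in zip(general, spec_of_item)]

    gen_common = sum(general) ** 2
    spec_common = sum(sum(l.values()) ** 2 for l in specific.values())
    total_var = gen_common + spec_common + sum(residual)

    omega = (gen_common + spec_common) / total_var
    omega_h = gen_common / total_var
    gen_sq = sum(g * g for g in general)
    spec_sq = sum(s * s for s in spec_of_item)
    ecv = gen_sq / (gen_sq + spec_sq)

    omega_hs = {}
    for name, loadings in specific.items():
        items = list(loadings)
        g_part = sum(general[i] for i in items) ** 2
        s_part = sum(loadings.values()) ** 2
        r_part = sum(residual[i] for i in items)
        omega_hs[name] = s_part / (g_part + s_part + r_part)
    return {"ecv": ecv, "omega": omega, "omega_h": omega_h, "omega_hs": omega_hs}


# Hypothetical loadings for six items and two specific factors.
general = [0.6, 0.7, 0.5, 0.6, 0.7, 0.5]
specific = {"S1": {0: 0.3, 1: 0.2, 2: 0.3}, "S2": {3: 0.2, 4: 0.3, 5: 0.2}}
result = bifactor_indices(general, specific)
print(result)
```

If the 91% figure is read as the ratio of omegaH to omega (an assumption, since the abstract does not define it), the corresponding quantity in this sketch would be `result["omega_h"] / result["omega"]`.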
|
2 |
Utilização de modelos de classificação para mineração de dados relacionados à aprendizagem de matemática e ao perfil de professores do ensino fundamental / Application of classification models for mining of data related to mathematics learning and elementary school teachers profile
Stella Oggioni da Fonseca, 20 February 2014
Conselho Nacional de Desenvolvimento Científico e Tecnológico / Classification models were applied in this work to mine data related to mathematics learning and to the profile of elementary school teachers. More specifically, the work addressed the teacher-related factors in Rio de Janeiro State that positively and negatively influence the performance of 9th-grade elementary school students on Mathematics tests. The data used to extract this information are provided by the Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (INEP), which evaluates the Brazilian educational system at various levels and types of education, including Elementary Education, whose assessment, the focus of this study, is carried out through the Prova Brasil. The Knowledge Discovery in Databases (KDD) process was applied to this database, comprising the steps of data preparation, mining, and post-processing. The patterns were extracted from classification models generated by decision tree, rule induction, and Bayesian classifier techniques, whose algorithms are implemented in the Weka software (Waikato Environment for Knowledge Analysis). In addition, group methods and a methodology for making the classes uniformly distributed were applied in order to improve the accuracy of the models obtained. The results revealed important factors that contribute to the teaching and learning of Mathematics, as well as aspects that harm student performance. Finally, the extracted results give educators and public policy makers factors for analysis that can support later decision making.
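The abstract does not say which methodology was used to make the classes uniformly distributed. One common option is random undersampling of the larger classes; the sketch below illustrates only that idea, with hypothetical record fields and a hypothetical function name, and is not a reconstruction of the Weka workflow used in the thesis.

```python
import random
from collections import defaultdict


def undersample(records, label_key, seed=0):
    """Return a class-balanced subset by randomly undersampling larger classes.

    Every class is reduced to the size of the smallest class.
    """
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical teacher records labelled by student performance level.
records = (
    [{"experience_years": y, "performance": "high"} for y in range(8)]
    + [{"experience_years": y, "performance": "low"} for y in range(3)]
)
balanced = undersample(records, "performance")
counts = {}
for rec in balanced:
    counts[rec["performance"]] = counts.get(rec["performance"], 0) + 1
print(counts)
```

Undersampling discards data from the majority class; oversampling the minority class is the usual alternative when the dataset is small.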
|
4 |
Detekce stresu a únavy v komplexních datech řidiče / Stresss and fatique detection in complex driver's data
Šimoňáková, Sabína, January 2021
The main aim of this thesis is the detection of fatigue and stress from a driver's biological signals. The introduction reviews published detection methods and gives the theoretical background needed for the thesis. In the practical part, we first worked with a database of measured rides and selected their most relevant sections. Feature extraction and selection followed. Five different classification models were used for tiredness and stress detection, and prediction was based on real data. The final section compares the best model of the thesis with previously published results.
|
5 |
Development of infrared spectroscopic methods to assess table grape quality
Daniels, Andries Jerrick, 03 1900
Thesis (MScAgric)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: The two white seedless table grape cultivars, Regal Seedless and Thompson Seedless, fulfil a
very important role in securing foreign income, not only for the South African table grape
industry, but for the South African economy as a whole. Like many other white table grape
cultivars, however, these two cultivars are prone to browning, especially netlike browning on
Regal Seedless and internal browning on Thompson Seedless grapes. This leads to huge
financial losses every year, since there is no established way to assess at harvest, during
storage or during packaging, whether the grapes will eventually turn brown. In other words,
there is no established protocol for assessing the browning risk of a particular batch of grapes
prior to export. Numerous studies have been undertaken to determine the exact cause of
browning and how it should be managed, but to date, no chemical or physical parameter has
been firmly associated with the phenomenon.
The overall aim of this study was thus to find an alternative way to deal with the problem by
investigating the potential of near infrared (NIR) spectroscopy as a fast, non-destructive
measurement technique to determine the browning potential of whole white seedless table
grapes. A secondary aim was to determine the optimal ripeness of table grapes. To this end,
the harvest maturity and quality parameters, namely total soluble solids (TSS), titratable
acidity (TA), pH, glucose and fructose, which are also associated with the browning phenomenon,
were quantified using models based on infrared spectra.
Three different techniques, (a) Fourier transform near infrared (FT-NIR), (b) Fourier
transform mid infrared (FT-MIR) and (c) Fourier transform mid infrared attenuated total
reflectance (FT-MIR ATR) spectroscopy, were investigated to determine these parameters. This
was done so that a platform of different technologies would be available to the table grape
industry.
The grapes used in this study were harvested over two years (2008 and 2009) and were
sourced from two different commercial vineyards in the Hex River valley, Western Cape, South
Africa. Different crop loads (the total amount of bunches on the vines per hectare) were left for
Regal Seedless (75 000, 50 000 and 35 000) and for Thompson Seedless (75 000 and 50 000).
Three rows were used for Regal Seedless and two rows for Thompson Seedless. Each row had
six sections which each represented a repetition for each crop load. In 2008 these cultivars
were harvested early at 16°Brix, at optimum ripeness (18°Brix) and late at 20°Brix. In 2009 they
were harvested twice at the optimum ripeness level.
Berries from harvested bunches were crushed and the juice was used to determine the
reference values for the different parameters in the laboratory according to their specific
methods. The obtained juice was also scanned on the three different instruments. Different
software (OPUS 6.5 for the FT-NIR and FT-MIR ATR instruments and Unscrambler version 9.2
for the FT-MIR instrument) as well as different spectral pre-processing techniques were also
evaluated before construction of the models for all the instruments.
Partial least squares (PLS) regression was used for the construction of the different
calibration models. Different regression statistics, including the root mean square error of
prediction (RMSEP), the coefficient of determination (R²), the residual prediction deviation
(RPD) and the bias, were used to evaluate the performance of the developed calibration models.
Calibration models fit for screening purposes were obtained on the FT-NIR and FT-MIR
ATR instruments for TSS (11.40 - 21.80 °Brix) (R² = 85.92%, RMSEP = 0.71 °Brix, RPD =
2.67 and bias = 0.03 °Brix), pH (2.94 - 3.9) (R² = 85.00%, RMSEP = 0.08, RPD = 2.59 and bias =
-0.01) and TA (4.3 - 13.1 g/L) (R² = 90.77%, RMSEP = 0.48 g/L, RPD = 3.30 and bias = -0.03
g/L). Models for fructose (46.70 – 176.82 g/L) (R² = 74.66%, RMSEP = 9.28 g/L, RPD = 2.00 and
bias = 1.10 g/L) and glucose (20.36 – 386.67 g/L) (R² = 70.71%, RMSEP = 11.10 g/L, RPD = 1.87
and bias = 1.64 g/L) were obtained with the FT-NIR and FT-MIR ATR instruments; these
were in some instances fit for screening purposes and in some instances unsuitable for
quantification purposes. The FT-MIR instrument gave models for all the parameters that were
not yet suitable for quantification purposes.
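The four regression statistics named above can be computed from paired reference and predicted values. The sketch below is an illustration under stated assumptions: RPD is taken as the standard deviation of the reference values divided by RMSEP, R² as one minus the ratio of residual to total sum of squares, and the data are hypothetical. The abstract does not state which exact definitions the OPUS or Unscrambler software applied.

```python
import math
import statistics


def calibration_stats(reference, predicted):
    """Return bias, RMSEP, R^2 and RPD for a validation set."""
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    bias = sum(errors) / n
    ss_res = sum(e * e for e in errors)
    rmsep = math.sqrt(ss_res / n)
    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    r2 = 1.0 - ss_res / ss_tot
    # RPD taken here as SD of the reference values over RMSEP (assumed definition).
    rpd = statistics.stdev(reference) / rmsep
    return {"bias": bias, "rmsep": rmsep, "r2": r2, "rpd": rpd}


# Hypothetical TSS reference values and model predictions, in degrees Brix.
reference = [16.0, 17.0, 18.0, 19.0, 20.0]
predicted = [16.5, 16.5, 18.0, 19.5, 19.5]
stats = calibration_stats(reference, predicted)
print(stats)
```

Under this assumed definition of RPD, the reported TSS figures (RPD = 2.67, RMSEP = 0.71 °Brix) would correspond to a reference standard deviation of roughly 1.9 °Brix.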
Combined spectral ranges used for calibration were often similar for some parameters,
namely 12 493 - 5 446.2 for TSS and pH, 6 101.9 - 5 446.2 for TSS, TA and fructose and
4 601.5 - 4 246.7 for pH and fructose on the FT-NIR instrument, and 2 993.2 - 2 322.3 for pH, TA
and glucose and 1 654.3 - 649.4 for pH and glucose on the FT-MIR ATR instrument.
Sometimes they were adjacent (3 996.6 - 3 661.2, 3 663.5 - 3 327.7 and 3 327.2 - 2 322.3 for
TSS and glucose; 1 988.3 - 1 652.8 and 1 654.3 - 649.4 for TSS, pH and TA), and other times they
were overlapping (1 654.3 - 649.4 and 1 318.8 - 649.4 for pH, TA and fructose) on the FT-MIR
ATR instrument. This is a very good sign for transfer of this technology to a handheld device,
where adjacent and/or overlapping wavenumbers are crucial. Instruments which have to
determine different parameters over large spectral ranges are not only impractical, because the
instrument has to be big, but also very expensive.
Another advantage of implementing especially FT-NIR spectroscopy as a fast, accurate and
inexpensive technique for determining harvest maturity and quality parameters is that no
sample preparation is necessary and very little waste (a few single berries tested) is produced.
This is highly desirable in the current era of environmental awareness. Through this study, a
platform of technologies has now been made available for the determination of the respective
parameters in future table grape samples by just taking their spectra on one of the
instruments, something that has not been possible or available for the South African table
grape industry before.
Berries for the browning experiments were scanned on an FT-NIR instrument immediately
after harvest (before cold storage) and again after cold storage. Before cold storage they were
scanned on each side of the berry and after cold storage they were scanned twice on a brown
spot if browning was present and twice on a clear spot, irrespective of whether browning was
present or not. Inspection of the berries for the incidence of browning after cold storage
revealed that Regal Seedless had a higher incidence of browning (68% in 2008 and 66% in
2009) than Thompson Seedless (21% in 2008 and 25% in 2009). Regal Seedless was also
more prone to external browning, specifically netlike browning, whereas Thompson Seedless
was more prone to internal browning, despite the different phenotypes of browning that were
present on both.
Principal component analysis (PCA) done on the spectra obtained before and after cold
storage revealed that NIR can capture the changes related to cold storage, with the first principal
components explaining almost 100% of the variation in the spectra. Classification models were
also built using PCA, based on spectra of berries that remained clear before and after cold
storage and those that turned brown after cold storage. Classification models of berries based
on spectra obtained after cold storage (browning present) had a better total accuracy (94% for
training and 87% for test datasets) than the classification models based on spectra obtained
before cold storage (79% for training and 64% for test datasets). The implication is that
the current models classify berries as already browned or still clear better after cold storage
than before cold storage, although before cold storage is the critical stage at which we
actually want to know whether the berries will turn brown or not.
The potential to use NIR spectroscopy to detect browning on white seedless grapes already
before harvest is, however, still present, since all these models were built using the whole NIR
spectrum: no variable selection was done, and all the different browning phenotypes were
used together. Further analysis of the data will thus use variable selection techniques such as
particle swarm optimization (PSO) to select wavelengths strongly associated with the browning
phenomenon, and will focus only on the main types of browning (netlike on
Regal Seedless and internal browning on Thompson Seedless). This study has major
implications for the table grape industry, since it is the first time that the possibility to predict
browning with other methods than visual inspection, especially before cold storage, is shown. / The Postharvest and Innovation Programme, for financing this study
|
6 |
導入雲端運算概念於資料採礦之分類系統 / Implement the concept of the cloud computing into the classification system of data mining
林盈方, Lin, Ying Fang, Unknown Date
In recent years, the rise of data mining and cloud computing has led many enterprises to offer services related to cloud computing or to use data mining techniques to understand customer behavior. Data mining is a tool not only for enterprises but also for general non-business users, who often face decision problems. To let general users obtain software tools easily and save time and cost, this study builds on the concept of cloud computing and uses RExcel and the Excel VBA programming language to develop a cloud-based data mining classification system.
In this study, the target variable to be classified is divided into three types: numeric continuous, numeric categorical, and text categorical. The classification system applies different classification models according to the type of target variable to analyze the user's data. Three data files are used as examples: after each is uploaded to the system and analyzed, the report of results is presented to the user as a web page preview. For a continuous target variable, the user can evaluate the classification models with the MAPE value; for a categorical target variable, the user can evaluate them with the correct rate.
Users can operate the system in a few simple steps, select the model that best explains the data, learn the characteristics of the data from the result report, and go on to make the decisions they need.
Keywords: cloud computing, data mining, classification models
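The two evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration in Python rather than the RExcel and Excel VBA tooling the study used; the function names and the sample data are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def correct_rate(actual, predicted):
    """Share of predictions that match the actual class label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical continuous target: lower MAPE indicates a better model.
continuous_error = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
# Hypothetical categorical target: higher correct rate indicates a better model.
categorical_rate = correct_rate(["A", "B", "A", "C"], ["A", "B", "C", "C"])
print(continuous_error, categorical_rate)
```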
|
7 |
Gamybos išlaidų klasifikavimas / Cost classification
Majauskaitė-Jucevičienė, Vilma, 18 May 2006
Cost classification and the creation of models are very important for controlling in every company.
|
8 |
Risco de insustentabilidade financeira dos beneficiários de uma operadora de planos de saúde: uma comparação de modelos de classificação / Financial unsustainability risk for recipients of managed care plans: a classification model comparison
Daniele Adelaide Brandão de Oliveira, 20 August 2014
This work presents an analytical study of the financial sustainability of the beneficiaries in the portfolio of a health plan operator. The sample comes from a health plan operator linked to Banco do Nordeste do Brasil S.A. (BNB) and comprises 38,875 active users between 2011 and 2013. Specifically, the study applied classification techniques to the financial unsustainability of beneficiaries of a health plan operator, identifying the best-fitting model and the main determinants of unsustainability. The supervised statistical classification techniques employed were logistic regression, classification trees, and the nearest-neighbors classifier. In addition, the ROC curve was used to compare the performance of the techniques, with the area under the curve (AUC) as the main measure observed. The results showed that most of the sample consists of sustainable beneficiaries. The logistic regression model obtained a precision of 68.43% with an AUC of 0.7501, the trees obtained 67.76% with an AUC of 0.6855, and the nearest-neighbors classifier obtained a precision of 67.22% with an AUC of 0.7258. The variables identified as most important by the first two models, considered jointly, are Age and Plan Type among those that define the user profile, and Revenue, Consultation, and Dentistry among those that define the user's utilization history.
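As an illustration of the AUC comparison described in this abstract, the following is a minimal sketch, not the author's code. It computes AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name `roc_auc`, the labels, and both score lists are hypothetical toy values chosen only for the example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = financially unsustainable, 0 = sustainable.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.7, 0.4, 0.6, 0.5]  # assumed scores from one classifier
model_b = [0.6, 0.5, 0.4, 0.7, 0.8, 0.3]  # assumed scores from another classifier

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(f"model A AUC = {auc_a:.4f}, model B AUC = {auc_b:.4f}")
```

Under this measure, the classifier with the larger AUC separates the two classes better across all decision thresholds, which is why it can serve as a single summary figure when comparing models.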
|
9 |
Detecting Fraud in Affiliate Marketing: Comparative Analysis of Supervised Machine Learning Algorithms
Ahlqvist, Oskar January 2023 (has links)
Affiliate marketing has become a rapidly growing part of the digital marketing sector. However, fraud in affiliate marketing poses a serious threat to the trust and financial stability of the parties involved. This thesis investigates the performance of three supervised machine learning algorithms (random forest, logistic regression, and support vector machine) in detecting fraud in affiliate marketing. The objective is to answer the following main research question by answering two sub-questions: How much can Random Forest, Logistic Regression, and Support Vector Machine contribute to the detection of fraud in affiliate marketing? 1. How can the models be compared in an experiment? 2. How can they be optimized and applied within an affiliate marketing framework? To answer these questions, a dataset of transaction logs is analyzed in collaboration with an affiliate network company. The machine learning experiment employs k-fold cross-validation and the Area Under the ROC Curve (AUC-ROC) performance metric to evaluate the effectiveness of the classifiers in distinguishing fraudulent from non-fraudulent transactions. The results indicate that the random forest classifier performs best of the three models, achieving the highest mean AUC of 0.7172. Furthermore, feature importance analysis demonstrates that each feature category had a different impact on the performance of the models. The models compute different feature importances, meaning that some features had greater influence on specific models. By fine-tuning and optimizing the hyperparameters for each model, it is possible to enhance their performance. Despite certain limitations, such as time constraints, data availability, and security restrictions, this study highlights the potential of supervised machine learning algorithms. Random forest in particular showed how it could be used to improve fraud detection capabilities in affiliate marketing. The insights contribute to closing the knowledge gap in comparing the effectiveness of various classification methods and their practical applications for fraud detection.
|
10 |
[pt] SEGMENTAÇÃO E O MODELO RFM NO VAREJO BRASILEIRO: UMA ANÁLISE COM BASES DE DADOS TRANSACIONAIS DO VAREJO DE VESTUÁRIO / [en] THE RFM MODEL: THE IMPACT OF DATA SCIENCE ON MODEL APPLICABILITY DEVELOPMENT, STRATEGIES AND APPLICATIONS IN THE BRAZILIAN RETAIL MARKET
ANA CLARA ARAGAO FERNANDES 21 November 2022 (has links)
The Covid-19 pandemic has changed consumer behavior in retail worldwide.
This work presents a longitudinal analysis of consumer behavior between 2018 and
2021, making it possible to compare consumer behavior before and after the
Covid-19 pandemic in a Brazilian retail store. To perform this analysis, the RFM
model is applied using artificial intelligence methods to analyze large volumes of
transactional data in order to classify customers according to their consumption
behaviors. For the case presented, five distinct consumer segments of great use
to the company's CRM management were identified.
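As an illustration of the quantities behind the RFM model (recency, frequency, monetary value), the following is a minimal sketch under stated assumptions. The abstract does not say which artificial intelligence methods were used for segmentation, so this example stops at computing the three RFM values per customer. The function `rfm_table`, the transaction rows, and the reference date are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2021, 11, 20), 120.0),
    ("c1", date(2021, 12, 5), 80.0),
    ("c2", date(2019, 3, 14), 40.0),
    ("c3", date(2021, 6, 1), 300.0),
    ("c3", date(2020, 1, 10), 50.0),
    ("c3", date(2021, 12, 30), 75.0),
]
reference_day = date(2021, 12, 31)  # assumed end of the observation window


def rfm_table(rows, today):
    """Return {customer: (recency_days, frequency, monetary)} from transaction rows."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        # Track the most recent purchase date seen for each customer.
        last_purchase[customer] = max(last_purchase.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: ((today - last_purchase[customer]).days,
                   frequency[customer],
                   monetary[customer])
        for customer in last_purchase
    }


rfm = rfm_table(transactions, reference_day)
for customer, (recency, freq, total) in sorted(rfm.items()):
    print(customer, recency, freq, total)
```

A segmentation step, for example a clustering algorithm, would take these three values per customer as input; which algorithm the thesis used is not stated here.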
|