1 |
Educational data mining (EDM) in a South African University: a longitudinal study of factors that affect the academic performance of computer science I students
Mashiloane, Lebogang, 22 January 2016
Degree of Master of Science by research only:
A Dissertation submitted to the Faculty of Science, University of
the Witwatersrand, Johannesburg, in fulfilment of the
requirements for the degree of Master of Science.
Signed on September 10, 2015 in Johannesburg
The past few years have seen an increase in the number of first year students registering in the School
of Computer Science at Wits University. These students come from different backgrounds both academically
and socially. As do many other institutions, Wits University collects and stores vast amounts of
data about the students it enrols and teaches. However, this data is not always used after being stored. The
area of Educational Data Mining (EDM) focuses on using this stored data to find trends and patterns that
could enhance the knowledge about the students’ behavior, their academic performance and the learning
environment.
This longitudinal study focuses on the application of EDM techniques to obtain a better understanding
of some of the factors that influence the academic performance of first year computer science students
at the University of the Witwatersrand. Knowledge obtained using these techniques could assist in increasing
the number of students who complete their studies successfully and identifying students who
are at risk of failing and ensuring that early intervention processes can be put into place. A modified
version of the CRISP-DM (CRoss-Industry Standard Process for Data Mining) was used, with three data
mining techniques, namely: Classification, Clustering and Association Rule Mining. Three algorithms
were compared in the first two techniques while only one algorithm was used in the Association Rule
Mining. For the classification technique, the three algorithms that were compared were the J48 Classifier,
Decision Table and Naïve Bayes algorithm. The clustering algorithms used included the Simple
K-means, Expectation Maximization (EM) and the Farthest First algorithm. Finally, the Predictive Apriori
algorithm was selected as the Association Rule Mining technique.
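
As an illustration of one of the clustering algorithms named above, the following is a minimal sketch of Farthest First clustering in plain Python. It is not the implementation used in the dissertation: the function names, the choice of the first point as the starting centre (a simplification made here so the result is deterministic) and the marks are all assumptions invented for this example.

```python
import math


def farthest_first_centres(points, k):
    """Pick k centres by farthest-first traversal.

    Assumption: the traversal starts from points[0] so the result is
    deterministic; other implementations may choose the start differently.
    """
    centres = [points[0]]
    while len(centres) < k:
        # The next centre is the point whose nearest existing centre is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(nxt)
    return centres


def assign(points, centres):
    """Return, for each point, the index of its nearest centre."""
    return [
        min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
        for p in points
    ]


# Hypothetical (Mathematics I mark, mid-year mark) pairs, invented for illustration.
marks = [(35, 40), (38, 42), (72, 75), (75, 70), (55, 58)]
centres = farthest_first_centres(marks, 2)
labels = assign(marks, centres)
print(centres, labels)
```

Each student record is assigned to the nearest of the chosen centres, so the two clusters here roughly separate lower and higher marks in the invented data.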
Historical Computer Science I data, from 2006 to 2011, was used as the training data. This set of data
was used to find relationships within the data that could assist with predictive modeling. For each of the
selected techniques a model was created using the training data set. These models were incorporated in
a tool, the Success or Failure Determiner (SOFD), that was created specifically as part of this research.
Thereafter, the test data set was put through the SOFD tool in the testing phase. Test data sets usually
contain a variable whose value is predicted using the models built during the training phase. The 2012
Computer Science I data instances were used during the testing phase. The investigations brought forth
both expected and interesting results. A good relationship was found between academic performance in
Computer Science and three of the factors investigated: Mathematics I, mid-year mark and the module
perceived to be the most difficult in the course. The relationship between Mathematics and Computer
Science was expected; however, the other two factors (mid-year mark and most difficult module) are
new, and may need to be further investigated in other courses or in future studies. An interesting finding
from the Mathematics investigation was the better relationship between Computer Science and Algebra
rather than Calculus. Using these three factors to predict Computer Science performance could assist
in improving throughput and retention rates by identifying students at risk of failing, before they write
their final examinations. The Association Rule Mining technique assisted in identifying the selection of
courses that could yield the best academic performance overall, in first year. This finding is important,
since the information obtained could be used during the registration process to assist students in making
the correct decisions when selecting the courses they would like to do. The overall results show that using
data mining techniques and historical data collected at Wits University about first year Computer Science
(CS-1) students can assist in obtaining meaningful information and knowledge, from which a better
understanding of present and future generations of CS-1 students can be derived, and solutions found to
some of the academic problems and challenges facing them. Additionally, this can assist in obtaining a
better understanding of the students and factors that influence their academic performance. This study
can be extended to include more courses within Wits University and other higher educational institutions.
Keywords. Educational Data Mining, CRISP-DM, Classification, Clustering, Association Rule Mining,
J48 Classifier, Decision Table, Naïve Bayes, Simple K-means, Expectation Maximization, Farthest
First, Predictive Apriori
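
The train-then-test procedure described in the abstract (historical years for training, a later year for testing) can be sketched with a small categorical Naïve Bayes classifier. This is only an illustrative sketch, not the SOFD tool: the attribute names (`maths`, `midyear`), the banded values, the records and the use of Laplace smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count class and per-attribute value frequencies from (features, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (attribute, label) -> counts of values
    seen_values = defaultdict(set)       # attribute -> set of observed values
    for features, label in examples:
        for attr, value in features.items():
            value_counts[(attr, label)][value] += 1
            seen_values[attr].add(value)
    return class_counts, value_counts, seen_values


def predict(model, features):
    """Return the label with the highest log-posterior (Laplace-smoothed)."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for attr, value in features.items():
            count = value_counts[(attr, label)][value]
            score += math.log((count + 1) / (n + len(seen_values[attr])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical records (year, features, outcome), invented for illustration only.
records = [
    (2006, {"maths": "high", "midyear": "pass"}, "pass"),
    (2007, {"maths": "high", "midyear": "pass"}, "pass"),
    (2008, {"maths": "low", "midyear": "fail"}, "fail"),
    (2009, {"maths": "low", "midyear": "pass"}, "pass"),
    (2010, {"maths": "low", "midyear": "fail"}, "fail"),
    (2011, {"maths": "high", "midyear": "fail"}, "fail"),
    (2012, {"maths": "high", "midyear": "pass"}, "pass"),
    (2012, {"maths": "low", "midyear": "fail"}, "fail"),
]

# Train on 2006 to 2011 and hold out 2012 for testing, mirroring the split in the abstract.
train = [(f, y) for year, f, y in records if 2006 <= year <= 2011]
test = [(f, y) for year, f, y in records if year == 2012]

model = train_naive_bayes(train)
predictions = [predict(model, f) for f, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

Splitting by year rather than at random keeps the test set strictly later than the training data, which matches the use case of predicting outcomes for a new intake of students.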
|
2 |
The mining and visualisation of application services data
Knoetze, Ronald Morgan, January 2005
Many network monitoring tools do not provide sufficiently in-depth and useful reports on network usage, particularly in the domain of application services data. The optimisation of network performance is only possible if the networks are monitored effectively. Techniques that identify patterns of network usage can assist in the successful monitoring of network performance. The main goal of this research was to propose a model to mine and visualise application services data in order to support effective network management. To demonstrate the effectiveness of the model, a prototype, called NetPatterns, was developed using data for the Integrated Tertiary Software (ITS) application service collected by a network monitoring tool on the NMMU South Campus network. Three data mining algorithms for application services data were identified for the proposed model. The data mining algorithms used are classification (decision tree), clustering (K-Means) and association (correlation). Classifying application services data serves to categorise combinations of network attributes to highlight areas of poor network performance. The clustering of network attributes serves to indicate sparse and dense regions within the application services data. Association indicates the existence of any interesting relationships between different network attributes. Three visualisation techniques were selected to visualise the results of the data mining algorithms. The visualisation techniques selected were the organisation chart, bubble chart and scatterplots. Colour and a variety of other visual cues are used to complement the selected visualisation techniques. The effectiveness and usefulness of NetPatterns were determined by means of user testing. The results of the evaluation clearly show that the participants were highly satisfied with the visualisation of network usage presented by NetPatterns. All participants successfully completed the prescribed tasks and indicated that NetPatterns is a useful tool for the analysis of network usage patterns.
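
The association (correlation) step mentioned in this abstract can be illustrated with a Pearson correlation between two network attributes. This is a minimal sketch and not the NetPatterns implementation: the attribute names (`bytes_sent`, `response_ms`, `idle_pct`) and the values are hypothetical, and the abstract does not state which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-interval measurements, invented for illustration only.
bytes_sent = [10, 20, 30, 40, 50]
response_ms = [120, 150, 180, 210, 240]
idle_pct = [90, 70, 50, 30, 10]

r_load = pearson(bytes_sent, response_ms)  # attributes that rise together
r_idle = pearson(bytes_sent, idle_pct)     # attributes that move in opposite directions
print(r_load, r_idle)
```

A coefficient near +1 or -1 flags a pair of attributes as strongly related and therefore worth plotting, for example in a scatterplot, while a value near 0 suggests no linear relationship.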
|