81. Fine-Grained Specification and Control of Data Flows in Web-based User Interfaces
Book, Matthias; Gruhn, Volker; Richter, Jan. 04 December 2018.
When building process-intensive web applications, developers typically spend considerable effort on the exchange of specific data entities between specific web pages and operations under specific conditions, as called for by business requirements. Since the WWW infrastructure provides only very coarse data exchange mechanisms, we introduce a notation for the design of fine-grained conditional data flows between user interface components. These specifications can be interpreted by a data flow controller that automatically provides the data entities to the specified receivers at run-time, relieving developers of the need to implement user interface data flows manually.
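The controller idea described above can be sketched as a rule interpreter: each rule names a source component, a receiver, a data entity and a guard condition, and the controller delivers matching entities at run-time. All names below are illustrative stand-ins, not the dissertation's actual notation.

```python
# Hypothetical sketch of a conditional data-flow controller for UI components.
# Rules declare which entity flows from a source component to a receiver,
# guarded by a condition evaluated against the current session state.
from dataclasses import dataclass
from typing import Callable

@dataclass
class FlowRule:
    source: str                        # component that produces the entity
    receiver: str                      # component that consumes it
    entity: str                        # name of the data entity
    condition: Callable[[dict], bool]  # guard evaluated against session state

class DataFlowController:
    def __init__(self, rules):
        self.rules = rules

    def propagate(self, source, state):
        """Deliver entities from `source` to every receiver whose rule fires."""
        deliveries = {}
        for rule in self.rules:
            if rule.source == source and rule.condition(state):
                deliveries.setdefault(rule.receiver, {})[rule.entity] = state[rule.entity]
        return deliveries

rules = [
    FlowRule("checkout", "invoice", "cart_total", lambda s: s["cart_total"] > 0),
    FlowRule("checkout", "survey", "user_id", lambda s: s.get("opted_in", False)),
]
controller = DataFlowController(rules)
result = controller.propagate("checkout", {"cart_total": 42.0, "user_id": 7})
```

Applied to the state above, only the first rule fires (the user has not opted in), so only the invoice component receives data.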
82. An exploration of media repertoires in South Africa: 2002-2014
Bakker, Hans-Peter. 11 March 2020.
This dissertation explores trends in media engagement in South Africa over a period from 2002 until 2014. It utilises data from the South African Audience Research Foundation’s All Media and Products Surveys. Using factor analysis, six media repertoires are identified and, utilising structural equation modelling, marginal means for various demographic categories by year are estimated. Measurement error is determined with the aid of bootstrapping. These estimates are plotted to provide visual aids in interpreting model parameters. The findings show general declines in engagement with traditional media and growth in internet engagement, but these trends can vary markedly for different demographic groups. The findings also show that for many South Africans traditional media such as television remain dominant.
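The bootstrap step mentioned above can be illustrated in miniature: resample the observed data with replacement and recompute the statistic to estimate its sampling variability. The scores below are synthetic stand-ins, not values from the All Media and Products Surveys.

```python
# A minimal sketch of the nonparametric bootstrap used to estimate
# measurement error: resample with replacement, recompute the statistic,
# and take the standard deviation of the replicates.
import random
random.seed(0)

# synthetic stand-in for a repertoire engagement score across respondents
scores = [3.1, 2.8, 3.5, 4.0, 2.9, 3.3, 3.7, 2.6, 3.0, 3.4]

def bootstrap_se(data, stat, n_boot=2000):
    """Standard error of `stat` via nonparametric bootstrap resampling."""
    reps = []
    for _ in range(n_boot):
        sample = [random.choice(data) for _ in data]
        reps.append(stat(sample))
    mean_rep = sum(reps) / len(reps)
    var = sum((r - mean_rep) ** 2 for r in reps) / (len(reps) - 1)
    return var ** 0.5

se = bootstrap_se(scores, lambda xs: sum(xs) / len(xs))
```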
83. Unsupervised Machine Learning Application for the Identification of Kimberlite Ore Facies using Convolutional Neural Networks and Deep Embedded Clustering
Langton, Sean. 25 February 2022.
Mining is a key economic contributor to many regions globally, especially those in developing nations. The design and operation of the processing plants associated with each of these mines is highly dependent on the composition of the feed material. The aim of this research is to demonstrate the viability of implementing a computer vision solution that provides online information about the composition of material entering the plant, allowing plant operators to adjust equipment settings and process parameters accordingly. Data is collected in the form of high-resolution images, captured every few seconds, of material on the main feed conveyor belt into the Kao Diamond Mine processing plant. The modelling phase of the research is implemented in two stages. The first stage involves the implementation of a Mask Region-based Convolutional Neural Network (Mask R-CNN) model with a ResNet-101 CNN backbone for instance segmentation of individual rocks in each image. These individual rock images are extracted and used for the second stage of the modelling pipeline, which utilizes an unsupervised clustering method known as Convolutional Deep Embedded Clustering with Data Augmentation (ConvDEC-DA). The clustering stage of this research provides a method to group feed material rocks into their respective types or facies, using features developed from the autoencoder portion of the ConvDEC-DA model. While this research focuses on the clustering of kimberlite rocks according to their respective facies, similar implementations are possible for a wide range of mining and rock types.
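The core of the deep embedded clustering family used above is a soft assignment of embedded points to cluster centroids with a Student's t-kernel. A minimal sketch of that assignment step, with random vectors standing in for autoencoder embeddings of rock images:

```python
# Sketch of the soft cluster assignment used in Deep Embedded Clustering:
# q[i, j] is the probability that embedded point z[i] belongs to centroid
# mu[j], computed with a Student's t-kernel over squared distances.
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Soft assignment of embeddings z (n x d) to centroids mu (k x d)."""
    # squared distances between each embedding and each centroid
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)   # normalise rows to probabilities

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 10))   # six "rock" embeddings in a 10-d latent space
mu = z[[0, 3]]                 # two centroids seeded from the data
q = soft_assign(z, mu)
```

Points coinciding with a centroid receive their highest probability at that centroid, which is what lets the clustering loss sharpen assignments during training.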
84. A Machine Learning Approach to Predicting the Employability of a Graduate
Modibane, Masego. 12 February 2020.
For many credit-offering institutions, such as banks and retailers, credit scores play an important role in the decision-making process for credit applications. It is difficult to source the traditional information required to calculate these scores for applicants who do not have a credit history, such as recently graduated students. Thus, alternative credit scoring models are sought to generate a score for these applicants. The aim of this dissertation is to build a machine learning classification model that can predict a student's likelihood of becoming employed, based on their student data (for example, their GPA, degree(s) held, etc.). The resulting model could serve as an input that these institutions use when deciding whether to approve a credit application from a recently graduated student.
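A minimal sketch of the kind of classifier proposed: logistic regression mapping student features to an employment probability. The data below are entirely synthetic (a GPA and an honours-degree flag with a made-up generative model), for illustration only.

```python
# Hypothetical sketch: logistic regression predicting employment from
# synthetic student data, fitted by plain gradient ascent on the
# log-likelihood. Features and coefficients are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 200
gpa = rng.uniform(2.0, 4.0, n)
honours = rng.integers(0, 2, n).astype(float)
# synthetic ground truth: higher GPA and an honours degree raise the odds
logit = -6.0 + 2.0 * gpa + 1.0 * honours
employed = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(n), gpa, honours])
w = np.zeros(3)
for _ in range(2000):                       # gradient ascent on log-likelihood
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.01 * X.T @ (employed - p) / n

def employment_prob(gpa_val, honours_flag):
    """Predicted employment probability for one student profile."""
    return float(1 / (1 + np.exp(-(w[0] + w[1] * gpa_val + w[2] * honours_flag))))

p_high = employment_prob(3.8, 1)
p_low = employment_prob(2.2, 0)
```

On this synthetic data the fitted model ranks the stronger profile above the weaker one, which is the ordering a credit-scoring feature would rely on.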
85. Collaborative Genre Tagging
Leslie, James. 19 November 2020.
Recommender systems (RSs) are used extensively in online retail and on media streaming platforms to help users filter the plethora of options at their disposal. Their goal is to provide users with suggestions of products or artworks that they might like. Content-based RSs make use of user and/or item metadata to predict user preferences, while collaborative filtering (CF) has proven to be an effective approach in tasks such as predicting the movie or music preferences of users in the absence of any metadata. Latent factor models have been used to achieve state-of-the-art accuracy in many CF settings, playing an especially large role in beating the benchmark set in the Netflix Prize in 2008. These models learn latent features for users and items in order to predict the preferences of users. The first latent factor models made use of matrix factorisation to learn latent factors, but more recent approaches have made use of neural architectures with embedding layers. This master's dissertation outlines collaborative genre tagging (CGT), a transfer learning application of CF that makes use of latent factors to predict the genres of movies, using only explicit user ratings as model inputs.
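The matrix factorisation idea behind these latent factor models can be shown on a toy example: factorise a ratings matrix into user and item embeddings so that their dot product approximates the observed ratings. The tiny matrix below is synthetic, with 0 marking an unobserved entry.

```python
# Sketch of latent factor CF via matrix factorisation: learn user factors U
# and item factors V by gradient descent on the squared error over observed
# ratings, with a small L2 penalty for regularisation.
import numpy as np

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
observed = R > 0

rng = np.random.default_rng(0)
k = 2                                   # number of latent factors
U = 0.1 * rng.normal(size=(R.shape[0], k))
V = 0.1 * rng.normal(size=(R.shape[1], k))

for _ in range(3000):                   # gradient steps on observed entries only
    E = np.where(observed, R - U @ V.T, 0.0)
    U += 0.01 * (E @ V - 0.02 * U)
    V += 0.01 * (E.T @ U - 0.02 * V)

rmse = np.sqrt((np.where(observed, R - U @ V.T, 0.0) ** 2).sum() / observed.sum())
```

The learned factors reconstruct the observed ratings closely and also fill in the unobserved cells, which is the mechanism CGT reuses to predict genres.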
86. Computer display of the electrophysiological topography of the human diencephalon during stereotaxic surgery.
Hardy, Tyrone L. January 1975.
No description available.
87. Forecasting and modelling the VIX using Neural Networks
Netshivhambe, Nomonde. 12 April 2023.
This study investigates the volatility forecasting ability of neural network models. In particular, we focus on the performance of Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) neural networks in predicting the CBOE Volatility Index (VIX). The inputs into these models include the VIX itself, GARCH(1,1) fitted values and various financial and macroeconomic explanatory variables, such as S&P 500 returns and the oil price. In addition, this study segments the data into two sub-periods, namely a calm and a crisis period in the financial market. The segmentation of the periods caters for changes in the predictive power of the aforementioned models under different market conditions. When forecasting the VIX, we show that the best-performing model is found in the calm period. In addition, we show that the MLP has more predictive power than the LSTM.
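The GARCH(1,1) fitted values used as a model input come from the standard variance recursion sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. A sketch with synthetic returns and illustrative parameter values:

```python
# Sketch of the GARCH(1,1) conditional variance recursion whose fitted
# values feed the neural networks. Returns and parameters are synthetic.
import numpy as np

def garch11_fitted(returns, omega, alpha, beta):
    """Conditional variance series implied by GARCH(1,1) parameters."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()               # common choice of starting value
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(0)
r = 0.01 * rng.standard_normal(500)         # stand-in for daily S&P 500 returns
sigma2 = garch11_fitted(r, omega=1e-6, alpha=0.1, beta=0.85)
```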
88. Log mining to develop a diagnostic and prognostic framework for the MeerLICHT telescope
Roelf, Timothy Brian. 20 April 2023.
In this work we present the approach taken to address the problems of anomalous fault detection and system delays experienced by the MeerLICHT telescope. We make use of the abundantly available console logs, which record all aspects of the telescope's function, to obtain information. The MeerLICHT operational team must devote time to manually inspecting the logs during system downtime to discover faults. This task is laborious, time-inefficient given the large size of the logs, and ill-suited to the time-sensitive nature of many of the surveys the telescope partakes in. We applied Hidden Markov models to the problems of fault detection and system delays experienced by the MeerLICHT. We were able to train the model in three separate ways, showing some success at fault detection and none at addressing the system delays.
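One way a trained HMM supports fault detection is by scoring log sequences: the forward algorithm gives the likelihood of an observation sequence, and an unusually low likelihood flags a possible fault. The two-state model below is an illustrative toy, not the telescope's trained model.

```python
# Sketch of HMM sequence scoring with the (scaled) forward algorithm.
# States and symbols are hypothetical: state 0 = nominal, state 1 = faulty;
# log symbols 0 = INFO, 1 = WARN, 2 = ERROR.
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of `obs` under an HMM (pi: initial, A: transition, B: emission)."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()                     # rescale to avoid underflow
        loglik += np.log(s)
        alpha /= s
    return float(loglik)

pi = np.array([0.9, 0.1])
A = np.array([[0.95, 0.05],
              [0.20, 0.80]])
B = np.array([[0.8, 0.15, 0.05],            # nominal state emits mostly INFO
              [0.1, 0.30, 0.60]])           # faulty state emits mostly ERROR

normal_seq = [0, 0, 1, 0, 0, 0]
faulty_seq = [2, 2, 1, 2, 2, 2]
ll_normal = forward_loglik(normal_seq, pi, A, B)
ll_faulty = forward_loglik(faulty_seq, pi, A, B)
```

A sequence dominated by errors scores a much lower log-likelihood than a routine one, which is the signal an anomaly detector would threshold.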
89. Linear regression techniques for identifying influential data and applications in commercial data analysis
Jacobs, Michael Kalman. 27 September 2023.
Recent literature contains many publications on techniques for identifying extreme data points (outliers) and influential observations or groups in sample data sets. This thesis begins by reviewing the statistics and distributional properties of the standard techniques, viz. the standardized residual as a test for outliers, and Cook's distance as a measure of influence. An outlier test which is distributionally neater than the standardized residual is proposed. In practical applications, ordinary least squares regression is often inappropriate, and the use of biased estimators may be preferable. In this thesis, the existing theory is extended to several alternative regression techniques. Ridge regression and generalized inverse regression are suitable techniques when the cross-product matrix is ill-conditioned. Restricted least squares regression, with exact or stochastic prior information, is used in many econometric applications. Models with selected variables are used to eliminate design faults or to reduce computational effort. New statistics are developed for all these techniques, the distributional results are proved, and computational formulae are developed. Computational problems may arise in the actual use of the various techniques, and these are investigated. Computer programs written in BASIC and suitable for microcomputer use are presented, making the techniques accessible to virtually any commercial environment. The performance of the various techniques is examined using a controlled simulation study and a number of practical data sets drawn from several areas of South African commerce. This is, as far as can be ascertained, the first extensive practical South African study on the effects of influential data. It is shown that the presence of outliers or influential data can significantly bias the results of any study. It is recommended that no data analysis be attempted without a preliminary scan for outliers and influential observations.
The techniques presented can be used advantageously even in data sets where the ultimate analysis does not involve linear regression. It is shown that influential data are not merely of nuisance value in the analysis but may contain valuable information in their own right.
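The two standard diagnostics reviewed above can be computed directly from the hat matrix of an OLS fit: standardized residuals for the outlier test and Cook's distance for influence. The data below are synthetic, with one gross outlier planted deliberately.

```python
# Sketch of the standard OLS diagnostics: internally standardized residuals
# (outlier test) and Cook's distance (influence measure), on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = 2.0 + 0.5 * X[:, 1] + rng.normal(0, 1, n)
y[5] += 8.0                                  # plant one gross outlier

H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat (projection) matrix
e = y - H @ y                                # residuals
s2 = e @ e / (n - p)                         # residual variance estimate
h = np.diag(H)                               # leverages
r = e / np.sqrt(s2 * (1 - h))                # standardized residuals
cooks_d = r ** 2 * h / (p * (1 - h))         # Cook's distance

worst = int(np.abs(r).argmax())
```

The planted point dominates both diagnostics, illustrating how a preliminary scan flags observations worth closer inspection.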
90. Natural Language Processing on Data Warehouses
Maree, Stiaan. 27 October 2022.
The main problem addressed in this research was to use natural language to query data in a data warehouse. To this effect, two natural language processing models were developed and compared on a classic star-schema sales data warehouse with sales facts and date, location and item dimensions. Utterances are queries that people pose in natural language, for example, "What is the sales value for mountain bikes in Georgia for 1 July 2005?" The first model, the heuristics model, implemented an algorithm that steps through the sequence of utterance words and matches the longest number of consecutive words at the highest grain of the hierarchy. In contrast, the embedding model implemented the word2vec algorithm to create different kinds of vectors from the data warehouse. These vectors are aggregated, and the cosine similarity between vectors is then used to identify concepts in the utterances that can be converted to a programming language. To understand question style, a survey was set up, which then helped shape the random utterances created for the evaluation of both methods. The first key insight, and the main premise for the embedding model to work, is a three-step process of creating three types of vectors. The first step is to train vectors (word vectors) for each individual word in the data warehouse; this is called word embeddings. For instance, the word 'bike' will have a vector. The next step is when the word vectors are averaged for each unique column value (column vectors) in the data warehouse, thus leaving an entry like 'mountain bike' with one vector which is the average of the vectors for 'mountain' and 'bike'. Lastly, the utterance by the user is averaged (utterance vectors) using the word vectors created in step one, and then, using cosine similarity, the utterance vector is matched to the closest column vectors in order to identify data warehouse concepts in the utterance.
The second key insight was to train word vectors first for location and then separately for item - in other words, per dimension (one set for location, and one set for item). Removing stop words was the third key insight, and the last key insight was to use Global Vectors to initialise the training of the word vectors. The results of the evaluation of the models indicated that the embedding model was ten times faster than the heuristics model. In terms of accuracy, the embedding model (95.6% accurate) also outperformed the heuristics model (70.1% accurate). The practical application of the research is that these models can be used as a component in a chatbot on data warehouses. Combined with a Structured Query Language (SQL) query generation component, and with Application Programming Interfaces built on top, this facilitates the quick and easy distribution of data; no knowledge of a programming language such as SQL is needed to query the data.
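The three-step matching process described above can be sketched in miniature: average word vectors into column vectors, average the utterance into an utterance vector, then match by cosine similarity. The 3-dimensional "word vectors" below are toy stand-ins for trained word2vec embeddings.

```python
# Sketch of the embedding model's matching step: averaged word vectors for
# column values and utterances, matched by cosine similarity. Vectors are
# illustrative toys, not trained embeddings from the data warehouse.
import numpy as np

word_vecs = {
    "mountain": np.array([0.9, 0.1, 0.0]),
    "bike":     np.array([0.8, 0.2, 0.1]),
    "road":     np.array([0.1, 0.9, 0.0]),
    "georgia":  np.array([0.0, 0.1, 0.9]),
}

def avg_vec(words):
    """Average the vectors of the known words in a phrase (steps 2 and 3).
    Unknown words (e.g. stop words) are simply skipped."""
    vs = [word_vecs[w] for w in words if w in word_vecs]
    return np.mean(vs, axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# column vectors: one per unique column value in the warehouse
columns = {
    "mountain bike": avg_vec(["mountain", "bike"]),
    "road bike": avg_vec(["road", "bike"]),
    "georgia": avg_vec(["georgia"]),
}

utterance = "sales for mountain bikes in georgia".split()
u = avg_vec(utterance)                      # utterance vector (step 3)
best = max(columns, key=lambda c: cosine(u, columns[c]))
```

Even with 'bikes' absent from the toy vocabulary, the averaged utterance vector still lands closest to the 'mountain bike' column vector, which is the concept the query generator would pick up.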