1 |
Visualization of training data reported by football players
Georgsson, Adam, Christensson, Olof January 2018
Background. Data from training sessions is gathered by a trainer from the players with the goal of analyzing and getting an overview of how the team is performing. The collected data is represented in tabular form, and over time the effort to interpret it becomes more demanding.
Objectives. This thesis aims to find out whether there is a solution in which collecting, processing and representing training data from football players can ease and improve the trainer’s analysis of the team.
Methods. A dataset is received from a football trainer, and it contains information about training sessions for his team of football players. The dataset is used to find a suitable method and visualize the data. Feedback from the trainer is used to determine what works and what does not. Furthermore, a survey with examples of visualization is given to the players and the trainer to get an understanding of how the selected charts are interpreted.
Results. Representing the most important attributes of the received dataset requires a chain of views (usage flow) to be introduced, from primary view to quaternary view. Each step in the chain narrows the level of detail represented. The box plot proved to be an appropriate choice for providing an overview of the team’s training data.
Conclusions. Visualizing training data gives a significant advantage to the trainer regarding team analysis. With box plots, the trainer gets an overview of the team and can then dig into more detailed data while interacting with the charts.
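The box plot overview mentioned in the results reduces each session's reports to a five-number summary. The following is a minimal Python sketch of that summary, not the thesis's implementation; the ratings and the function name `five_number_summary` are hypothetical, since the dataset's actual attributes are not given in the abstract.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    ordered = sorted(values)
    # "inclusive" treats the data as the whole population of reports.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical ratings reported by players after one session (not thesis data).
session_ratings = [3, 5, 4, 7, 6, 5, 8, 4, 6]
print(five_number_summary(session_ratings))
```

Computing one such summary per session gives the team-level overview; the individual reports behind each box remain available for the more detailed views.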
|
2 |
Ensemble Learning With Imbalanced Data
Shoemaker, Larry 20 September 2010
We describe an ensemble approach to learning salient spatial regions from arbitrarily
partitioned simulation data. Ensemble approaches for anomaly detection
are also explored. The partitioning comes from the distributed processing requirements
of large-scale simulations. The volume of the data is such that classifiers
can train only on data local to a given partition. Since the data partition reflects
the needs of the simulation, the class statistics can vary from partition to partition.
Some classes will likely be missing from some or even most partitions. We combine
a fast ensemble learning algorithm with scaled probabilistic majority voting in
order to learn an accurate classifier from such data. Since some simulations are
difficult to model without a considerable number of false positive errors, and since
we are essentially building a search engine for simulation data, we order predicted
regions to increase the likelihood that most of the top-ranked predictions are correct
(salient). Results from simulation runs of a canister being torn and from a casing
being dropped show that regions of interest are successfully identified in spite of
the class imbalance in the individual training sets. Lift curve analysis shows that the
use of data driven ordering methods provides a statistically significant improvement
over the use of the default, natural time step ordering. Significant time is saved for
the end user by allowing an improved focus on areas of interest without the need to
conventionally search all of the data. We have also found that random forests
weighted and distance-based outlier ensemble methods for supervised learning of
anomaly detection provide significant accuracy improvements when compared to
existing methods on the same dataset. Further, distance-based outlier and local
outlier factor ensemble methods for unsupervised learning of anomaly detection
also compare favorably to existing methods.
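The abstract does not define how the probabilistic votes are scaled. As a rough Python sketch only, the code below assumes each partition's classifier reports class probabilities and that a hypothetical per-classifier weight scales them before they are summed; the class labels, weights and function name are illustrative and not taken from the thesis.

```python
def scaled_probabilistic_vote(probabilities, weights):
    """Combine per-classifier class probabilities into one weighted vote.

    probabilities: one dict per classifier, mapping class label -> probability.
                   A class missing from a partition simply contributes nothing.
    weights:       one scale factor per classifier (an assumption, see above).
    """
    totals = {}
    for probs, weight in zip(probabilities, weights):
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + weight * p
    norm = sum(totals.values())
    return {label: score / norm for label, score in totals.items()}

# Hypothetical votes from three partitions; the second never saw "salient".
votes = [
    {"salient": 0.9, "unknown": 0.1},
    {"unknown": 1.0},
    {"salient": 0.6, "unknown": 0.4},
]
combined = scaled_probabilistic_vote(votes, weights=[1.0, 0.5, 1.0])
prediction = max(combined, key=combined.get)
print(prediction, combined)
```

Sorting regions by their combined "salient" score, highest first, is one simple way to obtain a ranked list of the kind the abstract describes.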
|
3 |
Computer-assisted instruction: A new approach to teaching safety in vocational education classrooms
O'Neal, C. Don 01 January 1984
No description available.
|
4 |
Semi-supervised Ensemble Learning Methods for Enhanced Prognostics and Health Management
Shi, Zhe 15 May 2018
No description available.
|
5 |
Increasing the Precision of Forest Area Estimates through Improved Sampling for Nearest Neighbor Satellite Image Classification
Blinn, Christine Elizabeth 25 August 2005
The impacts of training data sample size and sampling method on the accuracy of forest/nonforest classifications of three mosaicked Landsat ETM+ images with the nearest neighbor decision rule were explored. Large training data pools of single pixels were used in simulations to create samples with three sampling methods (random, stratified random, and systematic) and eight sample sizes (25, 50, 75, 100, 200, 300, 400, and 500). Two forest area estimation techniques were used to estimate the proportion of forest in each image and to calculate forest area precision estimates. Training data editing was explored to remove problem pixels from the training data pools. All possible band combinations of the six non-thermal ETM+ bands were evaluated for every sample draw. Comparisons were made between classification accuracies to determine if all six bands were needed. The utility of separability indices, minimum and average Euclidean distances, and cross-validation accuracies for the selection of band combinations, prediction of classification accuracies, and assessment of sample quality was determined.
Larger training data sample sizes produced classifications with higher average accuracies and lower variability. All three sampling methods had similar performance. Training data editing improved the average classification accuracies by a minimum of 5.45%, 5.31%, and 3.47%, respectively, for the three images. Band combinations with fewer than all six bands almost always produced the maximum classification accuracy for a single sample draw. The number of bands and combination of bands, which maximized classification accuracy, was dependent on the characteristics of the individual training data sample draw, the image, sample size, and, to a lesser extent, the sampling method. All three band selection measures were unable to select band combinations that produced higher accuracies on average than all six bands. Cross-validation accuracies with sample size 500 had high correlations with classification accuracies, and provided an indication of sample quality.
Collection of a high quality training data sample is key to the performance of the nearest neighbor classifier. Larger samples are necessary to guarantee classifier performance and the utility of cross-validation accuracies. Further research is needed to identify the characteristics of "good" training data samples. / Ph. D.
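The three sampling methods compared in the study can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the dissertation's procedure: the pixel pool is synthetic, stratification by class label is assumed, and all function names are hypothetical.

```python
import random

def simple_random_sample(pool, n, rng):
    """Draw n items without replacement, each equally likely."""
    return rng.sample(pool, n)

def stratified_random_sample(pool, n, rng, key):
    """Draw from each stratum in proportion to its share of the pool.

    Rounding can make the total differ slightly from n for awkward shares.
    """
    strata = {}
    for item in pool:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pool))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

def systematic_sample(pool, n, rng):
    """Take every step-th item after a random start (requires n <= len(pool))."""
    step = len(pool) // n
    start = rng.randrange(step)
    return pool[start::step][:n]

# Synthetic pool of (class, pixel id) pairs: 750 forest, 250 nonforest.
pool = [("forest" if i % 4 else "nonforest", i) for i in range(1000)]
rng = random.Random(0)
random_draw = simple_random_sample(pool, 100, rng)
stratified_draw = stratified_random_sample(pool, 100, rng, key=lambda px: px[0])
systematic_draw = systematic_sample(pool, 100, rng)
print(len(random_draw), len(stratified_draw), len(systematic_draw))
```

Repeating such draws at each of the eight sample sizes and classifying with each draw is what allows the average accuracy and its variability to be compared across methods.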
|
6 |
Ultrasonic acoustic health monitoring of ball bearings using neural network pattern classification of power spectral density
Kirchner, William Thomas 12 January 2010
This thesis presents a generic, passive, non-contact acoustic health monitoring approach using ultrasonic acoustic emissions (UAE) to facilitate classification of bearing health via neural networks. This generic approach is applied to classifying the operating condition of conventional ball bearings. The acoustic emission signals used in this study are in the ultrasonic range (20-120 kHz), which is significantly higher than the majority of the research in this area thus far. A direct benefit of working in this frequency range is the inherent directionality of the microphones capable of measurement in this range, which becomes particularly useful when operating in environments with low signal-to-noise ratios. Using the UAE power spectrum signature, it is possible to pose the health monitoring problem as a multi-class classification problem, and make use of a multi-layer artificial neural network (ANN) to classify the UAE signature. One major problem limiting the usefulness of ANNs for failure classification is the need for large quantities of training data. Artificial training data, based on statistical properties of a significantly smaller experimental data set, is created using the combination of a normal distribution and a coordinate transformation. The artificial training data provides a sufficiently sized data set to train the neural network, as well as to overcome the curse of dimensionality. The combination of the artificial training methods and the ultrasonic frequency range being used results in an approach generic enough to suggest that this particular method is applicable to a variety of systems and components where persistent UAE exist. / Master of Science
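The abstract does not say which coordinate transformation is used to build the artificial training data. As a sketch under that caveat, the Python code below assumes one common choice: draw standard normal samples and map them through the Cholesky factor of the measured covariance, so the synthetic points share the mean and covariance of the small experimental set. The measurements and function names are hypothetical.

```python
import random

def mean_and_covariance(rows):
    """Sample mean vector and (n - 1)-normalized covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (a[i][i] - s) ** 0.5
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def synthesize(rows, count, rng):
    """Generate count artificial samples: mean + L * z, with z ~ N(0, I)."""
    mean, cov = mean_and_covariance(rows)
    low = cholesky(cov)
    d = len(mean)
    out = []
    for _ in range(count):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out

# Hypothetical two-feature measurements (not data from the thesis).
measured = [[1.0, 2.1], [1.2, 2.0], [0.9, 2.4], [1.1, 2.2], [1.3, 1.9]]
artificial = synthesize(measured, 1000, random.Random(0))
print(len(artificial), artificial[0])
```

In practice the feature vector would be a power spectral density with many bins, where the covariance estimated from few samples may not be positive definite; that case would need regularization, which this sketch omits.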
|
7 |
Dataselektering en –manipulering vir statistiese Engels–Afrikaanse masjienvertaling [Data selection and manipulation for statistical English–Afrikaans machine translation] / McKellar C.A.
McKellar, Cindy. January 2011
The success of any machine translation system depends largely on the quantity and quality of the available training data. A system trained on faulty or low-quality data will naturally produce poorer output than a system trained on correct or high-quality data. In the case of resource-scarce languages, where little data is available and data may have to be translated to create parallel corpora that can serve as training data, it is therefore very important that the data chosen for translation includes the text segments that will contribute the most value to the machine translation system. In such a case it is also extremely important to use the available data as well as possible.
This study investigates methods for selecting training data with the aim of training an optimal machine translation system with limited resources. Attention is also given to the possibility of increasing the weights of certain portions of the training data in order to emphasize the data that contributes the most value to the machine translation system. Although this study focuses specifically on methods for data selection and manipulation for the English–Afrikaans language pair, the methods could also be applied to other language pairs.
The evaluation process indicates that both the data selection methods and the adjustment of data weights have a positive impact on the quality of the resulting machine translation system. The final system, trained with a combination of different methods, shows an increase of 2.0001 in the NIST score and an increase of 0.2039 in the BLEU score. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
9 |
Smart task logging : Prediction of tasks for timesheets with machine learning
Bengtsson, Emil, Mattsson, Emil January 2018
Every day, most people use applications and services that rely on machine learning in some way, without even knowing it. Examples of such applications and services include Google’s search engine, Netflix’s recommendations, and Spotify’s music tips. For machine learning to work it needs data, and often a large amount of it. Roughly 2.5 quintillion bytes of data are created every day in the modern information society. This huge amount of data can be utilised to make applications and systems smarter and automated. Time logging systems today are usually not smart, since users of these systems still must enter data manually. This bachelor thesis explores the possibility of applying machine learning to task logging systems, to make them smarter and automated. The machine learning algorithm used to predict the user’s task is multiclass logistic regression, which predicts a categorical outcome. When a small amount of training data was used in the machine learning process, the predictions of a task had a success rate of about 91%.
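Multiclass logistic regression scores each candidate task with a linear function of the input features and converts the scores to probabilities with the softmax function. The Python sketch below shows only the prediction step; the features, task labels, weights and function names are hypothetical and are not taken from the thesis, where the weights would be learned from logged timesheets.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_task(features, weights, biases, labels):
    """Return the most probable label and the full probability list."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs

# Hypothetical model: two features, three tasks, hand-picked weights.
labels = ["meeting", "development", "support"]
weights = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
task, probs = predict_task([1.0, 0.0], weights, biases, labels)
print(task, [round(p, 3) for p in probs])
```

Because the output is a probability per task, a logging system could pre-fill the most likely task and still let the user correct it.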
|
10 |
Odhad výkonnosti diskových polí s využitím prediktivní analytiky / Estimating performance of disk arrays using predictive analytics
Vlha, Matej January 2017
This thesis focuses on disk arrays. The goal is to design test scenarios that measure the performance of a disk array and to use predictive analytics tools to train a model that predicts a selected performance parameter from the measured data. The implemented web application demonstrates the functionality of the trained model and shows an estimate of the disk array's performance.
|