
Klassificering av vinkvalitet / A classification of wine quality

The data used in this paper is an open-source data set collected in Portugal over a three-year period between 2004 and 2007. It consists of the physicochemical parameters and the quality grades of the wines. This study assesses which variables primarily affect the quality of a wine and how the effects of these variables interact with each other, and compares the classification methods to determine which achieves the highest accuracy. The data is divided into red and white wine, and the response variable is ordinal, consisting of the quality grades of the wines. Because some quality grades had too few observations, a new response variable was created in which several grades were pooled so that each grade category would contain a sufficient number of observations. The statistical methods used are Bayesian ordered logistic regression and two data mining techniques, neural networks and decision trees. The results showed that, for both types of wine, alcohol content and the amount of volatile acid are the recurring parameters with a great influence on predicting wine quality. The results also showed that, among the three methods, decision trees were best at classifying the white wines and the neural network was best for the red wines.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-139932
Date: January 2017
Creators: Brouwers, Jack; Thellman, Björn
Publisher: Linköpings universitet, Statistik och maskininlärning
Source Sets: DiVA Archive at Upsalla University
Language: Swedish
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
