Outliers are a major concern for data quality because they limit the reliability of any data. The objective of our investigation was to examine the presence and causes of outliers in the system that controls and records the feed intake of dairy cows at Lovsta farm, Uppsala, Sweden. The analyses were made on data recorded as a timestamp of each visit by the cows to the feeding troughs, covering the period from August 2015 to January 2016. A three-step methodology was applied to these data. First, a mixed model was fitted to the data. Second, the resulting residuals were used to fit a model-based clustering for a Gaussian mixture distribution, which produced clusters in which 2.5% of the observations fell into the outlier cluster. Third, a logistic regression was fitted modelling the presence of outliers (the outlier cluster versus the non-outlier clusters). It appeared that values recorded in the morning, between 6am and 11.59am, are more likely to be outliers, with an odds ratio of 1.1227. This is also the time frame with the least feed-consumption activity by the cows, a decrease of 0.027 kilograms compared with the other time frames. These findings provide a basis for further investigation to narrow down the causes of the outliers more specifically.
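The three-step pipeline above can be sketched in Python with NumPy alone. The abstract does not state the mixed-model terms, the number of mixture components, or how the outlier cluster was chosen, so this is a minimal sketch under loudly stated assumptions: step 1 uses only a random intercept per cow, fitted with balanced one-way ANOVA (method-of-moments) estimators rather than the REML/ML fitting a statistics package would usually apply; step 2 fits a two-component 1-D Gaussian mixture by EM and treats the widest component as the outlier cluster; step 3 fits a logistic regression on a single 6am to 11.59am indicator by Newton-Raphson. The data are synthetic, not the Lovsta records, and every function and variable name is hypothetical.

```python
import numpy as np

# --- Synthetic stand-in data (NOT the thesis data): one row per trough visit ---
rng = np.random.default_rng(42)
n_cows, visits_per_cow = 30, 100                     # balanced design keeps step 1 simple
cow = np.repeat(np.arange(n_cows), visits_per_cow)
hour = rng.integers(0, 24, size=cow.size)            # hour of each visit timestamp
morning = ((hour >= 6) & (hour < 12)).astype(float)  # 6am to 11.59am window
intake = 2.0 + rng.normal(0, 0.3, n_cows)[cow] + rng.normal(0, 0.2, cow.size)
true_outlier = rng.random(cow.size) < np.where(morning == 1, 0.04, 0.02)
intake[true_outlier] += rng.normal(0, 3.0, true_outlier.sum())  # corrupted records


def random_intercept_residuals(y, group):
    """Step 1 (assumed form): y_ij = mu + u_i + e_ij, u_i ~ N(0, s2_u), fitted with
    one-way ANOVA moment estimators; requires a balanced design."""
    k = group.max() + 1
    n = y.size // k                                   # visits per cow
    mu = y.mean()
    group_means = np.bincount(group, weights=y) / n
    msw = ((y - group_means[group]) ** 2).sum() / (y.size - k)
    msb = n * ((group_means - mu) ** 2).sum() / (k - 1)
    s2_e, s2_u = msw, max(0.0, (msb - msw) / n)
    shrink = s2_u / (s2_u + s2_e / n)                 # BLUP shrinkage factor
    u_hat = shrink * (group_means - mu)               # predicted cow effects
    return y - mu - u_hat[group]


def gaussian_mixture_1d(x, n_iter=300):
    """Step 2 (assumed form): two-component 1-D Gaussian mixture fitted by EM,
    initialised as a narrow 'bulk' component plus a wide component for outliers."""
    mad = 1.4826 * np.median(np.abs(x - np.median(x)))
    means = np.array([np.median(x), np.median(x)])
    variances = np.array([mad ** 2, 4 * x.var()])
    weights = np.array([0.95, 0.05])
    for _ in range(n_iter):
        # E-step in log space to avoid underflow for extreme residuals
        log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from responsibilities
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = np.maximum((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk, 1e-8)
    return weights, means, variances, resp


def logistic_newton(X, y, n_iter=50):
    """Step 3: logistic regression fitted by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


resid = random_intercept_residuals(intake, cow)
weights, means, variances, resp = gaussian_mixture_1d(resid)
outlier_comp = int(np.argmax(variances))             # assumption: widest component = outliers
flags = (resp.argmax(axis=1) == outlier_comp).astype(float)
X = np.column_stack([np.ones_like(morning), morning])
beta = logistic_newton(X, flags)
odds_ratio = np.exp(beta[1])
print(f"outlier share: {flags.mean():.3%}, odds ratio (6am-11.59am vs other): {odds_ratio:.3f}")
```

With a single binary predictor, exp(β₁) from the logistic fit equals the odds ratio of the 2×2 table of outlier flag against time window, which gives a quick sanity check on the fitted model. A real analysis would replace step 1 with a full mixed-model fit that includes the fixed effects actually used in the study.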
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:du-24522 |
Date | January 2016 |
Creators | Kogo, Gloria |
Publisher | Högskolan Dalarna (Dalarna University), Mikrodataanalys (Microdata Analysis) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |