Global ETD Search

1	Predicting Airbnb Prices in European Cities Using Machine Learning Gangarapu, Shalini; Mernedi, Venkata Surya Akash January 2023 Background: Machine learning is a field of computer science that focuses on building models that learn patterns and relationships in data. In this thesis, we use machine learning to predict Airbnb prices in various European cities to help hosts set reasonable prices for their properties. Several supervised machine learning algorithms are compared to determine which model gives the highest accuracy, so that hosts can set profitable prices. Objectives: The main goal of this thesis is to use machine learning algorithms to help hosts set reasonable rental prices, so that their properties remain affordable for renters across Europe and achieve maximum occupancy. Methods: The dataset of Airbnb listings in European cities was obtained from Kaggle and pre-processed using one-hot encoding, label encoding, standard scaling (StandardScaler), and principal component analysis. The dataset is divided into three parts for training, validation, and testing. Feature selection is then used to identify the features that contribute most to pricing, and the dimensionality of the dataset is reduced. Supervised machine learning algorithms are trained on the data, and the models are evaluated after tuning the hyperparameters using k-fold cross-validation. Results: The feature importance scores show that room capacity, room type (shared or not), and country rank among the top five features for all three algorithms, although the scores vary between algorithms. Day, cleanliness rating, and attr index also appear among the top five features. 
Among the chosen learning algorithms, the random forest regressor gave the best regression model, with an R² score of 0.70. The gradient boosting regressor was second, with an R² score of 0.32, while SVM gave the lowest score, 0.06. Conclusions: The random forest regressor was the best of the chosen algorithms for predicting Airbnb prices; compared with the other models, it gives hosts more accurate guidance for setting reasonable rental prices for renters across Europe. Contrary to our expectations, SVM performed worst on this dataset. Machine Learning Supervised Learning Regression Algorithms Airbnb Price Prediction Engineering and Technology Teknik och teknologier
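Note on entry 1: the abstract mentions a three-way split, k-fold cross-validation, and R² scores without showing how they fit together. The following is a minimal standard-library sketch of those three pieces, not the authors' code; the function names, the 70/15/15 proportions, and the seed are assumptions made for illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def three_way_split(n_rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices and cut them into train / validation / test.

    The fractions are assumed values; the abstract does not state them.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def k_fold_indices(indices, k=5):
    """Yield (fit, held_out) index lists; each index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield fit, folds[i]
```

In a tuning loop, each candidate hyperparameter setting would be fitted on `fit`, scored with `r2_score` on `held_out`, and averaged over the k folds; the test indices stay untouched until the final evaluation.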
2	Learning Algorithms Using Chance-Constrained Programs Jagarlapudi, Saketha Nath 07 1900 This thesis explores Chance-Constrained Programming (CCP) in the context of learning. It is shown that chance-constraint approaches lead to improved algorithms for three important learning problems: classification with specified error rates, large dataset classification, and Ordinal Regression (OR). Using moments of the training data, the CCPs are posed as Second Order Cone Programs (SOCPs). Novel iterative algorithms for solving the resulting SOCPs are also derived. Borrowing ideas from robust optimization theory, the proposed formulations are made robust to moment estimation errors. A maximum margin classifier with specified false positive and false negative rates is derived. The key idea is to employ a chance constraint for each class, which ensures that the actual misclassification rates do not exceed the specified rates. The formulation is applied to the case of biased classification. The problems of large dataset classification and ordinal regression are addressed by deriving formulations that employ chance constraints for clusters in the training data rather than constraints for each data point. Since the number of clusters can be substantially smaller than the number of data points, the resulting formulation size and number of inequalities are very small. Hence the formulations scale well to large datasets. The scalable classification and OR formulations are extended to feature spaces, and the kernelized duals turn out to be instances of SOCPs with a single cone constraint. Exploiting this special structure, fast iterative solvers that outperform generic SOCP solvers are proposed. Compared to state-of-the-art learners, the proposed algorithms achieve a speed-up of up to 10,000 times when the specialized SOCP solvers are employed. The proposed formulations involve second order moments of the data and hence are susceptible to moment estimation errors. 
A generic way of making the formulations robust to such estimation errors is illustrated. Two novel confidence sets for the moments are derived, and it is shown that when either of the confidence sets is employed, the robust formulations also yield SOCPs. Machine Learning Classification Dataset Classification Ordinal Regression (OR) Chance-Constrained Programming (CCP) Classification - Algorithms Ordinal Regression - Algorithms Machine Learning - Algorithms Second Order Cone Programs (SOCPs) Maximum Margin Classification Focused Crawling Large Datasets Error Rates Computer Science
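Note on entry 2: the abstract says that chance constraints become second order cone constraints once only the moments of the data are used, but does not show the step. One standard way this happens, via the multivariate Chebyshev bound, is sketched below. The symbols w, b, X, μ, Σ, η and κ are assumptions introduced here, and the thesis's exact formulation may differ.

```latex
% X: random feature vector of one class (or cluster), known only
%    through its mean \mu and covariance \Sigma
% \eta \in (0,1): required probability of falling on the correct side
\begin{align*}
\inf_{X \sim (\mu, \Sigma)} \Pr\left( w^\top X + b \ge 0 \right) \ge \eta
\quad &\Longleftrightarrow \quad
w^\top \mu + b \ge \kappa \sqrt{w^\top \Sigma \, w}, \\
\kappa &= \sqrt{\frac{\eta}{1 - \eta}} .
\end{align*}
```

The right-hand condition bounds the norm of Σ^{1/2} w by an affine function of (w, b), which is a second order cone constraint. A higher required probability η gives a larger κ and therefore a tighter constraint.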
3	Quantifying urban land cover by means of machine learning and imaging spectrometer data at multiple spatial scales Okujeni, Akpona 15 December 2014 The global dimension of urbanization constitutes a great environmental challenge for the 21st century. Remote sensing is a valuable Earth observation tool that helps to better understand this process and its ecological implications. The focus of this work was to quantify urban land cover by means of machine learning and imaging spectrometer data at multiple spatial scales. The experiments considered innovative methodological developments and novel opportunities in urban research that will be created by the upcoming hyperspectral satellite mission EnMAP. Airborne HyMap data at 3.6 m and 9 m resolution and simulated EnMAP data at 30 m resolution were used to map land cover along an urban-rural gradient of Berlin. In the first part of this work, the combination of support vector regression with synthetically mixed training data was introduced as a sub-pixel mapping technique. Results demonstrate that the approach performs well in quantifying thematically meaningful yet spectrally challenging surface types. The method proves to be both superior to other sub-pixel mapping approaches and universally applicable with respect to changes in spatial scale. In the second part of this work, the value of future EnMAP data for urban remote sensing was evaluated. Detailed explorations of simulated data demonstrate their suitability for improving and extending the established vegetation-impervious-soil mapping scheme. Comprehensive analyses of the benefits and limitations of EnMAP data reveal both challenges caused by the higher number of mixed pixels compared with hyperspectral airborne imagery, and improvements due to the greater material discrimination capability compared with multispectral spaceborne imagery. 
In summary, the findings demonstrate how combining spaceborne imaging spectrometry and machine learning techniques could introduce a new quality to the field of urban remote sensing. Berlin hyperspectral machine learning EnMAP urban remote sensing imaging spectrometry support vector regression regression algorithms sub-pixel mapping urban land cover VIS model 550 Geosciences 31 Geosciences RF 96232 RF 96636 ddc:550
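Note on entry 3: the abstract describes training a regressor on synthetically mixed spectra to estimate sub-pixel cover fractions. The sketch below illustrates only the data-generation half, assuming simple two-member linear mixing. The `LIBRARY` spectra, class names, and function name are hypothetical, and the thesis's actual mixing scheme may differ.

```python
import random

# Hypothetical 4-band reflectance spectra, invented for illustration only.
LIBRARY = [
    ("vegetation", [0.05, 0.08, 0.45, 0.30]),
    ("vegetation", [0.04, 0.10, 0.50, 0.28]),
    ("impervious", [0.20, 0.22, 0.25, 0.27]),
    ("soil", [0.15, 0.20, 0.30, 0.40]),
]


def synthetic_mixtures(library, target_class, n_samples=100, seed=0):
    """Build (mixed_spectrum, target_fraction) training pairs.

    Each sample linearly mixes one spectrum of the target class with one
    spectrum of another class, using a random fraction in [0, 1).
    """
    rng = random.Random(seed)
    targets = [spec for cls, spec in library if cls == target_class]
    others = [spec for cls, spec in library if cls != target_class]
    samples = []
    for _ in range(n_samples):
        frac = rng.random()
        target_spec = rng.choice(targets)
        other_spec = rng.choice(others)
        mixed = [frac * a + (1.0 - frac) * b
                 for a, b in zip(target_spec, other_spec)]
        samples.append((mixed, frac))
    return samples
```

The resulting pairs would serve as inputs and labels for a regressor such as support vector regression, with one model per land cover class; applying each model to image pixels then yields a fraction map for that class.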
