601

Ordinal Regression to Evaluate Student Ratings Data

Bell, Emily Brooke 07 July 2008
Student evaluations are the most common and often the only method used to evaluate teachers. In these evaluations, which typically occur at the end of every term, students rate their instructors on criteria accepted as constituting exceptional instruction in addition to an overall assessment. This presentation explores factors that influence student evaluations using the teacher ratings data of Brigham Young University from Fall 2001 to Fall 2006. This project uses ordinal regression to model the probability of an instructor receiving a good, average, or poor rating. Student grade, instructor status, class level, student gender, total enrollment, term, GE class status, and college are used as explanatory variables.
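The proportional-odds model this abstract describes can be sketched in a few lines. Below is a minimal, hypothetical example on synthetic data, using statsmodels' OrderedModel; the variables student_grade and enrollment are illustrative stand-ins, not the BYU dataset.

```python
# Minimal proportional-odds (ordinal logistic) regression on synthetic
# ratings data; variable names are hypothetical, not the BYU dataset.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "student_grade": rng.normal(3.0, 0.7, n),   # GPA-like score
    "enrollment": rng.integers(10, 300, n),     # class size
})
# A latent "quality" score drives an ordered rating: poor < average < good.
latent = 0.8 * df["student_grade"] - 0.002 * df["enrollment"] + rng.logistic(size=n)
df["rating"] = pd.cut(latent, bins=[-np.inf, 1.5, 3.0, np.inf],
                      labels=["poor", "average", "good"])

model = OrderedModel(df["rating"], df[["student_grade", "enrollment"]], distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())
# Probability of each rating category for the first five observations:
print(res.predict(df[["student_grade", "enrollment"]].iloc[:5]))
```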
602

Modelling Non-life Insurance Policyholder Price Sensitivity: A Statistical Analysis Performed with Logistic Regression

Hardin, Patrik, Tabari, Sam January 2017
This bachelor thesis in mathematical statistics studies the possibility of modelling the renewal probability of commercial non-life insurance policyholders. The project was carried out in collaboration with the non-life insurance company If P&C Insurance Ltd. at their headquarters in Stockholm, Sweden. The paper includes an introduction to underlying concepts within insurance and mathematics and a detailed review of the analytical process, followed by a discussion and conclusions. The first stages of the project were the collection and processing of explanatory insurance data and the development of a logistic regression model for policy renewal. An initial model was built, and modern statistical methods were applied to obtain a final model consisting of 9 significant characteristics. The regression model had a predictive power of 61%, which suggests that it is, to a certain degree, possible to predict the renewal probability of non-life insurance policyholders based on their characteristics. The results from the final model were ultimately translated into a measure of price sensitivity which can be implemented in both pricing models and CRM systems. We believe that price sensitivity analysis, if done correctly, is a natural step in improving the current pricing models in the insurance industry, and this project provides a foundation for further research in this area.
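As a rough illustration of the approach (not the thesis's actual model or data), a logistic regression for renewal probability might look like the following sketch; the features premium_change, years_as_customer, and claims_last_year are assumed, synthetic stand-ins.

```python
# Sketch of a renewal-probability model: logistic regression on synthetic
# policyholder features (feature names are assumed, not If P&C data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
premium_change = rng.normal(0.05, 0.10, n)     # relative premium change at renewal
years_as_customer = rng.integers(1, 20, n)
claims_last_year = rng.poisson(0.3, n)
X = np.column_stack([premium_change, years_as_customer, claims_last_year])

# Synthetic ground truth: larger premium increases lower the renewal odds.
logit = 1.0 - 8.0 * premium_change + 0.1 * years_as_customer - 0.5 * claims_last_year
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("coefficients:", clf.coef_)              # signs indicate price sensitivity
print("holdout accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```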
603

Optimal experimental designs for two-variable logistic regression models

Jia, Yan 06 June 2008
Binary response data is often modeled using the logistic regression model. Experimental design theory for the logistic model appears to be increasingly important as experimentation becomes more complex and expensive. The optimal design work is extremely valuable in areas such as biomedical and environmental applications. Most design research dealing with the logistic model has been concentrated on the one-variable case. Relatively little has been done for the two-variable model. The primary goal of this research is to develop and study efficient and practical experimental design procedures for fitting the logistic model with two independent variables. Optimal designs are developed addressing D optimality, Q optimality, and the estimation of interaction between the design variables. The two-variable models with and without interaction usually have to be handled separately. The equivalence theory concerning D optimal designs is studied. The designs are compared using their relative efficiencies in the presence of interaction. Robustness to parameter misspecification is investigated. Bayesian design procedures are explored to provide relatively more robust experimental plans. / Ph. D.
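The D-criterion at the heart of this kind of work is the log-determinant of the Fisher information, which for a logistic model weights each design point by p(1-p). A minimal sketch, with assumed parameter values (locally optimal designs require a parameter guess), might compare two candidate designs like this:

```python
# Evaluating the D-criterion for a two-variable logistic model (no
# interaction). The parameter values below are illustrative assumptions.
import numpy as np

beta = np.array([0.0, 1.0, 1.0])             # assumed (beta0, beta1, beta2)

def d_criterion(points, weights, beta):
    """log det of the Fisher information for a logistic model."""
    M = np.zeros((3, 3))
    for (x1, x2), w in zip(points, weights):
        f = np.array([1.0, x1, x2])          # regressors (no interaction term)
        p = 1 / (1 + np.exp(-f @ beta))
        M += w * p * (1 - p) * np.outer(f, f)
    return np.linalg.slogdet(M)[1]

# Compare two 4-point designs with equal weights on [-2, 2]^2:
corners = [(-2, -2), (-2, 2), (2, -2), (2, 2)]
inner = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
w = [0.25] * 4
print("corner design:", d_criterion(corners, w, beta))
print("inner design: ", d_criterion(inner, w, beta))
```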
604

Stormwater Monitoring: Evaluation of Uncertainty due to Inadequate Temporal Sampling and Applications for Engineering Education

McDonald, Walter Miller 01 July 2016
The world is faced with uncertain and dramatic changes in water movement, availability, and quality due to human-induced stressors such as population growth, climatic variability, and land use changes. At the apex of this problem is the need to understand and predict the complex forces that control the movement and life-cycle of water, a critical component of which is stormwater runoff. Success in addressing these issues is also dependent upon educating hydrology professionals who understand the physical processes that produce stormflow and the effects that these stressors have on stormwater runoff and water quality. This dissertation addresses these challenges through methodologies that can improve the way we measure stormflow and educate future hydrology professionals. A methodology is presented to (i) evaluate the uncertainty due to inadequate temporal sampling of stormflow data, and (ii) develop equations using regional regression analysis that can be used to select a stormflow sampling frequency for a watershed. A case study demonstrates how the proposed methodology has been applied to 25 stream gages with watershed areas ranging between 30 and 11,865 km² within the Valley and Ridge geomorphologic region of Virginia. Results indicate that autocorrelation of stormflow hydrographs, drainage area of the catchment, and time of concentration are statistically significant predictor variables in single-variable regional regression analysis for estimating the site-specific stormflow sampling frequency under a specific magnitude of uncertainty. Methods and resources are also presented that utilize high-frequency continuous stormwater runoff data in hydrology education to improve student learning. Data from a real-time continuous watershed monitoring station (flow, water quality, and weather) were integrated into a senior-level hydrology course at Virginia Tech (30 students) and two freshman-level introductory engineering courses at Virginia Western Community College (70 students) over a period of 3 years using student-centered modules. The goal was to assess student learning through active and collaborative learning modules that provide students with field and virtual laboratory experiences. A mixed-methods assessment revealed that student learning improved through modules that incorporated watershed data, and that students most valued working with real-world data and the ability to observe real-time environmental conditions. / Ph. D.
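For intuition, a single-variable regional regression of sampling interval on drainage area could be sketched as a log-log OLS fit; the synthetic gage data and coefficients below are illustrative and do not reproduce the dissertation's equations.

```python
# Illustrative single-variable regional regression: log-log OLS relating a
# site's required stormflow sampling interval to drainage area. Synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
area_km2 = rng.uniform(30, 12000, 25)                  # 25 hypothetical gages
# Larger basins respond more slowly, tolerating coarser sampling:
interval_min = 5 * area_km2**0.4 * rng.lognormal(0, 0.2, 25)

X = sm.add_constant(np.log(area_km2))
fit = sm.OLS(np.log(interval_min), X).fit()
print(fit.summary())

# Predicted sampling interval for an ungaged 500 km^2 watershed:
b0, b1 = fit.params
print("interval ~", np.exp(b0) * 500**b1, "minutes")
```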
605

Prediction of International Flight Operations at U.S. Airports

Shen, Ni 05 December 2006
This report presents a top-down methodology to forecast annual international flight operations at sixty-six U.S. airports, whose combined operations accounted for 99.8% of the total international passenger flight operations in the National Airspace System (NAS) in 2004. The forecast of international flight operations at each airport is derived from the combination of passenger flight operations at the airport to ten World Regions: Europe, Asia, Africa, South America, Mexico, Canada, Caribbean and Central America, Middle East, Oceania, and U.S. International. In the forecast, a "top-down" methodology is applied in three steps. In the first step, individual linear regression models are developed to forecast the total annual international passenger enplanements from the U.S. to each of nine World Regions. The resulting regression models are statistically valid and have parameters that are credible in terms of signs and magnitude. In the second step, the forecasted passenger enplanements are distributed among international airports in the U.S. using individual airport market share factors. The airport market share analysis conducted in this step concludes that the airline business is the critical factor explaining the changes associated with airport market share. In the third and final step, the international passenger enplanements at each airport are converted to the flight operations required for transporting the passengers; in this process, average load factor and average seats per aircraft are used. The model has been integrated into the Transportation Systems Analysis Model (TSAM), a comprehensive intercity transportation planning tool. Through a simple graphical user interface implemented in the TSAM model, the user can test different future scenarios by defining a series of scaling factors for GDP, load factor, and average seats per aircraft. The default values for the latter two variables are predefined in the model using 2004 historical data derived from Department of Transportation T100 international segment data. / Master of Science
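A compressed sketch of the three-step conversion, with made-up numbers throughout (the regional enplanements, market shares, load factor, and seat counts are all hypothetical):

```python
# Sketch of the three-step top-down conversion: (1) regress regional
# enplanements on GDP, (2) split by airport market share, (3) convert
# passengers to flight operations via load factor and seats per aircraft.
import numpy as np

# Step 1: linear fit of annual U.S.-Europe enplanements (millions) vs a GDP index.
gdp = np.array([100, 104, 108, 113, 118], dtype=float)
pax = np.array([20.1, 21.0, 22.4, 23.5, 24.9])        # hypothetical history
b1, b0 = np.polyfit(gdp, pax, 1)                      # slope, intercept
pax_forecast = b0 + b1 * 123                          # next-year GDP scenario

# Step 2: distribute by airport market share (hypothetical shares).
shares = {"JFK": 0.30, "ORD": 0.15, "IAD": 0.10}
airport_pax = {a: s * pax_forecast * 1e6 for a, s in shares.items()}

# Step 3: operations = passengers / (load factor * seats per aircraft).
load_factor, seats = 0.80, 250
for a, p in airport_pax.items():
    print(a, "annual operations:", round(p / (load_factor * seats)))
```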
606

A Zone-Based Multiple Regression Model to Visualize GPS Locations on a Surveillance Camera Image

Moore, Daniel James 17 June 2015
Surveillance cameras are integral in assisting law enforcement by collecting video information that may help officers detect people for whom they are looking. While surveillance cameras record the area covered by the camera, unlike humans, they cannot "understand" what is happening. My research uses multiple curvilinear regression models to accurately place differentially corrected GPS points with submeter accuracy onto a camera image. Optimal results were achieved after splitting the image into four zones and calibrating each area separately. This resulted in adjusted R² values as high as 99.8 percent, indicating that high-quality GPS points can form a good manual camera calibration. To ascertain whether a lower-quality GPS point associated with a social media application would allow locating the person sending the message, I conducted a follow-up study using an iPhone 5s. Applying the zone-based calibration equations to GPS point locations from an iPhone 5s shows that the collected locations are less accurate than differentially corrected GPS locations, but there is still a decent chance of being able to locate the correct person in an image based on that person's location. That chance, however, depends on the population density inside the image. Pedestrian density tests show that about 70-80 percent of the phone locations in a low-density environment could be used to locate the correct person who sent a message, while 30-60 percent of the phone locations could be used in that manner in a high-density environment. / Master of Science
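A per-zone curvilinear fit of this kind can be sketched as a second-order polynomial least-squares mapping from GPS coordinates to pixels; the control points below are synthetic, and a real calibration would repeat the fit for each of the four zones.

```python
# Per-zone curvilinear calibration sketch: a second-order polynomial
# least-squares fit mapping GPS (lon, lat) to image pixels. Synthetic data.
import numpy as np

def design(lon, lat):
    # Quadratic terms let the fit absorb lens and perspective curvature.
    return np.column_stack([np.ones_like(lon), lon, lat,
                            lon**2, lat**2, lon * lat])

rng = np.random.default_rng(3)
lon = rng.uniform(-80.425, -80.420, 40)        # hypothetical control points
lat = rng.uniform(37.227, 37.230, 40)
# Synthetic "true" pixel coordinates with mild curvature plus noise:
px = 500 + 9e4 * (lon + 80.4225) + 3e6 * (lon + 80.4225)**2 + rng.normal(0, 2, 40)
py = 300 - 7e4 * (lat - 37.2285) + rng.normal(0, 2, 40)

X = design(lon, lat)
coef_x, *_ = np.linalg.lstsq(X, px, rcond=None)  # in practice: one fit per zone
coef_y, *_ = np.linalg.lstsq(X, py, rcond=None)

# Map a new GPS reading into the image:
new = design(np.array([-80.4223]), np.array([37.2286]))
print("pixel:", (new @ coef_x)[0], (new @ coef_y)[0])
```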
607

Predictive Probability Model for American Civil War Fortifications using a Geographic Information System

Easterbrook, Richard Brian 08 April 1999
Predictive models have established a niche in the field of archaeology. Valued as tools for predicting potential archaeological sites, their use has increased with the development of faster and more affordable computer technology. Predictive models highlight areas within a landscape where archaeological sites have a high probability of occurrence. Therefore, time and resources normally expended on archaeological exploration can be more efficiently allocated to specified locations within a study area. In addition to the resulting predictive surface, these models also identify significant variables for site selection by prehistoric or historic groups. Relationships with the environment, whether natural or social, are extremely pertinent to strengthening the resource base. In turn, this information can be utilized to better interpret and protect valuable cultural resources. A predictive probability model was generated to locate Union Civil War fortifications around Petersburg, Virginia. This study illustrated the ease with which such analysis can be accomplished through the integrated use of a Geographic Information System and statistical analysis. Stepwise logistic regression proved effective in selecting significant independent variables to predict probabilities of fortifications within the study area, but fared poorly when applied to areas withheld from the initial building stage of the model. Variation of battle tactics between these two separate areas proved great enough to have a detrimental effect on the model's effectiveness. / Master of Science
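Forward stepwise logistic regression of the sort described can be sketched as an AIC-driven variable-selection loop; the terrain predictors (slope, elevation, dist_to_road) and data below are hypothetical.

```python
# Forward stepwise logistic regression by AIC on synthetic terrain data,
# predicting fortification presence. Variable names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 400
df = pd.DataFrame({
    "slope": rng.uniform(0, 30, n),
    "elevation": rng.uniform(20, 150, n),
    "dist_to_road": rng.uniform(0, 5000, n),
})
logit = -2 + 0.08 * df["slope"] + 0.02 * df["elevation"]
df["fort"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

selected, remaining = [], list(df.columns[:-1])
current_aic = np.inf
while remaining:
    # AIC for each candidate model that adds one more variable:
    aics = {v: sm.Logit(df["fort"], sm.add_constant(df[selected + [v]])).fit(disp=0).aic
            for v in remaining}
    best = min(aics, key=aics.get)
    if aics[best] >= current_aic:
        break                                  # no candidate improves AIC
    current_aic = aics[best]
    selected.append(best)
    remaining.remove(best)
print("selected variables:", selected)
```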
608

Regularised Linear Regression for Modelling of Companies' Currency Exposure

Hahn, Karin, Tamm, Erik January 2021
Quantitative methods are used in fund management to predict the change in companies' revenues at the next quarterly report compared to the corresponding quarter the year before. The Swedish bank SEB currently uses multiple linear regression with change of revenue as the dependent variable and changes of exchange rates as independent variables. This is problematic for three reasons. Firstly, currencies often exhibit large multicollinearity, which yields volatile estimates. Secondly, a company's revenue can depend on a subset of the currencies included in the dataset; with the multicollinearity in mind, it is beneficial not to regress against all the currencies. Thirdly, newer data is more relevant for the predictions. These issues can be handled by using regularisation and selection methods, more specifically elastic net and weighted regression. We evaluate these methods for a large number of companies by comparing the mean absolute error between multiple linear regression and regularised linear regression with weighting. The evaluation shows that such a model performs better for 65.0% of the companies included in a large global share index, with a mean absolute error of 14 percentage points. The conclusion is that elastic net and weighted regression address the problems with the original model and can be used for better predictions of how revenues depend on exchange rates.
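A sketch of the combination the thesis proposes, elastic net with age-decayed observation weights, on synthetic currency-return data; the half-life and penalty strength are assumed values, not SEB's settings.

```python
# Elastic net with age-decayed observation weights on synthetic data:
# revenue change regressed on currency exchange-rate changes.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(5)
n_quarters, n_ccy = 40, 12
fx = rng.normal(0, 0.03, (n_quarters, n_ccy))   # quarterly FX-rate changes
beta_true = np.zeros(n_ccy)
beta_true[[0, 3, 7]] = [2.0, -1.5, 1.0]         # revenue depends on 3 currencies
y = fx @ beta_true + rng.normal(0, 0.02, n_quarters)

half_life = 8                                   # quarters (assumed)
age = np.arange(n_quarters)[::-1]               # 0 = newest observation
weights = 0.5 ** (age / half_life)              # newer data weighted higher

model = ElasticNet(alpha=0.005, l1_ratio=0.5)   # penalty strength assumed
model.fit(fx, y, sample_weight=weights)
print("nonzero coefficients at indices:", np.flatnonzero(model.coef_))
```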
609

Components of Variance Analysis

Walpole, Ronald E. 10 1900
In this thesis a systematic and short method for computing the expected values of mean squares has been developed. One chapter is devoted to the theory of regression analysis by the method of least squares using matrix notation and a proof is given that the method of least squares leads to an absolute minimum, a result which the author has not found in the literature. For two-way classifications the results have been developed for proportional frequencies, a subject which again has been neglected in the literature except for the Type II model. Finally, the methods for computing the expected values of the mean squares are applied to nested classifications and Latin square designs. / Thesis / Master of Arts (MA)
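The expected-mean-squares idea can be illustrated for a balanced one-way random-effects model, where E[MS_between] = σ² + nσ_a² and E[MS_within] = σ², so the variance components drop out of the ANOVA table; a small numerical sketch (not from the thesis):

```python
# Expected mean squares for a balanced one-way random-effects model:
# E[MS_between] = sigma^2 + n*sigma_a^2, E[MS_within] = sigma^2, so the
# variance components can be estimated from the ANOVA mean squares.
import numpy as np

rng = np.random.default_rng(6)
a, n = 8, 10                                   # groups, observations per group
sigma_a, sigma = 2.0, 1.0                      # true standard deviations
effects = rng.normal(0, sigma_a, a)            # random group effects
y = effects[:, None] + rng.normal(0, sigma, (a, n))

grand = y.mean()
ms_between = n * ((y.mean(axis=1) - grand) ** 2).sum() / (a - 1)
ms_within = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (a * (n - 1))

print("sigma^2 estimate:  ", ms_within)                      # ~ 1.0
print("sigma_a^2 estimate:", (ms_between - ms_within) / n)   # ~ 4.0
```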
610

Estimating the load weight of freight trains using machine learning

Kongpachith, Erik January 2023
Accurate estimation of the load weight of freight trains is crucial for ensuring safe, efficient, and sustainable rail freight transport. Traditional methods for estimating load weight often suffer from limitations in accuracy and efficiency. In recent years, machine learning algorithms have gained significant attention and use cases within the railway industry due to their strong predictive capabilities for classification and regression tasks. This study aims to present a proof of concept in the form of a comparative analysis of five machine learning regression algorithms: Polynomial Regression, K-Nearest Neighbors, Regression Trees, Random Forest Regression, and Support Vector Regression, for estimating the load weight of freight trains using simulation data. The study utilizes two comprehensive datasets derived from train simulations in GENSYS, a simulation software for modeling rail vehicles. The datasets encompass various driving-condition factors such as train speed, track conditions, and running gear configurations. The algorithms are trained and evaluated on these datasets, and their performance is assessed using the root mean squared error and R² metrics. Results from the experiments demonstrate that all five machine learning algorithms show promising performance for estimating the load weight. Polynomial regression achieves the best result for both datasets when many features of the datasets are considered. Random forest regression achieves the best result for both datasets when a small number of features is considered. Furthermore, it is suggested that the methodical approach of this study be examined on real-world data from operating freight trains to confirm the proof of concept in a real-world setting.
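The comparative setup can be sketched with scikit-learn; the features and load-weight relation below are a synthetic stand-in for the GENSYS simulation data, not the thesis's datasets.

```python
# Comparing the five regressors by RMSE and R^2 on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([rng.uniform(20, 100, n),   # speed (km/h), illustrative
                     rng.normal(0, 1, n),       # track-condition index
                     rng.uniform(0, 1, n)])     # running-gear setting
load = 300 + 2.0 * X[:, 0] + 15 * X[:, 1] ** 2 + 50 * X[:, 2] + rng.normal(0, 10, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, load, random_state=0)
models = {
    "Polynomial": make_pipeline(PolynomialFeatures(2), LinearRegression()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor()),
    "Tree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(C=100)),
}
for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name:12s} RMSE={rmse:6.2f}  R2={r2_score(y_te, pred):.3f}")
```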
