Global ETD Search

91	Big Data : le nouvel enjeu de l'apprentissage à partir des données massives / Big Data : the new challenge Learning from data Massive Adjout Rehab, Moufida 01 April 2016 (has links) The convergence of globalization and the continuous development of information technologies has led to an explosion in the volume of available data. The capacities for producing, storing, and processing data have crossed such a threshold that a new term has been put forward: Big Data. The growth in the quantity of data to be considered requires new processing tools. Classical learning tools are poorly suited to this change in volume, in terms of both computational complexity and the time needed for processing. That processing is most often centralized and sequential, which makes learning methods dependent on the capacity of the machine used. Consequently, the difficulties in analyzing a large dataset are manifold. In this thesis, we focus on the problems encountered by supervised learning on large volumes of data. To meet these new challenges, new processes and methods must be developed to make the best use of all available data. The objective of this thesis is to explore the approach of designing a scalable version of these classical methods. This approach relies on distributing both the processing and the data to increase the capacity of the methods without harming their accuracy. Our contribution consists of two parts, each proposing a new learning approach for massive data processing. 
Both contributions fall within the field of supervised predictive learning from large volumes of data, covering Multiple Linear Regression and ensemble methods such as Bagging. The first contribution, named MLR-MR, concerns scaling up Multiple Linear Regression by distributing the processing across a cluster of machines. The goal is to optimize the processing and the induced computational load without changing the principle of the computation (QR factorization), which yields the same coefficients as the classical method. The second contribution, called "Bagging MR_PR_D" (Bagging based Map Reduce with Distributed PRuning), implements a scalable Bagging approach that allows distributed processing at two levels: learning and pruning of the models. Its goal is to provide an efficient, scalable algorithm across all processing phases (learning and pruning) and thus to guarantee a wide range of applications. Both approaches were tested on a variety of datasets associated with regression problems, with several million observations. Our experimental results demonstrate the effectiveness and speed of our approaches based on distributed processing in Cloud Computing. / In recent years we have witnessed tremendous growth in the volume of data generated, partly due to the continuous development of information technologies. Managing these amounts of data requires fundamental changes in the architecture of data management systems in order to adapt to large and complex data. Single machines do not have the capacity required to process such massive data, which motivates the need for scalable solutions. This thesis focuses on building scalable data management systems for treating large amounts of data. 
Our objective is to study the scalability of supervised machine learning methods in large-scale scenarios. In most existing algorithms and data structures, there is a trade-off between efficiency, complexity, and scalability. To address these issues, we explore recent techniques for distributed learning in order to overcome the limitations of current learning algorithms. Our contribution consists of two new machine learning approaches for large-scale data. The first contribution tackles the scalability of Multiple Linear Regression in distributed environments; it learns quickly from massive volumes of existing data using parallel computing and a divide-and-conquer approach, and yields the same coefficients as the classic approach. The second contribution introduces a new scalable approach for ensembles of models that allows both learning and pruning to be deployed in a distributed environment. Both approaches have been evaluated on a variety of regression datasets ranging from a few thousand to several million examples. The experimental results show that the proposed approaches are competitive in terms of predictive performance while significantly reducing training and prediction time. Données massives Big data Régression linéaire multiple Large scale data Mapreduce Multiple linear regression Bagging
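The abstract above describes distributing a QR-based multiple linear regression so that the result matches the classical coefficients. A minimal sketch of one way this can work is given here; it is not the thesis's MLR-MR implementation. It assumes a blockwise ("tall-skinny") QR scheme: each partition factorizes its own rows of the augmented matrix [X | y], only the small R factors are combined, and the coefficients are recovered by back substitution. The function names (`qr_r`, `solve_upper`, `fit_blockwise`) and the use of modified Gram-Schmidt are illustrative assumptions.

```python
import math


def qr_r(rows):
    """Return the upper-triangular R factor of a matrix given as a list of rows
    (modified Gram-Schmidt; Q is never stored)."""
    n = len(rows[0])
    cols = [[row[j] for row in rows] for j in range(n)]  # column-major copy
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(v * v for v in cols[j]))
        r[j][j] = norm
        if norm > 0.0:
            cols[j] = [v / norm for v in cols[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(a * b for a, b in zip(cols[j], cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(cols[j], cols[k])]
    return r


def solve_upper(r, z):
    """Solve r @ beta = z for upper-triangular r by back substitution."""
    p = len(z)
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = z[i] - sum(r[i][k] * beta[k] for k in range(i + 1, p))
        beta[i] = s / r[i][i]
    return beta


def fit_blockwise(x_rows, y, n_blocks):
    """Least-squares coefficients computed from per-block R factors."""
    aug = [list(xr) + [yi] for xr, yi in zip(x_rows, y)]  # rows of [X | y]
    size = math.ceil(len(aug) / n_blocks)
    # "Map" step: each block is factorized independently.
    local_rs = [qr_r(aug[i:i + size]) for i in range(0, len(aug), size)]
    # "Reduce" step: stack the small R factors and factorize once more.
    r = qr_r([row for local in local_rs for row in local])
    p = len(x_rows[0])
    return solve_upper([row[:p] for row in r[:p]], [row[p] for row in r[:p]])


# Synthetic, noise-free data: y = 1 + 2*x1 - 3*x2 (values chosen for illustration).
X = [[1.0, float(i), float((i * i) % 7)] for i in range(12)]
Y = [1.0 + 2.0 * row[1] - 3.0 * row[2] for row in X]
print(fit_blockwise(X, Y, n_blocks=3))
```

Because the stacked R factors preserve the Gram matrix of the full data, the number of blocks should not change the fitted coefficients beyond floating-point error, which is the property the abstract emphasizes.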
92	Assignment of Estimated Average Annual Daily Traffic Volumes on All Roads in Florida Pan, Tao 27 March 2008 (has links) In the first part, this thesis compiles and compares current procedures and methodologies for estimating traffic volumes on roads where traffic counts are not readily available. In the second part, linear regression is applied to estimate average annual daily traffic (AADT), based primarily on known or accepted AADT values on neighboring state and local roadways, population densities, and other socioeconomic data. To develop the AADT prediction models, two databases were created: a socioeconomic database and a roadway characteristics database. Ten years of socioeconomic data, from 1995 to 2005, were collected for each of the 67 counties in the state of Florida, and the socioeconomic database was created by manually entering data obtained from different sources. The roadway characteristics database was created by joining different GIS data layers to the Tele Atlas base map provided by the Florida Department of Transportation (FDOT). Stepwise regression was used to select the variables included in the final models. All selected independent variables are statistically significant at the 90% confidence level. In total, six linear regression models were built. The adjusted R-squared values of the AADT prediction models vary from 0.166 to 0.418. Model validation results show that the MAPE values of the AADT prediction models vary from 31.99% to 159.49%. The model with the lowest MAPE value is the minor state/county highway model for rural areas. The model with the highest MAPE value is the local street model for large metropolitan areas. In general, the minor state/county highway models provide more reasonable AADT estimates than the local street models, as indicated by their lower MAPE values. 
AADT Linear regression Social economy Traffic count Database American Studies Arts and Humanities
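The two fit measures reported in the abstract above, adjusted R-squared and MAPE (mean absolute percentage error), can be sketched with their standard textbook definitions. This is a small illustration only; the sample numbers below are invented and are not taken from the thesis.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical traffic counts and model estimates, for illustration only.
observed = [100.0, 200.0]
estimated = [110.0, 180.0]
print(mape(observed, estimated))
print(adjusted_r2(0.5, n_obs=11, n_predictors=2))
```

Because MAPE divides each error by the observed value, low-volume roads can produce very large percentages from modest absolute errors, which may partly explain why local street models score worse on this measure.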
93	Contact Center Employee Characteristics Associated with Customer Satisfaction Pow, Lara 01 January 2017 (has links) The management of operations for a customer contact center (CCC) presents significant challenges. Management's direction is to reduce costs through operational efficiency metrics while providing maximum customer satisfaction levels to retain customers and increase profit margins. The purpose of this correlational study was to quantify the significance of various customer service representative (CSR) characteristics including internal service quality, employee satisfaction, and employee productivity, and then to determine their predictive ability on customer satisfaction, as outlined in the service-profit chain model. The research question addressed whether a linear relationship existed between CSR characteristics and the customers' satisfaction with the CSR by applying ordinary least squares regression using archival dyadic data. The data consisted of a random sample of 269 CSRs serving a large Canadian bank. Various subsets of data were analyzed via regression to help generate actionable insights. One particular model involving poor performing CSRs whose customer satisfaction was less than 75% top box proved to be statistically significant (p = .036, R-squared = .321) suggesting that poor performing CSRs contribute to a significant portion of poor customer service while high performing CSRs do not necessarily guarantee good customer service. A key variable used in this research was a CSR's level of education, which was not significant. Such a finding implies that for CCC support, a less-educated labor pool may be maintained, balancing societal benefits of employment for less-educated people at a reasonable service cost to a company. These findings relate to positive social change as hiring less-educated applicants could increase their social and economic status. 
contact center customer satisfaction job performance multiple linear regression service-profit chain Business
94	QUANTIFYING NON-RECURRENT DELAY USING PROBE-VEHICLE DATA Brashear, Jacob Douglas Keaton 01 January 2018 (has links) Current practices that use estimated volume and basic queuing theory to calculate delay resulting from non-recurrent congestion do not account for day-to-day fluctuations in traffic. To address this issue, probe GPS data are used to develop impact zone boundaries and calculate Vehicle Hours of Delay (VHD) for incidents stored in the Traffic Response and Incident Management Assisting the River City (TRIMARC) incident log in Louisville, KY. Multiple linear regression with stepwise selection is used to generate models for the maximum queue length, the average queue length, and VHD, in order to explore the factors that explain the impact boundary and VHD. Models predicting queue length do not explain significant amounts of variance but can be useful in queue spillback studies. Models predicting VHD are only as effective as the data collected: models using cheaper-to-collect data sources explain less variance, while models using more detailed data explain more. Models for VHD can be useful in incident management after-action reviews and in predicting road user costs. Probe GPS Modeling Multiple Linear Regression Impact Zone Vehicle Hours of Delay Transportation Engineering
95	Queueing Variables and Leave-Without-Treatment Rates in the Emergency Room Gibbs, Joy Jaylene 01 January 2018 (has links) Hospitals stand to lose millions of dollars in revenue due to patients who leave without treatment (LWT). Grounded in queueing theory, the purpose of this correlational study was to examine the relationship between daily arrivals, daily staffing, triage time, emergency severity index (ESI), rooming time, door-to-provider time (DTPT), and LWT rates. The target population comprised patients who visited a Connecticut emergency room between October 1, 2017, and May 31, 2018. Archival records (N = 154) were analyzed using multiple linear regression analysis. The results of the multiple linear regression were statistically significant, with F(9, 144) = 2902.49, p < .001, and R-squared = 0.99, indicating that 99% of the variation in LWT was accounted for by the predictor variables. ESI levels were the only variables making a significant contribution to the regression model. The implications for positive social change include the potential for patients to experience increased satisfaction due to the high quality of care and overall improvement in public health outcomes. Hospital leaders might use the information from this study to mitigate LWT rates and modify or manage staffing levels, time that patients must wait for triage, room placement, and DTPT to decrease the rate of LWT in the emergency room. Emergency Department Emergency Room Emergency Severity Index Leave-Without-Treatment Multiple Linear Regression Queueing Theory Business
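The overall F statistic and R-squared reported in the abstract above are linked by a standard identity for ordinary least squares with an intercept. The sketch below applies that identity to the reported figures as a consistency check. It assumes 9 predictors, inferred from the reported degrees of freedom F(9, 144) together with N = 154; the function names are illustrative.

```python
def f_from_r2(r2, n_obs, n_predictors):
    """Overall F statistic implied by R-squared for a model with an intercept."""
    df_resid = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1.0 - r2) / df_resid)


def r2_from_f(f_stat, n_obs, n_predictors):
    """R-squared implied by the overall F statistic (inverse of f_from_r2)."""
    df_resid = n_obs - n_predictors - 1
    return f_stat * n_predictors / (f_stat * n_predictors + df_resid)


# Reported values: N = 154, F(9, 144) = 2902.49.
implied_r2 = r2_from_f(2902.49, n_obs=154, n_predictors=9)
print(round(implied_r2, 4))  # about 0.9945, which rounds to the reported 0.99
```

Going the other way, an R-squared of exactly 0.99 would imply F of about 1584, so the reported F suggests the unrounded R-squared was somewhat closer to 0.995.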
96	Selecting the Best Linear Model From a Subset of All Possible Models for a Given Set of Predictors in a Multiple Linear Regression Analysis Jensen, David L. 01 May 1972 (has links) Sixteen "model building" and "model selection" procedures commonly encountered in industry, all of which were initially alleged to be capable of identifying the best model from the collection of 2^k possible linear models corresponding to a given set of k predictors in a multiple linear regression analysis, were individually summarized and subsequently evaluated by considering their comparative advantages and limitations from both a theoretical and a practical standpoint. It was found that none of the proposed procedures was absolutely infallible and that several were actually unsuitable. However, it was also found that most of these techniques could still be profitably employed by the analyst, and specific directional guidelines were recommended for their implementation in a proper analysis. Furthermore, the specific role of the analyst in a multiple linear regression application was clearly defined in a practical sense. linear model subset predictors multiple linear regression analysis Applied Statistics Statistics and Probability
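The count of 2^k candidate models mentioned in the abstract above follows from each of the k predictors being either included or excluded. A short sketch of the enumeration is given here for illustration; the predictor names are hypothetical, and the thesis's sixteen selection procedures are not reproduced.

```python
from itertools import combinations


def all_predictor_subsets(predictors):
    """Yield every subset of the predictors, from the empty (intercept-only)
    model up to the full model."""
    for size in range(len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


# Hypothetical predictor names, for illustration only.
candidates = list(all_predictor_subsets(["x1", "x2", "x3"]))
print(len(candidates))  # 2**3 = 8 candidate models
```

The count doubles with every added predictor, so exhaustive comparison becomes impractical quickly, which is one reason selection procedures that search only part of this collection are used in practice.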
97	Automated Localization and Segmentation of Pelvic Floor Structures on MRI to Predict Pelvic Organ Prolapse Onal, Sinan 29 May 2014 (has links) Pelvic organ prolapse (POP) is a major health problem that affects women. POP is a herniation of the female pelvic floor organs (bladder, uterus, small bowel, and rectum) into the vagina. This condition can cause significant problems such as urinary and fecal incontinence, bothersome vaginal bulge, incomplete bowel and bladder emptying, and pain/discomfort. POP is normally diagnosed through clinical examination since there are few associated symptoms. However, clinical examination has been found to be inadequate and in disagreement with surgical findings. This makes POP a common but poorly understood condition. Dynamic magnetic resonance imaging (MRI) of the pelvic floor has become an increasingly popular tool to assess POP cases that may not be evident on clinical examination. Anatomical landmarks are manually identified on MRI along the midsagittal plane to determine reference lines and measurements for grading POP. However, the manual identification of these points, lines and measurements on MRI is a time-consuming and subjective procedure. This has restricted the correlation analysis of MRI measurements with clinical outcomes to improve the diagnosis of POP and predict the risk of development of this disorder. The main goal of this research is to improve the diagnosis of pelvic organ prolapse through a model that automatically extracts image-based features from patient specific MRI and fuses them with clinical outcomes. To extract image-based features, anatomical landmarks need to be identified on MRI through the localization and segmentation of pelvic bone structures. This is the main challenge of current algorithms, which tend to fail during bone localization and segmentation on MRI. 
The proposed research consists of three major objectives: (1) to automatically identify pelvic floor structures on MRI using a multivariate linear regression model with global information, (2) to identify image-based features using a hybrid technique based on texture-based block classification and K-means clustering analysis to improve the segmentation of bone structures on images with low contrast and image inhomogeneity, and (3) to design, test, and validate a prediction model using support vector machines with correlation-analysis-based feature selection to improve disease diagnosis. The proposed model will enable faster and more consistent automated extraction of features from images with low contrast and high inhomogeneity. This is expected to allow studies on large databases to improve the correlation analysis between MRI features and clinical outcomes. The proposed research focuses on the pelvic region, but the techniques are applicable to other anatomical regions that require automated localization and segmentation of multiple structures from images with high inhomogeneity, low contrast, and noise. This research can also be applied to the automated extraction and analysis of image-based features for the diagnosis of other diseases where clinical examination is not adequate. The proposed model will set the foundation for a computer-aided decision support system that will enable the fusion of image, clinical, and patient data to improve the diagnosis of POP through personalized assessment. Automating the process of pelvic floor measurements on radiologic studies will allow the use of imaging to predict the development of POP in predisposed patients, and possibly lead to preventive strategies. Medical Imaging Non-linear Regression Organ Location Prediction SVM Industrial Engineering
98	Effect of advective pore water flow on degradation of organic matter in permeable sandy sediment : - A study of fresh- and brackish water Hofman, Birgitta January 2005 (has links) The carbon metabolism in coastal sediments is of major importance for the global carbon cycle. Coastal sediments are also subjected to physical forcing that generates water fluxes above and through the sediments, but how this physical forcing affects the carbon metabolism is currently poorly known. In this study, the effect of advective pore water flow on degradation of organic matter in permeable sandy sediment was investigated in a laboratory study during wintertime. Sediments were collected from both brackish water (Askö) and a fresh water stream (Getå Stream). In two chamber experiments, with and without advective pore water flow, the degradation of organic matter was measured through carbon dioxide analysis from water and headspace. In Askö sediments, mineralization rates ranged from 3.019 to 5.115 mmol C m^-2 d^-1 with advective pore water flow and were 3.139 mmol C m^-2 d^-1 without it. These results correspond with results from earlier studies of carbon mineralization rates in sediment in the North Sea and the Baltic Sea. There were no significant differences between the two groups in the Askö sediment. In Getå Stream sediments, mineralization rates were 4.059 mmol C m^-2 d^-1 with advective flow and 6.806 mmol C m^-2 d^-1 without it. The mineralization rates for Getå Stream correspond with earlier studies of carbon mineralization rates in a stream in New Hampshire. Advective pore water flow chamber experiment CO2 fresh- and brackish water linear regression Environmental chemistry Miljökemi
99	Enhancement of Pavement Maintenance Decision Making by Evaluating the Effectiveness of Pavement Maintenance Treatments Dong, Qiao 01 May 2011 (has links) The performance of different pavement maintenance treatments was evaluated by investigating practical projects collected from the Tennessee Pavement Management System (PMS) and the Long Term Pavement Performance (LTPP) database. The influence of factors on the effectiveness, cost-effectiveness, and cracking initiation of different treatments was evaluated using “Optime”, multiple linear regression, and parametric survival analysis. Pavement roughness, pavement serviceability index (PSI), and the initiation time of cracking were used as pavement performance indicators. Investigation of the pavement maintenance projects in Tennessee by Optime and multiple linear regression analysis indicated that HMA overlay had the highest effectiveness, followed by mill & fill and micro surfacing. Owing to its relatively low cost, micro surfacing was the most cost-effective treatment, followed by HMA overlay and mill & fill. Effectiveness and cost-effectiveness decreased with increasing traffic level and pre-treatment pavement condition. Investigation of the LTPP resurfacing treatments indicated that thick overlay and milling reduced the roughness after rehabilitation. Thin overlay, high traffic level, and poor pre-rehabilitation pavement condition increased the deterioration rate of new overlay. Using reclaimed asphalt material did not influence the treatment performance but was cost-effective in reducing the roughness of new overlay. For a given deterioration rate, there was an optimal pre-rehabilitation roughness value or time for applying maintenance treatment. Survival analysis of crack initiation in asphalt overlay indicated that high traffic level accelerated the initiation of cracking. Thick overlay delayed the initiation of cracking except for the non-wheel path longitudinal crack. 
Milling delayed the occurrence of non-fatigue cracks, whereas severe freeze-thaw conditions accelerated the occurrence of the two types of cracking. Using 30% RAP accelerated the initiation of longitudinal fatigue cracks on the wheel path but did not cause a serious fatigue problem. The performance curves of HMA resurfacing treatments used in Tennessee were calibrated by investigating the influence of different factors on the slopes and intercepts of post-treatment performance curves. The analysis indicated that pavement with high pre-treatment PSI, thick overlay, and deep milling had a low deterioration rate, whereas pavement with a higher traffic level deteriorated faster. Pavement maintenance Performance model Cost-effectiveness Multiple linear regression Survival analysis

Search results