Global ETD Search

101	Bias correction of bounded location errors in binary data Walker, Nelson B. January 1900 Master of Science / Department of Statistics / Trevor Hefley / Binary regression models for spatial data are commonly used in disciplines such as epidemiology and ecology. Many spatially referenced binary data sets suffer from location error, which occurs when the recorded location of an observation differs from its true location. When location error occurs, values of the covariates associated with the true spatial locations of the observations cannot be obtained. We show how a change of support (COS) can be applied to regression models for binary data to provide bias-corrected coefficient estimates when the true values of the covariates are unavailable but the unknown locations of the observations are contained within non-overlapping polygons of any geometry. The COS accommodates spatial and non-spatial covariates and preserves the convenient interpretation of methods such as logistic and probit regression. Using a simulation experiment, we compare binary regression models with a COS to naive approaches that ignore location error. We illustrate the flexibility of the COS by modeling individual-level disease risk in a population using a binary data set where the locations of the observations are unknown but contained within administrative units. Our simulation experiment and data illustration corroborate that conventional regression models for binary data that ignore location error are unreliable, but that the COS can be used to eliminate bias while preserving model choice.
102	Three Essays on Correlated Binary Outcomes: Detection and Appropriate Models January 2018 abstract: Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing appropriate methods to properly consider different types of association. The first paper introduces an ANOVA-based measure of intraclass correlation for three-level hierarchical data with binary outcomes, and corresponding properties. This measure is useful for evaluating when the correlation due to clustering warrants a more complex model. This measure is used to investigate AIDS knowledge in a clustered study conducted in Bangladesh. The second paper develops the Partitioned generalized method of moments (Partitioned GMM) model for longitudinal studies. This model utilizes valid moment conditions to separately estimate the varying effects of each time-dependent covariate on the outcome over time using multiple coefficients. The model is fit to data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) to investigate risk factors of childhood obesity. In the third paper, the Partitioned GMM model is extended to jointly estimate regression models for multiple outcomes of interest. Thus, this approach takes into account both the correlation between the multivariate outcomes and the correlation due to time-dependency in longitudinal studies. 
The model utilizes an expanded weight matrix and objective function composed of valid moment conditions to simultaneously estimate optimal regression coefficients. This approach is applied to Add Health data to simultaneously study drivers of outcomes including smoking, social alcohol usage, and obesity in children. / Dissertation/Thesis / Doctoral Dissertation Statistics 2018 Statistics Correlation Generalized Method of Moments Hierarchical Data Logistic Regression
103	Estudo da análise da razão alfa/teta em pacientes com doença de Alzheimer provável / Study of alpha/theta ratio analysis in patients with probable Alzheimer's disease Magali Taino Schmidt 16 May 2013 The inclusion of electroencephalography (EEG) in diagnostic research protocols for Alzheimer's disease (AD) is fully justified by its wide availability, low cost, and high sensitivity, which allow serial exams and follow-up of neurological evolution. Objective: To determine a cutoff index for use in clinical practice to aid the diagnosis of Alzheimer's disease. Methodology: Two groups of individuals older than 50 years were compared: a control group of 57 normal volunteers and a study group of 50 patients with probable AD. 
EEG was recorded for 30 minutes with subjects awake, at rest, with eyes closed. The spectral power of the alpha and theta frequency bands was computed for all electrodes, and the alpha/theta ratio was calculated. Logistic regression was performed on the alpha/theta ratios of the mean power at electrodes C3 and O1, yielding a formula to aid the diagnosis of AD with 76.4% sensitivity, 84.6% specificity, and an area under the ROC curve of 0.92. Conclusion: Logistic regression of the alpha/theta ratio of the mean EEG power spectrum is a good marker for discriminating patients with Alzheimer's disease from normal controls. Alzheimer's disease Electroencephalography Logistic regression
104	Credit Risk Model for loans to SMEs in Sweden : Calculating Probability of Default for SMEs in Sweden based on historical data, to estimate a financial institution’s risk exposure Mustafa, Khalil, Persson, Victor January 2017 As a consequence of the financial crisis that began in the USA in 2007, regulatory frameworks are continuously being improved in order to limit banks’ risk exposure. Two of the resulting frameworks are Basel III and IFRS 9. Basel III regulates the capital a bank is required to hold, while IFRS 9 is an accounting standard for how banks and insurance companies should classify their assets and estimate their future credit losses. Common to both Basel III and IFRS 9 is the estimation of future credit losses, which includes the probability of default in the calculations. The objective of this thesis was therefore to develop a scoring model that can estimate the probability of default when lending capital to enterprises, based on information from financial statements. The aim is that the developed model can also be used in daily operations to reduce fixed costs by optimizing processes and to increase the profit on each loan issued. The model should estimate the probability of default within 500 days of the last known information and be customized for small and medium-sized enterprises. The model is based on logistic regression and therefore returns values between 0 and 1. The parameters of the model can either be calculated from or retrieved directly from financial statements. During development, the authors divided the enterprise data by industry sector. The grouping was performed to create data sets that are as homogeneous as possible, in order to increase the explanatory power of each model. The final solution thus consists of several models, one for each set of data. 
The models are validated on a new set of enterprises by observing how well they discriminate enterprises defined as defaults from non-defaults. The master's thesis resulted in a number of models calibrated on defaults and non-defaults, as well as models developed on data divided by industry sector. Using the calibrated models, it is possible to discriminate defaulting from non-defaulting enterprises, which was the objective of this thesis. The project has also shown the importance of dividing data into homogeneous groups in order to create models that identify defaults more accurately. Credit risk probability of default logistic regression Mathematics
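The abstract of entry 104 describes one logistic model per industry group, each returning a value between 0 and 1. A minimal Python sketch of that structure follows. The group names, feature names, and coefficient values are hypothetical placeholders chosen for illustration; they are not taken from the thesis.

```python
import math

# Hypothetical per-industry models. All names and numbers are illustrative
# assumptions, not values reported in the thesis.
MODELS = {
    "retail": {
        "intercept": -2.0,
        "coefficients": {"equity_ratio": -3.0, "debt_to_assets": 1.5},
    },
    "construction": {
        "intercept": -1.5,
        "coefficients": {"equity_ratio": -2.5, "debt_to_assets": 2.0},
    },
}


def probability_of_default(features, coefficients, intercept):
    """Map a linear score to a probability in (0, 1) with the logistic function."""
    score = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def score_enterprise(industry, features):
    """Pick the model fitted for the enterprise's industry group and score it."""
    model = MODELS[industry]
    return probability_of_default(
        features, model["coefficients"], model["intercept"]
    )


# Example call with made-up financial ratios.
pd_estimate = score_enterprise(
    "retail", {"equity_ratio": 0.3, "debt_to_assets": 0.6}
)
```

Keeping one coefficient set per group mirrors the abstract's point that homogeneous data sets each get their own model; in practice the coefficients would come from fitting on historical defaults.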
105	Affective Intelligence in Built Environments Yates, Heath January 1900 Doctor of Philosophy / Department of Computer Science / William H. Hsu / The contribution of the proposed dissertation is the application of affective intelligence in human-developed spaces where people live, work, and recreate daily, also known as built environments. Built environments are known to influence individual affective responses. Their implications for human well-being and mental health call for new metrics that measure and detect how humans respond subjectively in built environments. Detecting arousal in built environments from biometric data and environmental characteristics with a machine learning-centric approach provides a new capability for measuring human responses to built environments. Work was also conducted on experimental design methodologies for multiple sensor fusion and detection of affect in built environments. These contributions include exploring new methodologies for applying supervised machine learning algorithms, such as logistic regression, random forests, and artificial neural networks, to the detection of arousal in built environments. Results have shown that a machine learning approach can be used not only to detect arousal in built environments but also to construct novel explanatory models of the data. Affective Computing Built Environments Machine Learning Affective Intelligence Logistic Regression
106	Prediction of protein secondary structure using binary classification trees, naive Bayes classifiers and the Logistic Regression Classifier Eldud Omer, Ahmed Abdelkarim January 2016 The secondary structure of proteins is predicted using various binary classifiers. The data are adopted from the RS126 database. The original data consist of protein primary and secondary structure sequences encoded using alphabetic letters. These data are re-encoded into unary vectors comprising ones and zeros only. Different binary classifiers, namely naive Bayes, logistic regression and classification trees, are trained on the encoded data using hold-out and 5-fold cross validation. For each of the classifiers three classification tasks are considered, namely helix against not helix (H/∼H), sheet against not sheet (S/∼S) and coil against not coil (C/∼C). The performance of these binary classifiers is compared using the overall accuracy in predicting the protein secondary structure for various window sizes. Our results indicate that hold-out cross validation achieved higher accuracy than 5-fold cross validation. The naive Bayes classifier, using 5-fold cross validation, achieved the lowest accuracy for predicting helix against not helix. The classification tree classifiers, using 5-fold cross validation, achieved the lowest accuracies for both coil against not coil and sheet against not sheet classifications. The accuracy of the logistic regression classifier depends on the window size; there is a positive relationship between accuracy and window size. The logistic regression classifier achieved the highest accuracy when compared to the classification tree and naive Bayes classifiers for each classification task: 77.74 percent for helix against not helix, 81.22 percent for sheet against not sheet and 73.39 percent for coil against not coil. 
It is noted that comparing classifiers would be easier if the entire classification process could be carried out in R. Alternatively, it would be easier to assess these logistic regression classifiers if SPSS had a function to determine the accuracy of the logistic regression classifier. Bayesian statistical decision theory Logistic regression analysis Biostatistics Proteins -- Structure
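The encoding step in entry 106 (letters to unary vectors of ones and zeros, over a sliding window, with one-against-rest labels) can be sketched in Python as below. The abstract does not give the encoding details, so the 20-letter residue alphabet, the all-zero padding at the sequence ends, and the function names are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard residues


def one_hot(residue):
    """Unary vector for one residue; all zeros for padding or unknown letters."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_windows(sequence, window_size):
    """Return one feature row per residue, built from a centred window.

    Positions beyond the ends of the sequence are padded with '-', which
    encodes to an all-zero vector (an assumption, not the thesis's choice).
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window has a centre")
    half = window_size // 2
    padded = "-" * half + sequence + "-" * half
    rows = []
    for i in range(len(sequence)):
        row = []
        for residue in padded[i:i + window_size]:
            row.extend(one_hot(residue))
        rows.append(row)
    return rows


def binary_labels(structure, target):
    """One-against-rest labels, e.g. target 'H' gives helix vs. not helix."""
    return [1 if state == target else 0 for state in structure]


features = encode_windows("ACD", window_size=3)
labels = binary_labels("HHS", target="H")
```

Each row has `window_size * 20` entries, so a larger window gives the classifier more sequence context at the cost of more features, which is consistent with the abstract's remark that accuracy depends on window size.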
107	Identification de la zone regardée sur un écran d'ordinateur à partir du flou / Identifying the gazed-at region of a computer screen from blur Néron, Eric January 2017 When it comes to understanding a person's behaviour, gaze is an important source of information. Analysing the behaviour of consumers or criminals, or certain cognitive states, relies on interpreting gaze within a scene over time. There is a real need to identify the region a user is looking at on a screen or any other medium. To do so, human vision combines several images to understand the three-dimensional relationship between the objects and the scene. The 3D perception of a real scene thus relies on several images. But what happens when there is only a single image? Gaze estimation Marker Blur Spatial position of the head Bayesian logistic regression
108	Street network connectivity and local travel behaviour: assessing the relationship of travel outcomes to disparate pedestrian and vehicular street network connectivity Hawkins, Christopher 05 1900 This research investigated the association of street network connectivity differences across travel modes with travel behaviour: mode choice, distance traveled and number of trips. To date, research on the relationship between travel behaviour and urban form has not developed empirical evidence on street designs as distinct networks for walking and driving. A street network with greater connectivity and continuity for pedestrians than for vehicles, like the Fused Grid, will likely encourage more walking. This hypothesis was investigated using a quasi-experimental approach within a rational utility behavioural framework. Local travel behaviour is theorized to be affected by the desire to access goods and services (broadly termed ‘activities’) in the community where people live. Using inferential statistics, the research tested for relationships between measured street patterns and self-reported local travel by King County, WA households. The main variables were ratios (walking : driving) of network connectivity and density in the vicinity of travel survey households. Demographics and household characteristics, as well as other behaviourally influential urban form factors (residential density, proximity of destinations, etc.), were included in regression models to control for confounding factors. Findings suggest that street networks whose connectivity provides better routing for one mode of transportation over others encourage more travel by the favored mode. The regression model demonstrated that a change from a pure small-block grid to a modified grid (i.e. Fused Grid) can result in an 11.3% increase in the odds of a home-based trip being walked. 
A modified street pattern such as the Fused Grid is also associated with a 25.9% increase, over street patterns with equivalent route directness for walking and driving, in the odds that a person will meet recommended levels of physical activity. Finally, the Fused Grid’s 10% increase in relative connectivity for pedestrians is associated with a 23% decrease in local vehicle travel distance (VMT), and its improved continuity is associated with increased walking trips and distance. Conclusions: Other factors being equal, residential street networks with either more direct routing for pedestrians or more pedestrian facilities relative to the vehicular network are associated with improved odds of walking and reduced odds of driving. / Applied Science, Faculty of / Community and Regional Planning (SCARP), School of / Graduate street network GIS walking relative utility driving logistic regression
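Entry 108 reports its effects as percentage changes in odds, which are easy to misread as changes in probability. The Python sketch below shows the arithmetic that links the two. The 11.3% figure is from the abstract; the 10% baseline walking probability is a hypothetical value chosen only to make the conversion concrete.

```python
import math


def odds_ratio_from_percent(percent_increase):
    """An X% increase in odds corresponds to an odds ratio of 1 + X/100."""
    return 1.0 + percent_increase / 100.0


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Convert probability to odds, scale by the odds ratio, convert back."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# 11.3% increase in the odds of a home-based trip being walked (from the abstract).
walk_odds_ratio = odds_ratio_from_percent(11.3)

# The matching logistic regression coefficient is the log of the odds ratio.
walk_coefficient = math.log(walk_odds_ratio)

# Hypothetical baseline: assume 10% of home-based trips are walked.
new_walk_probability = apply_odds_ratio(0.10, walk_odds_ratio)
```

For a small baseline probability the resulting probability rises by somewhat less than 11.3% in relative terms, because odds and probability diverge as the baseline grows.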
109	What Motivates Marketing Innovation and Whether Marketing Innovation Varies across Industry Sectors Wang, Shu January 2015 Innovativeness is one of the fundamental instruments of growth strategies that provide companies with a competitive edge. Only a few recent studies have examined marketing innovation and the factors that might encourage its adoption. This study investigates the factors that motivate marketing innovation and examines whether the occurrence of marketing innovation varies across industry sectors. This study uses data from surveys and a nationwide census conducted by Statistics Canada. They include: the Survey of Innovation and Business Strategies (SIBS) 2009, the Survey of Innovation and Business Strategies (SIBS) 2012, the Business Registry (BR) and the General Index of Financial Information (GIFI). Multilevel (random-intercept) logistic regression modelling is employed. The results show that if a firm has a strategic focus on new marketing practices, maintains marketing within its enterprise, acquires or expands marketing capacity, has competitor and customer orientations, and adopts advanced technology then it is more likely to carry out marketing innovation. However, breadth of long-term strategic objectives and competitive intensity do not have significant impacts on marketing innovation. In addition, product innovation and organizational innovation occur simultaneously with marketing innovation, but process innovation may not. Lastly, the occurrence of marketing innovation is found to vary across industry sectors. The theoretical and empirical implications of the results are discussed within this study. marketing innovation
110	Riziko chudoby v ČR / The Risk of Poverty in the Czech Republic Klein, Jan January 2011 The goal of this work is to identify and analyse the factors that push household income below the poverty line. The data are taken from the EU-SILC survey. A statistical model is built to distinguish relevant from irrelevant factors. The situation and its development are analysed for Czech households only.
