181 |
Modeling Success Factors for Start-ups in Western Europe through a Statistical Learning Approach / Modellering av framgångsfaktorer för startups i Västeuropa genom statistisk inlärning
Kamal, Adib, Sabani, Kenan January 2021
The purpose of this thesis was to use a quantitative method to expand on previous research in start-up success prediction. This was accomplished by including more criteria in the study, which was made possible by the Crunchbase database, the largest available information source for start-ups. Furthermore, the data were limited to Western European start-ups in order to study how restricting the data to one geographical region affects the prediction models, which to our knowledge has not been done before in this type of research. The quantitative method was machine learning; specifically, the three predictors used were Logistic Regression, Random Forest and K-Nearest Neighbor (KNN). All three models proposed and evaluated have better prediction accuracy than guessing the outcome at random. When tested on data previously unseen by the model, Random Forest produced the best results, classifying successful companies as successes and failed companies as failures with 79 percent accuracy. Logistic Regression and KNN followed with accuracies of 65 percent and 59 percent, respectively.
|
182 |
Prediction of Bronchopulmonary Dysplasia by a Priori and Longitudinal Risk Factors in Extremely Premature Infants
Pax, Benjamin M. 01 June 2018
No description available.
|
183 |
Examining the reliability of logistic regression estimation software
Mo, Lijia January 1900
Doctor of Philosophy / Department of Agricultural Economics / Allen M. Featherstone / Bryan W. Schurle / The reliability of nine software packages using the maximum likelihood estimator for the logistic regression model was examined using generated benchmark datasets and models. Software packages tested included SAS (Procs Logistic, Catmod, Genmod, Surveylogistic, Glimmix, and Qlim), Limdep (Logit, Blogit), Stata (Logit, GLM, Binreg), Matlab, Shazam, R, Minitab, Eviews, and SPSS for all available algorithms, none of which have been previously tested. This study expands on the existing literature in this area by examining Minitab 15 and SPSS 17. The findings indicate that Matlab, R, Eviews, Minitab, Limdep (BFGS), and SPSS provided consistently reliable results for both parameter and standard error estimates across the benchmark datasets. While some packages performed admirably, shortcomings did exist. SAS maximum log-likelihood estimators do not always converge to the optimal solution and, depending on starting values, stop prematurely by issuing a "flat" error message. This drawback can be dealt with by rerunning the maximum log-likelihood estimator from a closer starting point to see whether the convergence criteria are actually satisfied. Although Stata-Binreg provides reliable parameter estimates, there is as yet no way to obtain standard error estimates in Stata-Binreg. Limdep performs relatively well, but did not converge due to a weakness of the algorithm. The results show that solely trusting the default settings of statistical software packages may lead to non-optimal, biased or erroneous results, which may impact the quality of empirical results obtained by applied economists. Reliability tests indicate severe weaknesses in SAS Procs Glimmix and Genmod. Some software packages fail reliability tests under certain conditions. The findings indicate the need to use multiple software packages to solve econometric models.
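The dependence on starting values and convergence criteria described above can be illustrated with a minimal sketch. It is not the thesis's benchmark procedure: the function name `fit_logistic`, the toy data, the tolerance and both starting points are assumptions made for illustration. It fits a one-predictor logistic model by Newton-Raphson and checks that two different starting points reach the same maximum likelihood estimate.

```python
import math

def fit_logistic(xs, ys, start=(0.0, 0.0), tol=1e-10, max_iter=50):
    """Newton-Raphson MLE for P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0, b1 = start
    for iteration in range(1, max_iter + 1):
        # Score vector (g0, g1) and observed information (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        # Convergence criterion: largest parameter change below tol.
        if max(abs(s0), abs(s1)) < tol:
            return b0, b1, iteration
    raise RuntimeError("no convergence within max_iter")

# Hypothetical, non-separable toy data (not a benchmark dataset from the thesis).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

fit_a = fit_logistic(xs, ys)                     # default start (0, 0)
fit_b = fit_logistic(xs, ys, start=(-1.0, 0.3))  # a different start
print(fit_a, fit_b)
```

Rerunning from a second starting point and comparing the estimates, as done here, mirrors the kind of check the abstract recommends when a package stops early.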
|
184 |
Comparative study of neural networks and design of experiments to the classification of HIV status / Wilbert Sibanda.
Sibanda, Wilbert January 2013
This research addresses the novel application of design of experiments, artificial neural networks and logistic regression to study the effect of demographic characteristics on the risk of acquiring HIV infection among antenatal clinic attendees in South Africa. The annual antenatal HIV survey is the only major national indicator of HIV prevalence in South Africa and is vital for understanding changes in the HIV epidemic over time. The annual antenatal clinic data contain the following demographic characteristics for each pregnant woman: age (herein called mother's age), partner's age (herein father's age), population group (race), level of education, gravidity (number of pregnancies), parity (number of children born), and HIV and syphilis status. This project applied a screening design of experiments technique to rank the effects of individual demographic characteristics on the risk of acquiring an HIV infection. There are various screening design techniques, such as fractional or full factorial and Plackett-Burman designs. In this work, a two-level fractional factorial design was selected for screening. In addition to screening designs, this project employed response surface methodologies (RSM) to estimate interaction and quadratic effects of demographic characteristics using a central composite face-centered design and a Box-Behnken design. Furthermore, this research presents the novel application of multilayer perceptron (MLP) neural networks to model the demographic characteristics of antenatal clinic attendees. A review report was produced to study the application of neural networks to modelling HIV/AIDS around the world. This report is important for understanding the extent to which neural networks have been applied to study the HIV/AIDS pandemic. Finally, a binary logistic regression technique was employed to benchmark the results obtained by the design of experiments and neural network methodologies.
The two-level fractional factorial design demonstrated that HIV prevalence was highly sensitive to changes in the mother's age (15-55 years) and her level of education (Grades 0-13). The central composite face-centered and Box-Behnken designs, employed to study the individual and interaction effects of demographic characteristics on the spread of HIV in South Africa, demonstrated that the HIV status of an antenatal clinic attendee was highly sensitive to changes in the pregnant mother's age and her educational level. In addition, the interaction of the mother's age with other demographic characteristics was found to be an important determinant of the risk of acquiring an HIV infection. Furthermore, the central composite face-centered and Box-Behnken designs illustrated that, individually, the pregnant mother's parity and her partner's age had no marked effect on her HIV status. However, the pregnant woman's parity and her male partner's age did show marked effects on her HIV status in two-way interactions with other demographic characteristics. The MLP sensitivity test also showed that the age of the pregnant woman had the greatest effect on the risk of acquiring an HIV infection, while her gravidity and syphilis status had the lowest effects. The MLP modelling produced the same results as the screening and response surface methodologies. The binary logistic regression technique was compared with a Box-Behnken design to further elucidate the differential effects of demographic characteristics on the risk of acquiring HIV amongst pregnant women. The two methodologies indicated that the age of the pregnant woman and her level of education had the most profound effects on her risk of acquiring an HIV infection. To facilitate comparison of the performance of the classifiers used in this study, a receiver operating characteristic (ROC) curve was applied.
Theoretically, an ROC analysis provides tools to select optimal models and to discard suboptimal ones independently of the cost context or the class distribution. SAS Enterprise Miner™ was employed to develop the required ROC curves. To validate the results obtained by the above classification methodologies, a credit scoring add-on in SAS Enterprise Miner™ was used to build binary target scorecards from HIV-positive and HIV-negative datasets for probability determination. The process involved grouping variables using weights of evidence (WOE) prior to performing a logistic regression to produce predicted probabilities. Creating bins for the scorecard enables the study of the inherent relationship between demographic characteristics and an individual's HIV status. This technique increases the understanding of the risk-ranking ability of the scorecard method, while offering the added advantage of being predictive.
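As a rough sketch of the weights-of-evidence grouping mentioned above, the snippet below computes WOE per bin as the natural log of the share of non-events divided by the share of events in that bin. The sign convention, the function name `weight_of_evidence`, the age bins and all counts are assumptions for illustration; none of them are taken from the thesis or from SAS Enterprise Miner.

```python
import math

def weight_of_evidence(bins):
    """bins maps a bin label to (non_event_count, event_count).

    Returns ln(share of non-events in bin / share of events in bin) per bin.
    """
    total_non = sum(n for n, _ in bins.values())
    total_evt = sum(e for _, e in bins.values())
    return {
        label: math.log((n / total_non) / (e / total_evt))
        for label, (n, e) in bins.items()
    }

# Hypothetical (HIV-negative, HIV-positive) counts per age bin; invented data.
age_bins = {"15-24": (400, 100), "25-34": (300, 200), "35-55": (300, 100)}
woe = weight_of_evidence(age_bins)
for label, value in woe.items():
    print(f"{label}: WOE = {value:+.3f}")
```

Under this convention a negative WOE marks a bin that holds a larger share of events than of non-events; the binned WOE values would then replace the raw variable as input to the logistic regression.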
|
185 |
Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach
Chen, Wei 06 November 2015
In this paper, a clinical dataset for a new rheumatoid arthritis (RA) medicine, with an ordinal response, is used to study the medicine. The dataset has four features: sex, age, treatment, and preliminary. Sex is a binary categorical variable, where 1 indicates male and 0 indicates female. Age is the numerical age of the patient. Treatment is a binary categorical variable, where 1 indicates has RA and 0 indicates does not have RA. Preliminary is a five-class categorical variable indicating the patient's RA severity before taking the medication. The response Y is a five-class ordinal variable giving the patient's RA severity after taking the medication.
The primary aim of this study is to determine which factors play a significant role in determining the response after taking the medicine. First, cumulative logistic regression is applied to the dataset to examine the effect of the various factors on the ordinal response. Second, the ordinal response is collapsed into two classes, and logistic regression is applied to the RA dataset to see whether the variable selection differs. Moreover, two shrinkage methods, the elastic net and the lasso, are used for variable selection on the two-class RA dataset, adding penalization to increase the model's robustness. The four model results are compared at the end of the paper. In this comparison, logistic regression performs better on variable selection than the other three approaches, based on P-values.
|
187 |
A country bug in the city: urban infestation by the Chagas disease vector Triatoma infestans in Arequipa, Peru
Delgado, Stephen, Ernst, Kacey, Pumahuanca, Maria Luz, Yool, Stephen, Comrie, Andrew, Sterling, Charles, Gilman, Robert, Naquira, Cesar, Levy, Michael, the Chagas Disease Working Group in Arequipa January 2013
BACKGROUND: Interruption of vector-borne transmission of Trypanosoma cruzi remains an unrealized objective in many Latin American countries. The task of vector control is complicated by the emergence of vector insects in urban areas.
METHODS: Utilizing data from a large-scale vector control program in Arequipa, Peru, we explored the spatial patterns of infestation by Triatoma infestans in an urban and peri-urban landscape. Multilevel logistic regression was utilized to assess the associations between household infestation and household- and locality-level socio-environmental measures.
RESULTS: Of 37,229 households inspected for infestation, 6,982 (18.8%; 95% CI: 18.4 - 19.2%) were infested by T. infestans. Eighty clusters of infestation were identified, ranging in area from 0.1 to 68.7 hectares and containing as few as one and as many as 1,139 infested households. Spatial dependence between infested households was significant at distances up to 2,000 meters. Household T. infestans infestation was associated with household- and locality-level factors, including housing density, elevation, land surface temperature, and locality type.
CONCLUSIONS: High levels of T. infestans infestation, characterized by spatial heterogeneity, were found across extensive urban and peri-urban areas prior to vector control. Several environmental and social factors, which may directly or indirectly influence the biology and behavior of T. infestans, were associated with infestation. Spatial clustering of infestation in the urban context may both challenge and inform surveillance and control of vector reemergence after insecticide intervention.
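The reported prevalence and interval can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) interval; the abstract does not say which interval method the authors used, and the function name `wald_interval` is introduced here only for illustration.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# Counts taken from the RESULTS section: 6,982 infested of 37,229 inspected.
p, lower, upper = wald_interval(6982, 37229)
print(f"{100 * p:.1f}% (95% CI: {100 * lower:.1f} - {100 * upper:.1f}%)")
```

With a sample this large, the Wald interval agrees with the published figures to one decimal place.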
|
188 |
Eine empirische Analyse des individuellen Verkehrsmittelwahlverhaltens am Beispiel der Stadt Dresden
Schletze, Matthias 15 December 2015
People's choice of transport mode is complex: sociodemographic, socioeconomic, and spatial and settlement-structure characteristics all play a role. This thesis examines that behavior. A homogeneous population is constructed from people who hold a season ticket for public transportation and also have a car. Based on this population, a descriptive analysis followed by a multinomial logistic regression is used to show whether the user groups differ.
The group of public transportation users can be characterized as follows: the majority are women and highly educated people. Within this group, distances are more likely to be covered by public transportation than by car. The working population, however, prefers to travel by car.
|
189 |
Effektivisering av urvalsprocesser vid analysering av björnspillning : Ett förslag till den svenska förvaltningen av brunbjörn Ursus arctos
Gustafsson, Jonas January 2015
The aim of this report is to formulate a strategic method for optimizing the selection of DNA samples from a faeces inventory, so as to identify as many individuals in as few analyses as possible and thereby keep down the costs of brown bear management. Brown bear management in Sweden is currently based on results from faeces inventories and is largely led by the county administrative boards. Data from the 2004 and 2009 inventories in Västerbotten were used to test and evaluate different methods for selecting which faeces should be sampled. Three approaches were compared: selection by chance, selection by spatial distribution, and selection by calculating variations in the logistic regression coefficient b (in other words, bear density and the probability of finding the same individual in several faeces). We show that selection by chance is the most uncertain method. Selection by spatial distribution, without taking variations in b into account, yields the highest number of identified individuals at low labour and thus low cost. We therefore strongly recommend that future brown bear management, when it is not possible to sample all faeces in a dataset, select by spatial distribution to minimize the risk of sampling the same bear several times.
|
190 |
THE MOBILITY OF FECAL INDICATOR MICROORGANISMS WITHIN A KARST GROUNDWATER BASIN IN THE INNER BLUEGRASS REGION, KENTUCKY
Ward, James Wade 01 January 2008
This project implemented novel approaches to assess the source, age, concentration and mobility of fecal indicator microorganisms within a karst groundwater system. Research was conducted in the well-characterized Blue Hole Spring karst groundwater basin in Versailles, Woodford County, Kentucky. At this site the AC/TC ratio and fecal coliform (FC) bacteria counts were used to delineate sources of fecal inputs and determine the relative age of the fecal matter. An aging experiment using indicator bacteria (total coliform (TC) and atypical colonies (AC)), which approximated subsurface conditions, indicated that changes in the AC/TC ratio are likely to be retarded during bacterial transport through karst conduits. Decreases in the AC/TC ratio during the monitoring period appear to be the result of sewage releases. Multiple logistic regression (MLR) modeling was performed to examine correlations between physiochemical parameters and FC concentrations. MLR models using physiochemical parameters correctly predicted "safe for contact" (< 200 cfu/100 mL FC) conditions 65.6% of the time and "unsafe for contact" (> 200 cfu/100 mL FC) conditions 69.2% of the time at Blue Hole Spring. Modeling using other indicators (TC and AC) predicted "safe for contact" conditions 87.5% of the time and "unsafe for contact" conditions 61.5% of the time. A series of tracer tests was performed to compare transport of solute and abiotic particle tracers (rhodamine WT fluorescent dye, bromide and fluorescent bacteria-sized microspheres) and bacteria (15N-enriched wild-type E. coli) within the karst system. The surrogate tracers did not suitably mimic microbial mobility within the basin. Solutes and 15N-enriched E. coli arrived concurrently during storm flow to Blue Hole Spring, whereas microsphere breakthrough corresponded with maximum solute concentrations. The 15N-enriched E. coli exhibited slightly more tailing during storm-flow recession than solute tracers, none of which exhibited remobilization. Microspheres demonstrated remobilization within the conduits that correlated with later increases in discharge related to secondary storm events.
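The "correctly predicted ... of the time" figures above are per-class rates, which can be computed from a table of predictions against observations. The sketch below shows the arithmetic; the function name `per_class_rates` and the counts are hypothetical, chosen only so that the rates come out at the percentages reported for the physiochemical model, and are not taken from the thesis.

```python
def per_class_rates(counts):
    """counts maps an observed class to (correctly_predicted, total_observed).

    Returns the percentage of each observed class that was predicted correctly.
    """
    return {
        label: 100.0 * correct / total
        for label, (correct, total) in counts.items()
    }

# Hypothetical sample counts (assumed, not from the thesis).
counts = {"safe": (21, 32), "unsafe": (9, 13)}
rates = per_class_rates(counts)

# Overall accuracy pools both classes and can hide a weak class.
overall = 100.0 * sum(c for c, _ in counts.values()) / sum(t for _, t in counts.values())

for label, rate in rates.items():
    print(f"{label}: {rate:.1f}% correctly predicted")
print(f"overall: {overall:.1f}%")
```

Reporting the two classes separately, as the abstract does, matters here because a model that misses "unsafe" conditions has a different practical cost from one that misses "safe" ones.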
|