911

Essays on Spatial Differentiation and Imperfect Competition in Agricultural Procurement Markets

Jinho Jung (9160868) 29 July 2020 (has links)
First Essay: We study the effect of entry of ethanol plants on the spatial pattern of corn prices. We use pre- and post-entry data from corn elevators to implement a clean identification strategy that allows us to quantify how price effects vary with the size of the entrant (relative to local corn production) and with distance from the elevator to the entrant. We estimate difference-in-differences (DID) and DID-matching models with linear and non-linear distance specifications. We find that the average-sized entrant causes an increase in corn price that ranges from 10 to 15 cents per bushel at the plant's location, depending on the model specification. We also find that, on average, the price effect dissipates 60 miles away from the plant. Our results indicate that the magnitude of the price effect, as well as its spatial pattern, varies substantially with the size of the entrant relative to local corn supply. Under our preferred model, the largest entrant in our sample causes an estimated price increase of 15 cents per bushel at the plant's site, and the price effect propagates over 100 miles away. In contrast, the smallest entrant causes a price increase of only 2 cents per bushel at the plant's site, and the price effect dissipates within 15 miles of the plant. Our results are qualitatively robust to the pre-treatment matching strategy, to whether spatial effects are assumed to be linear or nonlinear, and to placebo tests that falsify alternative explanations.

Second Essay: We estimate the cost of transporting corn and the resulting degree of spatial differentiation among downstream firms that buy corn from upstream farmers, and we examine whether such differentiation softens competition, enabling buyers to exert market power (defined as the ability to pay a price for corn that is below its marginal value product net of processing cost). We estimate a structural model of spatial competition using corn procurement data from the US state of Indiana from 2004 to 2014. We adopt a strategy that allows us to estimate firm-level structural parameters while using aggregate data. Our results return a transportation cost of 0.12 cents per bushel per mile (3% of the corn price under average conditions), which provides evidence of spatial differentiation among buyers. The estimated average markdown is $0.80 per bushel (16% of the average corn price in the sample), of which $0.34 is explained by spatial differentiation and the rest by the fact that firms operated under binding capacity constraints. We also find that corn prices paid to farmers at the mill gate are independent of the distance between the plant and the farm, providing evidence that firms do not engage in spatial price discrimination. Finally, we evaluate the effect of hypothetical mergers on input markets and farm surplus. A merger between nearby ethanol producers softens competition, increases markdowns by 20%, and triggers a sizable reduction in farm surplus. In contrast, a merger between distant buyers has little effect on competition and markdowns.

Third Essay: We study the dynamic response of local corn prices to entry of ethanol plants. We use spatially explicit panel data on elevator-level corn prices and ethanol plant entry and capacity to estimate an autoregressive distributed lag model with instrumental variables. We find that the average-sized entrant has no impact on local corn prices the year of entry. However, the price subsequently rises and stabilizes after two years at a level that is about 10 cents per bushel higher than the pre-entry level. This price effect dissipates as the distance between elevators and plants increases. Our results imply that the long-run (2-year) supply elasticity is smaller than the short-run (year-of-entry) supply elasticity. This may be due to rotation benefits that induce farmers to revert to soybeans after switching to corn in response to price signals the year the plant enters. Furthermore, our results, combined with the findings in the second essay of this dissertation, indicate that ethanol plants are likely to use pricing strategies consistent with static rather than dynamic oligopsony competition.
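
A minimal sketch of the kind of elevator-level difference-in-differences regression the first essay describes, with the post-entry price effect allowed to decay with distance to the plant. The file name, column names, and data are hypothetical, not the dissertation's actual dataset.

```python
# Hypothetical DID price-effect regression with spatial decay, assuming a panel
# of elevator-month corn prices observed before and after an ethanol plant entry.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("elevator_prices.csv")  # hypothetical panel: elevator_id, month, corn_price, post, dist_mi

# post = 1 for months after the plant enters; dist_mi = miles from elevator to plant.
# Elevator and month fixed effects absorb location-specific and seasonal price levels.
model = smf.ols(
    "corn_price ~ post + post:dist_mi + C(elevator_id) + C(month)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["elevator_id"]})

print(model.params["post"])          # estimated price effect at the plant gate (distance = 0)
print(model.params["post:dist_mi"])  # linear decay of the effect per mile from the plant
```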
912

An analysis of the decision making of the Office for the Protection of Competition in the area of public procurement

Šipkovská, Silvie January 2016 (has links)
The thesis deals with the nature of the decision-making of the Office for the Protection of Competition (hereinafter "the OPP") in the area of public procurement. The theoretical part describes the various methods used, the legislative framework for the decision-making processes of the OPP (from the point of view of the currently effective legislation and the new Act on public procurement), and summarizes theoretical assumptions. In the analytical part, selected decisions of the OPP from the years 2005-2015 are analysed using descriptive and inferential statistical methods. In terms of subject matter, the OPP most often conducts proceedings in relation to complaints against alleged violations of the prohibition of discriminatory practices. Decisions of the OPP are challenged before administrative courts in only 4% of cases, in spite of the fact that contracting authorities are found guilty of committing an administrative offense in almost 80% of cases. The penalties commonly imposed are fines. The number of decisions finding guilt, as well as of those imposing fines, is growing; however, the level of fines remains unchanged. A contracting authority which awards a public works contract is more likely to be found guilty than a contracting authority awarding other public contracts. It is also more...
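
As a rough illustration of the inferential comparison described above, a two-sample proportion test can contrast guilt rates between authorities awarding public works contracts and those awarding other contract types; the counts below are invented for illustration, not taken from the thesis.

```python
# Hypothetical two-sample proportion test comparing guilt rates by contract type.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

guilty = np.array([95, 210])   # guilty findings: works contracts vs. other contracts (invented)
total = np.array([110, 290])   # all decided cases in each group (invented)

stat, pval = proportions_ztest(guilty, total)
print(f"z = {stat:.2f}, p = {pval:.4f}")
```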
913

Chemometrics applied to the discrimination of synthetic fibers by microspectrophotometry

Reichard, Eric Jonathan 03 January 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Microspectrophotometry is a quick, accurate, and reproducible method to compare colored fibers for forensic purposes. The use of chemometric techniques applied to spectroscopic data can provide valuable discriminatory information, especially when looking at a complex dataset. Differentiating a group of samples by employing chemometric analysis increases the evidential value of fiber comparisons by decreasing the probability of false association. The aims of this research were to (1) evaluate the chemometric procedure on a data set consisting of blue acrylic fibers and (2) accurately discriminate between yellow polyester fibers with the same dye composition but different dye loadings, along with introducing a multivariate calibration approach to determine the dye concentration of fibers. In the first study, background-subtracted and normalized visible spectra from eleven blue acrylic exemplars dyed with varying compositions of dyes were discriminated from one another using agglomerative hierarchical clustering (AHC), principal component analysis (PCA), and discriminant analysis (DA). AHC and PCA results agreed, showing similar spectra clustering close to one another. DA indicated a total classification accuracy of approximately 93%, with only two of the eleven exemplars confused with one another. This was expected because the two exemplars consisted of the same dye compositions. An external validation of the data set was performed and showed consistent results, which validated the model produced from the training set. In the second study, background-subtracted and normalized visible spectra from ten yellow polyester exemplars dyed with different concentrations of the same dye, ranging from 0.1 to 3.5% (w/w), were analyzed by the same techniques. Three classes of fibers, representing low, medium, and high dye loadings, were found with a classification accuracy of approximately 96%. Exemplars with similar dye loadings could in some cases be readily discriminated, based on a classification accuracy of 90% or higher and a receiver operating characteristic area under the curve of 0.9 or greater. Calibration curves based upon a proximity matrix of dye loadings between 0.1 and 0.75% (w/w) were developed that provided better accuracy and precision than a traditional approach.
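
A minimal sketch of the chemometric workflow named above (AHC, PCA, and discriminant analysis applied to preprocessed spectra); the spectra and labels below are simulated placeholders rather than the thesis data.

```python
# Hypothetical AHC + PCA + DA pipeline on a matrix of preprocessed fiber spectra.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(110, 200))    # 110 spectra x 200 wavelength points (placeholder data)
y = np.repeat(np.arange(11), 10)   # 11 exemplars, 10 replicate spectra each

# Agglomerative hierarchical clustering of the spectra (Ward linkage)
clusters = fcluster(linkage(X, method="ward"), t=11, criterion="maxclust")

# PCA scores followed by discriminant analysis with cross-validation
scores = PCA(n_components=10).fit_transform(X)
accuracy = cross_val_score(LinearDiscriminantAnalysis(), scores, y, cv=5).mean()
print(f"cross-validated classification accuracy: {accuracy:.2f}")
```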
914

Augmenting Indiana's groundwater level monitoring network: optimal siting of additional wells to address spatial and categorical sampling gaps

Sperl, Benjamin J. 21 November 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Groundwater monitoring networks are subject to change by budgetary actions and stakeholder initiatives that result in wells being abandoned or added. A strategy for network design is presented that addresses the latter situation. It was developed in response to consensus in the state of Indiana that additional monitoring wells are needed to effectively characterize water availability in aquifer systems throughout the state. The strategic methodology has two primary objectives that guide decision making for new installations: (1) purposive sampling of a diversity of environmental variables having relevance to groundwater recharge, and (2) spatial optimization by means of maximizing the geographic distances that separate monitoring wells. The design objectives are integrated into a discrete facility location model known as the p-median problem and solved to optimality using a mathematical programming package.
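
A minimal sketch of the classic p-median formulation referenced above, written with the PuLP modeling package. Candidate sites, demand points, and distances are toy values, and the thesis adapts the objective to reward geographic separation and coverage of recharge-related covariates rather than simple distance minimization.

```python
# Hypothetical p-median model: pick p sites so that total assignment distance is minimized.
import pulp

candidates = ["A", "B", "C", "D"]
points = ["p1", "p2", "p3", "p4", "p5"]
dist = {(i, j): (pi + cj) % 5 + 1                      # toy, deterministic distances
        for pi, i in enumerate(points) for cj, j in enumerate(candidates)}
p = 2

prob = pulp.LpProblem("p_median", pulp.LpMinimize)
open_site = pulp.LpVariable.dicts("open", candidates, cat="Binary")
assign = pulp.LpVariable.dicts("assign", (points, candidates), cat="Binary")

prob += pulp.lpSum(dist[i, j] * assign[i][j] for i in points for j in candidates)
prob += pulp.lpSum(open_site[j] for j in candidates) == p       # exactly p sites opened
for i in points:
    prob += pulp.lpSum(assign[i][j] for j in candidates) == 1   # each point served once
    for j in candidates:
        prob += assign[i][j] <= open_site[j]                    # only by an open site

prob.solve()
print([j for j in candidates if open_site[j].value() == 1])
```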
915

Population genetic analysis of the black blow fly Phormia regina (Meigen) (Diptera: Calliphoridae)

Whale, John W. January 2015 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The black blow fly, Phormia regina (Diptera: Calliphoridae), is a widely abundant fly autochthonous to North America. Like many other calliphorids, P. regina plays a key role in several disciplines, particularly in estimating post-mortem intervals (PMI). The aim of this work was to better understand the population genetic structure of this ecologically important species using microsatellites from populations collected in the U.S. during 2008 and 2013. Additionally, it sought to determine the effect of limited genetic diversity on a quantitative trait throughout immature development: larval length, a measurement used to estimate specimen age. Observed heterozygosity was lower than expected at five of the six loci and ranged from 0.529 to 0.880, compared to expected heterozygosity that ranged from 0.512 to 0.980; this is indicative of either inbreeding or the presence of null alleles. Kinship coefficients indicate that individuals within each sample are not strongly related to one another; values for the wild-caught populations ranged from 0.033 to 0.171, and a high proportion of the genetic variation (30%) was found among samples within regions. The population structure of this species does not correlate well with geography; populations differ from one another as a result of a lack of gene flow irrespective of geographic distance, suggesting that temporal distance plays a greater role in the genetic variation of P. regina. Among colonized samples, flies lost much of their genetic diversity (≥67% of alleles per locus were lost), and population samples became increasingly related; kinship coefficient values increased from 0.036 for the wild-caught individuals to 0.261 among the F10 specimens. Colonized larvae also became shorter in length following repeated inbreeding events: the longest recorded specimen in F1 measured 18.75 mm, while the longest larva measured in F11 was 1.5 mm shorter at 17.25 mm. This could have major implications in forensic entomology, as the largest specimen is often assumed to be the oldest on the corpse and is subsequently used to estimate a post-mortem interval. The reduction in length ultimately resulted in a greater proportion of individuals of similar length; the range of the data narrowed. Consequently, the major reduction in genetic diversity indicates that the narrowing of the larval length distribution may be under genetic influence or control. Therefore, these data highlight the importance, when undertaking either genetic or developmental studies, particularly of blow flies such as Phormia regina, of collecting specimens and populations not only from more than one geographic location but, more importantly, from more than one temporal event.
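
A minimal sketch of the observed-versus-expected heterozygosity comparison used above as an indicator of inbreeding or null alleles; the genotypes are invented two-allele tuples, not the P. regina microsatellite data.

```python
# Hypothetical heterozygosity check at a single microsatellite locus.
from collections import Counter

genotypes = [(1, 2), (1, 1), (2, 3), (3, 3), (1, 3), (2, 2), (1, 2), (2, 3)]  # invented diploid calls

# Observed heterozygosity: fraction of individuals carrying two different alleles
h_obs = sum(a != b for a, b in genotypes) / len(genotypes)

# Expected heterozygosity under Hardy-Weinberg: 1 minus the sum of squared allele frequencies
alleles = [a for pair in genotypes for a in pair]
counts = Counter(alleles)
n = len(alleles)
h_exp = 1 - sum((c / n) ** 2 for c in counts.values())

print(f"Ho = {h_obs:.3f}, He = {h_exp:.3f}")  # Ho well below He suggests inbreeding or null alleles
```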
916

Computational modeling for identification of low-frequency single nucleotide variants

Hao, Yangyang 16 November 2015 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Reliable detection of low-frequency single nucleotide variants (SNVs) carries great significance in many applications. In cancer genetics, the frequencies of somatic variants from tumor biopsies tend to be low due to contamination with normal tissue and tumor heterogeneity. Circulating tumor DNA monitoring also faces the challenge of detecting low-frequency variants due to the small percentage of tumor DNA in blood. Moreover, in population genetics, although pooled sequencing is cost-effective compared with individual sequencing, pooling dilutes the signals of variants from any individual. Detection of low-frequency variants is difficult and can be confounded by multiple sources of error, especially next-generation sequencing artifacts. Existing methods are limited in sensitivity and mainly focus on frequencies around 5%; most fail to consider differential, context-specific sequencing artifacts. To face this challenge, we developed a computational and experimental framework, RareVar, to reliably identify low-frequency SNVs from high-throughput sequencing data. For optimized performance, RareVar utilizes a supervised learning framework to model artifacts originating from different components of a specific sequencing pipeline. This is enabled by a customized, comprehensive benchmark dataset enriched with known low-frequency SNVs from the sequencing pipeline of interest. A genomic-context-specific sequencing error model was trained on the benchmark data to characterize systematic sequencing artifacts and to derive position-specific detection limits for sensitive low-frequency SNV detection. Further, a machine-learning algorithm utilizes sequencing quality features to refine SNV candidates for higher specificity. RareVar outperformed existing approaches, especially at 0.5% to 5% frequency. We further explored the influence of statistical modeling on position-specific error modeling and showed that the zero-inflated negative binomial was the best-performing distribution. When replicating analyses on an Illumina MiSeq benchmark dataset, our method seamlessly adapted to technologies with different biochemistries. RareVar enables sensitive detection of low-frequency SNVs across different sequencing platforms and will facilitate research and clinical applications such as pooled sequencing, cancer early detection, prognostic assessment, metastatic monitoring, and identification of relapse or acquired resistance.
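
A minimal sketch of the notion of a position-specific detection limit described above: given a background error rate for a genomic context, find the smallest alternate-allele count unlikely to arise from sequencing errors alone. The depth, error rates, and the simple binomial error model are assumptions for illustration; RareVar itself trains supervised, context-specific error models (including a zero-inflated negative binomial).

```python
# Hypothetical detection-limit calculation under a simple binomial error model.
from scipy.stats import binom

def detection_limit(depth: int, error_rate: float, alpha: float = 1e-6) -> int:
    """Smallest alt-read count k with P(X >= k) < alpha when all alt reads are errors."""
    for k in range(1, depth + 1):
        if binom.sf(k - 1, depth, error_rate) < alpha:
            return k
    return depth

depth = 10_000
for err in (1e-4, 5e-4, 2e-3):   # assumed context-specific background error rates
    k = detection_limit(depth, err)
    print(f"error rate {err:.0e}: need >= {k} alt reads ({k / depth:.2%} allele frequency)")
```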
917

Statistical Methods for Analysis of the Homeowner's Impact on Property Valuation and Its Relation to the Mortgage Portfolio

Hamell, Clara January 2020 (has links)
The current method for house valuations in mortgage portfolio models corresponds to applying a residential property price index (RPPI) to the purchase price (or last known valuation). This thesis introduces an alternative house valuation method, which combines the current one with the bank's customer data. This approach shows that the gap between the actual house value and the current estimated house value can to some extent be explained by customer attributes, especially for houses where the homeowner is a defaulted customer. The inclusion of customer attributes can either reduce false overestimation or predict whether the current valuation is an overestimation or an underestimation. This particular property is of interest in credit risk, as false overestimations can have negative impacts on the mortgage portfolio. The statistical methods used in this thesis were the data-mining techniques of regression and clustering. / The models and approaches currently used for house valuation in the mortgage portfolio are based on house price indexation and the purchase price. This study introduces an alternative way of estimating the house value by combining the current method with the bank's own customer data. This approach shows that the gap between the actual and the estimated house value can to some extent be explained by customer data, especially where the homeowner is a defaulted customer. The inclusion of customer data can both reduce the current overestimation and predict whether the current estimate is an overestimation or an underestimation. For defaulted customers, the alternative house valuation gave a more truthful estimate of the sale price than the traditional method. This property is of interest in credit risk, since a false overestimation can have negative consequences for the mortgage portfolio, especially for defaulted customers. The statistical tools used in this study were various regression methods and cluster analysis.
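
A minimal sketch of the two-step idea described above: roll the purchase price forward with a residential property price index, then regress the gap between the observed sale price and that baseline on customer attributes. The file and column names are hypothetical.

```python
# Hypothetical index-plus-customer-data valuation: RPPI baseline, gap explained by attributes.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("mortgage_portfolio.csv")   # hypothetical extract of sales and customer data

# Step 1: current method - purchase price rolled forward with the price index (RPPI)
df["baseline_value"] = df["purchase_price"] * df["rppi_now"] / df["rppi_at_purchase"]
df["gap"] = df["sale_price"] - df["baseline_value"]

# Step 2: explain the gap with customer attributes (e.g. default status, income, tenure)
fit = smf.ols("gap ~ defaulted + income + tenure_years", data=df).fit()
print(fit.summary())
```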
918

Myson Burch Thesis

Myson C Burch (16637289) 08 August 2023 (has links)
With the completion of the Human Genome Project and many additional efforts since, there is an abundance of genetic data that can be leveraged to revolutionize healthcare. Now, there are significant efforts to develop state-of-the-art techniques that reveal insights about connections between genetics and complex diseases such as diabetes, heart disease, or common psychiatric conditions that depend on multiple genes interacting with environmental factors. These methods help pave the way towards diagnosis, cure, and ultimately prediction and prevention of complex disorders. As a part of this effort, we address high-dimensional genomics-related questions through mathematical modeling, statistical methodologies, combinatorics, and scalable algorithms. More specifically, we develop innovative techniques at the intersection of technology and life sciences using biobank-scale data from genome-wide association studies (GWAS) and machine learning as an effort to better understand human health and disease.

The underlying principle behind GWAS is a test for association between genotyped variants for each individual and the trait of interest. GWAS have been extensively used to estimate the signed effects of trait-associated alleles, mapping genes to disorders, and over the past decade about 10,000 strong associations between genetic variants and one (or more) complex traits have been reported. One of the key challenges in GWAS is population stratification, which can lead to spurious genotype-trait associations. Our work proposes a simple clustering-based approach that corrects for stratification better than existing methods. This method takes into account the linkage disequilibrium (LD) while computing the distance between the individuals in a sample. Our approach, called CluStrat, performs Agglomerative Hierarchical Clustering (AHC) using a regularized Mahalanobis distance-based GRM, which captures the population-level covariance (LD) matrix for the available genotype data.

Linear mixed models (LMMs) have been a popular and powerful method when conducting GWAS in the presence of population structure. LMMs are computationally expensive relative to simpler techniques. We implement matrix sketching in LMMs (MaSk-LMM) to mitigate the more expensive computations. Matrix sketching is an approximation technique where random projections are applied to compress the original dataset into one that is significantly smaller and still preserves some of the properties of the original dataset up to some guaranteed approximation ratio. This technique naturally applies to problems in genetics, where we can treat large biobanks as a matrix with the rows representing samples and the columns representing SNPs. These matrices will be very large due to the large number of individuals and markers in biobanks and can benefit from matrix sketching. Our approach tackles the bottleneck of LMMs directly by using sketching on the samples of the genotype matrix as well as sketching on the markers during the computation of the relatedness or kinship matrix (GRM).

Predictive analytics have been used to improve healthcare by reinforcing decision-making, enhancing patient outcomes, and providing relief for the healthcare system. These methods help pave the way towards diagnosis, cure, and ultimately prediction and prevention of complex disorders. The prevalence of these complex diseases varies greatly around the world. Understanding the basis of this prevalence difference can help disentangle the interaction among different factors causing complex disorders and identify groups of people who may be at a greater risk of developing certain disorders. This could become the basis of the implementation of early intervention strategies for populations at higher risk, with significant benefits for public health.

This dissertation broadens our understanding of empirical population genetics. It proposes a data-driven perspective on a variety of problems in genetics, such as confounding factors in genetic structure. This dissertation highlights current computational barriers in open problems in genetics and provides robust, scalable, and efficient methods to ease the analysis of genotype data.
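
A minimal sketch of the matrix-sketching idea behind MaSk-LMM as described above: project the marker dimension of a standardized genotype matrix with a random matrix before forming the genetic relatedness matrix, so the kinship computation runs on a much smaller matrix. Dimensions and genotypes are toy values, far below biobank scale.

```python
# Hypothetical Gaussian sketch of a genotype matrix before computing the GRM.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 500, 5_000, 500                              # samples, SNPs, sketch dimension (toy)
G = rng.integers(0, 3, size=(n, m)).astype(float)      # toy 0/1/2 genotype matrix

# Standardize SNP columns, as is usual before forming a GRM
G = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)

# Random projection on the marker dimension: m columns compressed to k
S = rng.normal(size=(m, k)) / np.sqrt(k)
G_sketch = G @ S

grm_exact = G @ G.T / m
grm_approx = G_sketch @ G_sketch.T / m                 # uses only the sketched markers

err = np.linalg.norm(grm_exact - grm_approx) / np.linalg.norm(grm_exact)
print(f"relative Frobenius error of the sketched GRM: {err:.3f}")
```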
919

Vibration-Based Health Monitoring of Rotating Systems with Gyroscopic Effect

Gavrilovic, Nenad 01 March 2015 (has links) (PDF)
This thesis focuses on the simulation of the gyroscopic effect using the software MSC Adams. A simple shaft-disk system was created, and parameters of the system were changed in order to study the influence of the gyroscopic effect. It was shown that an increasing bearing stiffness reduces the precession motion. Furthermore, it was shown that the gyroscopic effect vanishes if the disk of the system is placed symmetrically on the shaft, which reduces the system to a Jeffcott rotor. The second objective of this study was to analyze different defects in a simple fixed-axis gear set. In particular, a cracked shaft, a cracked pinion, and a chipped pinion, as well as a healthy gear system, were created and tested in Adams. The contact force between the two gears was monitored, and the 2D and 3D frequency spectra, as well as the Wavelet Transform, were plotted in order to compare the individual defects. It was shown that the Wavelet Transform is a powerful tool, capable of identifying a cracked gear with a non-constant speed. The last part of this study included fault detection with statistical methods as well as with the Sideband Energy Ratio (SER). The time-domain signals of the individual faults were used to compare the mean, the standard deviation, and the root mean square. Furthermore, the noise profile in the frequency spectrum was tracked with statistical methods using the mean and the standard deviation. It was demonstrated that it is possible to identify a cracked gear, as well as a chipped gear, with statistical methods. However, a cracked shaft could not be identified. The results also show that SER was only capable of identifying major defects in a gear system, such as a chipped tooth.
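
A minimal sketch of the time-domain statistics and the Sideband Energy Ratio (SER) mentioned above, computed on a synthetic gear-mesh signal; the shaft speed, tooth count, modulation depth, and the number of sidebands included in the ratio are assumptions for illustration.

```python
# Hypothetical gear-fault indicators on a synthetic signal: mean, std, RMS, and SER.
import numpy as np

fs = 10_000                                  # sample rate (Hz)
f_shaft = 20.0                               # shaft rotation frequency (Hz)
f_mesh = f_shaft * 23                        # gear mesh frequency for an assumed 23-tooth pinion
t = np.arange(0, 2.0, 1 / fs)

# Mesh tone with shaft-rate amplitude modulation standing in for a tooth fault
signal = np.cos(2 * np.pi * f_mesh * t) * (1 + 0.3 * np.cos(2 * np.pi * f_shaft * t))

# Time-domain statistics used to compare healthy and faulty records
print("mean:", signal.mean(), "std:", signal.std(), "rms:", np.sqrt(np.mean(signal**2)))

# Sideband Energy Ratio: sideband amplitudes around the mesh frequency over the mesh amplitude
spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
freqs = np.fft.rfftfreq(len(signal), 1 / fs)
amp = lambda f: spectrum[np.argmin(np.abs(freqs - f))]
sidebands = [f_mesh + k * f_shaft for k in (-3, -2, -1, 1, 2, 3)]
ser = sum(amp(f) for f in sidebands) / amp(f_mesh)
print("SER:", ser)
```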
920

Expeditious Causal Inference for Big Observational Data

Yumin Zhang (13163253) 28 July 2022 (has links)
This dissertation addresses two significant challenges in the causal inference workflow for Big Observational Data. The first is designing Big Observational Data with high-dimensional and heterogeneous covariates. The second is performing uncertainty quantification for estimates of causal estimands that are obtained from the application of black-box machine learning algorithms on the designed Big Observational Data. The methodologies developed by addressing these challenges are applied to the design and analysis of Big Observational Data from a large public university in the United States.

Distributed Design: A fundamental issue in causal inference for Big Observational Data is confounding due to covariate imbalances between treatment groups. This can be addressed by designing the study prior to analysis. The design ensures that subjects in the different treatment groups that have comparable covariates are subclassified or matched together. Analyzing such a designed study helps to reduce biases arising from the confounding of covariates with treatment. Existing design methods, developed for traditional observational studies consisting of a single designer, can yield unsatisfactory designs with sub-optimum covariate balance for Big Observational Data due to their inability to accommodate the massive dimensionality, heterogeneity, and volume of the Big Data. We propose a new framework for the distributed design of Big Observational Data amongst collaborative designers. Our framework first assigns subsets of the high-dimensional and heterogeneous covariates to multiple designers. The designers then summarize their covariates into lower-dimensional quantities, share their summaries with the others, and design the study in parallel based on their assigned covariates and the summaries they receive. The final design is selected by comparing balance measures for all covariates across the candidate designs and identifying the best among them. We perform simulation studies and analyze datasets from the 2016 Atlantic Causal Inference Conference Data Challenge to demonstrate the flexibility and power of our framework for constructing designs with good covariate balance from Big Observational Data.

Designed Bootstrap: The combination of modern machine learning algorithms with the nonparametric bootstrap can enable effective predictions and inferences on Big Observational Data. An increasingly prominent and critical objective in such analyses is to draw causal inferences from the Big Observational Data. A fundamental step in addressing this objective is to design the observational study prior to the application of machine learning algorithms. However, the application of the traditional nonparametric bootstrap on Big Observational Data requires excessive computational effort. This is because every bootstrap sample would need to be re-designed under the traditional approach, which can be prohibitive in practice. We propose a design-based bootstrap for deriving causal inferences, with reduced bias, from the application of machine learning algorithms on Big Observational Data. Our bootstrap procedure operates by resampling from the original designed observational study. It eliminates the need for additional, costly design steps on each bootstrap sample that are performed under the standard nonparametric bootstrap. We demonstrate the computational efficiency of this procedure compared to the traditional nonparametric bootstrap, and its equivalence in terms of confidence interval coverage rates for the average treatment effects, by means of simulation studies and a real-life case study.

Case Study: We apply the distributed design and designed bootstrap methodologies in a case study involving institutional data from a large public university in the United States. The institutional data contain comprehensive information about the undergraduate students in the university, ranging from their academic records to on-campus activities. We study the causal effects of undergraduate students' attempted course load on their academic performance based on a selection of covariates from these data. Ultimately, our real-life case study demonstrates how our methodologies enable researchers to effectively use straightforward design procedures to obtain valid causal inferences, with reduced computational effort, from the application of machine learning algorithms on Big Observational Data.
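
A minimal sketch of the designed-bootstrap idea above: design (match) the study once, then bootstrap by resampling the matched pairs instead of re-designing every resample. The matched-pair data here are simulated placeholders.

```python
# Hypothetical designed bootstrap: resample matched pairs, recompute the ATE each time.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pairs = pd.DataFrame({                     # one row per matched treated/control pair (simulated)
    "y_treated": rng.normal(1.0, 1.0, 300),
    "y_control": rng.normal(0.7, 1.0, 300),
})

def ate(d: pd.DataFrame) -> float:
    """Average treatment effect over matched pairs."""
    return (d["y_treated"] - d["y_control"]).mean()

boot = np.array([
    ate(pairs.sample(frac=1.0, replace=True, random_state=b)) for b in range(2000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"ATE = {ate(pairs):.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```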
