21 |
Computers in production scheduling and loading - an evaluation. Tokar, Daniel, January 1965 (has links)
Thesis (M.B.A.)--Boston University / PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you.
|
22 |
Topics in functional data analysis with biological applications. Li, Yehua, 02 June 2009 (has links)
Functional data analysis (FDA) is an active field of statistics, in which the primary subjects
in the study are curves. My dissertation consists of two innovative applications of
functional data analysis in biology. The data that motivated the research broadened the
scope of FDA and demanded new methodology. I develop new nonparametric methods for the required estimation problems, focusing on large sample theory for the proposed
estimators.
The first project is motivated by a colon carcinogenesis study, the goal of which is to
study the function of a protein (p27) in colon cancer development. In this study, a number
of colonic crypts (units) were sampled from each rat (subject) at random locations along
the colon, and then repeated measurements on the protein expression level were made on
each cell (subunit) within the selected crypts. In this problem, measurements within each
crypt can be viewed as a function, since the measurements can be indexed by the cell
locations. The functions from the same subject are spatially correlated along the colon,
and my goal is to estimate this correlation function using nonparametric methods. We use
this data set as motivation and propose a kernel estimator of the correlation function
in a more general framework. We develop a pointwise asymptotic normal distribution
for the proposed estimator when the number of subjects is fixed and the number of units within each subject goes to infinity. Based on the asymptotic theory, we propose a weighted
block bootstrapping method for making inferences about the correlation function, where the
weights account for the inhomogeneity of the distribution of the unit locations. Simulation
studies are also provided to illustrate the numerical performance of the proposed method.
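To make the kernel idea concrete, below is a minimal Python sketch of a kernel-smoothed estimate of a spatial correlation function. It is a simplification, not the dissertation's estimator: each crypt is assumed to have already been reduced to a single centered summary value, the Gaussian kernel and the crude variance estimate are illustrative choices, and the weighted block bootstrap and asymptotic theory described above are not reproduced.

```python
import numpy as np

def kernel_correlation(locations, values, grid, bandwidth):
    """Nadaraya-Watson-type estimate of a spatial correlation function.

    locations : (n,) positions of the units (e.g., crypts) along the colon
    values    : (n,) centered summaries of each unit's function
                (e.g., mean p27 expression per crypt, subject mean removed)
    grid      : distances d at which to evaluate the correlation
    bandwidth : kernel bandwidth h
    """
    grid = np.asarray(grid, dtype=float)
    n = len(locations)
    # all pairwise distances and cross-products of centered values
    i, j = np.triu_indices(n, k=1)
    dists = np.abs(locations[i] - locations[j])
    prods = values[i] * values[j]
    var = np.mean(values ** 2)            # crude variance estimate

    corr = np.empty_like(grid)
    for m, d in enumerate(grid):
        w = np.exp(-0.5 * ((dists - d) / bandwidth) ** 2)   # Gaussian kernel
        corr[m] = np.sum(w * prods) / (np.sum(w) * var)
    return corr
```

The same smoothing idea extends to whole curves by replacing the scalar cross-products with cross-products of function values at matched cell positions, which is closer to the setting treated in the dissertation.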
My second project concerns lipoprotein profile data, where the goal is to use lipoprotein
profile curves to predict the cholesterol level in human blood. Again, motivated by the data,
we consider a more general problem: the functional linear model (Ramsay and Silverman, 1997) with a functional predictor and a scalar response. There is a literature developing different methods for this model; however, there is little theory to support those methods. We therefore focus on the theoretical properties of this model. Other contemporary theoretical work addresses methods based on principal component regression. Our work differs in that we base our method on a roughness penalty approach and consider the more realistic scenario in which the functional predictor is observed only at discrete points. To reduce the difficulty of the theoretical derivations, we restrict the functions to satisfy a periodic boundary condition and develop an asymptotic convergence rate for this problem in Chapter III. A more general result based on splines is a future research topic that I discuss in Chapter IV.
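As a hedged illustration of the roughness penalty approach with a functional predictor observed at discrete points, the Python sketch below discretizes the functional linear model y_i = ∫ x_i(t) β(t) dt + ε_i and penalizes the curvature of β with a second-difference matrix. The closed-form ridge-type solve is a standard device, not the dissertation's exact estimator, which imposes a periodic boundary condition and comes with convergence-rate theory.

```python
import numpy as np

def penalized_flm(X, y, tgrid, lam):
    """Roughness-penalized functional linear regression, discretized.

    X     : (n, p) functional predictor observed on the p grid points tgrid
    y     : (n,) scalar responses
    lam   : roughness penalty weight
    Returns the coefficient function beta evaluated on tgrid.
    """
    n, p = X.shape
    dt = tgrid[1] - tgrid[0]             # assumes an equally spaced grid
    Z = X * dt                           # Riemann approximation of the integral
    # second-difference matrix approximating the curvature of beta
    D = np.diff(np.eye(p), n=2, axis=0) / dt ** 2
    beta = np.linalg.solve(Z.T @ Z + lam * D.T @ D, Z.T @ y)
    return beta
```

Larger values of lam shrink the estimate toward a straight line; the dissertation's theory describes how lam should scale with the sample size and grid density.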
|
23 |
The Implications and Flow Behavior of the Hydraulically Fractured Wells in Shale Gas Formation. Almarzooq, Anas Mohammadali S., December 2010 (has links)
Shale gas formations are known to have low permeability, which can be as low as 100 nanodarcies. Without stimulation, wells drilled in shale gas formations are hard to produce at an economic rate. One stimulation approach is to drill horizontal wells and hydraulically fracture the formation. Once the formation is fractured, different flow patterns occur. The dominant flow regime observed in shale gas formations is linear flow, the transient drainage from the formation matrix toward the hydraulic fracture. This flow can extend over years of production and can be identified by a half slope on the log-log plot of gas rate against time. It can be used to evaluate the hydraulic fracture surface area and, ultimately, the effectiveness of the completion job. Different models from the literature can be used to evaluate the completion job. One of the models used in this work assumes a rectangular reservoir with a slab-shaped matrix between each pair of hydraulic fractures. This model has at least five flow regions; the two discussed here are Region 2, in which bilinear flow occurs as a result of simultaneous drainage from the matrix and the hydraulic fracture, and Region 4, which results from transient matrix drainage and can extend for many years. Barnett shale production data are used throughout this work to show sample calculations.
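As a simple illustration of the flow-regime diagnostic described above, the sketch below (Python, with hypothetical variable names and synthetic decline data) fits the slope of log rate against log time; a value near -1/2 is the half-slope signature of linear flow. The subsequent fracture surface area calculation uses the slope of a specialized linear-flow plot together with formation properties and is not reproduced here.

```python
import numpy as np

def loglog_slope(time, rate):
    """Slope of log(gas rate) versus log(time); a value close to -1/2
    indicates the linear (transient) flow regime described above."""
    slope, _intercept = np.polyfit(np.log(time), np.log(rate), 1)
    return slope

# Hypothetical decline data: q ~ t^(-1/2) with mild noise.
t = np.linspace(30, 3000, 100)                       # days on production
q = 5000 * t ** -0.5 * np.exp(np.random.default_rng(1).normal(0, 0.05, 100))
print(loglog_slope(t, q))                            # approximately -0.5
```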
The first part of this work will evaluate the field data used in this study following a systematic procedure explained in Chapter III. This part reviews the historical production, reservoir and fluid data, and well completion records available for the wells being analyzed. It will also check for correlations in the available data and explain abnormal flow behaviors that might occur in the field production data, including why some wells might not fit a given model. This will be followed by a preliminary diagnosis, in which flow regimes will be identified, unclear data will be filtered, and interference and liquid loading data will be flagged. After completing the data evaluation, this work will evaluate and compare the different methods available in the literature in order to decide which method is best suited to analyzing production data from the Barnett shale. Formation properties and the original gas in place will be evaluated and compared across the different methods.
|
24 |
Temporal and spatial host-pathogen models with diverse types of transmission. Turner, Joanne, January 2000 (has links)
No description available.
|
25 |
Hedge Funds and Survival Analysis. Nhogue Wabo, Blanche Nadege, 24 October 2013 (has links)
Using data from Hedge Fund Research, Inc. (HFR), this study adapts and expands on existing methods in survival analysis in an attempt to investigate whether hedge fund mortality can be predicted on the basis of certain hedge fund characteristics. The main idea is to determine the characteristics that contribute the most to the survival and failure probabilities of hedge funds and to interpret them. We establish hazard models with time-independent covariates, as well as time-varying covariates, to interpret the selected hedge fund characteristics. Our results show that size, age, performance, strategy, annual audit, offshore status and fund denomination are the characteristics that best explain hedge fund failure. We find that a 1% increase in performance decreases the hazard by 3.3%, that small funds and funds less than 5 years old are the most likely to die, and that the event-driven strategy performs best compared to the others. The risk of death is 0.668 times lower for funds that indicated an annual audit is performed than for funds that did not. The risk of death for offshore hedge funds is 1.059 times higher than for non-offshore hedge funds.
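A minimal sketch of a hazard model with time-independent covariates of the kind described above, using the open-source lifelines package (not necessarily the software used in the study) on synthetic data. The column names and simulated effect sizes are illustrative assumptions chosen only to roughly mimic the reported magnitudes; they are not the HFR data or the study's estimates, and the time-varying covariate models are not shown.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic stand-in for the fund sample; column names are illustrative only.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "performance": rng.normal(0.5, 1.0, n),     # average monthly return, %
    "log_size": rng.normal(17.0, 1.5, n),       # log assets under management
    "annual_audit": rng.integers(0, 2, n),      # 1 if an annual audit is reported
    "offshore": rng.integers(0, 2, n),          # 1 if domiciled offshore
})
# Simulate exponential lifetimes whose rate depends on the covariates.
lam = np.exp(-3.0 - 0.033 * df["performance"] - 0.4 * df["annual_audit"]
             + 0.06 * df["offshore"] - 0.1 * (df["log_size"] - 17))
t = rng.exponential(1.0 / lam)
censor = rng.exponential(60, n)                 # administrative censoring
df["duration"] = np.minimum(t, censor)
df["failed"] = (t <= censor).astype(int)

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="failed")
cph.print_summary()
```

In the fitted summary, the exp(coef) column holds the hazard ratios, which is the form in which the effects above are reported (for example, 0.668 for an annual audit and 1.059 for offshore domicile).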
|
26 |
The measurement and reduction of quality related costs in the process plant industry. Rooney, E. M., January 1987 (has links)
No description available.
|
28 |
Probabilistic methods for radio interferometry data analysis. Natarajan, Iniyan, January 2017 (has links)
Probability theory provides a uniquely valid set of rules for plausible reasoning. This enables us to apply the mathematical formalism of probability, also known as the Bayesian approach, with greater flexibility to problems of scientific inference. In this thesis, we are concerned with applying this method to the analysis of visibility data from radio interferometers. Any radio interferometry observation can be described using the Radio Interferometry Measurement Equation (RIME). Throughout the thesis, we use the RIME to model the visibilities in performing the probabilistic analysis.

We first develop the theory for employing the RIME in performing Bayesian analysis of interferometric data. We then apply this to the problem of super-resolution with radio interferometers by successfully performing model selection between different source structures, all smaller in scale than the size of the point spread function (PSF) of the interferometer, on Westerbork Synthesis Radio Telescope (WSRT) simulations at a frequency of 1.4 GHz. Using simulations, we also quantify how the scale of the sources that can be resolved by WSRT at this frequency changes with the signal-to-noise ratio (SNR) of the data. Following this, we apply the method to a 5 GHz European VLBI Network (EVN) observation of the flaring blazar CGRaBS J0809+5341 to ascertain the presence of a jet emanating from its core, taking into account the imperfections in the station gain calibration performed on the data, especially on the longest baselines, prior to our analysis. We find that the extended source model is preferred over the point source model with an odds ratio of 109 : 1. Using the flux-density and shape parameter estimates of this model, we also derive the brightness temperature of the blazar (10¹¹-10¹² K), which confirms the presence of a relativistically boosted jet with an intrinsic brightness temperature lower than the apparent brightness temperature, consistent with the literature. We also develop a Bayesian criterion for super-resolution in the presence of baseline-dependent noise and calibration errors and find that these errors play an important role in determining how close one can get to the theoretical super-resolution limit.

We then proceed to include fringe-fitting, the process of solving for the time- and frequency-dependent phase variations introduced by the interstellar medium and the Earth's atmosphere, in our probabilistic approach. Fringe-fitting is one of the first corrections made to Very Long Baseline Interferometry (VLBI) observations, and, by extending our method to include simultaneous fringe-fitting and source structure estimation, we will be able to perform end-to-end VLBI analysis using our method. To this end, we estimate source amplitude and fringe-fitting phase terms (phase offsets and delays) on 43 GHz Very Long Baseline Array and 230 GHz Event Horizon Telescope (EHT) simulations of point sources. We then perform model selection on a 5 μas extended Gaussian source (one-fourth the size of the PSF) on a synthetic 230 GHz EHT observation. Finally, we incorporate turbulent time-varying phase offsets and delays in our model selection and show that the delays can be estimated to within 10-16 per cent error (often better than contemporary software packages) while simultaneously estimating the extended source structure.
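As a rough illustration of the modelling ingredients described above, the sketch below (Python, with illustrative function names) writes down a point-source RIME with station-based gain amplitudes, phase offsets and delays, the terms a fringe fit solves for (the rate term and extended source structure are omitted), together with the Gaussian log-likelihood that a Bayesian evidence or odds-ratio computation would integrate over. It is a schematic sketch, not the thesis's implementation.

```python
import numpy as np

def model_vis(flux, amps, phases, delays, freqs, ref_freq, baselines):
    """Point-source model visibilities under station-based gains.

    Each station p has gain g_p(nu) = a_p * exp(i*(phi_p +
    2*pi*tau_p*(nu - ref_freq))): a constant phase offset plus a delay
    term, i.e. the quantities a fringe fit solves for (rates omitted).
    """
    gains = amps[:, None] * np.exp(
        1j * (phases[:, None]
              + 2 * np.pi * delays[:, None] * (freqs[None, :] - ref_freq)))
    p, q = baselines[:, 0], baselines[:, 1]
    # RIME for a point source: V_pq = g_p * S * conj(g_q)
    return flux * gains[p] * np.conj(gains[q])      # (n_baselines, n_chan)

def log_likelihood(data, model, sigma):
    """Gaussian log-likelihood of complex visibilities; integrating this
    over the model parameters gives the evidence used in odds ratios."""
    resid = data - model
    return -0.5 * np.sum((resid.real ** 2 + resid.imag ** 2) / sigma ** 2)

# Tiny example: 3 stations, baselines (0,1), (0,2), (1,2), 8 channels at 5 GHz.
freqs = np.linspace(4.9e9, 5.1e9, 8)
baselines = np.array([[0, 1], [0, 2], [1, 2]])
vis = model_vis(1.2, np.ones(3), np.zeros(3),
                np.array([0.0, 1e-9, -2e-9]), freqs, 5.0e9, baselines)
print(vis.shape)    # (3, 8)
```

For an extended source, the scalar flux term would be replaced by the Fourier transform of the sky model evaluated at each baseline's (u, v) coordinates.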
|
29 |
Graph Neural Networks for Improved Interpretability and Efficiency. Pho, Patrick, 01 January 2022 (has links) (PDF)
Attributed graphs are a powerful tool for modeling real-life systems in many domains, such as social science, biology and e-commerce. The behaviors of those systems are mostly defined by, or dependent on, their corresponding network structures. Graph analysis has become an important line of research due to the rapid integration of such systems into every aspect of human life and the profound impact they have on human behaviors. Graph-structured data contains a rich amount of information from the network connectivity and the supplementary input features of nodes. Machine learning algorithms and traditional network science tools are limited in their capability to make use of both network topology and node features. Graph Neural Networks (GNNs) provide an efficient framework combining both sources of information to produce accurate predictions for a wide range of tasks, including node classification and link prediction. The exponential growth of graph datasets drives the development of complex GNN models, causing concerns about processing time and interpretability of the results. Another issue arises from the cost and difficulty of collecting a large amount of annotated data for training deep learning GNN models. Apart from the sampling issue, the existence of anomalous entities in the data might degrade the quality of the fitted models.

In this dissertation, we propose novel techniques and strategies to overcome the above challenges. First, we present a flexible regularization scheme applied to the Simple Graph Convolution (SGC). The proposed framework inherits the fast and efficient properties of SGC while rendering a sparse set of fitted parameter vectors, facilitating the identification of important input features. Next, we examine efficient procedures for collecting training samples and develop indicative measures as well as quantitative guidelines to assist practitioners in choosing the optimal sampling strategy to obtain data. We then improve upon an existing GNN model for the anomaly detection task; our proposed framework achieves better accuracy and reliability. Lastly, we experiment with adapting the flexible regularization mechanism to the link prediction task.
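For illustration, here is a minimal sketch of Simple Graph Convolution preprocessing followed by an L1-penalized classifier. The L1 penalty is a stand-in for the flexible regularization scheme proposed in the dissertation, and the toy graph, feature values and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sgc_features(adj, X, k=2):
    """Simple Graph Convolution preprocessing: propagate node features
    k hops over the symmetrically normalized adjacency (with self-loops)."""
    A = adj + np.eye(adj.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):
        X = S @ X
    return X

# Toy data: 4-node path graph with 3 features per node, 2 classes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
y = np.array([0, 0, 1, 1])

H = sgc_features(adj, X, k=2)
# The L1 penalty zeroes out unimportant feature weights, which is the kind of
# sparsity that makes the influential input features easy to read off.
clf = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(H, y)
print(clf.coef_)        # sparse weights highlight the influential features
```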
|
30 |
Change Point Detection for Streaming Data Using Support Vector Methods. Harrison, Charles, 01 January 2022 (links) (PDF)
Sequential multiple change point detection concerns the identification of multiple points in time where the systematic behavior of a statistical process changes. A special case of this problem, called online anomaly detection, occurs when the goal is to detect the first change and then signal an alert to an analyst for further investigation. This dissertation concerns the use of methods based on kernel functions and support vectors to detect changes. A variety of support vector-based methods are considered, but the primary focus is Least Squares Support Vector Data Description (LS-SVDD). LS-SVDD constructs a hypersphere in a kernel space to bound a set of multivariate vectors using a closed-form solution. The mathematical tractability of LS-SVDD facilitates closed-form updates for the LS-SVDD Lagrange multipliers. The update formulae handle adding or removing a block of observations from an existing LS-SVDD description, so an LS-SVDD can be constructed or updated sequentially, which makes it attractive for online problems with sequential data streams. LS-SVDD is applied to a variety of scenarios, including online anomaly detection and sequential multiple change point detection.
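A minimal sketch of one least-squares formulation of SVDD, in which the Lagrange multipliers and the offset are obtained from a single linear system (the closed form alluded to above). The exact objective, the block update formulae, and the sequential change point logic used in the dissertation may differ from this illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_ls_svdd(X, C=1.0, gamma=0.5):
    """Least-squares SVDD: alpha and the offset b solve one linear system."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = 2 * K + np.eye(n) / (2 * C)   # stationarity conditions
    A[:n, n] = 1.0
    A[n, :n] = 1.0                            # multipliers sum to one
    rhs = np.append(np.diag(K), 1.0)
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n]                    # alpha, b

def score(alpha, b, Xtrain, gamma, Z):
    """Distance-based score: positive values fall outside the fitted
    hypersphere and are flagged as candidate anomalies/changes."""
    Kzz = np.ones(Z.shape[0])                 # k(z, z) = 1 for the RBF kernel
    Kzx = rbf_kernel(Z, Xtrain, gamma)
    return Kzz - 2 * Kzx @ alpha - b

# Tiny usage: fit on 2-D training data, score an inlier and an outlier.
rng = np.random.default_rng(0)
Xtr = rng.normal(size=(50, 2))
alpha, b = fit_ls_svdd(Xtr, C=1.0, gamma=0.5)
print(score(alpha, b, Xtr, 0.5, np.array([[0.0, 0.0], [5.0, 5.0]])))
```

In a streaming setting, the same linear system would be updated as blocks of observations enter or leave the window rather than re-solved from scratch, which is the sequential property the abstract emphasizes.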
|