1 |
Using the bootstrap to analyze variable stars data (Dunlap, Mickey Paul, 17 February 2005)
Often in statistics it is of interest to investigate whether or not a trend is significant. Methods for testing such a trend depend on assumptions about the error terms, such as whether their distribution is known and whether they are independent. Likelihood ratio tests may be used if the distribution is known, but in some instances one may not want to make such assumptions. In a time series, these errors will not always be independent; in this case, the error terms are often modelled by an autoregressive or moving average process. There are resampling techniques for testing the hypothesis of interest when the error terms are dependent, such as model-based bootstrapping and the wild bootstrap, but the error terms need to be whitened. In this dissertation, a bootstrap procedure is used to test the hypothesis of no trend for variable stars when the error structure assumes a particular form. In some cases, the bootstrap implemented here is preferred over large-sample tests in terms of the level of the test. The bootstrap procedure is able to correctly identify the underlying distribution, which need not be χ².
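The flavor of such a procedure can be sketched in a few lines. The snippet below (Python; the AR(1) error model, the helper names fit_trend_ar1 and bootstrap_trend_test, and the synthetic magnitudes are all illustrative assumptions, not the dissertation's actual method) whitens the residuals, resamples the innovations, and rebuilds null series to approximate the distribution of the trend slope:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_trend_ar1(y):
    """OLS slope of a linear trend plus an AR(1) fit to the residuals."""
    n = len(y)
    t = np.arange(n, dtype=float)
    X = np.column_stack([np.ones(n), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    phi = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)  # AR(1) coefficient
    innov = resid[1:] - phi * resid[:-1]                            # whitened innovations
    return beta[1], phi, innov

def bootstrap_trend_test(y, n_boot=2000):
    """Model-based bootstrap p-value for H0: no linear trend, AR(1) errors."""
    slope, phi, innov = fit_trend_ar1(y)
    innov = innov - innov.mean()
    n = len(y)
    exceed = 0
    for _ in range(n_boot):
        e = rng.choice(innov, size=n, replace=True)
        y_star = np.empty(n)
        y_star[0] = e[0]
        for k in range(1, n):          # rebuild AR(1) errors: no trend under H0
            y_star[k] = phi * y_star[k - 1] + e[k]
        slope_star, _, _ = fit_trend_ar1(y_star)
        if abs(slope_star) >= abs(slope):
            exceed += 1
    return slope, exceed / n_boot

# illustrative use: synthetic "variable star" magnitudes with AR(1) noise
n = 200
e = np.empty(n)
e[0] = rng.normal()
for k in range(1, n):
    e[k] = 0.5 * e[k - 1] + rng.normal()
mag = 10.0 + 0.002 * np.arange(n) + e
print(bootstrap_trend_test(mag))
```

Resampling the whitened innovations, rather than the raw residuals, is what lets the bootstrap respect the serial dependence of the errors.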
|
2 |
Central Limit Theorems for Empirical Processes Based on Stochastic Processes (Yang, Yuping, 16 December 2013)
In this thesis, we study time-dependent empirical processes, which extend the classical empirical processes to have a time parameter; for example, the empirical process for a sequence of independent stochastic processes {Y_i : i ∈ N}:

(1) ν_n(t, y) = n^{-1/2} Σ_{i=1}^{n} [ 1{Y_i(t) ≤ y} − P(Y_i(t) ≤ y) ],  t ∈ E, y ∈ R.

In the case of independent identically distributed samples (that is, {Y_i(t) : i ∈ N} are iid), Kuelbs et al. (2013) proved a central limit theorem for ν_n(t, y) for a large class of stochastic processes.
In Chapter 3, we give a sufficient condition for the weak convergence of the weighted empirical process for iid samples from a uniform process:

(2) α_n(t, y) := n^{-1/2} Σ_{i=1}^{n} w(y) [ 1{X_i(t) ≤ y} − y ],  t ∈ E, y ∈ [0, 1],

where {X(t), X_1(t), X_2(t), ...} are independent and identically distributed uniform processes (for each t ∈ E, X(t) is uniform on (0, 1)) and w(y) is a "weight" function satisfying some regularity properties. We then give an example with X(t) := F_t(B_t), t ∈ E = [1, 2], where B_t is a Brownian motion and F_t is the distribution function of B_t.
In Chapter 4, we investigate the weak convergence of the empirical processes for non-iid samples. We consider the weak convergence of the empirical process:

(3) β_n(t, y) := n^{-1/2} Σ_{i=1}^{n} [ 1{Y_i(t) ≤ y} − F_i(t, y) ],  t ∈ E ⊂ R, y ∈ R,

where {Y_i(t) : i ∈ N} are independent processes and F_i(t, y) is the distribution function of Y_i(t). We also prove that the covariance function of the empirical process for non-iid samples indexed by a uniformly bounded class of functions necessarily converges uniformly to the covariance function of the limiting Gaussian process for a CLT.
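As a purely numerical illustration of (1), here is a sketch under assumed iid Brownian-motion samples, for which P(Y_i(t) ≤ y) is known in closed form; the grid sizes and names are invented:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# n independent Brownian motions observed on a time grid E (rows are paths)
n, m = 500, 50
E = np.linspace(0.1, 2.0, m)                  # avoid t = 0, where B_t is degenerate
dt = np.diff(np.concatenate([[0.0], E]))
B = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, m)), axis=1)

def empirical_process(paths, t_idx, y_grid, cdf):
    """nu_n(t, y) = sqrt(n) * (empirical CDF - true CDF) at one fixed time point."""
    x = paths[:, t_idx]
    ecdf = (x[:, None] <= y_grid[None, :]).mean(axis=0)
    return np.sqrt(len(x)) * (ecdf - cdf(y_grid))

y_grid = np.linspace(-3.0, 3.0, 61)
t_idx = 25
# B_t ~ N(0, t), so P(Y_i(t) <= y) is known exactly at time E[t_idx]
nu = empirical_process(B, t_idx, y_grid, lambda y: norm.cdf(y / np.sqrt(E[t_idx])))
print(nu[:5])
```

For each fixed t, ν_n(t, ·) is just the classical empirical process of the cross-sectional sample {Y_i(t)}; the time parameter indexes a whole family of such processes.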
|
3 |
Visualisation de données temporelles personnelles / Visualization of personal time-dependent data (Wambecke, Jérémy, 22 October 2018)
The production of energy, and in particular the production of electricity, is the main source of greenhouse gas emissions at the world scale. As the residential sector is the most energy-consuming, it is essential to act at a personal scale to reduce these emissions. Thanks to the development of ubiquitous computing, it is now easy to collect data about the electricity consumption of the electrical appliances of a household. This possibility has allowed the development of eco-feedback technologies, whose objective is to provide consumers with feedback about their consumption with the aim of reducing it. In this thesis we propose a personal visualization method for time-dependent data based on a what-if interaction, meaning that users can virtually apply modifications to their behavior. In particular, our method makes it possible to simulate a modification of the usage of the electrical appliances of a household and then to evaluate visually the impact of the modifications on the data. This approach has been implemented in the Activelec system, which we have evaluated with users on real data.

We synthesize the design elements essential to eco-feedback systems in a state of the art. We also outline the limitations of these technologies, the main one being the difficulty users face in finding relevant behavior modifications that reduce their energy consumption. We then present three contributions. The first contribution is the design of a what-if approach applied to eco-feedback, as well as its implementation in the Activelec system. The second contribution is the evaluation of our approach in two laboratory studies, in which we assess whether participants using our method find modifications that save energy and require a sufficiently low effort to be applied in reality. The third contribution is the in-situ evaluation of the Activelec system: Activelec was deployed in three private households and used for about one month, allowing us to evaluate our approach in a real domestic context. In these three studies, participants found modifications of appliance usage that would save a significant amount of energy while being judged easy to apply in reality.

We also discuss the application of our what-if approach beyond electricity consumption data, to the domain of personal visualization, which is defined as the visual analysis of personal data. We present several potential applications to other kinds of personal time-dependent data, for example concerning physical activity or transportation. This thesis opens new perspectives for using a what-if interaction paradigm for personal visualization.
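The what-if mechanism can be pictured with a toy computation: virtually move an appliance's usage in time and compare the outcome with the baseline. The sketch below is a bare illustration of the idea, not Activelec's implementation; the consumption profile and the time-of-use tariff are invented:

```python
import numpy as np

hours = np.arange(24)
# hypothetical hourly consumption (kWh) of one appliance: an evening run
baseline = np.zeros(24)
baseline[18:21] = 1.5

def what_if_shift(profile, shift_hours):
    """Virtually move the appliance's usage later (or earlier) in the day."""
    return np.roll(profile, shift_hours)

# hypothetical time-of-use tariff (price per kWh): cheaper at night
tariff = np.where((hours >= 22) | (hours < 7), 0.10, 0.25)

scenario = what_if_shift(baseline, 4)       # user virtually runs the appliance at 22h
print("baseline cost:", baseline @ tariff)  # 3 kWh at the peak price
print("what-if cost: ", scenario @ tariff)  # same energy, cheaper hours
```

Activelec evaluates such modifications visually on recorded data rather than against a tariff, but the underlying simulate-then-compare loop is the same idea.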
|
4 |
Multi-criteria ranking of corporate distress prediction models: empirical evaluation and methodological contributions (Mousavi, Mohammad M.; Quenniche, J., 19 March 2018)
Although many modelling and prediction frameworks for corporate bankruptcy and distress have been proposed, the relative performance evaluation of prediction models is criticised because the assessment exercise uses a single measure of one criterion at a time, which leads to reporting conflicting results. Mousavi et al. (Int Rev Financ Anal 42:64–75, 2015) proposed an orientation-free super-efficiency DEA-based framework to overcome this methodological issue. However, within a super-efficiency DEA framework, the reference benchmark changes from one prediction-model evaluation to another, which in some contexts might be viewed as "unfair" benchmarking. In this paper, we overcome this issue by proposing a slacks-based context-dependent DEA (SBM-CDEA) framework to evaluate competing distress prediction models. In addition, we propose a hybrid cross-benchmarking cross-efficiency framework as an alternative methodology for ranking DMUs that are heterogeneous. Furthermore, using data on UK firms listed on the London Stock Exchange, we perform a comprehensive comparative analysis of the most popular corporate distress prediction models, namely statistical models, under both mono-criterion and multiple-criteria frameworks, considering several performance measures. We also propose new statistical models using macroeconomic indicators as drivers of distress.
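To make the benchmarking issue concrete, here is a minimal sketch of super-efficiency DEA in Python with scipy. It uses the simple radial, input-oriented CCR variant rather than the slacks-based, context-dependent (SBM-CDEA) model proposed in the paper, and the data are invented stand-ins for performance measures of prediction models. Because each evaluated unit is excluded from its own reference set, the benchmark changes from one evaluation to the next, which is exactly the behaviour discussed above:

```python
import numpy as np
from scipy.optimize import linprog

def super_efficiency_ccr(X, Y, j0):
    """Input-oriented CCR super-efficiency score of DMU j0.

    X: (m, n) inputs, Y: (s, n) outputs, columns are DMUs. The evaluated
    DMU is excluded from its own reference set, so scores may exceed 1.
    """
    m, n = X.shape
    s = Y.shape[0]
    keep = [j for j in range(n) if j != j0]
    c = np.zeros(1 + len(keep))
    c[0] = 1.0                                # minimise theta
    A_ub = np.zeros((m + s, 1 + len(keep)))
    b_ub = np.zeros(m + s)
    A_ub[:m, 0] = -X[:, j0]                   # sum_j lambda_j x_j <= theta * x_j0
    A_ub[:m, 1:] = X[:, keep]
    A_ub[m:, 1:] = -Y[:, keep]                # sum_j lambda_j y_j >= y_j0
    b_ub[m:] = -Y[:, j0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (1 + len(keep)))
    return res.x[0] if res.success else np.nan

# invented example: two "inputs" and one "output" for five prediction models
X = np.array([[2.0, 3.0, 4.0, 5.0, 6.0],
              [3.0, 2.0, 5.0, 4.0, 2.0]])
Y = np.ones((1, 5))
print([round(super_efficiency_ccr(X, Y, j), 3) for j in range(5)])
```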
|
5 |
Crash Risk Analysis of Coordinated Signalized Intersections (Qiming Guo, 08 December 2023)
The emergence of time-dependent data provides researchers with unparalleled opportunities to investigate safety performance on roadway infrastructure at a disaggregated level. A disaggregated crash risk analysis uses both time-dependent data (e.g., hourly traffic, speed, weather conditions, and signal controls) and fixed data (e.g., geometry) to estimate hourly crash probability. Despite abundant research on crash risk analysis, coordinated signalized intersections require further investigation, owing both to the complexity of the safety problem and to the relatively small number of past studies that investigated their risk factors. This dissertation aimed to develop robust crash risk prediction models to better understand the risk factors of coordinated signalized intersections and to identify practical safety countermeasures. Crashes were first categorized into three types (same-direction, opposite-direction, and right-angle) within several crash-generating scenarios. The data were organized in hourly observations and included the following factors: road geometric features, traffic movement volumes, speeds, weather precipitation and temperature, and signal control settings. Assembling hourly observations for modeling crash risk was achieved by synchronizing and linking data sources organized at different time resolutions. Three non-crash sampling strategies were applied to three statistical models (conditional logit, Firth logit, and mixed logit) and two machine learning models (random forest and penalized support vector machine). Important risk factors were identified, such as the presence of light rain, traffic volume, speed variability, and the downstream vehicle arrival pattern. The Firth logit model was selected for implementation in signal coordination practice, as it proved the most robust based on its out-of-sample prediction performance and its inclusion of important risk factors. Examples applying the recommended crash risk model to building daily risk profiles and to estimating the safety benefits of improved coordination plans demonstrated its practicality and usefulness for improving safety at coordinated signals by practicing engineers.
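As a hedged sketch of this modelling setup (not the dissertation's data or code), the snippet below fits a plain logistic regression, a stand-in for the Firth logit preferred in the study, to synthetic hourly observations built around the risk factors named above; all variable names and the data-generating process are invented:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# hypothetical hourly observations for one coordinated intersection
n = 5000
df = pd.DataFrame({
    "volume": rng.gamma(5, 100, n),                 # hourly traffic volume
    "speed_sd": rng.gamma(2, 2, n),                 # speed variability
    "light_rain": rng.binomial(1, 0.08, n),         # precipitation indicator
    "arrival_on_green": rng.uniform(0.2, 0.9, n),   # downstream arrival-pattern proxy
})
# synthetic rare-event crash labels driven by the risk factors named above
logit = (-5.5 + 0.002 * df.volume + 0.15 * df.speed_sd
         + 0.8 * df.light_rain - 1.2 * df.arrival_on_green)
df["crash"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(
    df.drop(columns="crash"), df["crash"],
    test_size=0.3, random_state=0, stratify=df["crash"])
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("hourly crash probability, out-of-sample AUC:",
      roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```

Firth's penalized likelihood matters precisely in this rare-event regime, where ordinary maximum likelihood estimates of a logit can be badly biased; the plain model here only illustrates the data layout.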
|
6 |
Partial Least Squares for Serially Dependent Data (Singer, Marco, 04 August 2016)
No description available.
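No abstract accompanies this record, so the following is only a generic illustration of the title's topic: a Python sketch fitting partial least squares to synthetic predictors with AR(1) serial dependence, using a chronological rather than random train/test split as one simple concession to that dependence. All data and parameters are invented:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)

# synthetic predictors with AR(1) serial dependence down the time axis
n, p = 300, 20
X = np.zeros((n, p))
for t in range(1, n):
    X[t] = 0.7 * X[t - 1] + rng.normal(size=p)
beta = np.zeros(p)
beta[:3] = [1.0, -0.5, 0.25]
y = X @ beta + rng.normal(scale=0.5, size=n)

# chronological split instead of a random one, to respect the serial dependence
cut = int(0.8 * n)
pls = PLSRegression(n_components=3).fit(X[:cut], y[:cut])
pred = pls.predict(X[cut:]).ravel()
print("held-out R^2:", 1 - np.var(y[cut:] - pred) / np.var(y[cut:]))
```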
|
7 |
Design and performance evaluation of failure prediction models (Mousavi Biouki, Seyed Mohammad Mahdi, January 2017)
Prediction of corporate bankruptcy (or distress) is one of the major activities in auditing firms' risks and uncertainties. The design of reliable models to predict distress is crucial for many decision-making processes. Although a variety of models have been designed to predict distress, the relative performance evaluation of competing prediction models remains an exercise that is unidimensional in nature. More specifically, although some studies use several performance criteria and their measures to assess the relative performance of distress prediction models, the assessment exercise of competing prediction models is restricted to ranking them by a single measure of a single criterion at a time, which leads to reporting conflicting results. The first essay of this research overcomes this methodological issue by proposing an orientation-free super-efficiency Data Envelopment Analysis (DEA) model as a multi-criteria assessment framework. Furthermore, the study performs an exhaustive comparative analysis of the most popular bankruptcy modelling frameworks for UK data. It also addresses two important research questions: do some modelling frameworks perform better than others by design, and to what extent do the choice and/or design of explanatory variables and their nature affect the performance of modelling frameworks? Further, using different static and dynamic statistical frameworks, this chapter proposes new Failure Prediction Models (FPMs).

However, within a super-efficiency DEA framework, the reference benchmark changes from one prediction-model evaluation to another, which in some contexts might be viewed as "unfair" benchmarking. The second essay overcomes this issue by proposing a Slacks-Based Measure Context-Dependent DEA (SBM-CDEA) framework to evaluate the competing Distress Prediction Models (DPMs). Moreover, it performs an exhaustive comparative analysis of the most popular corporate distress prediction frameworks under both a single criterion and multiple criteria using data of UK firms listed on the London Stock Exchange (LSE). Further, this chapter proposes new DPMs using different static and dynamic statistical frameworks.

Another shortcoming of the existing studies on performance evaluation lies in the use of static frameworks to compare the performance of DPMs. The third essay overcomes this methodological issue by suggesting a dynamic multi-criteria performance assessment framework, namely Malmquist SBM-DEA, which by design can monitor the performance of competing prediction models over time. Further, this study proposes new static and dynamic distress prediction models. The study also addresses several research questions: What is the effect of information on the performance of DPMs? How does the out-of-sample performance of dynamic DPMs compare to that of static ones? What is the effect of the length of the training sample on the performance of static and dynamic models? Which models perform better in forecasting distress during years with a Higher Distress Rate (HDR)?

On feature selection, studies have used different types of information, including accounting, market, and macroeconomic variables, as well as management efficiency scores, as predictors. The techniques recently applied to take into account the management efficiency of firms are two-stage models. The two-stage DPMs incorporate multiple inputs and outputs to estimate the efficiency measure of a corporation relative to the most efficient ones in the first stage, and use the efficiency score as a predictor in the second stage. The survey of the literature reveals that most existing studies fail to provide a comprehensive comparison between two-stage DPMs. Moreover, the choice of inputs and outputs for the DEA models that estimate the efficiency measures of a company has been restricted to accounting variables and features of the company. The fourth essay adds to the current literature on two-stage DPMs in several respects. First, the study proposes to consider the decomposition of the Slacks-Based Measure (SBM) of efficiency into Pure Technical Efficiency (PTE), Scale Efficiency (SE), and Mix Efficiency (ME), to analyse how each of these measures individually contributes to developing distress prediction models. Second, in addition to the conventional approach of using accounting variables as inputs and outputs of DEA models to estimate the measure of management efficiency, this study uses market information variables to calculate the measure of the market efficiency of companies. Third, this research provides a comprehensive analysis of two-stage DPMs by applying different DEA models at the first stage (e.g., input-oriented vs. output-oriented, radial vs. non-radial, static vs. dynamic) to compute the measures of management efficiency and market efficiency of companies, and by using dynamic and static classifier frameworks at the second stage to design new distress prediction models.
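The two-stage idea is compact enough to sketch. In the toy Python example below (invented data; a plain input-oriented CCR model stands in for the SBM decompositions studied in the essay), stage one computes a DEA efficiency score per firm and stage two feeds that score, alongside other ratios, into a distress classifier:

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import LogisticRegression

def ccr_efficiency(X, Y, j0):
    """Stage 1: standard input-oriented CCR efficiency of DMU j0 (score in (0, 1])."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.zeros(1 + n)
    c[0] = 1.0
    A_ub = np.zeros((m + s, 1 + n))
    b_ub = np.zeros(m + s)
    A_ub[:m, 0] = -X[:, j0]       # sum_j lambda_j x_j <= theta * x_j0
    A_ub[:m, 1:] = X
    A_ub[m:, 1:] = -Y             # sum_j lambda_j y_j >= y_j0
    b_ub[m:] = -Y[:, j0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (1 + n))
    return res.x[0]

rng = np.random.default_rng(4)
n_firms = 60
inputs = rng.uniform(1, 10, (2, n_firms))    # hypothetical, e.g. costs, liabilities
outputs = rng.uniform(1, 10, (1, n_firms))   # hypothetical, e.g. revenue

# stage 1: a management-efficiency measure per firm
eff = np.array([ccr_efficiency(inputs, outputs, j) for j in range(n_firms)])

# stage 2: the efficiency score joins accounting ratios as a distress predictor
ratios = rng.normal(size=(n_firms, 3))                       # invented accounting ratios
p_distress = 1 / (1 + np.exp(2 * eff + 0.5 * ratios[:, 0]))  # higher efficiency, lower risk
distress = rng.binomial(1, p_distress)
Z = np.column_stack([ratios, eff])
clf = LogisticRegression(max_iter=1000).fit(Z, distress)
print("coefficient on the efficiency score:", clf.coef_[0, -1])
```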
|
8 |
Metodika budování a údržby závislých datových tržišť / Methodology of development and maintenance of dependent data marts (Müllerová, Sandra, January 2011)
The thesis focuses primarily on the integrated data warehouse, particularly on one of its subsets: dependent data marts. The main objectives are to design a methodology for the development and maintenance of dependent data marts and to verify the methodology's usefulness in a real organization. The first part deals with the theoretical definition of terms, focusing on concepts from the Business Intelligence area, especially data warehousing and data marts; each term is described in detail in a separate chapter. The Business Intelligence part emphasizes the description of individual components. The data warehousing part describes data warehouse concepts and the content of the layers in a data warehouse. Finally, the data mart part describes dependent and independent data marts, as well as "special" cases of data marts such as the semantic layer and the sandbox. The second part focuses on the design of the methodology itself. It begins with an analysis of existing methodologies and an assessment of their usefulness in connection with the proposed methodology. The following section describes the organization's current approach to the development and maintenance of dependent data marts. The second part ends with the proposed methodology, based partly on the analysis of existing methodologies and partly on the analysis of the current situation. The third part evaluates the usability and usefulness of the methodology in the organization, based on feedback from employees who are directly engaged in designing and maintaining dependent data marts. Finally, the fourth part describes an alternative solution that could be considered one of the paths toward sustainable development of the data warehouse in the organization: it compares an architecture based on a semantic layer with the three-layer data warehouse concept by Bill Inmon, which is implemented in the organization. The output evaluates this alternative against the current solution.
|
9 |
Extension au cadre spatial de l'estimation non paramétrique par noyaux récursifs / Extension of recursive kernel non-parametric estimation to the spatial setting (Yahaya, Mohamed, 15 December 2016)
In this thesis, we are interested in recursive methods that allow estimates to be updated sequentially in a context of spatial or spatio-temporal data and that do not require permanent storage of all the data. Processing and analyzing data streams effectively and efficiently is an active challenge in statistics. Indeed, in many application areas, decisions must be taken at a given time upon reception of a certain amount of data and updated once new data become available at a later date. We therefore propose and study kernel estimators of the probability density function and the regression function of spatial or spatio-temporal data streams. Specifically, we adapt the classical kernel estimators of Parzen-Rosenblatt and Nadaraya-Watson by combining the methodology of recursive density and regression estimators with that of spatially or spatio-temporally distributed data. We provide applications and numerical studies of the proposed estimators. The specificity of the methods studied lies in the fact that the estimates take into account the spatial dependence structure of the data, which is far from trivial. This thesis thus belongs to the context of non-parametric spatial statistics and its applications. It makes three main contributions, which rest on the study of recursive non-parametric estimators in a spatial or spatio-temporal setting: recursive kernel density estimation in a spatial setting, recursive kernel density estimation in a spatio-temporal setting, and recursive kernel regression estimation in a spatial setting.
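The recursive flavour of these estimators is easy to convey in code. Below is a minimal Python sketch of the one-dimensional, non-spatial recursive Parzen-Rosenblatt estimator, in which each new observation updates the density estimate without any storage of past data; the spatially dependent versions studied in the thesis require additional care that this sketch does not attempt. The bandwidth rule and all names are illustrative:

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

class RecursiveKDE:
    """Recursive Parzen-Rosenblatt density estimate on a fixed evaluation grid.

    f_n(x) = ((n - 1) / n) * f_{n-1}(x) + K((x - X_n) / h_n) / (n * h_n),
    so each new observation updates the estimate without storing past data.
    """
    def __init__(self, grid):
        self.grid = grid
        self.f = np.zeros_like(grid)
        self.n = 0

    def update(self, x_new):
        self.n += 1
        h_n = self.n ** (-1 / 5)                    # classical bandwidth decay rate
        k = gauss_kernel((self.grid - x_new) / h_n) / h_n
        self.f = ((self.n - 1) * self.f + k) / self.n

rng = np.random.default_rng(5)
grid = np.linspace(-4, 4, 81)
kde = RecursiveKDE(grid)
for x in rng.normal(size=2000):                     # data arriving as a stream
    kde.update(x)
print(grid[np.argmax(kde.f)])                       # mode estimate, near 0
```

The update rule follows from writing f_n as an average of kernels with observation-specific bandwidths h_i, which is precisely what makes the estimator sequential.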
|