331 |
Statistical inference for joint modelling of longitudinal and survival data
Li, Qiuju January 2014
In longitudinal studies, data collected within a subject or cluster are correlated by their very nature, and special care is needed to account for this correlation in the analysis. Within the framework of longitudinal studies, this thesis discusses three topics. Chapter 2 discusses the joint modelling of a multivariate longitudinal process consisting of different types of outcomes. In the large cohort study of the UK North Staffordshire osteoarthritis project, longitudinal trivariate outcomes of continuous, binary and ordinal data were observed at baseline, year 3 and year 6. Instead of analysing each process separately, joint modelling is proposed for the trivariate outcomes to account for their inherent association by introducing random effects and the covariance matrix G. The influence of the covariance matrix G on statistical inference for the fixed-effects parameters is investigated within the Bayesian framework. The study shows that jointly modelling the multivariate longitudinal process reduces bias and provides more reliable results than modelling each process separately. Alongside longitudinal measurements taken intermittently, a counting process of events in time is often observed during a longitudinal study. The relationship between the time to event and the longitudinal process is of interest; on the other hand, measurements of the longitudinal process may be truncated by terminating events such as death. It may therefore be crucial to model the survival and longitudinal data jointly. A popular choice is a linear mixed-effects model for the longitudinal process of continuous outcomes and a Cox regression model for the survival data, which characterises the relationship between the time to event and the longitudinal process under some standard assumptions.
Chapter 3 investigates how statistical inference for the survival data is affected when the assumption of mutually independent random errors in the linear mixed-effects model for the longitudinal process is violated. The study uses the conditional score estimation approach, which provides robust estimators and is computationally advantageous. A generalised sufficient statistic for the random effects is proposed to account for the correlation remaining among the random errors, which is characterised by the data-driven method of modified Cholesky decomposition. The simulation study shows that this approach provides nearly unbiased estimation and efficient statistical inference. Chapter 4 seeks to incorporate both the current and the past information of the longitudinal process into the survival model of the joint model. Over the last 15 to 20 years it has been popular, or even standard, to assume that the longitudinal process affects the counting process of events in time only through its current value; as more recent studies have recognised, this need not always be true. An integral over the trajectory of the longitudinal process, combined with a weight curve, is proposed to account for both the current and past information, to improve inference and to reduce the underestimation of the effect of the longitudinal process on the hazard. The chapter proposes a plausible approach to statistical inference for the proposed models, together with a real data analysis and a simulation study.
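As a hedged illustration of the chapter 4 idea, the hazard below replaces the usual current-value term with a weighted integral of the longitudinal trajectory. The symbols ($h_0$, $m_i$, $\varpi$, $\gamma$, $\alpha$, $w_i$) are notation assumed here for the sketch, not taken from the thesis, and the thesis's exact specification may differ.

```latex
h_i(t) = h_0(t)\,\exp\!\Big\{ \gamma^{\top} w_i + \alpha \int_0^{t} \varpi(t-s)\, m_i(s)\, \mathrm{d}s \Big\}
```

Here $m_i(s)$ stands for the subject's underlying longitudinal trajectory, $w_i$ for baseline covariates and $\varpi$ for a weight curve; the standard current-value model corresponds to replacing the integral with $m_i(t)$.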
|
332 |
Exploring advanced forecasting methods with applications in aviation
Riba, Evans Mogolo 02 1900
Abstracts in English, Afrikaans and Northern Sotho / More time series forecasting methods were researched and made available in recent
years. This is mainly due to the emergence of machine learning methods which also
found applicability in time series forecasting. The emergence of a variety of methods
and their variants presents a challenge when choosing appropriate forecasting methods.
This study explored the performance of four advanced forecasting methods: autoregressive
integrated moving averages (ARIMA); artificial neural networks (ANN); support
vector machines (SVM) and regression models with ARIMA errors. To improve their
performance, bagging was also applied. The performance of the different methods was
illustrated using South African air passenger data collected for planning purposes by
the Airports Company South Africa (ACSA). The dissertation discussed the different
forecasting methods at length. Characteristics such as strengths and weaknesses and
the applicability of the methods were explored. Some of the most popular forecast accuracy
measures were discussed in order to understand how they could be used in the
performance evaluation of the methods.
It was found that the regression model with ARIMA errors outperformed all the other
methods, followed by the ARIMA model. These findings are in line with the general
findings in the literature. The ANN method is prone to overfitting and this was evident
from the results of the training and the test data sets. The bagged models showed mixed
results with marginal improvement on some of the methods for some performance measures.
It could be concluded that the traditional statistical forecasting methods (ARIMA and
the regression model with ARIMA errors) performed better than the machine learning
methods (ANN and SVM) on this data set, based on the measures of accuracy used.
This calls for more research regarding the applicability of the machine learning methods
to time series forecasting which will assist in understanding and improving their
performance against the traditional statistical methods. / Die afgelope tyd is verskeie tydreeksvooruitskattingsmetodes ondersoek as gevolg van die
ontwikkeling van masjienleermetodes met toepassings in die vooruitskatting van tydreekse.
Die nuwe metodes en hulle variante laat ʼn groot keuse tussen vooruitskattingsmetodes.
Hierdie studie ondersoek die werkverrigting van vier gevorderde vooruitskattingsmetodes:
outoregressiewe, geïntegreerde bewegende gemiddeldes (ARIMA), kunsmatige neurale
netwerke (ANN), steunvektormasjiene (SVM) en regressiemodelle met ARIMA-foute.
Skoenlussaamvoeging is gebruik om die prestasie van die metodes te verbeter. Die prestasie
van die vier metodes is vergelyk deur hulle toe te pas op Suid-Afrikaanse lugpassasiersdata
wat deur die Suid-Afrikaanse Lughawensmaatskappy (ACSA) vir beplanning ingesamel is.
Hierdie verhandeling beskryf die verskillende vooruitskattingsmetodes omvattend. Sowel
die positiewe as die negatiewe eienskappe en die toepasbaarheid van die metodes is
uitgelig. Bekende prestasiemaatstawwe is ondersoek om die prestasie van die metodes te
evalueer.
Die regressiemodel met ARIMA-foute en die ARIMA-model het die beste van die vier
metodes gevaar. Hierdie bevinding strook met dié in die literatuur. Dat die ANN-metode na
oormatige passing neig, is deur die resultate van die opleidings- en toetsdatastelle bevestig.
Die skoenlussamevoegingsmodelle het gemengde resultate opgelewer en in sommige
prestasiemaatstawwe vir party metodes marginaal verbeter.
Op grond van die waardes van die prestasiemaatstawwe wat in hierdie studie gebruik is, kan
die gevolgtrekking gemaak word dat die tradisionele statistiese vooruitskattingsmetodes
(ARIMA en regressie met ARIMA-foute) op die gekose datastel beter as die
masjienleermetodes (ANN en SVM) presteer het. Dit dui op die behoefte aan verdere
navorsing oor die toepaslikheid van tydreeksvooruitskatting met masjienleermetodes om
hul prestasie vergeleke met dié van die tradisionele metodes te verbeter. / Go nyakišišitšwe ka ga mekgwa ye mentši ya go akanya ka ga molokoloko wa dinako le
go dirwa gore e hwetšagale mo mengwageng ye e sa tšwago go feta. Se ke ka
lebaka la go tšwelela ga mekgwa ya go ithuta ya go diriša metšhene yeo le yona e
ilego ya dirišwa ka kakanyong ya molokolokong wa dinako. Go tšwelela ga mehutahuta
ya mekgwa le go fapafapana ga yona go tšweletša tlhohlo ge go kgethwa mekgwa ya
maleba ya go akanya.
Dinyakišišo tše di lekodišišitše go šoma ga mekgwa ye mene ya go akanya yeo e
gatetšego pele e lego: ditekanyotshepelo tšeo di kopantšwego tša poelomorago ya maitirišo
(ARIMA); dinetweke tša maitirelo tša nyurale (ANN); metšhene ya bekthara ya thekgo
(SVM); le mekgwa ya poelomorago yeo e nago le diphošo tša ARIMA. Go
kaonafatša go šoma ga yona, nepagalo ya go ithuta ka metšhene le yona e dirišitšwe.
Go šoma ga mekgwa ye e fepafapanego go laeditšwe ka go šomiša tshedimošo ya
banamedi ba difofane ba Afrika Borwa yeo e kgobokeditšwego mabakeng a dipeakanyo
ke Khamphani ya Maemafofane ya Afrika Borwa (ACSA). Sengwalwanyakišišo se
ahlaahlile mekgwa ya kakanyo ye e fapafapanego ka bophara. Dipharologanyi tša go
swana le maatla le bofokodi le go dirišega ga mekgwa di ile tša šomišwa. Magato a
mangwe ao a tumilego kudu a kakanyo ye e nepagetšego a ile a ahlaahlwa ka nepo ya go
kwešiša ka fao a ka šomišwago ka gona ka tshekatshekong ya go šoma ga mekgwa ye.
Go hweditšwe gore mokgwa wa poelomorago wa go ba le diphošo tša ARIMA o phadile
mekgwa ye mengwe ka moka, gwa latela mokgwa wa ARIMA. Dikutollo tše di sepelelana
le dikutollo ka kakaretšo ka dingwaleng. Mokgwa wa ANN o ka fela o fetišiša gomme
se se bonagetše go dipoelo tša tlhahlo le dihlopha tša teko ya tshedimošo. Mekgwa
ya nepagalo ya go ithuta ka metšhene e bontšhitše dipoelo tšeo di hlakantšwego tšeo di
nago le kaonafalo ye kgolo go ye mengwe mekgwa ya go ela go phethagatšwa ga
mešomo.
Go ka phethwa ka gore mekgwa ya setlwaedi ya go akanya dipalopalo (ARIMA le
mokgwa wa poelomorago wa go ba le diphošo tša ARIMA) e šomile bokaone go phala
mekgwa ya go ithuta ka metšhene (ANN le SVM) ka mo go sehlopha se sa
tshedimošo, go eya ka magato a nepagalo ya magato ao a šomišitšwego. Se se nyaka gore
go dirwe dinyakišišo tše dingwe mabapi le go dirišega ga mekgwa ya go ithuta ka
metšhene mabapi le go akanya molokoloko wa dinako, e lego seo se tlago thuša go
kwešiša le go kaonafatša go šoma ga yona kgahlanong le mekgwa ya setlwaedi ya
dipalopalo. / Decision Sciences / M. Sc. (Operations Research)
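The English abstract refers to popular forecast accuracy measures without naming them. The sketch below assumes three common ones (MAE, RMSE and MAPE) and computes them for a hypothetical series; the function name and the numbers are illustrative only and are not ACSA data or results from the dissertation.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the actual values, so it is undefined if any of them is zero.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Hypothetical monthly passenger counts (thousands), for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 113.0, 118.0, 135.0]
scores = forecast_errors(actual, predicted)
```

Comparing models on several measures at once, as the abstract describes, guards against a ranking that depends on one measure's sensitivity to scale or outliers.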
|
333 |
Závislost hodnoty stavebního závodu na velikosti vlastního kapitálu / Dependence of the value of the construction enterprise on the size of the equity
Bahenský, Miloš January 2019
The doctoral thesis deals with the valuation of businesses engaged in construction production under the conditions of the Czech economy. Business valuation is, and will always be, highly relevant in a market economy, with regard to both methodical and practical approaches. The main aim of the doctoral thesis is to demonstrate this dependence by constructing an empirical regression model that determines the value of a construction enterprise, obtained by the chosen income valuation method, from its equity (book value of equity at historical cost). The first part of the doctoral thesis is a literature review describing the approaches of authors to the current state of knowledge on business valuation and aspects of equity, using the principles of system methodology. Based on these findings, the thesis defines a space in which a solution to a partial problem can be proposed: selecting the enterprise value category and the associated income valuation methods suitable for extensive time-series analysis. An integral part of the doctoral thesis is the determination of the sample size of construction enterprises according to the assumptions and limitations of the chosen methodology. Data for the empirical research are collected from the Justice.cz database. Another important part, in the spirit of system approach principles, is the choice and application of a method from the systems disciplines to the problem addressed in the thesis. The result is an empirical regression model which, after subsequent validation in multiple case studies, could also be recommended for wider verification in valuers' practice. The thesis also discusses, in a wider context, its potential benefits for practical, theoretical and pedagogical use.
|
334 |
Metody pro predikci s vysokodimenzionálními daty genových expresí / Methods for class prediction with high-dimensional gene expression data
Šilhavá, Jana Unknown Date
The dissertation deals with prediction using high-dimensional gene expression data. The amount of available genomic data has grown considerably over the last decade. Combining gene expression data with other data finds application in many areas. In clinical cancer management, for example, it can contribute to a more accurate determination of disease prognosis. The main part of this dissertation focuses on combining gene expression data and clinical data. We use logistic regression models built with various regularization techniques. Generalized linear models make it possible to combine models with different data structures. The dissertation shows that combining a model of gene expression data with clinical data can improve prediction results compared with a model built from gene expression data alone or clinical data alone. The proposed procedures are not computationally demanding. Testing is carried out first on simulated data sets in various settings and then on real benchmark data. We also address the determination of the additional value of microarray data. The dissertation contains a comparison of the features selected by a gene expression classifier on five different breast cancer data sets. We also propose a feature selection procedure that combines gene expression data with knowledge from gene ontologies.
|
335 |
Developing Artificial Neural Networks (ANN) Models for Predicting E. Coli at Lake Michigan Beaches
Mitra Khanibaseri (9045878) 24 July 2020
<p>A neural network model was developed to predict E. coli levels and classes at six selected Lake Michigan beaches. Water quality observations at the time of sampling and discharge information from two nearby tributaries were used as inputs to predict E. coli. This research was funded by the Indiana Department of Environmental Management (IDEM). A user-friendly, Excel-based tool was developed from the best model for making future predictions of E. coli classes. This tool will help beach managers make real-time decisions.</p>
<p>The nowcast model was developed from historical tributary flows and water quality measurements (physical, chemical and biological). The model uses experimentally available information such as total dissolved solids, total suspended solids, pH, electrical conductivity, and water temperature to estimate whether E. coli counts would exceed the acceptable standard. To set up this model, field data were collected during the 2019 beach season.</p>
<p>IDEM recommends posting an advisory at the beach, indicating that swimming and wading are not recommended, when E. coli counts exceed advisory standards. Under the advisory limit, a single water sample shall not exceed an E. coli count of 235 colony-forming units per 100 milliliters (CFU/100 mL). Advisories are removed when bacterial levels fall within the acceptable standard. However, E. coli results become available only after a time lag, so beach closures are based on the previous day's results. Nowcast models allow beach managers to make real-time beach advisory decisions instead of waiting a day or more for laboratory results to become available.</p>
<p>Using the historical data, extensive experiments were carried out to identify suitable input variables and the optimal neural network architecture. The best feed-forward neural network model was developed using the Bayesian Regularization Neural Network (BRNN) training algorithm. The developed ANN model showed an average accuracy of around 87% in predicting the E. coli classes.</p>
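The advisory rule and the accuracy figure in this abstract can be made concrete with a minimal sketch. It assumes a two-class scheme (above or not above the 235 CFU/100 mL single-sample limit); the abstract does not say how many classes the thesis uses, and the function names and sample values below are hypothetical.

```python
ADVISORY_LIMIT_CFU_PER_100ML = 235  # single-sample limit quoted in the abstract


def advisory_class(count_cfu_per_100ml):
    """Return 1 if the count exceeds the advisory limit, otherwise 0."""
    return 1 if count_cfu_per_100ml > ADVISORY_LIMIT_CFU_PER_100ML else 0


def classification_accuracy(observed_counts, predicted_classes):
    """Share of samples whose predicted class matches the lab-based class."""
    observed = [advisory_class(c) for c in observed_counts]
    hits = sum(o == p for o, p in zip(observed, predicted_classes))
    return hits / len(observed)


# Hypothetical lab counts and model outputs, for illustration only.
lab_counts = [120, 300, 235, 800, 40]
model_classes = [0, 1, 0, 1, 1]
accuracy = classification_accuracy(lab_counts, model_classes)
```

A count of exactly 235 is treated as acceptable here because the rule is phrased as "shall not exceed".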
|
336 |
Régression non-paramétrique pour variables fonctionnelles / Non parametric regression for functional data
Elamine, Abdallah Bacar 23 March 2010
Cette thèse se décompose en quatre parties auxquelles s'ajoute une présentation. Dans un premier temps, on expose les outils mathématiques essentiels à la compréhension des prochains chapitres. Dans un deuxième temps, on s'intéresse à la régression non paramétrique locale pour des données fonctionnelles appartenant à un espace de Hilbert. On propose, tout d'abord, un estimateur de l'opérateur de régression. La construction de cet estimateur est liée à la résolution d'un problème inverse linéaire. On établit des bornes de l'erreur quadratique moyenne (EQM) de l'estimateur de l'opérateur de régression en utilisant une décomposition classique. Cette EQM dépend de la fonction de petite boule de probabilité du régresseur au sujet de laquelle des hypothèses de type Gamma-variation sont posées. Dans le chapitre suivant, on reprend le travail élaboré dans le précédent chapitre en se plaçant dans le cadre de données fonctionnelles appartenant à un espace semi-normé. On établit des bornes de l'EQM de l'estimateur de l'opérateur de régression. Cette EQM peut être vue comme une fonction de la fonction de petite boule de probabilité. Dans le dernier chapitre, on s'intéresse à l'estimation de la fonction auxiliaire associée à la fonction de petite boule de probabilité. D'abord, on propose un estimateur de cette fonction auxiliaire. Ensuite, on établit la convergence en moyenne quadratique et la normalité asymptotique de cet estimateur. Enfin, par des simulations, on étudie le comportement de cet estimateur au voisinage de zéro. / This thesis is divided into four parts, preceded by an introduction. The first part presents the mathematical tools needed to understand the subsequent chapters. The second part addresses local nonparametric regression for functional data belonging to a Hilbert space. We first propose an estimator of the regression operator. The construction of this estimator is related to the solution of a linear inverse problem. Using a classical decomposition, we establish bounds for the mean squared error (MSE) of the estimator of the regression operator. This MSE depends on the small ball probability function of the regressor, which is assumed to satisfy Gamma-variation hypotheses. The third part revisits the work of the preceding part for functional data belonging to a semi-normed space of infinite dimension. We establish bounds for the MSE of the estimator of the regression operator. This MSE can be seen as a function of the small ball probability function. The last part deals with the estimation of the auxiliary function associated with the small ball probability function. We first propose an estimator of this auxiliary function. We then establish its convergence in mean square and its asymptotic normality. Finally, we study the behaviour of this estimator in a neighbourhood of zero through simulations.
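For readers unfamiliar with the two objects this abstract relies on, the block below writes them out. It assumes that the "classical decomposition" is the usual bias-variance split, which the abstract does not state explicitly, and the symbols $\varphi_x$, $\hat r$ and $r$ are notation chosen here for illustration.

```latex
\begin{align*}
\varphi_x(h) &= \mathbb{P}\big(\lVert X - x \rVert \le h\big), \qquad h > 0, \\
\mathrm{MSE}\big(\hat r(x)\big) &= \mathbb{E}\big[(\hat r(x) - r(x))^2\big]
  = \big(\mathbb{E}\,\hat r(x) - r(x)\big)^2 + \mathrm{Var}\big(\hat r(x)\big).
\end{align*}
```

The first line is the small ball probability of the regressor $X$ around a point $x$; in infinite-dimensional spaces its behaviour as $h \to 0$ plays the role that a density plays in finite dimensions, which is why the bounds on the MSE depend on it.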
|
337 |
Competitive Strategies of Digital Platforms in New Markets : An analysis of the strategies and firm financial performance of digital platforms entering competitive markets in the Nordics
Fouhy, David, Pais, Alfredo January 2022
Over the past decade the world has seen an increase in businesses launching new digital platforms or changing their business model to one. New and established businesses are flocking to digital platforms in order to evolve their business model and keep up with advancements in technology, such as cloud computing, which enables commerce and communication on a much faster and more streamlined level. Digital platforms with two-sided markets often face fierce competition from market incumbents which benefit from traditional supply-side economies of scale, as well as from other digital platforms. Therefore, the competitive strategy adopted at market launch and during operations will have a great impact on platform performance in terms of firm financial performance.
This study is divided into two parts and is performed with the objective of gaining insight into the competitive strategies adopted by digital platform businesses with two-sided markets, and how such strategic decisions may be informed in favor of profitability. The first part investigates the influence of internal factors, such as debt ratio, quick ratio, sales growth, and capital turnover ratio, on the firm financial performance (measured by return on assets) of digital platforms with two-sided markets in the Nordics. The second part investigates the relationship between the firm financial performance (measured by return on assets) of digital platform businesses with two-sided markets after launch and the type of strategy adopted. Subsequently, two hypotheses are presented. A panel data regression model is developed to evaluate these relationships, allowing the authors to test the null hypotheses. The data set used in the panel data regression model comprises an unbalanced sample of 27 companies that have launched their platforms in Norway, Sweden, and Denmark. Financial data was gathered in the form of return on assets (dependent variable), capital turnover ratio, quick ratio, debt ratio, and sales growth (explanatory variables). These companies were grouped according to the strategy adopted at market launch and during early operations. These strategies are subsidy, seeding and marquee users, micro market launch and piggybacking (categorical ‘dummy’ variables).
Studying the firm financial performance of businesses which adopt digital platforms will help us to better understand the efficacy of the strategies adopted and how these strategies impact financial performance. Both null hypotheses tested may be partially rejected. The authors conclude that the internal factors debt ratio, quick ratio, and sales growth have a significant influence on the profitability (measured by return on assets) of digital platforms with two-sided markets in the Nordics. The influence of the internal factor capital turnover ratio on profitability is statistically insignificant. Quick ratio has a positive significant influence on profitability, whereas debt ratio and sales growth have a negative influence. The authors also conclude that companies whose business models allow them to adopt a subsidy strategy yield stronger profitability than those which adopt other strategies. Companies which entice seed and marquee users to their platform as a strategy yield the second strongest profitability. Companies which choose a micro market launch strategy yield the weakest profitability. The authors of this study do not draw conclusions on the effect of adopting a piggybacking strategy on profitability because of the limited number of observations in the piggybacking data set.
Future studies may expand upon this research by including a wider catchment of businesses, as well as a wider data set covering other geographical locations, to improve the statistical significance of the results. The study could also be improved by analyzing the correlation between the strength of competitors upon market entry and the efficacy of the strategies adopted.
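The panel regression described in this abstract can be written out as a single equation. The form below is a sketch under stated assumptions: the abbreviations (DR, QR, SG, CTR), the coefficient symbols and the choice of one strategy as the reference category for the dummies are introduced here for illustration, and the abstract does not say whether fixed or random effects were used.

```latex
\mathrm{ROA}_{it} = \beta_0 + \beta_1\,\mathrm{DR}_{it} + \beta_2\,\mathrm{QR}_{it}
  + \beta_3\,\mathrm{SG}_{it} + \beta_4\,\mathrm{CTR}_{it}
  + \sum_{k=1}^{3} \delta_k D_{k,i} + \varepsilon_{it}
```

Here $i$ indexes the 27 companies and $t$ the reporting periods; DR, QR, SG and CTR are debt ratio, quick ratio, sales growth and capital turnover ratio, and $D_{k,i}$ are strategy dummies with the fourth strategy as the baseline. In this notation the reported findings correspond to $\beta_2 > 0$, $\beta_1 < 0$ and $\beta_3 < 0$, with $\beta_4$ not significantly different from zero.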
|
338 |
Vliv koeficientu redukce na zdroj ceny na výsledný index odlišnosti při komparativní metodě oceňování nemovitostí / The price source reducing coefficient impact on total index of dissimilarity by the real estate valuation comparative method
Cupal, Martin Unknown Date
True market prices of real estate, unlike bid prices, are often hard to obtain. Nevertheless, this information is necessary for many direct and indirect participants in the real estate market, especially for valuation purposes. The bid prices of specific properties are therefore often used instead, but they are not generally equivalent to market prices, so a way of converting bid prices to market prices is needed. This dissertation presents one approach to this issue. The ratio of market price to bid price is estimated by a multiple linear regression model and by non-linear estimates of simple regression. The multiple linear regression model estimates this ratio from other variables, such as the duration of the offer, the price level by locality and others. Non-linear estimates of the regression function are used to model the trend of bid and market prices as a function of the population size of the various localities.
|
339 |
Predicting Workforce in Healthcare : Using Machine Learning Algorithms, Statistical Methods and Swedish Healthcare Data / Predicering av Arbetskraft inom Sjukvården genom Maskininlärning, Statistiska Metoder och Svenska Sjukvårdsstatistik
Diskay, Gabriel, Joelsson, Carl January 2023
Denna studie undersöker användningen av maskininlärningsmodeller för att predicera arbetskraftstrender inom hälso- och sjukvården i Sverige. Med hjälp av en linjär regressionsmodell, en Gradient Boosting Regressor-modell och en Exponential Smoothing-modell syftar forskningen för detta arbete till att ge viktiga insikter för underlaget till makroekonomiska överväganden och att ge en djupare förståelse av Beveridge-kurvan i ett sammanhang relaterat till hälso- och sjukvårdssektorn. Trots vissa utmaningar med datan är målet att förbättra noggrannheten och effektiviteten i beslutsfattandet rörande arbetsmarknaden. Resultaten av denna studie visar maskininlärningspotentialen i predicering i ett ekonomiskt sammanhang, även om inneboende begränsningar och etiska överväganden beaktas. / This study examines the use of machine learning models to predict workforce trends in the healthcare sector in Sweden. Using a Linear Regression model, a Gradient Boosting Regressor model, and an Exponential Smoothing model, the research aims to provide needed insight as a basis for macroeconomic considerations and to give a deeper understanding of the Beveridge curve in the context of the healthcare sector. Despite some challenges with the data, the goal is to improve the accuracy and efficiency of policy-making around the labor market. The results of this study demonstrate the potential of machine learning for forecasting in an economic context, although inherent limitations and ethical considerations are taken into account.
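Of the three models named in this abstract, exponential smoothing is the simplest to show in full. The sketch below implements simple (single) exponential smoothing; which variant the thesis used is not stated, so this is an assumption, and the function name, the smoothing parameter and the series are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * value + (1 - alpha) * level for each later observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical yearly staffing figures (thousands), for illustration only.
workforce = [10.0, 12.0, 14.0]
forecast = simple_exponential_smoothing(workforce, alpha=0.5)
```

A larger `alpha` makes the forecast follow recent observations more closely, while a smaller one averages over a longer history.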
|
340 |
Spatio-temporal analyses of the distribution of alcohol outlets in California
Li, Li January 2014
Indiana University-Purdue University Indianapolis (IUPUI) / The objective of this research is to examine the development of California alcohol outlets over time and the relationship between neighborhood characteristics and alcohol outlet density. Two types of advanced analysis were carried out after the usual preliminary description of the data. First, fixed- and random-effects linear regressions were fitted to the county panel data over time (1945-2010), with a dummy variable added to capture the change in the law limiting alcohol outlet density. Second, a Bayesian spatio-temporal Poisson regression of the census tract panel data was conducted to take advantage of the recent availability of population characteristics that affect outlet density. A spatial conditional autoregressive (CAR) model was embedded in the Poisson regression to detect spatial dependency in the unexplained variance of alcohol outlet density. The results show that alcohol outlet density fell over time under the limitation law. However, the law was no more effective in reducing the growth of alcohol outlets after the limitation was modified to be more restrictive. Neighborhoods that are poorer, have higher vacancy rates and have a lower percentage of Black residents tend to have higher alcohol outlet density (the ratio of the number of alcohol outlets to population) for both on-sale general and off-sale general outlets. Other characteristics, such as the percentage of Hispanics, the percentage of Asians, the percentage of younger population and the median income of adjacent neighborhoods, were associated with the densities of on-sale general and off-sale general alcohol outlets. Some regions, such as the San Francisco Bay Area and the Greater Los Angeles area, have more alcohol outlets than the neighborhood characteristics included in the model predict.
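A Poisson regression with an embedded conditional autoregressive term is often written as below. This is a generic sketch and not the thesis's exact specification: the intrinsic CAR prior, the population offset $\log P_{it}$ and all symbols are assumptions made here for illustration.

```latex
\begin{align*}
y_{it} \mid \mu_{it} &\sim \mathrm{Poisson}(\mu_{it}), \\
\log \mu_{it} &= \log P_{it} + x_{it}^{\top}\beta + \phi_i, \\
\phi_i \mid \phi_{-i} &\sim \mathcal{N}\!\Big( \frac{1}{n_i} \sum_{j \sim i} \phi_j,\; \frac{\sigma^2}{n_i} \Big).
\end{align*}
```

Here $y_{it}$ is the number of outlets in census tract $i$ at time $t$, $P_{it}$ its population, $x_{it}$ the neighborhood characteristics, $j \sim i$ denotes tracts adjacent to $i$ and $n_i$ their number. The spatial effect $\phi_i$ absorbs variation left unexplained by the covariates, which is how regions with more outlets than the covariates predict would show up.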
|