1 |
Exploring network models under sampling
Zhou, Shu. January 1900
Master of Science / Department of Statistics / Perla Reyes
Networks are defined as sets of items and their connections. Interconnected items are represented by mathematical abstractions called vertices (or nodes), and the links connecting pairs of vertices are known as edges. Networks are easily seen in everyday life: a network of friends, the Internet, metabolic or citation networks. The increase in available data and the need to analyze networks have resulted in a proliferation of network models. However, for networks with billions of nodes and edges, computation and inference might not be achievable within a reasonable amount of time or budget. A sampling approach seems a natural choice, but traditional models assume that we have access to the entire network. Moreover, when data are available only for a sampled sub-network, conclusions tend to be extrapolated to the whole network or population without regard to sampling error.
The statistical problem this report addresses is how to sample a sub-network and then draw conclusions about the whole network. Are some sampling techniques better than others? Are there more efficient ways to estimate parameters of interest? How can we measure how effectively a method reproduces the original network? We explore these questions with a simulation study on the Mesa High School students' friendship network. First, to assess the characteristics of the whole network, we applied the traditional exponential random graph model (ERGM) and a stochastic blockmodel to the complete population of 205 students. Then, we drew simple random and stratified samples of 41 students, applied the traditional ERGM and the stochastic blockmodel again, and defined a way to generalize the sample findings to the population friendship network of 205 students. Finally, we used the degree distribution and other network statistics to compare the true friendship network with the projected one.
We achieved the following important results: 1) as expected, stratified sampling outperforms simple random sampling when selecting nodes; 2) ERGM without restrictions offers a poor estimate for most of the tested parameters; and 3) Bayesian stochastic blockmodel estimation using a stratified sample of nodes achieves the best results.
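The two node-sampling designs compared in this abstract can be sketched roughly as follows. This is a minimal illustration, not the report's code: the function names, the proportional allocation rule, and the use of a grade-like label as the stratum are all assumptions, since the abstract does not say how strata were defined or how sample sizes were allocated.

```python
import random
from collections import defaultdict


def simple_random_sample(nodes, n, rng):
    """Draw n nodes uniformly at random, without replacement."""
    return set(rng.sample(sorted(nodes), n))


def stratified_sample(strata, n, rng):
    """Proportional-allocation stratified sample of n nodes.

    `strata` maps node -> stratum label (hypothetical, e.g. a grade).
    """
    groups = defaultdict(list)
    for node, label in strata.items():
        groups[label].append(node)
    total = len(strata)
    # Ideal (fractional) share of the sample for each stratum.
    quotas = {g: n * len(members) / total for g, members in groups.items()}
    sizes = {g: int(q) for g, q in quotas.items()}
    # Hand out the remaining slots to the largest fractional remainders
    # so that the stratum sizes add up to exactly n.
    leftover = n - sum(sizes.values())
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - sizes[g], reverse=True)
    for g in by_remainder[:leftover]:
        sizes[g] += 1
    sample = set()
    for g, members in groups.items():
        sample.update(rng.sample(sorted(members), sizes[g]))
    return sample


def induced_edges(edges, sample):
    """Keep only the edges whose two endpoints were both sampled."""
    return [(u, v) for u, v in edges if u in sample and v in sample]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy population of 205 nodes with six made-up strata.
    strata = {i: 7 + i % 6 for i in range(205)}
    print(len(simple_random_sample(strata, 41, rng)))
    print(len(stratified_sample(strata, 41, rng)))
```

Under either design, the models would then be fitted to the sub-network returned by `induced_edges`, which is why a design that keeps every stratum represented can matter for block-structured models.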
|
2 |
Statistical analysis of network data motivated by problems in online social media
Zhang, Yaonan. 08 April 2016
Networks have been widely used to represent and analyze systems of connected elements. Online social media networks, as a result of the expansion of the Internet and the increased need for communication, have become an increasingly important part of people's lives. This thesis focuses on the statistical analysis of network data motivated by problems in online social media. It discusses problems arising from both explicit network data and implicit network data. Explicit network data are data where network structures are observable; implicit network data are those that do not have a network structure but occur under the influence of an underlying network.
For the explicit network data analysis, we develop a novel method of recovering a fundamental characteristic -- the network degree distribution -- under sampling. We formulate the problem of estimating the degree distribution as an inverse problem. We show that this problem is ill-conditioned for many sampling methods used in practice, and accordingly propose a constrained, penalized weighted least-squares approach to solve it. We demonstrate the ability of our method to accurately reconstruct degree distributions from simulated network data and real-world social network data. We also propose practical uses of the estimates relevant to marketing and advertising.
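To make the inverse-problem framing concrete, here is a small sketch of the forward map for one particular design. It assumes induced-subgraph sampling in which each node is kept independently with probability `p`, so a sampled node of true degree k shows a Binomial(k, p) observed degree. The design, the function names, and the toy counts are assumptions for illustration; the thesis may use different operators, and its constrained, penalized solver is not reproduced here.

```python
from math import comb


def sampling_operator(max_degree, p):
    """Matrix P with P[j][k] = Pr(observed degree j | true degree k, node sampled).

    Assumes each neighbor of a sampled node is itself sampled
    independently with probability p (Binomial thinning).
    """
    size = max_degree + 1
    return [
        [comb(k, j) * p**j * (1 - p) ** (k - j) if j <= k else 0.0
         for k in range(size)]
        for j in range(size)
    ]


def expected_observed_counts(true_counts, p):
    """Expected number of sampled nodes at each observed degree.

    true_counts[k] is the number of nodes of true degree k. The leading
    factor p is the probability that the node itself is sampled.
    """
    P = sampling_operator(len(true_counts) - 1, p)
    return [
        p * sum(P[j][k] * n_k for k, n_k in enumerate(true_counts))
        for j in range(len(true_counts))
    ]


if __name__ == "__main__":
    # Toy population: 10 nodes of degree 1, 20 of degree 2, 5 of degree 3.
    print(expected_observed_counts([0, 10, 20, 5], 0.5))
```

Estimation runs this map backwards: given observed counts, find true counts that explain them. Sampling smears each true degree across many observed degrees, which is why plain inversion tends to be unstable and why constraints and a penalty term are a natural remedy.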
For the implicit network data analysis, we look at review data from popular review websites. Motivated by articles from the popular press and the research community which publicized that the average rating for top review sites is above 4 out of 5 stars, we study the phenomenon of review rating trends and convergence using restaurant review data from TripAdvisor. We analyze the trend on different levels -- a rough analysis of the characteristics of the ratings, and a subtler statistical modeling with ordinal logistic regressions. Taking into account the implicit network underlying the review data, we suggest that the upward trend observed in restaurant review ratings may be explained by social influence on an individual's perception of quality. We use the intensity of review postings as an indicator of how popular a restaurant is, and test to what extent the increase in review intensity explains increases in average rating. After that, we consider a more nuanced approach to the joint modeling of ratings and review intensity which allows for interaction between the two, rather than intensity serving only as an explanatory variable for ratings. Specifically, a state-space model is used to test the interaction between review intensity and review ratings.
|
3 |
On Estimation Problems in Network Sampling
Wei, Ran. January 2016
No description available.
|
4 |
Three essays on social networks and the diffusion of innovation models
Pyo, Tae-Hyung. 01 July 2014
The Bass model has been used extensively and globally to forecast the first purchases of new products. The paper that introduced it has been named by INFORMS as one of the top 10 most influential papers published in the 50-year history of Management Science. Most models for the diffusion of innovation are deeply rooted in the work of Bass (1969). His work provides a framework to model the underlying process of innovation adoption among first-time customers.
Potential customers may be connected to one another in some sort of network. Prior research has shown that the structure of a network affects adoption patterns (Dover et al. 2012; Hill et al. 2006; Katona and Sarvary 2008; Katona et al. 2011; Newman et al. 2006; Shaikh et al. 2010; Van den Bulte and Joshi 2007). One approach to addressing this issue is to incorporate network information into the original Bass model. The focus of this study is to explore how to incorporate network information and other micro-level data into the Bass model.
First, I prove that the Bass model assumes all potential customers are linked to all other customers. Through simulations of individual adoptions and connections among individuals on a random network, I show that the estimate of q in the original Bass model is biased downward. I find that the biases in the Bass model depend on the structure of the network. I relax the assumption of a fully connected network by proposing a Network-Based Bass (NBB) model, which incorporates the network structure into the traditional Bass model. Using the proposed NBB model, I am able to recover the true parameters.
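The kind of simulation described here can be sketched as follows. This is a hedged illustration only: the abstract does not give the NBB specification, so the per-period adoption hazard `p + q * (share of neighbors who have adopted)`, the Erdős–Rényi-style random network, and all function and parameter names are assumptions.

```python
import random


def random_network(n, edge_prob, rng):
    """Undirected random graph: each pair is linked with probability edge_prob."""
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs


def simulate_adoption(nbrs, p, q, periods, rng):
    """Discrete-time adoption on a network; returns cumulative adopters per period.

    Assumed hazard for a non-adopter: p (innovation) plus q (imitation)
    times the share of its own neighbors who have already adopted.
    """
    adopted = set()
    cumulative = []
    for _ in range(periods):
        new_adopters = set()
        for node, neighbors in nbrs.items():
            if node in adopted:
                continue
            if neighbors:
                share = sum(1 for j in neighbors if j in adopted) / len(neighbors)
            else:
                share = 0.0
            hazard = min(1.0, p + q * share)
            if rng.random() < hazard:
                new_adopters.add(node)
        # Update after the sweep so everyone reacts to last period's state.
        adopted |= new_adopters
        cumulative.append(len(adopted))
    return cumulative


if __name__ == "__main__":
    rng = random.Random(1)
    net = random_network(200, 0.05, rng)
    print(simulate_adoption(net, p=0.03, q=0.4, periods=20, rng=rng))
```

If every individual is linked to every other, the neighbor share is approximately the population adoption fraction F and the hazard reduces to the familiar p + qF of the aggregate Bass model, which matches the fully connected assumption the abstract attributes to that model.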
To test the generalizability and enhance the applicability of the NBB model, I tested it on various network types with data sampled from the population network. I show that the NBB model is robust across different types of networks and efficient in terms of sample size: with a small fraction of the data from the population, it accurately recovers the true parameters. Therefore, the NBB model can be used when complete network information is not available.
The last essay is the first attempt to incorporate heterogeneous peer influence into the NBB model, based on individuals' preference structures. Besides the significant extension of the NBB (Bass) Model, incorporating high-quality data on individual behavior into the model leads to new findings on individuals' adoption behaviors, and thus expands our knowledge of the diffusion process.
|
5 |
Analyse en identification partielle de la décision d'émigrer des étudiants africains [A partial identification analysis of African students' decision to emigrate]
Méango, Natoua Romuald. 05 1900
International migration of students is a costly investment for family units in many developing countries. However, it might yield substantial financial and social returns for the investors, as well as externalities for other family members. Furthermore, when these family decisions aggregate at the country level, they affect the stock of human capital available to the origin country. This thesis primarily addresses two aspects of international student migration: (i) Who goes? What are the determinants of the probability of migration? (ii) Who pays? How does the family organize to bear the cost of the migration?
Engaging in this study, one faces the challenge of data limitation, a direct consequence of the geographical dispersion of the population of interest. The first important contribution of this work is to provide a new snowball sampling methodology for hard-to-reach populations, along with estimators to correct selection biases. I collected data which include both migrant and non-migrant students from Cameroon, using an online platform.
A second challenge is the well-documented problem of endogeneity of educational attainment. I take advantage of recent advances in the treatment of identification problems in discrete choice models to solve this issue while keeping assumptions to a minimum. In particular, the validity of the partial identification methodology does not rest on the existence of an instrument. To the best of my knowledge, this is the first empirical application of this methodology to development-related issues.
The first chapter studies the decision made by a family to invest in student migration. I propose an empirical structural decision model which reflects the importance of both the return on the investment and the budgetary constraint in agents' choices. Our results show that the choice of level of education, the help of the family and academic results in secondary school are significant determinants of the probability to migrate, unlike gender, which does not seem to play any role in the family decision.
The objective of the second chapter is to understand how agents decide to be part of the migration project and how the family organizes itself to share profits and discourage free-riding behavior. Further results on partial identification for games of incomplete information allow us to consider strategic behavior within the family. My estimation suggests that models with a representative individual suit only families which consist of parent and child, but are rejected when a significant extended family member is introduced. Helpers incur a non-zero cost of participation that discourages involvement in the migration process. Kinship obligations, and not altruism, appear to be the main reason for participation.
Finally, the third chapter presents the more general theoretical framework in which my models are embedded. The method presented is specialized to finite games of complete information, but is of interest for application to the empirical analysis of instrumental variable models of discrete choice (Chapter 1), cooperative and non-cooperative games (Chapter 2), as well as revealed preference analysis. With my co-authors, we propose an efficient combinatorial bootstrap procedure for inference in games of complete information that runs in linear computing time, and an application to the determinants of long-term elderly care choices.
|
7 |
Échantillonnage et inférence dans réseaux complexes / Sampling and inference in complex networks
Kazhuthuveettil Sreedharan, Jithin. 02 December 2016
The recent emergence of large networks, mainly due to the rise of online social networks, has exposed the difficulty of gathering a complete picture of a network and prompted the development of new distributed techniques. In this thesis, we design and analyze algorithms based on random walks and diffusion for sampling, estimation and inference of network functions, and for approximating the spectrum of graph matrices.
The thesis starts with the classical problem of finding the dominant eigenvalues and eigenvectors of symmetric graph matrices, such as the Laplacian of undirected graphs. Using the fact that the eigenspectrum is associated with a Schrödinger-type differential equation, we develop scalable techniques based on diffusion over the graph and on gossiping algorithms. They are also adaptable to a simple algorithm based on quantum computing. Next, we consider sampling and estimation of network functions (sum and average) using random walks on the graph. In order to avoid the burn-in time of random walks, we use the idea of regeneration at the walk's revisits to a fixed node to develop an estimator for the aggregate function which is non-asymptotically unbiased, and we derive an approximation to its Bayesian posterior. An estimator based on reinforcement learning, which also makes use of regeneration, is developed as well. The final part of the thesis deals with the use of extreme value theory to make inferences from the stationary samples of the random walks. Extremal events such as the first hitting time of a large-degree node, order statistics and mean cluster size are well captured by a parameter called the "extremal index". We theoretically study and estimate the extremal index of different random walk sampling techniques.
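The regeneration idea can be illustrated with a short sketch. It assumes a simple random walk on a connected undirected graph and uses the standard renewal-reward identity: over one tour that leaves a fixed anchor node and ends on its first return, the expected sum of g(v)/deg(v) over visited nodes equals (sum of g over all nodes)/deg(anchor). The function name and the toy graphs are made up for illustration, and the thesis's actual estimator may differ in its details.

```python
import random


def tour_estimate_sum(nbrs, g, anchor, num_tours, rng):
    """Estimate the sum of g(v) over all nodes from random-walk tours.

    nbrs maps node -> list of neighbors (connected, undirected graph).
    Each tour starts at `anchor` and stops at the first return to it.
    No burn-in is needed because tours are independent and identically
    distributed, so the average is unbiased for any number of tours.
    """
    total = 0.0
    for _ in range(num_tours):
        node = anchor
        while True:
            # Weight each visit by 1/degree to undo the walk's
            # bias toward high-degree nodes.
            total += g(node) / len(nbrs[node])
            node = rng.choice(nbrs[node])
            if node == anchor:
                break
    return len(nbrs[anchor]) * total / num_tours


if __name__ == "__main__":
    rng = random.Random(0)
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    # With g = 1 the target is the number of nodes (3 here).
    print(tour_estimate_sum(triangle, lambda v: 1.0, 0, 20000, rng))
```

Choosing a well-connected anchor keeps tours short, which is one reason a high-degree node is a natural regeneration point in this kind of scheme.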
|