61

Microeconometric Models with Endogeneity -- Theoretical and Empirical Studies

Dong, Yingying January 2009
Thesis advisor: Arthur Lewbel / This dissertation consists of three independent essays in applied microeconomics and econometrics. Essay 1 investigates why individuals with health insurance use more health care. One obvious reason is that health care is cheaper for the insured. But additionally, having insurance can encourage unhealthy behavior via moral hazard. The effect of health insurance on medical utilization has been extensively studied; however, previous work has mostly ignored the effect of insurance on behavior and how that in turn affects medical utilization. This essay examines these distinct effects. The increased medical utilization due to reduced prices may help the insured maintain good health, while that due to increased unhealthy behavior does not, so distinguishing these two effects has important policy implications. A two-period dynamic forward-looking model is constructed to derive the structural causal relationships among the decision to buy insurance, health behaviors (drinking, smoking, and exercise), and medical utilization. The model shows how exogenous changes in insurance prices and past behaviors can identify the direct and indirect effects of insurance on medical utilization. The empirical analysis also distinguishes between the intensive and extensive margins of the insurance effect (e.g., changes in the amount of alcohol consumed vs. the number of drinkers), which turns out to be empirically important. Health insurance is found to encourage less healthy behavior, particularly heavy drinking, but this does not yield a perceptible short-term increase in doctor or hospital visits. The effects of health insurance are primarily found at the intensive margin: health insurance may not cause a non-drinker to take up drinking, but it encourages a heavy drinker to drink even more. These results suggest that to counteract behavioral moral hazard, health insurance should be coupled with incentives that target individuals who currently engage in unhealthy behaviors, such as heavy drinkers.

Essay 2 examines the effect of repeating kindergarten on the retained children's academic performance. Although most existing research concludes that grade retention generates no benefits for retainees' later academic performance, holding low-achieving children back has been a popular practice for decades. Drawing on a recently collected, nationally representative US data set, this essay estimates the causal effect of kindergarten retention on the retained children's later academic performance. Since children are observed being held back only when they enroll in schools that permit retention, the essay jointly models 1) the decision to enter a school that allows kindergarten retention, 2) the decision to hold a child back in kindergarten, and 3) children's academic performance in higher grades. The retention treatment is modeled as a binary choice with sample selection. The outcome equations are linear regressions that include the kindergarten retention dummy as an endogenous regressor with a correlated random coefficient. A control function estimator is developed for the resulting double-hurdle treatment model, which allows for unobserved heterogeneity in the retention effect. As a comparison, a nonparametric bias-corrected nearest neighbor matching estimator is also implemented. Holding children back in kindergarten is found to have positive but diminishing effects on their academic performance up to the third grade.
Essay 3 proves the semiparametric identification of a binary choice model with an endogenous regressor, without relying on outside instruments. A simple estimator and a test for endogeneity are provided based on this identification. These results are applied to analyze working-age males' migration within the US, where labor income is potentially endogenous. Identification relies on the fact that the migration probability among workers is close to linear in age, while labor income is nonlinear in age (when both are nonparametrically estimated). Using data from the PSID, this study finds that labor income is endogenous and that ignoring this endogeneity leads to downward bias in the estimated effect of labor income on the migration probability. / Thesis (PhD) — Boston College, 2009. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Economics.
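To make the control-function idea behind Essay 3 concrete, here is a minimal Python sketch (using statsmodels, on simulated PSID-style data): labor income is regressed on a nonlinear function of age in a first stage, and the first-stage residual is then included in a probit for the migration decision, so the significance of the residual's coefficient doubles as an endogeneity test. The variable names, the polynomial first stage, and the data are illustrative assumptions, not the essay's exact specification.

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in for PSID-style data (all names are hypothetical).
rng = np.random.default_rng(0)
n = 5_000
age = rng.uniform(25, 60, n)
u = rng.normal(size=n)                      # unobserved heterogeneity
income = 2 + 0.12 * age - 0.0012 * age**2 + 0.5 * u + rng.normal(size=n)
migrate = (0.5 - 0.02 * age - 0.3 * income + u + rng.normal(size=n) > 0)

# Stage 1: income is nonlinear in age, so the nonlinear component of age
# provides the identifying variation (no outside instrument required).
X1 = sm.add_constant(np.column_stack([age, age**2]))
stage1 = sm.OLS(income, X1).fit()
resid = stage1.resid

# Stage 2: probit for migration, linear in age, with the control
# function (the first-stage residual) added as a regressor.
X2 = sm.add_constant(np.column_stack([age, income, resid]))
probit = sm.Probit(migrate.astype(float), X2).fit(disp=0)
print(probit.summary())  # a significant residual term indicates endogeneity
```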
62

Learning deep embeddings by learning to rank

He, Kun 05 February 2019
We study the problem of embedding high-dimensional visual data into low-dimensional vector representations. This is an important component in many computer vision applications involving nearest neighbor retrieval, as embedding techniques not only perform dimensionality reduction, but can also capture task-specific semantic similarities. In this thesis, we use deep neural networks to learn vector embeddings, and develop a gradient-based optimization framework capable of optimizing ranking-based retrieval performance metrics, such as the widely used Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We apply this framework in three settings. First, we study Supervised Hashing, which is concerned with learning compact binary vector embeddings for fast retrieval, and propose two novel solutions. The first solution optimizes Mutual Information as a surrogate ranking objective, while the other directly optimizes AP and NDCG, based on the discovery of their closed-form expressions for discrete Hamming distances. These optimization problems are NP-hard, so we derive continuous relaxations to enable gradient-based optimization with neural networks. Our solutions establish the state of the art on several image retrieval benchmarks. Next, we learn deep neural networks to extract Local Feature Descriptors from image patches. Local features are used universally in low-level computer vision tasks that involve sparse feature matching, such as image registration and 3D reconstruction, and their matching is a nearest neighbor retrieval problem. We leverage our AP optimization technique to learn both binary and real-valued descriptors for local image patches. Compared to competing approaches, our solution eliminates complex heuristics and performs more accurately in the tasks of patch verification, patch retrieval, and image matching. Lastly, we tackle Deep Metric Learning, the general problem of learning real-valued vector embeddings using deep neural networks. We propose a learning-to-rank solution that optimizes a novel quantization-based approximation of AP. For downstream tasks such as retrieval and clustering, we demonstrate promising results on standard benchmarks, especially in the few-shot learning scenario, where the number of labeled examples per class is limited.
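To illustrate what a quantization-based AP approximation can look like, here is a minimal NumPy sketch of the general idea: distances from a query are soft-assigned to histogram bins with a triangular kernel, and AP is accumulated bin by bin from the cumulative precision. This is a simplified illustration of the approach, not the thesis's exact formulation; the bin count and kernel are arbitrary choices.

```python
import numpy as np

def soft_histogram_ap(dist, relevant, n_bins=10):
    """Quantization-based approximation of Average Precision (AP).

    dist: (N,) distances from the query to N database items.
    relevant: (N,) binary relevance labels.
    Distances are soft-assigned to histogram bins with a triangular
    kernel, which makes the approximation differentiable in `dist`.
    """
    lo, hi = dist.min(), dist.max()
    centers = np.linspace(lo, hi, n_bins)        # bin centers
    delta = (hi - lo) / (n_bins - 1)             # bin spacing
    # Soft assignment: weight decays linearly to zero one bin away.
    w = np.clip(1.0 - np.abs(dist[None, :] - centers[:, None]) / delta, 0, 1)
    h_pos = w @ relevant                         # relevant mass per bin
    h_all = w.sum(axis=1)                        # total mass per bin
    H_pos, H_all = np.cumsum(h_pos), np.cumsum(h_all)
    n_pos = relevant.sum()
    # AP ~= sum over bins of (precision at bin) * (recall increment).
    return float(np.sum(h_pos * H_pos / np.maximum(H_all, 1e-12)) / n_pos)

# Toy check: relevant items closer than irrelevant ones -> AP near 1.
rng = np.random.default_rng(0)
d = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(1, 2, 50)])
r = np.concatenate([np.ones(50), np.zeros(50)])
print(soft_histogram_ap(d, r))
```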
63

Automatic text categorization for information filtering.

January 1998
Ho Chao Yang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 157-163). / Abstract also in Chinese. / Abstract --- p.i / Acknowledgment --- p.iii / List of Figures --- p.viii / List of Tables --- p.xiv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic Document Categorization --- p.1 / Chapter 1.2 --- Information Filtering --- p.3 / Chapter 1.3 --- Contributions --- p.6 / Chapter 1.4 --- Organization of the Thesis --- p.7 / Chapter 2 --- Related Work --- p.9 / Chapter 2.1 --- Existing Automatic Document Categorization Approaches --- p.9 / Chapter 2.1.1 --- Rule-Based Approach --- p.10 / Chapter 2.1.2 --- Similarity-Based Approach --- p.13 / Chapter 2.2 --- Existing Information Filtering Approaches --- p.19 / Chapter 2.2.1 --- Information Filtering Systems --- p.19 / Chapter 2.2.2 --- Filtering in TREC --- p.21 / Chapter 3 --- Document Pre-Processing --- p.23 / Chapter 3.1 --- Document Representation --- p.23 / Chapter 3.2 --- Classification Scheme Learning Strategy --- p.26 / Chapter 4 --- A New Approach - IBRI --- p.31 / Chapter 4.1 --- Overview of Our New IBRI Approach --- p.31 / Chapter 4.2 --- The IBRI Representation and Definitions --- p.34 / Chapter 4.3 --- The IBRI Learning Algorithm --- p.37 / Chapter 5 --- IBRI Experiments --- p.43 / Chapter 5.1 --- Experimental Setup --- p.43 / Chapter 5.2 --- Evaluation Metric --- p.45 / Chapter 5.3 --- Results --- p.46 / Chapter 6 --- A New Approach - GIS --- p.50 / Chapter 6.1 --- Motivation of GIS --- p.50 / Chapter 6.2 --- Similarity-Based Learning --- p.51 / Chapter 6.3 --- The Generalized Instance Set Algorithm (GIS) --- p.58 / Chapter 6.4 --- Using GIS Classifiers for Classification --- p.63 / Chapter 6.5 --- Time Complexity --- p.64 / Chapter 7 --- GIS Experiments --- p.68 / Chapter 7.1 --- Experimental Setup --- p.68 / Chapter 7.2 --- Results --- p.73 / Chapter 8 --- A New Information Filtering Approach Based on GIS --- p.87 / Chapter 8.1 --- Information Filtering Systems --- p.87 / Chapter 8.2 --- GIS-Based Information Filtering --- p.90 / Chapter 9 --- Experiments on GIS-based Information Filtering --- p.95 / Chapter 9.1 --- Experimental Setup --- p.95 / Chapter 9.2 --- Results --- p.100 / Chapter 10 --- Conclusions and Future Work --- p.108 / Chapter 10.1 --- Conclusions --- p.108 / Chapter 10.2 --- Future Work --- p.110 / Chapter A --- Sample Documents in the corpora --- p.111 / Chapter B --- Details of Experimental Results of GIS --- p.120 / Chapter C --- Computational Time of Reuters-21578 Experiments --- p.141
64

Superseding neighbor search on uncertain data. / 在不確定的空間數據庫中尋找最高取代性的最近鄰 / Zai bu que ding de kong jian shu ju ku zhong xun zhao zui gao qu dai xing de zui jin lin

January 2009
Yuen, Sze Man. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves [44]-46). / Abstract also in Chinese. / Thesis Committee --- p.i / Abstract --- p.ii / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- Related Work --- p.6 / Chapter 2.1 --- Nearest Neighbor Search on Precise Data --- p.6 / Chapter 2.2 --- NN Search on Uncertain Data --- p.8 / Chapter 3 --- Problem Definitions and Basic Characteristics --- p.11 / Chapter 4 --- The Full-Graph Approach --- p.16 / Chapter 5 --- The Pipeline Approach --- p.19 / Chapter 5.1 --- The Algorithm --- p.20 / Chapter 5.2 --- Edge Phase --- p.24 / Chapter 5.3 --- Pruning Phase --- p.27 / Chapter 5.4 --- Validating Phase --- p.28 / Chapter 5.5 --- Discussion --- p.29 / Chapter 6 --- Extension --- p.31 / Chapter 7 --- Experiment --- p.34 / Chapter 7.1 --- Properties of the SNN-core --- p.34 / Chapter 7.2 --- Efficiency of Our Algorithms --- p.38 / Chapter 8 --- Conclusions and Future Work --- p.42 / Chapter A --- List of Publications --- p.43 / Bibliography --- p.44
65

Evaluation of decentralized email architecture and social network analysis based on email attachment sharing

Tsipenyuk, Gregory January 2018
Present-day email is provided by centralized services running in the cloud. These services transparently connect users behind middleboxes and provide backup, redundancy, and high availability, at the expense of user privacy. In present-day mobile environments, users can access and modify email from multiple devices, with updates reconciled on the central server. Prioritizing updates is difficult and may be undesirable. Moreover, legacy email protocols do not provide optimal email synchronization and access. The recent Internet of Things (IoT) phenomenon will see the number of interconnected devices grow to 27 billion by 2021. In the first part of my dissertation I propose a decentralized email architecture that takes advantage of a user's IoT devices to maintain a complete email history. This addresses the email reconciliation issue and places data under user control. I replace legacy email protocols with a synchronization protocol that achieves eventual consistency of email and optimizes bandwidth and energy usage. The architecture is evaluated on a Raspberry Pi computer. There is an extensive body of research on Social Network Analysis (SNA) based on email archives. Typically, the analyzed network reflects either communication between users or a relationship between the email and the information found in the email's header and body. This approach discards all or some email attachments that cannot be converted to text, for instance images; yet attachments may account for up to 90% of an email archive's size. In the second part of my dissertation I suggest extracting the network from email attachments shared between users. I hypothesize that the network extracted from shared email attachments may provide more insight into the social structure of the email archive. I evaluate the communication and shared-attachment networks by analyzing common centrality measures and classification and clustering algorithms. I further demonstrate how analysis of the shared-attachment network can be used to optimize the proposed decentralized email architecture.
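As a rough illustration of how a shared-attachment network might be extracted, the sketch below (Python with networkx, over entirely hypothetical message records) keys attachments by content hash, so renamed copies of the same file still link the users who exchanged it, and then computes two of the common centrality measures mentioned above. This is a sketch of the general construction, not the dissertation's pipeline.

```python
import hashlib
import itertools
import networkx as nx

# Hypothetical message records: (sender, [recipients], attachment bytes).
messages = [
    ("alice", ["bob"], b"report-v1"),
    ("bob", ["carol", "dave"], b"report-v1"),
    ("carol", ["dave"], b"slides"),
]

G = nx.Graph()
seen = {}  # content hash -> set of users who sent or received it
for sender, recipients, blob in messages:
    h = hashlib.sha256(blob).hexdigest()
    users = seen.setdefault(h, set())
    users.update([sender], recipients)

# Link every pair of users that touched the same attachment.
for users in seen.values():
    G.add_edges_from(itertools.combinations(sorted(users), 2))

print(nx.degree_centrality(G))
print(nx.betweenness_centrality(G))
```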
66

Sistemáticas de gestão de layout para aprimoramento dos fluxos de uma biblioteca universitária / Systematic approaches to layout management for improving the flows of a university library

Argenta, Aline da Silva January 2017
The service sector is of fundamental importance to the global economy, yet the layout of service organizations is typically not discussed with the same intensity as physical arrangement in industrial environments. The objective of this dissertation is to apply layout design methods to plan and improve the physical arrangement and grouping of resources in a university library. Its specific goals are the application of Systematic Layout Planning (SLP) to position resources and organize a library's flows, and the adaptation of the Close Neighbor algorithm to group bibliographic materials (books) on shelves according to their subject area. To that end, the dissertation first presents the characteristics of the Library of the Faculty of Pharmacy of UFRGS (where the study was conducted), an analysis of the movement of people and materials, the proposed approach, and guidelines for organizing the physical arrangement of the library and its book collection. Among other operational procedures, meetings were held with the library staff and the administration of the Faculty of Pharmacy in order to establish priorities and define the desired characteristics of the physical arrangement of the space under study. The selected layout proposal was then implemented, followed by a discussion of the library's performance before and after the new layout; this discussion was based on both numerical results (quantitative analysis) and the perception of the team involved (qualitative analysis).
67

Learning From Spatially Disjoint Data

Bhadoria, Divya 02 April 2004
Committees of classifiers, also called mixtures or ensembles of classifiers, have become popular because they have the potential to improve on the performance of a single classifier constructed from the same set of training data. Bagging and boosting are some of the better known methods of constructing a committee of classifiers. Committees of classifiers are also important because they have the potential to provide a computationally scalable approach to handling massive datasets. When the emphasis is on computational scalability, the individual classifiers are often constructed from a small fraction of the total data. In this context, the ability to improve on the accuracy of a hypothetical single classifier created from all of the training data may be sacrificed. The design of a committee of classifiers typically assumes that all of the training data is equally available to be assigned to subsets as desired, and that each subset is used to train a classifier in the committee. However, there are important application contexts in which this assumption is not valid. In many real-life situations, massive datasets are created on a distributed computer, recording the simulation of important physical processes. Currently, experts visually browse such datasets to search for interesting events in the simulation. This manual search for interesting events in massive datasets is time-consuming; one would therefore like a classifier that could automatically label the "interesting" events. The problem is that the dataset is distributed across a large number of processors in chunks that are spatially homogeneous with respect to the underlying physical context of the simulation. Here, a potential solution to this problem using ensembles is explored.
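A minimal sketch of the committee construction under discussion, assuming scikit-learn and a synthetic stand-in for the distributed simulation data: the data is sorted on one feature to mimic spatially homogeneous chunks, one classifier is trained per chunk, and predictions are combined by majority vote. This illustrates the general idea, not the specific solution developed in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for a spatially partitioned simulation dataset:
# each "chunk" lives on one processor and is NOT a random sample.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
order = np.argsort(X[:, 0])            # sort on one feature to mimic
X, y = X[order], y[order]              # spatially homogeneous chunks
chunks = np.array_split(np.arange(len(y)), 10)

# Train one committee member per chunk.
committee = [
    DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    for idx in chunks
]

def vote(X_new):
    # Combine the members' predictions by simple majority vote.
    preds = np.stack([m.predict(X_new) for m in committee])
    return (preds.mean(axis=0) > 0.5).astype(int)

print((vote(X) == y).mean())  # training-set agreement, illustrative only
```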
68

Brain Tumor Target Volume Determination for Radiation Therapy Treatment Planning Through the Use of Automated MRI Segmentation

Mazzara, Gloria Patrika 27 February 2004
Radiation therapy seeks to effectively irradiate the tumor cells while minimizing the dose to adjacent normal cells. Prior research found that the low success rates for treating brain tumors would be improved with higher radiation doses to the tumor area. This is feasible only if the target volume can be precisely identified. However, the definition of tumor volume is still based on time-intensive, highly subjective manual outlining by radiation oncologists. In this study the effectiveness of two automated Magnetic Resonance Imaging (MRI) segmentation methods, k-Nearest Neighbors (kNN) and Knowledge-Guided (KG), in determining the Gross Tumor Volume (GTV) of brain tumors for use in radiation therapy was assessed. Three criteria were applied: accuracy of the contours; quality of the resulting treatment plan in terms of dose to the tumor; and a novel treatment plan evaluation technique based on post-treatment images. The kNN method was able to segment all cases, while the KG method was limited to enhancing tumors and gliomas with clear enhancing edges. Various software applications were developed to create a closed, smooth contour encompassing the tumor pixels from the segmentations and to integrate these results into the treatment planning software. A novel, probabilistic measurement of accuracy was introduced to compare the agreement of the segmentation methods with the weighted average physician volume. Both computer methods under-segmented the tumor volume when compared with the physicians but performed within the variability of manual contouring (28% ± 12% inter-operator variability). Computer segmentations were modified vertically to compensate for their under-segmentation. When comparing radiation treatment plans designed from physician-defined tumor volumes with treatment plans developed from the modified segmentation results, the reference target volume was irradiated within the same level of conformity. Analysis of the plans based on post-treatment MRI showed that the segmentation plans provided similar dose coverage to the areas being treated by the original treatment plans. This research demonstrates that computer segmentations provide a feasible route to automatic target volume definition. Because of the lower variability and greater efficiency of the automated techniques, their use could lead to more precise plans and a better prognosis for brain tumor patients.
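For readers unfamiliar with kNN segmentation, the following is a minimal sketch of the voxel-classification idea, assuming scikit-learn and fabricated multi-channel intensity features; the study's actual features, preprocessing, and choice of k are not specified here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: per-voxel MRI intensities (e.g. T1, T2,
# proton density) with labels taken from manually contoured slices.
rng = np.random.default_rng(0)
tumor = rng.normal([1.2, 0.8, 1.0], 0.1, size=(200, 3))
normal = rng.normal([0.6, 0.5, 0.7], 0.1, size=(200, 3))
X_train = np.vstack([tumor, normal])
y_train = np.array([1] * 200 + [0] * 200)

# kNN assigns each voxel the majority label of its k nearest
# training voxels in intensity-feature space.
knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)

# Classify every voxel of a new (flattened) image slice.
volume = rng.normal(0.8, 0.3, size=(32 * 32, 3))
mask = knn.predict(volume).reshape(32, 32)
print(mask.sum(), "voxels labeled as tumor")
```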
69

Efficient Adjacency Queries and Dynamic Refinement for Meshfree Methods with Applications to Explicit Fracture Modeling

Olliff, James 22 June 2018
Meshfree methods provide a more practical approach than the Finite Element Method (FEM) to solving problems involving large deformation and fracture. However, meshfree methods are more computationally intensive than FEM, which can limit their practicality in engineering. Meshfree methods also lack a clear boundary definition, restricting the available visualization techniques. Determining particle locations and attributes such that a consistent approximation is ensured can also be challenging, especially when employing h-refinement. The primary objective of this work is to address the limitations associated with computational efficiency, meshfree domain discretization, and h-refinement, including both the placement of particles and the determination of their attributes. To demonstrate the efficacy of these algorithms, a model predicting the failure of laminated composite structures using a meshfree method will be presented.
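One standard way to make the adjacency (neighbor) queries that meshfree methods depend on efficient is a spatial tree; the sketch below uses SciPy's cKDTree to find, for each particle, all neighbors inside its kernel support radius. This illustrates the type of query involved, not the specific algorithms developed in this work.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical particle cloud: a meshfree method needs, for each
# particle, the neighbors inside its kernel support radius.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(10_000, 3))
support = 0.05

tree = cKDTree(pts)
# All neighbors within the support radius, in one batched query.
neighbors = tree.query_ball_point(pts, r=support)
print("mean neighbors per particle:",
      np.mean([len(n) for n in neighbors]))
```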
70

Defining activity areas in the Early Neolithic site at Foeni-Salaş (southwest Romania): A spatial analytic approach with geographical information systems in archaeology

Lawson, Kathryn Sahara 20 September 2007
Through the years, a great deal of archaeological research has focused on the earliest farming cultures of Europe (i.e., the Early Neolithic). However, little effort has been expended to uncover the type and nature of daily activities performed within Early Neolithic dwellings, particularly in the Balkans. This thesis conducts a spatial analysis of the Early Neolithic pit house levels of the Foeni-Salaş site in southwest Romania, in the northern half of the Balkans, to determine the kinds and locations of activities that occurred in these pit houses. Characteristic Early Neolithic dwellings in the northern Balkans are pit houses. The data are analyzed using Geographic Information Systems (GIS) technology in an attempt to identify non-random patterns that indicate how the pit house inhabitants used their space. Both visual and statistical (Nearest Neighbor) techniques are used to identify spatial patterns. Spreadsheet data are incorporated into the map database in order to compare and contrast the results from the two techniques of analysis. Map data provide precise artefact locations, while spreadsheet data yield more generalized quad centroid information. Unlike the mapped data, the spreadsheet data also include artefacts recovered in sieves. Utilizing both data types gave a fuller and more complex understanding of how space was used at Foeni-Salaş. The results show that different types of activity areas are present within each of the pit houses. Comparison of interior and exterior artefact distributions demonstrates that most activities took place within the pit houses. The activities present include weaving, food preparation, butchering, hide processing, pottery making, ritual, and other activities related to the running of households. These activities were found to be placed in specific locations relative to features within the pit house and the physical structure of the pit house itself. This research adds to the growing body of archaeological research that implements GIS to answer questions and solve problems related to the spatial dimension of human behaviour. / February 2008
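For context, the statistical Nearest Neighbor technique referred to here is commonly the Clark-Evans ratio of observed to expected mean nearest-neighbor distance; the sketch below computes it with SciPy on fabricated artefact coordinates. Whether the thesis uses exactly this formulation is an assumption on my part.

```python
import numpy as np
from scipy.spatial import cKDTree

def clark_evans_r(points, area):
    """Clark-Evans nearest-neighbor ratio R.

    R ~ 1: random pattern; R < 1: clustered; R > 1: dispersed.
    """
    tree = cKDTree(points)
    # k=2 because each point's nearest neighbor at k=1 is itself.
    d, _ = tree.query(points, k=2)
    observed = d[:, 1].mean()
    # Expected mean NN distance under complete spatial randomness.
    expected = 0.5 / np.sqrt(len(points) / area)
    return observed / expected

# Hypothetical artefact coordinates (metres) in a 10 m x 10 m pit house.
rng = np.random.default_rng(0)
clustered = rng.normal(5, 0.5, size=(100, 2))
print(clark_evans_r(clustered, area=100.0))  # well below 1: clustered
```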
