Global ETD Search

11	An Investigation of Multiple Pathways of Developmental Intervention Change Eichas, Kyle Robert 28 June 2010 Convergence among treatment, prevention, and developmental intervention approaches has led to recognition of the need for evaluation models and research designs that employ a full range of evaluation information to provide an empirical basis for enhancing the efficiency, efficacy, and effectiveness of prevention and positive development interventions. This study reports an investigation of a positive youth development (PYD) program using an Outcome Mediation Cascade (OMC) evaluation model, an integrated model for evaluating the empirical intersection between intervention and developmental processes. The Changing Lives Program (CLP) is a community-supported positive youth development intervention implemented in a practice setting as a selective/indicated program for multi-ethnic, multi-problem, at-risk youth in urban alternative high schools. This study used a Relational Data Analysis integration of quantitative and qualitative data analysis strategies, including both fixed and free response measures and a structural equation modeling approach, to construct and evaluate the hypothesized OMC model. Findings indicated that the hypothesized model fit the data (χ2 (7) = 6.991, p = .43; RMSEA = .00; CFI = 1.00; WRMR = .459). Findings also provided preliminary evidence consistent with the hypothesis that, in addition to having effects on targeted positive outcomes, PYD interventions are likely to have progressive cascading effects on untargeted problem outcomes that operate through effects on positive outcomes. Furthermore, the general pattern of findings suggested the need for methods capable of capturing both quantitative and qualitative change in order to increase the likelihood of identifying more complete, theory-informed, empirically supported models of developmental intervention change processes. 
developmental intervention science positive youth development identity development Relational Data Analysis
12	Návrh relační databáze pro obecní knihovnu / Design of Relational Database for Municipal Library Vlk, Jan January 2020 This diploma thesis addresses the problems associated with designing a relational database. It is divided into several parts covering the theoretical basis, an analysis of the current state, and the design of the author's own solution.
13	Statistical Methods for Biological and Relational Data Anderson, Sarah G. 12 July 2013 No description available. Biostatistics gene expression T-cell receptors classification multiple testing relational data social networking
14	Learning probabilistic relational models: a novel approach. / Aprendendo modelos probabilísticos relacionais: uma nova abordagem. Mormille, Luiz Henrique Barbosa 17 August 2018 While most statistical learning methods are designed to work with data stored in a single table, many large datasets are stored in relational database systems. Probabilistic Relational Models (PRMs) extend Bayesian networks by introducing relations and individuals, thus making it possible to represent the information in a relational database. However, learning a PRM from relational data is a more complex task than learning a Bayesian network from "flat" data. The main difficulties that arise while learning a PRM are establishing which dependency structures are legal, searching for possible structures, and scoring them. This thesis develops a novel approach to learning the structure of a PRM, describes a package in the R language that supports the learning framework, and applies it to a real, large-scale scenario: the city of Atibaia, in the state of São Paulo, Brazil. The research is based on a database combining three different tables, each representing one class in the domain of study. The first table contains 27 attributes for 110,816 citizens of Atibaia. The second table contains 9 attributes for 20,162 companies located in the city. Finally, the third table has 8 attributes for 327 census sectors (small territorial units that make up the city of Atibaia). The proposed framework is applied to learn a PRM structure and parameters from the database. The model is used to verify whether the social class of a person can be explained by the location where they live, their neighbors, and the companies nearby. Preliminary experiments were conducted and a paper was published at the 2017 Symposium on Knowledge Discovery, Mining and Learning (KDMiLe). 
The algorithm's performance was further evaluated through extensive experimentation, and a broader study using Serasa Experian data was conducted. Finally, the R package that supports our method was refined and given proper documentation and a tutorial. / Although most statistical learning methods were developed to work with data stored in a single table, many datasets are stored in relational databases. Probabilistic Relational Models (PRMs) extend Bayesian networks by introducing relations and individuals, making it possible to represent the information in a relational database. However, learning a PRM from relational data is a more complex task than learning a Bayesian network from a single table. The main difficulties that arise when learning a PRM are establishing which dependency structures are legal, searching for possible structures, and scoring them. This thesis focuses on developing a new method for learning PRM structures, describes an R package that supports the method, and applies it to a real, large-scale scenario: the city of Atibaia, in the state of São Paulo, Brazil. The research is based on a database combining three distinct tables, each representing one class in the domain of study. The first table contains 27 attributes for 110,816 inhabitants of Atibaia, and the second contains 9 attributes for 20,162 companies in the city. Finally, the third table has 8 attributes for 327 census sectors (small territorial units that make up the city of Atibaia). The proposed approach is applied to learn the structure and parameters of a PRM from this database. The model was used to verify whether a person's social class can be explained by where they live, their neighbors, and the nearby companies. 
Preliminary experiments were conducted and a paper was published at the Symposium on Knowledge Discovery, Mining and Learning (KDMiLe). The algorithm's performance was re-evaluated through extensive experimentation, and a broader study was conducted with Serasa Experian data. Finally, the R package that supports the proposed method was refined, and appropriate documentation and a tutorial were written. Bayesian network Inductive logic programming Data mining Models for stochastic processes Multi-relational data mining Probabilistic graphical models Logic programming
15	An ILP-based Concept Discovery System For Multi-relational Data Mining Kavurucu, Yusuf 01 July 2009 Multi-relational data mining has become popular due to the limitations of propositional problem definitions in structured domains and the tendency to store data in relational databases. However, because patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. To cope with this problem, several relational knowledge discovery systems have been developed that employ various search strategies, heuristics, and language pattern limitations. In this thesis, Inductive Logic Programming (ILP) based concept discovery is studied, and two systems based on a hybrid methodology employing ILP and APRIORI, namely Confidence-based Concept Discovery and Concept Rule Induction System, are proposed. In both systems, the main aim is to relax the strong declarative biases and user-defined specifications. Moreover, the new method works directly on relational databases. In addition, the traditional definition of confidence from the relational database perspective is modified to express the Closed World Assumption in first-order logic. A new confidence-based pruning method based on the improved definition is applied in the APRIORI lattice, and a new hypothesis evaluation criterion is used to express the quality of patterns in the search space. In Concept Rule Induction System, the quality of the constructed rules is further improved by an improved generalization method. Finally, a set of experiments is conducted on real-world problems to compare the performance of the proposed method with that of similar systems in terms of support and confidence. QA Computer Software 76.75-76.765
16	Data Mining For Rule Discovery In Relational Databases Toprak, Serkan 01 September 2004 Most data today is stored in relational databases. However, most data mining algorithms cannot work directly on data stored in relational databases. Instead, they require a preprocessing step that transforms the relational data into an algorithm-specific form. Moreover, several data mining algorithms provide solutions for single relations only. As a result, valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework that provides a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining the record counts needed from the database to calculate measures for patterns. The framework exploits the metadata of relational databases to prune the search space of patterns. The implementation extends the framework by employing the Apriori algorithm to further prune the search space and to discover relational recursive patterns. The Apriori algorithm is used to find the large itemsets of tables, which are then used to refine patterns, and it is modified by changing the support calculation method for itemsets. A method for determining recursive relations is described, and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001, and the results are compared to those of the winning approach.
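The abstract above mentions discretizing continuous attributes with equal-depth partitioning. A minimal sketch of that general technique follows; it is not the thesis's implementation, and the function name `equal_depth_bins` and the sample values are hypothetical, chosen only for illustration.

```python
def equal_depth_bins(values, k):
    """Split values into k bins that hold (nearly) equal numbers of items.

    Hypothetical helper illustrating equal-depth (equal-frequency)
    partitioning; bin sizes differ by at most one item.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(k):
        # Integer arithmetic spreads any remainder across the bins.
        start = i * n // k
        end = (i + 1) * n // k
        bins.append(ordered[start:end])
    return bins


print(equal_depth_bins([5, 1, 9, 3, 7, 2, 8, 4, 6], 3))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Unlike equal-width partitioning, the bin boundaries here depend on the data distribution, so each discretized value covers roughly the same number of records.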
17	A Developmental Intervention Science Outreach Research Approach to Promoting Positive Youth Development Rinaldi, Roberto L 21 March 2011 Recent intervention efforts to promote positive identity in troubled adolescents have begun to draw on the potential for integrating the self-construction and self-discovery perspectives in conceptualizing identity processes, as well as for integrating quantitative and qualitative data analytic strategies. This study reports an investigation of the Changing Lives Program (CLP) using an Outcome Mediation (OM) evaluation model, an integrated model for evaluating targets of intervention, that theoretically incorporates a Self-Transformative Model of Identity Development (STM), a proposed integration of self-discovery and self-construction identity processes. This study also used a Relational Data Analysis (RDA) integration of quantitative and qualitative analysis strategies and a structural equation modeling (SEM) approach to construct and evaluate the hypothesized OM/STM model. The CLP is a community-supported positive youth development intervention targeting multi-problem youth in alternative high schools in the Miami Dade County Public Schools (M-DCPS). The 259 participants for this study were drawn from the CLP’s archival data file. The model evaluated in this study used three indices of core identity processes, (1) personal expressiveness, (2) identity conflict resolution, and (3) informational identity style, which were conceptualized as mediators of the effects of participation in the CLP on change in two qualitative outcome indices of participants’ sense of self and identity. Findings indicated that the model fit the data (χ2 (10) = 3.638, p = .96; RMSEA = .00; CFI = 1.00; WRMR = .299). The pattern of findings supported the use of the STM in conceptualizing identity processes and provided support for the OM design. 
The findings also suggested the need for methods capable of detecting and rendering unique, sample-specific free response data, in ways not possible using fixed response methods alone, to increase the likelihood of identifying emergent core developmental research concepts and constructs in studies of intervention and developmental change over time. Outreach Research Relational Data Analysis Qualitative Identity Positive Youth Development Developmental Intervention Science Developmental Psychology Social and Behavioral Sciences
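As a rough illustration of why a fit such as χ2 (10) = 3.638 is reported together with RMSEA = .00, the sketch below applies a common point-estimate formula, RMSEA = sqrt(max(χ2 - df, 0) / (df · (N - 1))). The function name `rmsea` is hypothetical. The abstract does not say which software or estimator variant produced the reported value (some programs divide by N rather than N - 1), so this is an assumption and not the study's own computation; N = 259 is the participant count stated in the abstract.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA under the assumed (N - 1) convention.

    Whenever chi2 <= df the numerator is clamped to zero, so the
    estimate is exactly 0 regardless of sample size.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values from the abstract: chi2 = 3.638 on 10 degrees of freedom, N = 259.
print(rmsea(3.638, 10, 259))  # 0.0
```

Because the chi-square statistic is smaller than its degrees of freedom, the estimate is zero under either the N or the N - 1 convention.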
18	Semantic Integration across Heterogeneous Databases : Finding Data Correspondences using Agglomerative Hierarchical Clustering and Artificial Neural Networks / Semantisk integrering mellan heterogena databaser : Hitta datakopplingar med hjälp av hierarkisk klustring och artificiella neuronnät Hobro, Mark January 2018 Data integration is an important part of the database field when it comes to database migrations and the merging of data. Research in the area has grown with the addition of machine learning approaches over the last 20 years. Due to the complexity of the research field, no go-to solutions have appeared. Instead, a wide variety of ways of enhancing database migrations have emerged. This thesis examines how well a learning-based solution performs on the semantic integration problem in database migrations. Two algorithms are implemented. The first is based on information retrieval theory, with the goal of yielding a matching result that can be used as a benchmark for measuring the performance of the machine learning algorithm. The machine learning approach groups data with agglomerative hierarchical clustering and then trains a neural network to recognize patterns in the data. This makes it possible to predict potential data correspondences across two databases. The results show that agglomerative hierarchical clustering performs well at grouping the data into classes, which can in turn be used to train a neural network. The matching algorithm gives a high recall of matching tables, but improvements are needed to achieve both high recall and high precision. The conclusion is that the proposed learning-based approach, using agglomerative hierarchical clustering and a neural network, works as a solid base for semi-automating the data integration problem seen in this thesis. However, the solution needs to be enhanced with scenario-specific algorithms and rules to reach the desired performance. 
/ Data integration is an important part of the database field when it comes to database migrations and the merging of data. Research in the area has grown as machine learning has become an attractive approach over the last 20 years. Because of the complexity of the research field, no optimal solutions have been found. Instead, several different techniques have been developed that together can improve database migrations. This thesis examines how well a machine-learning-based solution performs on the data integration problem in database migrations. Two algorithms have been implemented. One is based on information retrieval theory and serves mainly as a performance baseline for the machine-learning-based algorithm. That algorithm consists of a first step in which data is grouped using hierarchical clustering. An artificial neural network is then trained to find patterns in these groupings, in order to predict whether different data instances correspond across two databases. The results show that agglomerative hierarchical clustering performs well at classifying the data used. The results of the matching algorithm show that a large share of the matching tables can be found. However, improvements are needed to achieve both high recall of matches and high precision for the matches found. The conclusion is that a learning-based approach, in this case agglomerative hierarchical clustering followed by training an artificial neural network, works well as a basis for partially automating a data integration problem like the one presented in this thesis. To obtain better results, the solution needs to be enhanced with more situation-specific algorithms and rules. Semantic integration data integration artificial neural networks agglomerative hierarchical clustering heterogeneous databases relational data Computer Sciences Datavetenskap (datalogi)
19	Probabilistic Modeling of Multi-relational and Multivariate Discrete Data Wu, Hao 07 February 2017 Modeling and discovering knowledge from multi-relational and multivariate discrete data is a crucial task that arises in many research and application domains, e.g., text mining, intelligence analysis, epidemiology, and social science. In this dissertation, we study and address three problems involving the modeling of multi-relational discrete data and multivariate multi-response count data, viz. (1) discovering surprising patterns from multi-relational data, (2) constructing a generative model for multivariate categorical data, and (3) simultaneously modeling multivariate multi-response count data and estimating covariance structures between multiple responses. To discover surprising multi-relational patterns, we first study the "where do I start?" problem originating from intelligence analysis. By studying nine methods with origins in association analysis, graph metrics, and probabilistic modeling, we identify several classes of algorithmic strategies that can supply starting points to analysts and thus help them discover interesting multi-relational patterns in datasets. To actually mine for interesting multi-relational patterns, we represent the patterns as dense and well-connected chains of biclusters over multiple relations, and model the discrete data by the maximum entropy principle, so that we can gauge, in a statistically well-founded way, the surprisingness of a discovered bicluster chain with respect to what we already know. We design an algorithm for approximating the most informative multi-relational patterns, and provide strategies to incrementally organize discovered patterns into the background model. We illustrate how our method is adept at discovering the hidden plot in multiple synthetic and real-world intelligence analysis datasets. 
Our approach naturally generalizes traditional attribute-based maximum entropy models for single relations, and further supports iterative, human-in-the-loop knowledge discovery. To build a generative model for multivariate categorical data, we apply the maximum entropy principle to propose a categorical maximum entropy model that, in a statistically well-founded way, makes optimal use of given prior information about the data and is unbiased otherwise. In general, inferring the maximum entropy model can be infeasible in practice. Here, we leverage the structure of the categorical data space to design an efficient inference algorithm for estimating the categorical maximum entropy model, and we demonstrate how the proposed model is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application. Modeling data with multivariate count responses is a challenging problem due to the discrete nature of the responses. Existing methods for univariate count responses cannot be easily extended to the multivariate case, since the dependency among multiple responses needs to be properly accounted for. To model multivariate data with multiple count responses, we propose a novel multivariate Poisson log-normal model (MVPLN). By simultaneously estimating the regression coefficients and the inverse covariance matrix over the latent variables with an efficient Monte Carlo EM algorithm, the proposed model takes advantage of the associations among multiple count responses to improve prediction accuracy. Simulation studies and applications to real-world data are conducted to systematically evaluate the performance of the proposed method in comparison with conventional methods. / Ph. D. / In this decade of big data, massive amounts of data of various types are generated every day in different research areas and industry sectors. 
Among all these types of data, text data, i.e. text documents, are important to many research and real-world applications. One challenge in analyzing massive text data is deciding which documents to investigate first to start the analysis, and how to identify the stories and plots, if any, hidden inside the documents. For example, in intelligence analysis, some common questions that analysts ask when examining intelligence documents are ‘How is a suspect connected to the passenger manifest on this flight?’ and ‘How do distributed terrorist cells interface with each other?’. This crucial task is known as storytelling. In the first half of this dissertation, we study this problem and design mathematical models and computer algorithms that automatically identify useful information in text data to help analysts discover hidden stories and plots in massive collections of text documents. We also incorporate visual analytics techniques and design a visualization system that supports human-in-the-loop exploratory data analysis, so that analysts can interact with the algorithms and models iteratively to investigate given datasets. In the second half of this dissertation, we study two problems that arise in the domain of public health. When an epidemic of a certain disease occurs, e.g. during flu season, public health officials need to set policies in advance to prevent or alleviate the epidemic. A data-driven approach is to base such public health policies on simulation results and predictions derived from historical data. One problem commonly faced in epidemic simulation is that researchers would like to run simulations with real-world data, so that the results are close to real-world scenarios, while at the same time protecting the private information of individuals. To solve this problem, we design and implement a mathematical model that can generate a realistic synthetic population using U.S. 
Census Survey data to help conduct the epidemic simulation. Using influenza as an example, we also propose a mathematical model to study associations between different types of flu using information collected from social media such as Twitter. We believe that identifying such associations will help officials make appropriate public health policies. Multivariate Discrete Data Multi-relational Data Maximum Entropy Modeling Subjective Interestingness Latent Variable Model Multivariate Poisson Regression Covariance Estimation.
20	Development Of A GIS-based Monitoring And Management System For Underground Mining Safety Salap, Seda 01 September 2008 Mine safety is of paramount concern to the mining industry. Developing a Geographic Information System (GIS) that can efficiently administer the relevant spatial data and metadata of underground mining safety is therefore vital. In an effort to balance safety and productivity, GIS can contribute to the creation of a safe working environment in underground (U/G) mining. Such a system should support continuous risk analysis and be designed for use in emergencies. The safety concept requires three fundamental components: (i) constructive safety, (ii) surveillance and maintenance, and (iii) emergency. The implementation has to be carried out in a web-based Geographic Information System. The process first defines the safety concept as the application domain model; a conceptual model was then generated in the form of Entity-Relationship Diagrams. After the logical model was implemented, a user interface was developed and the GIS was tested. Finally, the question remains whether the approach used can be extended to a national GIS infrastructure. GE Environmental Sciences 1-140
