  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
241

Rewiring Police Officer Training Networks to Reduce Forecasted Use of Force

Ritika Pandey (9147281) 30 August 2023 (has links)
Police use of force has become a topic of significant concern, particularly given its disparate impact on communities of color. Research has shown that police officer-involved shootings, misconduct, and excessive use of force complaints exhibit network effects: officers are at greater risk of being involved in these incidents when they socialize with officers who have a history of use of force and misconduct. Given that use of force and misconduct appear to be transmissible across police networks, we ask whether police networks can be altered, in a limited scope, to reduce use of force and misconduct events.

In this work, we analyze a novel dataset from the Indianapolis Metropolitan Police Department on officer field training, subsequent use of force, and the role of network effects from field training officers. We construct a network survival model for analyzing the time to use of force incidents involving new police trainees. The model includes network effects capturing the diffusion of risk from field training officers (FTOs) to trainees. We then introduce a network rewiring algorithm to maximize the expected time to use of force events upon completion of field training. We study several versions of the algorithm, including constraints that encourage demographic diversity of FTOs. The results show that FTO use of force history is the best predictor of a trainee's time to use of force in the survival model, and that rewiring the network can increase the expected time (in days) to a recruit's first use of force incident by 8%. We conclude by discussing the potential benefits and challenges of implementing such an algorithm in practice.
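The abstract does not give the model's functional form or the rewiring procedure. As a rough illustrative sketch only, the snippet below pairs recruits with field training officers under an assumed exponential-hazard relationship between an FTO's use-of-force history and a trainee's expected time to a first incident. The function names, the `risk_factor` parameter, and the one-recruit-per-FTO constraint are all hypothetical, not taken from the thesis.

```python
def expected_time_to_uof(base_time, fto_history_count, risk_factor=0.05):
    """Expected days to a recruit's first use-of-force event under a
    toy exponential-hazard assumption: each prior incident in the
    FTO's history multiplies the hazard by (1 + risk_factor)."""
    return base_time / (1.0 + risk_factor) ** fto_history_count

def greedy_rewire(recruits, ftos):
    """Assign each recruit to the lowest-risk still-available FTO
    (one recruit per FTO), a greedy proxy for maximizing the summed
    expected time to a first incident."""
    # Sort FTOs by use-of-force history so low-risk FTOs are used first.
    available = sorted(ftos, key=lambda f: f["uof_history"])
    return {recruit: fto["name"] for recruit, fto in zip(recruits, available)}
```

A real implementation would optimize the full network-survival likelihood and enforce the demographic-diversity constraints the abstract mentions; the greedy pairing above only conveys the rewiring idea.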
242

A new model for worm detection and response. Development and evaluation of a new model based on knowledge discovery and data mining techniques to detect and respond to worm infection by integrating incident response, security metrics and apoptosis.

Mohd Saudi, Madihah January 2011 (has links)
Worms have improved and now integrate a range of sophisticated techniques, making detection and response much harder and slower than in the past. This thesis therefore builds a STAKCERT (Starter Kit for Computer Emergency Response Team) model to detect worm attacks and respond to worms more efficiently. The novelty and strength of the STAKCERT model lie in its method, which consists of the STAKCERT KDD processes and the development of the STAKCERT worm classification, the STAKCERT relational model, and the STAKCERT worm apoptosis algorithm. The new concept introduced in this model, named apoptosis, is borrowed from the human immune system and mapped onto a security perspective. The encouraging results achieved by this research are validated by applying security metrics to assign the weight and severity values that trigger apoptosis. To optimise performance, the model uses standard operating procedures (SOP) for worm incident response involving static and dynamic analyses, knowledge discovery in databases (KDD) techniques for modelling, and data mining algorithms. The STAKCERT model has produced encouraging results and outperformed comparable existing work on worm detection, with an overall accuracy rate of 98.75%, a 0.2% false positive rate, and a 1.45% false negative rate. Worm response achieved an accuracy rate of 98.08%, which other researchers can use as a baseline for comparison in future work. / Ministry of Higher Education, Malaysia and Universiti Sains Islam Malaysia (USIM)
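The reported accuracy, false positive, and false negative rates follow from standard confusion-matrix definitions; a minimal sketch (the example counts are illustrative, not from the thesis):

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary worm detector:
    overall accuracy, false positive rate, and false negative rate."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    fpr = fp / (fp + tn)  # fraction of benign samples flagged as worms
    fnr = fn / (fn + tp)  # fraction of worm samples the detector missed
    return accuracy, fpr, fnr
```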
243

Learning From Data Across Domains: Enhancing Human and Machine Understanding of Data From the Wild

Sean Michael Kulinski (17593182) 13 December 2023 (has links)
Data is collected everywhere in our world; however, it is often noisy and incomplete. Different sources of data may have different characteristics or quality levels, or come from dynamic and diverse environments. This poses challenges both for humans who want to gain insights from data and for machines that learn patterns from it. How can we leverage the diversity of data across domains to enhance our understanding and decision-making? In this thesis, we address this question by proposing novel methods and applications that use multiple domains as more holistic sources of information for both human and machine learning tasks. For example, to help human operators understand environmental dynamics, we show how distribution shifts can be detected and localized to problematic features, and how interpretable distributional mappings can explain the differences between shifted distributions. For robustifying machine learning, we propose a causal-inspired method to find latent factors that are robust to environmental changes and can be used for counterfactual generation or domain-independent training; we propose a domain generalization framework that allows for fast and scalable models robust to distribution shift; and we introduce a new dataset, based on human matches in StarCraft II, that exhibits complex and shifting multi-agent behaviors. We showcase our methods across domains such as healthcare, natural language processing (NLP), and computer vision (CV) to demonstrate that learning from data across domains can lead to more faithful representations of data and its generating environments for both humans and machines.
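The abstract does not specify its shift-detection machinery. As one hedged illustration, a per-feature two-sample Kolmogorov-Smirnov statistic is a common way to detect a distribution shift and localize it to problematic features; the function names and threshold below are assumptions for the sketch, not the thesis's method.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the two empirical CDFs, usable as a per-feature shift score in [0, 1]."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

def localize_shift(reference, current, threshold=0.3):
    """Flag features whose marginal distribution shifted beyond threshold."""
    return [name for name in reference
            if ks_statistic(reference[name], current[name]) > threshold]
```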
244

TEMPORAL EVENT MODELING OF SOCIAL HARM WITH HIGH DIMENSIONAL AND LATENT COVARIATES

Xueying Liu (13118850) 09 September 2022 (has links)
The counting process is fundamental to many real-world problems involving event data. The Poisson process, used as the background intensity of the Hawkes process, is the most commonly used point process. The Hawkes process, a self-exciting point process, fits temporal event data, spatio-temporal event data, and event data with covariates. We study a Hawkes process fit to heterogeneous drug overdose data via a novel semi-parametric approach. The counting process is also related to survival data, since both study the occurrence of events over time. We fit a Cox model to temporal event data with a large corpus processed into high-dimensional covariates, and study the significant features that influence the intensity of events.
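The self-exciting Hawkes intensity mentioned above has the standard form λ(t) = μ + Σ_{t_i < t} α·exp(−β(t − t_i)), where μ is the Poisson background rate and each past event adds an exponentially decaying boost. A minimal sketch (the parameter values are illustrative, not fitted):

```python
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity of a Hawkes process with exponential kernel:
    lambda(t) = mu + sum over past events t_i < t of alpha*exp(-beta*(t - t_i)).
    mu is the Poisson background rate; alpha scales each event's boost and
    beta controls how quickly the excitation decays."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in history if ti < t)
```

With no history the intensity reduces to the background rate μ; each recent event pushes it above μ, which is what makes the process self-exciting.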
245

SPATIAL-SPECTRAL ANALYSIS FOR THE IDENTIFICATION OF CROP NITROGEN DEFICIENCY BASED ON HIGH-RESOLUTION HYPERSPECTRAL LEAF IMAGES

Zhihang Song (8764215) 26 April 2024 (has links)
Among the major row crops in the United States, corn and soybeans stand out for their high nutritional value and economic importance. Achieving optimal yields is constrained by the challenge of fertilizer management: many fields lose yield to insufficient mineral nutrients such as nitrogen (N), while excessive fertilization raises costs and environmental risks. The critical issue is accurately determining fertilizer quantity and timing, underscoring the need for precise, early-stage diagnostics. Emerging high-throughput plant phenotyping techniques, notably hyperspectral imaging (HSI), have been increasingly used to identify plants' responses to abiotic or biotic stresses. A variety of HSI systems have been developed, such as airborne imaging systems and indoor imaging stations, but the signal quality of most current systems is compromised by environmental factors. To address this, a handheld hyperspectral imager known as LeafSpec was recently developed at Purdue University; it can scan corn or soybean leaves at exceptional spatial and spectral resolutions, improving plant phenotyping quality at reduced cost. Most current HSI data processing methods focus on spectral features but rarely consider spatially distributed information. The objective of this work was therefore to develop a methodology that uses spatial-spectral features for accurate and reliable diagnosis of crop N stress. The key innovations are the design of spatial-spectral features based on leaf venation structures and a feature mining method for predicting plant nitrogen condition. First, a novel analysis method called the Natural Leaf Coordinate System (NLCS) was developed to reallocate leaf pixels and analyze nutrient stress using pixels' locations relative to the venation structure. A new nitrogen prediction index for soybean plants, NLCS-N, was developed; it outperformed the conventional averaged vegetation index (Avg. NDVI) in distinguishing healthy from nitrogen-stressed plants (lower t-test p-values) and in predicting plant nitrogen concentration (PNC, higher R-squared values). In one test case, the p-value improved from 2.1×10⁻³ (Avg. NDVI) to 6.92×10⁻¹² (NLCS-N), and the R-squared value from 0.314 to 0.565. Second, a corn leaf venation segmentation algorithm was developed to separate the venation structure from a corn leaf LeafSpec image, which was then used to generate 3930 spatial-spectral (S-S) features. While the S-S features could serve directly as inputs to a PNC prediction model, a feature selection mechanism was developed to improve model accuracy in terms of reduced cross-validation error; in one test case, the cross-validation root mean squared error was reduced from 0.273 (leaf mean spectra) to 0.127 (selected features). Third, several novel spatial-spectral indexes for corn leaves were developed based on color distributions at the venation level. The top-performing indexes were selected through a ranking system based on Cohen's d values and R-squared values, yielding a best-performing S-S N prediction index with an R-squared of 0.861 for predicting corn PNC in a field assay. The discussion sections provide insight into how a robust PNC prediction index can be developed and related to plant science. The methodologies outlined offer a framework for broader applications of spatial-spectral analysis using leaf-level hyperspectral imagery, serving as a guide for scientists and researchers customizing future studies in this field.
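For reference, the conventional Avg. NDVI baseline that NLCS-N is compared against averages the per-pixel index NDVI = (NIR − Red)/(NIR + Red) over all leaf pixels, discarding their spatial arrangement. A minimal sketch (the `(nir, red)` pixel representation is an assumption for illustration):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel, in [-1, 1]."""
    return (nir - red) / (nir + red)

def avg_ndvi(pixels):
    """Leaf-level baseline: NDVI averaged over all leaf pixels, given as
    (nir, red) reflectance pairs. This average discards spatial layout --
    exactly the venation-relative information NLCS-N is designed to keep."""
    values = [ndvi(nir, red) for nir, red in pixels]
    return sum(values) / len(values)
```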
246

Contributions to Engineering Big Data Transformation, Visualisation and Analytics. Adapted Knowledge Discovery Techniques for Multiple Inconsistent Heterogeneous Data in the Domain of Engine Testing

Jenkins, Natasha N. January 2022 (has links)
In the automotive sector, engine testing generates vast data volumes that are mainly beneficial to requesting engineers. However, these tests are often not revisited for further analysis due to inconsistent data quality and a lack of structured assessment methods. Moreover, the absence of a tailored knowledge discovery process hinders effective preprocessing, transformation, analytics, and visualization of data, restricting the potential for historical data insights. Another challenge arises from the heterogeneous nature of test structures, resulting in varying measurements, data types, and contextual requirements across different engine test datasets. This thesis aims to overcome these obstacles by introducing a specialized knowledge discovery approach for the distinctive Multiple Inconsistent Heterogeneous Data (MIHData) format characteristic of engine testing. The proposed methods include adapting data quality assessment and reporting, classifying engine types through compositional features, employing modified dendrogram similarity measures for classification, performing customized feature extraction, transformation, and structuring, generating and manipulating synthetic images to enhance data visualization, and applying adapted list-based indexing for multivariate engine test summary data searches. The thesis demonstrates how these techniques enable exploratory analysis, visualization, and classification, presenting a practical framework to extract meaningful insights from historical data within the engineering domain. The ultimate objective is to facilitate the reuse of past data resources, contributing to informed decision-making processes and enhancing comprehension within the automotive industry. Through its focus on data quality, heterogeneity, and knowledge discovery, this research establishes a foundation for optimized utilization of historical Engine Test Data (ETD) for improved insights. / Soroptimist International Bradford
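The thesis adapts data quality assessment and reporting to engine test data; one common dimension of such an assessment is field completeness. The sketch below is a generic illustration of that idea only, not the thesis's actual metric, and the measurement field names are hypothetical.

```python
def completeness(records, required_fields):
    """Fraction of test records that contain a non-null value for every
    required measurement field -- one simple dimension of a data quality
    report over heterogeneous engine test datasets."""
    if not records:
        return 0.0
    ok = sum(all(f in r and r[f] is not None for f in required_fields)
             for r in records)
    return ok / len(records)
```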
247

從電子化政府建立政府統計知識挖掘系統模型架構之研究~以內政統計為例 / Research into a System Framework for Knowledge Discovery in the Context of Statistics Tasks within e-Government – on Examples of Interior Statistic

江欣容, Chiang, Hsin Jung Unknown Date (has links)
各國政府為提高國際競爭優勢,紛紛積極推動「電子化政府」。我國電子化政府建設自八十六年起開始推動,迄今已經行政院擴大為e-Taiwan計畫。電子化政府推動之業務電腦化,帶動政府業務資訊系統的快速發展,其彙集而成之大型資料庫,為政府統計工作帶來莫大的發展契機。 本研究從電子化政府的過程、內政業務行政程序、知識挖掘及採勘方法,提出參考資料模型,可能的統計軟體工具以及電子化政府中知識發現的實驗架構。再者,本研究藉臺閩地區外籍與大陸配偶結婚登記資料集,運用各種群集分析如K-means、ANN、TwoStep等,並利用我國人口數時間序列採用多模式方法進行人口預測,並將前述分析結果回饋資料庫,最後,作者實現一個知識發現系統雛型,其中包含了前端資料庫、資料集、知識庫以及EIS使用介面。 本研究成果總結如下:(1)資料挖掘工作產出之知識,除真實呈現社會現象外,亦作為政府政策之指南;(2)在本研究發展之系統中,新興資料挖掘技術及傳統資料分析方法,二者相輔相成;(3)某些資料挖掘技術適合相符的資料型態,例如文中人口預測資料較適合指數平滑法勝於ANN,亦即,我們可以籍由多模式分析比較其結果,來達到更佳的效果;(4)藉由知識庫模型的建立達成知識創造、共享與管理的目標;(5)資料挖掘工作可以回饋改善資訊系統或業務缺失。 / In order to enhance their international competitive advantage, most government authorities around the world are engaged in realizing e-Government platforms. The ROC Government began developing its e-Government infrastructure in 1997, and the effort has since been expanded by the Executive Yuan into the e-Taiwan Project. The computerization of administrative processes within various government agencies has pushed forward the fast development of administrative information systems and, through the very large databases they accumulate, created great opportunities for government statistics. Starting from a survey of e-Government developments, administrative processes for interior affairs, and knowledge mining and discovery techniques, this study proposes reference data models, potential statistical software tools, and an experimental framework for knowledge discovery in the context of e-Government. Next, this study experiments with applying clustering techniques such as K-means, ANN, and TwoStep to a data mart on marriages of foreign spouses (including citizens from Mainland China) in Taiwan, and with a multi-mode approach to population forecasting on Taiwan's population time series. The results of these analyses are fed back into the back-end database. Finally, the author implements a prototype knowledge discovery system that includes a front-end database, data marts, a knowledge base, and interfaces to an EIS. The results of the research can be summarized as follows: 1. Knowledge derived by data mining can represent social events and affairs and serve as a guideline for developing government policies. 2. Modern data mining techniques and classical data analysis approaches complement each other in the system developed in this research. 3. Certain mining techniques suit corresponding data patterns; for example, exponential smoothing was more suitable than ANN for our population data, meaning that better results can often be achieved through multi-mode analysis and comparison of the outputs of different modes. 4. Knowledge creation, sharing, and management can be achieved through the knowledge discovery processes on the framework developed in this research. 5. Erroneous raw data identified in the mining output can be fed back to the data source to improve its quality.
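Finding (3) above reports that exponential smoothing outperformed ANN for the population time series. Simple exponential smoothing updates a running forecast as s_t = α·x_t + (1 − α)·s_{t−1}, with the last smoothed value serving as the one-step-ahead forecast; a minimal sketch (the α value is illustrative, and the thesis may use a more elaborate smoothing variant):

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: each smoothed value is a weighted
    average of the newest observation and the previous smoothed value.
    Returns the final smoothed value, the one-step-ahead forecast."""
    smoothed = series[0]
    for x in series[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed
```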
248

Metodika vývoje a nasazování Business Intelligence v malých a středních podnicích / Methodology of development and deployment of Business Intelligence solutions in Small and Medium Sized Enterprises

Rydzi, Daniel January 2005 (has links)
This dissertation deals with the development and implementation of Business Intelligence (BI) solutions for Small and Medium Sized Enterprises (SME) in the Czech Republic. It represents the culmination of the author's efforts to date toward a methodological model for developing this kind of application for SMEs using in-house skills and a minimum of external resources and costs. The thesis can be divided into five major parts. The first part, which describes the technologies used, is divided into two chapters: the first describes the contemporary state of the Business Intelligence concept and contains an original taxonomy of Business Intelligence solutions; the second describes the two Knowledge Discovery in Databases (KDD) techniques used to build the BI solutions introduced in the case studies. The second part describes the area of Czech SMEs, the environment in which the thesis was written and to which it is meant to contribute. One chapter defines how SMEs differ from large corporations, and the author's reasons for focusing on this area are explained. The third major part presents the results of a survey conducted among Czech SMEs with the support of the Department of Information Technologies of the Faculty of Informatics and Statistics, University of Economics in Prague. The survey had three objectives: to map the readiness of Czech SMEs for developing and deploying BI solutions; to determine the major problems and consequent decisions of Czech SMEs that could be supported by BI solutions; and to determine the top factors preventing SMEs from developing and deploying BI solutions. The fourth part is the core of the thesis: two chapters describe the original methodology for development and deployment of BI solutions by SMEs, as well as the other methodologies that were studied. The original methodology is partly based on the well-known CRISP-DM methodology. Finally, the last part describes the particular company that became a testing ground for the author's theories and supports his research, and presents case studies of the development and deployment of BI solutions in this company, built using contemporary BI and KDD techniques according to the original methodology. In that sense, these case studies verified the theoretical methodology in real use.
249

Extraction de connaissances pour la modélisation tri-dimensionnelle de l'interactome structural / Knowledge-based approaches for modelling the 3D structural interactome

Ghoorah, Anisah W. 22 November 2012 (has links)
L'étude structurale de l'interactome cellulaire peut conduire à des découvertes intéressantes sur les bases moléculaires de certaines pathologies. La modélisation par homologie et l'amarrage de protéines ("protein docking") sont deux approches informatiques pour modéliser la structure tri-dimensionnelle (3D) d'une interaction protéine-protéine (PPI). Des études précédentes ont montré que ces deux approches donnent de meilleurs résultats quand des données expérimentales sur les PPIs sont prises en compte. Cependant, les données PPI ne sont souvent pas disponibles sous une forme facilement accessible, et donc ne peuvent pas être re-utilisées par les algorithmes de prédiction. Cette thèse présente une approche systématique fondée sur l'extraction de connaissances pour représenter et manipuler les données PPI disponibles afin de faciliter l'analyse structurale de l'interactome et d'améliorer les algorithmes de prédiction par la prise en compte des données PPI. Les contributions majeures de cette thèse sont de : (1) décrire la conception et la mise en oeuvre d'une base de données intégrée KBDOCK qui regroupe toutes les interactions structurales domaine-domaine (DDI); (2) présenter une nouvelle méthode de classification des DDIs par rapport à leur site de liaison dans l'espace 3D et introduit la notion de site de liaison de famille de domaines protéiques ("domain family binding sites" ou DFBS); (3) proposer une classification structurale (inspirée du système CATH) des DFBSs et présenter une étude étendue sur les régularités d'appariement entre DFBSs en terme de structure secondaire; (4) introduire une approche systématique basée sur le raisonnement à partir de cas pour modéliser les structures 3D des complexes protéiques à partir des DDIs connus. 
Une interface web (http://kbdock.loria.fr) a été développée pour rendre accessible le système KBDOCK. / Understanding how the protein interactome works at a structural level could provide useful insights into the mechanisms of diseases. Comparative homology modelling and ab initio protein docking are two computational methods for modelling the three-dimensional (3D) structures of protein-protein interactions (PPIs). Previous studies have shown that both methods give significantly better predictions when they incorporate experimental PPI information. However, in general, PPI information is often not available in an easily accessible way, and cannot be re-used by 3D PPI modelling algorithms. Hence, there is currently a need to develop a reliable framework to facilitate the reuse of PPI data. This thesis presents a systematic knowledge-based approach for representing, describing and manipulating 3D interactions to study PPIs on a large scale and to facilitate knowledge-based modelling of protein-protein complexes. The main contributions of this thesis are: (1) it describes an integrated database of non-redundant 3D hetero domain interactions; (2) it presents a novel method of describing and clustering DDIs according to the spatial orientations of the binding partners, thus introducing the notion of "domain family-level binding sites" (DFBS); (3) it proposes a structural classification of DFBSs similar to the CATH classification of protein folds, and it presents a study of secondary structure propensities of DFBSs and interaction preferences; (4) it introduces a systematic case-based reasoning approach to model on a large scale the 3D structures of protein complexes from existing structural DDIs. All these contributions have been made publicly available through a web server (http://kbdock.loria.fr).
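The case-based reasoning step above retrieves known domain-domain interaction templates for a query pair of protein domain families to seed 3D modelling of the complex. The sketch below shows only that retrieval idea; the library structure, Pfam-style identifiers, and function name are hypothetical, not KBDOCK's actual API.

```python
def find_templates(ddi_library, domain_a, domain_b):
    """Case-based retrieval sketch: given two protein domain families,
    look up known domain-domain interaction (DDI) templates. The pair is
    sorted so that lookup is symmetric in the two domains."""
    key = tuple(sorted((domain_a, domain_b)))
    return ddi_library.get(key, [])
```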
250

Apport des images satellites à très haute résolution spatiale couplées à des données géographiques multi-sources pour l’analyse des espaces urbains / Contribution of very high spatial resolution satellite images combined with multi-sources geographic data to analyse urban spaces

Rougier, Simon 28 September 2016 (has links)
Les villes sont confrontées à de nombreuses problématiques environnementales. Leurs gestionnaires ont besoin d'outils et d'une bonne connaissance de leur territoire. Un objectif est de mieux comprendre comment s'articulent les trames grise et verte pour les analyser et les représenter. Il s'agit aussi de proposer une méthodologie pour cartographier la structure urbaine à l'échelle des tissus en tenant compte de ces trames. Les bases de données existantes ne cartographient pas la végétation de manière exhaustive. Ainsi la première étape est d'extraire la végétation arborée et herbacée à partir d'images satellites Pléiades par une analyse orientée-objet et une classification par apprentissage actif. Sur la base de ces classifications et de données multi-sources, la cartographie des tissus se base sur une démarche d'extraction de connaissances à partir d'indicateurs issus de l'urbanisme et de l'écologie du paysage. Cette méthodologie est construite sur Strasbourg puis appliquée à Rennes. / Climate change presents cities with significant environmental challenges. Urban planners need decision-making tools and a better knowledge of their territory. One objective is to better understand the link between the grey and green infrastructures in order to analyse and represent them. The second is to propose a methodology to map the urban structure at the urban-fabric scale, taking the grey and green infrastructures into account. In current databases, vegetation is not mapped exhaustively, so the first step is to extract tree and grass vegetation from Pléiades satellite images using object-based image analysis and an active learning classification. Based on these classifications and multi-source data, an approach based on knowledge discovery in databases is proposed, focused on a set of indicators drawn mostly from urbanism and landscape ecology. The methodology is developed on Strasbourg and then applied to Rennes to validate it and check its reproducibility.
