241 |
Vytvoření nových predikčních modulů v systému pro dolování z dat na platformě NetBeans / Creation of New Prediction Units in Data Mining System on NetBeans Platform
Havlíček, David January 2009 (has links)
This master's thesis concerns the creation of a new prediction unit for an existing knowledge discovery in databases system. The first part of the project deals with general problems of knowledge discovery in databases and predictive analysis. The second part deals with the system developed at FIT for which the module is implemented, the technologies used, and the design and implementation of the mining module for this system. The solution is implemented in Java and built on the NetBeans Platform.
|
242 |
Implementace části standardu SQL/MM DM pro asociační pravidla / Implementation of SQL/MM DM for Association Rules
Škodík, Zdeněk Unknown Date (has links)
This project is concerned with knowledge discovery in databases, specifically with association rules, which are part of a data mining system. In this way we try to obtain knowledge that cannot be found directly in the database and that may be useful. The SQL/MM DM standard is described, in particular all user-defined types prescribed by the standard for association rules, as well as the common types that create the framework for data mining. Before the implementation of these types is described, the instruments used for it are introduced: the PL/SQL programming language and Oracle Data Mining support. The accuracy of the implementation is verified by a sample application. In conclusion, the achieved results are evaluated and possible continuations of this work are outlined.
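The notions the standard's types encapsulate, itemsets, support and confidence, can be illustrated outside the database. A minimal pure-Python sketch (the transactions and thresholds below are invented for illustration; the real implementation lives in PL/SQL user-defined types):

```python
from itertools import combinations

# Toy transaction database; in the thesis these rows would live in Oracle
# tables described by the SQL/MM DM user-defined types (data invented here).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in db) / len(db)

def association_rules(db, min_support=0.4, min_confidence=0.6):
    """Enumerate rules A -> B over frequent 2-itemsets (minimal sketch)."""
    items = sorted(set().union(*db))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, db)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, db)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

rules = association_rules(transactions)
```

On this toy database every 2-itemset has support 0.6 and every rule direction has confidence 0.75, so all six candidate rules pass the thresholds.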
|
243 |
Rewiring Police Officer Training Networks to Reduce Forecasted Use of Force
Ritika Pandey (9147281) 30 August 2023 (has links)
<p>Police use of force has become a topic of significant concern, particularly given its disparate impact on communities of color. Research has shown that police officer-involved shootings, misconduct, and excessive use of force complaints exhibit network effects, where officers are at greater risk of being involved in these incidents when they socialize with officers who have a history of use of force and misconduct. Given that use of force and misconduct behavior appear to be transmissible across police networks, we ask whether police networks can be altered to reduce use of force and misconduct events in a limited scope.</p>
<p>In this work, we analyze a novel dataset from the Indianapolis Metropolitan Police Department on officer field training, subsequent use of force, and the role of network effects from field training officers. We construct a network survival model for analyzing the time to use of force incidents involving new police trainees. The model includes network effects for the diffusion of risk from field training officers (FTOs) to trainees. We then introduce a network rewiring algorithm to maximize the expected time to use of force events upon completion of field training. We study several versions of the algorithm, including constraints that encourage demographic diversity of FTOs. The results show that FTO use of force history is the best predictor of a trainee's time to use of force in the survival model, and that rewiring the network can increase the expected time (in days) to a recruit's first use of force incident by 8%.</p>
<p>We then discuss the potential benefits and challenges associated with implementing such an algorithm in practice.</p>
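The rewiring idea can be caricatured as a capacity-constrained assignment that favours low-risk FTOs. The sketch below is not the thesis's algorithm (which maximizes expected time-to-event under the fitted network survival model); the risk scores, names and capacity are invented for illustration:

```python
def rewire(trainees, fto_risk, capacity=2):
    """Assign each trainee to the lowest-risk field training officer (FTO)
    that still has capacity. A greedy stand-in for the thesis's optimizer,
    shown only to illustrate the combinatorial structure of rewiring."""
    load = {fto: 0 for fto in fto_risk}
    assignment = {}
    ranked = sorted(fto_risk, key=fto_risk.get)  # lowest risk first
    for trainee in trainees:
        for fto in ranked:
            if load[fto] < capacity:
                assignment[trainee] = fto
                load[fto] += 1
                break
    return assignment

# Invented prior use-of-force risk scores for three FTOs.
fto_risk = {"A": 0.9, "B": 0.1, "C": 0.4}
pairs = rewire(["t1", "t2", "t3", "t4"], fto_risk)
```

Under these toy scores the two lowest-risk FTOs absorb all four trainees, and the highest-risk FTO receives none; real constraints (demographics, shifts, districts) would rule out such an extreme assignment.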
|
244 |
A new model for worm detection and response. Development and evaluation of a new model based on knowledge discovery and data mining techniques to detect and respond to worm infection by integrating incident response, security metrics and apoptosis.
Mohd Saudi, Madihah January 2011 (has links)
Worms have evolved, and a range of sophisticated techniques have been
integrated into them, making the detection and response processes much
harder and longer than in the past. Therefore, in this thesis a STAKCERT
(Starter Kit for Computer Emergency Response Team) model is built to
detect worm attacks in order to respond to worms more efficiently.
The novelty and strength of the STAKCERT model lie in the method
implemented, which consists of the STAKCERT KDD processes and the
development of the STAKCERT worm classification, the STAKCERT relational
model and the STAKCERT worm apoptosis algorithm. The new concept
introduced in this model, named apoptosis, is borrowed from the human
immune system and has been mapped into a security perspective.
Furthermore, the encouraging results achieved by this research are
validated by applying security metrics to assign the weight and severity
values that trigger the apoptosis. To optimise the performance results,
the standard operating procedures (SOP) for worm incident response, which
involve static and dynamic analyses, the knowledge discovery in databases
(KDD) techniques for modelling the STAKCERT model, and data mining
algorithms were used.
The STAKCERT model has produced encouraging results and outperformed
comparable existing work on worm detection. It achieves an overall
accuracy rate of 98.75%, with a 0.2% false positive rate and a 1.45%
false negative rate. Worm response achieved an accuracy rate of 98.08%,
which other researchers can later use as a comparison with their own
work. / Ministry of Higher Education, Malaysia
and Universiti Sains Islam Malaysia (USIM)
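The accuracy, false positive and false negative rates reported above are standard confusion-matrix quantities. A small sketch of how such rates are computed (the counts below are invented, not the thesis data):

```python
def detection_rates(tp, tn, fp, fn):
    """Accuracy, false positive rate and false negative rate
    from confusion-matrix counts of a binary worm detector."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn)   # benign samples wrongly flagged as worms
    fnr = fn / (fn + tp)   # worms that went undetected
    return accuracy, fpr, fnr

# Invented counts: 98 worms caught, 99 benign cleared, 1 false alarm,
# 2 worms missed.
acc, fpr, fnr = detection_rates(tp=98, tn=99, fp=1, fn=2)
```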
|
245 |
Learning From Data Across Domains: Enhancing Human and Machine Understanding of Data From the Wild
Sean Michael Kulinski (17593182) 13 December 2023 (has links)
<p dir="ltr">Data is collected everywhere in our world; however, it often is noisy and incomplete. Different sources of data may have different characteristics, quality levels, or come from dynamic and diverse environments. This poses challenges for both humans who want to gain insights from data and machines which are learning patterns from data. How can we leverage the diversity of data across domains to enhance our understanding and decision-making? In this thesis, we address this question by proposing novel methods and applications that use multiple domains as more holistic sources of information for both human and machine learning tasks. For example, to help human operators understand environmental dynamics, we show the detection and localization of distribution shifts to problematic features, as well as how interpretable distributional mappings can be used to explain the differences between shifted distributions. For robustifying machine learning, we propose a causal-inspired method to find latent factors that are robust to environmental changes and can be used for counterfactual generation or domain-independent training; we propose a domain generalization framework that allows for fast and scalable models that are robust to distribution shift; and we introduce a new dataset based on human matches in StarCraft II that exhibits complex and shifting multi-agent behaviors. We showcase our methods across various domains such as healthcare, natural language processing (NLP), computer vision (CV), etc. to demonstrate that learning from data across domains can lead to more faithful representations of data and its generating environments for both humans and machines.</p>
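One concrete reading of "detection and localization of distribution shifts to problematic features" is a per-feature two-sample test across domains. The sketch below uses a hand-rolled Kolmogorov-Smirnov statistic on invented data; it illustrates the idea only and is not the method developed in the thesis:

```python
def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs."""
    def ecdf(sample, v):
        return sum(s <= v for s in sample) / len(sample)
    grid = sorted(set(xs) | set(ys))
    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in grid)

def localize_shift(source, target, threshold=0.3):
    """Flag features whose marginal distribution moved between domains."""
    return [f for f in source
            if ks_statistic(source[f], target[f]) > threshold]

# Invented two-feature domains: "age" shifts, "height" does not.
source = {"age": list(range(10)), "height": [150, 160, 170, 180, 190]}
target = {"age": [x + 5 for x in range(10)],
          "height": [150, 160, 170, 180, 190]}
shifted = localize_shift(source, target)
```

A threshold on the KS statistic is a crude localizer; the point is only that comparing per-feature marginals turns "the environment changed" into "these features changed".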
|
246 |
TEMPORAL EVENT MODELING OF SOCIAL HARM WITH HIGH DIMENSIONAL AND LATENT COVARIATES
Xueying Liu (13118850) 09 September 2022 (has links)
<p>Counting processes are fundamental to many real-world problems involving event data. The Poisson process, used as the background intensity of the Hawkes process, is the most commonly used point process. The Hawkes process, a self-exciting point process, fits temporal event data, spatio-temporal event data, and event data with covariates. We study a Hawkes process fitted to heterogeneous drug overdose data via a novel semi-parametric approach. Counting processes are also related to survival data, since both study the occurrences of events over time. We fit a Cox model to temporal event data with a large corpus that is processed into high-dimensional covariates, and we study the significant features that influence the intensity of events.</p>
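For reference, the conditional intensity of a Hawkes process with an exponential kernel, the self-exciting model described above, can be evaluated in a few lines (the parameter values and event times below are invented):

```python
import math

def hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity lambda(t) = mu + sum over past events t_i < t
    of alpha * exp(-beta * (t - t_i)): each event temporarily raises the
    rate of future events, and the excitation decays at rate beta."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in events if ti < t)

events = [1.0, 2.5, 2.6]                    # past event times (invented)
lam_now = hawkes_intensity(3.0, events)     # elevated by the recent burst
lam_later = hawkes_intensity(30.0, events)  # excitation has decayed away
```

Long after the last event, the intensity relaxes back to the background rate mu, which is exactly the role the Poisson background plays in the model above.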
|
247 |
SPATIAL-SPECTRAL ANALYSIS FOR THE IDENTIFICATION OF CROP NITROGEN DEFICIENCY BASED ON HIGH-RESOLUTION HYPERSPECTRAL LEAF IMAGES
Zhihang Song (8764215) 26 April 2024 (has links)
<p dir="ltr">Among the major row crops in the United States, corn and soybeans stand out due to their high nutritional value and economic importance. Achieving optimal yields is constrained by the challenge of fertilizer management. Many fields suffer yield losses from insufficient mineral nutrients such as nitrogen (N), while excessive fertilization raises costs and environmental risks. The critical issue is the accurate determination of fertilizer quantity and timing, underscoring the need for precise, early-stage diagnostics. Emerging high-throughput plant phenotyping techniques, notably hyperspectral imaging (HSI), have been increasingly utilized to identify plants' responses to abiotic or biotic stresses. A variety of HSI systems have been developed, such as airborne imaging systems and indoor imaging stations, but the signal quality of most current HSI systems is often compromised by environmental factors. To address this issue, a handheld hyperspectral imager known as LeafSpec was recently developed at Purdue University; it represents a breakthrough in its ability to scan corn or soybean leaves at exceptional spatial and spectral resolutions, improving plant phenotyping quality at reduced cost. Most current HSI data processing methods focus on spectral features but rarely consider spatially distributed information. The objective of this work was therefore to develop a methodology utilizing spatial-spectral features for accurate and reliable diagnostics of crop N nutrient stress. The key innovations include the design of spatial-spectral features based on leaf venation structures and a feature mining method for predicting plant nitrogen condition. First, a novel analysis method called the Natural Leaf Coordinate System (NLCS) was developed to reallocate leaf pixels and base the nutrient stress analysis on pixels' locations relative to the venation structure. 
A new nitrogen prediction index for soybean plants called NLCS-N was developed. It outperforms the conventional averaged vegetation index (Avg. NDVI) in distinguishing healthy plants from nitrogen-stressed plants, with lower t-test p-values, and in predicting plant nitrogen concentration (PNC), with higher R-squared values. In one test case, moving from Avg. NDVI to NLCS-N improved the p-value from 2.1×10<sup>-3</sup> to 6.92×10<sup>-12</sup> and the R-squared value from 0.314 to 0.565. Second, a corn leaf venation segmentation algorithm was developed to separate the venation structure from a corn leaf LeafSpec image, which was further used to generate 3930 spatial-spectral (S-S) features. While the S-S features could serve as input variables for a PNC prediction model, a feature selection mechanism was developed to improve model accuracy in terms of reduced cross-validation errors. In one test case, the cross-validation root mean squared error was reduced from 0.273 (leaf mean spectra) to 0.127 using the selected features. Third, several novel spatial-spectral indexes for corn leaves were developed based on color distributions at the venation level. The top-performing indexes were selected through a ranking system based on Cohen's d values and R-squared values, yielding a best-performing S-S N prediction index with an R-squared value of 0.861 for predicting corn PNC in a field assay. The discussion sections provide insights into how a robust PNC prediction index can be developed and related to plant science. The methodologies outlined offer a framework for broader applications in spatial-spectral analysis using leaf-level hyperspectral imagery, serving as a guide for scientists and researchers in customizing their future studies within this field.</p>
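For context, the Avg. NDVI baseline that NLCS-N is compared against pools every pixel into one mean, discarding the spatial information that the venation-based features retain. A minimal sketch with an invented two-band "image":

```python
def ndvi_image(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), computed band-wise."""
    return [[(n - r) / (n + r) for n, r in zip(nrow, rrow)]
            for nrow, rrow in zip(nir, red)]

def avg_index(img):
    """Leaf-average index: pooling every pixel into one mean is exactly
    where the spatial distribution of the index is thrown away."""
    values = [v for row in img for v in row]
    return sum(values) / len(values)

nir = [[0.8, 0.6], [0.7, 0.5]]   # near-infrared reflectance (toy values)
red = [[0.2, 0.2], [0.3, 0.5]]   # red reflectance (toy values)
avg_ndvi = avg_index(ndvi_image(nir, red))
```

A spatial-spectral feature would instead weight or group these per-pixel values by their position relative to the venation structure, which is the step the NLCS coordinate system enables.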
|
248 |
Contributions to Engineering Big Data Transformation, Visualisation and Analytics. Adapted Knowledge Discovery Techniques for Multiple Inconsistent Heterogeneous Data in the Domain of Engine Testing
Jenkins, Natasha N. January 2022 (has links)
In the automotive sector, engine testing generates vast data volumes that
are mainly beneficial to requesting engineers. However, these tests are often
not revisited for further analysis due to inconsistent data quality and
a lack of structured assessment methods. Moreover, the absence of a tailored
knowledge discovery process hinders effective preprocessing, transformation,
analytics, and visualization of data, restricting the potential for
historical data insights. Another challenge arises from the heterogeneous
nature of test structures, resulting in varying measurements, data types,
and contextual requirements across different engine test datasets.
This thesis aims to overcome these obstacles by introducing a specialized
knowledge discovery approach for the distinctive Multiple Inconsistent
Heterogeneous Data (MIHData) format characteristic of engine testing.
The proposed methods include adapting data quality assessment and
reporting, classifying engine types through compositional features,
employing modified dendrogram similarity measures for classification,
performing customized feature extraction, transformation, and
structuring, generating and manipulating synthetic images to enhance
data visualization, and applying adapted list-based indexing for
multivariate engine test summary data searches.
The thesis demonstrates how these techniques enable exploratory analysis,
visualization, and classification, presenting a practical framework to
extract meaningful insights from historical data within the engineering
domain. The ultimate objective is to facilitate the reuse of past data resources,
contributing to informed decision-making processes and enhancing
comprehension within the automotive industry. Through its focus on
data quality, heterogeneity, and knowledge discovery, this research establishes
a foundation for optimized utilization of historical Engine Test Data
(ETD) for improved insights. / Soroptimist International Bradford
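As one concrete face of the data quality assessment step, the simplest per-channel score is completeness, the share of non-missing values. The column names and rows below are invented for illustration, not taken from the ETD format:

```python
def completeness_report(rows, columns):
    """Per-column completeness: fraction of non-missing (non-None) values.
    A minimal stand-in for the thesis's adapted quality assessment."""
    report = {}
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        report[name] = sum(v is not None for v in values) / len(values)
    return report

# Toy engine-test rows: (speed in rpm, torque in Nm); None marks a dropout.
rows = [(5000, 92.1), (5200, None), (None, None), (5100, 90.0)]
scores = completeness_report(rows, ["speed_rpm", "torque_nm"])
```

Scores like these make heterogeneous historical tests comparable at a glance and give a numeric basis for deciding which datasets are worth revisiting.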
|
249 |
從電子化政府建立政府統計知識挖掘系統模型架構之研究~以內政統計為例 / Research into a System Framework for Knowledge Discovery in the Context of Statistics Tasks within e-Government – on Examples of Interior Statistic
江欣容, Chiang, Hsin Jung Unknown Date (has links)
To enhance international competitive advantage, government authorities around the world are actively promoting e-Government. The ROC Government began to develop its e-Government infrastructure in 1997, and the Executive Yuan has since expanded it into the e-Taiwan Project as a whole. The computerization of administrative processes within government agencies has driven the fast development of administrative information systems, and the very large databases thus accumulated open up great opportunities for government statistics.
Starting from a survey of the development of e-Government, administrative processes for interior affairs, and knowledge mining and discovery techniques, this study proposes reference data models, potential statistical software tools, and an experimental framework for knowledge discovery in the context of e-Government. Next, the study experiments with applying clustering techniques such as K-means, ANN and TwoStep to a data mart on marriage registrations of foreign and Mainland Chinese spouses in the Taiwan-Fukien area, and with employing a multi-mode approach to forecasting Taiwan's population time series. The results of these analyses are fed back into the database. Finally, the author implements a prototype of a knowledge discovery system which includes a front-end database, data marts, a knowledge base and interfaces to an EIS.
The results of the research can be summarized as follows: 1. Knowledge derived by means of data mining can faithfully represent social events and affairs as well as serve as a guideline for developing government policies. 2. Modern data mining techniques and classical data analysis approaches complement each other in the system developed in this research. 3. A given mining technique suits a corresponding data pattern; for example, exponential smoothing is more suitable than ANN for our population data, which means that a better result can often be achieved by multi-mode analysis and comparison of the outputs of different modes. 4. Knowledge creation, sharing and management can be achieved by means of the knowledge discovery processes on the framework developed in this research. 5. Erroneous raw data can be identified in the mining output and fed back to the data source to improve its quality.
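Result 3, that exponential smoothing suited the population series better than ANN, refers to a very simple recursive model. A minimal sketch of simple exponential smoothing on an invented series:

```python
def simple_exponential_smoothing(series, alpha=0.5):
    """Smoothed level l_t = alpha * y_t + (1 - alpha) * l_{t-1};
    the final level serves as the one-step-ahead forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

population = [22.2, 22.3, 22.4, 22.5, 22.6]  # invented series (millions)
forecast = simple_exponential_smoothing(population, alpha=0.5)
```

With only a handful of smooth, near-linear observations, such a model has far less to estimate than an ANN, which is one plausible reading of why it compared favourably in the multi-mode analysis.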
|
250 |
Metodika vývoje a nasazování Business Intelligence v malých a středních podnicích / Methodology of development and deployment of Business Intelligence solutions in Small and Medium Sized Enterprises
Rydzi, Daniel January 2005 (has links)
This dissertation thesis deals with the development and implementation of Business Intelligence (BI) solutions for Small and Medium Sized Enterprises (SME) in the Czech Republic. The thesis represents the culmination of the author's effort to date to complete a methodological model for developing this kind of application for SMEs using in-house skills and a minimum of external resources and costs. The thesis can be divided into five major parts. The first part, which describes the technologies used, is divided into two chapters. The first chapter describes the contemporary state of the Business Intelligence concept and also contains an original taxonomy of Business Intelligence solutions. The second chapter describes two Knowledge Discovery in Databases (KDD) techniques that were used for building the BI solutions introduced in the case studies. The second part describes the area of Czech SMEs, the environment in which the thesis was written and to which it is meant to contribute. This environment is represented by one chapter that defines how SMEs differ from large corporations and explains the author's reasons for personally focusing on this area. The third major part introduces the results of a survey conducted among Czech SMEs with the support of the Department of Information Technologies of the Faculty of Informatics and Statistics of the University of Economics in Prague. This survey had three objectives: first, to map the readiness of Czech SMEs for BI solution development and deployment; second, to determine the major problems and consequent decisions of Czech SMEs that could be supported by BI solutions; and third, to determine the top factors preventing SMEs from developing and deploying BI solutions. The fourth part of the thesis is also the core one. In two chapters, the original methodology for development and deployment of BI solutions by SMEs is described, as well as the other methodologies that were studied.
The original methodology is partly based on the well-known CRISP-DM methodology. Finally, the last part describes the particular company that became a testing ground for the author's theories and that supports his research. Further chapters introduce case studies of the development and deployment in this company of BI solutions built using contemporary BI and KDD techniques in accordance with the original methodology. In that sense, these case studies verified the theoretical methodology in real use.
|