251. TELEMETRY AND DATA LOGGING IN A FORMULA SAE RACE CAR. Schultz, Aaron. 10 1900.
The problem with designing and simulating a race car entirely through CAD and other computer simulations is that the real-world behavior of the car will differ from the results produced by CFD and FEA analysis. One way to learn more about how the car actually handles is through telemetry and data logging from many different sensors on the car while it is running at racing speeds. This data can help the engineering team build new components and tune the car's many systems in order to achieve the fastest possible lap time.
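A minimal sketch of the kind of fixed-rate logging loop such a system relies on is shown below; the channel names, sample rate, and the read_sensors stub are illustrative assumptions, not the car's actual hardware interface.

```python
import csv
import time

def read_sensors():
    # Stub standing in for real CAN-bus / ADC reads of illustrative channels.
    return {"rpm": 0.0, "throttle_pct": 0.0, "damper_pos_mm": 0.0}

def log_session(path, duration_s=10.0, rate_hz=100):
    """Write timestamped sensor samples to a CSV file at a fixed rate."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["t"] + list(read_sensors().keys()))
        writer.writeheader()
        t0 = time.monotonic()
        while (t := time.monotonic() - t0) < duration_s:
            writer.writerow({"t": round(t, 4), **read_sensors()})
            time.sleep(1.0 / rate_hz)

log_session("session.csv")
```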
252. Secure Geometric Search on Encrypted Spatial Data. Wang, Boyang. January 2017.
Spatial data (e.g., points) have extensive applications in practice, such as spatial databases, Location-Based Services, spatial computing, social analyses, computational geometry, graph design, medical imaging, etc. Geometric queries, such as geometric range queries (i.e., finding points inside a geometric range) and nearest neighbor queries (i.e., finding the closest point to a given point), are fundamental primitives to analyze and retrieve information over spatial data. For example, a medical researcher can query a spatial dataset to collect information about patients in a certain geometric area to predict whether there will be a dangerous outbreak of a particular disease (e.g., Ebola or Zika).
With the dramatic increase in the scale and size of data, many companies and organizations are outsourcing significant amounts of data, including spatial data, to public cloud data services in order to minimize data storage and query processing costs. For instance, major companies and organizations, such as Yelp, Foursquare and NASA, are using Amazon Web Services as their public cloud data services, which can save billions of dollars per year for those companies and organizations. However, due to the existence of attackers (e.g., a curious administrator or a hacker) on remote servers, users are worried about the leakage of their private data while storing and querying those data on public clouds.
Searchable Encryption (SE) is an innovative technique to protect the data privacy of users on public clouds without losing search functionalities on the server side. Specifically, a user can encrypt their data with SE before outsourcing it to a public server, and this public server is able to search the encrypted data without decryption. Many SE schemes have been proposed to support simple queries, such as keyword search. Unfortunately, how to efficiently and securely support geometric queries over encrypted spatial data remains an open problem.
In this dissertation, to protect the privacy of spatial data in public clouds while still maintaining search functions without decryption, we propose a set of new SE solutions to support geometric queries, including geometric range queries and nearest neighbor queries, over encrypted spatial data. The major contributions of this dissertation focus on two aspects. First, we enrich search functionalities by designing new solutions to carry out secure fundamental geometric search queries, which were not supported in previous works. Second, we minimize the performance gap between theory and practice by building novel schemes to perform geometric queries with highly efficient search time and updates over large-scale encrypted spatial data.
Specifically, we first design a scheme supporting circular range queries (i.e., retrieving points inside a circle) over encrypted spatial data. Instead of directly evaluating compute-then-compare operations, which are inefficient over encrypted data, we use a set of concentric circles to represent a circular range query, and then verify whether a data point is on any of those concentric circles by securely evaluating inner products over encrypted data.
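A plaintext sketch of that reduction is given below: for integer coordinates, a point lies inside a circle of radius R exactly when its squared distance to the centre equals one of the integers 0..R^2, and each such test can be written as an inner product of augmented vectors. The encryption layer that evaluates these inner products obliviously is omitted, and the vector encoding shown is an illustrative assumption rather than the scheme's exact construction.

```python
def augment_point(p):
    # (x, y) -> (x^2 + y^2, x, y, 1)
    x, y = p
    return (x * x + y * y, x, y, 1)

def augment_query(q, s):
    # Centre (qx, qy) and squared radius s -> vector whose inner product with
    # an augmented point equals |p - q|^2 - s.
    qx, qy = q
    return (1, -2 * qx, -2 * qy, qx * qx + qy * qy - s)

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def in_circle(p, center, radius):
    # The circular range is represented by concentric circles of squared radius
    # s = 0..radius^2; the point matches iff it lies on one of them.
    return any(inner(augment_point(p), augment_query(center, s)) == 0
               for s in range(radius * radius + 1))

print(in_circle((3, 4), (0, 0), 5))  # True: squared distance 25 <= 25
print(in_circle((6, 1), (0, 0), 5))  # False: squared distance 37 > 25
```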
Next, to enrich search functionalities, we propose a new scheme, which can support arbitrary geometric range queries, such as circles, triangles and polygons in general, over encrypted spatial data. By leveraging the properties of Bloom filters, we convert a geometric range search problem to a membership testing problem, which can be securely evaluated with inner products. Moving a step forward, we also build another new scheme, which not only supports arbitrary geometric range queries and sub-linear search time but also enables highly efficient updates.
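The Bloom-filter conversion can be sketched in plaintext as follows: the points covered by the query range are inserted into a Bloom filter, and testing a data point becomes an inner product between the filter's bit vector and the point's 0/1 indicator vector, which is the kind of operation the encrypted scheme evaluates without decryption. The filter size, hash construction, and toy range below are illustrative assumptions.

```python
import hashlib

M, K = 1024, 3  # assumed filter length and number of hash functions

def positions(point):
    # K hash positions for a 2-D integer point.
    return [int(hashlib.sha256(f"{k}:{point}".encode()).hexdigest(), 16) % M
            for k in range(K)]

def build_filter(range_points):
    bits = [0] * M
    for p in range_points:
        for i in positions(p):
            bits[i] = 1
    return bits

def matches(bits, point):
    # Membership test written as an inner product with the point's indicator
    # vector; as with any Bloom filter, false positives are possible.
    idx = set(positions(point))
    indicator = [1 if i in idx else 0 for i in range(M)]
    return sum(b * v for b, v in zip(bits, indicator)) == len(idx)

triangle = [(x, y) for x in range(10) for y in range(10) if y <= x]  # toy range
bf = build_filter(triangle)
print(matches(bf, (5, 3)))  # True: the point is covered by the range
print(matches(bf, (2, 7)))  # expected False (up to the filter's false-positive rate)
```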
Finally, we address the problem of secure nearest neighbor search on encrypted large-scale datasets. Specifically, we modify the algorithm of nearest neighbor search in advanced tree structures (e.g., R-trees) by simplifying operations, where evaluating comparisons alone on encrypted data is sufficient to efficiently and correctly find nearest neighbors over datasets with millions of tuples.
253. Mining Genome-Scale Growth Phenotype Data through Constant-Column Biclustering. Alzahrani, Majed A. 10 July 2017.
Growth phenotype profiling of genome-wide gene-deletion strains across stress conditions offers a clear picture of how the essentiality of genes depends on environmental conditions. Systematically identifying groups of genes in such recently emerging high-throughput data that share similar patterns of conditional essentiality and dispensability under various environmental conditions can elucidate how genetic interactions of the growth phenotype are regulated in response to the environment.
In this dissertation, we first demonstrate that detecting such “co-fit” gene groups can be cast as a less well-studied problem in biclustering, i.e., constant-column biclustering. Despite significant advances in biclustering techniques, very few were designed for mining growth phenotype data. Here, we propose Gracob, a novel, efficient graph-based method that casts and solves the constant-column biclustering problem as a maximal clique finding problem in a multipartite graph. We compared Gracob with a large collection of widely used biclustering methods that cover different types of algorithms designed to detect different types of biclusters. Gracob showed superior performance in finding co-fit genes over all the existing methods on both a variety of synthetic data sets with a wide range of settings and three real growth phenotype data sets for E. coli, proteobacteria, and yeast.
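To make the target pattern concrete, the brute-force sketch below searches for the largest set of genes whose phenotype values are (near-)constant in each of a chosen set of conditions. It illustrates only the constant-column bicluster definition, not Gracob's multipartite-graph clique algorithm, and the tolerance eps is an assumed parameter.

```python
from itertools import combinations
import numpy as np

def constant_column_bicluster(X, cols, eps=0.1):
    """Largest set of row indices whose values in each chosen column agree within eps."""
    rows = list(range(X.shape[0]))
    for k in range(len(rows), 1, -1):          # try larger gene sets first
        for subset in combinations(rows, k):
            sub = X[np.array(subset)][:, cols]
            if np.all(sub.max(axis=0) - sub.min(axis=0) <= eps):
                return list(subset)
    return []

# Toy gene x condition growth matrix.
X = np.array([[1.0, 5.0, 2.1],
              [1.0, 3.0, 2.0],
              [1.1, 9.0, 2.0],
              [4.0, 5.0, 7.0]])
print(constant_column_bicluster(X, cols=[0, 2]))  # genes 0-2 are co-fit on conditions 0 and 2
```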
254. The impact on information systems controls within an organisation when making use of an EDI VAN. Rorbye, Trevor Wayne. 08 May 2014.
M.Com. (Computer Auditing) / The implementation of EDI in South African business has only started in the recent past. The main reason for this is that the huge benefits in terms of faster processing of business transactions, reduced processing costs and the formation of strategic business alliances with key business partners are only now being accepted by management. The other reason is that large, commercially operated Value Added Networks (VANs) have only existed in this country for the last two years. The primary objectives of this short dissertation can be summarised as follows: a) to provide a brief overview of the developments currently taking place in South Africa in the Electronic Data Interchange (EDI) and Value Added Network (VAN) environments; b) to highlight how EDI is currently being implemented in South Africa; c) to develop a simple framework of key information systems controls which an auditor should consider when evaluating the information systems at a client; and d) to apply this controls framework to the EDI and VAN environments in order to derive lists of the information systems controls which should be reviewed by the auditor when their client makes use of an EDI VAN.
255. The role of management in the design and implementation of value-adding accounting systems and procedures in the small organisation. Cumberlege, Engela Helena. 16 August 2012.
M.Comm. / The primary objective of the dissertation is:
• To explain the design and implementation of effective accounting systems and procedures in a small organisation with a workforce of fewer than 100 employees.
The secondary objectives of the dissertation are as follows:
• To explain the nature of current non-value-adding accounting systems and procedures which the small organisation should change or eliminate;
• To explain the nature of desired value-adding accounting systems and procedures which should be introduced by the small organisation;
• To explain management involvement and participation in the design and implementation of accounting systems and procedures manuals in a small organisation;
• To explain the purpose, objectives, need for and basic writing principles which a written accounting procedure of a small organisation must adhere to;
• To explain the steps to be followed to design and implement an accounting system and procedure;
• To propose suggestions and recommendations with reference to the design and implementation of accounting systems and procedures;
• To determine the minimum and maximum accounting systems and procedures that need to be implemented in the small organisation; and
• To identify the shortcomings and positives regarding the information supplied by the current accounting systems.
All of the above-listed primary and secondary objectives apply to a small organisation with fewer than 100 employees, a limited cash flow and limited attention to procedures due to time constraints. The time constraints relate primarily to the limited workforce.
256. On techniques for pay-as-you-go data integration of linked data. Christodoulou, Klitos. January 2015.
It is recognised that users nowadays interact with large amounts of data that exist in disparate forms and are stored under different settings. Moreover, the amount of structured and unstructured data outside a single well-organised data management system is expanding rapidly. To address the recent challenges of managing large amounts of potentially distributed data, the vision of a dataspace was introduced. This data management paradigm aims at reducing the complexity behind the challenges of integrating heterogeneous data sources. Recently, efforts by the Linked Data (LD) community gave rise to a Web of Data (WoD) that interweaves with the current Web of documents in a way that is useful for data consumption by both humans and computational agents. On the WoD, datasets are structured under a common data model and published as Web resources following a simple set of guidelines that enables them to be linked with other pieces of data, as well as annotated with useful metadata that helps determine their semantics. The WoD is an evolving open ecosystem including specialist publishers as well as community efforts aiming at re-publishing isolated databases as LD on the WoD and annotating them with metadata. The WoD raises new opportunities and challenges; however, it currently relies mostly on manual effort for integrating its many heterogeneous data sources. This dissertation makes the case that several techniques from the dataspaces research area (aiming at on-demand integration of data sources in a pay-as-you-go fashion) can support the integration of heterogeneous WoD sources. In so doing, this dissertation explores the opportunities and identifies the challenges of adapting existing pay-as-you-go data integration techniques in the context of LD. More specifically, this dissertation makes the following contributions: (1) a case study identifying the challenges that arise when existing pay-as-you-go data integration techniques are applied in a setting where data sources are LD; (2) a methodology that deals with the 'schema-less' nature of LD sources by automatically inferring a conceptual structure from a given RDF graph, thus enabling downstream tasks, such as the identification of matches and the derivation of mappings, which are both essential for the automatic bootstrapping of a dataspace; and (3) a well-defined, principled methodology that builds on a Bayesian inference technique for reasoning under uncertainty to improve pay-as-you-go integration. Although the developed methodology is generic in being able to reason with different hypotheses, its effectiveness has only been explored on reducing the uncertain decisions made by string-based matchers during the matching stage of a dataspace system.
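A hedged sketch of this style of Bayesian updating is given below: the hypothesis that a candidate mapping is correct starts from a prior suggested by a string-based matcher and is revised as independent pieces of evidence arrive. The likelihood values and the notion of "supporting observations" are illustrative assumptions, not the dissertation's actual model.

```python
def posterior(prior, evidence, p_true=0.8, p_false=0.3):
    """Update P(mapping is correct) given boolean observations.

    p_true  = P(observation supports the mapping | mapping is correct)
    p_false = P(observation supports the mapping | mapping is incorrect)
    """
    p = prior
    for supports in evidence:
        like_h = p_true if supports else 1 - p_true        # likelihood under H
        like_not_h = p_false if supports else 1 - p_false  # likelihood under not-H
        p = (like_h * p) / (like_h * p + like_not_h * (1 - p))
    return p

# A string-based matcher suggested a weak match (prior 0.55); three of four
# subsequent observations supported the mapping.
print(round(posterior(0.55, [True, True, False, True]), 3))
```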
257. Study on a Hierarchy Model. Che, Suisui. 23 March 2012.
Statistical inference about the parameters of the Binomial-Poisson hierarchy model is discussed. Building on the estimators based on paired observations, we consider two further cases with extra observations on the first and on the second layer of the model. The MLEs of lambda and p are derived, and it is proved that the MLE of lambda is also the UMVUE of lambda. Using the multivariate central limit theorem and large-sample theory, the estimators based on extra observations on the first and on the second layer are obtained. The performances of the estimators are compared numerically through extensive Monte Carlo simulation. Simulation studies indicate that these estimators are more efficient than those based only on paired observations. Confidence intervals for p are presented for both cases. The efficiency of the estimators is compared under the condition that the same number of extra observations is provided.
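One standard formulation consistent with this abstract (the thesis' own notation may differ) takes paired observations from the hierarchy and yields closed-form maximum likelihood estimators:

```latex
\begin{align*}
  X_i &\sim \mathrm{Poisson}(\lambda), \qquad
  Y_i \mid X_i \sim \mathrm{Binomial}(X_i,\, p), \qquad i = 1,\dots,n,\\[4pt]
  \hat{\lambda} &= \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
  \hat{p} = \frac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} X_i}.
\end{align*}
```

Since $\sum_i X_i$ is a complete sufficient statistic for $\lambda$ and $\bar{X}$ is unbiased, $\hat{\lambda}$ is also the UMVUE of $\lambda$, matching the claim above.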
258. The impact of domain knowledge-driven variable derivation on classifier performance for corporate data mining. Welcker, Laura Joana Maria. January 2015.
Technological progress, in terms of increasing computational power and growing virtual space to collect data, offers great potential for businesses to benefit from data mining applications. Data mining can create a competitive advantage for corporations by discovering business-relevant information, such as patterns, relationships, and rules. The role of the human user within the data mining process is crucial, which is why the research area of domain knowledge is becoming increasingly important. This thesis investigates the impact of domain knowledge-driven variable derivation on classifier performance for corporate data mining. Domain knowledge is defined as methodological, data and business know-how. The thesis investigates the topic from a new perspective by shifting the focus from a one-sided, purely analytical or purely theoretical approach towards a target-group-oriented (researcher and practitioner) approach which places the methodological aspect, by means of a scientific guideline, at the centre of the research. In order to ensure feasibility and practical relevance of the guideline, it is adapted and applied to the requirements of a practical business case. Thus, the thesis examines the topic from both a theoretical and a practical perspective and thereby overcomes the limitation of a one-sided approach, which mostly lacks either practical relevance or generalisability of the results. The primary objective of this thesis is to provide a scientific guideline which enables both practitioners and researchers to advance domain knowledge-driven research on variable derivation in a corporate setting. In the theoretical part, a broad overview is given of the main aspects necessary to undertake the research, such as the concept of domain knowledge, the data mining task of classification, variable derivation as a subtask of data preparation, and evaluation techniques. This part of the thesis addresses the methodological aspect of domain knowledge. In the practical part, a research design is developed for testing six hypotheses related to domain knowledge-driven variable derivation. The major contribution of the empirical study is testing the impact of domain knowledge on a real business data set compared to the impact of a standard and a randomly derived data set. The business application of the research is a binary classification problem in the insurance domain, which deals with the prediction of damages in legal expenses insurance. Domain knowledge is expressed by deriving the corporate variables by means of a business- and data-driven constructive induction strategy. Six variable derivation steps are investigated: normalisation, instance relation, discretisation, categorical encoding, ratio, and multivariate mathematical function. The impact of the domain knowledge is examined by pairwise (with and without derived variables) performance comparisons for five classification techniques (decision trees, naive Bayes, logistic regression, artificial neural networks, k-nearest neighbours). The impact is measured by two classifier performance criteria: sensitivity and area under the ROC curve (AUC). The McNemar significance test is used to verify the results. Based on the results, two hypotheses are clearly verified and accepted, three are partly verified, and one had to be rejected on the basis of the case study results.
The thesis reveals a significant positive impact of domain knowledge-driven variable derivation on classifier performance for options of all six tested steps. Furthermore, the findings indicate that the classification technique influences the impact of the variable derivation steps, and that bundling steps has a significantly higher performance impact when the variables are derived using domain knowledge (compared to a non-knowledge application). Finally, the research shows that an empirical examination of the domain knowledge impact is very complex due to the high level of interaction between the selected research parameters (variable derivation step, classification technique, and performance criterion).
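The pairwise "with versus without derived variable" comparison can be sketched as below on synthetic data, using a domain-knowledge ratio variable and AUC as the criterion. The feature names, synthetic target, and use of scikit-learn are illustrative assumptions and do not reproduce the thesis' insurance data or its six derivation steps.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
claims = rng.poisson(2, n).astype(float)            # hypothetical raw variable
years = rng.integers(1, 20, n).astype(float)        # hypothetical raw variable
y = ((claims / years + 0.3 * rng.normal(size=n)) > 1.0).astype(int)  # synthetic target

X_base = np.column_stack([claims, years])                      # without derived variable
X_derived = np.column_stack([claims, years, claims / years])   # with ratio variable

for name, X in [("base", X_base), ("base + ratio", X_derived)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")   # the derived ratio should lift AUC here
```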
259. Data Quality Metrics. Sýkorová, Veronika. January 2008.
The aim of the thesis is to prove the measurability of Data Quality, which is a relatively subjective measure and thus difficult to quantify. In doing this, various aspects of measuring the quality of data are analyzed, and a Complex Data Quality Monitoring System is introduced with the aim of providing a concept for measuring and monitoring the overall Data Quality in an organization. The system is built on a metrics hierarchy decomposed into particular detailed metrics, dimensions enabling multidimensional analysis of the metrics, and processes measured by the metrics. The first part of the thesis (Chapter 2 and Chapter 3) is focused on Data Quality itself: it provides various definitions of Data Quality, explains the importance of Data Quality for a company, and presents some of the most common tools and solutions that target managing Data Quality in an organization. The second part of the thesis (Chapter 4 and Chapter 5) builds on the previous part and leads into measuring Data Quality using metrics: it defines the purpose of Data Quality Metrics, places them in a multidimensional context (dimensions, hierarchies) and presents five possible decompositions of Data Quality metrics into detailed metrics. The third part of the thesis (Chapter 6) contains the proposed Complex Data Quality Monitoring System, including a description of the Data Quality Management-related dimensions and processes and, most importantly, a detailed definition of the bottom-level metrics used for calculation of the overall Data Quality.
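A minimal sketch of the roll-up such a metrics hierarchy performs is shown below: bottom-level metrics scored in [0, 1] are aggregated through weighted parent nodes into a single overall Data Quality figure. The metric names and weights are illustrative assumptions, not the decomposition proposed in the thesis.

```python
def score(node):
    """Recursively aggregate a metrics hierarchy into a 0..1 Data Quality score."""
    if "value" in node:                      # bottom-level metric
        return node["value"]
    total_w = sum(w for _, w in node["children"])
    return sum(w * score(child) for child, w in node["children"]) / total_w

# Hypothetical hierarchy: overall DQ <- {completeness (weight 2), timeliness (weight 1)}.
dq = {"children": [
    ({"children": [({"value": 0.98}, 1), ({"value": 0.91}, 1)]}, 2),  # completeness
    ({"children": [({"value": 0.85}, 1)]}, 1),                        # timeliness
]}
print(round(score(dq), 3))  # overall Data Quality
```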
260. Systém předzpracování dat pro dobývání znalostí z databází (A Data Preprocessing System for Knowledge Discovery in Databases). Kotinová, Hana. January 2009.
The aim of this diploma thesis was to create an application for data preprocessing. The application works with files in CSV format and is useful for preparing data when solving data mining tasks. The application was created using the Java programming language. This text discusses the problems, solutions and algorithms associated with data preprocessing, and discusses similar systems such as Mining Mart and SumatraTT. A complete application user guide is provided in the main part of this text.
|