391

A New Evolutionary Algorithm For Mining Noisy, Epistatic, Geospatial Survey Data Associated With Chagas Disease

Hanley, John P. 01 January 2017 (has links)
The scientific community is just beginning to understand some of the profound effects that feature interactions and heterogeneity have on natural systems. Despite the belief that these nonlinear and heterogeneous interactions exist across numerous real-world systems (e.g., from the development of personalized drug therapies to market predictions of consumer behaviors), the tools for analysis have not kept pace. This research was motivated by the desire to mine data from large socioeconomic surveys aimed at identifying the drivers of household infestation by a triatomine insect that transmits the life-threatening Chagas disease. To decrease the risk of transmission, our colleagues at the laboratory of applied entomology and parasitology have implemented mitigation strategies (known as Ecohealth interventions); however, limited resources necessitate the search for better risk models. Mining these complex Chagas survey data for potential predictive features is challenging due to imbalanced class outcomes, missing data, heterogeneity, and the non-independence of some features. We develop an evolutionary algorithm (EA) to identify feature interactions in "Big Datasets" with desired categorical outcomes (e.g., disease or infestation). The method is non-parametric and uses the hypergeometric PMF as a fitness function to tackle challenges associated with using p-values in Big Data (e.g., p-values decrease inversely with the size of the dataset). To demonstrate the EA's effectiveness, we first test the algorithm on three benchmark datasets: two classic Boolean classifier problems, (1) the "majority-on" problem and (2) the multiplexer problem, as well as (3) a simulated single nucleotide polymorphism (SNP) disease dataset. Next, we apply the EA to real-world Chagas disease survey data and successfully identify numerous high-order feature interactions associated with infestation that would not have been discovered using traditional statistics. These feature interactions are also explored using network analysis. The spatial autocorrelation of the genetic data (SNPs of Triatoma dimidiata) was captured using geostatistics. Specifically, a modified semivariogram analysis was performed to characterize the SNP data and help elucidate the movement of the vector within two villages. For both villages, the SNP information showed strong spatial autocorrelation, albeit with different geostatistical characteristics (sills, ranges, and nuggets). These metrics were leveraged to create risk maps suggesting that the more forested village had a sylvatic source of infestation, while the other village had a domestic/peridomestic source. This initial exploration into using Big Data to analyze disease risk shows that novel statistical tools, and modifications of existing ones, can improve the assessment of risk on a fine scale.
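The hypergeometric-PMF fitness the abstract describes can be sketched in a few lines. The sketch below is an illustration under assumed variable names and scoring convention (the abstract does not show the authors' encoding of feature combinations); it scores a candidate feature combination by how improbable its overlap with the positive-outcome group is.

```python
# Illustrative sketch only: scoring a candidate feature combination with
# the hypergeometric log-PMF, as the abstract describes the EA's fitness.
# Names and the -log convention are assumptions, not the authors' code.
from scipy.stats import hypergeom

def hypergeometric_fitness(n_total, n_cases, n_carriers, n_case_carriers):
    """n_total: all survey records; n_cases: records with the outcome
    (e.g., infested households); n_carriers: records matching the candidate
    feature combination; n_case_carriers: records in both groups."""
    # logpmf(k, M, n, N): probability of exactly k cases in a sample of
    # N carriers drawn from M records containing n cases overall.
    logpmf = hypergeom.logpmf(n_case_carriers, n_total, n_cases, n_carriers)
    return -logpmf  # a rarer overlap yields a larger fitness to maximize

# e.g., 40 of 80 carriers are cases, in a survey of 1000 with 120 cases
print(hypergeometric_fitness(1000, 120, 80, 40))
```

Working on the log scale keeps the score numerically stable at the large sample sizes the abstract targets.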
392

The use of data within Product Development of manufactured products

Flankegård, Filip January 2017 (has links)
No description available.
393

Automatic Generation of Synthetic XML Documents

Betík, Roman January 2015 (has links)
The aim of this thesis is to research the current possibilities and limitations of automatic generation of the synthetic XML and JSON documents used in the area of Big Data. The first part of the work discusses and compares the properties of the most widely used XML, Big Data, and JSON data generators. The next part of the thesis proposes an algorithm for generating semistructured data. The main focus of the algorithm is on parallel execution of the generation process while preserving the ability to control the contents of the generated documents. The data generator can also use samples of real data in the generation of the synthetic data and is capable of automatically creating simple references between JSON documents. The last part of the thesis provides the results of experiments in which the data generator was used to test the MongoDB database, describes its added value, and compares it to other solutions.
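As a rough illustration of the parallel-generation idea the abstract emphasizes, the sketch below fans document creation out across worker processes while keeping output reproducible and including simple inter-document references. The schema and parameters are invented for illustration; this is not the thesis' generator.

```python
# Minimal sketch of parallel synthetic JSON generation with controlled
# content. Seeding each document's RNG by its id keeps output identical
# no matter how work is split across processes.
import json
import random
from multiprocessing import Pool

def make_document(doc_id):
    rng = random.Random(doc_id)  # per-document seed => reproducible output
    return json.dumps({
        "id": doc_id,
        "name": f"user-{rng.randint(1, 10_000)}",
        "scores": [rng.random() for _ in range(rng.randint(1, 5))],
        # a simple reference to an earlier generated document
        "friend_id": rng.randrange(doc_id) if doc_id > 0 else None,
    })

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # generate documents in parallel
        docs = pool.map(make_document, range(100_000), chunksize=1_000)
    print(docs[0])
```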
394

Building a scalable distributed data platform using lambda architecture

Mehta, Dhananjay January 1900 (has links)
Master of Science / Department of Computer Science / William H. Hsu / Data is generated all the time by the Internet, system sensors, and mobile devices around us; this is often referred to as 'big data'. Tapping this data is a challenge to organizations because of its velocity, volume, and variety. What makes handling this data a challenge? Traditional data platforms have been built around relational database management systems coupled with enterprise data warehouses, and this legacy infrastructure is either technically incapable of scaling to big data or financially infeasible. The question then arises: how do we build a system that handles the challenges of big data and caters to the needs of an organization? The answer is Lambda Architecture. Lambda Architecture (LA) is a generic term for a scalable and fault-tolerant data processing architecture that ensures real-time processing with low latency. LA provides a general strategy for knitting together all the tools necessary to build a data pipeline for real-time processing of big data. LA comprises three layers: the Batch Layer, responsible for bulk data processing; the Speed Layer, responsible for real-time processing of data streams; and the Serving Layer, responsible for serving queries from end users. This project draws an analogy between modern data platforms and traditional supply chain management to lay down principles for building a big data platform, and shows how major challenges in building data platforms can be mitigated. The project constructs an end-to-end data pipeline for ingestion, organization, and processing of data, and demonstrates how any organization can build a low-cost distributed data platform using Lambda Architecture.
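The three layers the abstract names can be mimicked in a toy in-memory form to show how a query merges batch and real-time views. This sketch substitutes Python dicts for the real stores (e.g., a distributed filesystem for the master dataset and a stream processor for the speed layer); it is illustrative only, not the project's pipeline.

```python
# Toy Lambda Architecture: batch layer recomputes from the immutable log,
# speed layer absorbs recent events, serving layer merges the two views.
from collections import defaultdict

master_dataset = []  # immutable, append-only log of all events

def batch_layer():
    """Recompute the batch view from the full master dataset (high latency)."""
    view = defaultdict(int)
    for event in master_dataset:
        view[event["key"]] += event["value"]
    return dict(view)

class SpeedLayer:
    """Incrementally absorb events that arrived after the last batch run."""
    def __init__(self):
        self.realtime_view = defaultdict(int)
    def ingest(self, event):
        master_dataset.append(event)  # every event also lands in the log
        self.realtime_view[event["key"]] += event["value"]

def serving_layer(key, batch_view, speed):
    """Answer a query by merging the batch and real-time views."""
    return batch_view.get(key, 0) + speed.realtime_view.get(key, 0)

speed = SpeedLayer()
speed.ingest({"key": "clicks", "value": 3})
batch_view = batch_layer()                   # periodic bulk recompute
speed.realtime_view.clear()                  # batch view now covers these events
speed.ingest({"key": "clicks", "value": 2})  # new low-latency data
print(serving_layer("clicks", batch_view, speed))  # -> 5
```

Because the batch layer always recomputes from the immutable log, any error introduced in the speed layer is eventually corrected, which is the fault-tolerance property the abstract highlights.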
395

The Auditor’s Role in a Digital World : Empirical evidence on auditors’ perceived role and its implications on the principal-agent justification

Caringe, Andreas, Holm, Erik January 2017 (has links)
Most of the theory that concerns auditing relates to agency theory, where the auditors' role is to mitigate the information asymmetry between principals and agents. During the last decade, we have witnessed technological advancements across society, advancements which have also affected the auditing profession. Technology and accounting information systems have decreased information asymmetry in various ways. From an agency theory point of view, this would arguably reduce the demand for auditing. At the same time, the audit profession is expanding into new business areas where auditors perform assurance services. The purpose of this paper is to investigate auditors' role in a technological environment. Interviews have been used to explore auditors' perception of the role. The results indicate that auditors' role is still to mitigate principal-agent conflicts; however, due to technology, information asymmetries are expanding to comprehend more and to reach a wider stakeholder group. The end goal is still the same, to provide trust to the stakeholders; technology enables new ways of getting there and broadens the scope towards systems and other related services. That is the perceived role of auditors in today's technological environment.
396

There ain't no such thing as a free lunch : What consumers think about personal data collection online

Loverus, Anna, Tellebo, Paulina January 2017 (has links)
This study examines how consumers reason about personal data collection online and what opinions they hold about it. Its focus is to investigate whether consumers consider online data collection an issue with moral implications, and whether they judge it unethical. This focus is partly motivated by the contradiction between consumers' stated opinions and their actual behavior. To meet its purpose, the study poses the research question: How is personal data collection and its prevalence online perceived and motivated by consumers? The theoretical framework consists of the Issue-Contingent Model of Ethical Decision-Making by Jones (1991), thus putting the model to use in a new context. Data for the study was collected through focus groups, since Jones' model places ethical decision-making in a social context. The results showed that consumers acknowledge both positive and negative aspects of online data collection, but the majority of them do not consider this data collection to be unethical. This result partly confirms the behaviour consumers already display, but does not explain why their stated opinions do not match it. Thus, this study can be seen as an initial attempt at clarifying consumer reasoning on personal data collection online, with potential for future studies to further investigate and understand consumer online behaviour.
397

Efficiency of combine usage: a study of combine data comparing operators and combines to maximize efficiency

Schemper, Janel K. January 1900 (has links)
Master of Agribusiness / Department of Agricultural Economics / Vincent Amanor-Boadu / Farming is an important industry in the United States, and the custom harvesting industry plays a major role in feeding the world. Schemper Harvesting is a family-owned and operated custom harvesting service that employs 20-25 seasonal workers, and understanding how to manage a custom harvesting business professionally and efficiently is the key to its success. Data on John Deere combine performance is available through JDLink beginning in 2012. The purpose of this study is to examine the usefulness of these JDLink data for assessing the efficiency of each of Schemper Harvesting's seven combines, including machine efficiency and differences among combine operators. The goal is to determine how the data can improve Schemper Harvesting's overall performance. Statistical methods were used to analyze Schemper Harvesting's performance. The analysis indicated that fuel is a major expense and that there are ways Schemper Harvesting can conserve fuel. This information may prove valuable for operating combines more efficiently and reducing expenses. Overall, the objective is to improve Schemper Harvesting's performance, resulting in higher profit without sacrificing quality. Precision technology is an added expense to the business, and justifying this expense with profit is the challenge. Fuel, labor, and machinery are the biggest inputs in the custom harvesting business, and these production costs have increased the demand for precision agriculture to raise efficiency and profitability. The analysis demonstrates that, used correctly, the investment in precision technology pays for itself by adding to productivity. With experience, operators improve, increasing their overall efficiency, and incentive plans can be built on these data. With data available, the costs and benefits of precision technology can be further evaluated. Five of the seven combines are operated by family members and the other two by non-family employees. This study shows that the performance of the non-family employees was below that of family members. An initial explanation for this difference is experience, because all the family members have been operating combines for most of their lives. This implies a need to employ people with excellent performance records and/or to train non-family employees to help them understand the performance expectations at Schemper Harvesting. The results indicate that operational output indicators, such as acreage and volume harvested, should be tracked so that they may be assessed in concert with technical indicators such as time and fuel use. The study demonstrates the potential benefits of John Deere's JDLink data service, which provides telematics information to customers using the latest precision agriculture technologies.
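The operator comparison described above can be illustrated with a small pandas sketch over hypothetical JDLink-style records; the column names and figures below are invented for illustration and are not the actual JDLink export schema.

```python
# Hedged sketch: comparing family vs. non-family operators on fuel use
# per acre and harvest rate, using invented JDLink-style records.
import pandas as pd

records = pd.DataFrame({
    "operator":   ["family_1", "family_1", "nonfam_1", "nonfam_1"],
    "is_family":  [True, True, False, False],
    "acres":      [310.0, 285.0, 300.0, 295.0],
    "fuel_gal":   [248.0, 231.0, 279.0, 265.0],
    "engine_hrs": [31.0, 29.5, 33.0, 32.0],
})

# Derived efficiency metrics: fuel burned per acre, acres covered per hour.
records["gal_per_acre"] = records["fuel_gal"] / records["acres"]
records["acres_per_hr"] = records["acres"] / records["engine_hrs"]

# Group means show whether the two operator groups differ on either metric.
summary = records.groupby("is_family")[["gal_per_acre", "acres_per_hr"]].mean()
print(summary)
```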
398

Evaluation of SMP Shared Memory Machines for Use with In-Memory and OpenMP Big Data Applications

Younge, Andrew J., Reidy, Christopher, Henschel, Robert, Fox, Geoffrey C. 05 1900 (has links)
While distributed memory systems have shaped the field of distributed systems for decades, the demand for many-core shared memory resources is increasing. Symmetric Multiprocessor Systems (SMPs) have recently become increasingly important across a wide array of disciplines, ranging from bioinformatics to astrophysics and beyond. With the increase in big data computing, the size and scope of traditional commodity server systems are often outpaced. While some big data applications can be mapped to the distributed memory systems found in many cluster and cloud technologies today, this effort represents a large barrier to entry that some projects cannot cross. Shared memory SMP systems look to fill this niche within distributed systems effectively and efficiently by providing high throughput and performance with minimized development effort, as the computing environment often represents what many researchers are already familiar with. In this paper, we look at the use of two common shared memory systems: the ScaleMP vSMP virtualized SMP deployment at Indiana University, and the SGI UV architecture deployed at the University of Arizona. While the two systems are notably different in their design, their potential impact on computing is remarkably similar. As such, we first compare each system under a set of OpenMP threaded benchmarks from the SPEC group, and follow up with our experience using each machine for Trinity de novo assembly. We find that both SMP systems are well suited to support various big data applications, with the newer vSMP deployment often slightly faster; however, certain caveats and performance considerations are necessary when considering such SMP systems.
399

Technology and Big Data Meet the Risk of Terrorism in an Era of Predictive Policing and Blanket Surveillance

Patti, Alexandra C 15 May 2015 (has links)
Surveillance studies suffer from a near-total lack of empirical data, partially due to the highly secretive nature of surveillance programs. However, documents leaked by Edward Snowden in June 2013 provided unprecedented proof of top-secret American data mining initiatives that covertly monitor electronic communications and collect and store previously unfathomable quantities of data. These documents presented an ideal opportunity for testing theory against data to better understand contemporary surveillance. This qualitative content analysis compared themes of technology, privacy, national security, and legality in the NSA documents to those found in sets of publicly available government reports, laws, and guidelines, finding inconsistencies in the portrayal of governmental commitments to privacy, transparency, and civil liberties. These inconsistencies are best explained by the risk society theoretical model, which predicts that surveillance is an attempt to prevent risk in globalized and complex contemporary societies.
400

Bayesian-based Traffic State Estimation in Large-Scale Networks Using Big Data

Gu, Yiming 01 February 2017 (has links)
Traffic state estimation (TSE) aims to estimate the time-varying traffic characteristics (such as flow rate, flow speed, flow density, and the occurrence of incidents) of all roads in a traffic network, given limited observations that are sparse in time and location. TSE is critical to transportation planning, operation, and infrastructure design. In this new era of "big data", massive volumes of sensing data from a variety of sources (such as cell phones, GPS, probe vehicles, and inductive loops) enable TSE in an efficient, timely, and accurate manner. This research develops a Bayesian-based theoretical framework, along with statistical inference algorithms, to (1) capture the complex flow patterns in urban traffic networks consisting of both highways and arterials; (2) incorporate heterogeneous data sources into the process of TSE; (3) enable both estimation and prediction of traffic states; and (4) demonstrate scalability to large-scale urban traffic networks. To achieve those goals, a hierarchical Bayesian probabilistic model is proposed to capture spatio-temporal traffic states. The propagation of traffic states is encapsulated through mesoscopic network flow models (namely the Link Queue Model) and equilibrated fundamental diagrams. Traffic states in the hierarchical Bayesian model are inferred using the Expectation-Maximization Extended Kalman Filter (EM-EKF). To better estimate and predict states, infrastructure supply is also estimated as part of the TSE process; this is done by adopting a series of algorithms that translate Twitter data into traffic incident information. Finally, the proposed EM-EKF algorithm is implemented and examined on road networks in Washington, DC. The results show that the proposed methods can handle large-scale traffic state estimation while achieving superior results compared to traditional temporal and spatial smoothing methods.
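At the core of the EM-EKF is the standard extended-Kalman-filter predict/update cycle, sketched generically below in numpy. The dynamics f (which in the thesis would be the Link Queue Model) and the sensor model h are placeholders, and the Jacobians F and H are assumed to be already evaluated at the current estimate; this is a generic sketch of the filter step the method builds on, not the thesis' EM-EKF.

```python
# Generic EKF predict/update step (illustrative sketch, not the thesis'
# EM-EKF). f: state transition; h: observation model mapping states to
# sensor readings; F, H: their Jacobians at the current estimate;
# Q, R: process and measurement noise covariances.
import numpy as np

def ekf_step(x, P, z, f, h, F, H, Q, R):
    # Predict: propagate the state mean and covariance through the dynamics.
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the new observation z
    # (e.g., loop-detector counts or GPS probe speeds).
    innovation = z - h(x_pred)
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In the EM-EKF the abstract describes, an expectation-maximization loop around this step would additionally re-estimate unknown model parameters; that outer loop is omitted here.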
