391
Improvement of recommendation system for a wholesale store chain using advanced data mining techniques / Videla Cavieres, Iván Fernando, January 2015
Magíster en Gestión de Operaciones / Ingeniero Civil Industrial / In retail companies, Customer Intelligence areas have many opportunities to improve their strategic decisions using the information they could obtain from the records of their interactions with customers. However, processing these large volumes of data has become a challenge.
One of the problems these companies face every day is segmenting or grouping customers. Most of them build groups by spending level, not by similarity of shopping baskets as the literature proposes. Another challenge is to increase sales on every customer visit and to build loyalty. One of the techniques used to achieve this is recommender systems.
In this work, around half a billion transactional records from a wholesale supermarket chain were processed. Applying traditional clustering and market basket analysis techniques produces low-quality results that are very hard to interpret, and it is not possible to identify groups that would allow a customer to be classified according to their purchase history.
Understanding that the simultaneous presence of two products on the same receipt implies a relationship between them, a graph mining method based on social network analysis was used to obtain identifiable groups of products, which we call communities, to which a customer can belong. The robustness of the model is confirmed by the stability of the groups generated over different time periods.
Under the same restrictions imposed by the company, recommendations are generated based on customers' purchase histories and their membership in the different product groups. In this way, customers receive far more relevant recommendations, not just recommendations based on what other customers also bought.
This novel way of solving the customer segmentation problem improves the Chilean wholesale supermarket chain's current recommendation method by 140%, which translates into an increase of more than 430% in potential revenue.
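A minimal sketch of the co-purchase community idea described above, assuming a pandas DataFrame of (receipt_id, product_id) pairs and using networkx's greedy modularity communities as a stand-in for the thesis's specific graph mining algorithm; all names are illustrative.

```python
import itertools
from collections import Counter

import networkx as nx
import pandas as pd
from networkx.algorithms.community import greedy_modularity_communities

def product_communities(transactions: pd.DataFrame):
    """Build a product co-purchase graph and split it into communities.

    `transactions` is assumed to have columns `receipt_id` and `product_id`;
    two products are linked whenever they appear on the same receipt.
    """
    edge_counts = Counter()
    for _, items in transactions.groupby("receipt_id")["product_id"]:
        for a, b in itertools.combinations(sorted(set(items)), 2):
            edge_counts[(a, b)] += 1

    g = nx.Graph()
    for (a, b), w in edge_counts.items():
        g.add_edge(a, b, weight=w)

    # Each community is a set of products; a customer can then be assigned
    # to the communities their historical purchases fall into.
    return list(greedy_modularity_communities(g, weight="weight"))
```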
392
INFLUENCE ANALYSIS TOWARDS BIG SOCIAL DATA / Han, Meng, 03 May 2017
Large-scale social data from online social networks, instant messaging applications, and wearable devices has recently seen exponential growth in the number of users and activities. The rapid proliferation of social data provides rich information and endless possibilities for understanding and analyzing the complex inherent mechanisms that govern the evolution of this new technological age. Influence, as a natural product of information diffusion (or propagation), represents the change in an individual's thoughts, attitudes, and behaviors resulting from interaction with others, and is one of the fundamental processes in the social world. Influence analysis therefore occupies a very prominent place in social data analysis, theory, models, and algorithms. In this dissertation, we study influence analysis in the setting of big social data.
Firstly, we investigate the uncertainty of influence relationships in social networks. A novel sampling scheme is proposed that enables an efficient algorithm for measuring this uncertainty. Considering the practical importance of neighborhood relationships in real social data, a framework is introduced to transform uncertain networks into deterministic weighted networks, where the weight on an edge can be measured as a Jaccard-like index.
Secondly, focusing on the dynamics of social data, a practical framework is proposed that probes only a subset of communities to track the real changes in a social network. Our probing framework minimizes the possible difference between the observed topology and the actual network through several representative communities. We also propose an algorithm that takes full advantage of our divide-and-conquer strategy, which reduces the computational overhead.
Thirdly, if we let the number of influenced users be the depth of propagation and the area covered by influenced users be its breadth, most existing results focus only on influence depth rather than influence breadth. Timeliness, acceptance ratio, and breadth are three important factors that significantly affect the outcome of influence maximization in practice, yet they are neglected most of the time. To fill this gap, we investigate a novel algorithm that incorporates time delay for timeliness, opportunistic selection for acceptance ratio, and broad diffusion for influence breadth. In our model, the breadth of influence is measured by the number of covered communities, and the tradeoff between depth and breadth of influence can be balanced by a specific parameter.
Furthermore, we address the problem of privacy-preserving influence maximization in both physical location networks and online social networks. We merge the sensed location information collected from the cyber-physical world and the relationship information gathered from online social networks into a unified framework with a comprehensive model, and propose an efficient algorithm for the influence maximization problem. At the same time, a privacy-preserving mechanism is proposed to protect the cyber-physical location and link information at the application level.
Last but not least, to address the challenge of large-scale data, we design an efficient influence maximization framework based on two new models that incorporate the dynamism of networks and account for time constraints during the influence spreading process in practice.
All proposed problems and models of influence analysis have been empirically studied and verified on different large-scale, real-world social datasets in this dissertation.
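A minimal sketch of the uncertain-to-deterministic transformation mentioned above, assuming edge weights are computed as a Jaccard-like index over the endpoints' neighborhoods; the exact index used in the dissertation may differ, and all names are illustrative.

```python
import networkx as nx

def jaccard_weighted(uncertain: nx.Graph) -> nx.Graph:
    """Turn an uncertain graph into a deterministic weighted graph.

    For every edge (u, v), the weight is a Jaccard-like index: the overlap
    of the two endpoints' neighborhoods divided by their union.
    """
    weighted = nx.Graph()
    for u, v in uncertain.edges():
        nu, nv = set(uncertain[u]), set(uncertain[v])
        union = nu | nv
        weight = len(nu & nv) / len(union) if union else 0.0
        weighted.add_edge(u, v, weight=weight)
    return weighted
```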
393
What are the Potential Impacts of Big Data, Artificial Intelligence and Machine Learning on the Auditing Profession? / Evett, Chantal, 01 January 2017
To maintain public confidence in the financial system, it is essential that most financial fraud is prevented and that incidents of fraud are detected and punished. The responsibility of uncovering creatively implemented fraud is placed, in large part, on auditors. Recent advancements in technology are helping auditors turn the tide against fraudsters. Big Data, made possible by the proliferation, widespread availability and amalgamation of diverse digital data sets, has become an important driver of technological change. Big Data analytics are already transforming the traditional audit. Testing a limited number of random samples has given way to a much more comprehensive audit that analyzes the entire population of transactions within an account, allowing auditors to flag and investigate all sorts of potentially fraudulent anomalies that were previously invisible. Artificial intelligence (AI) programs, typified by IBM's Watson, can mimic the thought processes of the human mind and will soon be adopted by the auditing profession. Machine learning (ML) programs, with the ability to change when exposed to new data, are developing rapidly and may take over many of the decision-making functions currently performed by auditors. The SEC has already implemented pioneering fraud-detection software based on AI and ML programs. The evolution of the auditor's role has already begun. Current accounting students must understand that the traditional auditing skill set will no longer be sufficient. While facing a future with fewer auditing positions available due to increased automation, auditors will need training for roles that are more data-analytical and computer-science based.
394
A New Evolutionary Algorithm For Mining Noisy, Epistatic, Geospatial Survey Data Associated With Chagas Disease / Hanley, John P., 01 January 2017
The scientific community is just beginning to understand some of the profound effects that feature interactions and heterogeneity have on natural systems. Despite the belief that these nonlinear and heterogeneous interactions exist across numerous real-world systems (e.g., from the development of personalized drug therapies to market predictions of consumer behaviors), the tools for analysis have not kept pace. This research was motivated by the desire to mine data from large socioeconomic surveys aimed at identifying the drivers of household infestation by a Triatomine insect that transmits the life-threatening Chagas disease. To decrease the risk of transmission, our colleagues at the laboratory of applied entomology and parasitology have implemented mitigation strategies (known as Ecohealth interventions); however, limited resources necessitate the search for better risk models. Mining these complex Chagas survey data for potential predictive features is challenging due to imbalanced class outcomes, missing data, heterogeneity, and the non-independence of some features.
We develop an evolutionary algorithm (EA) to identify feature interactions in "Big Datasets" with desired categorical outcomes (e.g., disease or infestation). The method is non-parametric and uses the hypergeometric PMF as a fitness function to tackle challenges associated with using p-values in Big Data (e.g., p-values decrease inversely with the size of the dataset). To demonstrate the EA's effectiveness, we first test the algorithm on three benchmark datasets: two classic Boolean classifier problems, (1) the "majority-on" problem and (2) the multiplexer problem, as well as (3) a simulated single nucleotide polymorphism (SNP) disease dataset. Next, we apply the EA to real-world Chagas disease survey data and successfully identify numerous high-order feature interactions associated with infestation that would not have been discovered using traditional statistics. These feature interactions are also explored using network analysis. The spatial autocorrelation of the genetic data (SNPs of Triatoma dimidiata) was captured using geostatistics. Specifically, a modified semivariogram analysis was performed to characterize the SNP data and help elucidate the movement of the vector within two villages. For both villages, the SNP information showed strong spatial autocorrelation, albeit with different geostatistical characteristics (sills, ranges, and nuggets). These metrics were leveraged to create risk maps suggesting that the more forested village had a sylvatic source of infestation, while the other village had a domestic/peridomestic source. This initial exploration into using Big Data to analyze disease risk shows that novel statistical tools, and modifications of existing ones, can improve the assessment of risk on a fine scale.
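A minimal sketch of a hypergeometric-PMF fitness score for a candidate feature combination, assuming a binary outcome and using scipy.stats.hypergeom; this illustrates the general idea rather than the thesis's exact fitness function, and all names and data are illustrative.

```python
import numpy as np
from scipy.stats import hypergeom

def hypergeometric_fitness(feature_mask: np.ndarray, outcome: np.ndarray) -> float:
    """Score a candidate feature interaction against a binary outcome.

    `feature_mask` flags the samples where the candidate feature combination
    is present; `outcome` flags the samples with the outcome of interest
    (e.g., household infestation). The score is the hypergeometric PMF of
    seeing exactly this much overlap by chance, so smaller values indicate a
    stronger association.
    """
    N = len(outcome)                         # population size (all survey rows)
    K = int(outcome.sum())                   # rows with the outcome
    n = int(feature_mask.sum())              # rows matching the feature combination
    k = int((feature_mask & outcome).sum())  # overlap of the two
    return float(hypergeom.pmf(k, N, K, n))

# Example: 8 of the 10 households matching the combination are infested.
mask = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
infested = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0], dtype=bool)
print(hypergeometric_fitness(mask, infested))
```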
395
The use of data within Product Development of manufactured products / Flankegård, Filip, January 2017
No description available.
396
Automatické generování umělých XML dokumentů / Automatic Generation of Synthetic XML Documents / Betík, Roman, January 2015
The aim of this thesis is to research the current possibilities and limitations of automatic generation of synthetic XML and JSON documents used in the area of Big Data. The first part of the work discusses and compares the properties of the most widely used generators for XML, Big Data, and JSON. The next part of the thesis proposes an algorithm for generating semistructured data. The main focus of the algorithm is on parallel execution of the generation process while preserving the ability to control the contents of the generated documents. The data generator can also use samples of real data when generating the synthetic data and is capable of automatically creating simple references between JSON documents. The last part of the thesis provides the results of experiments in which the data generator was used to test the MongoDB database, describes its added value, and compares it to other solutions.
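A minimal sketch of sample-driven synthetic JSON generation with simple cross-document references, in the spirit of the generator described above; it omits the parallel execution and schema control the thesis focuses on, and all names and value pools are illustrative.

```python
import json
import random
import uuid

# Assumed pools of values harvested from real sample documents.
SAMPLE_NAMES = ["Alice", "Bob", "Carol"]
SAMPLE_CITIES = ["Prague", "Brno", "Ostrava"]

def generate_documents(n, seed=0):
    """Generate `n` synthetic JSON documents with simple references.

    Field values are drawn from pools of real sample values, and each
    document after the first may reference an earlier one by its id.
    """
    rng = random.Random(seed)
    docs = []
    for _ in range(n):
        doc = {
            "id": str(uuid.UUID(int=rng.getrandbits(128))),
            "name": rng.choice(SAMPLE_NAMES),
            "city": rng.choice(SAMPLE_CITIES),
        }
        if docs and rng.random() < 0.5:
            doc["ref"] = rng.choice(docs)["id"]  # simple reference to an earlier document
        docs.append(doc)
    return docs

if __name__ == "__main__":
    print(json.dumps(generate_documents(3), indent=2))
```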
397
Building a scalable distributed data platform using lambda architecture / Mehta, Dhananjay, January 1900
Master of Science / Department of Computer Science / William H. Hsu / Data is generated all the time by the Internet, system sensors, and mobile devices around us; this is often referred to as 'big data'. Tapping this data is a challenge for organizations because of its nature, i.e., its velocity, volume, and variety. What makes handling this data a challenge? Traditional data platforms have been built around relational database management systems coupled with enterprise data warehouses, and this legacy infrastructure is either technically incapable of scaling to big data or financially infeasible. The question then arises: how do we build a system that handles the challenges of big data and caters to the needs of an organization? The answer is Lambda Architecture.
Lambda Architecture (LA) is a generic term for a scalable and fault-tolerant data processing architecture that ensures real-time processing with low latency. LA provides a general strategy for knitting together all the tools necessary to build a data pipeline for real-time processing of big data. LA comprises three layers: the Batch Layer, responsible for bulk data processing; the Speed Layer, responsible for real-time processing of data streams; and the Serving Layer, responsible for serving queries from end users. This project draws an analogy between modern data platforms and traditional supply chain management to lay down principles for building a big data platform and shows how major challenges in building data platforms can be mitigated. The project constructs an end-to-end data pipeline for ingestion, organization, and processing of data and demonstrates how any organization can build a low-cost distributed data platform using Lambda Architecture.
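A minimal sketch of the three-layer split described above, assuming a toy in-memory implementation; in a real deployment the batch layer would typically run on something like Hadoop or Spark and the speed layer on a stream processor, so all class and field names here are illustrative.

```python
from collections import defaultdict

class BatchLayer:
    """Recomputes batch views from the immutable master dataset."""
    def __init__(self, master_dataset):
        self.master_dataset = master_dataset  # append-only list of events

    def compute_view(self):
        view = defaultdict(int)
        for event in self.master_dataset:
            view[event["key"]] += event["value"]
        return dict(view)

class SpeedLayer:
    """Maintains an incremental real-time view for recent events."""
    def __init__(self):
        self.realtime_view = defaultdict(int)

    def update(self, event):
        self.realtime_view[event["key"]] += event["value"]

class ServingLayer:
    """Answers queries by merging the batch and real-time views."""
    def __init__(self, batch_view, realtime_view):
        self.batch_view = batch_view
        self.realtime_view = realtime_view

    def query(self, key):
        return self.batch_view.get(key, 0) + self.realtime_view.get(key, 0)

# Usage: one event already in the batch view, one arriving in real time.
batch = BatchLayer([{"key": "clicks", "value": 10}])
speed = SpeedLayer()
speed.update({"key": "clicks", "value": 2})
serving = ServingLayer(batch.compute_view(), speed.realtime_view)
print(serving.query("clicks"))  # 12
```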
398
The Auditor's Role in a Digital World: Empirical evidence on auditors' perceived role and its implications on the principal-agent justification / Caringe, Andreas; Holm, Erik, January 2017
Most of the theory that concerns auditing relates to agency theory, in which the auditors' role is to mitigate the information asymmetry between principals and agents. During the last decade, we have witnessed technological advancements across society, advancements which have also affected the auditing profession. Technology and accounting information systems have decreased information asymmetry in various ways. From an agency theory point of view, this would arguably reduce the demand for auditing. At the same time, the audit profession is expanding into new business areas where auditors perform assurance services. The purpose of this paper is to investigate auditors' role in a technological environment. Interviews have been used to explore auditors' perception of the role. The results indicate that the auditors' role is still to mitigate principal-agent conflicts, though, due to technology, information asymmetries are expanding to encompass more information and a wider group of stakeholders. The end goal is still the same: to provide trust to the stakeholders. Technology enables new ways of getting there and broadens the scope towards systems and other related services. That is the perceived role of auditors in today's technological environment.
399
There ain't no such thing as a free lunch: What consumers think about personal data collection online / Loverus, Anna; Tellebo, Paulina, January 2017
This study examines how consumers reason about personal data collection online and what opinions they hold about it. Its focus is to investigate whether consumers consider online data collection to be an issue with moral implications, and whether those implications are unethical. This focus is partly motivated by the contradiction between consumers' stated opinions and their actual behavior. To meet its purpose, the study poses the research question: How is personal data collection and its prevalence online perceived and motivated by consumers? The theoretical framework consists of the Issue-Contingent Model of Ethical Decision-Making by Jones (1991), thus putting the model to use in a new context. Data for the study was collected through focus groups, since Jones' model places ethical decision-making in a social context. The results show that consumers acknowledge both positive and negative aspects of online data collection, but the majority of them do not consider this data collection to be unethical. This result partly confirms the behaviour that consumers already display, but does not explain why their stated opinions do not match it. This study can thus be seen as an initial attempt at clarifying consumer reasoning on personal data collection online, with potential for future studies to further investigate and understand consumer online behaviour.
400
Efficiency of combine usage: a study of combine data comparing operators and combines to maximize efficiency / Schemper, Janel K., January 1900
Master of Agribusiness / Department of Agricultural Economics / Vincent Amanor-Boadu / Farming is an important industry in the United States, and the custom harvesting industry plays a major role in feeding the world. Schemper Harvesting is a family-owned and operated custom harvesting service that employs 20-25 seasonal workers, and understanding how to manage a custom harvesting business professionally and efficiently is the key to its success. Today, data on John Deere combine performance is available through JDLink beginning in 2012.
The purpose of this study is to examine the usefulness of these JDLink data for assessing the efficiency of each of Schemper Harvesting's seven combines, covering both machine efficiency and the performance of different combine operators. The goal is to determine how the data can improve Schemper Harvesting's overall performance.
Statistical methods were used to analyze Schemper Harvesting's performance. The analysis indicated that fuel is a major expense and that there are ways Schemper Harvesting can conserve fuel. This information may prove valuable for operating a combine more efficiently and saving money on expenses. Overall, the objective is to improve Schemper Harvesting's performance, which results in higher profit without sacrificing quality.
Precision technology is an added expense to the business, and being able to justify this expense with profit is the answer. Fuel, labor, and machinery are the biggest inputs in the custom harvesting business, and these production agriculture costs have increased the demand for precision agriculture to improve efficiency and profitability. It has been demonstrated that the investment in technology pays for itself when precision technology is used correctly, adding to productivity. With experience, operators improve, increasing their overall efficiency. The data can also be used to support incentive plans, and with its availability the costs and benefits of precision technology can be further evaluated.
Five of the seven combines are operated by family members and the other two by non-family employees. This study shows that the performance of the non-family employees was below that of the family members. This difference may initially be attributed to experience, because all the family members have been operating combines for most of their lives. This implies a need to hire people with excellent performance records and/or to train non-family employees to help them understand the performance expectations at Schemper Harvesting. The results indicate that operational output indicators, such as acreage and volume harvested, should be tracked so that they can be assessed in concert with technical indicators such as time and fuel use. The study demonstrates the potential benefits of John Deere's JDLink data service, which provides telematics information for customers using the latest precision agriculture technologies.
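A minimal sketch of the kind of operator comparison the study describes, assuming JDLink-style records with columns for operator, fuel used, and acres harvested; the column names and example figures are assumptions, not the actual JDLink export format or Schemper Harvesting's data.

```python
import pandas as pd

# Assumed JDLink-style export: one row per combine per day.
records = pd.DataFrame({
    "operator": ["family_1", "family_1", "nonfamily_1", "nonfamily_1"],
    "fuel_gal": [210.0, 195.0, 240.0, 233.0],
    "acres":    [300.0, 280.0, 290.0, 275.0],
})

# Fuel used per acre harvested, averaged by operator: a simple efficiency
# indicator for comparing operators and combines.
records["gal_per_acre"] = records["fuel_gal"] / records["acres"]
summary = records.groupby("operator")["gal_per_acre"].mean().sort_values()
print(summary)
```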