About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Developing Bottom-Up, Integrated Omics Methodologies for Big Data Biomarker Discovery

Kechavarzi, Bobak David 11 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The availability of highly distributed computing complements the proliferation of next-generation sequencing (NGS) and genome-wide association study (GWAS) datasets. These datasets are often complex, poorly annotated, or require deep domain knowledge to manage sensibly. These novel datasets provide a rare, multi-dimensional omics (proteomics, transcriptomics, and genomics) view of a single sample or patient. Previously, biologists assumed strict adherence to the central dogma: replication, transcription, and translation. Recent studies in genomics and proteomics emphasize that this is not the case. We must employ big-data methodologies not only to understand the biogenesis of these molecules but also their disruption in disease states. The Cancer Genome Atlas (TCGA) provides high-dimensional patient data and illustrates the trends that occur in expression profiles and their alteration in many complex disease states. I will ultimately create a bottom-up multi-omics approach to observe biological systems using big data techniques. I hypothesize that big data and systems biology approaches can be applied to public datasets to identify important subsets of genes in cancer phenotypes. By exploring these signatures, we can better understand the role of amplification and transcript alterations in cancer.
202

A Deep Learning Approach to Seizure Prediction with a Desirable Lead Time

Huang, Yan 23 May 2019 (has links)
No description available.
203

Data scientist : using a competency based approach to explore an emerging role

Nosarka, Naseema Banu January 2018 (has links)
A research report submitted in partial fulfilment of the Degree of Master of Commerce (Information Systems) in the School of Economic and Business Sciences, University of the Witwatersrand, 2018 / Purpose: The aim of this research study was to explore the role and competencies of Data Scientists in South Africa as the role starts to emerge. Due to the newness of the role, jobs in this sphere are currently being filled by skilled professionals moving from other related areas. The knowledge and skills of Data Scientists were explored in order to examine the role of a Data Scientist and the competencies they should have. Design/methodology/approach: Published studies on the role of a Data Scientist are limited because the field of Data Science is still new. Therefore, the design of the research was exploratory and used qualitative methods. Data gathered for this research was analysed using thematic analysis. The study used respondents drawn from the banking and insurance industries, as they are amongst the first to employ Data Scientists in the real sense of the term in South Africa. Six Data Scientists were interviewed. Originality/value: Research that focuses on the role of Data Scientists, especially in South Africa, is limited, as most of the research has taken place in developed countries. There is also limited research on the role of a Data Scientist within the banking and insurance industry. This study contributes to practitioner and research knowledge by exploring the emerging role of a Data Scientist in the South African context. Practical implications: This research improves our understanding of the knowledge and skills Data Scientists should have within the banking and insurance industry. It adds insight by highlighting the role that Data Scientists are currently undertaking and the specific skills that they report as required. This research can help shape education and develop the required skills for individuals who intend to pursue the career path of a Data Scientist, as well as help managers hire the right people for the position. / TL2019
204

Optimization of Quarry Operations and Maintenance Schedules

George, Brennan Kelly 28 June 2023 (has links)
New technologies such as the Internet of Things are providing new insights into the health, performance, and utilization of mining equipment through the collection of real-time sensor data. In this study, data collected through the software packages CAT Productivity and CAT MineStar Edge from multiple quarries and a surface coal mine is used to analyze the performance of loaders and haul trucks. The data consist of performance metrics such as truck and loader cycle time, payload per loader bucket, total truck payload, truck plan distance, and loader dipper count. The study applies data analysis and machine learning techniques to the loaders and haul trucks in these operations. Analysis of cycle time and payload shows promising results: for multiple loaders there is an optimum cycle time of 30-40 seconds associated with high average production. Furthermore, the distribution of production variables is analyzed across each set of loaders to compare performance. The Caterpillar 992K machine appeared to be the highest-yielding machine in the rock quarries data set, while the two Caterpillar 993K machines performed similarly in the surface coal mine data set. A neural network model predicted the loader from the performance metrics with 90.26% accuracy on the CAT Productivity data set, while a Random Forest model achieved 79.82% accuracy on the CAT MineStar Edge data set. Furthermore, the use of preventative maintenance is investigated for the replacement of Ground Engaging Tools on loader buckets to determine whether maintenance was effective. Data analysis is applied to Ground Engaging Tools maintenance to identify preventative maintenance schedules that minimize the production impact of equipment downtime and unnecessary maintenance. Production efficiency is compared before and after maintenance on Ground Engaging Tools, and it is concluded that there was no material change in the average production of the mine based on that analysis. The insights gained from this study can inform future research and decision-making and improve operational efficiency. / Master of Science / New technologies are helping us better understand the performance of mining equipment. This is done by using sensors to collect real-time data on, for example, how long trucks and loaders take to perform their jobs, how much material they can carry, and how far they have to travel. Using data analysis techniques and machine learning models, the data are analyzed to investigate optimum performance metrics. An optimum cycle time of around 30-40 seconds is found to yield the loaders' best performance. We also discovered, by comparing distributions, that some machines in similar working conditions perform much better than others; in this study, the Caterpillar 992K loader outperformed all the other machines. Using machine learning models, we could predict the loader unit from its data with about 80-90% accuracy. Maintenance practices are analyzed for the loader bucket parts that assist in digging, in order to prevent unnecessary maintenance or loss of production. Analysis of maintenance and production records found no big changes after maintenance was performed. This information can help fuel future research as well as show where improvements can be made.
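As a rough illustration of the classification experiment described above, the following sketch trains a random forest to tell loaders apart from cycle-level performance metrics. The feature set, the three-loader setup, and the synthetic data are assumptions for the example, not details from the thesis.

```python
# Hypothetical sketch of the loader-identification experiment: a random
# forest that predicts which loader produced a cycle from its performance
# metrics. Feature names and synthetic data are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
loader = rng.integers(0, 3, n)                       # three hypothetical loaders
# Cycle time and payload drift slightly by machine, as the study suggests.
cycle_time = 35 + 3 * loader + rng.normal(0, 5, n)   # seconds
payload = 10 + 1.5 * loader + rng.normal(0, 2, n)    # tonnes per bucket
X = np.column_stack([cycle_time, payload])

X_tr, X_te, y_tr, y_te = train_test_split(X, loader, test_size=0.2,
                                          stratify=loader, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"accuracy: {accuracy_score(y_te, clf.predict(X_te)):.2%}")
```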
205

Data-driven Operations Management: Combining Machine Learning and Optimization for Improved Decision-making / Datengetriebenes Operations Management: Kombination von maschinellem Lernen und Optimierung zur besseren Entscheidungsunterstützung

Meller, Jan Maximilian January 2020 (has links) (PDF)
This dissertation consists of three independent, self-contained research papers that investigate how state-of-the-art machine learning algorithms can be used in combination with operations management models to consider high-dimensional data for improved planning decisions. More specifically, the thesis focuses on how the underlying decision support models change structurally and how those changes affect the resulting decision quality. Over the past years, the volume of globally stored data has experienced tremendous growth. Rising market penetration of sensor-equipped production machinery, advanced ways to track user behavior, and the ongoing use of social media lead to large amounts of data on production processes, user behavior, and interactions, as well as condition information about technical gear, all of which can provide valuable information to companies in planning their operations. Two generic concepts have emerged for putting such data to use. The first concept, separated estimation and optimization (SEO), uses data to forecast the central inputs (i.e., the demand) of a decision support model. The forecast and a distribution of forecast errors are then used in a subsequent stochastic optimization model to determine optimal decisions. In contrast to this sequential approach, the second generic concept, joint estimation-optimization (JEO), combines the forecasting and optimization steps into a single optimization problem. Following this approach, powerful machine learning techniques are employed to approximate highly complex functional relationships and hence relate feature data directly to optimal decisions.

The first article, "Machine learning for inventory management: Analyzing two concepts to get from data to decisions", chapter 2, examines performance differences between implementations of these concepts in a single-period newsvendor setting. The paper first proposes a novel JEO implementation based on the random forest algorithm to learn optimal decision rules directly from a data set that contains historical sales and auxiliary data. We then analyze the structural properties that lead to performance differences. Our results show that the JEO implementation achieves significant cost improvements over the SEO approach. These differences are strongly driven by the decision problem's cost structure and by the amount and structure of the remaining forecast uncertainty.

The second article, "Prescriptive call center staffing", chapter 3, applies the logic of integrating data analysis and optimization to a more complex problem class, an employee staffing problem in a call center. We introduce a novel approach to applying the JEO concept that augments historical call volume data with features like the day of the week, the beginning of the month, and national holiday periods. We employ a regression tree to learn the ex-post optimal staffing levels based on similarity structures in the data and then generalize these insights to determine future staffing levels. This approach, relying on only a few modeling assumptions, significantly outperforms a state-of-the-art benchmark that uses considerably more model structure and assumptions.

The third article, "Data-driven sales force scheduling", chapter 4, is motivated by the problem of how a company should allocate limited sales resources. We propose a novel approach based on the SEO concept that involves a machine learning model to predict the probability of winning a specific project. We develop a methodology that uses this prediction model to estimate the "uplift", that is, the incremental value of an additional visit to a particular customer location. To account for the remaining uncertainty at the subsequent optimization stage, we adapt the decision support model in such a way that it can control for the level of trust in the predicted uplifts. This novel policy dominates both a benchmark that relies completely on the uplift information and a robust benchmark that optimizes the sum of potential profits while neglecting any uplift information.

The results of this thesis show that decision support models in operations management can be transformed fundamentally by considering additional data, and that they benefit in the form of better decision quality and, correspondingly, lower mismatch costs. How machine learning algorithms can be integrated into these decision support models depends on the complexity and the context of the underlying decision problem. In summary, this dissertation provides an analysis based on three different, specific application scenarios that serves as a foundation for further analyses of employing machine learning for decision support in operations management. / This dissertation consists of three self-contained parts that share a common underlying theme: how can new machine learning methods be embedded in decision support models in operations management so that high-dimensional, planning-relevant data can be taken into account for better decisions? A special focus lies on how the underlying planning models must be adapted structurally and how the quality of the decisions changes as a result. Recent years have seen strong growth in the volume of data generated and available globally. The growing prevalence of sensors in production machinery and technical devices, new ways of tracking user behavior, and the intensifying use of social media produce a wealth of data on production processes, user behavior and interactions, and the condition and interactions of technical equipment. Companies now want to use these data for a wide variety of business decision problems. Two fundamental approaches have crystallized for this purpose. In the first, sequential procedure, a prediction model is built that forecasts central input quantities (typically demand); the forecasts are then used in a downstream optimization problem to determine an optimal solution while accounting for the remaining forecast uncertainty. In contrast to this traditional two-stage procedure, a new class of planning models has been developed in recent years that combines prediction and decision support in one integrated optimization model. Here, the power of machine learning methods is used to automatically detect relationships between optimal decisions and the values of certain covariates directly from the available data. The first article, "Machine learning for inventory management: Analyzing two concepts to get from data to decisions", chapter 2, describes concrete implementations of these two approaches based on a random forest model for an inventory management scenario. It shows how, by integrating the optimization problem into the objective function of the random forest algorithm, the optimal stocking quantity can be determined directly from a data set. Moreover, this new integrated procedure is compared with an equivalent classical approach in various analyses to examine which factors drive performance differences between the methods. The integrated procedure achieves significant improvements over the classical sequential one; an important driver of these performance differences is the structure of the forecast errors in the sequential approach. The article "Prescriptive call center staffing", chapter 3, transfers the logic of determining optimal planning decisions through integrated data analysis and optimization to a more complex problem class, employee shift scheduling. Because the higher complexity does not permit a direct integration of the optimization problem into the machine learning method, the article develops a data preprocessing procedure that enriches the input data with the ex-post optimal decisions. Using this preprocessing, an adapted variant of the regression tree learning method can then use the data set to learn optimal decisions. This procedure, which requires very few and weak modeling assumptions about the underlying problem, leads to considerably lower planning-mismatch costs than a competing method with more model structure and assumptions. The third article, "Data-driven sales force scheduling", chapter 4, addresses an even more complex planning problem: routing field sales representatives. Using a concrete application scenario at a paint and coatings manufacturer, the article describes how machine learning methods, even when used purely as prediction models within the traditional sequential approach, can change the downstream decision models. Here, a decision-tree-based learning method is used in a novel approach to estimate the value of a visit to a potential customer. This information is then used for route planning in an optimization model that can account for the remaining uncertainty of the predictions. It becomes apparent that data and advanced analytics enable new optimization models that previously could not be used for lack of reliable estimates of important input factors. The results developed in this dissertation demonstrate that business planning models are fundamentally changed by taking new data and analysis methods into account and benefit in the form of better decision quality and lower planning-mismatch costs. How machine learning methods can be embedded for data analysis depends on the complexity and the concrete framework parameters of the underlying decision problem. In summary, this dissertation presents an analysis based on three distinct, concrete application cases and thus lays the groundwork for further research on the use of machine learning methods for decision support in business planning problems.
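To make the SEO concept above concrete, here is a minimal newsvendor sketch under invented data and costs: a model forecasts demand from features, and a safety buffer is then added from the empirical forecast-error distribution at the critical-ratio quantile. Nothing here reproduces the dissertation's actual models.

```python
# Minimal SEO (separated estimation and optimization) newsvendor sketch.
# Illustrative only: data, costs, and features are invented for the example.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                      # invented auxiliary features
demand = 100 + 20 * X[:, 0] + rng.normal(0, 10, 500)

# Step 1 (estimation): forecast demand from features.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, demand)
# In practice the error distribution should come from held-out data;
# in-sample errors are used here only to keep the sketch short.
errors = demand - model.predict(X)

# Step 2 (optimization): for underage cost cu and overage cost co, the
# optimal buffer is the critical-ratio quantile of the error distribution.
cu, co = 9.0, 1.0                                  # assumed unit costs
q = cu / (cu + co)                                 # critical ratio = 0.9
x_new = rng.normal(size=(1, 3))
order = model.predict(x_new)[0] + np.quantile(errors, q)
print(f"order quantity: {order:.1f}")
```

A JEO implementation would instead fold this cost structure into the learning objective itself, so the learner outputs order quantities rather than demand forecasts.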
206

Data-driven Operations Management: From Predictive to Prescriptive Analytics / Datenbasiertes Operations Management: Von prädiktiven zu präskriptiven Verfahren

Taigel, Fabian Michael January 2020 (has links) (PDF)
Autonomous cars and artificial intelligence that beats humans in Jeopardy or Go are glamorous examples of the so-called Second Machine Age that involves the automation of cognitive tasks [Brynjolfsson and McAfee, 2014]. However, the larger impact in terms of increasing the efficiency of industry and the productivity of society might come from computers that improve or take over business decisions by using large amounts of available data. This impact may even exceed that of the First Machine Age, the industrial revolution that started with James Watt's invention of an efficient steam engine in the late eighteenth century. Indeed, the prevalent phrase that calls data "the new oil" indicates the growing awareness of data's importance. However, many companies, especially those in the manufacturing and traditional service industries, still struggle to increase productivity using the vast amounts of data [Organisation for Economic Co-operation and Development, 2018]. One reason for this struggle is that companies stick with a traditional way of using data for decision support in operations management that is not well suited to automated decision-making.

In traditional inventory and capacity management, some data – typically just historical demand data – is used to estimate a model that makes predictions about uncertain planning parameters, such as customer demand. The planner then has two tasks: to adjust the prediction with respect to additional information that was not part of the data but still might influence demand, and to take the remaining uncertainty into account and determine a safety buffer based on the underage and overage costs. In the best case, the planner determines the safety buffer based on an optimization model that takes the costs and the distribution of historical forecast errors into account; however, these decisions are usually based on a planner's experience and intuition rather than on solid data analysis. This two-step approach is referred to as separated estimation and optimization (SEO). With SEO, using more data and better models for making the predictions would improve only the first step, which would still improve decisions but would not automate (and, hence, revolutionize) decision-making. Using SEO is like using a stronger horse to pull the plow: one still has to walk behind. The real potential for increasing productivity lies in moving from predictive to prescriptive approaches, that is, from the two-step SEO approach, which uses predictive models in the estimation step, to a prescriptive approach, which integrates the optimization problem with the estimation of a model that then provides a direct functional relationship between the data and the decision. Following Akcay et al. [2011], we refer to this integrated approach as joint estimation-optimization (JEO). JEO approaches prescribe decisions, so they can automate the decision-making process. Just as the steam engine replaced manual work, JEO approaches replace cognitive work.

The overarching objective of this dissertation is to analyze, develop, and evaluate new ways in which data can be used in making planning decisions in operations management to unlock the potential for increasing productivity. To this end, the thesis comprises five self-contained research articles that forge the bridge from predictive to prescriptive approaches. While the first article focuses on how sensitive data like condition data from machinery can be used to make predictions of spare-parts demand, the remaining articles introduce, analyze, and discuss prescriptive approaches to inventory and capacity management. All five articles consider approaches that use machine learning and data in innovative ways to improve current approaches to solving inventory or capacity management problems. The articles show that, by moving from predictive to prescriptive approaches, we can improve data-driven operations management in two ways: by making decisions more accurate and by automating decision-making. Thus, this dissertation provides examples of how digitization and the Second Machine Age can change decision-making in companies to increase efficiency and productivity. / This dissertation consists of five self-contained parts that share an overarching theme: how can data be used to enable better inventory and capacity planning? With increasing digitization, more and more data are becoming available across a wide range of industries that can be used for better operational planning. Historical demand data, sensor data, price information, and data on promotional activities, as well as freely available data such as weather forecasts, school holidays, regional events, social media, and other sources, contain potentially relevant information but are often not yet used for decision support. The first article, "Privacy-preserving condition-based forecasting using machine learning", shows how sensitive condition data can be made usable for forecasting spare-parts demand. A model is developed that allows predictions to be made on encrypted condition data. This is relevant in aviation, for example, where service providers are responsible for the maintenance and spare-parts supply of aircraft from several airlines. Because the airlines fear that competitors could obtain sensitive real-time data, these data are not made available to the maintenance provider in plain text. The results of the implemented prototype show that fast evaluation of machine learning methods is possible even on large data volumes stored encrypted in an SAP HANA database. Articles two and three address innovative, data-driven approaches to inventory planning. The second article, "Machine learning for inventory management: Analyzing two concepts to get from data to decisions", analyzes two approaches that use machine learning concepts to learn inventory decisions from historical data. In the third article, a new model for integrated inventory optimization is developed and compared with a reference model in which the estimation of a prediction model and the optimization of the inventory decision are separated. The essential research contribution is the finding that, under certain conditions, the integrated approach delivers clearly better results, allowing the costs of under- and overstocking to be reduced substantially. Articles four and five introduce and comprehensively analyze new data-driven approaches to capacity planning. The fourth article, "Data-driven capacity management with machine learning: A new approach and a case study for a public service office", introduces a data-driven capacity planning method and applies it to the planning problem of a public service office. Its distinctive feature is that the specific objective (at most 20% of customers should have to wait longer than 20 minutes) is integrated directly into a machine learning method, which then allows a decision model to be learned from historical data. It is shown that the integrated approach substantially reduces the frequency of long waiting times for the same resource input. The fifth article, "Prescriptive call center staffing", develops a model for integrated capacity optimization for a call center. The innovation here is that a call center's specific cost function is integrated into a machine learning method. Results on data from two call centers show that the newly developed method can reduce costs substantially compared with the standard reference model from the literature.
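As an illustration of the JEO idea sketched above, in which the decision problem's cost structure is folded into the learning objective, the following example trains a quantile regressor at the newsvendor critical ratio so that the model's prediction is the order decision itself. Data and costs are invented; this is not the dissertation's implementation.

```python
# Illustrative JEO-style sketch (not the dissertation's models): encode the
# newsvendor cost structure in the learning objective via a quantile loss,
# so the fitted model maps features directly to order decisions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))                     # invented feature data
demand = 50 + 15 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 8, 1000)

cu, co = 4.0, 1.0                # assumed underage/overage unit costs
alpha = cu / (cu + co)           # critical ratio = 0.8

# The pinball (quantile) loss at level alpha penalizes under- and
# over-prediction in the same ratio as the newsvendor costs do.
jeo = GradientBoostingRegressor(loss="quantile", alpha=alpha, random_state=1)
jeo.fit(X, demand)

orders = jeo.predict(rng.normal(size=(3, 4)))      # decisions, not forecasts
print(orders)
```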
207

Data as a production factor: A model to measure the value of big data through business process management

Zipf, Torsten 04 July 2022 (has links)
Big Data has been among the most innovative topics in the literature and among organizations for years. Even though only a few organizations have realized the significant value potential described in the contemporary literature, it is widely acknowledged that data assets can provide significant competitive benefits. Given the promises regarding value increases and competitiveness, practitioners as well as academics desire systematic approaches for transforming data sets into measurable assets. This dissertation reviews the current state of the literature, conducts an empirical investigation through structural equation modeling, and applies existing theory to develop a model that allows organizations to take a systematic approach to measuring the value of Big Data specifically for their organization. With Business Process Management as its foundation, the model can be applied successfully by IT as well as business functions. Based on the premise that data is acknowledged as a production factor, the developed model supports organizations in justifying Big Data investment decisions and thereby contributes to competitiveness and company value. Furthermore, the findings and the model equip future researchers with a framework that can be adapted for industry-specific purposes, validated in different organizational contexts, or dismantled to investigate specific success factors.
208

Socio-Geographical Mobilities : A Study of Compulsory School Students’ Mobilities within Metropolitan Stockholm’s Deregulated School Market

Wahls, Rina January 2022 (has links)
The Swedish educational reforms of the 1990s introduced a choice- and voucher-based system, which allowed students to choose schools regardless of their proximity to them. As a consequence, new opportunities for geographical disparities in educational provision, as well as in home-to-school mobilities, have emerged. This thesis addresses that development by focusing on compulsory school (grade 9) students' home-to-school mobility patterns. More specifically, a Bourdieusian lens is applied to understand mobility in terms of both physical and social space. In contrast to the Bourdieusian tradition, articulations between social and physical space are operationalized by constructing individually defined, scalable neighbourhoods. The software EquiPop is used to compute individualized neighbourhood contexts in the municipality of Stockholm (n = 779 079) using the k-nearest-neighbour algorithm (k = 1 600). A k-means cluster analysis is applied to construct income-based neighbourhood types. On this basis, the thesis asks about the localizations and positions of schools and students, as well as about the mobility patterns and their predictors for students residing in low-income, and thus economic-capital-deprived, neighbourhoods (n = 2 346). Utilizing register data, the study finds an unequal distribution of educational provision in relation to different providers, i.e. municipal schools and independent schools, as well as different school types. Furthermore, the results indicate that students from low-income neighbourhoods are unequally mobilized depending on migration background and the educational background of mothers. Moreover, independent schools are found to be an attractive alternative for students from low-income neighbourhoods. / Research project "On the outskirt of the school market" by Håkan Forsberg
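A minimal sketch of the clustering step described above, assuming one row of income-band shares per resident's individualized neighbourhood. The thesis computes these contexts with EquiPop (k = 1 600); the shares, the number of residents, and the five cluster types below are invented for illustration.

```python
# Hypothetical sketch: k-means on income profiles of individualized
# (k-nearest-neighbour) neighbourhoods to derive neighbourhood types.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# One row per resident: shares of the 1 600 nearest neighbours falling in
# low / middle / high income bands (invented numbers for illustration).
shares = rng.dirichlet([2, 5, 3], size=10_000)

X = StandardScaler().fit_transform(shares)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)

neighbourhood_type = kmeans.labels_       # income-based neighbourhood type
print(np.bincount(neighbourhood_type))    # residents per type
```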
209

Online aggregate tables : A method for implementing big data analysis in PostgreSQL using real time pre-calculations / Realtidsaggregerade tabeller : En metod för analys av stora datamängder i PostgreSQL med hjälp av realtidsuppdaterade förberäkningar

Bergmark, Fabian January 2017 (has links)
In modern user-centric applications, data gathering and analysis is often of vital importance. Current trends in data management software show that traditional relational databases fail to keep up with the growing data sets. Outsourcing data analysis often means data is locked in with a particular service, making transitions between analysis systems nearly impossible. This thesis implements and evaluates a data analysis framework built completely within a relational database. The framework provides a structure for implementing online algorithms of analytical methods that store precomputed results. The result is even resource utilization with predictable performance that does not decrease over time. The system keeps all raw data gathered to allow for future exportation. A full implementation of the framework is tested against the current analysis requirements of the company Shortcut Labs, and performance measurements show no problem with managing data sets of over a billion data points. / In modern user-centric applications, the collection and analysis of data is often business-critical. Traditional relational databases struggle to handle the growing volumes of data. At the same time, using external services for data analysis often locks data in, which complicates switching analysis services. This report presents and evaluates a framework for data analysis implemented in a relational database. The framework provides structures for efficiently precomputing results of analytical calculations. The result is even resource utilization with predictable performance that does not degrade over time. The framework also stores all collected data, which enables exporting. The framework is evaluated at the company Shortcut Labs, and the results show that it handles data sets of over a billion points.
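A minimal sketch of the online-aggregation idea, assuming a simple key-value event stream: aggregates are updated incrementally as each data point arrives, so reads never rescan the raw data. Inside PostgreSQL the same update would typically live in a trigger or stored procedure; this standalone Python version is illustrative only.

```python
# Minimal sketch of online aggregate maintenance: precomputed aggregates
# are updated in O(1) per incoming event instead of recomputed on read.
from collections import defaultdict

class OnlineAggregate:
    """Running count/sum/min/max per key, updated as each event arrives."""
    def __init__(self):
        self.state = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                          "min": float("inf"),
                                          "max": float("-inf")})

    def ingest(self, key, value):
        s = self.state[key]
        s["count"] += 1
        s["sum"] += value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)

    def mean(self, key):
        s = self.state[key]
        return s["sum"] / s["count"] if s["count"] else None

agg = OnlineAggregate()
for key, value in [("clicks", 3), ("clicks", 7), ("purchases", 1)]:
    agg.ingest(key, value)
print(agg.mean("clicks"))   # 5.0 -- the read is a lookup, not a table scan
```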
210

Measuring Racial Animus and Its Consequences: Incorporating Big Data into Criminology

Rubenstein, Batya 23 August 2022 (has links)
No description available.
