Global ETD Search

91	Streaming Predictive Analytics on Apache Flink Beligianni, Foteini January 2015 (has links) Data analysis and predictive analytics today are driven by large scale distributed deployments of complex pipelines, guiding data cleaning, model training and evaluation. A wide range of systems and tools provide the basic abstractions for building such complex pipelines for offline data processing; however, there is an increasing demand for providing support for incremental models over unbounded streaming data. In this work, we focus on the problem of modelling such a pipeline framework and providing algorithms that build on top of basic abstractions, fundamental to stream processing. We design a streaming machine learning pipeline as a series of stages such as model building, concept drift detection and continuous evaluation. We build our prototype on Apache Flink, a distributed data processing system with streaming capabilities along with a state-of-the-art implementation of a variation of Vertical Hoeffding Tree (VHT), a distributed decision tree classification algorithm as a proof of concept. Furthermore, we compare our version of VHT with the current state-of-the-art implementations on distributed data processing systems in terms of performance and accuracy. Our experimental results on real-world data sets show significant performance benefits of our pipeline while maintaining low classification error. We believe that this pipeline framework can offer a good baseline for a full-fledged implementation of various streaming algorithms that can work in parallel. / Data analysis and predictive analytics are today driven by large-scale distributed deployments of complex pipelines, guiding data cleaning, model training and evaluation.
A wide range of systems and tools provide only basic abstractions for building such complex pipelines for offline data processing, but there is a growing demand for support for incremental models over unbounded streaming data. In this work, we focus on the problem of modelling such a pipeline framework and provide algorithms that build on basic abstractions for stream processing. We construct a streaming machine learning pipeline that contains stages such as model building, concept drift detection and continuous evaluation. We build our prototype on Apache Flink, a distributed data processing system with streaming capabilities, together with the best available implementation of a Vertical Hoeffding Tree (VHT) variant, a distributed decision tree algorithm, as a proof of concept. In addition, we compare our version of VHT with the state of the art in distributed data processing systems in terms of performance and accuracy. Our experimental results show significant benefits of our pipeline while maintaining low classification error. We believe that this framework can offer a good starting point for implementing various streaming algorithms that can work in parallel. analytics streaming Engineering and Technology
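The abstract above does not describe how its VHT variant decides when to split a node. As a rough illustration only, the sketch below shows the split test used by standard Hoeffding trees, which compares the gap between the two best split candidates against the Hoeffding bound. The function names and the example numbers are assumptions made for illustration and are not taken from the thesis.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range: range R of the split metric (1.0 for information gain
                 on a two-class problem).
    delta:       allowed probability that the bound does not hold.
    n:           number of examples observed at the leaf so far.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical gains observed after 1000 streamed examples.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
    print(should_split(0.30, 0.25, 1.0, 1e-7, 1000))
```

Because the bound shrinks as more examples arrive, a leaf that cannot yet separate two close candidates simply waits for more data, which is what makes the test suitable for unbounded streams.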
92	The Application of Classification Trees to Pharmacy School Admissions Karpen, Samuel C., Ellis, Steve C. 01 September 2018 (has links) In recent years, the American Association of Colleges of Pharmacy (AACP) has encouraged the application of big data analytic techniques to pharmaceutical education. Indeed, the 2013-2014 Academic Affairs Committee Report included a "Learning Analytics in Pharmacy Education" section that reviewed the potential benefits of adopting big data techniques [1]. Likewise, the 2014-2015 Argus Commission Report discussed uses for big data analytics in the classroom, practice, and admissions [2]. While both of these reports were thorough, neither discussed specific analytic techniques. Consequently, this commentary will introduce classification trees, with a particular emphasis on their use in admission. With electronic applications, pharmacy schools and colleges now have access to detailed applicant records containing thousands of observations. With declining applications nationwide, admissions analytics may be more important than ever [3]. admissions decision tree predictive analytics Pharmacy Practice
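The commentary above introduces classification trees without the abstract naming a splitting criterion. As a small sketch under that caveat, the code below computes Gini impurity, a criterion commonly used by CART-style trees, and scores a candidate split by the size-weighted impurity of its two child nodes. The "admit"/"reject" labels are hypothetical and chosen only to echo the admissions setting.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_split_impurity(left, right):
    """Impurity of a binary split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)


if __name__ == "__main__":
    # Hypothetical outcomes for four applicants.
    parent = ["admit", "admit", "reject", "reject"]
    print(gini_impurity(parent))
    # A split that separates the classes perfectly has zero impurity.
    print(weighted_split_impurity(["admit", "admit"], ["reject", "reject"]))
```

A tree learner would evaluate many candidate splits this way and keep the one with the lowest weighted impurity.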
93	A decision support system for sugarcane irrigation supply and demand management Patel, Zubair January 2017 (has links) Commercial sugarcane farming requires large quantities of water to be delivered to the fields. Ideal irrigation schedules indicate how much water should be supplied to fields, taking into account multiple objectives in the farming process. Software packages do not fully account for the fact that the ideal irrigation schedule may not be met due to limitations in the water distribution network. This dissertation proposes the use of mathematical modelling to better understand water supply and demand management on a commercial sugarcane farm. Due to the complex nature of water stress on sugarcane, non-linearities occur in the model. A piecewise linear approximation is used to handle the non-linearity in the water allocation model, which is solved in a commercial optimisation software package. A test data set is first used to exercise and evaluate the model performance. Then, to illustrate the practical applicability of the model, a commercial-sized data set is used and analysed. Statistical Sciences Advanced Analytics And Decision Sciences
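The abstract above mentions a piecewise linear approximation of a non-linear relationship but does not give the function or the breakpoints. The sketch below illustrates the general idea only: a hypothetical concave response curve is sampled at a few breakpoints and then evaluated by linear interpolation between them. The curve, the breakpoints, and all names are assumptions, not the dissertation's model.

```python
from bisect import bisect_right


def piecewise_linear(breakpoints, values, x):
    """Evaluate the piecewise linear interpolant through (breakpoints, values).

    breakpoints must be sorted in increasing order; x is clamped to the
    covered interval.
    """
    if x <= breakpoints[0]:
        return values[0]
    if x >= breakpoints[-1]:
        return values[-1]
    i = bisect_right(breakpoints, x) - 1  # segment containing x
    x0, x1 = breakpoints[i], breakpoints[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def response(w):
    """Hypothetical concave response to a water fraction w in [0, 1]."""
    return 1.0 - (1.0 - w) ** 2


BREAKPOINTS = [0.0, 0.25, 0.5, 0.75, 1.0]
VALUES = [response(w) for w in BREAKPOINTS]

if __name__ == "__main__":
    for w in (0.375, 0.5, 0.9):
        approx = piecewise_linear(BREAKPOINTS, VALUES, w)
        print(w, approx, response(w))
```

In an optimisation model the same breakpoints would typically become linear segments or constraints, so that a linear or mixed-integer solver can handle the curve; more breakpoints reduce the approximation error at the cost of a larger model.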
94	A recommender system for e-retail Walwyn, Thomas January 2016 (has links) The e-retail sector in South Africa has a significant opportunity to capture a large portion of the country's retail industry. Central to seizing this opportunity is leveraging the advantages that the online setting affords. In particular, the e-retailer can offer an extremely large catalogue of products, far beyond what a traditional retailer is capable of supporting. However, as the catalogue grows, it becomes increasingly difficult for a customer to efficiently discover desirable products. As a consequence, it is important for the e-retailer to develop tools that automatically explore the catalogue for the customer. In this dissertation, we develop a recommender system (RS), whose purpose is to provide suggestions for products that are most likely of interest to a particular customer. There are two primary contributions of this dissertation. First, we describe a set of six characteristics that all effective RSs should possess, namely accuracy, responsiveness, durability, scalability, model management, and extensibility. Second, we develop an RS that is capable of serving recommendations in an actual e-retail environment. The design of the RS is an attempt to embody the characteristics mentioned above. In addition, to show how the RS supports model selection, we present a proof-of-concept experiment comparing two popular methods for generating recommendations that we implement for this dissertation, namely, implicit matrix factorisation (IMF) and Bayesian personalised ranking (BPR). Statistical Sciences Advanced Analytics And Decision Sciences
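The abstract above names Bayesian personalised ranking (BPR) without detailing it. As a minimal sketch, the code below computes the standard BPR pairwise loss for one (user, preferred item, other item) triple using latent factor vectors. Regularisation and the training loop are omitted, and the vectors are hypothetical; none of this is taken from the dissertation's implementation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """BPR pairwise loss -ln(sigmoid(x_ui - x_uj)) for one triple.

    x_ui and x_uj are the predicted scores of the preferred item i and the
    other item j for user u. Written as ln(1 + exp(-margin)) in a form that
    avoids overflow for large negative margins.
    """
    margin = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    user = [1.0, 0.0]        # hypothetical latent factors
    bought = [2.0, 0.0]      # item the user interacted with
    not_bought = [0.0, 0.0]  # sampled item without interaction
    print(bpr_loss(user, bought, not_bought))
    print(bpr_loss(user, not_bought, bought))
```

The loss is small when the interacted item is scored well above the sampled one and grows when the ranking is reversed, which is the behaviour a ranking-oriented recommender optimises for.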
95	Enhanced minimum variance optimisation: a pragmatic approach Lakhoo, Lala Bernisha Janti January 2016 (has links) Since the establishment of Markowitz's theory, numerous studies have been carried out over the past six decades or so that cover the benefits, limitations, modifications and enhancements of Mean Variance (MV) optimisation. This study endeavours to extend this work by adding factors to the minimum variance framework, which would increase the likelihood of outperforming both the market and the minimum variance portfolio (MVP). An analysis of the impact of these factor tilts on the MVP is carried out in the South African environment, represented by the FTSE-JSE Shareholder weighted Index as the benchmark portfolio. The main objective is to examine whether the systematic and robust methods employed, which involve the incorporation of factor tilts into the multicriteria problem together with covariance shrinkage, improve the performance of the MVP. The factor tilts examined include Active Distance, Concentration and Volume. Additionally, the constant correlation model is employed in the estimation of the shrinkage intensity, structured covariance target and shrinkage estimator. The results of this study showed that with specific levels of factor tilting, one can generally improve both absolute and risk-adjusted performance and lower concentration levels in comparison to both the MVP and benchmark. Additionally, lower turnover levels were observed across all tilted portfolios, relative to the MVP. Furthermore, covariance shrinkage enhanced all portfolio statistics examined, but significant improvement was noted on drawdown levels, capture ratios and risk. This is in contrast to the results obtained when the standard sample covariance matrix was employed. Statistical Sciences Advanced Analytics And Decision Sciences
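The abstract above refers to covariance shrinkage with a constant correlation target but gives no formulas. The sketch below shows the general construction under stated assumptions: the target keeps the sample variances and replaces every pairwise correlation with the average sample correlation, and the shrunk matrix is a convex combination of target and sample covariance. The shrinkage intensity is passed in as a plain parameter here, whereas the study estimates it; the example matrix is hypothetical.

```python
import math


def constant_correlation_target(cov):
    """Target matrix: sample variances on the diagonal, and the average
    sample correlation applied to every off-diagonal pair."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    corrs = [cov[i][j] / (std[i] * std[j])
             for i in range(n) for j in range(i + 1, n)]
    r_bar = sum(corrs) / len(corrs)
    return [[cov[i][i] if i == j else r_bar * std[i] * std[j]
             for j in range(n)] for i in range(n)]


def shrink_covariance(cov, intensity):
    """Shrunk estimate: intensity * target + (1 - intensity) * sample."""
    target = constant_correlation_target(cov)
    n = len(cov)
    return [[intensity * target[i][j] + (1.0 - intensity) * cov[i][j]
             for j in range(n)] for i in range(n)]


# Hypothetical 3-asset sample covariance (volatilities 0.2, 0.3, 0.4).
SAMPLE_COV = [
    [0.040, 0.030, 0.016],
    [0.030, 0.090, 0.024],
    [0.016, 0.024, 0.160],
]

if __name__ == "__main__":
    for row in shrink_covariance(SAMPLE_COV, 0.5):
        print(row)
```

Shrinking toward a structured target pulls extreme sample correlations toward the average, which tends to stabilise the weights of a minimum variance optimisation that uses the matrix as input.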
96	Visual Analytics and Interactive Machine Learning for Human Brain Data Li, Huang 08 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / This study mainly focuses on applying visualization techniques on human brain data for data exploration, quality control, and hypothesis discovery. It mainly consists of two parts: multi-modal data visualization and interactive machine learning. For multi-modal data visualization, a major challenge is how to integrate structural, functional and connectivity data to form a comprehensive visual context. We develop a new integrated visualization solution for brain imaging data by combining scientific and information visualization techniques within the context of the same anatomic structure. For interactive machine learning, we propose a new visual analytics approach to interactive machine learning. In this approach, multi-dimensional data visualization techniques are employed to facilitate user interactions with the machine learning process. This allows dynamic user feedback in different forms, such as data selection, data labeling, and data correction, to enhance the efficiency of model building. Visual analytics Visualization Machine learning Neural network
97	Iterative Visual Analytics and its Applications in Bioinformatics You, Qian 20 March 2012 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / You, Qian. Ph.D., Purdue University, December, 2010. Iterative Visual Analytics and its Applications in Bioinformatics. Major Professors: Shiaofen Fang and Luo Si. Visual Analytics is a new and developing field that addresses the challenges of knowledge discoveries from the massive amount of available data. It facilitates humans’ reasoning capabilities with interactive visual interfaces for exploratory data analysis tasks, where automatic data mining methods fall short due to the lack of the pre-defined objective functions. Analyzing the large volume of data sets for biological discoveries raises similar challenges. The domain knowledge of biologists and bioinformaticians is critical in the hypothesis-driven discovery tasks. Yet developing visual analytics frameworks for bioinformatic applications is still in its infancy. In this dissertation, we propose a general visual analytics framework – Iterative Visual Analytics (IVA) – to address some of the challenges in the current research. The framework consists of three progressive steps to explore data sets with the increased complexity: Terrain Surface Multi-dimensional Data Visualization, a new multi-dimensional technique that highlights the global patterns from the profile of a large scale network. It can lead users’ attention to characteristic regions for discovering otherwise hidden knowledge; Correlative Multi-level Terrain Surface Visualization, a new visual platform that provides the overview and boosts the major signals of the numeric correlations among nodes in interconnected networks of different contexts. 
It enables users to gain critical insights and perform data analytical tasks in the context of multiple correlated networks; and the Iterative Visual Refinement Model, an innovative process that treats users’ perceptions as the objective functions, and guides the users to form the optimal hypothesis by improving the desired visual patterns. It is a formalized model for interactive explorations to converge to optimal solutions. We also showcase our approach with bio-molecular data sets and demonstrate its effectiveness in several biomarker discovery applications. Visual Analytics, Bioinformatics Visual analytics Bioinformatics
98	Studies on using data-driven decision support systems to improve personalized medicine processes Cameron, Kellas Ross 30 June 2018 (has links) This dissertation looks at how new sources of information should be incorporated into medical decision-making processes to improve patient outcomes and reduce costs. There are three fundamental challenges that must be overcome to effectively use personalized medicine. We need to understand: (1) how best to designate which patients will receive the greatest value from these processes; (2) how physicians and caregivers interpret additional patient-specific information and how that affects their decision-making processes; and finally, (3) how to account for a patient’s ability to engage in their own healthcare decisions. The first study looks at how we can infer which patients will receive the most value from genomic testing. The difficult statistical problem is how to separate the distribution of patients, based on ex-ante factors, to identify the best candidates for personalized testing. A model was constructed to infer a healthcare provider’s decision on whether this test would provide beneficial information in selecting a patient’s medication. Model analysis shows that healthcare providers’ primary focus is to maximize patient health outcomes while considering the impact on the patient’s economic welfare. The second study focuses on understanding how technology-enabled continuity of care (TECC) for Chronic Obstructive Pulmonary Disease (COPD) and Congestive Heart Failure (CHF) patients can be utilized to improve patient engagement, measured in terms of patient activation. We shed light on the fact that different types of patients garnered different levels of value from the use of TECC. The third study looks at how data-driven decision support systems can allow physicians to more accurately understand which patients are at high risk of readmission. 
We look at how we can use available patient-specific information for patients admitted with CHF to more accurately identify which patients are most likely to be readmitted, and also why, whether for condition-related or non-related reasons, allowing physicians to suggest different patient-specific readmission prevention strategies. Taken together, these three studies allow us to build a robust theory to tackle these challenges, both operational and policy-related, that need to be addressed for physicians to take advantage of the growing availability of patient-specific information to improve personalized medication processes. Management Personalized medicine Predictive analytics Process improvement
99	Increasing analytics maturity by establishing analytics networks and spreading the use of Lean Six Sigma : A case study of a global B2B company SVANTESSON ROMANOV, VIKTOR, GULLQVIST, IDA January 2016 (has links) Organisations with high-performing data and analytics capabilities are more successful than organisations with lower analytics maturity. It is therefore necessary for organisations to assess their analytics capabilities and needs in order to identify and evaluate areas of improvement that need to be addressed. This was the purpose of this case study, conducted on a region in a global B2B organisation that has a centrally established analytics function on corporate level and wants analytics to be integrated into more of the region’s processes, with analytical capabilities and resources used as efficiently as possible. To fulfil the thesis purpose, empirical data was collected through qualitative interviews with employees on corporate level, more quantitative interviews with regional employees and a questionnaire issued to regional employees. This was complemented with a thorough literature study which provided the analytics maturity models used for identifying the current capabilities on a holistic level of the region, as well as analytics setups, Lean Six Sigma and Knowledge Management. Results show a relatively low analytics maturity due to e.g. insufficient support from management, unclear responsibility for analytics, data not being used correctly or requested enough and various issues with competence, tools and sources. This study contributes to analytics research by identifying that analytics maturity models available free of charge are good only for inspiration, not for full use, in a large company. 
Furthermore, the study shows that complexities arise when having a central analytics function with low analytics maturity while other parts of the company face analytics problems but no indications are given on who should proceed and on what. This study therefore contributes a proposition for companies wanting to increase their analytics maturity: that this could be facilitated by establishing networks for analytics. Combining literature and empirics shows that networks enable investigation of the analytics situation while at the same time enabling increased sharing, collaboration, innovation, coordination and dissemination. By making Lean Six Sigma a central part of the network, analytics will be used more and better, while at the same time increasing the success rate of change and improvement projects. Data and Analytics maturity models analytics setups analytics networks knowledge management and collaboration Lean Six Sigma Economics and Business
100	Facilitating Student Achievement of Intended Learning Outcomes in Higher Education : Development and Evaluation of a Learning Analytics Dashboard / Främjande av Studenters Uppnåpende av Kursmål i Högre Utbildning : Utveckling och Utvärdering av en Learning Analytics Dashboard Buvari, Sebastian January 2023 (has links) With the continued digitization of higher education, students’ ability to self-regulate their studies in online and blended learning settings has become critical for their academic success. Goal-setting strategies are an important aspect of self-regulated learning which universities aim to support through the implementation of the intended learning outcomes (ILOs) of courses and programs. These act as a promise for students of the knowledge and skills which they are expected to acquire. However, students often perceive an absence of clear connection between ILOs and course assignments that creates a disconnect between students’ course progression and their progression toward course ILOs. To assist students in this task, a student-facing learning analytics dashboard (LAD) allowing students to track and plan their learning progress toward the achievement of the selected course’s ILOs has been developed and evaluated in the context of STEM higher education. The LAD was developed using a participatory design approach combined with design science research methodology. Thirty-seven students contributed to the design of the dashboard through a F2F workshop and later a student feedback session in Spring 2023. The tool was evaluated through five semi-structured interviews informed by the Technology Acceptance Model. The results show students having a behavioral intention to use the dashboard in their everyday university studies. The thesis contributes a LAD focused on student ILO achievement and task-interest. 
/ With the continued digitalization of higher education, students’ ability to self-regulate their studies in online and blended learning has become critical to their academic success. Goal-setting strategies are an important aspect of self-regulated learning, and universities aim to support students’ goal-setting strategies through the implementation of intended learning outcomes. These act as a promise to students of the knowledge and skills they are expected to have attained when a course is completed. However, students perceive a lack of clear connections between course progression and progression toward the learning outcomes. To help clarify these connections for students, a student-facing learning analytics dashboard (LAD), which lets students follow and plan their progression toward the learning outcomes, has been developed and evaluated in the context of STEM higher education. The LAD was developed using a participatory design strategy combined with design science research methodology. Thirty-seven students contributed to the design of the dashboard by giving feedback through a face-to-face workshop and a group feedback session in spring 2023. The implementation was then evaluated through five semi-structured interviews based on the Technology Acceptance Model. The results imply that students have a behavioral intention to use the tool in their day-to-day university studies. The thesis concludes with a discussion of research relevance and future related work. The thesis contributes a LAD focused on students’ achievement of learning outcomes and task interest. Learning Analytics Learning Analytics Dashboard Goal achievement Higher education Self-regulated Learning Computer Sciences
