231

Natural Language Processing of Stories

Rittichier, Kaley J. 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / In this thesis, I deal with the task of computationally processing stories with a focus on multidisciplinary ends, specifically in Digital Humanities and Cultural Analytics. In the process, I collect, clean, investigate, and predict from two datasets. The first is a dataset of 2,302 open-source literary works categorized by the time period they are set in. These works were all collected from Project Gutenberg. The classification of the time period in which each work is set was determined by collecting and inspecting Library of Congress subject classifications, Wikipedia Categories, and literary factsheets from SparkNotes. The second is a dataset of 6,991 open-source literary works categorized by the hierarchical location the work is set in; these labels were constructed from Library of Congress subject classifications and SparkNotes factsheets. These datasets are the first of their kind and can help advance an understanding of 1) the presentation of settings in stories and 2) the effect the settings have on our understanding of the stories.
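As a hypothetical illustration of this kind of labeling step (not code from the thesis), the sketch below scans Library of Congress subject headings for explicit year ranges and votes for a coarse century-level setting label; the heading format and label scheme are assumptions.

```python
import re
from collections import Counter

# Hypothetical sketch: assign a coarse "period set in" label to a work by
# scanning its Library of Congress subject headings for explicit year ranges.
# The heading format and labels here are illustrative, not the thesis pipeline.
YEAR_RANGE = re.compile(r"\b(1[0-9]{3}|20[0-2][0-9])\s*-\s*(1[0-9]{3}|20[0-2][0-9])\b")

def period_label(subject_headings):
    """Return a century-level setting label such as '1700s', or None."""
    votes = Counter()
    for heading in subject_headings:
        for start, end in YEAR_RANGE.findall(heading):
            midpoint = (int(start) + int(end)) // 2
            votes[(midpoint // 100) * 100] += 1
    if not votes:
        return None
    century, _ = votes.most_common(1)[0]
    return f"{century}s"

# Example with a made-up record:
print(period_label(["France -- History -- Revolution, 1789-1799 -- Fiction"]))  # -> 1700s
```

In practice the thesis combines several metadata sources, so a rule like this would be only one signal among many.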
232

Real-Time Feedback for In-Class Introductory Computer Programming Exercises

Sellers, Ariana Dawn 01 June 2018
Computer programming is a difficult subject to master. Introductory programming courses often have low retention and high failure rates. Part of the problem is identifying whether students understand the lecture material. In a traditional classroom, a professor can gauge a class's understanding from the questions asked during lecture. However, many struggling students are unlikely to speak up in class. To address this problem, recent research has focused on gathering compiler data from programming exercises to identify at-risk students in these courses. These data allow professors to intervene with individual students who are at risk and, after analyzing the data for a given time period, a professor can also re-evaluate how certain topics are taught to improve understanding. However, current implementations do not provide information in real time. They may improve a professor's teaching long term, but they do not provide insight into how an individual student is understanding a specific topic during the lecture in time for the professor to make adjustments. This research explores a system that combines compiler data analytics with in-class exercises. The system incorporates the in-class exercise into a web-based text editor with data analytics. While the students are programming in their own browsers, the website analyzes their compiler errors and console output to determine where the students are struggling. A real-time summary is presented to the professor during the lecture. This system allows a professor to receive immediate feedback on student understanding, which enables him/her to clarify areas of confusion immediately. As a result, this dynamic learning environment allows course material to better evolve to meet the needs of the students. Results show that students in a simulated programming course performed slightly better on quizzes when the instructor had access to real-time feedback during a programming exercise. Instructors were able to determine what students were struggling with from the real-time feedback. Overall, both the student and instructor test subjects found the experimental website useful. Case studies performed in an actual programming lecture allowed the professor to address errors that are not considered in the curriculum of the course. Many students appreciated the fact that the professor was able to immediately answer questions based on the feedback. Students primarily had issues with the bugs present in the alpha version of the software.
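A minimal sketch of the kind of server-side aggregation such a system might perform is shown below; the class name, error format, and grouping heuristic are illustrative assumptions, not the thesis implementation.

```python
from collections import Counter, defaultdict
import re

# Illustrative sketch: aggregate compiler errors streamed from students'
# browser editors into a live summary the instructor can glance at.
class ErrorAggregator:
    def __init__(self):
        self.errors_by_type = Counter()
        self.students_with_errors = defaultdict(set)

    def record(self, student_id, compiler_output):
        """Parse one compile attempt and update the running tallies."""
        # Assumed gcc/javac-like format: "file:line: error: message"
        for match in re.finditer(r"error: (.+)", compiler_output):
            message = match.group(1).strip()
            # Collapse quoted identifiers so similar mistakes group together.
            key = re.sub(r"'[^']*'", "'...'", message)
            self.errors_by_type[key] += 1
            self.students_with_errors[key].add(student_id)

    def summary(self, top_n=5):
        """Most common error categories and how many students hit each one."""
        return [(msg, count, len(self.students_with_errors[msg]))
                for msg, count in self.errors_by_type.most_common(top_n)]

agg = ErrorAggregator()
agg.record("s1", "main.c:4: error: expected ';' before 'return'")
agg.record("s2", "main.c:7: error: expected ';' before '}'")
print(agg.summary())  # -> [("expected '...' before '...'", 2, 2)]
```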
233

Integrated Real-Time Social Media Sentiment Analysis Service Using a Big Data Analytic Ecosystem

Aring, Danielle C. 15 May 2017
No description available.
234

Avatar Playing Style : From analysis of football data to recognizable playing styles

Edberger Persson, Jakob, Danielsson, Emil January 2022
Football analytics is a rapidly growing area that applies conventional data analysis and computational methods to data gathered from football matches. The results can give insights into the performance of individual players, teams and clubs. A daily difficulty for football analytics is translating analysis results into actual football qualities and knowledge that the wider public can understand. In this master thesis we therefore take the ball-event data collected from football matches and develop a model that classifies individual players' playing styles, where the playing styles are well known among football followers. This is carried out by first detecting the playing positions 'Strikers', 'Central midfielders', 'Outer wingers', 'Full backs', 'Centre backs' and 'Goalkeepers' using K-Means clustering, with an accuracy of 0.89 (for Premier League 2021/2022) and 0.84 (for Allsvenskan 2021). Secondly, we create a simplified binary model that classifies a player's playing style only as "Offensive" or "Defensive"; the poor results of this model show that more than these two playing styles exist. Finally, we use an unsupervised modelling approach in which Principal Component Analysis (PCA) is applied iteratively. For the playing position 'Striker' we find the playing styles 'The Target', 'The Artist', 'The Poacher' and 'The Worker', which, when compared with a constructed validation data set, give a total accuracy of 0.79 (the best of all positions and the only one covered in detail in the report due to delimitations). The playing styles can, for each player, be presented visually, showing how well a particular player fits each of the different playing styles. Ultimately, the results of the master thesis indicate that it is easier to find playing styles with clear and obvious on-the-ball actions that distinguish them from other players within their respective position. Playing styles that are easier to find include, for example, "The Poacher" and "The Target", while harder-to-find playing styles include, for example, "The Box-to-box" and "The Inverted". Finally, we conclude that the results will come to good use and that the goals of the thesis are met, although there is still much room for improvement and future work. Developed models can be found in a simplified form in the GitHub repository: https://github.com/Sommarro-Devs/avatar-playing-style. The report can be read stand-alone, but parts of it are highly connected to the models and code in the GitHub repository.
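A minimal sketch of the position-clustering step is shown below, using scikit-learn's K-Means on a made-up per-player feature table; the feature columns and values are assumptions, and the thesis works from much richer ball-event data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-player features aggregated from ball-event data
# (per-90-minute counts; the real feature set in the thesis is richer).
# Columns: passes, tackles, shots, crosses, saves
X = np.array([
    [55,  1.2, 3.1, 0.4, 0.0],   # forward-like profile
    [70,  3.5, 0.6, 0.3, 0.0],   # midfielder-like profile
    [40,  2.8, 0.2, 2.9, 0.0],   # wide-player-like profile
    [30,  0.3, 0.0, 0.0, 3.5],   # goalkeeper-like profile
])

# Standardize features so no single count dominates the distance metric,
# then cluster players into positional groups.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
print(kmeans.labels_)           # cluster index per player
print(kmeans.cluster_centers_)  # positional "prototypes" in scaled feature space
```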
235

Understanding the role of visual analytics for software testing

Eriksson, Nikolas, Örneholm, Max January 2021
Software development is constantly evolving. This creates many opportunities, but also confusion about what the best practices are. A rather unexplored area within software development is visual analytics for software testing. The goal of this thesis is to gain an understanding of what role visual analytics can have within software testing. In this thesis, a literature review was used to gather information about analytical needs, tools, and other vital aspects of the subject. A survey of practitioners was used to gather information from industry; the survey asked questions about visual analytics, visualizations, and their potential roles. We conclude that visual analytics of software testing results does have a role in software testing, mainly in enabling faster understanding of test results, producing big-picture overviews, and supporting decision making.
236

Bayesian Visual Analytics: Interactive Visualization for High Dimensional Data

Han, Chao 07 December 2012
In light of advancements made in data collection techniques over the past two decades, data mining has become common practice for summarizing large, high-dimensional datasets in hopes of discovering noteworthy data structures. However, one concern is that most data mining approaches rely upon strict criteria that may mask information that analysts would find useful. We propose a new approach called Bayesian Visual Analytics (BaVA), which merges Bayesian statistics with visual analytics to address this concern. The BaVA framework enables experts to interact with the data and the feature discovery tools by modeling the "sense-making" process using Bayesian sequential updating. In this paper, we use the BaVA idea to enhance high-dimensional visualization techniques such as Probabilistic PCA (PPCA). However, for real-world datasets, important structures can be arbitrarily complex, and a single data projection such as PPCA may fail to provide useful insights. One way to visualize such a dataset is to characterize it by a mixture of local models. For example, Tipping and Bishop [Tipping and Bishop, 1999] developed an algorithm called Mixture Probabilistic PCA (MPPCA) that extends PCA to visualize data via a mixture of projectors. Based on MPPCA, we developed a new visualization algorithm called Covariance-Guided MPPCA, which groups clusters with similar covariance structure together to provide more meaningful and cleaner visualizations. Another way to visualize a very complex dataset is to use nonlinear projection methods such as the Generative Topographic Mapping (GTM) algorithm. We developed an interactive version of GTM to discover interesting local data structures. We demonstrate the performance of our approaches using both synthetic and real datasets and compare our algorithms with existing ones. / Ph. D.
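As a rough illustration of the projection step this work builds on (not the authors' BaVA code), the sketch below fits a two-component PCA to synthetic high-dimensional data; scikit-learn's PCA also exposes the Tipping-Bishop probabilistic-PCA log-likelihood that mixture models such as MPPCA extend.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic high-dimensional data: three noisy clusters in 20 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 20)) * 5
X = np.vstack([c + rng.normal(scale=1.0, size=(100, 20)) for c in centers])

# Project to two dimensions for visualization. scikit-learn's PCA also
# reports the probabilistic-PCA log-likelihood (Tipping & Bishop model),
# the kind of quantity a mixture such as MPPCA builds on.
pca = PCA(n_components=2)
embedding = pca.fit_transform(X)          # (300, 2) coordinates to plot
print(pca.explained_variance_ratio_)      # share of variance kept by each axis
print(pca.score(X))                       # average PPCA log-likelihood per sample
```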
237

Utilizing prediction analytics in the optimal design and control of healthcare systems

Hu, Yue January 2022
In recent years, the increasing availability of data and advances in predictive analytics present new opportunities and challenges for healthcare management. Predictive models are developed to evaluate various aspects of healthcare systems, such as patient demand, patient pathways, and patient outcomes. While these predictions potentially provide valuable information for improving healthcare delivery, there are still many open questions about how to integrate these forecasts into operational decisions. In this context, this dissertation develops methodologies that combine predictive analytics with the design of healthcare delivery systems. The first part of the dissertation considers how to schedule proactive care in the presence of patient deterioration. Healthcare systems are typically limited-resource environments where scarce capacity is reserved for the most urgent patients. However, there has been growing interest in the use of proactive care when a less urgent patient is predicted to become urgent while waiting. On one hand, providing care for patients when they are less critical could mean that fewer resources are needed to fulfill their treatment requirements. On the other hand, due to prediction errors, moderate patients who are predicted to deteriorate may recover on their own and never need the treatment; allocating limited resources to these patients takes capacity away from more urgent ones who need them now. To understand this tension, we propose a multi-server queueing model with two patient classes, moderate and urgent, and allow patients to transition between classes while waiting. In this setting, we characterize how moderate and urgent patients should be prioritized for treatment when proactive care for moderate patients is an option. The second part of the dissertation focuses on nurse staffing decisions in the emergency department (ED). Optimizing ED nurse staffing decisions to balance the quality of service against staffing cost can be extremely challenging, especially when there is a high level of uncertainty in patient demand. Increasing data availability and continuing advances in predictive analytics provide an opportunity to mitigate demand uncertainty by utilizing demand forecasts. We study a two-stage prediction-driven staffing framework in which prediction models are integrated with the base (made weeks in advance) and surge (made in near real time) staffing decisions in the ED. We quantify the benefit of having the ability to use the more expensive surge staffing, and we propose a near-optimal two-stage staffing policy that is straightforward to interpret and implement. Lastly, we develop a unified framework that combines parameter estimation, real-time demand forecasts, and capacity sizing in the ED. High-fidelity simulation experiments for the ED demonstrate that the proposed framework can reduce annual staffing costs by 11%-16% ($2M-$3M) while guaranteeing timely access to care.
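A toy, slotted-time simulation of the proactive-care trade-off is sketched below; all arrival, deterioration, and service parameters are illustrative assumptions, and the dissertation's queueing analysis is far more general.

```python
import random

# Toy slotted-time simulation: moderate patients may deteriorate while waiting
# (or recover and leave), and a policy decides whether idle servers also treat
# moderate patients proactively. Rates are made up, not from the dissertation.
def simulate(servers=3, horizon=10_000, p_arrive=0.5, p_moderate=0.7,
             p_deteriorate=0.02, p_self_cure=0.01, service_time=4,
             treat_moderate_proactively=True, seed=0):
    random.seed(seed)
    busy_until = [0] * servers          # next free slot for each server
    urgent, moderate = [], []           # waiting lists (arrival slots)
    deteriorations = served_urgent = served_moderate = 0

    for t in range(horizon):
        if random.random() < p_arrive:  # new patient arrives
            (moderate if random.random() < p_moderate else urgent).append(t)
        # Waiting moderate patients may deteriorate or recover.
        still_waiting = []
        for arr in moderate:
            r = random.random()
            if r < p_deteriorate:
                urgent.append(arr); deteriorations += 1
            elif r < p_deteriorate + p_self_cure:
                pass                    # left the system without treatment
            else:
                still_waiting.append(arr)
        moderate = still_waiting
        # Assign free servers: urgent patients first, then (optionally) moderate.
        for s in range(servers):
            if busy_until[s] <= t:
                if urgent:
                    urgent.pop(0); served_urgent += 1
                elif treat_moderate_proactively and moderate:
                    moderate.pop(0); served_moderate += 1
                else:
                    continue
                busy_until[s] = t + service_time
    return deteriorations, served_urgent, served_moderate

print(simulate(treat_moderate_proactively=True))
print(simulate(treat_moderate_proactively=False))
```

Comparing the two runs shows the tension described above: treating moderate patients early reduces deteriorations but consumes capacity that urgent patients might have used.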
238

Mining of High-Utility Patterns in Big IoT-based Databases

Wu, Jimmy M. T., Srivastava, Gautam, Lin, Jerry C., Djenouri, Youcef, Wei, Min, Parizi, Reza M., Khan, Mohammad S. 01 February 2021
When focusing on the general area of data mining, high-utility itemset mining (HUIM) can be defined as an offshoot of frequent itemset mining (FIM). It is known to emphasize more factors critically, which gives HUIM its intrinsic edge. Due to the flourishing development of IoT techniques, mining patterns from uncertain data has also become attractive, and potential high-utility itemset mining (PHUIM) was introduced to reveal valuable patterns in an uncertainty database. Unfortunately, even though previous methods are very effective and powerful at mining the potential high-utility itemsets quickly, these algorithms are not specifically designed for a database with an enormous number of records. In previous methods, the uncertain transaction dataset is ultimately loaded into memory, and several pre-defined operators are usually applied to modify the original dataset in order to reduce the seeking time when scanning the data. However, it is impracticable to apply the same approach to a big-data dataset. In this work, a dataset is assumed to be too big to be loaded directly into memory and duplicated or modified; a MapReduce framework is therefore proposed to handle these types of situations. One of our main objectives is to reduce the frequency of dataset scans while still maximizing the parallelization of all processes. In-depth experimental results show that the proposed Hadoop algorithm performs strongly in mining all of the potential high-utility itemsets in a big-data dataset and shows excellent performance on a Hadoop computing cluster.
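As an illustration of the map/reduce pattern (in plain Python rather than Hadoop, and not the paper's algorithm), the sketch below computes each item's transaction-weighted utility (TWU), a standard pruning measure in HUIM, over chunks of transactions.

```python
from collections import Counter
from functools import reduce

# Illustrative map/reduce-style pass over transaction chunks to compute each
# item's transaction-weighted utility (TWU). A transaction is {item: utility};
# a Hadoop job would distribute these chunks across a cluster instead of
# iterating over a local list.
chunks = [
    [{"a": 5, "b": 2}, {"b": 3, "c": 7}],
    [{"a": 1, "c": 4, "d": 6}],
]

def map_chunk(chunk):
    """Mapper: accumulate (item, transaction utility) contributions for one chunk."""
    twu = Counter()
    for transaction in chunk:
        tu = sum(transaction.values())       # total utility of the transaction
        for item in transaction:
            twu[item] += tu
    return twu

def reduce_counts(left, right):
    """Reducer: merge partial TWU tables."""
    left.update(right)
    return left

twu = reduce(reduce_counts, map(map_chunk, chunks), Counter())
min_utility = 15
candidates = {item for item, value in twu.items() if value >= min_utility}
print(twu)         # aggregated TWU per item (here a=18, b=17, c=21, d=11)
print(candidates)  # items whose TWU meets the threshold survive pruning
```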
239

Leveraging business Intelligence and analytics to improve decision-making and organisational success

Mushore, Rutendo January 2017
In a complex and dynamic organisational environment, challenges and dilemmas exist about how to maximise the value of Business Intelligence and Analytics (BI&A). The expectation of BI&A is to improve decision-making for the core business processes that drive business performance. A multi-disciplinary review of theories from the domains of strategic management, technology adoption and economics suggests that tasks, technology, people and structures (TTPS) need to be aligned for BI&A to add value to decision-making. However, these imperatives interplay, making it difficult to determine how they are configured. Whilst the links between TTPS have previously been recognised in Socio-Technical Systems theory, no studies have delved into the issue of their configuration. This configuration is addressed in this study by adopting the fit-as-Gestalts approach, which examines the relationships among these elements and determines how best to align them. A Gestalt looks at configurations that arise based on the level of coherence and helps determine the level of alignment amongst complex relationships. The study builds on an online quantitative survey tool based on a conceptual model for aligning TTPS; the alignment model contributes to the conceptual development of TTPS alignment. Data were collected from organisations in a South African context. Individuals who participated in the survey came from the retail, insurance, banking, telecommunications and manufacturing industry sectors. The results show that close alignment emerges between TTPS in Cluster 6, which comprises IT experts and financial planners. Adequate training, coupled with structures that encourage usage of BI&A, results in higher organisational success, because the BI&A technology is in sync with the tasks it is being used for and users have high self-efficacy. Further analysis shows that poor organisational performance can be linked to gaps in alignment and the lack of an organisational culture that motivates usage of BI&A tools: where there is misalignment, respondents do not find any value in using BI&A, which in turn impacts organisational performance. Applying a configurational approach helps researchers and practitioners identify coherent patterns that work well cohesively and comprehensively. The tangible contribution of this study is the conceptual model presented for achieving alignment. In essence, organisations can use the model to align tasks, technology, people and structures, to better identify ideal configurations of the factors that work cohesively, and consequently to find ways of leveraging Business Intelligence and Analytics.
240

Algorithms and Frameworks for Graph Analytics at Scale

Jamour, Fuad Tarek 28 February 2019
Graph queries typically involve retrieving entities with certain properties and connectivity patterns. One popular property is betweenness centrality, which is a quantitative measure of importance used in many applications such as identifying influential users in social networks. Solving graph queries that involve retrieving important entities with user-defined connectivity patterns in large graphs requires efficient computation of betweenness centrality and efficient graph query engines. The first part of this thesis studies the betweenness centrality problem, while the second part presents a framework for building efficient graph query engines. Computing betweenness centrality entails computing all-pairs shortest paths; thus, exact computation is costly. The performance of existing approximation algorithms is not well understood due to the lack of an established benchmark. Since graphs in many applications are inherently evolving, several incremental algorithms were proposed. However, they cannot scale to large graphs: they either require excessive memory or perform unnecessary computations rendering them prohibitively slow. Existing graph query engines rely on exhaustive indices for accelerating query evaluation. The time and memory required to build these indices can be prohibitively high for large graphs. This thesis attempts to solve the aforementioned limitations in the graph analytics literature as follows. First, we present a benchmark for evaluating betweenness centrality approximation algorithms. Our benchmark includes ground-truth data for large graphs in addition to a systematic evaluation methodology. This benchmark is the first attempt to standardize evaluating betweenness centrality approximation algorithms and it is currently being used by several research groups working on approximate betweenness in large graphs. Then, we present a linear-space parallel incremental algorithm for updating betweenness centrality in large evolving graphs. Our algorithm uses biconnected components decomposition to localize processing graph updates, and it performs incremental computation even within affected components. Our algorithm is up to an order of magnitude faster than the state-of-the-art parallel incremental algorithm. Finally, we present a framework for building low memory footprint graph query engines. Our framework avoids building exhaustive indices and uses highly optimized matrix algebra operations instead. Our framework loads datasets, and evaluates data-intensive queries up to an order of magnitude faster than existing engines.
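As a small illustration of the quantities involved (not the thesis' parallel incremental algorithm), the sketch below uses networkx to compare exact and sampled-approximate betweenness on a toy graph and to list its biconnected components, the decomposition the incremental approach uses to localize the processing of updates.

```python
import networkx as nx

# Toy graph: two dense cliques joined by a short path, so the path nodes
# carry most of the betweenness.
G = nx.barbell_graph(5, 2)

exact = nx.betweenness_centrality(G)                 # all-pairs, costly at scale
approx = nx.betweenness_centrality(G, k=6, seed=42)  # sample 6 source nodes

bridge_node = max(exact, key=exact.get)
print(f"most central node: {bridge_node}, "
      f"exact={exact[bridge_node]:.3f}, approx={approx[bridge_node]:.3f}")

# Biconnected components: the decomposition used to localize the processing
# of graph updates in incremental betweenness algorithms.
for component in nx.biconnected_components(G):
    print(sorted(component))
```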
