971

Combining Big Data And Traditional Business Intelligence – A Framework For A Hybrid Data-Driven Decision Support System

Dotye, Lungisa January 2021 (has links)
Since the emergence of big data, traditional business intelligence systems have been unable to meet most of the information demands of data-driven organisations. Big data analytics is now widely perceived as the solution to the information-processing and decision-making challenges these organisations face. Yet despite the promised benefits of big data, organisations find it difficult to prove and realise the value of the investment required to develop and maintain big data analytics. The reality of big data is more complex than many organisations' perceptions of it: most organisations have failed to implement big data analytics successfully, and some that have implemented these systems are struggling to attain the promised value of big data. Organisations have realised that it is impractical to migrate an entire traditional business intelligence (BI) system into big data analytics, and that the two types of systems need to be integrated instead. The purpose of this study was therefore to investigate a framework for creating a hybrid data-driven decision support system that combines components from traditional business intelligence and big data analytics systems. The study employed an interpretive qualitative research methodology to investigate research participants' understanding of concepts related to big data, data-driven organisations, business intelligence, and other data analytics perceptions. Semi-structured interviews were held to collect research data, and thematic analysis was used to interpret the participants' feedback in light of their background knowledge and experience. The organisational information processing theory (OIPT) and the fit viability model (FVM) guided the interpretation of the study outcomes and the development of the proposed framework. The findings suggest that data-driven organisations collect data from different sources and transform these data into information that serves as the basis of all their business decisions. The roles of executive and senior management in adopting a data-driven decision-making culture are key to the organisation's success. BI and big data analytics are tools and software systems that assist a data-driven organisation in transforming data into information and knowledge. The challenges organisations reported experiencing when trying to integrate BI and big data analytics guided the development of the framework for creating a hybrid data-driven decision support system. The framework comprises these elements: business motivation, information requirements, supporting mechanisms, data attributes, supporting processes, and hybrid data-driven decision support system architecture. The proposed framework assists data-driven organisations in assessing the components of both business intelligence and big data analytics systems and making case-by-case decisions about which components satisfy the organisation's specific data requirements. The study thus contributes to the existing literature on integrating business intelligence and big data analytics systems. / Dissertation (MIT (Information Systems))--University of Pretoria, 2021. / Informatics / MIT (Information Systems) / Unrestricted
972

Robust portfolio construction: using resampled efficiency in combination with covariance shrinkage

Combrink, James January 2017 (has links)
The thesis considers the general area of robust portfolio construction. In particular, it considers two techniques in this area that aim to improve portfolio construction and, consequently, portfolio performance. The first technique focusses on estimation error in the sample covariance matrix (one of the portfolio optimisation inputs); shrinkage techniques applied to the sample covariance matrix are considered and their merits assessed. The second technique focusses on the portfolio construction/optimisation process itself. Here the thesis adopts the 'resampled efficiency' proposal of Michaud (1989), which uses Monte Carlo simulation from the sampled distribution to generate a range of resampled efficient frontiers. Thereafter the thesis assesses the merits of combining these two techniques in the portfolio construction process. Portfolios are constructed using a quadratic programming algorithm requiring two inputs: (i) expected returns; and (ii) cross-sectional behaviour and individual risk (the covariance matrix). The output is a set of 'optimal' investment weights, one for each share whose returns were fed into the algorithm. The thesis looks at identifying and removing avoidable risk through a statistical robustification of the algorithms, attempting to improve upon the 'optimal' weights they provide. Performance is assessed by comparing out-of-period results with standard optimisation results, which are highly sensitive to sampling error and prone to extreme weightings. The methodology applies various shrinkage techniques to the historical covariance matrix and then takes a resampled portfolio optimisation approach using the shrunken matrix. Monte Carlo simulation is used to replicate sets of statistically equivalent portfolios and find optimal weightings for each; aggregating these reduces sensitivity to anomalies in the historical time series. The trade-off between sampling error and model specification error is also considered.
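A minimal sketch of how the two techniques can be combined, using only numpy: the shrinkage target (a scaled identity), the fixed shrinkage intensity, the closed-form minimum-variance optimiser standing in for the thesis's quadratic programming step, and the synthetic data are all assumptions made for illustration, not the thesis's actual choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic history: 120 periods of returns for 8 assets (illustrative only).
T, N = 120, 8
returns = rng.normal(0.005, 0.04, size=(T, N))

mu = returns.mean(axis=0)
S = np.cov(returns, rowvar=False)  # noisy sample covariance

# Shrinkage toward a scaled-identity target: (1 - d) * S + d * target.
# A fixed intensity d is assumed here; estimators such as Ledoit-Wolf
# would instead derive d from the data.
d = 0.3
target = np.trace(S) / N * np.eye(N)
S_shrunk = (1 - d) * S + d * target

def min_variance_weights(cov):
    """Closed-form minimum-variance weights (stand-in for the QP step)."""
    inv_ones = np.linalg.solve(cov, np.ones(len(cov)))
    return inv_ones / inv_ones.sum()

# Resampled efficiency in the spirit of Michaud: re-estimate inputs from
# simulated histories, optimise each, then average the weight vectors.
B = 500
weights = np.zeros(N)
for _ in range(B):
    sim = rng.multivariate_normal(mu, S_shrunk, size=T)
    weights += min_variance_weights(np.cov(sim, rowvar=False))
weights /= B

print("resampled weights:", np.round(weights, 3))
```

Averaging over resampled optimisations is what dampens the extreme weightings that a single noisy covariance estimate tends to produce.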
973

Learning Analytics in Relation to Open Access to Research Data in Peru. An Interdisciplinary Comparison

Biernacka, Katarzyna, Huaroto, Libio 01 October 2020 (has links)
Conference held as part of the "III Conferencia Latinoamericana de Analíticas de Aprendizaje LALA2020 Project", 1-2 October 2020, Cuenca, Ecuador. / The aim of this paper is to investigate the perceptions of learning analytics researchers in Peru about the barriers to publication of their research data. A review of the relevant legislation was conducted. Semi-structured interviews were used as the research method, the focus being on the presumed conflict between the publication of research data and the protection of personal data. The results show a range of individual factors that influence the behaviour of scientists in relation to the publication of research data, emphasizing the barriers related to data protection in different disciplines.
974

Analysis and data mining of moving object trajectories

El Mahrsi, Mohamed Khalil 30 September 2013 (has links)
In this thesis, we explore two problems related to managing and mining moving object trajectories. First, we study the problem of sampling trajectory data streams. Storing the entirety of the trajectories provided by modern location-aware devices can entail severe storage and processing overheads. Therefore, adapted sampling techniques are necessary in order to discard unneeded positions and reduce the size of the trajectories while still preserving their key spatiotemporal features. In streaming environments, this process needs to be conducted "on the fly", since the data are transient and arrive continuously. To this end, we introduce a new sampling algorithm called spatiotemporal stream sampling (STSS). This algorithm is computationally efficient and guarantees an upper bound for the approximation error introduced during the sampling process. Experimental results show that STSS achieves good performance and can compete with more sophisticated and costly approaches. The second problem we study is clustering trajectory data in road network environments. We present three approaches to clustering such data: the first discovers clusters of trajectories that traveled along the same parts of the road network; the second is segment-oriented and aims to group together road segments based on the trajectories they have in common; the third combines both aspects and simultaneously clusters trajectories and road segments. We show how these approaches can be used to reveal useful knowledge about flow dynamics and characterize traffic in road networks. We also provide experimental results evaluating the performance of our propositions.
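The abstract does not spell out STSS itself, so the sketch below illustrates only the general shape of one-pass trajectory sampling: a point is kept when it deviates by more than a tolerance from the position extrapolated from the points already kept. The prediction rule and threshold are assumptions for illustration, not the STSS algorithm.

```python
import math

def stream_sample(points, eps):
    """One-pass trajectory sampler over (t, x, y) tuples (generic sketch).

    Keeps a point only when it strays more than `eps` from the position
    predicted by linearly extrapolating the last two kept points, so
    straight constant-speed stretches compress aggressively.
    """
    prev2 = prev1 = None
    for p in points:
        if prev1 is None or prev2 is None:
            prev2, prev1 = prev1, p   # always keep the first two points
            yield p
            continue
        (t1, x1, y1), (t2, x2, y2) = prev2, prev1
        t, x, y = p
        ratio = 0.0 if t2 == t1 else (t - t2) / (t2 - t1)
        px, py = x2 + (x2 - x1) * ratio, y2 + (y2 - y1) * ratio
        if math.hypot(x - px, y - py) > eps:
            prev2, prev1 = prev1, p
            yield p

# A nearly straight trajectory of 100 points compresses to just 2.
traj = [(t, float(t), 0.001 * (t % 3)) for t in range(100)]
print(len(list(stream_sample(iter(traj), eps=0.5))))
```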
975

Parallel and continuous join processing for data streams

Song, Ge 28 September 2016 (has links)
We live in a world where a vast amount of data is being continuously generated, and it arrives in a variety of ways. Every time we do a search on Google, purchase something on Amazon, click 'like' on Facebook, or upload an image on Instagram, and every time a sensor is activated, new data is generated. Data is more than simple numerical information; it now comes in a variety of forms. Isolated data, however, is of little value: it is only when this huge amount of data is connected that new insights can be found. At the same time, data is time-sensitive, and the most accurate and effective way of describing it is as a data stream. If the latest data is not promptly processed, the opportunity to obtain the most useful results is missed. A parallel and distributed system for processing large amounts of streaming data in real time therefore has significant research value and good application prospects. This thesis focuses on the study of parallel and continuous data stream joins. We divide this problem into two categories: the first is the data-driven parallel and continuous join, and the second is the query-driven parallel and continuous join.
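One classic building block for continuous joins over streams is the symmetric hash join, where each arriving tuple is inserted into its own side's hash table and immediately probed against the other side's, so results are emitted incrementally. The single-threaded sketch below, with an assumed count-based sliding window, illustrates that baseline idea only; it is not the parallel design developed in the thesis.

```python
from collections import defaultdict, deque

class SymmetricHashJoin:
    """Continuous equi-join over two streams (single-threaded sketch)."""

    def __init__(self, window=1000):
        self.tables = (defaultdict(list), defaultdict(list))  # per-side state
        self.fifos = (deque(), deque())                       # arrival order
        self.window = window                                  # tuples kept per side

    def insert(self, side, key, value):
        table, fifo = self.tables[side], self.fifos[side]
        # Evict the oldest tuple once this side's window is full.
        if len(fifo) >= self.window:
            old_key, old_value = fifo.popleft()
            table[old_key].remove(old_value)
        table[key].append(value)
        fifo.append((key, value))
        # Probe the opposite side and emit matches immediately.
        for match in self.tables[1 - side].get(key, []):
            yield (value, match) if side == 0 else (match, value)

join = SymmetricHashJoin(window=100)
list(join.insert(0, "user42", {"click": "adA"}))           # no match yet
print(list(join.insert(1, "user42", {"purchase": 9.99})))  # joins with the click
```

Parallel variants distribute this per-side state across workers, for example by partitioning on the join key, so that each worker runs the same insert-and-probe loop on its own shard.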
976

Determining the Factors Influential in the Validation of Computer-based Problem Solving Systems

Morehead, Leslie Anne 01 January 1996 (has links)
Examination of the literature on methodologies for verifying and validating complex computer-based Problem Solving Systems led to a general hypothesis that there exist measurable features of systems that are correlated with the best testing methods for those systems. Three features (Technical Complexity, Human Involvement, and Observability) were selected as the basis of the current study. A survey of systems operating in over a dozen countries explored relationships between these system features, test methods, and the degree to which systems were considered valid. Analysis of the data revealed that certain system features and certain test methods are indeed related to reported levels of confidence in a wide variety of systems. A set of hypotheses was developed, each corresponding to a linear equation that can be estimated and tested for significance using statistical regression analysis. Of 24 tested hypotheses, 17 were accepted, resulting in 49 significant models predicting validation and verification percentages using 37 significant variables. These models explain between 28% and 86% of total variation. Interpretation of these models (equations) leads directly to useful recommendations regarding the system features and types of validation methods most directly associated with the verification and validation of complex computer systems. The key result of the study is the identification of a set of sixteen system features and test methods that are multiply correlated with reported levels of verification and validation. Representative examples:
• People are more likely to trust a system if it models a real-world event that occurs frequently.
• A system is more likely to be accepted if users were involved in its design.
• Users prefer systems that give them a large choice of output.
• The longer the code, the greater the number of modules, or the more programmers involved on the project, the less likely people are to believe a system is error-free and reliable.
From these results, recommendations are developed that bear strongly on proper resource allocation for testing computer-based Problem Solving Systems. Furthermore, they provide useful guidelines on what should reasonably be expected from the validation process.
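As a concrete picture of the model form described above, the sketch below fits one such linear equation by ordinary least squares and reports the fraction of variation explained (R²). The three predictors are the features named in the abstract, but the data is synthetic; none of the thesis's survey data or coefficients are reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fabricated stand-in for the survey: Technical Complexity, Human
# Involvement, and Observability scores predicting a reported
# validation percentage. All coefficients below are invented.
n = 60
X = rng.uniform(0, 10, size=(n, 3))
y = 40 - 2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0, 5, n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ beta
r_squared = 1 - resid.var() / y.var()
print("coefficients:", np.round(beta, 2), " R^2:", round(r_squared, 2))
```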
977

Microblaze-based coprocessor for data stream management systems

Alqaisi, Tareq S. 06 December 2017 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Data networks' speed and availability are increasing at a record rate. More and more devices are now able to connect to the Internet and stream data, and processing this ever-growing amount of data in real time continues to be a challenge. Multiple studies have been conducted to address the growing demand for real-time processing and analysis of continuous data streams. Developed in a previous work, the Symbiote Coprocessor Unit (SCU) is a hardware accelerator capable of providing up to 150x speedup over traditional data stream processors in the field of data stream management systems. However, the SCU's implementation is complex and inflexible and uses an outdated host interface, which limits future improvements. In this study, we present a new SCU architecture based on a Xilinx MicroBlaze configurable microcontroller. The proposed architecture reduces complexity and allows new algorithms to be implemented in a relatively short amount of time while maintaining the SCU's high performance. It also has an industry-standard PCIe interface. Finally, it uses a standard AMBA AXI4 bus interconnect, which enables easier integration of new hardware components. The new architecture is implemented using a Xilinx VC709 development board. Our experimental results show a minimal loss of performance compared to the original SCU while providing a flexible and simple design.
978

Geo-Temporal Visualization for Tourism Data Using Color Curves

Choi, In Kwon 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / For individuals in the tourism industry and other businesses, the government's department of tourism, or individuals planning a trip, data on tourist population movement can be a valuable resource, uncovering insights that could bring more profit and more tourists or make a trip more enjoyable. As visualization is an effective way of conveying information with multiple dimensions, we visualize the geo-temporal floating-population data of tourists and residents on Jeju Island in the Republic of Korea in two-dimensional space. In this study, we introduce two methods we have implemented for visualizing the geo-temporal data using color curves as the representation of the time dimension. We use dots as markers of the floating population, with dot color representing the 24 hours of the day. In the first method, we plot the colored dots directly on the map, thereby coloring the area the data represents. In the second method, we plot the same dots inside a semi-transparent circle divided into arcs that represent the months of the year. The user can compare the population of tourists and residents across different times of day, different months, and weather conditions to analyze the floating population in a given area.
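A minimal sketch of the first method, assuming synthetic records and a stock cyclic colormap in place of the thesis's custom color curves; the coordinate bounds roughly covering Jeju Island are likewise illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Synthetic floating-population records: (longitude, latitude, hour).
n = 2000
lon = rng.uniform(126.1, 126.95, n)
lat = rng.uniform(33.1, 33.6, n)
hour = rng.integers(0, 24, n)

# Dots plotted directly on the map plane; a cyclic colormap keeps
# 23:00 and 00:00 visually adjacent, matching the 24-hour cycle.
fig, ax = plt.subplots(figsize=(7, 4))
sc = ax.scatter(lon, lat, c=hour, cmap="twilight", s=6, vmin=0, vmax=24)
fig.colorbar(sc, ax=ax, label="hour of day")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.set_title("Floating population colored by time of day (synthetic)")
plt.show()
```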
979

Applications of Data Mining in Healthcare

Peng, Bo 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / With increases in the quantity and quality of healthcare-related data, data mining tools have the potential to improve people's standard of living through personalized and predictive medicine. In this thesis we improve the state of the art in data mining for several problems in the healthcare domain. In problems such as drug-drug interaction prediction and Alzheimer's Disease (AD) biomarker discovery and prioritization, current methods either require tedious feature engineering or have unsatisfactory performance. New, effective computational tools are needed to tackle these complex problems. In this dissertation, I develop new algorithms for two healthcare problems: high-order drug-drug interaction prediction and amyloid imaging biomarker prioritization in Alzheimer's Disease. Drug-drug interactions (DDIs) and their associated adverse drug reactions (ADRs) represent a significant detriment to public health. Existing research on DDIs primarily focuses on pairwise DDI detection and prediction; effective computational methods for high-order DDI prediction are still needed. I present a deep learning based model, D3I, for cardinality-invariant and order-invariant high-order DDI prediction. The proposed models achieve a 0.740 F1 value and a 0.847 AUC value on high-order DDI prediction, and outperform classical methods on order-2 DDI prediction. These results demonstrate the strong potential of D3I and deep learning based models in tackling the prediction of high-order DDIs and their induced ADRs. The second problem I consider in this thesis is amyloid imaging biomarker discovery, for which I propose an innovative machine learning paradigm enabling precision medicine in this domain. The paradigm tailors the imaging biomarker discovery process to the individual characteristics of a given patient. I implement this paradigm using a newly developed learning-to-rank method, PLTR. The PLTR model seamlessly integrates two objectives for joint optimization: pushing up relevant biomarkers and ranking among relevant biomarkers. The empirical study of PLTR conducted on the ADNI data yields promising results for identifying and prioritizing individual-specific amyloid imaging biomarkers based on the individual's structural MRI data. The resulting top-ranked imaging biomarkers have the potential to aid personalized diagnosis and disease subtyping.
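Order- and cardinality-invariance, the two properties the abstract highlights, can be obtained by pooling per-drug embeddings before scoring, as in Deep Sets-style models. The sketch below demonstrates only that property using random, untrained weights; it is not the D3I architecture, and every dimension and name in it is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder embedding table and two-layer scorer (random weights).
n_drugs, emb_dim, hid_dim = 1000, 16, 32
embed = rng.normal(0, 0.1, size=(n_drugs, emb_dim))
W1 = rng.normal(0, 0.1, size=(emb_dim, hid_dim))
w2 = rng.normal(0, 0.1, size=hid_dim)

def interaction_score(drug_ids):
    """Score a drug set of any size; mean-pooling makes the result
    independent of both the ordering and the cardinality of the set."""
    pooled = embed[list(drug_ids)].mean(axis=0)   # order-invariant pooling
    hidden = np.maximum(pooled @ W1, 0.0)         # ReLU layer
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid -> probability

# Permuting the set leaves the score unchanged; set size can vary freely.
print(interaction_score([3, 41, 7]), interaction_score([7, 3, 41]))
print(interaction_score([3, 41, 7, 250]))
```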
980

Partitioned Persistent Homology

Malott, Nicholas O. January 2020 (has links)
No description available.
