About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Predicting Delays In Delivery Process Using Machine Learning-Based Approach

Shehryar Shahid (9745388) 16 December 2020 (has links)
There has been great interest in applying data science, machine learning, and AI-related technologies in recent years. Industries are adopting these technologies rapidly, which has enabled them to gather valuable data about their businesses. One industry that can leverage this data to improve the output and quality of its business is the logistics and transport industry. This creates an excellent opportunity for companies that rely heavily on air transportation to gain valuable insights and improve their business operations. This thesis aims to leverage this data to develop techniques for modeling complex business processes and to design a machine learning-based predictive analytical approach for predicting process violations.

The thesis focuses on delays in shipment delivery, modeling a prediction technique to identify shipments at risk of being delayed. The approach is based on real airfreight shipping data that follows the International Air Transport Association industry standard for airfreight transportation. By leveraging the shipment process structure, this research presents a new approach that handles the complex event-driven structure of airfreight data, which otherwise makes it difficult to model for predictive analytics.

By applying different data mining and machine learning techniques, prediction techniques were developed to predict delays in delivering airfreight shipments, based on random forest and gradient boosting algorithms. To compare and select the best model, the prediction results were interpreted in the form of six confusion matrix-based performance metrics. The results showed that all the predictors had a high specificity of over 90%, but sensitivity was low, under 44%. Accuracy was over 75%, and the geometric mean was between 58% and 64%.

These performance metrics provide evidence that the approach can be implemented to develop prediction techniques for complex business processes. Additionally, an early prediction method was designed to test the predictors' performance when complete process information is not available. This method delivered compelling evidence that early prediction can be achieved without compromising predictor performance.
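The sketch below is a hedged illustration, not the thesis's actual pipeline or airfreight data: a synthetic, imbalanced "delayed shipment" dataset stands in, and random forest and gradient boosting classifiers are scored with four of the six confusion-matrix-based metrics the abstract reports (sensitivity, specificity, accuracy, geometric mean).

```python
# Illustrative sketch only: the thesis's airfreight features and labels are not
# reproduced here, so a synthetic binary "delayed shipment" dataset stands in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)  # imbalanced: roughly 20% delayed
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in [("random forest", RandomForestClassifier(random_state=0)),
                    ("gradient boosting", GradientBoostingClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    sensitivity = tp / (tp + fn)          # recall on delayed shipments
    specificity = tn / (tn + fp)          # recall on on-time shipments
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    g_mean = np.sqrt(sensitivity * specificity)
    print(f"{name}: sens={sensitivity:.2f} spec={specificity:.2f} "
          f"acc={accuracy:.2f} g-mean={g_mean:.2f}")
```

On imbalanced delay data like this, the geometric mean balances high specificity against lower sensitivity, which is consistent with it falling between the two in the reported results.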
82

Aplicación de Data Science Specialist

Ccora Camarena, Yuli, Jeri De La Cruz, Nélida, Enriquez Yance, Rosario Grace 14 January 2020 (has links)
This research analyzes a problem at the company Travico Perú S.A.C, which has reported a decline in sales across the various services it offers. The work applies a data science methodology to identify the variables that influenced sales of all services from 2016 to 2018. The dataset was obtained from the platforms the company works with and from internal control reports, yielding 12 variables with 6,429 records. An unsupervised, partition-based machine learning technique, K-means clustering, was then used to segment and group the selected variables. Finally, several charts of the company's sales results were produced and compared against the clustering results. / Research paper
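As a rough illustration of the partition-based clustering step (the company's 12 variables and 6,429 records are not available here, so synthetic sales-like data stands in), K-means can be applied to scaled features and candidate cluster counts compared by inertia and silhouette score:

```python
# Illustrative sketch only: random sales-like features stand in for Travico's data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
sales = rng.gamma(shape=2.0, scale=500.0, size=(6429, 12))  # 6,429 rows x 12 variables

X = StandardScaler().fit_transform(sales)   # K-means is distance-based, so scale first

for k in range(2, 7):                        # compare candidate numbers of clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}: inertia={km.inertia_:.0f}, "
          f"silhouette={silhouette_score(X, km.labels_):.3f}")

best = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)  # pick k after inspection
print("cluster sizes:", np.bincount(best.labels_))
```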
83

Improving Recommendation Systems Using Image Data

Åslin, Filip January 2022 (has links)
Recommendation systems typically use historical interactions between users and items to predict what other items may be of interest to a user. The recommendations are based on patterns in how users interact similarly with items. This thesis investigates whether it is possible to improve the quality of the recommendations by including more information about the items in the model that predicts the recommendations. More specifically, the use of deep learning to extract information from item images is investigated. To do this, two types of collaborative filtering models, based on historic interactions, are implemented. These models are then compared to different collaborative filtering models that make use of either user and item attributes or images of the items. Three pre-trained image classification models are used to extract useful item features from the item images. The models are trained and evaluated using a dataset of historic transactions and item images from the online sports shop Stadium, provided by the thesis supervisor. The results show no noticeable improvement in performance for the models using the images compared to the models without images. The model using the user and item attributes performs best, indicating that the collaborative filtering models can be improved by giving them more information than just the historic interactions. Possible ways to further investigate using the image feature vectors in collaborative filtering models, as well as using them to create better item attributes, are discussed and suggested for future work.
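The abstract does not name the three pre-trained image classification models, so the following sketch assumes a torchvision ResNet-50 purely as an example of how per-item image feature vectors could be extracted for use alongside a collaborative filtering model; the file name and downstream model are hypothetical.

```python
# Illustrative sketch: the thesis's specific pre-trained models and the Stadium
# dataset are not available; a torchvision ResNet-50 stands in as one possible
# feature extractor producing per-item image embeddings.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
backbone = models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()          # drop the classifier head, keep 2048-d features
backbone.eval()

preprocess = weights.transforms()          # resize/normalize as the backbone expects

@torch.no_grad()
def item_embedding(image_path: str) -> torch.Tensor:
    """Return a 2048-d feature vector for one item image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return backbone(img).squeeze(0)

# These vectors can then be concatenated with learned item factors (or used in
# place of item attributes) in a hybrid collaborative filtering model.
# emb = item_embedding("item_12345.jpg")   # hypothetical file name
```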
84

Topological Hierarchies and Decomposition: From Clustering to Persistence

Brown, Kyle A. 27 May 2022 (has links)
No description available.
85

GENERATIVE, PREDICTIVE, AND REACTIVE MODELS FOR DATA SCARCE PROBLEMS IN CHEMICAL ENGINEERING

Nicolae Christophe Iovanac (11167785) 22 July 2021 (has links)
Data scarcity is intrinsic to many problems in chemical engineering due to physical constraints or cost. This challenge is acute in chemical and materials design applications, where a lack of data is the norm when trying to develop something new for an emerging application. Addressing novel chemical design under these scarcity constraints takes one of two routes: the traditional forward approach, where properties are predicted based on chemical structure, and the recent inverse approach, where structures are predicted based on required properties. Statistical methods such as machine learning (ML) could greatly accelerate chemical design under both frameworks; however, in contrast to the modeling of continuous data types, molecular prediction has many unique obstacles (e.g., spatial and causal relationships, featurization difficulties) that require further ML methods development. Despite these challenges, this work demonstrates how transfer learning and active learning strategies can be used to create successful chemical ML models in data scarce situations.

Transfer learning is a domain of machine learning in which information learned in solving one task is transferred to help in another, more difficult task. Consider a forward design problem involving the search for a molecule with a particular property target and limited existing data, a situation not typically amenable to ML. In these situations, there are often correlated properties that are computationally accessible. Because all chemical properties are fundamentally tied to the underlying chemical topology, and because related properties arise from related moieties, the information contained in the correlated property can be leveraged during model training to help improve the prediction of the data scarce property. Transfer learning is thus a favorable strategy for facilitating high throughput characterization of low-data design spaces.

Generative chemical models invert the structure-function paradigm and instead directly suggest new chemical structures that should display the desired application properties. This inversion process is fraught with difficulties but can be improved by training these models with strategically selected chemical information. Structural information contained within this chemical property data is thus transferred to support the generation of new, feasible compounds. Moreover, the transfer learning approach helps ensure that the proposed structures exhibit the specified property targets. Recent extensions also utilize thermodynamic reaction data to help promote the synthesizability of suggested compounds. These transfer learning strategies are well-suited for explorative scenarios where the property values being sought are well outside the range of available training data.

There are situations where property data is so limited that obtaining additional training data is unavoidable. By improving both the predictive and generative qualities of chemical ML models, a fully closed-loop computational search can be conducted using active learning. New molecules in underrepresented property spaces may be iteratively generated by the network, characterized by the network, and used for retraining the network. This allows the model to gradually learn the unknown chemistries required to explore the target regions of chemical space by actively suggesting the new training data it needs. By utilizing active learning, the create-test-refine pathway can be addressed purely in silico. This approach is particularly suitable for multi-target chemical design, where the high dimensionality of the desired property targets exacerbates data scarcity concerns.

The techniques presented herein can be used to improve both predictive and generative performance of chemical ML models. Transfer learning is demonstrated as a powerful technique for improving the predictive performance of chemical models in situations where a correlated property can be leveraged alongside scarce experimental or computational properties. Inverse design may also be facilitated through transfer learning, where property values can be connected with stable structural features to generate new compounds with targeted properties beyond those observed in the training data. Thus, when the necessary chemical structures are not known, generative networks can directly propose them based on function-structure relationships learned from domain data, and this domain data can even be generated and characterized by the model itself for closed-loop chemical searches in an active learning framework. With recent extensions, these models are compelling techniques for looking at chemical reactions and other data types beyond the individual molecule. Furthermore, the approaches are not limited by choice of model architecture or chemical representation and are expected to be helpful in a variety of data scarce chemical applications.
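A minimal sketch of the transfer-learning pattern described above, assuming synthetic data and a generic feed-forward network rather than the chemical representations used in the thesis: a shared backbone is pretrained on an abundant correlated property, then frozen while a small head is fine-tuned on the scarce target property.

```python
# Hedged sketch of the transfer-learning idea, not the thesis's actual networks.
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic stand-ins: an abundant "correlated" property and a scarce target property.
X_src = torch.randn(5000, 64); y_src = X_src[:, :8].sum(1, keepdim=True)
X_tgt = torch.randn(80, 64);   y_tgt = 1.3 * X_tgt[:, :8].sum(1, keepdim=True) + 0.5

backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32), nn.ReLU())

def fit(model, X, y, epochs=200, lr=1e-3):
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# 1) Pretrain backbone + source head on the abundant correlated property.
src_model = nn.Sequential(backbone, nn.Linear(32, 1))
print("source MSE:", fit(src_model, X_src, y_src))

# 2) Transfer: freeze the backbone, fine-tune only a new head on the scarce property.
for p in backbone.parameters():
    p.requires_grad = False
tgt_model = nn.Sequential(backbone, nn.Linear(32, 1))
print("target MSE:", fit(tgt_model, X_tgt, y_tgt))
```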
86

Characterizing the learning, sociology, and identity effects of participating in The Data Mine

Aparajita Jaiswal (12418072) 14 April 2022 (has links)
The discipline of data science has gained substantial attention recently. This is mainly attributed to technological advancement that led to an exponential increase in computing power and has made the generation and recording of enormous amounts of data possible on an everyday basis. It has become crucial for industries to wrangle, curate, and analyze data using data science techniques to make informed decisions. Making informed decisions is complex; therefore, a trained data science workforce is required to analyze data on a real-time basis. The increasing demand for data science professionals has caused higher education institutions to develop courses and train students, starting at the undergraduate level, in data science concepts and tools.

Despite the efforts of institutions and national agencies such as the National Academies of Sciences, Engineering, and Medicine, there have been significant challenges in attracting and retaining students in the discipline of data science. Novice learners in data science are required to possess the skills of a programmer and a statistician, research skills, and non-technical skills such as communication and critical thinking. Undergraduate students do not possess all the required skills, which creates a cognitive load for novice learners (Koby & Orit, 2020). Research suggests that improving teaching and mentoring methodologies can improve retention for students from all demographic groups (Seymour, 2002). Previous studies (e.g., Hoffmann et al., 2002; Flynn, 2015; Lenning & Ebbers, 1999) have revealed that learning communities are effective in improving student retention, especially at the undergraduate level, as they help students develop a sense of belonging, socialize, and form their own identities. Learning communities have been identified as high impact practices (Kuh, 2008) that help develop identities and a sense of belonging; however, to the best of our knowledge, few studies focus on the development of the psychosocial and cognitive skills of students enrolled in a data science learning community.

To meet the demand for the future workforce and help undergraduate students develop data science skills, The Data Mine (TDM) at Purdue University has undertaken an initiative in the discipline of data science. The Data Mine is an interdisciplinary living-learning community that allows students from various disciplines to enroll and learn data science skills under the guidance of competent faculty and corporate mentors. The residential nature of the learning community allows undergraduate students to live, learn, and socialize with peers of similar interests and develop a sense of belonging. Constant interaction with knowledgeable faculty and mentors on real-world projects allows novice learners to master data science skills and develop an identity. The study aims to characterize the identity formation, socialization, and learning of the undergraduate students enrolled in The Data Mine and answers the following research questions:

Quantitative RQ 1: What are the perceptions of students regarding their identity formation, socialization opportunities, self-belief, and academic/intellectual development in The Data Mine?

Qualitative guiding RQ 2: How does students' participation in activities and interaction with peers, faculty, and staff at The Data Mine contribute to becoming an experienced member of the learning community?

- Sub-RQ 2(a): What are the perceived benefits and challenges of participating in The Data Mine?
- Sub-RQ 2(b): How do students describe their levels of socialization and sense of belonging within The Data Mine?
- Sub-RQ 2(c): How do students' participation and interaction in The Data Mine help them form their identity?

To approach these research questions, we conducted a sequential explanatory mixed methods study to understand the growth journey of students in terms of socialization, sense of belonging, and identity formation. The data were collected in two phases: a quantitative survey study followed by qualitative semi-structured interviews. The quantitative data were analyzed using descriptive and inferential statistics, and the qualitative data were analyzed using thematic analysis, followed by narrative analysis. The results of the quantitative and qualitative analyses demonstrated that learning in The Data Mine happened through interaction and socialization of the students with faculty, staff, and peers. Students found multiple opportunities to learn and develop data science skills, such as working on real-world projects or working in groups. This continuous interaction with peers, faculty, and staff at The Data Mine helped them learn and develop identities. The study revealed that students did develop a data science identity, and that the corporate partner TAs developed a leader identity along with the data science identity. In summary, all students grew and served as mentors, guides, and role models for new incoming students.
87

Intraday Algorithmic Trading using Momentum and Long Short-Term Memory network strategies

Whitinger, Andrew R., II 01 May 2022 (has links)
Intraday stock trading is infamously difficult and risky. Momentum and reversal strategies and long short-term memory (LSTM) neural networks have been shown to be effective for selecting stocks to buy and sell over time periods of multiple days. To explore whether these strategies can be effective for intraday trading, their implementations were simulated using intraday price data for stocks in the S&P 500 index, collected at 1-second intervals between February 11, 2021 and March 9, 2021 inclusive. The study tested 160 variations of momentum and reversal strategies for profitability in long, short, and market-neutral portfolios, totaling 480 portfolios. Long and short portfolios for each strategy were also compared to the market to observe excess returns. Eight reversal portfolios yielded statistically significant profits, and 16 yielded significant excess returns. Tests of these strategies on another set of 16 days failed to yield statistically significant returns, though average returns remained profitable. Four LSTM network configurations were tested on the same original set of days, with no strategy yielding statistically significant returns. Close examination of the stocks chosen by the LSTM networks suggests that the networks expect stocks to exhibit a momentum effect. Further studies may explore whether an intraday reversal effect can be observed over time during different market conditions and whether different configurations of LSTM networks can generate significant returns.
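As a hedged illustration of the cross-sectional momentum and reversal construction (none of the thesis's 160 variations, parameters, or the S&P 500 tick data are reproduced here), the sketch below forms 10-minute momentum and reversal portfolios from synthetic 1-second prices:

```python
# Illustrative sketch only: synthetic prices, arbitrary formation/holding windows.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_stocks, n_ticks = 50, 23_400                       # one trading day at 1-second bars
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 2e-4, (n_ticks, n_stocks)), axis=0)),
    columns=[f"S{i:03d}" for i in range(n_stocks)],
)

lookback, hold = 600, 600                            # 10-minute formation and holding windows
formation = prices.iloc[lookback] / prices.iloc[0] - 1           # return over formation window
future = prices.iloc[lookback + hold] / prices.iloc[lookback] - 1

k = 5
winners = formation.nlargest(k).index                # momentum longs buy recent winners
losers = formation.nsmallest(k).index                # reversal longs buy recent losers
print("momentum long return:", future[winners].mean())
print("reversal long return:", future[losers].mean())
print("market-neutral (reversal):", future[losers].mean() - future[winners].mean())
```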
88

Detection of 3D Genome Folding at Multiple Scales

Akgol-Oksuz, Betul 13 April 2022 (has links)
Understanding 3D genome structure is crucial to learn how chromatin folds and how genes are regulated through the spatial organization of regulatory elements. Various technologies have been developed to investigate genome architecture. These include ligation-based 3C methodologies such as Hi-C and Micro-C, ligation-based pull-down methods such as proximity ligation-assisted ChIP-seq (PLAC-seq) and paired-end tag sequencing (ChIA-PET), and ligation-free methods such as Split-Pool Recognition of Interactions by Tag Extension (SPRITE) and Genome Architecture Mapping (GAM). Although these technologies have provided great insight into chromatin organization, a systematic evaluation of them has been lacking. Among these technologies, Hi-C has been one of the most widely used methods to map genome-wide chromatin interactions for over a decade. To understand how the choice of experimental parameters determines the ability to detect and quantify the features of chromosome folding, we first systematically evaluated two critical parameters in the Hi-C protocol: cross-linking and digestion of chromatin. We found that different protocols capture distinct 3D genome features with different efficiencies depending on the cell type (Chapter 2). The updated Hi-C protocol with new parameters, which we call Hi-C 3.0, was subsequently evaluated and found to provide the best loop detection of all previous Hi-C protocols, as well as better compartment quantification than Micro-C (Chapter 3). Finally, to understand how the aforementioned technologies (Hi-C, Micro-C, PLAC-seq, ChIA-PET, SPRITE, GAM) that measure 3D organization can together provide a comprehensive understanding of genome structure, we performed a comparison of these technologies. We found that each method captures different aspects of chromatin folding (Chapter 4). Collectively, these studies suggest that improving 3D methodologies and integrative analyses of these methods will reveal unprecedented details of genome structure and function.
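For readers unfamiliar with how folding features are quantified from such data, the sketch below shows one standard, generic calculation, A/B compartment assignment from the leading eigenvector of an observed/expected correlation matrix, on a synthetic contact map; it is not the analysis pipeline used in these chapters.

```python
# Hedged sketch of a generic A/B compartment calculation on a synthetic Hi-C-like
# contact matrix, not the thesis's analysis pipeline.
import numpy as np

rng = np.random.default_rng(0)
n = 200                                              # genomic bins along one chromosome
comp = np.sign(np.sin(np.arange(n) / 15))            # planted alternating A/B pattern
expected = 1.0 / (np.abs(np.subtract.outer(np.arange(n), np.arange(n))) + 1)
observed = expected * (1 + 0.3 * np.outer(comp, comp)) * rng.lognormal(0, 0.1, (n, n))
observed = (observed + observed.T) / 2               # contact maps are symmetric

oe = observed / expected                             # distance-normalized (observed/expected)
corr = np.corrcoef(oe)                               # correlation between bin contact profiles
vals, vecs = np.linalg.eigh(corr)
ev1 = vecs[:, np.argmax(vals)]                       # leading eigenvector = compartment track

# The sign of the eigenvector partitions bins into A-like and B-like compartments.
agreement = np.mean(np.sign(ev1) == comp)
print("agreement with planted compartments:", max(agreement, 1 - agreement))
```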
89

Model-based assessments of freshwater ecosystems and species under climate change

Kärcher, Oskar 14 October 2019 (has links)
Climate change, global warming, and anthropogenic disturbances are threatening freshwater ecosystems globally. Protecting and preserving freshwater environments, their biodiversity, and all of their services for human well-being requires comprehensive knowledge of the impacts that climate change and anthropogenic disturbances have on freshwaters and freshwater species. The in-depth knowledge needed for conservation strategies can be established through versatile assessments. Quantitative assessments and the investigation of prevailing environmental relationships within ecosystems constitute the basis for sustaining freshwater systems. However, it is a great challenge to quantify the multifaceted effects of climate change and to broaden the understanding of complex environmental relationships. This thesis aims to extend the understanding of climate change impacts on freshwater ecosystems and environmental relationships, and thereby to provide useful guidelines for the protection and preservation of freshwaters. To this end, various statistical approaches based on comprehensive data sets are applied at different scales, ranging from local to global assessments.

In particular, five research studies are presented to address the effects of environmental change, investigating (1) water quality-nutrient and temperature relationships in European lakes, (2) drivers of freshwater fish species distributions across varying scales in the Danube River delta, (3) globally derived thermal response curves and thermal properties of native European freshwater species, (4) differences between thermal properties derived from native and global range data, and (5) thermal performances of freshwater fish species for different life stages and different global future dispersal scenarios.

The main results of this thesis concern various aspects of conservation implications and planning. (i) The first study outlines drivers influencing water quality by studying multi-dimensional relationships and compares different modelling techniques in order to identify models suitable for detecting complex driver interactions. (ii) The second study addresses scale effects on the performance of species distribution models, which are commonly used for assessments of climate change impacts, and identifies key predictors driving distributions for the varying scales and studied species. (iii) The third study parameterizes thermal responses of species from different taxonomic groups and assesses potential resilience in terms of warming tolerance and additional thermal properties, as well as the influence of future rising temperatures on current distributions. (iv) The fourth study quantifies the differences in thermal response curves and thermal properties for freshwater fishes derived from global and continental data in order to clarify the need for using global range data in studies making suggestions for conservation planning. (v) The last study estimates the impact of changing climatic conditions on the distribution ranges of two fish species for different time periods by including biotic information about thermal performances for various life stages.

Overall, this thesis contributes to the broad field of studying the consequences and impacts of climate change on freshwater ecosystems. By applying statistical methods tailored to the underlying investigations, useful implications for conservation planning are derived.
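As a rough sketch of what parameterizing a thermal response curve (as in the third study) can look like, the example below fits an assumed Gaussian response to synthetic occurrence-versus-temperature data; the species data, curve form, and the 12 °C habitat mean are illustrative only.

```python
# Hedged sketch of fitting a simple Gaussian thermal response curve to synthetic
# occurrence data; not the thesis's actual species data or response-curve form.
import numpy as np
from scipy.optimize import curve_fit

def thermal_response(t, peak, t_opt, breadth):
    """Occurrence/abundance response peaking at the thermal optimum t_opt."""
    return peak * np.exp(-((t - t_opt) ** 2) / (2 * breadth ** 2))

rng = np.random.default_rng(0)
temps = np.linspace(2, 28, 60)                                   # water temperature (deg C)
obs = thermal_response(temps, 0.8, 16.0, 4.0) + rng.normal(0, 0.05, temps.size)

params, _ = curve_fit(thermal_response, temps, obs, p0=[0.5, 15.0, 5.0])
peak, t_opt, breadth = params
print(f"thermal optimum ~ {t_opt:.1f} C, breadth ~ {breadth:.1f} C")

# A crude "warming tolerance" proxy: optimum minus current mean habitat temperature.
print("warming tolerance proxy:", t_opt - 12.0)   # 12 C is an assumed habitat mean
```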
90

Comparing machine learning models and physics-based models in groundwater science

Boerman, Thomas Christiaan 25 January 2022 (has links)
The use of machine learning techniques in tackling hydrological problems has significantly increased over the last decade. Machine learning tools can provide alternatives or surrogates to complex and comprehensive methodologies such as physics-based numerical models. Machine learning algorithms have been used in hydrology for estimating streamflow, runoff, water table fluctuations and calculating the impacts of climate change on nutrient loading among many other applications. In recent years we have also seen arguments for and advances in combining physics-based models and machine learning algorithms for mutual benefit. This thesis contributes to these advances by addressing two different groundwater problems by developing a machine learning approach and comparing this previously developed physics-based models: i) estimating groundwater and surface water depletion caused by groundwater pumping using artificial neural networks and ii) estimating a global steady-state map of water table depth using random forests. The first chapter of this thesis outlines the purpose of this thesis and how this thesis is a contribution to the overall scientific knowledge on the topic. The results of this research contribute to three of the twenty-three major unsolved problems in hydrology, as has been summarized by a collective of hundreds of hydrologists. In the second chapter, we tested the potential of artificial neural networks (ANNs), a deeplearning tool, as an alternative method for estimating source water of groundwater abstraction compared to conventional methods (analytical solutions and numerical models). Surrogate ANN models of three previously calibrated numerical groundwater models were developed using hydrologically meaningful input parameters (e.g., well-stream distance and hydraulic diffusivity) selected by predictor parameter optimization, combining hydrological expertise and statistical methodologies (ANCOVA). The output parameters were three transient sources of groundwater abstraction (shallow and deep storage release, and local surface-water depletion). We found that the optimized ANNs have a predictive skill of up to 0.84 (R2, 2σ = ± 0.03) when predicting water sources compared to physics-based numerical (MODFLOW) models. Optimal ANN skill was obtained when using between five and seven predictor parameters, with hydraulic diffusivity and mean aquifer thickness being the most important predictor parameters. Even though initial results are promising and computationally frugal, we found that the deep learning models were not yet sufficient or outperforming numerical model simulations. The third chapter used random forests in mapping steady-state water table depth on a global scale (0.1°-spatial resolution) and to integrate the results to improve our understanding on scale and perceptual modeling of global water table depth. In this study we used a spatially biased ~1.5-million-point database of water table depth observations with a variety of iv globally distributed above- and below-ground predictor variables with causal relationships to steady-state water table depth. We mapped water table depth globally as well as at regional to continental scales to interrogate performance, feature importance and hydrologic process across scales and regions with varying hydrogeological landscapes and climates. The global water table depth map has a correlation (cross validation error) of R2 = 0.72 while our highest continental correlation map (Australia) has a correlation of R2 = 0.86. 
The results of this study surprisingly show that above-ground variables such as surface elevation, slope, drainage density and precipitation are among the most important predictor parameters while subsurface parameters such as permeability and porosity are notably less important. This is contrary to conventional thought among hydrogeologists, who would assume that subsurface parameters are very important. Machine learning results overall underestimate water table depth similar to existing global physics-based groundwater models which also have comparable differences between existing physics-based groundwater models themselves. The feature importance derived from our random forest models was used to develop alternative perceptual models that highlight different water table depth controls between areas with low relief and high relief. Finally, we considered the representativeness of the prediction domain and the predictor database and found that 90% of the prediction domain has a dissimilarity index lower than 0.75. We conclude that we see good extrapolation potential for our random forest models to regions with unknown water table depth, except for some high elevation regions. Finally in chapter four, the most important findings of chapters two and three are considered as contributions to the unresolved questions in hydrology. Overall, this thesis has contributed to advancing hydrological sciences through: i) mapping of global steady-state water table depth using machine learning; ii) advancing hybrid modeling by using synthetic data derived from physics-based models to train an artificial neural network for estimating storage depletion; and (iii) it contributing to answering three unsolved problems in hydrology involving themes of parameter scaling across temporal and spatial scales, extracting hydrological insight from data, the use of innovative modeling techniques to estimate hydrological fluxes/states and extrapolation of models to no-data regions. / Graduate
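A hedged, down-scaled sketch of the chapter-three workflow, with synthetic predictors standing in for the global database: fit a random forest regressor, report cross-validated R², and rank feature importances.

```python
# Illustrative sketch only: synthetic terrain/climate predictors stand in for the
# thesis's ~1.5-million-point global dataset.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5_000
X = pd.DataFrame({
    "elevation": rng.uniform(0, 3000, n),
    "slope": rng.uniform(0, 30, n),
    "precipitation": rng.uniform(100, 2500, n),
    "drainage_density": rng.uniform(0, 5, n),
    "permeability": rng.lognormal(-13, 2, n),
    "porosity": rng.uniform(0.05, 0.4, n),
})
# Synthetic water table depth dominated by above-ground variables, echoing the study's finding.
y = (0.01 * X["elevation"] + 0.5 * X["slope"]
     - 0.005 * X["precipitation"] + rng.normal(0, 2, n))

rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
print("cross-validated R2:", cross_val_score(rf, X, y, cv=5, scoring="r2").mean())

rf.fit(X, y)
for name, imp in sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:>17s}: {imp:.3f}")
```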
