About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Identifying and Evaluating Early Stage Fintech Companies: Working with Consumer Internet Data and Analytic Tools

Shoop, Alexander 24 January 2018 (has links)
The purpose of this project is to work as an interdisciplinary team whose primary role is to mentor a team of WPI undergraduate students completing their Major Qualifying Project (MQP) in collaboration with Vestigo Ventures, LLC. (“Vestigo Ventures”) and Cogo Labs. We worked closely with the project sponsors at Vestigo Ventures and Cogo Labs to understand each sponsor’s goals and desires, and then translated those thoughts into actionable items and concrete deliverables to be completed by the undergraduate student team. As a graduate student team with a diverse set of educational backgrounds and a range of academic and professional experiences, we provided two primary functions throughout the duration of this project. The first function was to develop a roadmap for each individual project, with concrete steps, justification, goals and deliverables. The second function was to provide the undergraduate team with clarification and assistance throughout the implementation and completion of each project, as well as provide our opinions and thoughts on any proposed changes. The two teams worked together in lock-step in order to provide the project sponsors with a complete set of deliverables, with the undergraduate team primarily responsible for implementation and final delivery of each completed project.
62

Estratégia computacional para apoiar a reprodutibilidade e reuso de dados científicos baseado em metadados de proveniência. / Computational strategy to support the reproducibility and reuse of scientific data based on provenance metadata.

Silva, Daniel Lins da 17 May 2017 (has links)
A ciência moderna, apoiada pela e-science, tem enfrentado desafios de lidar com o grande volume e variedade de dados, gerados principalmente pelos avanços tecnológicos nos processos de coleta e processamento dos dados científicos. Como consequência, houve também um aumento na complexidade dos processos de análise e experimentação. Estes processos atualmente envolvem múltiplas fontes de dados e diversas atividades realizadas por grupos de pesquisadores geograficamente distribuídos, que devem ser compreendidas, reutilizadas e reproduzíveis. No entanto, as iniciativas da comunidade científica que buscam disponibilizar ferramentas e conscientizar os pesquisadores a compartilharem seus dados e códigos-fonte, juntamente com as publicações científicas, são, em muitos casos, insuficientes para garantir a reprodutibilidade e o reuso das contribuições científicas. Esta pesquisa objetiva definir uma estratégia computacional para o apoio ao reuso e a reprodutibilidade dos dados científicos, por meio da gestão da proveniência dos dados durante o seu ciclo de vida. A estratégia proposta nesta pesquisa é apoiada em dois componentes principais, um perfil de aplicação, que define um modelo padronizado para a descrição da proveniência dos dados, e uma arquitetura computacional para a gestão dos metadados de proveniência, que permite a descrição, armazenamento e compartilhamento destes metadados em ambientes distribuídos e heterogêneos. Foi desenvolvido um protótipo funcional para a realização de dois estudos de caso que consideraram a gestão dos metadados de proveniência de experimentos de modelagem de distribuição de espécies. Estes estudos de caso possibilitaram a validação da estratégia computacional proposta na pesquisa, demonstrando o seu potencial no apoio à gestão de dados científicos. / Modern science, supported by e-science, has faced challenges in dealing with the large volume and variety of data generated primarily by technological advances in the processes of collecting and processing scientific data. Therefore, there was also an increase in the complexity of the analysis and experimentation processes. These processes currently involve multiple data sources and numerous activities performed by geographically distributed research groups, which must be understood, reused and reproducible. However, initiatives by the scientific community with the goal of developing tools and sensitize researchers to share their data and source codes related to their findings, along with scientific publications, are often insufficient to ensure the reproducibility and reuse of scientific results. This research aims to define a computational strategy to support the reuse and reproducibility of scientific data through data provenance management during its entire life cycle. Two principal components support our strategy in this research, an application profile that defines a standardized model for the description of provenance metadata, and a computational architecture for the management of the provenance metadata that enables the description, storage and sharing of these metadata in distributed and heterogeneous environments. We developed a functional prototype for the accomplishment of two case studies that considered the management of provenance metadata during the experiments of species distribution modeling. These case studies enabled the validation of the computational strategy proposed in the research, demonstrating the potential of this strategy in supporting the management of scientific data.
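The record above centres on an application profile for describing data provenance and an architecture for sharing those metadata across heterogeneous repositories. As an illustrative sketch only (the thesis does not publish this code), a minimal provenance record in the spirit of the W3C PROV entity–activity–agent model might look like the following; all class names, field names, and the species-distribution-modelling example values are assumptions for illustration.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class ProvenanceRecord:
    """Minimal provenance record: which agent ran which activity on which inputs."""
    entity_id: str                             # dataset or file being described
    generated_by: str                          # activity that produced the entity
    attributed_to: str                         # responsible agent (person, lab, software)
    used: list = field(default_factory=list)   # input entities consumed by the activity
    started_at: str = ""
    ended_at: str = ""

# Example: one species-distribution-modelling run described as provenance metadata
# (all identifiers below are hypothetical).
record = ProvenanceRecord(
    entity_id="sdm_output_2017_05.tif",
    generated_by="sdm_run_42",
    attributed_to="daniel.lins",
    used=["occurrences.csv", "bioclim_layers.zip"],
    started_at=datetime(2017, 5, 10, 14, 0, tzinfo=timezone.utc).isoformat(),
    ended_at=datetime(2017, 5, 10, 14, 35, tzinfo=timezone.utc).isoformat(),
)

# Serialise to JSON so the metadata can be stored and exchanged between repositories.
print(json.dumps(asdict(record), indent=2))
```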
64

Management, visualisation & mining of quantitative proteomics data

Ahmad, Yasmeen January 2012 (has links)
Exponential data growth in life sciences demands cross discipline work that brings together computing and life sciences in a usable manner that can enhance knowledge and understanding in both fields. High throughput approaches, advances in instrumentation and overall complexity of mass spectrometry data have made it impossible for researchers to manually analyse data using existing market tools. By applying a user-centred approach to effectively capture domain knowledge and experience of biologists, this thesis has bridged the gap between computation and biology through software, PepTracker (http://www.peptracker.com). This software provides a framework for the systematic detection and analysis of proteins that can be correlated with biological properties to expand the functional annotation of the genome. The tools created in this study aim to place analysis capabilities back in the hands of biologists, who are expert in evaluating their data. Another major advantage of the PepTracker suite is the implementation of a data warehouse, which manages and collates highly annotated experimental data from numerous experiments carried out by many researchers. This repository captures the collective experience of a laboratory, which can be accessed via user-friendly interfaces. Rather than viewing datasets as isolated components, this thesis explores the potential that can be gained from collating datasets in a “super-experiment” ideology, leading to formation of broad ranging questions and promoting biology driven lines of questioning. This has been uniquely implemented by integrating tools and techniques from the field of Business Intelligence with Life Sciences and successfully shown to aid in the analysis of proteomic interaction experiments. Having conquered a means of documenting a static proteomics snapshot of cells, the proteomics field is progressing towards understanding the extremely complex nature of cell dynamics. PepTracker facilitates this by providing the means to gather and analyse many protein properties to generate new biological insight, as demonstrated by the identification of novel protein isoforms.
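PepTracker’s data warehouse is described above only at a high level. As a hedged sketch of the general idea (collating annotated quantitative results from many experiments into a single queryable store so that “super-experiment” questions can be asked), a minimal relational layout might look like the following; the table and column names are invented for illustration and are not PepTracker’s actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    researcher TEXT,
    instrument TEXT,
    run_date TEXT
);
CREATE TABLE protein_quant (
    experiment_id INTEGER REFERENCES experiment(id),
    protein_id TEXT,          -- e.g. UniProt accession
    ratio REAL,               -- quantitative value (e.g. a SILAC ratio)
    peptides INTEGER          -- supporting peptide count
);
""")

conn.execute("INSERT INTO experiment VALUES (1, 'yasmeen', 'orbitrap', '2012-03-01')")
conn.executemany(
    "INSERT INTO protein_quant VALUES (?, ?, ?, ?)",
    [(1, "P06748", 1.8, 12), (1, "Q09666", 0.4, 5)],
)

# A 'super-experiment' style query: collate a protein's measurements across all runs.
for row in conn.execute("""
    SELECT p.protein_id, COUNT(*) AS runs, AVG(p.ratio) AS mean_ratio
    FROM protein_quant p JOIN experiment e ON e.id = p.experiment_id
    GROUP BY p.protein_id
"""):
    print(row)
```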
65

Automating an Engine to Extract Educational Priorities for Workforce City Innovation

Hobbs, Madison 01 January 2019 (has links)
This thesis is grounded in my work done through the Harvey Mudd College Clinic Program as Project Manager of the PilotCity Clinic Team. PilotCity is a startup whose mission is to transform small to mid-sized cities into centers of innovation by introducing employer partnerships and work-based learning to high school classrooms. The team was tasked with developing software and algorithms to automate PilotCity's programming and to extract educational insights from unstructured data sources like websites, syllabi, resumes, and more. The team helped engineer a web application to expand and facilitate PilotCity's usership, designed a recommender system to automate the process of matching employers to high school classrooms, and packaged a topic modeling module to extract educational priorities from more complex data such as syllabi, course handbooks, or other educational text data. Finally, the team explored automatically generating supplementary course resources using insights from topic models. This thesis will detail the team's process from beginning to final deliverables including the methods, implementation, results, challenges, future directions, and impact of the project.
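The abstract above mentions a topic-modelling module for extracting educational priorities from syllabi and course handbooks but does not show it. A minimal sketch of the underlying technique (LDA over a small bag-of-words corpus, here with scikit-learn) could look like the following; the toy syllabus snippets and the number of topics are assumptions for illustration, not the project’s actual module.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for syllabus / course-handbook text.
docs = [
    "intro to python programming loops functions data structures",
    "statistics probability hypothesis testing regression analysis",
    "welding safety fabrication metal shop equipment training",
    "python data analysis pandas visualization projects",
    "career readiness resume writing interview communication skills",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Fit a small LDA model; n_components would be tuned on real data.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(X)

# Print the top words per topic as a rough view of 'educational priorities'.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = weights.argsort()[::-1][:5]
    print(f"topic {k}:", ", ".join(terms[i] for i in top))
```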
66

Science des données au service des réseaux d'opérateur : proposition de cas d’utilisation, d’outils et de moyens de déploiement / Data science at the service of operator networks: proposed use cases, tools and means of deployment

Samba, Alassane 29 October 2018 (has links)
L'évolution des télécommunications a amené aujourd'hui à un foisonnement des appareils connectés et une massification des services multimédias. Face à cette demande accrue de service, les opérateurs ont besoin d'adapter le fonctionnement de leurs réseaux, afin de continuer à garantir un certain niveau de qualité d'expérience à leurs utilisateurs. Pour ce faire, les réseaux d'opérateur tendent vers un fonctionnement plus cognitif voire autonomique. Il s'agit de doter les réseaux de moyens d'exploiter toutes les informations ou données à leur disposition, les aidant à prendre eux-mêmes les meilleures décisions sur leurs services et leur fonctionnement, voire s'autogérer. Il s'agit donc d'introduire de l'intelligence artificielle dans les réseaux. Cela nécessite la mise en place de moyens d'exploiter les données, d'effectuer sur elles de l'apprentissage automatique de modèles généralisables, apportant l’information qui permet d'optimiser les décisions. L'ensemble de ces moyens constituent aujourd'hui une discipline scientifique appelée science des données. Cette thèse s'insère dans une volonté globale de montrer l'intérêt de l'introduction de la science des données dans différents processus d'exploitation des réseaux. Elle comporte deux contributions algorithmiques correspondant à des cas d'utilisation de la science des données pour les réseaux d'opérateur, et deux contributions logicielles, visant à faciliter, d'une part l'analyse, et d'autre part le déploiement des algorithmes issus de la science des données. Les résultats concluants de ces différents travaux ont démontré l'intérêt et la faisabilité de l'utilisation de la science des données pour l'exploitation des réseaux d'opérateur. Ces résultats ont aussi fait l'objet de plusieurs utilisations par des projets connexes. / The evolution of telecommunications has led today to a proliferation of connected devices and a massification of multimedia services. Faced with this increased demand for service, operators need to adapt the operation of their networks, in order to continue to guarantee a certain level of quality of experience to their users. To do this, operator networks tend towards a more cognitive or autonomic functioning. It is about giving the networks the means to exploit all the information or data at their disposal, helping them to make the best decisions about their services and operations, and even self-manage. It is therefore a question of introducing artificial intelligence into networks. This requires setting up means to exploit the data, to carry out on them the automatic learning of generalizable models, providing information that can optimize decisions. All these means today constitute a scientific discipline called data science. This thesis fits into a global desire to show the interest of the introduction of data science in different network operating processes. It includes two algorithmic contributions corresponding to use cases of data science for the operator networks, and two software contributions, aiming to facilitate, on the one hand, the analysis, and on the other hand the deployment of the algorithms produced through data science. The conclusive results of these various studies have demonstrated the interest and the feasibility of using data science for the exploitation of operator networks. These results have also been used by related projects.
67

Smart Classifiers and Bayesian Inference for Evaluating River Sensitivity to Natural and Human Disturbances: A Data Science Approach

Underwood, Kristen 01 January 2018 (has links)
Excessive rates of channel adjustment and riverine sediment export represent societal challenges; impacts include: degraded water quality and ecological integrity, erosion hazards to infrastructure, and compromised public safety. The nonlinear nature of sediment erosion and deposition within a watershed and the variable patterns in riverine sediment export over a defined timeframe of interest are governed by many interrelated factors, including geology, climate and hydrology, vegetation, and land use. Human disturbances to the landscape and river networks have further altered these patterns of water and sediment routing. An enhanced understanding of river sediment sources and dynamics is important for stakeholders, and will become more critical under a nonstationary climate, as sediment yields are expected to increase in regions of the world that will experience increased frequency, persistence, and intensity of storm events. Practical tools are needed to predict sediment erosion, transport and deposition and to characterize sediment sources within a reasonable measure of uncertainty. Water resource scientists and engineers use multidimensional data sets of varying types and quality to answer management-related questions, and the temporal and spatial resolution of these data are growing exponentially with the advent of automated samplers and in situ sensors (i.e., “big data”). Data-driven statistics and classifiers have great utility for representing system complexity and can often be more readily implemented in an adaptive management context than process-based models. Parametric statistics are often of limited efficacy when applied to data of varying quality, mixed types (continuous, ordinal, nominal), censored or sparse data, or when model residuals do not conform to Gaussian distributions. Data-driven machine-learning algorithms and Bayesian statistics have advantages over Frequentist approaches for data reduction and visualization; they allow for non-normal distribution of residuals and greater robustness to outliers. This research applied machine-learning classifiers and Bayesian statistical techniques to multidimensional data sets to characterize sediment source and flux at basin, catchment, and reach scales. These data-driven tools enabled better understanding of: (1) basin-scale spatial variability in concentration-discharge patterns of instream suspended sediment and nutrients; (2) catchment-scale sourcing of suspended sediments; and (3) reach-scale sediment process domains. The developed tools have broad management application and provide insights into landscape drivers of channel dynamics and riverine solute and sediment export.
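One of the basin-scale analyses above concerns concentration–discharge (C–Q) patterns of suspended sediment and nutrients. As a hedged sketch (not the thesis’s actual code), the conventional starting point is a log–log power-law fit C = a·Q^b, where the sign of the slope b separates mobilization from dilution behaviour; the data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic paired observations: discharge Q (m^3/s) and suspended-sediment concentration C (mg/L).
Q = rng.lognormal(mean=1.0, sigma=0.8, size=200)
true_a, true_b = 12.0, 0.6                      # mobilization-type response (b > 0)
C = true_a * Q**true_b * rng.lognormal(0.0, 0.2, size=200)

# Fit log C = log a + b log Q by ordinary least squares.
X = np.column_stack([np.ones_like(Q), np.log(Q)])
coef, *_ = np.linalg.lstsq(X, np.log(C), rcond=None)
log_a, b = coef

print(f"fitted a = {np.exp(log_a):.2f}, b = {b:.2f}")
# b > 0 suggests mobilization (C rises with flow); b < 0 suggests dilution of a relatively
# constant source; b near 0 suggests chemostatic behaviour.
```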
68

Nonparametric Inference for High Dimensional Data

Mukhopadhyay, Subhadeep 03 October 2013 (has links)
Learning from data, especially ‘Big Data’, is becoming increasingly popular under names such as Data Mining, Data Science, Machine Learning, Statistical Learning and High Dimensional Data Analysis. In this dissertation we propose a new related field, which we call ‘United Nonparametric Data Science’ - applied statistics with “just in time” theory. It integrates the practice of traditional and novel statistical methods for nonparametric exploratory data modeling, and it is applicable to teaching introductory statistics courses that are closer to modern frontiers of scientific research. Our framework includes small data analysis (combining traditional and modern nonparametric statistical inference), big and high dimensional data analysis (by statistical modeling methods that extend our unified framework for small data analysis). The first part of the dissertation (Chapters 2 and 3) has been oriented by the goal of developing a new theoretical foundation to unify many cultures of statistical science and statistical learning methods using mid-distribution function, custom made orthonormal score function, comparison density, copula density, LP moments and comoments. It is also examined how this elegant theory yields solution to many important applied problems. In the second part (Chapter 4) we extend the traditional empirical likelihood (EL), a versatile tool for nonparametric inference, in the high dimensional context. We introduce a modified version of the EL method that is computationally simpler and applicable to a large class of “large p small n” problems, allowing p to grow faster than n. This is an important step in generalizing the EL in high dimensions beyond the p ≤ n threshold where the standard EL and its existing variants fail. We also present detailed theoretical study of the proposed method.
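The dissertation above extends empirical likelihood (EL) to “large p small n” settings. As background only, a hedged sketch of the classical univariate EL ratio test for a mean (Owen’s construction), with the Lagrange multiplier solved numerically, is shown below; it illustrates standard EL, not the modified high-dimensional method proposed in the dissertation.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for H0: E[X] = mu (classical Owen construction)."""
    z = x - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                      # mu outside the convex hull of the data
    # Optimal weights are w_i = 1 / (n * (1 + lam * z_i)); lam solves sum z/(1 + lam*z) = 0.
    eps = 1e-8
    lo = (-1.0 + eps) / z.max()            # keep 1 + lam*z_i > 0 for all i
    hi = (-1.0 + eps) / z.min()
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=50)    # skewed data with true mean 2.0

stat = el_log_ratio(x, mu=2.0)
print(f"-2 log R = {stat:.3f}, p-value = {chi2.sf(stat, df=1):.3f}")
```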
69

Evolutionary conservation and diversification of complex synaptic function in human proteome

Pajak, Maciej January 2018 (has links)
The evolution of synapses from early proto-synaptic protein complexes in unicellular eukaryotes to sophisticated machines comprising thousands of proteins parallels the emergence of finely tuned synaptic plasticity, a molecular correlate for memory and learning. Phenotypic change in organisms is ultimately the result of evolution of their genotype at the molecular level. Selection pressure is a measure of how changes in genome sequence that arise through naturally occurring processes in populations are fixed or eliminated in subsequent generations. Inferring phylogenetic information about proteins such as the variation of selection pressure across coding sequences can provide valuable information not only about the origin of proteins, but also the contribution of specific sites within proteins to their current roles within an organism. Recent evolutionary studies of synaptic proteins have generated attractive hypotheses about the emergence of finely-tuned regulatory mechanisms in the post-synaptic proteome related to learning; however, these analyses are relatively superficial. In this thesis, I establish a scalable molecular phylogenetic modelling framework based on three new inference methodologies to investigate temporal and spatial aspects of selection pressure changes for the whole human proteome using protein orthologs from up to 68 taxa. Temporal modelling of evolutionary selection pressure reveals informative features and patterns for the entire human proteome and identifies groups of proteins that share distinct diversification timelines. Multi-ontology enrichment analysis of these gene cohorts was used to aid biological interpretation, but these approaches are statistically underpowered and do not capture a clear picture of the emergence of synaptic plasticity. Subsequent pathway-centric analysis of key synaptic pathways extends the interpretation of temporal data and allows for revision of previous hypotheses about the evolution of complex synaptic function. I proceed to integrate inferred selection pressure timeline information in the context of static protein-protein interaction data. A network analysis of the full human proteome reveals systematic patterns linking the temporal profile of proteins’ evolution and their topological role in the interaction graph. These graphs were used to test a mechanistic hypothesis that proposed a propagating diversification signal between interactors using the temporal modelling data and network analysis tools. Finally, I analyse the data of amino-acid level spatial modelling of selection pressure events in Arc, one of the master regulators of synaptic plasticity, and its interactors for which detailed experimental data is available. I use the Arc interactome as an example to discuss episodic and localised diversifying selection pressure events in tightly coupled complexes of proteins and showcase potential for a similar systematic analysis of larger complexes of proteins using a pathway-centric approach. Through my work I revised our understanding of temporal evolutionary patterns that shaped contemporary synaptic function through profiling of emergence and refinement of proteins in multiple pathways of the nervous system. I also uncovered systematic effects linking dependencies between proteins with their active diversification, and hypothesised about their extension to domain level selection pressure events.
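The network analysis described above (testing whether diversification “propagates” between interacting proteins) can be sketched in a hedged, toy form: compare each protein’s diversification score with the mean score of its neighbours in a protein–protein interaction graph, and judge the association against a label-permutation null that keeps the topology fixed. The graph, scores, and effect size below are synthetic and not the thesis’s data or method.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)

# Toy PPI graph with a synthetic per-protein diversification score
# (standing in for a summary of inferred selection-pressure change over time).
G = nx.erdos_renyi_graph(n=200, p=0.03, seed=2)
score = {n: rng.normal() for n in G.nodes}
# Inject a weak 'propagation' effect: nudge each score toward its neighbours' average.
for n in G.nodes:
    nbrs = list(G.neighbors(n))
    if nbrs:
        score[n] += 0.5 * np.mean([score[m] for m in nbrs])

def neighbour_correlation(graph, s):
    """Correlation between each node's score and its neighbours' mean score."""
    own, nb = [], []
    for n in graph.nodes:
        nbrs = list(graph.neighbors(n))
        if nbrs:
            own.append(s[n])
            nb.append(np.mean([s[m] for m in nbrs]))
    return np.corrcoef(own, nb)[0, 1]

observed = neighbour_correlation(G, score)

# Permutation null: shuffle scores over nodes while keeping the interaction graph fixed.
vals = np.array(list(score.values()))
null = []
for _ in range(1000):
    rng.shuffle(vals)
    null.append(neighbour_correlation(G, dict(zip(G.nodes, vals))))

p = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
print(f"observed neighbour correlation = {observed:.3f}, permutation p = {p:.3f}")
```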
70

Crossing the Chasm: Deploying Machine Learning Analytics in Dynamic Real-World Scenarios

January 2016 (has links)
abstract: The dawn of Internet of Things (IoT) has opened the opportunity for mainstream adoption of machine learning analytics. However, most research in machine learning has focused on discovery of new algorithms or fine-tuning the performance of existing algorithms. Little exists on the process of taking an algorithm from the lab environment into the real world, culminating in sustained value. Real-world applications are typically characterized by dynamic non-stationary systems with requirements around feasibility, stability and maintainability. Not much has been done to establish standards around the unique analytics demands of real-world scenarios. This research explores the problem of why so few published algorithms enter production and, furthermore, why even fewer end up generating sustained value. The dissertation proposes a ‘Design for Deployment’ (DFD) framework to successfully build machine learning analytics so they can be deployed to generate sustained value. The framework emphasizes and elaborates the often neglected but immensely important latter steps of an analytics process: ‘Evaluation’ and ‘Deployment’. A representative evaluation framework is proposed that incorporates the temporal shifts and dynamism of real-world scenarios. Additionally, the recommended infrastructure allows analytics projects to pivot rapidly when a particular venture does not materialize. Deployment needs and apprehensions of the industry are identified and gaps addressed through a 4-step process for sustainable deployment. Lastly, the need for analytics as a functional area (like finance and IT) is identified to maximize the return on machine-learning deployment. The framework and process are demonstrated in semiconductor manufacturing – a highly complex process involving hundreds of optical, electrical, chemical, mechanical, thermal, electrochemical and software processes, which makes it a highly dynamic non-stationary system. Due to the 24/7 uptime requirements in manufacturing, high reliability and fail-safe operation are a must. Moreover, the ever-growing data volumes mean that the system must be highly scalable. Lastly, due to the high cost of change, a sustained value proposition is a must for any proposed change. Hence the context is ideal to explore the issues involved. The enterprise use-cases are used to demonstrate the robustness of the framework in addressing challenges encountered in the end-to-end process of productizing machine learning analytics in dynamic real-world scenarios. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016
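The ‘Evaluation’ step above stresses temporal shifts in dynamic, non-stationary systems. A common way to sketch that idea, offered here as an illustration rather than the dissertation’s own framework, is to replace random cross-validation with forward-chaining, time-ordered splits, e.g. scikit-learn’s TimeSeriesSplit; the model and the synthetic drifting data below are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)

# Synthetic non-stationary data: the decision boundary drifts slowly over time.
n = 1000
t = np.arange(n)
X = rng.normal(size=(n, 3))
drift = 0.002 * t
y = (X[:, 0] + drift * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Forward-chaining evaluation: always train on the past, test on the future.
tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    acc = accuracy_score(y[test_idx], model.predict(X[test_idx]))
    print(f"fold {fold}: train on [0, {train_idx[-1]}], test on "
          f"[{test_idx[0]}, {test_idx[-1]}], accuracy = {acc:.3f}")
```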
