  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Phénomène Big Data en entreprise : processus projet, génération de valeur et Médiation Homme-Données / Big Data phenomenon : project workflow, value generation & Human-Data Mediation

Nesvijevskaia, Anna 18 October 2019
Big Data, a sociotechnical phenomenon surrounded by myths, is reflected in companies through the implementation of first projects, particularly Data Science projects. However, these projects do not seem to generate the expected value. Action research carried out over three years in the field, through an in-depth qualitative study of multiple cases, points to key factors that limit this generation of value, notably overly self-centred project process models. The result is (1) an adjusted model of the data project process (Brizo_DS), open and oriented toward usage, including knowledge capitalization, intended to reduce the uncertainties inherent in these exploratory projects and transposable to the scale of a corporate data project portfolio. It is complemented by (2) a tool for documenting the quality of the processed data, the Databook, and (3) a Human-Data Mediation device, which together guarantee the alignment of the actors toward an optimal result.
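The thesis does not publish the Databook's schema, so purely as an illustration, a data-quality record of the kind such a documentation tool might hold could look like the following Python sketch (every field name here is a hypothetical assumption, not the author's design):

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical sketch only: the Databook's real schema is not published,
# so all fields below are illustrative assumptions.
@dataclass
class DatabookEntry:
    dataset: str                      # name of the dataset documented
    source: str                       # where the data came from
    snapshot_date: date               # when quality was assessed
    completeness: float               # share of non-missing values, 0..1
    known_biases: list[str] = field(default_factory=list)
    intended_uses: list[str] = field(default_factory=list)
    caveats: list[str] = field(default_factory=list)

entry = DatabookEntry(
    dataset="customer_transactions",
    source="CRM export",
    snapshot_date=date(2019, 10, 18),
    completeness=0.92,
    known_biases=["only covers online customers"],
    intended_uses=["churn scoring"],
    caveats=["currency field mixes EUR and USD before 2018"],
)
print(entry)
```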
262

A comparative study between algorithms for time series forecasting on customer prediction : An investigation into the performance of ARIMA, RNN, LSTM, TCN and HMM

Almqvist, Olof January 2019
Time series prediction is one of the main areas of statistics and machine learning. In 2018, two new algorithms, the higher-order hidden Markov model and the temporal convolutional network, emerged as challengers to the more traditional recurrent neural network and long short-term memory network, as well as to the autoregressive integrated moving average (ARIMA) model. In this study, most major algorithms, together with these recent innovations for time series forecasting, are trained and evaluated on two datasets from the theme park industry with the aim of predicting the future number of visitors. The models were developed with the Python libraries Keras and Statsmodels. Results from this thesis show that the neural network models are slightly better than ARIMA and the hidden Markov model, and that the temporal convolutional network does not perform significantly better than the recurrent or long short-term memory networks, although it has the lowest prediction error on one of the datasets. Interestingly, the Markov model performed worse than all neural network models even when using no independent variables.
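As a hedged illustration of the kind of experiment described (not the thesis code; the real datasets are theme-park visitor counts not reproduced here), the following sketch fits an ARIMA model with Statsmodels and an LSTM with Keras on a synthetic daily-visitors series. The ARIMA order, window length, and network size are arbitrary assumptions:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from tensorflow import keras

# Synthetic daily visitor counts with a weekly cycle plus noise.
rng = np.random.default_rng(0)
visitors = 1000 + 200 * np.sin(np.arange(400) * 2 * np.pi / 7) + rng.normal(0, 50, 400)

# --- ARIMA: order (p, d, q) chosen arbitrarily for the sketch ---
arima = ARIMA(visitors, order=(2, 1, 1)).fit()
print(arima.forecast(steps=7))          # 7-day-ahead forecast

# --- LSTM: predict the next day from a 14-day sliding window ---
window = 14
X = np.stack([visitors[i:i + window] for i in range(len(visitors) - window)])[..., None]
y = visitors[window:]
model = keras.Sequential([
    keras.layers.Input(shape=(window, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)
print(model.predict(X[-1:], verbose=0))  # next-day prediction
```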
263

O impacto da capacidade de inteligência analítica de negócios na tomada de decisões na era dos grandes dados / The impact of business analytics intelligence capability on decision making in the era of big data

Medeiros, Mauricius Munhoz de 27 February 2018
This study investigated the impact of business analytics intelligence capabilities on the expansion of managerial cognitive capabilities, orienting data-driven decision making in an agile (dynamic) way to improve organizational performance management. The phenomenon was explained from the theoretical perspective of dynamic capabilities; to define the constructs, the theoretical elements concerning business analytics intelligence capabilities and decision making were also reviewed. A mixed-methods study was carried out in two stages. The first, exploratory stage, conducted through interviews with 10 managers, allowed the mapping of relationships and the identification of variables, enabling the development of the quantitative instrument. The second, confirmatory stage was performed through a survey of 366 respondents, whose results were analyzed to validate the research instrument and to measure the impact through structural equation modeling, confirming 5 of the 7 hypotheses defined in the conceptual model.
The heart of the discussion lies in explaining the impact of business analytics intelligence capabilities on decision making, where the findings show a significant impact of managerial analytics intelligence capabilities, big data governance and processing, and advanced business analytics. The research contributes to theory by explaining business analytics intelligence capabilities as dynamic capabilities, and by developing and validating an instrument for the integrated measurement of these capabilities. For the managerial field, the study offers directions and recommendations by indicating potentialities and limitations for developing these capabilities.
264

Understanding the dynamics of embryonic stem cell differentiation

Strawbridge, Stanley Eugene January 2019
The two defining features of mouse embryonic stem (ES) cells are self-renewal and naive pluripotency, the ability to give rise to all cell lineages in the adult body. In addition to being a unique and interesting cell type, pluripotent ES cells have demonstrated their potential for continued advancements in biomedical science. There is now an improved understanding of the chemical signals and the gene regulatory network responsible for maintaining ES cells in the naive pluripotent state; however, less is understood about how ES cells exit pluripotency. My main aim is to study the dynamics of, and the factors affecting, the irreversible exit from pluripotency. Expression of the reporter Rex1-GFPd2, which is inactivated upon exit from naive pluripotency, was analyzed by quantitative long-term single-cell imaging over many generations. This technique allowed chemical, physical, and genealogical information to be recorded during the transition. Culture conditions that provided homogeneous populations were used in all assays, and these data were validated against bulk-culture data where appropriate. Changes in real-time cell behavior were seen in cell-cell contact, motility, and cell-cycle duration. Undifferentiated ES cells form tightly joined colonies, with cells that exhibit low motility and a constant cell-cycle duration. Exit is associated with increasing cell motility, decreased cell-cell contact, and an acceleration in cell proliferation. The onset of exit is marked by a sudden and irreversible inactivation of the Rex1-GFPd2 reporter. This inactivation is asynchronous, as it occurs at different times and in different generations during ES cell differentiation. However, examination of daughter cells generated from the same mother revealed a high level of synchronicity. Further investigation revealed high correlations in cell-cycle duration and Rex1-GFPd2 expression between differentiating sister and cousin cells, providing strong evidence that cell potency is inherited symmetrically in cell divisions during exit in vitro. How cells change fate is a fundamental question in developmental biology. Knowing the cellular dynamics during the transition out of naive pluripotency is important for harnessing the potential of ES cells and for understanding how cell fate decisions are made during embryonic development. The quantification of the timing of exit from naive pluripotency, coupled with identifiable changes in cellular behaviors such as motility, cell size, and cell-cycle duration, enhances our understanding of how cell fate changes are regulated during directed differentiation.
265

Exploration of an Automated Motivation Letter Scoring System to Emulate Human Judgement

Munnecom, Lorenna, Pacheco, Miguel Chaves de Lemos January 2020
As the popularity of the master's programme in data science at Dalarna University increases, so does the number of applicants. The aim of this thesis was to explore different approaches to an automated motivation letter scoring system that could emulate human judgement and automate the process of candidate selection. Several steps, such as image processing and text processing, were required to retrieve the numerous features that could lead to the identification of the factors graded by the programme managers. Grammar-based features and advanced textual features were extracted from the motivation letters, followed by the application of topic modelling methods to extract the probability of each topic occurring within a letter. Furthermore, correlation analysis was applied to quantify the association between the features and the different factors graded by the programme managers, followed by ordinal logistic regression and Random Forest to build models with the most impactful variables. Finally, the Naïve Bayes algorithm, Random Forest and Support Vector Machine were used, first for classification and then for prediction. The results were not promising, as the factors were not accurately identified. Nevertheless, the authors suspect that the factors may be strongly related to the prominence of specific topics within a motivation letter, which invites further research.
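A minimal sketch of the kind of pipeline the abstract describes, assuming scikit-learn rather than whatever libraries the authors actually used: LDA topic proportions extracted from the letters feed a Random Forest that predicts a grade. The example letters and scores are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Invented miniature corpus: real letters and grades are confidential.
letters = [
    "I want to study data science because I enjoy statistics and programming.",
    "My background in economics motivates me to learn machine learning methods.",
    "I am passionate about data analysis and want to deepen my programming skills.",
    "Working with statistics in my job made me curious about data science.",
]
grades = [3, 2, 4, 3]  # hypothetical programme-manager scores

pipeline = make_pipeline(
    CountVectorizer(stop_words="english"),                       # bag-of-words counts
    LatentDirichletAllocation(n_components=3, random_state=0),   # topic probabilities
    RandomForestClassifier(n_estimators=100, random_state=0),    # grade classifier
)
pipeline.fit(letters, grades)
print(pipeline.predict(["I love statistics and data science."]))
```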
266

A Close Look at the Transient Sky in a Neighbouring Galaxy

Tikare, Kiran January 2020
The study of time-variable sources and phenomena in astrophysics provides important insights into stellar evolution, galactic evolution, stellar population studies and cosmological questions such as the number density of dark massive objects. The study of these sources and phenomena forms the basis of time-domain surveys, in which telescopes scan the sky regularly over a period of time, providing positional and temporal data for the various astrophysical sources and phenomena happening in the Universe. Our vantage point within the Milky Way greatly limits studying our own galaxy in its entirety. In this situation our nearest neighbour, the Andromeda galaxy (M31), is an excellent choice, as its proximity and inclination allow us to resolve millions of stars using space-based telescopes. The Zwicky Transient Facility (ZTF) is a new optical time-domain survey at Palomar Observatory, which has collected data in the direction of M31 for over 6 months using multiple filters. This thesis exploits that rich dataset. Stars in M31 are not resolved in ZTF, as it is a ground-based facility; we therefore use the large public catalogue of stars observed with the Hubble Space Telescope (HST), the Panchromatic Hubble Andromeda Treasury (PHAT), which provides stellar coordinates and observed brightnesses for millions of resolved stars in the direction of M31 in multiple filters. Processing the large volumes of data generated by time-domain surveys requires new data processing pipelines, statistical techniques for determining features of the data, and machine learning algorithms for classifying the data into different categories; the end result of such processing is astronomical catalogues of astrophysical sources and phenomena and their light curves. In this thesis we developed a data processing and analysis pipeline based on the forced aperture photometry technique. Since the stars are not resolved in ZTF, we performed photometry at the pixel level. Only a small portion of the ZTF dataset has been analyzed, and photometric light curves have been generated for a few interesting sources. In our preliminary investigations we used a machine learning algorithm to classify the resulting time series into different categories, and we performed cross-comparisons with data from other studies of the Andromeda galaxy region.
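As an illustration of the core technique (not the thesis pipeline itself, and assuming the photutils library, which the thesis may not use), the following sketch performs forced aperture photometry: flux is summed in a fixed circular aperture at a known catalogue position, the way PHAT coordinates would be applied to ZTF pixels:

```python
import numpy as np
from photutils.aperture import CircularAperture, aperture_photometry

# Synthetic 64x64 image: flat sky background plus one injected point source.
image = np.random.default_rng(1).normal(100.0, 5.0, (64, 64))
image[30, 40] += 500.0                      # fake source at (x=40, y=30)

# "Forced": the position comes from an external catalogue (e.g. PHAT),
# not from detecting the source in this image.
positions = [(40.0, 30.0)]
aperture = CircularAperture(positions, r=3.0)   # 3-pixel radius, arbitrary choice
table = aperture_photometry(image, aperture)
print(table["aperture_sum"])                # summed flux inside the aperture
```

Repeating this at the same catalogue position across many epochs yields the light curve for that source.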
267

Towards Privacy and Communication Efficiency in Distributed Representation Learning

Sheikh S Azam (12836108) 10 June 2022 (has links)
Over the past decade, distributed representation learning has emerged as a popular alternative to conventional centralized machine learning training. The increasing interest in distributed representation learning, specifically federated learning, can be attributed to its fundamental property of promoting data privacy and communication savings. While conventional ML encourages aggregating data at a central location (e.g., data centers), distributed representation learning advocates keeping data at the source and instead transmitting model parameters across the network. However, since the advent of deep learning, model sizes have become increasingly large, often comprising millions to billions of parameters, which leads to the problem of communication latency in the learning process. In this thesis, we propose to tackle the problem of communication latency in two different ways: (i) learning private representations of data to enable their sharing, and (ii) reducing the communication latency by minimizing the corresponding long-range communication requirements.

To tackle the former goal, we first study the problem of learning representations that are private yet informative, i.e., providing information about intended "ally" targets while hiding sensitive "adversary" attributes. We propose the Exclusion-Inclusion Generative Adversarial Network (EIGAN), a generalized private representation learning (PRL) architecture that accounts for multiple ally and adversary attributes, unlike existing PRL solutions. We then address the practical constraints of distributed datasets by developing Distributed EIGAN (D-EIGAN), the first distributed PRL method that learns a private representation at each node without transmitting the source data. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and the impact of dependencies among ally and adversary tasks on the optimization objective. Our experiments on various datasets demonstrate the advantages of EIGAN in terms of performance, robustness, and scalability. In particular, EIGAN outperforms the previous state of the art by a significant accuracy margin (47% improvement), and D-EIGAN's performance is consistently on par with EIGAN under different network settings.

We next tackle the latter objective, reducing the communication latency, and propose two-timescale hybrid federated learning (TT-HF), a semi-decentralized learning architecture that combines the conventional device-to-server communication paradigm for federated learning with device-to-device (D2D) communications for model training. In TT-HF, during each global aggregation interval, devices (i) perform multiple stochastic gradient descent iterations on their individual datasets, and (ii) aperiodically engage in a consensus procedure over their model parameters through cooperative, distributed D2D communications within local clusters. With a new general definition of gradient diversity, we formally study the convergence behavior of TT-HF, resulting in new convergence bounds for distributed ML. We leverage our convergence bounds to develop an adaptive control algorithm that tunes the step size, D2D communication rounds, and global aggregation period of TT-HF over time to target a sublinear convergence rate of O(1/t) while minimizing network resource utilization. Our subsequent experiments demonstrate that TT-HF significantly outperforms the current state of the art in federated learning in terms of model accuracy and/or network energy consumption in different scenarios where local device datasets exhibit statistical heterogeneity. Finally, our numerical evaluations demonstrate robustness against outages caused by fading channels, as well as favorable performance with non-convex loss functions.
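A schematic numpy sketch of the TT-HF idea under toy quadratic losses (not the authors' implementation; cluster sizes, learning rate, and timescales are arbitrary assumptions): devices take local SGD steps, aperiodically average within D2D clusters, and the server aggregates globally at a longer timescale:

```python
import numpy as np

rng = np.random.default_rng(0)
clusters = [[0, 1, 2], [3, 4]]            # D2D-connected device groups (assumed)
targets = rng.normal(size=(5, 3))         # per-device optima: statistical heterogeneity
w = np.zeros((5, 3))                      # per-device model parameters
lr, local_steps, d2d_every, global_every = 0.1, 4, 2, 8

for t in range(1, 33):
    # (i) local stochastic gradient steps on f_i(w) = ||w - target_i||^2 / 2
    for _ in range(local_steps):
        w -= lr * (w - targets + 0.1 * rng.normal(size=w.shape))
    # (ii) aperiodic consensus within each D2D cluster (short timescale)
    if t % d2d_every == 0:
        for c in clusters:
            w[c] = w[c].mean(axis=0)
    # device-to-server global aggregation (long timescale)
    if t % global_every == 0:
        w[:] = w.mean(axis=0)

print("global model:", w[0])
```

The two timescales show up as `d2d_every` versus `global_every`: cheap local D2D averaging happens often, while expensive device-to-server aggregation happens rarely, which is where the communication savings come from.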
268

Data Science Approaches on Brain Connectivity: Communication Dynamics and Fingerprint Gradients

Uttara Vinay Tipnis (10514360) 07 May 2021 (has links)
The innovations in Magnetic Resonance Imaging (MRI) in recent decades have given rise to large open-source datasets. MRI affords researchers the ability to look at both the structure and function of the human brain. This dissertation makes use of one of these large open-source datasets, the Human Connectome Project (HCP), to study structural and functional connectivity in the brain.

Communication processes within the human brain at different cognitive states are neither well understood nor completely characterized. We assess communication processes in the human connectome using an ant colony-inspired cooperative learning algorithm, starting from a source with no a priori information about the network topology and cooperatively searching for the target through a pheromone-inspired model. This framework relies on two parameters, pheromone and edge perception, to define the cognizance and subsequent behaviour of the ants on the network and the communication processes happening between source and target. Simulations with different configurations allow the identification of path ensembles that are involved in the communication between node pairs. In order to assess the different communication regimes displayed in the simulations and their associations with functional connectivity, we introduce two network measurements, effective path length and arrival rate. These measurements are tested as individual and combined descriptors of functional connectivity during different tasks. Finally, different communication regimes are found in different specialized functional networks. This framework may be used as a test bed for different communication regimes on top of an underlying topology.

The assessment of brain fingerprints has emerged in recent years as an important tool to study individual differences. Studies so far have mainly focused on connectivity fingerprints between different brain scans of the same individual. We extend the concept of brain connectivity fingerprints beyond test/retest and assess fingerprint gradients in young adults by developing an extension of the differential identifiability framework. To do so, we look at the similarity not only between the multiple scans of an individual (subject fingerprint), but also between the scans of monozygotic and dizygotic twins (twin fingerprint). We have carried out this analysis on the 8 fMRI conditions present in the Human Connectome Project - Young Adult (HCP-YA) dataset, which we processed into functional connectomes (FCs) and time series parcellated according to the Schaefer atlas scheme, which has multiple levels of resolution. Our differential identifiability results show that fingerprint gradients based on genetic and environmental similarities are indeed present when comparing FCs for all parcellations and fMRI conditions. Importantly, only when assessing optimally reconstructed FCs do we fully uncover the fingerprints present in higher-resolution atlases. We also study the effect of scanning length on the subject fingerprint of resting-state FCs, to analyze the interplay of scanning length and parcellation. In the pursuit of open science, we have made the processed and parcellated FCs and time series for all conditions for ~1200 subjects of the HCP-YA dataset available to the scientific community. Lastly, we estimated the effect of genetics and environment on the original and optimally reconstructed FCs with an ACE model.
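As a hedged sketch of the differential identifiability idea underlying the fingerprint analysis (synthetic data, not HCP-YA; the dissertation's actual computation may differ): entry (i, j) of the identifiability matrix correlates subject i's test FC with subject j's retest FC, and Idiff contrasts the diagonal (self-similarity) with the off-diagonal (others-similarity):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_edges = 20, 500
base = rng.normal(size=(n_subjects, n_edges))        # each subject's "true" FC edges
test = base + 0.5 * rng.normal(size=base.shape)      # scan 1: noisy copy
retest = base + 0.5 * rng.normal(size=base.shape)    # scan 2: another noisy copy

# Identifiability matrix: entry (i, j) = corr(test FC of i, retest FC of j).
ident = np.corrcoef(test, retest)[:n_subjects, n_subjects:]
i_self = np.mean(np.diag(ident))                              # same subject
i_others = np.mean(ident[~np.eye(n_subjects, dtype=bool)])    # different subjects
print("Idiff =", 100 * (i_self - i_others))
```

A twin fingerprint would follow the same recipe with the cross-correlation taken between scans of twin pairs rather than between scans of the same subject.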
269

Les mises en forme algorithmiques, ruptures et continuités dans la quantification du social / Algorithmic shaping: ruptures and continuities in the quantification of the social

Lareau, Justine 08 1900
This master's thesis focuses on data mining and machine learning algorithms, which constitute a field more recently called "data science". To shed light on the scope and specificity of the issues their use raises in our societies, it questions the relationship these algorithms maintain with the foundations and limits of the more traditional tools of social/mathematical statistics, well documented in sociology, in particular the "language of variables" and the experimental reasoning "all other things being equal" (ceteris paribus). Placing the approach at the intersection of the sociology of knowledge and the sociology of quantification, the conceptual framework draws on the comparative epistemology of Gilles-Gaston Granger, the historical meta-epistemology of Ian Hacking and Alain Desrosières's sociohistory of social statistics. Through the idea of the algorithmic shaping of social life ("mises en forme algorithmique de la vie sociale"), computational algorithms are considered as modes of investigation, partially or completely automated, that carry out plural and differentiated shapings and orderings of the social and its properties.
Using data from Statistics Canada to support more concretely the forms of knowledge produced and the possibilities and constraints of experience that they delimit, this case study examines the divide between "classical" and more "contemporary" methods of analysis within the framework of supervised learning. To do so, three techniques (or families of algorithms) are compared from the angle of their analytical operations: 1) logistic regression, 2) decision trees and 3) random forests. The objective of this theoretical and empirical sociological analysis is to examine how these approaches operate certain modes of classification and favour or disadvantage particular representations of the world and of the individual. More generally, the work opens up avenues of reflection on the compatibility and incompatibility of statistical and probabilistic styles of reasoning with certain states of the development of sociology.
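As an illustration of the three families compared (on a synthetic dataset, since the Statistics Canada data are not reproduced here), a short scikit-learn sketch cross-validates logistic regression, a decision tree, and a random forest on the same classification task:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a social-statistics classification problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()  # 5-fold accuracy
    print(f"{name}: {score:.3f}")
```

The contrast the thesis draws is visible even in this toy form: the regression exposes coefficients in the "language of variables", while the tree ensembles classify through recursive partitions that resist that reading.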
270

Habilidades del equipo de ciencia de datos en la empresa moderna / Data science team skills in the modern company

Loayza Castañeda, Flor de María Micaela, Rubio Rodríguez, Estefanía Yuvel 28 November 2021
This research presents the positions that various authors put forward on the skills of the data science team in the modern company. It is therefore important to know the background of data science, its relevant characteristics, the soft and technical skills of the data scientist, and the scientific productivity it brings to organizations and society, in order to understand how this science contributes to improving relationships at the business and social level. The work is divided into six parts. First, the research method used to collect the information is presented in detail. The second section addresses the points related to data science. The third section describes the traits that make up the so-called data scientist: their roles and functions. Fourth, the modern company is presented as the main recipient of the so-called Fourth Industrial Revolution. The fifth section shows the analysis and results of this research. Finally, the sixth section presents the conclusions reached after individual and group reflection and debate. / Trabajo de Suficiencia Profesional
