131

Applications of Data Science to Healthcare Issues in Aging Population / 高齢化社会が抱える健康課題に対するデータ科学の応用

Ohki, Yu 25 March 2024 (has links)
Degree program: Shishu-kan (Graduate School of Advanced Integrated Studies in Human Survivability), Kyoto University / Kyoto University / New system, doctoral program / Doctor of Philosophy (Integrated Studies) / 甲第25457号 / 総総博第33号 / Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University / (Chief examiner) Associate Professor Kenji Mizumoto; Professor Kei Saito; Professor Yuichi Imanaka / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy / Kyoto University / DFAM
132

High-variance multivariate time series forecasting using machine learning

Katardjiev, Nikola January 2018 (has links)
There are several tools and models in machine learning that can be used to forecast a time series; however, it is not always clear which model to select, as different models suit different types of data, and domain-specific transformations and considerations are usually required. This research examines the issue by modeling four machine- and deep-learning algorithms - a support vector machine, a random forest, a feed-forward neural network, and an LSTM neural network - on a high-variance, multivariate time series to forecast trend changes one time step into the future, accounting for lag. The models were trained on clinical-trial data of patients in an alcohol addiction treatment plan provided by an Uppsala-based company. The results showed moderate performance differences, with a concern that the models were performing a random walk or naive forecast. Further analysis showed that at least one model, the feed-forward neural network, was not doing so and was able to make meaningful forecasts one time step into the future. In addition, the research examined the effect of optimization processes by comparing a grid search, a random search, and a Bayesian optimization process. In all cases, the grid search found the lowest minima, though its slow runtimes were consistently beaten by Bayesian optimization, which performed only slightly worse than the grid search.
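As a rough illustration of the optimizer comparison described above, the sketch below pits scikit-learn's grid search against random search on a small feed-forward network regressor; the data, search space, and settings are placeholders rather than the thesis's setup, and a Bayesian alternative (e.g. scikit-optimize's BayesSearchCV) could be swapped in for the third arm of the comparison.

    # Hypothetical search space and synthetic data; not the clinical-trial setup.
    from sklearn.datasets import make_regression
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
    space = {"hidden_layer_sizes": [(16,), (32,), (64,)],
             "alpha": [1e-4, 1e-3, 1e-2]}

    # Grid search evaluates all 9 combinations: slow but finds the grid optimum.
    grid = GridSearchCV(MLPRegressor(max_iter=2000, random_state=0),
                        space, cv=3).fit(X, y)

    # Random search samples a fixed budget of 5 combinations from the same space.
    rand = RandomizedSearchCV(MLPRegressor(max_iter=2000, random_state=0),
                              space, n_iter=5, cv=3, random_state=0).fit(X, y)

    print("grid:  ", grid.best_score_, grid.best_params_)
    print("random:", rand.best_score_, rand.best_params_)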
133

Kompendium der Online-Forschung (DGOF)

Deutsche Gesellschaft für Online-Forschung e. V. (DGOF) 24 November 2021 (has links)
Here the DGOF publishes digital compendia on current topics in online research, with contributions from experts in the field.
134

Machine Learning Modeling of Polymer Coating Formulations: Benchmark of Feature Representation Schemes

Evbarunegbe, Nelson I 14 November 2023 (has links) (PDF)
Polymer coatings offer a wide range of benefits across various industries, playing a crucial role in product protection and extension of shelf life. Formulating them, however, is a non-trivial task given the multitude of variables and factors involved in the production process, making it a complex, high-dimensional problem. Machine learning (ML) has emerged as a promising tool for such problems, showing considerable potential in polymer- and chemistry-based applications, particularly those dealing with high-dimensional complexity. Our research aims to develop a physics-guided ML approach to facilitate the formulation of polymer coatings. As a first step, this project focuses on finding the machine-readable feature representation techniques most suitable for encoding formulation ingredients. Utilizing two polymer-informatics datasets - a large set of 700,000 common homopolymers, including epoxies and polyurethanes used as coating base materials, and a relatively small set of 1,000 epoxy-diluent formulations - four featurization schemes for representing polymer coating molecules were benchmarked: the molecular access system, the extended connectivity fingerprint, molecular graph-based chemical graph network, and graph convolutional network (MG-GCN) embeddings. These representations were used with ensemble models to predict molecular properties, including topological surface area and viscosity. The results show that the combination of MG-GCN and ensemble models such as the extreme gradient boosting machine and random forest achieved the best overall performance, with coefficient-of-determination (R²) values of 0.74 for topological surface area and 0.84 for viscosity, which compare favorably with existing techniques. These results lay the foundation for using ML with physical modeling to expedite the development of polymer coating formulations.
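A minimal sketch of two of the four benchmarked representations (MACCS keys and extended-connectivity fingerprints) fed to an ensemble model, assuming RDKit and scikit-learn are available; the molecules and property values are placeholders, not the thesis's datasets.

    # Placeholder molecules and property values; illustrative only.
    import numpy as np
    from rdkit import Chem
    from rdkit.Chem import MACCSkeys
    from rdkit.Chem.AllChem import GetMorganFingerprintAsBitVect
    from sklearn.ensemble import RandomForestRegressor

    smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCOC(=O)C"]
    y = np.array([0.2, 0.9, 0.4, 0.3, 0.6])    # stand-in property values

    mols = [Chem.MolFromSmiles(s) for s in smiles]
    maccs = np.array([list(MACCSkeys.GenMACCSKeys(m)) for m in mols])
    ecfp = np.array([list(GetMorganFingerprintAsBitVect(m, 2, nBits=2048))
                     for m in mols])

    # Fit one ensemble model per representation and compare in-sample fit.
    for name, X in (("MACCS", maccs), ("ECFP", ecfp)):
        model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
        print(name, round(model.score(X, y), 3))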
135

ONLINE STATISTICAL INFERENCE FOR LOW-RANK REINFORCEMENT LEARNING

Qiyu Han (18284758) 01 April 2024 (has links)
<p dir="ltr">We propose a fully online procedure to conduct statistical inference with adaptively collected data. The low-rank structure of the model parameter and the adaptivity nature of the data collection process make this task challenging: standard low-rank estimators are biased and cannot be obtained in a sequential manner while existing inference approaches in sequential decision-making algorithms fail to account for the low-rankness and are also biased. To tackle the challenges previously outlined, we first develop an online low-rank estimation process employing Stochastic Gradient Descent with noisy observations. Subsequently, to facilitate statistical inference using the online low-rank estimator, we introduced a novel online debiasing technique designed to address both sources of bias simultaneously. This method yields an unbiased estimator suitable for parameter inference. Finally, we developed an inferential framework capable of establishing an online estimator for performing inference on the optimal policy value. In theory, we establish the asymptotic normality of the proposed online debiased estimators and prove the validity of the constructed confidence intervals for both inference tasks. Our inference results are built upon a newly developed low-rank stochastic gradient descent estimator and its non-asymptotic convergence result, which is also of independent interest.</p>
136

A proposal for an integrated framework capable of aggregating IoT data with diverse data types. / Uma proposta de um framework capaz de agregar dados de IoT com diversos tipos de dados.

Faria, Maria Luisa Lopes de 30 March 2017 (has links)
The volume of information on the Internet is growing exponentially. The ability to find intelligible information among vast amounts of data is transforming the human vision of the universe and everything within it. The underlying question then becomes: which methods or techniques can be applied to transform raw data into something intelligible, active, and personal? This question is explored in this document by investigating techniques that add intelligence to systems so that they become perceptive of, and responsive to, the most recent information shared by each individual. The main objective of this thesis is thus to enhance the experience of the user (individual) by providing a broad perspective on an event, which could result in improved ideas and better decisions. To this end, three different data sources (individual data, sensor data, web data) have been investigated. This thesis includes research into techniques that process, interpret, and reduce these data. By aggregating these techniques into a platform it is possible to deliver personalised information to applications and services. The contribution of this thesis is twofold. First, it presents a novel process that shifts the focus from IoT technology to the user (or smart citizen). Second, this research shows that huge volumes of data can be reduced if the underlying sensor signal has adequate spectral properties to be filtered, and that good results can be obtained when employing a filtered sensor signal in applications. By investigating these areas it is possible to contribute to this new interconnected society by offering socially aware applications and services.
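A small scipy sketch of the data-reduction claim in the second contribution, under the assumption that the useful sensor content is low-frequency: low-pass filter, then decimate. The signal, sampling rate, and cutoff here are synthetic stand-ins, not the thesis's sensor data.

    # Synthetic 1000 Hz sensor stream whose useful content sits below ~10 Hz.
    import numpy as np
    from scipy.signal import butter, filtfilt, decimate

    fs = 1000.0
    t = np.arange(0, 10, 1 / fs)
    rng = np.random.default_rng(0)
    raw = np.sin(2 * np.pi * 2 * t) + 0.3 * rng.normal(size=t.size)

    b, a = butter(4, 10 / (fs / 2))      # 4th-order low-pass, 10 Hz cutoff
    smooth = filtfilt(b, a, raw)         # zero-phase filtering

    # Keep 1 sample in 20, staged (5 then 4) to keep the IIR filter stable.
    reduced = decimate(decimate(smooth, 5), 4)
    print(raw.size, "samples ->", reduced.size)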
137

Modelo de avaliação de conjuntos de dados científicos por meio da dimensão de veracidade dos dados. / Scientific datasets evaluation model based on the data veracity dimension.

André Filipe de Moraes Batista 06 November 2018 (has links)
Science is a social organization: independent collaboration groups work to generate knowledge as a public good. The credibility of scientific work is rooted in the evidence that supports it, which includes the applied methodology, the acquired data, and the processes for executing the experiments, analyzing the data, and interpreting the results. The flood of data under which current science operates is revolutionizing the way research is conducted, resulting in a new paradigm of data-driven science.
Under this paradigm, new activities are inserted into the scientific method to organize the generation, curation, and publication of data, benefiting the scientific community with the reuse of scientific datasets and the reproducibility of experiments. In this context, new approaches to problem solving are being presented, obtaining results previously considered out of reach and enabling the generation of new knowledge. Several portals now provide datasets resulting from scientific research. However, such portals do little to document the context in which the datasets were created, making the data harder to understand and opening space for misuse or misinterpretation. In the Big Data field, the dimension that deals with this aspect is called veracity; few studies in the literature address it, focusing instead on the volume, variety, and velocity of data. This research aimed to define an evaluation model for scientific datasets through the construction of an application profile that standardizes the description of scientific datasets. This standardization is based on the concept of the data veracity dimension, defined over the course of the research, and enables the development of metrics that form a veracity index for scientific datasets. The index seeks to reflect the level of detail of a dataset, based on its use of the description elements, in order to facilitate data reuse and the reproducibility of scientific experiments. The index has two dimensions: a dimension intrinsic to the data, which can be used as an admission criterion for datasets in data-publication portals; and a social dimension, which measures the suitability of a dataset for a given research or application area through evaluation by the scientific community. For the proposed evaluation model, a case study was developed describing a dataset from an international scientific project, the GoAmazon project, in order to validate the model among peers, demonstrating the potential of the solution to support data reuse and showing that such an index can be incorporated into scientific data portals.
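As an illustration of the intrinsic dimension of such an index, the sketch below scores a record by the weighted fraction of description elements it fills in; the element names, weights, and record are hypothetical, not the application profile defined in the thesis.

    # Hypothetical description elements and weights; illustrative only.
    REQUIRED = {"title": 2.0, "creator": 1.0, "methodology": 3.0,
                "instruments": 2.0, "license": 1.0, "provenance": 3.0}

    def intrinsic_veracity(record: dict) -> float:
        """Weighted fraction of description elements present in the record."""
        filled = sum(w for field, w in REQUIRED.items() if record.get(field))
        return filled / sum(REQUIRED.values())

    record = {"title": "GoAmazon aerosol measurements",
              "methodology": "SMPS scans at 5-minute resolution",
              "provenance": "field campaign archive"}
    print(f"intrinsic veracity index: {intrinsic_veracity(record):.2f}")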
140

Applications In Sentiment Analysis And Machine Learning For Identifying Public Health Variables Across Social Media

Clark, Eric Michael 01 January 2019 (has links)
Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. We mined data from several public Twitter endpoints to identify content relevant to healthcare providers and public health regulatory professionals. We began by compiling content related to electronic nicotine delivery systems (e-cigarettes), which had become popular alternatives to tobacco products. There was an apparent need to remove high-frequency tweeting entities, called bots, that would spam messages, advertisements, and fabricated testimonials. Algorithms were constructed using natural language processing and machine learning to sift human responses from automated accounts with high accuracy. We found that the average number of hyperlinks per tweet, the average character dissimilarity between an individual's posts, and the rate of introduction of unique words were valuable attributes for identifying automated accounts. We performed 10-fold cross-validation and measured the performance of each set of tweet features at various bin sizes, the best of which performed with 97% accuracy. These methods were used to isolate automated content related to the advertising of electronic cigarettes. A rich taxonomy of automated entities, including robots, cyborgs, and spammers, each with different measurable linguistic features, was categorized. Electronic-cigarette-related posts were classified as automated or organic, and the content was investigated with a hedonometric sentiment analysis. The overwhelming majority (≈ 80%) were automated, many of which were commercial in nature. Others used false testimonials sent directly to individuals as a personalized form of targeted marketing. Many tweets advertised nicotine vaporizer fluid (e-liquid) in various "kid-friendly" flavors, including 'Fudge Brownie', 'Hot Chocolate', and 'Circus Cotton Candy', along with every imaginable flavor of fruit - flavors long ago banned for traditional tobacco products. Others offered free trials, as well as incentives to retweet and spread the post across the poster's own network. Free prize giveaways were also hosted, with raffle tickets issued for sharing the tweet. Given the large youth presence on the platform, this was evidence that the marketing of electronic cigarettes needed considerable regulation. Twitter has since officially banned all electronic cigarette advertising on its platform. Social media can provide the healthcare industry with valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality-of-life indicators both during and after treatment. We studied several active cancer patient populations, discussing their experiences with the disease as well as survivorship. We experimented with a convolutional neural network (CNN) as well as logistic regression to classify tweets as patient-related, which led to a sample of 845 breast cancer survivor accounts studied over 16 months. We found positive sentiments regarding patient treatment, raising support, and spreading awareness; a large portion of negative sentiments concerned political legislation that could result in loss of healthcare coverage. We refer to these online public testimonies as "Invisible Patient Reported Outcomes" (iPROs), because they carry relevant indicators yet are difficult to capture by conventional means of self-reporting.
Our methods can readily be applied across disciplines to obtain insights into a particular group of public opinions. Capturing iPROs and public sentiments from online communication can help inform healthcare professionals and regulators, leading to more connected and personalized treatment regimens. Social listening can provide valuable insights into public health surveillance strategies.
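A toy sketch of the three account-level signals the abstract reports as informative (hyperlinks per tweet, unique-word introduction rate, and average character dissimilarity between posts); the tweets are fabricated stand-ins and the measures are deliberately simplified versions of the study's features.

    # Simplified versions of the three bot-detection features; toy data.
    import itertools
    import numpy as np

    def account_features(tweets):
        links = float(np.mean([t.count("http") for t in tweets]))  # links per tweet
        words = [w for t in tweets for w in t.lower().split()]
        novelty = len(set(words)) / max(len(words), 1)             # unique-word rate
        def dissim(a, b):                                          # crude character dissimilarity
            return sum(x != y for x, y in itertools.zip_longest(a, b)) / max(len(a), len(b))
        pairs = list(itertools.combinations(tweets, 2))
        avg_dissim = float(np.mean([dissim(a, b) for a, b in pairs])) if pairs else 0.0
        return [links, novelty, avg_dissim]

    bot = ["Best e-liquid deals http://x.co", "Best e-liquid deals http://x.co"]
    human = ["watching the game tonight!", "ugh, monday again. coffee first."]
    print("bot:  ", account_features(bot))    # repetitive, link-heavy, low novelty
    print("human:", account_features(human))  # varied wording, no links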
