Global ETD Search

1	Do new forms of scholarly communication provide a pathway to open science? Zhu, Yimei January 2015 (has links) This thesis explores new forms of scholarly communication and the practice of open science among UK based academics. Open science broadly refers to practices that allow cost-free open access to academic research. Three aspects of open science are examined in this study: open access to research articles; open access to research data; and publishing ongoing research updates using social media. The study employs a mixed-methods approach, combining a series of scoping studies using qualitative methods followed up by an Internet survey of 1,829 UK academics. Overall this thesis has shown that whilst there is support for open science, the use of open science by academics was limited. Many academics were not aware of RCUK's open access policy and had limited experience of making their research articles freely accessible online. Most academics did not share their primary research data online. Although some academics had used a range of social media tools to communicate their research, the majority had not used social media in their research work. Overall, male, older and senior academics were more likely to use open access publishing and share primary research data, but were less likely to use social media for research. Academics based in Medical and Natural Sciences were more likely to use open access publishing and share research data, but less likely to use social media for their research compared to academics from Humanities and Social Sciences. Academics who were aware of RCUK's open access policy and who recognised the citation advantages of open access were more likely to publish in open access journals. Academics that were aware of RCUK's open access policy and had used social media for research were more likely to self-archive research articles. Academics that had used secondary data collected by others and self-archived research papers were more likely to share their own primary research data. Academics seemed to be strongly influenced by their colleagues' recommendation for the adoption of social media in research. Those who considered that the general public should know about their research findings were more likely to share their research on social media. A group of academics were identified and described as super users who frequently communicated ongoing research on social media. These super users were more likely to use tablet computers and have received social media training organised by their institutions. It is clear that open science is going to be a major factor in future academic work and in relation to building an academic career. Many academics have recognised the importance of open science. However to date the use of the tools for open science has been limited. With the right guidance and reinforcement of relevant policies, the new forms of scholarly communication can provide a pathway to open science which would serve to benefit individual academics, research communities and the public good. 301
2	Qualitative Thematic Analysis of Social Media Data to Assess Perceptions of Route of Administration for Antiretroviral Treatment Among People Living With HIV Matza, Louis S., Paulus, Trena, Garris, Cindy P., Van de Velde, Nicolas, Chounta, Vasiliki, Deger, Kristen A. 30 April 2020 (has links) (PDF) Background: HIV is a condition that requires lifelong treatment. Treatment options currently consist of oral antiretroviral therapies (ART) taken once or twice daily. Long-acting injectable HIV treatments are currently in development to be administered monthly or every other month. Preferences for route of administration could influence treatment adherence, which could affect treatment outcomes. The purpose of this study was to examine patient perceptions of oral and injectable routes of administration for ART. Methods: Qualitative thematic analysis was conducted to examine 5122 online discussion threads by people living with HIV (PLHIV) in the POZ Community Forums from January 2013 to June 2018. Analysis focused on identifying perceptions of oral or injectable routes of administration for ART. Relevant threads were extracted and imported into the qualitative data analysis software package ATLAS.ti.8 so that text could be reviewed and coded. Results: Analyses identified 684 relevant discussion threads including 2626 coded quotations from online posts by 568 PLHIV. The oral route of administration was discussed more frequently than injectable (2516 quotations for oral; 110 injectable). Positive statements on the oral route of administration commonly mentioned the small number of pills (276 quotations), dose frequency (245), ease of scheduling (153), and ease of use (146). PLHIV also noted disadvantages of the oral route of administration including negative emotional impact (166), difficulty with medication access (106), scheduling (131), and treatment adherence (121). Among the smaller number of PLHIV discussing injectable ART, common positive comments focused on dose frequency (34), emotional benefits of not taking a daily pill (7), potential benefits for adherence (6), overall convenience (6), and benefits for traveling (6). Some comments from PLHIV perceived the frequency of injections negatively (10), and others had negative perceptions of needles (8) or appointments required to receive injections (7). Conclusions: Qualitative analysis revealed that route of administration was frequently discussed among PLHIV on this online forum. While many expressed positive views about their daily oral medication regimen, others perceived inconveniences and challenges. Among PLHIV who were aware of a possible monthly injectable treatment, many viewed this new route of administration as a convenient alternative with potential to improve adherence. qualitative thematic analysis social media data antiretroviral treatment HIV Family Medicine
3	Preserving user privacy in social media data processing Löchner, Marc 21 November 2023 (has links) Social media data is used for analytics, e.g., in science, authorities or the industry. Privacy is often considered a secondary problem. However, protecting the privacy of social media users is demanded by laws and ethics. In order to prevent subsequent abuse, theft or public exposure of collected datasets, privacy-aware data processing is crucial. This dissertation presents a concept to process social media data with social media user’s privacy in mind. It features a data storage concept based on the cardinality estimator HyperLogLog to store social media data, so that it is not possible to extract individual items from it, but only to estimate the cardinality of items within a certain set, plus running set operations over multiple sets to extend analytical ranges. Applying this method requires to define the scope of the result before even gathering the data. This prevents the data from being misused for other purposes at a later point in time and thus follows the privacy by design principles. This work further shows methods to increase privacy through the implementation of abstraction layers. An included case study demonstrates the presented methods to be suitable for application in the field.:1 Introduction 1.1 Problem 1.2 Research objectives 1.3 Document structure 2 Related work 2.1 The notion of privacy 2.2 Privacy by design 2.3 Differential privacy 2.4 Geoprivacy 2.5 Probabilistic Data Structures 3 Concept and methods 3.1 Collateral data 3.2 Disposable data 3.3 Cardinality estimation 3.4 Data precision 3.5 Extendability 3.6 Abstraction 3.7 Time consideration 4 Summary of publications 4.1 HyperLogLog Introduction 4.2 VOST Case Study 4.3 Real-time Streaming 4.4 Abstraction Layers 4.5 VGIscience Book Chapter 4.6 Supplementary Software Materials 5 Discussion 5.1 Prevent accidental data disclosure 5.2 Feasibility in the field 5.3 Adjustability for different use cases 5.4 Limitations of HLL 5.5 Security 5.6 Outlook and further research 6 Conclusion Appendix References Publications info:eu-repo/classification/ddc/620 ddc:620
4	Integrated Predictive Modeling and Analytics for Crisis Management Alhamadani, Abdulaziz Abdulrhman 15 May 2024 (has links) The surge in the application of big data and predictive analytics in fields of crisis management, such as pandemics and epidemics, highlights the vital need for advanced research in these areas, particularly in the wake of the COVID-19 pandemic. Traditional methods, which typically rely on historical data to forecast future trends, fall short in addressing the complex and ever-changing nature of challenges like pandemics and public health crises. This inadequacy is further underscored by the pandemic's significant impact on various sectors, notably healthcare, government, and the hotel industry. Current models often overlook key factors such as static spatial elements, socioeconomic conditions, and the wealth of data available from social media, which are crucial for a comprehensive understanding and effective response to these multifaceted crises. This thesis employs spatial forecasting and predictive analytics to address crisis management in several distinct but interrelated contexts: the COVID-19 pandemic, the opioid crisis, and the impact of the pandemic on the hotel industry. The first part of the study focuses on using big data analytics to explore the relationship between socioeconomic factors and the spread of COVID-19 at the zip code level, aiming to predict high-risk areas for infection. The second part delves into the opioid crisis, utilizing semi-supervised deep learning techniques to monitor and categorize drug-related discussions on Reddit. The third part concentrates on developing spatial forecasting and providing explanations of the rising epidemic of drug overdose fatalities. The fourth part of the study extends to the realm of the hotel industry, aiming to optimize customer experience by analyzing online reviews and employing a localized Large Language Model to generate future customer trends and scenarios. Across these studies, the thesis aims to provide actionable insights and comprehensive solutions for effectively managing these major crises. For the first work, the majority of current research in pandemic modeling primarily relies on historical data to predict dynamic trends such as COVID-19. This work makes the following contributions in spatial COVID-19 pandemic forecasting: 1) the development of a unique model solely employing a wide range of socioeconomic indicators to forecast areas most susceptible to COVID-19, using detailed static spatial analysis, 2) identification of the most and least influential socioeconomic variables affecting COVID-19 transmission within communities, 3) construction of a comprehensive dataset that merges state-level COVID-19 statistics with corresponding socioeconomic attributes, organized by zip code. For the second work, we make the following contributions in detecting drug Abuse crisis via social media: 1) enhancing the Dynamic Query Expansion (DQE) algorithm to dynamically detect and extract evolving drug names in Reddit comments, utilizing a list curated from government and healthcare agencies, 2) constructing a textual Graph Convolutional Network combined with word embeddings to achieve fine-grained drug abuse classification in Reddit comments, identifying seven specific drug classes for the first time, 3) conducting extensive experiments to validate the framework, outperforming six baseline models in drug abuse classification and demonstrating effectiveness across multiple types of embeddings. The third study focuses on developing spatial forecasting and providing explanations of the escalating epidemic of drug overdose fatalities. Current research in this field has shown a deficiency in comprehensive explanations of the crisis, spatial analyses, and predictions of high-risk zones for drug overdoses. Addressing these gaps, this study contributes in several key areas: 1) Establishing a framework for spatially forecasting drug overdose fatalities predominantly affecting U.S. counties, 2) Proposing solutions for dealing with scarce and heterogeneous data sets, 3) Developing an algorithm that offers clear and actionable insights into the crisis, and 4) Conducting extensive experiments to validate the effectiveness of our proposed framework. In the fourth study, we address the profound impact of the pandemic on the hotel industry, focusing on the optimization of customer experience. Traditional methodologies in this realm have predominantly relied on survey data and limited segments of social media analytics. Those methods are informative but fall short of providing a full picture due to their inability to include diverse perspectives and broader customer feedback. Our study aims to make the following contributions: 1) the development of an integrated platform that distinguishes and extracts positive and negative Memorable Experiences (MEs) from online customer reviews within the hotel industry, 2) The incorporation of an advanced analytical module that performs temporal trend analysis of MEs, utilizing sophisticated data mining algorithms to dissect customer feedback on a monthly and yearly scale, 3) the implementation of an advanced tool that generates prospective and unexplored Memorable Experiences (MEs) by utilizing a localized Large Language Model (LLM) with keywords extracted from authentic customer experiences to aid hotel management in preparing for future customer trends and scenarios. Building on the integrated predictive modeling approaches developed in the earlier parts of this dissertation, this final section explores the significant impacts of the COVID-19 pandemic on the airline industry. The pandemic has precipitated substantial financial losses and operational disruptions, necessitating innovative crisis management strategies within this sector. This study introduces a novel analytical framework, EAGLE (Enhancing Airline Groundtruth Labels and Review rating prediction), which utilizes Large Language Models (LLMs) to improve the accuracy and objectivity of customer sentiment analysis in strategic airline route planning. EAGLE leverages LLMs for zero-shot pseudo-labeling and zero-shot text classification, to enhance the processing of customer reviews without the biases of manual labeling. This approach streamlines data analysis, and refines decision-making processes which allows airlines to align route expansions with nuanced customer preferences and sentiments effectively. The comprehensive application of LLMs in this context underscores the potential of predictive analytics to transform traditional crisis management strategies by providing deeper, more actionable insights. / Doctor of Philosophy / In today's digital age, where vast amounts of data are generated every second, understanding and managing crises like pandemics or economic disruptions has become increasingly crucial. This dissertation explores the use of advanced predictive modeling and analytics to manage various crises, significantly enhancing how predictions and responses to these challenges are developed. The first part of the research uses data analysis to identify areas at higher risk during the COVID-19 pandemic, focusing on how different socioeconomic factors can affect virus spread at a local level. This approach moves beyond traditional methods that rely on past data, providing a more dynamic way to forecast and manage public health crises. The study then examines the opioid crisis by analyzing social media platforms like Reddit. Here, a method was developed to automatically detect and categorize discussions about drug abuse. This technique aids in understanding how drug-related conversations evolve online, providing insights that could guide public health responses and policy-making. In the hospitality sector, customer reviews were analyzed to improve service quality in hotels. By using advanced data analysis tools, key trends in customer experiences were identified, which can help businesses adapt and refine their services in real-time, enhancing guest satisfaction. Finally, the study extends to the airline industry, where a model was developed that uses customer feedback to improve airline services and route planning. This part of the research shows how sophisticated analytics can help airlines better understand and meet traveler needs, especially during disruptions like the pandemic. Overall, the dissertation provides methods to better manage crises and illustrates the vast potential of predictive analytics in making informed decisions that can significantly mitigate the impacts of future crises. This research is vital for anyone—from government officials to business leaders—looking to harness the power of data for crisis management and decision-making. Spatial forecasting Big data analytics social-media data mining pandemic forecasting crisis mitigation
5	A new infrastructure demand model for urban business and leisure hubs : a case study of Taichung Ho, Hsin-Tzu January 2016 (has links) Over the last few decades there has been a gradual transformation in both the spatial and temporal patterns of urban activities. The percentage share of non-discretionary travel such as morning rush-hour commuting has been declining with the increased income level. Discretionary activities appear to rise prominently in urban business and leisure hubs, attracting large volumes of crowds which in turn imply new and changed demand for building floorspace and urban infrastructure. Despite impressive advances in the theories and models of infrastructure demand forecasting, there appear to be an apparent research gap in addressing the practical needs of infrastructure planning in and around those growing urban activity hubs. First, land use and transport interaction models which have to date been the mainstay of practical policy analytics tend to focus on non-discretionary activities such as rush-hour commuting. Secondly, the emerging activity based models, while providing significant new insights into personal, familial activities, especially the discretionary travel, are so data hungry and computing intensive that they have not yet found their roles in practical policy applications. This dissertation builds on the insights from above schools of modelling to develop a new approach that addresses the infrastructure planning needs of the growing urban hubs while keeping the data and computing realistic in medium to high income cities. The new model is designed based on an overarching hypothesis that considerable efficiency and welfare gains can be achieved in the planning and development of urban business and leisure hubs if the infrastructure provisions for discretionary and non-discretionary activities can be coordinated. This is a research theme that has been little explored in current literature. The new infrastructure demand forecasting model has been designed with regard to the above hypothesis and realistic data availability, including those emerging online. The model extends the framework of land use transport interaction models and aim to provide a practical modelling tool. Land use changes are accounted for when testing new infrastructure investment initiatives and especially the road and public transport loads are assessed throughout all time periods of a working day. The new contribution to the modelling methodology includes the extension to the land use transport interaction framework, the use of social media data for estimating night market activity distribution and a rapid estimation of road traffic speeds from Google directions API, and model validation. Another new contribution is the understanding of the nature and magnitude of future infrastructure demand through assessing three alternative land use scenarios: (1) business as usual, (2) inner city regeneration for a major business hub around the night market, and (3) dispersed suburban growth with distant subcentres. The model is able to assess the implications for future infrastructure demand and user welfare through discerning the distinct discretionary and non-discretionary activity patterns.
6	Zipf's Law for Natural Cities Extracted from Location-Based Social Media Data Wu, Sirui January 2015 (has links) Zipf’s law is one of the empirical statistical regularities found within many natural systems, ranging from protein sequences of immune receptors in cells to the intensity of solar flares from the sun. Verifying the universality of Zipf’s law can provide many opportunities for us to further seek the commonalities of phenomena that possess the power law behavior. Since power law-like phenomena, as many studies have previously indicated, is often interpreted as evidence for studying complex systems, exploring the universality of Zipf’s law is also of potential capability in explaining underlying generative mechanisms and endogenous processes, i.e. self-organization and chaos theory. The main purpose of this study was to verify whether Zipf’s law is valid for city sizes, city numbers and population extracted from natural cities. Unlike traditional city boundaries extracted by applying census-imposed and top-down imposed data, which are arbitrary and subjective, the study established the new kind of boundaries of cities, namely, natural cities through using four location-based social media data from Twitter, Brightkite, Gowalla and Freebase and head/tail breaks rule. In order to capture and quantify the hierarchical level for studying heterogeneous scales of cities, ht-index derived from head/tail breaks rule was employed. Furthermore, the validation of Zipf’s law was examined. The result revealed that the natural cities had deviations in subtle patterns when different social media data were examined. By employing head/tail breaks method, the result calculated the ht-index and detected that hierarchy levels were not largely influenced by spatial-temporal changes but rather data itself. On the other hand, the study found that Zipf’s law is not universal in the case of using location-based social media data. Compared to city numbers extracted from nightlight imagery, the study found out the reason why Zipf’s law does not hold for location-based social media data, i.e. due to bias of customer behavior. The bias mainly resulted in the emergence of natural cities were much more frequent than others in certain regions and countries so that making the emergence of natural cities was not exhibited objectively. Furthermore, the study showed whether Zipf’s law could be well observed depends not only on the data itself and man-made limitations but also on calculation methods, data precisions and scales and the idealized status of observed data. big data location-based social media data Zipf's law power law natural cities ht-index Earth and Related Environmental Sciences Geovetenskap och miljövetenskap
7	社群媒體新詞偵測系統以PTT八卦版為例 / Chinese new words detection from social media 王力弘, Wang, Li Hung Unknown Date (has links) 近年來網路社群非常活躍,非常多的網民都以社群媒體來分享與討論時事。不傴於此,網路上的群聚力量已經漸漸從虛擬走向現實,社群媒體的傳播力已經可以與大眾傳媒比擬。像台大 PTT 的八卦版就是一個這樣具指標性的社群媒體,許多新聞或是事件都從此版開始討論,然後擴散至主流媒體。透過觀察, 網路鄉民常常會以略帶灰諧的方式,發明新的詞彙去討論時事與人物,例如: 割闌尾、祭止兀、婉君、貫老闆...等。這些新詞的出現,很可能代表一個新的熱門話題的正在醞釀中。但若以傳統的關鍵詞搜索,未必能找到這些含有此類新詞的討論文章。因此,本研究提出一個基於「滑動視窗(Sliding window)」的技巧來輔助中文斷詞,以利找出這些新詞,並進而透過這些新詞對來探詢社群媒體中的新興話題。我們以此技巧修改知名的Jieba 斷詞工具,加上新詞偵測的機制,並以 PTT的八卦版為監測對象,經過長期的的監測後,結果顯示我們的系統可以正確的找出絕大多數的新詞。此外,經過與主流媒體交叉比對,本系統發現的新詞與新話題的確有極高的相關性。 / Internet new residents like to share society current event on the social media website and the influence is propagate to the reality now. For example: On Gossip(八卦版) discussion board of 台大 PTT BBS that had many post are turn into the TV News every day. After some survey we found people like to crate new words to explain society topics, This paper attempt to build up a system to detect the new words from social media. But detect the Chinese new words from unknown words is a thorny problem, on this paper we invent a way – 『Sliding Window』 to elevate the new words detection from Jieba in Chinese words Segmentation, After testing we got 96.94% correct rate and cross valid the detection result by ours system with News and Google Trending we proved the new words detection is a reasonable way to discover new topic. 中文斷詞新詞偵測社群媒體分析 Chinese Words Segmentation New Words Detection Social Media Data Analysis
8	Extracting meaningful statistics for the characterization and classification of biological, medical, and financial data Woods, Tonya M. 21 September 2015 (has links) This thesis is focused on extracting meaningful statistics for the characterization and classification of biological, medical, and financial data and contains four chapters. The first chapter contains theoretical background on scaling and wavelets, which supports the work in chapters two and three. In the second chapter, we outline a methodology for representing sequences of DNA nucleotides as numeric matrices in order to analytically investigate important structural characteristics of DNA. This methodology involves assigning unit vectors to nucleotides, placing the vectors into columns of a matrix, and accumulating across the rows of this matrix. Transcribing the DNA in this way allows us to compute the 2-D wavelet transformation and assess regularity characteristics of the sequence via the slope of the wavelet spectra. In addition to computing a global slope measure for a sequence, we can apply our methodology for overlapping sections of nucleotides to obtain an evolutionary slope. In the third chapter, we describe various ways wavelet-based scaling may be used for cancer diagnostics. There were nearly half of a million new cases of ovarian, breast, and lung cancer in the United States last year. Breast and lung cancer have highest prevalence, while ovarian cancer has the lowest survival rate of the three. Early detection is critical for all of these diseases, but substantial obstacles to early detection exist in each case. In this work, we use wavelet-based scaling on metabolic data and radiography images in order to produce meaningful features to be used in classifying cases and controls. Computer-aided detection (CAD) algorithms for detecting lung and breast cancer often focus on select features in an image and make a priori assumptions about the nature of a nodule or a mass. In contrast, our approach to analyzing breast and lung images captures information contained in the background tissue of images as well as information about specific features and makes no such a priori assumptions. In the fourth chapter, we investigate the value of social media data in building commercial default and activity credit models. We use random forest modeling, which has been shown in many instances to achieve better predictive accuracy than logistic regression in modeling credit data. This result is of interest, as some entities are beginning to build credit scores based on this type of publicly available online data alone. Our work has shown that the addition of social media data does not provide any improvement in model accuracy over the bureau only models. However, the social media data on its own does have some limited predictive power. Wavelets Scaling Regularity Classification SVM GC content Exons Introns Ovarian cancer Breast cancer Mammography Lung cancer Lung CXR Credit risk Response model Random forest Social media data Online review data
9	Diffusion de l’information dans les médias sociaux : modélisation et analyse / Information diffusion in social media : modeling and analysis Guille, Adrien 25 November 2014 (has links) Les médias sociaux ont largement modifié la manière dont nous produisons, diffusons et consommons l'information et sont de fait devenus des vecteurs d'information importants. L’objectif de cette thèse est d’aider à la compréhension du phénomène de diffusion de l’information dans les médias sociaux, en fournissant des moyens d’analyse et de modélisation.Premièrement, nous proposons MABED, une méthode statistique pour détecter automatiquement les évènements importants qui suscitent l'intérêt des utilisateurs des médias sociaux à partir du flux de messages qu'ils publient, dont l'originalité est d'exploiter la fréquence des interactions sociales entre utilisateurs, en plus du contenu textuel des messages. Cette méthode diffère par ailleurs de celles existantes en ce qu'elle estime dynamiquement la durée de chaque évènement, plutôt que de supposer une durée commune et fixée à l'avance pour tous les évènements. Deuxièmement, nous proposons T-BASIC, un modèle probabiliste basé sur la structure de réseau sous-jacente aux médias sociaux pour prédire la diffusion de l'information, plus précisément l'évolution du volume d'utilisateurs relayant une information donnée au fil du temps. Contrairement aux modèles similaires également basés sur la structure du réseau, la probabilité qu'une information donnée se diffuse entre deux utilisateurs n'est pas constante mais dépendante du temps. Nous décrivons aussi une procédure pour l'inférence des paramètres latents du modèle, dont l'originalité est de formuler les paramètres comme des fonctions de caractéristiques observables des utilisateurs. Troisièmement, nous proposons SONDY, un logiciel libre et extensible implémentant des méthodes tirées de la littérature pour la fouille et l'analyse des données issues des médias sociaux. Le logiciel manipule deux types de données : les messages publiés par les utilisateurs, et la structure du réseau social interconnectant ces derniers. Contrairement aux logiciels académiques existants qui se concentrent soit sur l'analyse des messages, soit sur l'analyse du réseau, SONDY permet d'analyser ces deux types de données conjointement en permettant l'analyse de l'influence par rapport aux évènements détectés. Les expérimentations menées à l'aide de divers jeux de données collectés sur le média social Twitter démontrent la pertinence de nos propositions et mettent en lumière des propriétés qui nous aident à mieux comprendre les mécanismes régissant la diffusion de l'information. Premièrement, en comparant les performances de MABED avec celles de méthodes récentes tirées de la littérature, nous montrons que la prise en compte des interactions sociales entre utilisateurs conduit à une détection plus précise des évènements importants, avec une robustesse accrue en présence de contenu bruité. Nous montrons également que MABED facilite l'interprétation des évènements détectés en fournissant des descriptions claires et précises, tant sur le plan sémantique que temporel. Deuxièmement, nous montrons la validité de la procédure proposée pour estimer les probabilités de diffusion sur lesquelles repose le modèle T-BASIC, en illustrant le pouvoir prédictif des caractéristiques des utilisateurs sélectionnées et en comparant les performances de la méthode d'estimation proposée avec celles de méthodes tirées de la littérature. Nous montrons aussi l'intérêt d'avoir des probabilités non constantes, ce qui permet de prendre en compte dans T-BASIC la fluctuation du niveau de réceptivité des utilisateurs des médias sociaux au fil du temps. Enfin, nous montrons comment, et dans quelle mesure, les caractéristiques sociales, thématiques et temporelles des utilisateurs affectent la diffusion de l'information. Troisièmement, nous illustrons à l'aide de divers scénarios l'utilité du logiciel SONDY, autant pour des non-experts, grâce à son interface utilisateur avancée et des visualisations adaptées, que pour des chercheurs du domaine, grâce à son interface de programmation. / Social media have greatly modified the way we produce, diffuse and consume information, and have become powerful information vectors. The goal of this thesis is to help in the understanding of the information diffusion phenomenon in social media by providing means of modeling and analysis.First, we propose MABED (Mention-Anomaly-Based Event Detection), a statistical method for automatically detecting events that most interest social media users from the stream of messages they publish. In contrast with existing methods, it doesn't only focus on the textual content of messages but also leverages the frequency of social interactions that occur between users. MABED also differs from the literature in that it dynamically estimates the period of time during which each event is discussed rather than assuming a predefined fixed duration for all events. Secondly, we propose T-BASIC (Time-Based ASynchronous Independent Cascades), a probabilistic model based on the network structure underlying social media for predicting information diffusion, more specifically the evolution of the number of users that relay a given piece of information through time. In contrast with similar models that are also based on the network structure, the probability that a piece of information propagate from one user to another isn't fixed but depends on time. We also describe a procedure for inferring the latent parameters of that model, which we formulate as functions of observable characteristics of social media users. Thirdly, we propose SONDY (SOcial Network DYnamics), a free and extensible software that implements state-of-the-art methods for mining data generated by social media, i.e. the messages published by users and the structure of the social network that interconnects them. As opposed to existing academic tools that either focus on analyzing messages or analyzing the network, SONDY permits the joint analysis of these two types of data through the analysis of influence with respect to each detected event.The experiments, conducted on data collected on Twitter, demonstrate the relevance of our proposals and shed light on some properties that give us a better understanding of the mechanisms underlying information diffusion. First, we compare the performance of MABED against those of methods from the literature and find that taking into account the frequency of social interactions between users leads to more accurate event detection and improved robustness in presence of noisy content. We also show that MABED helps with the interpretation of detected events by providing clearer textual description and more precise temporal descriptions. Secondly, we demonstrate the relevancy of the procedure we propose for estimating the pairwise diffusion probabilities on which T-BASIC relies. For that, we illustrate the predictive power of users' characteristics, and compare the performance of the method we propose to estimate the diffusion probabilities against those of state-of-the-art methods. We show the importance of having non-constant diffusion probabilities, which allows incorporating the variation of users' level of receptivity through time into T-BASIC. We also study how -- and in which proportion -- the social, topical and temporal characteristics of users impact information diffusion. Thirdly, we illustrate with various scenarios the usefulness of SONDY, both for non-experts -- thanks to its advanced user interface and adapted visualizations -- and for researchers -- thanks to its application programming interface. Détection et suivi d’évènements Social media data mining Event detection and tracking Scientific software development
10	The Evolution of Big Data and Its Business Applications Halwani, Marwah Ahmed 05 1900 (has links) The arrival of the Big Data era has become a major topic of discussion in many sectors because of the premises of big data utilizations and its impact on decision-making. It is an interdisciplinary issue that has captured the attention of scholars and created new research opportunities in information science, business, heath care, and many others fields. The problem is the Big Data is not well defined, so that there exists confusion in IT what jobs and skill sets are required in big data area. The problem stems from the newness of the Big Data profession. Because many aspects of the area are unknown, organizations do not yet possess the IT, human, and business resources necessary to cope with and benefit from big data. These organizations include health care, enterprise, logistics, universities, weather forecasting, oil companies, e-business, recruiting agencies etc., and are challenged to deal with high volume, high variety, and high velocity big data to facilitate better decision- making. This research proposes a new way to look at Big Data and Big Data analysis. It helps and meets the theoretical and methodological foundations of Big Data and addresses an increasing demand for more powerful Big Data analysis from the academic researches prospective. Essay 1 provides a strategic overview of the untapped potential of social media Big Data in the business world and describes its challenges and opportunities for aspiring business organizations. It also aims to offer fresh recommendations on how companies can exploit social media data analysis to make better business decisions—decisions that embrace the relevant social qualities of its customers and their related ecosystem. The goal of this research is to provide insights for businesses to make better, more informed decisions based on effective social media data analysis. Essay 2 provides a better understanding of the influence of social media during the 2016 American presidential election and develops a model to examine individuals' attitudes toward participating in social media (SM) discussions that might influence their decision in choosing between the two presidential election candidates, Donald Trump and Hilary Clinton. The goal of this research is to provide a theoretical foundation that supports the influence of social media on individual's decisions. Essay 3 defines the major job descriptions for careers in the new Big Data profession. It to describe the Big Data professional profile as reflected by the demand side, and explains the differences and commonalities between company-posted job requirements for data analytics, business analytics, and data scientists jobs. The main aim for this work is to clarify of the skill requirements for Big Data professionals for the joint benefit of the job market where they will be employed and of academia, where such professionals will be prepared in data science programs, to aid in the entire process of preparing and recruiting for Big Data positions. Big Data analysis Social Media Data Social Media Influence Decisions Making Big Data Professions Big data. Decision support systems. Information technology -- Management. Social media. Presidents -- Election -- 2016. Big data -- Vocational guidance.

Search results