121 |
Characterization of the promotion, adverse events, and regulation related to synthetic nicotine products on social media: a multiplatform content analysis using topic modelingShah, Neal 08 March 2024 (has links)
Objective: Social media has been implicated as a leading driver of the youth vaping epidemic in the United States. Despite the recent proliferation of synthetic nicotine products in the marketplace, there is limited understanding of the promotion, health risks, and regulatory policy associated with these products. We aim to identify and characterize posts on Instagram and Twitter related to the promotion of synthetic nicotine products, self-reporting of adverse events following synthetic nicotine product use, and also identify discussion topics related to the regulation, health policy, and education about synthetic nicotine.
Methods: We conducted a hashtag and keyword search on Instagram and Twitter, respectively, to collect posts related to synthetic nicotine products. We then analyzed this data utilizing multilanguage BERT analysis, a pre-trained supervised topic modeling algorithm, to sort the dataset into clusters grouped based on textual similarity. After this, we manually annotated the most representative posts corresponding to each topic cluster using a codebook associated with characteristics of interest and categorized clusters by their coherence to themes of promotion, adverse events, and regulation.
Results: A total of 14,651 Instagram posts and 24,081 Twitter posts were collected from the keyword search. After an intermediary data cleaning phase to remove posts which could not be recognized by topic modeling, the multilanguage BERT topic model thematically clustered 49.4% (n=6034) of Instagram posts and 46.0% (n=3200) of tweets. After manual content analysis, we detected 52.9% (n=3193) of Instagram posts and 30.4% (n=972) of tweets that were in clusters thematically related to our study aims. The most representative theme on Instagram was synthetic nicotine electronic nicotine delivery systems (ENDS) promotion, with 91.6% (n=2924) of posts belonging to that thematic cluster, followed by regulation and policy related posts (8.4%, n=268). In comparison, the most representative theme on Twitter related to self-reporting of adverse events (50.2%, n=488), followed by promotion (39.8%, n=387), and regulation and policy (10.0%, n=97). Manual annotation of the most representative posts within these clusters showed that Instagram clusters were more coherent with their respective themes than Twitter clusters. A qualitative sub-analysis found that most of the synthetic nicotine ENDS promotion activity was more specifically related to the selling of synthetic nicotine ENDS products by different vendors.
Conclusion: Despite platform prohibitions against the marketing and sale of various tobacco products, there remains significant user-generated content related to synthetic nicotine products on Instagram and Twitter, with most posts related to the promotion and sales of synthetic nicotine ENDS products. Most of the synthetic nicotine ENDS content in our dataset on Instagram is closely related to the theme of ENDS product promotion, with little discussion of regulation and adverse events. On Twitter, synthetic nicotine ENDS content is more heterogeneous, with significant discussion of adverse events following synthetic nicotine ENDS use along with similar promotion of synthetic nicotine ENDS products. Further research is needed to better understand the acute health risks unique to synthetic nicotine products and whether or not these public health challenges are exacerbated due to unregulated and illegal promotion and sale of these products via social media. / 2025-03-08T00:00:00Z
|
122 |
Machine Learning for Classification of Pediatric Concussion Recovery StagesAnderson, Lauren January 2021 (has links)
Mild traumatic brain injury (mTBI), or concussion, results from sudden acceleration
or deceleration of the brain and subsequent complex tissue propagation of shock waves
that disrupt structure and function. Concussions can cause many symptoms including
headache, dizziness, and difficulty concentrating. These can be detrimental to children,
affecting their participation in school, sport, and social activities. Therefore,
return to school (RTS) and return to activity (RTA) protocols have been developed to
help safely return children to these activities without risking further injury. The goal
of this study was to develop machine learning (ML) algorithms to predict RTA and
RTS stages that can easily be incorporated into a smartphone application (APP).
Ideally this would assist children in tracking and determining their RTA and RTS
progression leading them to a safe and timely return.
Support vector machine classifier (SVC) and random forest (RF) algorithms were
developed to predict RTA/RTS stages. Both were modeled on previously acquired
data, and on newly acquired data, and results were compared. Models were trained
and tested using accelerometry and symptom data from pediatric concussion patients.
A sliding window technique and feature extraction were performed on raw acceleration
data to extract suitable features, which were combined with yes/no symptom
recordings as ML inputs. The dataset consisted of 67 participants aged 10 to 18, 42
female and 25 male, with a total of 844408 samples.
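The sliding window and feature extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's pipeline: the window size, step, choice of statistics, and the sample values are all assumptions, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def sliding_windows(samples, window_size, step):
    """Yield consecutive windows of `window_size` samples, advancing by `step`."""
    for start in range(0, len(samples) - window_size + 1, step):
        yield samples[start:start + window_size]


def window_features(window, symptom_flags):
    """Combine simple per-window statistics with yes/no symptom flags."""
    stats = [mean(window), pstdev(window), min(window), max(window)]
    return stats + [1.0 if flag else 0.0 for flag in symptom_flags]


# Hypothetical acceleration magnitudes and yes/no symptom answers.
accel = [1.0, 1.2, 0.9, 1.1, 1.4, 1.0, 0.8, 1.3]
symptoms = [True, False, True]

# One feature vector per window; each could be fed to an SVC or RF model.
features = [window_features(w, symptoms) for w in sliding_windows(accel, 4, 2)]
```

Each feature vector here holds four window statistics followed by the binary symptom flags, which mirrors the idea of combining accelerometry features with symptom recordings as ML inputs.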
The best results for RTS prediction showed average accuracy of 83% for RF and
66% for SVC. For RTA predictions, the best results had average accuracy of 60% for
RF and 58% for SVC. For new data, RTS predictions showed an accuracy of 45%
for RF and 41% for SVC. RTA predictions had an accuracy of 35% for RF and 30%
for SVC. RF models had superior performance on all data. These results show that
predicting RTA/RTS is possible with ML. However, improvements to these models
can be made by training on more data prior to APP implementation. More data is
needed, as recruitment during this study was limited due to Covid-19 restrictions. / Thesis / Master of Applied Science (MASc) / Concussions are recorded in approximately 300,000 athletes annually and are estimated
to affect up to 3.8 million individuals per year in the United States alone.
Understanding when it's safe to return to normal routine after an injury is important
but challenging. Therefore, a series of stages have been developed to lead children
through a safe and timely return to sport and activity after concussion. The goal
of this study was to develop machine learning (ML) algorithms which predict these
return stages using symptom recordings and gross body movement data. Algorithms
could be incorporated into a smartphone application (APP) to provide accessible return
guidelines for children with concussions. Algorithms were created and model
performance was tested using symptom and body movement data collected from children
after a concussive injury. The results of this study show that it is possible to
predict return to school and return to activity stages with ML and that, with improvements,
the algorithms can be used to facilitate return from injury.
|
123 |
On Platforms and Algorithms for Human-Centric SensingShaabana, Ala 05 1900 (has links)
The decreasing cost of chip manufacturing has greatly increased the distribution and availability of chips, such that sensors have become embedded in virtually all physical objects and are able to send and receive data -- giving rise to the Internet of Things (IoT). These embedded sensors are typically endowed with intelligent algorithms to transform information into real-time actionable insights. Recently, humans have taken on a larger role in the information-to-action path with the emergence of human-centric sensing. This has made it possible to observe various processes and infer information in complex personal and social spaces that may not be possible to obtain otherwise. However, a caveat of human-centric sensing is the high cost associated with high precision systems.
In this dissertation, we present two low cost and high performing end-to-end solutions for human-centric sensing of physiological phenomena. Additionally, we present a post-hoc data-driven sensor synchronization framework that exploits independent, omni-present information in the data to synchronize multiple sensors. We first propose XTREMIS -- a low-cost and portable ECG/EMG/EEG platform with a small form factor that has a sample rate comparable to research-grade EMG machines. We evaluate XTREMIS on a signal level as well as utilize it in tandem with a Gaussian Mixture Hidden Markov Model to detect finger movements in a rapid, fine-grained activity -- typing on a keyboard. Experiments show that not only does XTREMIS functionally outperform current wearable technologies, its signal quality is high enough to achieve classification accuracy similar to research-grade EMG machines, making it a suitable platform for further research. We then present SiCILIA -- a platform that extracts physical and personal variables of a user's thermal environment to infer their clothing insulation. An individual's thermal sensation is directly correlated with the amount of clothing they are wearing. Indeed, a person's thermal comfort is crucial to their productivity and physical wellness, and is directly correlated with morale. Therefore it becomes important to be aware of actions such as adding or removing clothing as they are indicators of current thermal sensation. The proposed inference algorithm builds upon theories of body heat transfer, and is corroborated by empirical data. SiCILIA was tested in a vehicle with a passenger-controlled HVAC system. Experimental results show that the algorithm is capable of accurately predicting an occupant's thermal insulation with a low mean prediction error.
In the third part of the thesis we present CRONOS -- a sensor data synchronization framework that takes advantage of events observed by two or more sensors to synchronize their internal clocks using only their data streams. Experimental results on pairwise and multi-sensor synchronization show a significant improvement in total drift and a very low mean absolute synchronization error for multi-sensor synchronization. / Thesis / Doctor of Philosophy (PhD)
|
124 |
Clustering Gaussian Processes: A Modified EM Algorithm for Functional Data Analysis with Application to British Columbia Coastal Rainfall PatternsPaton, Forrest January 2018 (has links)
Functional data analysis is a statistical framework where data are assumed to follow some functional form. This method of analysis is commonly applied to time series data, where time, measured continuously or in discrete intervals, serves as the location for a function’s value. In this thesis Gaussian processes, a generalization of the multivariate normal distribution to function space, are used. When multiple processes are observed on a comparable interval, clustering them into sub-populations can provide significant insights. A modified EM algorithm is developed for clustering processes. The model presented clusters processes based on how similar their underlying covariance kernel is. In other words, cluster formation arises from modelling correlation between inputs (as opposed to magnitude between process values). The method is applied to both simulated data and British Columbia coastal rainfall patterns. Results show clustering yearly processes can accurately classify extreme weather patterns. / Thesis / Master of Science (MSc)
|
125 |
Towards Automating Code ReviewsFadhel, Muntazir January 2020 (has links)
Existing software engineering tools have proved useful in automating some aspects of the code review process, from uncovering defects to refactoring code. However, given that software teams still spend large amounts of time performing code reviews despite the use of such tools, much more research remains to be carried out in this area. This dissertation presents two major contributions to this field. First, we perform a text classification experiment over thirty thousand GitHub review comments to understand what code reviewers typically discuss in reviews. Next, in an attempt to offer an innovative, data-driven approach to automating code reviews, we leverage probabilistic models of source code and graph embedding techniques to perform human-like code inspections. Our experimental results indicate that the proposed algorithm is able to emulate human-like code inspection behaviour in code reviews with a macro f1-score of 62%, representing an impressive contribution towards the relatively unexplored research domain of automated code reviewing tools. / Thesis / Master of Applied Science (MASc)
|
126 |
Essays in Econometrics and Machine Learning:Yao, Qingsong January 2024 (has links)
Thesis advisor: Shakeeb Khan / Thesis advisor: Zhijie Xiao / This dissertation consists of three chapters demonstrating how current econometric problems can be solved by using machine learning techniques. In the first chapter, I propose new approaches to estimating large dimensional monotone index models. This class of models has been popular in the applied and theoretical econometrics literatures as it includes discrete choice, nonparametric transformation, and duration models. A main advantage of my approach is computational. For instance, rank estimation procedures such as those proposed in Han (1987) and Cavanagh and Sherman (1998) that optimize a nonsmooth, nonconvex objective function are difficult to use with more than a few regressors, which limits their use with economic data sets. For such monotone index models with increasing dimension, I propose to use a new class of estimators based on batched gradient descent (BGD) involving nonparametric methods such as kernel estimation or sieve estimation, and study their asymptotic properties. The BGD algorithm uses an iterative procedure where the key step exploits a strictly convex objective function, resulting in computational advantages. A contribution of my approach is that the model is large dimensional and semiparametric and so does not require the use of parametric distributional assumptions. The second chapter studies the estimation of semiparametric monotone index models when the sample size n is extremely large and conventional approaches fail to work due to devastating computational burdens. Motivated by the mini-batch gradient descent algorithm (MBGD) that is widely used as a stochastic optimization tool in the machine learning field, this chapter proposes a novel subsample- and iteration-based estimation procedure.
In particular, starting from any initial guess of the true parameter, the estimator is progressively updated using a sequence of subsamples randomly drawn from the data set whose sample size is much smaller than n. The update is based on the gradient of some well-chosen loss function, where the nonparametric component in the model is replaced with its Nadaraya-Watson kernel estimator that is also constructed based on the random subsamples. The proposed algorithm essentially generalizes the MBGD algorithm to the semiparametric setup. Since the new method uses only a subsample to perform Nadaraya-Watson kernel estimation and conduct the update, compared with the full-sample-based iterative method, the new method reduces the computational time by roughly n times if the subsample size and the kernel function are chosen properly, so it can be easily applied when the sample size n is large. Moreover, this chapter shows that if averages are further conducted across the estimators produced during iterations, the difference between the average estimator and full-sample-based estimator will be 1/\sqrt{n}-trivial. Consequently, the averaged estimator is 1/\sqrt{n}-consistent and asymptotically normally distributed. In other words, the new estimator substantially improves the computational speed while at the same time maintaining the estimation accuracy. Extensive Monte Carlo experiments and real data analysis illustrate the excellent performance of the novel algorithm in terms of computational efficiency when the sample size is extremely large. Finally, the third chapter studies a robust inference procedure for treatment effects in panel data with flexible relationship across units via the random forest method. The key contribution of this chapter is twofold. First, it proposes a direct construction of prediction intervals for the treatment effect by exploiting the information of the joint distribution of the cross-sectional units to construct counterfactuals using random forest.
In particular, it proposes a Quantile Control Method (QCM) using the Quantile Random Forest (QRF) to accommodate flexible cross-sectional structure as well as high dimensionality. Second, it establishes the asymptotic consistency of QRF under the panel/time series setup with high dimensionality, which is of theoretical interest in its own right. In addition, Monte Carlo simulations are conducted and show that prediction intervals via the QCM have excellent coverage probability for the treatment effects compared to existing methods in the literature, and are robust to heteroskedasticity, autocorrelation, and various types of model misspecifications. Finally, an empirical application to study the effect of the economic integration between Hong Kong and mainland China on Hong Kong’s economy is conducted to highlight the potential of the proposed method. / Thesis (PhD) — Boston College, 2024. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Economics.
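The second chapter above relies on a Nadaraya-Watson kernel estimator built from a random subsample rather than the full data set. The sketch below illustrates only that one ingredient, under stated assumptions: a Gaussian kernel, a scalar regressor, and hypothetical function names, bandwidth, and data. It does not reproduce the dissertation's loss function, gradient update, or averaging step.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys, evaluated at the point x0."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


def subsample_estimate(x0, xs, ys, batch_size, bandwidth, rng):
    """Apply the estimator to a random subsample instead of all n points."""
    idx = rng.sample(range(len(xs)), batch_size)
    return nadaraya_watson(x0, [xs[i] for i in idx], [ys[i] for i in idx], bandwidth)


# Hypothetical data: y = 2x on a grid of n = 1000 points.
rng = random.Random(0)
xs = [i / 1000 for i in range(1000)]
ys = [2.0 * x for x in xs]
estimate = subsample_estimate(0.5, xs, ys, batch_size=100, bandwidth=0.05, rng=rng)
```

Because each call touches only `batch_size` points, the cost per evaluation no longer grows with the full sample size, which is the computational motivation the abstract describes.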
|
127 |
Flexible Sparse Learning of Feature SubspacesMa, Yuting January 2017 (has links)
It is widely observed that the performances of many traditional statistical learning methods degenerate when confronted with high-dimensional data. One promising approach to prevent this downfall is to identify the intrinsic low-dimensional spaces where the true signals embed and to pursue the learning process on these informative feature subspaces. This thesis focuses on the development of flexible sparse learning methods of feature subspaces for classification. Motivated by the success of some existing methods, we aim at learning informative feature subspaces for high-dimensional data of complex nature with better flexibility, sparsity and scalability.
The first part of this thesis is inspired by the success of distance metric learning in casting flexible feature transformations by utilizing local information. We propose a nonlinear sparse metric learning algorithm, named the sDist algorithm, that uses a boosting-based nonparametric solution to address the metric learning problem for high-dimensional data. Leveraging a rank-one decomposition of the symmetric positive semi-definite weight matrix of the Mahalanobis distance metric, we restructure a hard global optimization problem into a forward stage-wise learning of weak learners through a gradient boosting algorithm. In each step, the algorithm progressively learns a sparse rank-one update of the weight matrix by imposing an L-1 regularization. Nonlinear feature mappings are adaptively learned by a hierarchical expansion of interactions integrated within the boosting framework. Meanwhile, an early stopping rule is imposed to control the overall complexity of the learned metric. As a result, without relying on computationally intensive tools, our approach automatically guarantees three desirable properties of the final metric: positive semi-definiteness, low rank and element-wise sparsity. Numerical experiments show that our learning model compares favorably with the state-of-the-art methods in the current literature of metric learning.
The second problem arises from the observation of high instability and feature selection bias when applying online methods to highly sparse data of large dimensionality for sparse learning problems. Due to the heterogeneity in feature sparsity, existing truncation-based methods incur slow convergence and high variance. To mitigate this problem, we introduce a stabilized truncated stochastic gradient descent algorithm. We employ a soft-thresholding scheme on the weight vector where the imposed shrinkage is adaptive to the amount of information available in each feature. The variability in the resulting sparse weight vector is further controlled by stability selection integrated with the informative truncation. To facilitate better convergence, we adopt an annealing strategy on the truncation rate. We show that, when the true parameter space is of low dimension, the stabilization with annealing strategy helps to achieve a lower regret bound in expectation.
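The soft-thresholding step mentioned above can be illustrated with the standard soft-threshold operator applied with a separate threshold per feature. This is a minimal sketch, not the thesis's algorithm: how the per-feature shrinkage is derived from the information available in each feature is not given in the abstract, so the thresholds below are hypothetical values supplied by hand, as are the function names.

```python
def soft_threshold(weight, threshold):
    """Shrink a weight toward zero by `threshold`; zero it if it lies within it."""
    if weight > threshold:
        return weight - threshold
    if weight < -threshold:
        return weight + threshold
    return 0.0


def truncate_weights(weights, thresholds):
    """Apply soft-thresholding with a separate shrinkage amount per feature."""
    return [soft_threshold(w, t) for w, t in zip(weights, thresholds)]


# Hypothetical weight vector and per-feature shrinkage amounts.
weights = [0.8, -0.05, 0.3, -0.6]
thresholds = [0.1, 0.1, 0.5, 0.2]
sparse_weights = truncate_weights(weights, thresholds)
```

In this example the second and third weights fall inside their thresholds and are set exactly to zero, while the others are shrunk toward zero, which is how soft-thresholding produces a sparse weight vector.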
|
128 |
A STUDY OF REAL TIME SEARCH IN FLOOD SCENES FROM UAV VIDEOS USING DEEP LEARNING TECHNIQUESGagandeep Singh Khanuja (7486115) 17 October 2019 (has links)
Following a natural disaster, one of the most important factors influencing a person's chances of survival or of being found is the time within which they are rescued. Traditional means of search operations, involving dogs, ground robots, and humanitarian intervention, are time intensive and can be a major bottleneck in search operations. The main aim of these operations is to rescue victims without critical delay in the shortest time possible, which can be realized in real time by using UAVs. With advancements in computational devices and the ability to learn from complex data, deep learning can be leveraged in a real-time environment for search and rescue operations. This research aims to improve on the traditional means of search operation using deep learning for real-time object detection and photogrammetry for precise geo-location mapping of objects (person, car) in real time. To do so, various pre-trained algorithms like Mask-RCNN, SSD300, and YOLOv3 and trained algorithms like YOLOv3 have been deployed and their results compared as means of addressing the search operation in real time.
|
129 |
Graph based semi-supervised learning in computer visionHuang, Ning, January 2009 (has links)
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Biomedical Engineering." Includes bibliographical references (p. 54-55).
|
130 |
Kernel methods in supervised and unsupervised learning /Tsang, Wai-Hung. January 2003 (has links)
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 46-49). Also available in electronic version. Access restricted to campus users.
|