1 |
Performance comparison of different machine learning models in detecting fake news
Wan, Zhibin; Xu, Huatai. January 2021
The phenomenon of fake news has a significant impact on our social life, especially in the political world. Fake news detection is an emerging area of research. The sharing of information on the Web, primarily through Web-based online media, is increasing. The ability to identify, evaluate, and process this information is of great importance. False information is being generated on the Internet, either intentionally or unintentionally. This is affecting an increasingly large segment of society that is being blinded by technology. This paper illustrates models and methods for detecting fake news from news articles with the help of machine learning and natural language processing. We study and compare three different feature extraction techniques and seven different machine learning classification techniques. Feature engineering methods such as TF, TF-IDF, and Word2Vec are used to generate feature vectors in this proposed work. Seven different machine learning classification algorithms were trained to classify news as false or true. The best algorithm was selected to build a model to classify news as false or true, considering accuracy, F1 score, etc., for comparison. We perform two different sets of experiments and finally obtain the combinations of fake news detection models that perform best in different situations.
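As a rough illustration of the TF and TF-IDF feature extraction named in the abstract above, the following minimal sketch computes both weightings in plain Python. The whitespace tokenizer, the smoothed IDF variant, and the toy documents are assumptions made for illustration; the thesis does not state which implementation or parameters it used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_frequencies(docs):
    """Return one Counter of raw term counts (TF) per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) where each weight is tf * idf.

    Assumed smoothing: idf(t) = log((1 + N) / (1 + df(t))) + 1.
    """
    counts = term_frequencies(docs)
    vocab = sorted({term for c in counts for term in c})
    n_docs = len(docs)
    doc_freq = {t: sum(1 for c in counts if t in c) for t in vocab}
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = [[c[t] * idf[t] for t in vocab] for c in counts]
    return vocab, vectors


# Hypothetical toy corpus, not data from the thesis.
docs = [
    "aliens endorse candidate",
    "senate passes budget bill",
    "candidate wins senate seat",
]
vocab, vectors = tfidf_vectors(docs)
```

Each resulting vector could then be passed to any classifier. Terms that occur in fewer documents (here "aliens") receive a higher weight than terms shared across documents (here "candidate").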
|
2 |
Machine Learning explainability in text classification for Fake News detection
Kurasinski, Lukas. January 2020
Fake news detection has gained interest in recent years. This has led researchers to look for models that can classify text for the purpose of fake news detection. While new models are developed, researchers mostly focus on the accuracy of a model. There is little research on the explainability of Neural Network (NN) models constructed for text classification and fake news detection. When trying to add a level of explainability to a Neural Network model, a lot of different aspects have to be taken into consideration. Text length, pre-processing, and complexity play an important role in achieving successful classification. The model's architecture has to be taken into consideration as well. All these aspects are analyzed in this thesis. In this work, an analysis of attention weights is performed to give an insight into NN reasoning about texts. Visualizations are used to show how two models, Bidirectional Long-Short Term Memory Convolutional Neural Network (BIDir-LSTM-CNN) and Bidirectional Encoder Representations from Transformers (BERT), distribute their attention while training and classifying texts. In addition, statistical data is gathered to deepen the analysis. After the analysis, it is concluded that explainability can positively influence the decisions made while constructing an NN model for text classification and fake news detection. Although explainability is useful, it is not a definitive answer to the problem. Architects should test and experiment with different solutions to be successful in effective model construction.
|
3 |
Intelligent gravitational search random forest algorithm for fake news detection
Natarajan, Rathika; Mehbodniya, Abolfazl; Rane, Kantilal Pitambar; Jindal, Sonika; Hasan, Mohammed Faez; Vives, Luis; Bhatt, Abhishek. 01 January 2022
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by the publisher where it was published. / Online social media has made the process of disseminating news so quick that people have shifted their way of accessing news from traditional journalism and press to online social media sources. The rapid circulation of news on social media makes it challenging to evaluate its reliability. Fake news not only erodes public trust but also subverts people's opinions. An intelligent automated system is required to detect fake news, as there is only a tenuous difference between fake and real news. This paper proposes an intelligent gravitational search random forest (IGSRF) algorithm to detect fake news. The IGSRF algorithm amalgamates the Intelligent Gravitational Search Algorithm (IGSA) and the Random Forest (RF) algorithm. The IGSA is an improved intelligent variant of the classical gravitational search algorithm (GSA) that adds information about the best and worst gravitational mass agents in order to retain the exploitation ability of agents at later iterations and thus avoid the trapping of the classical GSA in a local optimum. In the proposed IGSRF algorithm, all the intelligent mass agents determine the solution by generating decision trees (DT) with a random subset of attributes following the hypothesis of random forest. The mass agents generate the collection of solutions from the solution space using random proportional rules. The comprehensive prediction to decide the class of news (fake or real) is determined by all the agents following the attributes of random forest. The performance of the proposed algorithm is determined for the FakeNewsNet dataset, which has sub-categories of BuzzFeed and PolitiFact news categories. To analyze the effectiveness of the proposed algorithm, the results are also evaluated against decision tree and random forest algorithms. The proposed IGSRF algorithm has attained superior results compared to the DT, RF and state-of-the-art techniques. / Peer reviewed
|
4 |
A Preliminary Observation: Can One Linguistic Feature Be the Deterministic Factor for More Accurate Fake News Detection?
Chen, Yini. January 2023
This study inspected three linguistic features, specifically the percentage of nouns per sentence, the percentage of verbs per sentence, and the mean dependency distance of the sentence, and observed their respective influence on fake news classification accuracy. In comparison to previous studies, where linguistic features are combined as a set to be leveraged, this study attempted to untangle the effective individual features from the previously proposed optimal sets. In order to keep the influence of each individual feature independent from the other inspected features, the other features are held constant in the experiments observing each target feature. The FEVER dataset is utilized in this study, and the study incorporates weighted random baselines and Macro F1 scores to mitigate the probable bias caused by the imbalanced distribution of labels in the dataset. GPT-2 and DistilGPT2 models are both fine-tuned to measure the performance gap between models with different numbers of parameters. The experiment results indicate that the fake news classification accuracy and the features are not always correlated as hypothesized. Nevertheless, having attended to the challenges and limitations imposed by the dataset, this study has paved the way for future studies with similar research purposes. Future works are encouraged to extend the scope and include more linguistic features for inspection, to eventually achieve more effective fake news classification that leverages only the most relevant features.
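The three features named in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the input is taken to be an already parsed sentence (part-of-speech tags plus 1-based head positions, with 0 marking the root), and mean dependency distance is computed with the common definition (average absolute token-to-head distance, root excluded). The thesis may compute these features differently.

```python
def mean_dependency_distance(heads):
    """Mean absolute distance between each token and its syntactic head.

    `heads[i]` is the 1-based position of the head of token i + 1;
    0 marks the root, which is excluded from the average.
    """
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances) if distances else 0.0


def pos_percentage(tags, target):
    """Percentage of tokens in one sentence that carry the tag `target`."""
    return 100.0 * sum(1 for t in tags if t == target) / len(tags) if tags else 0.0


# Hypothetical hand-annotated parse of "The cat chased a mouse".
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
heads = [2, 3, 0, 5, 3]  # "chased" is the root

noun_pct = pos_percentage(tags, "NOUN")
verb_pct = pos_percentage(tags, "VERB")
mdd = mean_dependency_distance(heads)
```

In practice the tags and heads would come from a dependency parser rather than being written by hand.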
|
5 |
Detecting Manipulated and Adversarial Images: A Comprehensive Study of Real-world Applications
Alkhowaiter, Mohammed. 01 January 2023
The great advance of communication technology comes with a rapid increase of disinformation in many kinds and shapes; manipulated images are one of the primary examples of disinformation that can affect many users. Such activity can severely impact public behavior, attitude, and belief or sway the viewers' perception in any malicious or benign direction. Additionally, adversarial attacks targeting deep learning models pose a severe risk to computer vision applications. This dissertation explores ways of detecting and resisting manipulated or adversarial attack images. The first contribution evaluates perceptual hashing (pHash) algorithms for detecting image manipulation on social media platforms like Facebook and Twitter. The study demonstrates the differences in image processing between the two platforms and proposes a new approach to find the optimal detection threshold for each algorithm. The next contribution develops a new pHash authentication to detect fake imagery on social media networks, using a self-supervised learning framework and contrastive loss. In addition, a fake image sample generator is developed to cover three major image manipulating operations (copy-move, splicing, removal). The proposed authentication technique outperforms the state-of-the-art pHash methods. The third contribution addresses the challenges of adversarial attacks to deep learning models. A new adversarial-aware deep learning system is proposed using a classical machine learning model as the secondary verification system to complement the primary deep learning model in image classification. The proposed approach outperforms current state-of-the-art adversarial defense systems. Finally, the fourth contribution fuses big data from Extra-Military resources to support military decision-making. The study proposes a workflow, reviews data availability, security, privacy, and integrity challenges, and suggests solutions. 
A demonstration of the proposed image authentication is introduced to prevent wrong decisions and increase integrity. Overall, the dissertation provides practical solutions for detecting manipulated and adversarial attack images and integrates our proposed solutions in supporting military decision-making workflow.
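To make the idea of a perceptual-hash detection threshold more concrete, here is a minimal sketch using the simple average hash, one of the most basic perceptual hashes. The choice of average hash, the 4x4 toy images, and the threshold value are all assumptions for illustration; the dissertation evaluates several pHash algorithms and proposes its own learned method, neither of which is reproduced here.

```python
def average_hash(pixels):
    """Return a bit list: 1 where a pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def looks_manipulated(original, candidate, threshold):
    """Flag the candidate when its hash is more than `threshold` bits away."""
    distance = hamming_distance(average_hash(original), average_hash(candidate))
    return distance > threshold


# Hypothetical 4x4 grayscale images (real pipelines downscale first).
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# Slight uniform brightening, as platform re-encoding might cause.
recompressed = [[p + 5 for p in row] for row in original]
# A dark region pasted over a bright one (a crude copy-move style edit).
tampered = [row[:] for row in original]
for r in (0, 1):
    tampered[r][2] = tampered[r][3] = 10

THRESHOLD = 2  # assumed value; tuning it per algorithm is the hard part
```

With these toy inputs the recompressed copy hashes identically to the original, while the tampered copy differs in four bits and is flagged. Choosing the threshold trades false alarms on benign platform processing against missed manipulations.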
|
6 |
A Comparative study of Knowledge Graph Embedding Models for use in Fake News Detection
Frimodig, Matilda; Lanhed Sivertsson, Tom. January 2021
During the past few years, online misinformation, generally referred to as fake news, has been identified as an increasingly dangerous threat. As the spread of misinformation online has increased, fake news detection has become an active line of research. One approach is to use knowledge graphs for the purpose of automated fake news detection. While large-scale knowledge graphs are openly available, these are rarely up to date and often lack the relevant information needed for the task of fake news detection. Creating new knowledge graphs from online sources is one way to obtain the missing information. However, extracting information from unstructured text is far from straightforward. Using Natural Language Processing techniques, we developed a pre-processing pipeline for extracting information from text for the purpose of creating knowledge graphs. In order to classify news as fake or not fake with the use of knowledge graphs, these need to be converted into a machine-understandable format, called knowledge graph embeddings. These embeddings also allow new information to be inferred or classified based on the already existing information in the knowledge graph. Only one knowledge graph embedding model has previously been used for the purpose of fake news detection, while several new models have recently been developed. We compare the performance of three different embedding models, all relying on different fundamental architectures, in the specific context of fake news detection. The models used were the geometric model TransE, the tensor decomposition model ComplEx and the deep learning model ConvKB. The results of this study show that, out of the three models, ConvKB is the best performing. However, aspects other than performance need to be considered, and as such these results do not necessarily mean that a deep learning approach is the most suitable for real-world fake news detection.
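To illustrate how a geometric embedding model such as TransE can score a claim, the sketch below implements the standard TransE plausibility score, the negative distance between head + relation and tail. The two-dimensional vectors are hand-set toy values, not learned embeddings, and the entity and relation names are hypothetical; the L2 norm is used here, although TransE can also be used with the L1 norm.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance ||h + r - t||; higher means more plausible."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hypothetical hand-set embeddings (a trained model would learn these).
entities = {
    "paris": [1.0, 0.0],
    "france": [1.0, 1.0],
    "germany": [3.0, 1.0],
}
relations = {"capital_of": [0.0, 1.0]}

true_triple = transe_score(
    entities["paris"], relations["capital_of"], entities["france"]
)
false_triple = transe_score(
    entities["paris"], relations["capital_of"], entities["germany"]
)
```

A claim extracted from a news article could be treated as a triple and flagged as suspicious when its score falls below some cutoff. How that cutoff is chosen is outside the scope of this sketch.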
|
7 |
Be More with Less: Scaling Deep-learning with Minimal Supervision
Yaqing Wang (12470301). 28 April 2022
Large-scale deep learning models have reached previously unattainable performance for various tasks. However, the ever-growing resource consumption of neural networks generates a large carbon footprint, makes it difficult for academics to engage in research, and stops emerging economies from enjoying the growing benefits of Artificial Intelligence (AI). To further scale AI to bring more benefits, two major challenges need to be solved. Firstly, even though large-scale deep learning models have achieved remarkable success, their performance is still not satisfactory when fine-tuning with only a handful of examples, thereby hindering widespread adoption in real-world applications where a large amount of labeled data is difficult to obtain. Secondly, current machine learning models are still mainly designed for tasks in closed environments where testing datasets are highly similar to training datasets. When the deployed datasets have a distribution shift relative to the collected training data, we generally observe degraded performance of the developed models. How to build adaptable models becomes another critical challenge. To address those challenges, in this dissertation, we focus on two topics: few-shot learning and domain adaptation, where few-shot learning aims to learn tasks with limited labeled data and domain adaptation addresses the discrepancy between training data and testing data. In Part 1, we show our few-shot learning studies. The proposed few-shot solutions are built upon large-scale language models with evolutionary explorations from improving supervision signals, incorporating unlabeled data and improving few-shot learning abilities with lightweight fine-tuning design to reduce deployment costs. In Part 2, domain adaptation studies are introduced. We develop a progressive series of domain adaptation approaches to transfer knowledge across domains efficiently to handle distribution shifts, including capturing common patterns across domains, adaptation with weak supervision and adaptation to thousands of domains with limited labeled data and unlabeled data.
|
8 |
Web mining for social network analysis
Elhaddad, Mohamed Kamel Abdelsalam. 09 August 2021
Undoubtedly, the rapid development of information systems and the widespread use of electronic means and social networks have played a significant role in accelerating the pace of events worldwide, such as in the 2012 Gaza conflict (the 8-day war), the pro-secessionist rebellion in the 2013-2014 conflict in Eastern Ukraine, the 2016 US Presidential elections, and the COVID-19 pandemic since the beginning of 2020. As the volume of data shared daily grows quickly on various social networking platforms in different languages, techniques for the timely and correct automatic classification of this huge amount of data are needed.
Of the many social networking platforms, Twitter is one of the most used by netizens. It allows its users to communicate, share their opinions, and express their emotions (sentiments) in the form of short blogs easily and at no cost. Moreover, unlike other social networking platforms, Twitter allows research institutions to access its public and historical data, upon request and under control. Therefore, many organizations, at different levels (e.g., governmental, commercial), are seeking to benefit from the analysis and classification of the shared tweets to serve many application domains, for example, sentiment analysis to determine users’ polarity from the content of their shared text, and misleading information detection to ensure the legitimacy and credibility of the shared information. To attain this objective, one can apply numerous data representation, preprocessing, and natural language processing techniques, as well as machine/deep learning algorithms. There are several challenges and limitations with existing approaches, including issues with the management of tweets in multiple languages, the determination of what features the feature vector should include, and the assignment of representative and descriptive weights to these features for different mining tasks. Besides, existing performance evaluation metrics are limited in their ability to fully assess the developed classification systems.
In this dissertation, two novel frameworks are introduced: the first efficiently analyzes and classifies bilingual (Arabic and English) textual content of social networks, while the second evaluates the performance of binary classification algorithms. The first framework is designed with: (1) an approach to handle tweets written in Arabic and English, which can be extended to cover data written in more languages and from other social networking platforms, (2) effective data preparation and preprocessing techniques, (3) a novel feature selection technique that allows utilizing different types of features (content-dependent, context-dependent, and domain-dependent), and (4) a novel feature extraction technique to assign weights to the linguistic features based on how representative they are in the classes they belong to. The proposed framework is employed in performing sentiment analysis and misleading information detection. The performance of this framework is compared to state-of-the-art classification approaches utilizing 11 benchmark datasets comprising both Arabic and English textual content, demonstrating considerable improvement across all performance evaluation metrics. Then, this framework is utilized in a real-life case study to detect misleading information surrounding the spread of COVID-19.
In the second framework, a new multidimensional classification assessment score (MCAS) is introduced. MCAS can determine how good a classification algorithm is when dealing with binary classification problems. It takes into consideration the effect of misclassification errors on the probability of correct detection of instances from both classes. Moreover, it is intended to be valid regardless of the size of the dataset and whether the dataset has a balanced or unbalanced distribution of its instances over the classes. An empirical and practical analysis is conducted on both synthetic and real-life datasets to compare the behavior of the proposed metric against those commonly used. The analysis reveals that the new measure can distinguish the performance of different classification techniques. Furthermore, it allows performing a class-based assessment of classification algorithms, to assess the ability of the classification algorithm when dealing with data from each class separately. This is useful if correctly classifying instances from one class is more important than correctly classifying instances from the other class, such as in COVID-19 testing, where the detection of positive patients is much more important than the detection of negative ones. / Graduate
|
9 |
Personalized fake news aware recommendation system
Sallami, Dorsaf. 08 1900
In today’s world, where online news is so widespread, various methods have been developed in order to provide users with personalized news recommendations. Remarkable accomplishments have been made when it comes to providing readers with everything that could attract their attention. While accuracy is critical in news recommendation, other factors, such as diversity, novelty, and reliability, are essential to reader satisfaction. In fact, technological advancements bring additional challenges which might have a detrimental impact on the news domain. Therefore, researchers need to consider the new threats in the development of news recommendations. Fake news, in particular, is a hot topic in the media today and a new threat to public safety.
This work presents a modularized system capable of recommending news to the user and detecting fake news, all while helping users become more aware of this issue. First, we suggest FANAR, FAke News Aware Recommender system, a modification to news recommendation algorithms that removes untrustworthy persons from the candidate user’s neighbourhood. To do this, we created a probabilistic model, the Beta Trust model, to calculate user reputation. For the recommendation process, we employed Graph Neural Networks. Then, we propose EXMULF, EXplainable MUltimodal Content-based Fake News Detection System. It is tasked with the veracity analysis of information based on its textual content and the associated image, together with an Explainable AI (XAI) assistant that is tasked with combating the spread of fake news. Finally, we try to raise awareness about fake news by providing personalized alerts based on user reliability.
To fulfill the objective of this work, we build a new dataset named FNEWR. Our experiments reveal that EXMULF outperforms 10 state-of-the-art fake news detection models in terms of accuracy. It is also worth mentioning that FANAR, which takes into account visual information in news, outperforms competing approaches based only on textual content. Furthermore, it reduces the amount of fake news found in the recommendations list. / In today’s world, where online news is so widespread, various methods have been developed to provide users with personalized news recommendations. Remarkable achievements have been made in providing readers with everything that could attract their attention. Although accuracy is essential in news recommendation, other factors, such as diversity, novelty, and reliability, are essential to reader satisfaction. In fact, technological advances bring additional challenges that could have a negative impact on the information domain. Consequently, researchers must take the new threats into account when developing news recommendations. Fake news, in particular, is a hot topic in the media today and a new threat to public safety.
In view of the facts mentioned above, this work presents a modular system capable of detecting fake news, recommending news to the user, and helping users become more aware of this problem. First, we suggest FANAR, FAke News Aware Recommender system, a modification of news recommendation algorithms that removes untrustworthy persons from the candidate user’s neighbourhood. To this end, we created a probabilistic model, the Beta Trust Model, to calculate user reputation. For the recommendation process, we used Graph Neural Networks. Next, we propose EXMULF, EXplainable MUltimodal Content-based Fake News Detection System. It analyzes the veracity of information based on its textual content and the associated image, together with an Explainable AI (XAI) assistant to combat the spread of fake news. Finally, we try to raise awareness of fake news by providing personalized alerts based on user profiles.
To fulfill the objective of this work, we build a new dataset named FNEWR. Our experimental results show that EXMULF outperforms 10 state-of-the-art fake news detection models in terms of accuracy. In addition, FANAR, which takes into account visual information in news, outperforms competing approaches based only on textual content. Moreover, it reduces the amount of fake news in the recommendations list.
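The user-reputation step described in the abstract above can be illustrated with a small sketch. The abstract does not give the details of the Beta Trust model, so this example assumes the standard beta reputation estimate, the expected value of a Beta(positive + 1, negative + 1) distribution. The interaction counts, user names, and threshold are hypothetical.

```python
def beta_reputation(positive, negative):
    """Expected value of Beta(positive + 1, negative + 1).

    Returns 0.5 for a user with no history and moves toward 1 or 0
    as reliable or unreliable behaviour accumulates.
    """
    return (positive + 1) / (positive + negative + 2)


def trusted_neighbours(interactions, threshold=0.5):
    """Return the users whose reputation is at least `threshold`.

    `interactions` maps a user id to (reliable_shares, fake_shares).
    """
    return sorted(
        user
        for user, (good, bad) in interactions.items()
        if beta_reputation(good, bad) >= threshold
    )


# Hypothetical sharing histories: (reliable items shared, fake items shared).
interactions = {"alice": (8, 1), "bob": (1, 7), "carol": (0, 0)}
neighbours = trusted_neighbours(interactions)
```

Here "bob" is dropped from the neighbourhood before recommendations are computed, while "carol", who has no history, stays at the neutral prior of 0.5. Whether new users should be trusted by default is a design choice this sketch does not settle.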
|
10 |
The Struggle Against Misinformation: Evaluating the Performance of Basic vs. Complex Machine Learning Models on Manipulated Data
Valladares Parker, Diego Gabriel. January 2024
This study investigates the application of machine learning (ML) techniques in detecting fake news, addressing the rapid spread of misinformation across social media platforms. Given the time-consuming nature of manual fact-checking, this research compares the robustness of basic machine learning models, such as Multinomial Naive Bayes classifiers, with complex models like Distil-BERT in identifying fake news. Utilizing datasets including LIAR, ISOT, and GM, this study evaluates these models based on standard classification metrics in both single-domain and cross-domain scenarios, especially when processing linguistically manipulated data. Results indicate that while complex models like Distil-BERT perform better in single-domain classification, the baseline models show competitive performance in cross-domain scenarios and on the manipulated dataset. However, both kinds of model struggle with the manipulated dataset, highlighting a critical area for improvement in fake news detection algorithms and methods. In conclusion, the findings suggest that while both basic and complex models have their strengths in certain settings, significant advancements are needed to improve robustness against linguistic manipulations, ensuring reliable detection of fake news across varied contexts before automated classification is considered for public availability.
|