  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

A Rule-Based Normalization System for Greek Noisy User-Generated Text

Toska, Marsida January 2020 (has links)
The ever-growing usage of social media platforms generates vast amounts of textual data daily, which could potentially serve as a great source of information. Therefore, mining user-generated data for commercial, academic, or other purposes has already attracted the interest of the research community. However, the informal writing which often characterizes online user-generated texts poses a challenge for automatic text processing with Natural Language Processing (NLP) tools. To mitigate the effect of noise in these texts, lexical normalization has been proposed as a preprocessing method; in short, it is the task of converting non-standard word forms into their canonical form. The present work aims to contribute to this field by developing a rule-based normalization system for Greek tweets. We perform an analysis of the categories of out-of-vocabulary (OOV) word forms identified in the dataset and define hand-crafted rules, which we combine with edit distance (the Levenshtein distance) to tackle noise in the cases under scope. To evaluate the performance of the system we perform both an intrinsic and an extrinsic evaluation, the latter to explore the effect of normalization on part-of-speech tagging. The results of the intrinsic evaluation suggest that our system has an accuracy of approx. 95%, compared to approx. 81% for the baseline. In the extrinsic evaluation, we observe a boost of approx. 8% in tagging performance when the text has been preprocessed through lexical normalization.
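The core mechanism, hand-crafted rules backed by Levenshtein distance, can be sketched in a few lines. The toy lexicon, rewrite rules, and distance threshold below are illustrative assumptions, not the thesis's actual resources.

```python
# Sketch: rule-based normalization backed by Levenshtein distance.
# The lexicon, rewrite rules, and distance threshold below are toy
# assumptions, not the thesis's actual resources.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Toy in-vocabulary lexicon and hand-crafted rewrite rules (hypothetical).
LEXICON = {"γεια", "σου", "τι", "κανεις", "καλα"}
RULES = [
    ("8", "θ"),    # "8" often stands in for "θ" in informal Greek spelling
    ("ςς", "ς"),   # collapse a doubled final sigma
]

def normalize(token: str, max_dist: int = 2) -> str:
    if token in LEXICON:
        return token
    # 1. Apply the hand-crafted rules first.
    for pattern, replacement in RULES:
        token = token.replace(pattern, replacement)
    if token in LEXICON:
        return token
    # 2. Fall back to the closest lexicon entry within the distance threshold.
    best = min(LEXICON, key=lambda word: levenshtein(token, word))
    return best if levenshtein(token, best) <= max_dist else token

print(normalize("κανειςς"))  # -> "κανεις"
```

A full system would plug in a complete Greek lexicon and the rule categories derived from the OOV analysis, but the control flow, rules first and edit-distance fallback second, is the same idea.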
62

How to explain graph-based semi-supervised learning for non-mathematicians?

Jönsson, Mattias, Borg, Lucas January 2019 (has links)
The large amount of data available on the web can be used to improve the predictions made by machine learning algorithms. The problem is that such data is often in a raw format and needs to be manually labeled by a human before it can be used by a machine learning algorithm. Semi-supervised learning (SSL) is a technique where the algorithm uses a few labeled samples and then automatically labels the rest of the data. One approach to SSL is to represent the data in a graph, called graph-based semi-supervised learning (GSSL), and find similarities between the nodes for automatic labeling. Our goal in this thesis is to simplify the advanced processes and steps required to implement a GSSL algorithm. We cover basic tasks such as setting up the development environment as well as more advanced steps such as data preprocessing and feature extraction. The feature extraction techniques covered are bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF). Lastly, we present how to classify documents using Label Propagation (LP) and Multinomial Naive Bayes (MNB), with a detailed explanation of the inner workings of GSSL. We showcase the classification performance by classifying documents from the 20 Newsgroups dataset using LP and MNB. The results are documented using two different evaluation scores, F1-score and accuracy. A comparison between MNB and the LP algorithm using two different types of kernels, KNN and RBF, was made on different amounts of labeled documents. The results from the classification algorithms show that MNB is better at classifying the dataset than LP.
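The pipeline described above maps closely onto off-the-shelf tooling. The following is a minimal sketch using scikit-learn; the category subset, vocabulary size, and the 10% labeled fraction are illustrative assumptions rather than the thesis's exact setup.

```python
# Sketch: semi-supervised classification of 20 Newsgroups documents with
# Label Propagation versus a supervised Multinomial Naive Bayes baseline.
# Category choice, vocabulary size, and the 10% labeled fraction are
# illustrative assumptions.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.semi_supervised import LabelPropagation
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

cats = ["sci.space", "rec.autos", "talk.politics.misc", "comp.graphics"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = TfidfVectorizer(max_features=2000, stop_words="english")
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

# Pretend only 10% of the training documents are labeled.
rng = np.random.RandomState(0)
labeled = rng.rand(len(train.target)) < 0.10
y_semi = np.where(labeled, train.target, -1)  # -1 marks unlabeled samples

# Graph-based SSL: Label Propagation over a KNN similarity graph.
lp = LabelPropagation(kernel="knn", n_neighbors=7)
lp.fit(X_train.toarray(), y_semi)
lp_pred = lp.predict(X_test.toarray())

# Supervised baseline trained only on the labeled 10%.
mnb = MultinomialNB().fit(X_train[labeled], train.target[labeled])
mnb_pred = mnb.predict(X_test)

for name, pred in [("LabelPropagation", lp_pred), ("MultinomialNB", mnb_pred)]:
    print(name, accuracy_score(test.target, pred),
          f1_score(test.target, pred, average="macro"))
```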
63

Two-Refinement by Pillowing for Structured Hexahedral Meshes

Malone, J. Bruce 06 December 2012 (has links) (PDF)
A number of methods for adapting existing all-hexahedral grids by localized refinement have been developed; however, none ideally fits all refinement needs. This thesis presents the structure of a two-refinement method developed for conformal, structured, all-hexahedral grids that offers flexibility beyond what has been offered to date. The method is fundamentally based on pillowing pairs of sheets of hexes. This thesis also suggests an implementation of the method, shows the results of examples refined using it, and compares these results to results from implementing three-refinement on the same examples.
64

Sentimental Analysis of Cyberbullying Tweets with SVM Technique

Thanikonda, Hrushikesh, Koneti, Kavya Sree January 2023 (has links)
Background: Cyberbullying involves the use of digital technologies to harass, humiliate, or threaten individuals or groups. This form of bullying can occur on various platforms such as social media, messaging apps, gaming platforms, and mobile phones. With the outbreak of COVID-19, there was a drastic increase in the use of social media, and this upsurge was accompanied by a rise in cyberbullying, making it a pressing issue that needs to be addressed. Sentiment analysis involves identifying and categorizing emotions and opinions expressed in text data using natural language processing and machine learning techniques. SVM is a machine learning algorithm that has been widely used for sentiment analysis due to its accuracy and efficiency. Objectives: The main objective of this study is to use SVM for sentiment analysis of cyberbullying tweets and evaluate its performance. The study aims to determine the feasibility of using SVM for sentiment analysis and to assess its accuracy in detecting cyberbullying. Methods: The quantitative research method is used in this thesis, and data is analyzed using statistical analysis. The dataset is from Kaggle and consists of cyberbullying tweets. The collected data is preprocessed and used to train and test an SVM model. The resulting model is evaluated on the test set using accuracy, precision, recall, and F1-score to determine the performance of the SVM model developed to detect cyberbullying. Results: The results showed that SVM is a suitable technique for sentiment analysis of cyberbullying tweets. The model had an accuracy of 82.3% in detecting cyberbullying, with a precision of 0.82, recall of 0.82, and F1-score of 0.83. Conclusions: The study demonstrates the feasibility of using SVM for sentiment analysis of cyberbullying tweets. The high accuracy of the SVM model suggests that it can be used to build automated systems for detecting cyberbullying. The findings highlight the importance of developing tools to detect and address cyberbullying online. The use of sentiment analysis and SVM has the potential to make a significant contribution to the fight against cyberbullying.
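A minimal sketch of the kind of pipeline the Methods section describes, with TF-IDF features feeding a linear SVM and the same evaluation metrics. The file name, column names, and hyperparameters are assumptions, since the exact configuration is not given in the abstract.

```python
# Sketch: TF-IDF features feeding a linear SVM to flag cyberbullying tweets.
# The file name and column names are hypothetical; the Kaggle dataset's
# actual schema may differ.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

df = pd.read_csv("cyberbullying_tweets.csv")  # assumed columns: tweet_text, is_bullying
X_train, X_test, y_train, y_test = train_test_split(
    df["tweet_text"], df["is_bullying"],
    test_size=0.2, random_state=42, stratify=df["is_bullying"])

model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    LinearSVC(C=1.0))
model.fit(X_train, y_train)

# Accuracy, precision, recall, and F1: the metrics used in the evaluation.
print(classification_report(y_test, model.predict(X_test)))
```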
65

Building the Dresden Web Table Corpus: A Classification Approach

Lehner, Wolfgang, Eberius, Julian, Braunschweig, Katrin, Hentsch, Markus, Thiele, Maik, Ahmadov, Ahmad 12 January 2023 (has links)
In recent years, researchers have recognized relational tables on the Web as an important source of information. To assist this research, we developed the Dresden Web Tables Corpus (DWTC), a collection of about 125 million data tables extracted from the Common Crawl (CC), which contains 3.6 billion web pages and is 266 TB in size. As the vast majority of HTML tables are used for layout purposes and only a small share contains genuine tables with different surface forms, accurate table detection is essential for building a large-scale Web table corpus. Furthermore, correctly recognizing the table structure (e.g., horizontal listings, matrices) is important in order to understand the role of each table cell, distinguishing between label and data cells. In this paper, we present an extensive table layout classification that enables us to identify the main layout categories of Web tables with very high precision. To this end, we identify and develop a plethora of table features, different feature selection techniques, and several classification algorithms. We evaluate the effectiveness of the selected features and compare the performance of various state-of-the-art classification algorithms. Finally, the winning approach is employed to classify millions of tables, resulting in the Dresden Web Table Corpus (DWTC).
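To make the classification setting concrete, here is a toy sketch of a few structural features that help separate layout tables from genuine data tables. The specific features, labels, and the RandomForest model are illustrative assumptions and only a small stand-in for the feature set and algorithm comparison the paper describes.

```python
# Sketch: simple structural features for telling layout tables apart from
# genuine data tables, plus a tiny classifier. Features, labels, and the
# RandomForest choice are illustrative assumptions.
from bs4 import BeautifulSoup
from sklearn.ensemble import RandomForestClassifier

def table_features(table_html: str) -> list:
    soup = BeautifulSoup(table_html, "html.parser")
    rows = soup.find_all("tr")
    cells = soup.find_all(["td", "th"])
    headers = soup.find_all("th")
    links = soup.find_all("a")
    n_rows = max(len(rows), 1)
    n_cells = max(len(cells), 1)
    return [
        n_rows,                        # table height
        n_cells / n_rows,              # average row width
        len(headers) / n_cells,        # share of header cells
        len(links) / n_cells,          # link density (high in layout tables)
        sum(len(c.get_text(strip=True)) for c in cells) / n_cells,  # avg text length
    ]

genuine = ("<table><tr><th>City</th><th>Population</th></tr>"
           "<tr><td>Dresden</td><td>556000</td></tr></table>")
layout = ("<table><tr><td><a href='/home'>Home</a></td>"
          "<td><a href='/news'>News</a></td></tr></table>")

X = [table_features(genuine), table_features(layout)]
y = ["relational", "layout"]
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([table_features(layout)]))  # -> ['layout']
```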
66

Towards a Hybrid Imputation Approach Using Web Tables

Lehner, Wolfgang, Ahmadov, Ahmad, Thiele, Maik, Eberius, Julian, Wrembel, Robert 12 January 2023 (has links)
Data completeness is one of the most important data quality dimensions and an essential premise in data analytics. With new emerging Big Data trends such as the data lake concept, which provides a low-cost data preparation repository instead of moving curated data into a data warehouse, the problem of data completeness is further reinforced. While the process of filling in missing values is traditionally addressed by the data imputation community using statistical techniques, we complement these approaches by using external data sources from the data lake or even the Web to look up missing values. In this paper we propose a novel hybrid data imputation strategy that takes into account the characteristics of an incomplete dataset and, based on these, chooses the best imputation approach, i.e., either a statistical approach such as regression analysis, a Web-based lookup, or a combination of both. We formalize and implement both imputation approaches, including a Web table retrieval and matching system, and evaluate them extensively using a corpus with 125M Web tables. We show that applying statistical techniques in conjunction with external data sources leads to an imputation system that is robust and accurate while offering high coverage.
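As a rough illustration of the hybrid idea, choosing a statistical imputer or an external lookup per column depending on the data's characteristics, here is a small sketch. The 50% missingness threshold, the column heuristics, and the web-table lookup stub are assumptions, not the paper's actual decision logic.

```python
# Sketch: per-column choice between regression-based imputation and an
# external web-table lookup. The 50% threshold and the lookup stub are
# assumptions for illustration.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def lookup_in_web_tables(entity, attribute):
    """Stub for a web-table retrieval-and-matching system (assumed interface)."""
    return None  # a real system would return a matched value or None

def hybrid_impute(df: pd.DataFrame, key_column: str) -> pd.DataFrame:
    df = df.copy()
    numeric = df.select_dtypes("number").columns
    # Numeric columns with enough observed values: regression-style imputation.
    well_observed = [c for c in numeric if df[c].notna().mean() >= 0.5]
    if well_observed:
        df[well_observed] = IterativeImputer(random_state=0).fit_transform(df[well_observed])
    # Remaining gaps (sparse or non-numeric columns): try a web-table lookup.
    for col in df.columns.difference(well_observed):
        for idx in df.index[df[col].isna()]:
            value = lookup_in_web_tables(df.at[idx, key_column], col)
            if value is not None:
                df.at[idx, col] = value
    return df

demo = pd.DataFrame({"country": ["Germany", "France", "Poland"],
                     "population": [83.2, None, 38.0],
                     "capital": ["Berlin", "Paris", None]})
print(hybrid_impute(demo, key_column="country"))
```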
67

Characterization of Foods by Chromatographic and Spectroscopic Methods Coupled to Chemometrics

Aloglu, Ahmet Kemal 06 June 2018 (has links)
No description available.
68

Activity Recognition Using Accelerometer and Gyroscope Data From Pocket-Worn Smartphones

Söderberg, Oskar, Blommegård, Oscar January 2021 (has links)
Human Activity Recognition (HAR) is a widely researched field that has gained importance due to recent advancements in sensor technology and machine learning. In HAR, sensors are used to identify the activity that a person is performing. In this project, the six everyday-life activities walking, biking, sitting, standing, ascending stairs, and descending stairs are classified using smartphone accelerometer and gyroscope data collected by three subjects in their everyday life. To perform the classification, two different machine learning algorithms, Artificial Neural Network (ANN) and Support Vector Machine (SVM), are implemented and compared. Moreover, we compare the accuracy obtained from the two sensors, both individually and combined. Our results show that the accuracy is higher using only the accelerometer data compared to using only the gyroscope data. For the accelerometer data, the accuracy is greater than 95% for both algorithms, whereas it is only between 83% and 93% using gyroscope data. There is also a small synergy effect when using both sensors, yielding higher accuracy than for either individual sensor and reaching 98.5% using ANN. Furthermore, for all sensor configurations, the ANN outperforms the SVM algorithm, with an accuracy that is 1.5 to 9 percentage points higher. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
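A minimal sketch of the workflow the abstract implies: window the raw six-axis signal, extract simple statistics, and compare an MLP ("ANN") against an SVM. The window length, feature set, and synthetic data below are assumptions, since the report's exact preprocessing is not given here.

```python
# Sketch: windowed feature extraction from six-axis smartphone sensor data
# and a comparison of an MLP ("ANN") against an SVM. Window length, features,
# and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def window_features(window: np.ndarray) -> np.ndarray:
    """window: (n_samples, 6) = 3 accelerometer + 3 gyroscope axes."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           np.abs(np.diff(window, axis=0)).mean(axis=0)])

# Synthetic stand-in for labeled fixed-length windows of sensor readings.
rng = np.random.RandomState(0)
labels = rng.randint(0, 6, size=600)                  # six activity classes
windows = rng.randn(600, 128, 6) + labels[:, None, None] * 0.5
X = np.array([window_features(w) for w in windows])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
for name, clf in [("ANN", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                         random_state=0)),
                  ("SVM", SVC(kernel="rbf", C=1.0))]:
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```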
69

Validation DSL for client-server applications

Fedorenko, Vitalii M. 10 1900 (has links)
Given the nature of client-server applications, most use some freeform interface, such as web forms, to collect user input. The main difficulty with this approach is that all parameters obtained in this fashion need to be validated and normalized to protect the application from invalid entries. This is the problem addressed here: how to take client input and preprocess it before passing the data to a back-end that concentrates on business logic. The method of implementation is a rule engine that uses a Groovy internal domain-specific language (DSL) for specifying input requirements. We justify why a DSL is a good fit for a validation rule engine, describe existing techniques used in this area, and comprehensively address the related issues of accidental complexity, security, and user experience. / Master of Science (MSc)
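The thesis builds its rule engine as a Groovy internal DSL. As a language-neutral illustration of the same shape, declarative field rules that normalize and validate client input before it reaches business logic, here is a small Python sketch whose rule names and sample form are invented for the example.

```python
# Sketch: the shape of a declarative validation and normalization rule engine,
# shown in Python rather than the Groovy internal DSL the thesis uses. Rule
# names and the sample form are invented for the example.
class Rules:
    def __init__(self):
        self._fields = {}

    def field(self, name, *, required=False, normalize=None, check=None, message=""):
        self._fields[name] = (required, normalize, check, message)
        return self  # chaining gives the DSL-like reading below

    def validate(self, form: dict):
        clean, errors = {}, {}
        for name, (required, normalize, check, message) in self._fields.items():
            value = form.get(name)
            if value is None or value == "":
                if required:
                    errors[name] = f"{name} is required"
                continue
            if normalize:
                value = normalize(value)
            if check and not check(value):
                errors[name] = message or f"{name} is invalid"
            else:
                clean[name] = value
        return clean, errors

rules = (Rules()
         .field("email", required=True, normalize=str.strip,
                check=lambda v: "@" in v, message="not a valid email")
         .field("age", normalize=int, check=lambda v: 0 < v < 150))

print(rules.validate({"email": "  user@example.com ", "age": "42"}))
# -> ({'email': 'user@example.com', 'age': 42}, {})
```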
70

A STUDY ON THE IMPACT OF PREPROCESSING STEPS ON MACHINE LEARNING MODEL FAIRNESS

Sathvika Kotha (18370548) 17 April 2024 (has links)
<p dir="ltr">The success of machine learning techniques in widespread applications has taught us that with respect to accuracy, the more data, the better the model. However, for fairness, data quality is perhaps more important than quantity. Existing studies have considered the impact of data preprocessing on the accuracy of ML model tasks. However, the impact of preprocessing on the fairness of the downstream model has neither been studied nor well understood. Throughout this thesis, we conduct a systematic study of how data quality issues and data preprocessing steps impact model fairness. Our study evaluates several preprocessing techniques for several machine learning models trained over datasets with different characteristics and evaluated using several fairness metrics. It examines different data preparation techniques, such as changing categories into numbers, filling in missing information, and smoothing out unusual data points. The study measures fairness using standards that check if the model treats all groups equally, predicts outcomes fairly, and gives similar chances to everyone. By testing these methods on various types of data, the thesis identifies which combinations of techniques can make the models both accurate and fair.The empirical analysis demonstrated that preprocessing steps like one-hot encoding, imputation of missing values, and outlier treatment significantly influence fairness metrics. Specifically, models preprocessed with median imputation and robust scaling exhibited the most balanced performance across fairness and accuracy metrics, suggesting a potential best practice guideline for equitable ML model preparation. Thus, this work sheds light on the importance of data preparation in ML and emphasizes the need for careful handling of data to support fair and ethical use of ML in society.</p>
