  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

A Rule-Based Normalization System for Greek Noisy User-Generated Text

Toska, Marsida January 2020 (has links)
The ever-growing usage of social media platforms generates vast amounts of textual data daily, which could potentially serve as a great source of information. Therefore, mining user-generated data for commercial, academic, or other purposes has already attracted the interest of the research community. However, the informal writing which often characterizes online user-generated texts poses a challenge for automatic text processing with Natural Language Processing (NLP) tools. To mitigate the effect of noise in these texts, lexical normalization has been proposed as a preprocessing method; in short, it is the task of converting non-standard word forms into their canonical form. The present work aims to contribute to this field by developing a rule-based normalization system for Greek tweets. We perform an analysis of the categories of out-of-vocabulary (OOV) word forms identified in the dataset and define hand-crafted rules, which we combine with edit distance (the Levenshtein distance) to tackle noise in the cases under scope. To evaluate the performance of the system we perform both an intrinsic and an extrinsic evaluation, the latter to explore the effect of normalization on part-of-speech tagging. The results of the intrinsic evaluation suggest that our system has an accuracy of approx. 95%, compared to approx. 81% for the baseline. In the extrinsic evaluation, we observe a boost of approx. 8% in tagging performance when the text has been preprocessed through lexical normalization.
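The core mechanism, hand-crafted rules backed by Levenshtein distance, can be sketched in a few lines. The toy lexicon, rewrite rules, and distance threshold below are illustrative assumptions, not the thesis's actual resources.

```python
# Sketch: rule-based normalization backed by Levenshtein distance.
# The lexicon, rewrite rules, and distance threshold below are toy
# assumptions, not the thesis's actual resources.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Toy in-vocabulary lexicon and hand-crafted rewrite rules (hypothetical).
LEXICON = {"γεια", "σου", "τι", "κανεις", "καλα"}
RULES = [
    ("8", "θ"),    # "8" often stands in for "θ" in informal Greek spelling
    ("ςς", "ς"),   # collapse a doubled final sigma
]

def normalize(token: str, max_dist: int = 2) -> str:
    if token in LEXICON:
        return token
    # 1. Apply the hand-crafted rules first.
    for pattern, replacement in RULES:
        token = token.replace(pattern, replacement)
    if token in LEXICON:
        return token
    # 2. Fall back to the closest lexicon entry within the distance threshold.
    best = min(LEXICON, key=lambda word: levenshtein(token, word))
    return best if levenshtein(token, best) <= max_dist else token

print(normalize("κανειςς"))  # -> "κανεις"
```

A full system would plug in a complete Greek lexicon and the rule categories derived from the OOV analysis, but the control flow, rules first and edit-distance fallback second, is the same idea.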
62

How to explain graph-based semi-supervised learning for non-mathematicians?

Jönsson, Mattias, Borg, Lucas January 2019 (has links)
The large amount of data available on the web can be used to improve the predictions made by machine learning algorithms. The problem is that such data is often in a raw format and needs to be manually labeled by a human before it can be used by a machine learning algorithm. Semi-supervised learning (SSL) is a technique where the algorithm uses a few labeled samples and then automatically labels the rest of the data. One approach to SSL is to represent the data in a graph, called graph-based semi-supervised learning (GSSL), and find similarities between the nodes for automatic labeling. Our goal in this thesis is to simplify the advanced processes and steps required to implement a GSSL algorithm. We cover basic tasks such as setting up the development environment as well as more advanced steps such as data preprocessing and feature extraction. The feature extraction techniques covered are bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF). Lastly, we present how to classify documents using Label Propagation (LP) and Multinomial Naive Bayes (MNB), with a detailed explanation of the inner workings of GSSL. We showcase the classification performance by classifying documents from the 20 Newsgroups dataset using LP and MNB. The results are documented using two different evaluation scores, F1-score and accuracy. A comparison between MNB and the LP algorithm using two different types of kernels, KNN and RBF, was made on different amounts of labeled documents. The results from the classification algorithms show that MNB is better at classifying the dataset than LP.
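The pipeline described above maps closely onto off-the-shelf tooling. The following is a minimal sketch using scikit-learn; the category subset, vocabulary size, and the 10% labeled fraction are illustrative assumptions rather than the thesis's exact setup.

```python
# Sketch: semi-supervised classification of 20 Newsgroups documents with
# Label Propagation versus a supervised Multinomial Naive Bayes baseline.
# Category choice, vocabulary size, and the 10% labeled fraction are
# illustrative assumptions.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.semi_supervised import LabelPropagation
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

cats = ["sci.space", "rec.autos", "talk.politics.misc", "comp.graphics"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = TfidfVectorizer(max_features=2000, stop_words="english")
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

# Pretend only 10% of the training documents are labeled.
rng = np.random.RandomState(0)
labeled = rng.rand(len(train.target)) < 0.10
y_semi = np.where(labeled, train.target, -1)  # -1 marks unlabeled samples

# Graph-based SSL: Label Propagation over a KNN similarity graph.
lp = LabelPropagation(kernel="knn", n_neighbors=7)
lp.fit(X_train.toarray(), y_semi)
lp_pred = lp.predict(X_test.toarray())

# Supervised baseline trained only on the labeled 10%.
mnb = MultinomialNB().fit(X_train[labeled], train.target[labeled])
mnb_pred = mnb.predict(X_test)

for name, pred in [("LabelPropagation", lp_pred), ("MultinomialNB", mnb_pred)]:
    print(name, accuracy_score(test.target, pred),
          f1_score(test.target, pred, average="macro"))
```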
63

Two-Refinement by Pillowing for Structured Hexahedral Meshes

Malone, J. Bruce 06 December 2012 (has links) (PDF)
A number of methods for adapting existing all-hexahedral grids by localized refinement have been developed; however, none ideally fits all refinement needs. This thesis presents the structure of a two-refinement method developed for conformal, structured, all-hexahedral grids that offers flexibility beyond what has been offered to date. The method is fundamentally based on pillowing pairs of sheets of hexes. This thesis also suggests an implementation of the method, shows the results of examples refined using it, and compares these results to results from implementing three-refinement on the same examples.
64

Sentimental Analysis of Cyberbullying Tweets with SVM Technique

Thanikonda, Hrushikesh, Koneti, Kavya Sree January 2023 (has links)
Background: Cyberbullying involves the use of digital technologies to harass, humiliate, or threaten individuals or groups. This form of bullying can occur on various platforms such as social media, messaging apps, gaming platforms, and mobile phones. With the outbreak of COVID-19, there was a drastic increase in the use of social media, and this upsurge was accompanied by a rise in cyberbullying, making it a pressing issue that needs to be addressed. Sentiment analysis involves identifying and categorizing emotions and opinions expressed in text data using natural language processing and machine learning techniques. SVM is a machine learning algorithm that has been widely used for sentiment analysis due to its accuracy and efficiency. Objectives: The main objective of this study is to use SVM for sentiment analysis of cyberbullying tweets and evaluate its performance. The study aims to determine the feasibility of using SVM for sentiment analysis and to assess its accuracy in detecting cyberbullying. Methods: The quantitative research method is used in this thesis, and data is analyzed using statistical analysis. The dataset is from Kaggle and consists of cyberbullying tweets. The collected data is preprocessed and used to train and test an SVM model. The resulting model is evaluated on the test set using accuracy, precision, recall, and F1-score to determine the performance of the SVM model developed to detect cyberbullying. Results: The results showed that SVM is a suitable technique for sentiment analysis of cyberbullying tweets. The model had an accuracy of 82.3% in detecting cyberbullying, with a precision of 0.82, recall of 0.82, and F1-score of 0.83. Conclusions: The study demonstrates the feasibility of using SVM for sentiment analysis of cyberbullying tweets. The high accuracy of the SVM model suggests that it can be used to build automated systems for detecting cyberbullying. The findings highlight the importance of developing tools to detect and address cyberbullying online. The use of sentiment analysis and SVM has the potential to make a significant contribution to the fight against cyberbullying.
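A minimal sketch of the kind of pipeline the Methods section describes, with TF-IDF features feeding a linear SVM and the same evaluation metrics. The file name, column names, and hyperparameters are assumptions, since the exact configuration is not given in the abstract.

```python
# Sketch: TF-IDF features feeding a linear SVM to flag cyberbullying tweets.
# The file name and column names are hypothetical; the Kaggle dataset's
# actual schema may differ.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

df = pd.read_csv("cyberbullying_tweets.csv")  # assumed columns: tweet_text, is_bullying
X_train, X_test, y_train, y_test = train_test_split(
    df["tweet_text"], df["is_bullying"],
    test_size=0.2, random_state=42, stratify=df["is_bullying"])

model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    LinearSVC(C=1.0))
model.fit(X_train, y_train)

# Accuracy, precision, recall, and F1: the metrics used in the evaluation.
print(classification_report(y_test, model.predict(X_test)))
```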
65

Building the Dresden Web Table Corpus: A Classification Approach

Lehner, Wolfgang, Eberius, Julian, Braunschweig, Katrin, Hentsch, Markus, Thiele, Maik, Ahmadov, Ahmad 12 January 2023 (has links)
In recent years, researchers have recognized relational tables on the Web as an important source of information. To assist this research, we developed the Dresden Web Tables Corpus (DWTC), a collection of about 125 million data tables extracted from the Common Crawl (CC), which contains 3.6 billion web pages and is 266 TB in size. As the vast majority of HTML tables are used for layout purposes and only a small share contains genuine tables with different surface forms, accurate table detection is essential for building a large-scale Web table corpus. Furthermore, correctly recognizing the table structure (e.g., horizontal listings, matrices) is important in order to understand the role of each table cell, distinguishing between label and data cells. In this paper, we present an extensive table layout classification that enables us to identify the main layout categories of Web tables with very high precision. To this end, we identify and develop a plethora of table features, different feature selection techniques, and several classification algorithms. We evaluate the effectiveness of the selected features and compare the performance of various state-of-the-art classification algorithms. Finally, the winning approach is employed to classify millions of tables, resulting in the Dresden Web Table Corpus (DWTC).
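To make the classification setting concrete, here is a toy sketch of a few structural features that help separate layout tables from genuine data tables. The specific features, labels, and the RandomForest model are illustrative assumptions and only a small stand-in for the feature set and algorithm comparison the paper describes.

```python
# Sketch: simple structural features for telling layout tables apart from
# genuine data tables, plus a tiny classifier. Features, labels, and the
# RandomForest choice are illustrative assumptions.
from bs4 import BeautifulSoup
from sklearn.ensemble import RandomForestClassifier

def table_features(table_html: str) -> list:
    soup = BeautifulSoup(table_html, "html.parser")
    rows = soup.find_all("tr")
    cells = soup.find_all(["td", "th"])
    headers = soup.find_all("th")
    links = soup.find_all("a")
    n_rows = max(len(rows), 1)
    n_cells = max(len(cells), 1)
    return [
        n_rows,                        # table height
        n_cells / n_rows,              # average row width
        len(headers) / n_cells,        # share of header cells
        len(links) / n_cells,          # link density (high in layout tables)
        sum(len(c.get_text(strip=True)) for c in cells) / n_cells,  # avg text length
    ]

genuine = ("<table><tr><th>City</th><th>Population</th></tr>"
           "<tr><td>Dresden</td><td>556000</td></tr></table>")
layout = ("<table><tr><td><a href='/home'>Home</a></td>"
          "<td><a href='/news'>News</a></td></tr></table>")

X = [table_features(genuine), table_features(layout)]
y = ["relational", "layout"]
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([table_features(layout)]))  # -> ['layout']
```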
66

Towards a Hybrid Imputation Approach Using Web Tables

Lehner, Wolfgang, Ahmadov, Ahmad, Thiele, Maik, Eberius, Julian, Wrembel, Robert 12 January 2023 (has links)
Data completeness is one of the most important data quality dimensions and an essential premise in data analytics. With new emerging Big Data trends such as the data lake concept, which provides a low-cost data preparation repository instead of moving curated data into a data warehouse, the problem of data completeness is further reinforced. While the process of filling in missing values is traditionally addressed by the data imputation community using statistical techniques, we complement these approaches by using external data sources from the data lake or even the Web to look up missing values. In this paper we propose a novel hybrid data imputation strategy that takes into account the characteristics of an incomplete dataset and, based on these, chooses the best imputation approach, i.e., either a statistical approach such as regression analysis, a Web-based lookup, or a combination of both. We formalize and implement both imputation approaches, including a Web table retrieval and matching system, and evaluate them extensively using a corpus with 125M Web tables. We show that applying statistical techniques in conjunction with external data sources leads to an imputation system that is robust and accurate while offering high coverage.
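As a rough illustration of the hybrid idea, choosing a statistical imputer or an external lookup per column depending on the data's characteristics, here is a small sketch. The 50% missingness threshold, the column heuristics, and the web-table lookup stub are assumptions, not the paper's actual decision logic.

```python
# Sketch: per-column choice between regression-based imputation and an
# external web-table lookup. The 50% threshold and the lookup stub are
# assumptions for illustration.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def lookup_in_web_tables(entity, attribute):
    """Stub for a web-table retrieval-and-matching system (assumed interface)."""
    return None  # a real system would return a matched value or None

def hybrid_impute(df: pd.DataFrame, key_column: str) -> pd.DataFrame:
    df = df.copy()
    numeric = df.select_dtypes("number").columns
    # Numeric columns with enough observed values: regression-style imputation.
    well_observed = [c for c in numeric if df[c].notna().mean() >= 0.5]
    if well_observed:
        df[well_observed] = IterativeImputer(random_state=0).fit_transform(df[well_observed])
    # Remaining gaps (sparse or non-numeric columns): try a web-table lookup.
    for col in df.columns.difference(well_observed):
        for idx in df.index[df[col].isna()]:
            value = lookup_in_web_tables(df.at[idx, key_column], col)
            if value is not None:
                df.at[idx, col] = value
    return df

demo = pd.DataFrame({"country": ["Germany", "France", "Poland"],
                     "population": [83.2, None, 38.0],
                     "capital": ["Berlin", "Paris", None]})
print(hybrid_impute(demo, key_column="country"))
```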
67

Characterization of Foods by Chromatographic and Spectroscopic Methods Coupled to Chemometrics

Aloglu, Ahmet Kemal 06 June 2018 (has links)
No description available.
68

Activity Recognition Using Accelerometer and Gyroscope Data From Pocket-Worn Smartphones

Söderberg, Oskar, Blommegård, Oscar January 2021 (has links)
Human Activity Recognition (HAR) is a widely researched field that has gained importance due to recent advancements in sensor technology and machine learning. In HAR, sensors are used to identify the activity that a person is performing. In this project, the six everyday-life activities walking, biking, sitting, standing, ascending stairs, and descending stairs are classified using smartphone accelerometer and gyroscope data collected by three subjects in their everyday life. To perform the classification, two different machine learning algorithms, Artificial Neural Network (ANN) and Support Vector Machine (SVM), are implemented and compared. Moreover, we compare the accuracy obtained from the two sensors, both individually and combined. Our results show that the accuracy is higher using only the accelerometer data compared to using only the gyroscope data. For the accelerometer data, the accuracy is greater than 95% for both algorithms, whereas it is only between 83% and 93% using gyroscope data. There is also a small synergy effect when using both sensors, yielding higher accuracy than for either individual sensor and reaching 98.5% using ANN. Furthermore, for all sensor configurations, the ANN outperforms the SVM algorithm, with an accuracy that is 1.5 to 9 percentage points higher. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
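A minimal sketch of the workflow the abstract implies: window the raw six-axis signal, extract simple statistics, and compare an MLP ("ANN") against an SVM. The window length, feature set, and synthetic data below are assumptions, since the report's exact preprocessing is not given here.

```python
# Sketch: windowed feature extraction from six-axis smartphone sensor data
# and a comparison of an MLP ("ANN") against an SVM. Window length, features,
# and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def window_features(window: np.ndarray) -> np.ndarray:
    """window: (n_samples, 6) = 3 accelerometer + 3 gyroscope axes."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           np.abs(np.diff(window, axis=0)).mean(axis=0)])

# Synthetic stand-in for labeled fixed-length windows of sensor readings.
rng = np.random.RandomState(0)
labels = rng.randint(0, 6, size=600)                  # six activity classes
windows = rng.randn(600, 128, 6) + labels[:, None, None] * 0.5
X = np.array([window_features(w) for w in windows])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
for name, clf in [("ANN", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                         random_state=0)),
                  ("SVM", SVC(kernel="rbf", C=1.0))]:
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```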
69

Validation DSL for client-server applications

Fedorenko, Vitalii M. 10 1900 (has links)
Given the nature of client-server applications, most use some freeform interface, such as web forms, to collect user input. The main difficulty with this approach is that all parameters obtained in this fashion need to be validated and normalized to protect the application from invalid entries. This is the problem addressed here: how to take client input and preprocess it before passing the data to a back-end that concentrates on business logic. The method of implementation is a rule engine that uses a Groovy internal domain-specific language (DSL) for specifying input requirements. We justify why a DSL is a good fit for a validation rule engine, describe existing techniques used in this area, and comprehensively address the related issues of accidental complexity, security, and user experience. / Master of Science (MSc)
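The thesis builds its rule engine as a Groovy internal DSL. As a language-neutral illustration of the same shape, declarative field rules that normalize and validate client input before it reaches business logic, here is a small Python sketch whose rule names and sample form are invented for the example.

```python
# Sketch: the shape of a declarative validation and normalization rule engine,
# shown in Python rather than the Groovy internal DSL the thesis uses. Rule
# names and the sample form are invented for the example.
class Rules:
    def __init__(self):
        self._fields = {}

    def field(self, name, *, required=False, normalize=None, check=None, message=""):
        self._fields[name] = (required, normalize, check, message)
        return self  # chaining gives the DSL-like reading below

    def validate(self, form: dict):
        clean, errors = {}, {}
        for name, (required, normalize, check, message) in self._fields.items():
            value = form.get(name)
            if value is None or value == "":
                if required:
                    errors[name] = f"{name} is required"
                continue
            if normalize:
                value = normalize(value)
            if check and not check(value):
                errors[name] = message or f"{name} is invalid"
            else:
                clean[name] = value
        return clean, errors

rules = (Rules()
         .field("email", required=True, normalize=str.strip,
                check=lambda v: "@" in v, message="not a valid email")
         .field("age", normalize=int, check=lambda v: 0 < v < 150))

print(rules.validate({"email": "  user@example.com ", "age": "42"}))
# -> ({'email': 'user@example.com', 'age': 42}, {})
```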
70

A STUDY ON THE IMPACT OF PREPROCESSING STEPS ON MACHINE LEARNING MODEL FAIRNESS

Sathvika Kotha (18370548) 17 April 2024 (has links)
<p dir="ltr">The success of machine learning techniques in widespread applications has taught us that with respect to accuracy, the more data, the better the model. However, for fairness, data quality is perhaps more important than quantity. Existing studies have considered the impact of data preprocessing on the accuracy of ML model tasks. However, the impact of preprocessing on the fairness of the downstream model has neither been studied nor well understood. Throughout this thesis, we conduct a systematic study of how data quality issues and data preprocessing steps impact model fairness. Our study evaluates several preprocessing techniques for several machine learning models trained over datasets with different characteristics and evaluated using several fairness metrics. It examines different data preparation techniques, such as changing categories into numbers, filling in missing information, and smoothing out unusual data points. The study measures fairness using standards that check if the model treats all groups equally, predicts outcomes fairly, and gives similar chances to everyone. By testing these methods on various types of data, the thesis identifies which combinations of techniques can make the models both accurate and fair.The empirical analysis demonstrated that preprocessing steps like one-hot encoding, imputation of missing values, and outlier treatment significantly influence fairness metrics. Specifically, models preprocessed with median imputation and robust scaling exhibited the most balanced performance across fairness and accuracy metrics, suggesting a potential best practice guideline for equitable ML model preparation. Thus, this work sheds light on the importance of data preparation in ML and emphasizes the need for careful handling of data to support fair and ethical use of ML in society.</p>
