Global ETD Search

1	Optimizing t-SNE using random sampling techniques Buljan, Matej January 2019 The main topic of this thesis is t-SNE, a dimensionality reduction technique that has gained popularity for its ability to preserve well-separated clusters from a high-dimensional space. Our goal with this thesis is twofold. First, we give an introduction to the use of dimensionality reduction techniques in visualization and, following recent research, show that t-SNE in particular is successful at preserving well-separated clusters. Second, we perform a thorough series of experiments that allow us to draw conclusions about the quality of embeddings obtained by running t-SNE on samples of data drawn with different sampling techniques. We compare pure random sampling, random walk sampling and so-called hubness sampling on a dataset, attempting to find a sampling method that is consistently better at preserving local information than simple random sampling. Throughout our testing, a specific variant of random walk sampling distinguished itself as a better alternative to pure random sampling. Machine learning t-SNE visualization sampling Mathematics Matematik
2	T-Distributed Stochastic Neighbor Embedding Data Preprocessing Impact on Image Classification using Deep Convolutional Neural Networks Droh, Erik January 2018 Image classification in machine learning is the task of identifying objects in an image. The technique has applications in areas such as e-commerce, social media and security surveillance. In this report the author explores the impact of using t-Distributed Stochastic Neighbor Embedding (t-SNE) on data as a preprocessing step when classifying multiple classes of clothing with a state-of-the-art Deep Convolutional Neural Network (DCNN). The t-SNE algorithm performs dimensionality reduction and places similar objects close to each other in three-dimensional space. Extracting this information in the form of a positional coordinate gives a new parameter that could help with the classification process, since the features it uses can differ from those of the DCNN. Therefore, three slightly different DCNN models receive different inputs and are compared. The first, benchmark model receives only pixel values; the second and third receive pixel values together with the positional coordinates from the t-SNE preprocessing for each data point, but with different hyperparameter values in the preprocessing step. The Fashion-MNIST dataset used contains 10 different clothing classes, which are normalized and gray-scaled for ease of use. The dataset contains 70,000 images in total. Results show minimal change in classification accuracy when using a low-density map with a higher learning rate as the data size increases, while a denser map and lower learning rate yield a significant accuracy increase of 4.4% when using a small data set. This is evidence that the method can be used to boost results when data is limited. Deep machine learning Computer and Information Sciences Data- och informationsvetenskap
3	Clustering classification and human perception of automative steering wheel transient vibrations Mohd Yusoff, Sabariah January 2017 In the 21st century, the proliferation of steer-by-wire systems has become a central issue in the automobile industry. With such systems there is often an objective to minimise vibrations on the steering wheel to increase driver comfort. Nevertheless, steering wheel vibration is also recognised as an important medium that helps drivers judge the dynamics of the vehicle's subsystems and conveys important information such as the presence of danger. This has led to studies of the possible role of vibrational stimuli in informing drivers of environmental conditions such as road surface types. Numerous prior studies have examined how characteristics of steering wheel vibrational stimuli might influence driver road surface detection; they suggest that there is no single, optimal acceleration gain that could improve the detection of all road surface types. There is currently a lack of studies on the characteristics of transient vibrations of the steering wheel, which appear to be an important source of information for driver road surface detection. Therefore, this study is designed to identify the similarity characteristics of transient vibrations in order to answer the main research question: "What are the time-domain features of transient vibrations that can optimise driver road surface detection?" The study starts by critically reviewing the existing principles of transient vibration detection to ensure that the transient vibrations identified from the original steering wheel vibrations satisfy the definition of transient vibrations. It continues with experimental activities to identify the optimal measurement signal for both the identification of transient vibrations and driver road surface detection, without taking the basic measurement of signal processing for granted. The study then identifies the similarity of transient vibrations according to their time-domain features, using reduction techniques for high-dimensional data combined with clustering methods. The results suggest that the time-domain features of transient vibrations that can optimise driver road surface detection consist of duration (Δt), amplitude (m/s²), energy (r.m.s.) and kurtosis.
4	Data mining / Data mining Mrázek, Michal January 2019 The aim of this master’s thesis is the analysis of multidimensional data. Three dimensionality reduction algorithms are introduced. It is shown how to process text documents using basic methods of natural language processing. The goal of the practical part of the thesis is to process real-world data from an internet forum. Posted messages are transformed into a numerical representation, then projected into two-dimensional space and visualized. Later on, the topics of the messages are discovered. In the last part, a few selected algorithms are compared.
5	Nové metody pro analýzu spánku a klasifikaci / Novel methods for sleep analysis and classification Navrátilová, Markéta January 2020 This master's thesis deals with methods for sleep analysis and classification. It describes the individual sleep stages and the biosignal patterns that occur during sleep, as well as methods for classification. Features are extracted from the supplied ECG, EDA and RIP biosignals. Based on these features, the individual sleep stages are classified using a random forest classifier. The classifier's parameters are optimized and the achieved results are then evaluated. The feature set is analyzed using dimensionality reduction methods, and the results are compared with those from standard classification. A solution for visualizing both the raw, unprocessed signals and the extracted features is designed and implemented. The achieved results are compared with published methods.
6	Zobrazení a analýza aktivit neuronové sítě ve skrytých vrstvách / Activity of Neural Network in Hidden Layers - Visualisation and Analysis Fábry, Marko January 2016 The goal of this work was to create a system capable of visualising the activation function values produced by neurons in the hidden layers of neural networks used for speech recognition. The work also describes experiments comparing visualisation methods, visualisations of neural networks with different architectures, and neural networks trained on different types of input data. The visualisation system implemented in this work is based on previous work by Khe Chai Sim and is extended with new methods of data normalization. The Kaldi toolkit was used to prepare the neural network training data, and the CNTK framework was used for neural network training. The core of this work, the visualisation system, was implemented in the Python scripting language.
7	Constructing and representing a knowledge graph(KG) for Positive Energy Districts (PEDs) Davari, Mahtab January 2023 In recent years, knowledge graphs (KGs) have become essential tools for visualizing concepts and retrieving contextual information. However, constructing KGs for new and specialized domains like Positive Energy Districts (PEDs) presents unique challenges, particularly when dealing with unstructured texts and ambiguous concepts from academic articles. This study focuses on various strategies for constructing and inferring KGs, specifically incorporating entities related to PEDs, such as projects, technologies, organizations, and locations. We utilize visualization techniques and node embedding methods to explore the graph's structure and content and apply filtering techniques and t-SNE plots to extract subgraphs based on specific categories or keywords. One of the key contributions is using the longest path method, which allows us to uncover intricate relationships, interconnectedness between entities, critical paths, and hidden patterns within the graph, providing valuable insights into the most significant connections. Additionally, community detection techniques were employed to identify distinct communities within the graph, providing further understanding of the structural organization and clusters of interconnected nodes with shared themes. The paper also presents a detailed evaluation of a question-answering system based on the KG, where the Universal Sentence Encoder was used to convert text into dense vector representations and calculate cosine similarity to find similar sentences. We assess the system's performance through precision and recall analysis and conduct statistical comparisons of graph embeddings, with Node2Vec outperforming DeepWalk in capturing similarities and connections. 
For edge prediction, logistic regression, focusing on pairs of neighbours that lack a direct connection, was employed to identify potential connections among nodes within the graph. Additionally, probabilistic edge predictions, threshold analysis, and the significance of individual nodes were discussed. Lastly, the advantages and limitations of using existing KGs (Wikidata and DBpedia) versus constructing new ones specifically for PEDs were investigated. It is evident that further research and data enrichment are necessary to address the scarcity of domain-specific information from existing sources. Knowledge graph Positive Energy Districts (PEDs) longest path Questions and Answers Community Detection Node Embedding t-SNE plots Edge Prediction Computer Sciences Datavetenskap (datalogi)
8	Stylometry: Quantifying Classic Literature For Authorship Attribution : - A Machine Learning Approach Yousif, Jacob, Scarano, Donato January 2024 Classic literature is rich, be it linguistically, historically, or culturally, making it valuable for future studies. Consequently, this project chose a set of 48 classic books for stylometric analysis, adopting an approach used by a related work to divide the books into text segments, quantify the resulting text segments, and analyze the books using the quantified values to understand their linguistic attributes. In addition, the project conducted classification tasks with two further objectives. First, the study used the quantified values of the text segments for classification tasks with advanced models like LightGBM and TabNet to assess the application of this approach in authorship attribution. Second, the study applied a state-of-the-art model, namely RoBERTa, to classification tasks on the segmented texts themselves to evaluate the performance of the model in authorship attribution. The results uncovered the characteristics of the books to a reasonable degree. Regarding the authorship attribution tasks, the results suggest that segmenting and quantifying text using stylometric analysis and supervised machine learning algorithms is practical in such tasks. This approach, while showing promise, may still require further improvements to achieve optimal performance. Lastly, RoBERTa demonstrated high performance in authorship attribution tasks. 
Authorship Attribution Classic Literature Analysis Clustering Data Science Deep Learning Feature Engineering Feature Extraction Gradient Descent K-Means LightGBM Machine Learning Multiclass Classification NLP Neural Network RoBERTa Stylometric Analysis Stylometry TabNet t-SNE Text Mining Transformer Models Computer Sciences Datavetenskap (datalogi) Computer and Information Sciences Data- och informationsvetenskap
