• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 72
  • 35
  • 12
  • 9
  • 5
  • 3
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 156
  • 58
  • 37
  • 33
  • 30
  • 30
  • 27
  • 27
  • 27
  • 23
  • 19
  • 19
  • 18
  • 17
  • 15
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Získávání znalostí z webových logů / Knowledge Discovery from Web Logs

Vlk, Vladimír January 2013 (has links)
This master's thesis deals with creating of an application, goal of which is to perform data preprocessing of web logs and finding association rules in them. The first part deals with the concept of Web mining. The second part is devoted to Web usage mining and notions related to it. The third part deals with design of the application. The forth section is devoted to describing the implementation of the application. The last section deals with experimentation with the application and results interpretation.
92

Obrazová analýza mitotických chromosomů / Digital image analysis of mitotic chromosomes

Jaroš, Luboš January 2017 (has links)
The development in modern medicine has allowed to study human genome and detect predispositions to several diseases. One of very promising techniques is the analysis of human karyotype, i.e., the number and appearance of chromosomes in the cell nucleus. The most important step in the karyotype analysis is the chromosome detection and categorization. In this work, a new algorithm for detection of chromosomes from an image of microscopic DNA sample and their categorization into seven groups was developed. The algorithm was implemented in Matlab. The accuracy of segmentation and classification was tested on a set of images from two databases with 117 and 38 images, respectively. The sensitivity of the developed segmentation reached 88% while the value of positive predictivity of segmentation reached 92%. The success rate of chromosome pairing achieves 77%.
93

Toward an on-line preprocessor for Swedish / Mot en on-line preprocessor för svenska

Wemmert, Oscar January 2017 (has links)
This bachelor thesis presents OPT (Open Parse Tool), a java program allowing for independent parsers/taggers to be run in sequence. For this thesis the existing java versions of Stagger and Maltparser has been adapted for use as modules in this program, and OPT's performance has then been compared to an existing, in use, alternative (Språkbanken's Korp Corpus Pipeline, henceforth KCP). Execution speed has been compared, and OPT's accuracy has been coarsly tested as either comparable or divergent to that of KCP. The same collection of documents containing natural text has been fed through OPT and KCP in sequence, and execution time was recorded. The tagged output of OPT and KCP was then run through SCREAM (Sjöholm, 2012) and if SCREAM produced comparable results between the two, the accuracy of OPT was considered as comparable to KCP. The results show that OPT completes its tagging and parsing of the documents in around 35 minutes, while KCP took over four hours to complete. SCREAM performed almost exactly the same using the outputs of either program, except for one case in which OPT's output gave better results than KCP's. The accuracy of OPT was thus considered comparable to KCP. The one divergent example can not fully be understood or explained in this thesis, given that the thesis considers SCREAM's internals as mostly that of a black box.
94

Preprocessing and analysis of environmental data : Application to the water quality assessment of Mexican rivers / Pré-traitement et analyse des données environnementales : application à l'évaluation de la qualité de l'eau des rivières mexicaines

Serrano Balderas, Eva Carmina 31 January 2017 (has links)
Les données acquises lors des surveillances environnementales peuvent être sujettes à différents types d'anomalies (i.e., données incomplètes, inconsistantes, inexactes ou aberrantes). Ces anomalies qui entachent la qualité des données environnementales peuvent avoir de graves conséquences lors de l'interprétation des résultats et l’évaluation des écosystèmes. Le choix des méthodes de prétraitement des données est alors crucial pour la validité des résultats d'analyses statistiques et il est assez mal défini. Pour étudier cette question, la thèse s'est concentrée sur l’acquisition des données et sur les protocoles de prétraitement des données afin de garantir la validité des résultats d'analyse des données, notamment dans le but de recommander la séquence de tâches de prétraitement la plus adaptée. Nous proposons de maîtriser l'intégralité du processus de production des données, de leur collecte sur le terrain et à leur analyse, et dans le cas de l'évaluation de la qualité de l'eau, il s’agit des étapes d'analyse chimique et hydrobiologique des échantillons produisant ainsi les données qui ont été par la suite analysées par un ensemble de méthodes statistiques et de fouille de données. En particulier, les contributions multidisciplinaires de la thèse sont : (1) en chimie de l'eau: une procédure méthodologique permettant de déterminer les quantités de pesticides organochlorés dans des échantillons d'eau collectés sur le terrain en utilisant les techniques SPE–GC-ECD (Solid Phase Extraction - Gas Chromatography - Electron Capture Detector) ; (2) en hydrobiologie : une procédure méthodologique pour évaluer la qualité de l’eau dans quatre rivières Mexicaines en utilisant des indicateurs biologiques basés sur des macroinvertébrés ; (3) en science des données : une méthode pour évaluer et guider le choix des procédures de prétraitement des données produites lors des deux précédentes étapes ainsi que leur analyse ; et enfin, (4) le développement d’un environnement analytique intégré sous la forme d’une application développée en R pour l’analyse statistique des données environnementales en général et l’analyse de la qualité de l’eau en particulier. Enfin, nous avons appliqué nos propositions sur le cas spécifique de l’évaluation de la qualité de l’eau des rivières Mexicaines Tula, Tamazula, Humaya et Culiacan dans le cadre de cette thèse qui a été menée en partie au Mexique et en France. / Data obtained from environmental surveys may be prone to have different anomalies (i.e., incomplete, inconsistent, inaccurate or outlying data). These anomalies affect the quality of environmental data and can have considerable consequences when assessing environmental ecosystems. Selection of data preprocessing procedures is crucial to validate the results of statistical analysis however, such selection is badly defined. To address this question, the thesis focused on data acquisition and data preprocessing protocols in order to ensure the validity of the results of data analysis mainly, to recommend the most suitable sequence of preprocessing tasks. We propose to control every step in the data production process, from their collection on the field to their analysis. In the case of water quality assessment, it comes to the steps of chemical and hydrobiological analysis of samples producing data that were subsequently analyzed by a set of statistical and data mining methods. The multidisciplinary contributions of the thesis are: (1) in environmental chemistry: a methodological procedure to determine the content of organochlorine pesticides in water samples using the SPE-GC-ECD (Solid Phase Extraction – Gas Chromatography – Electron Capture Detector) techniques; (2) in hydrobiology: a methodological procedure to assess the quality of water on four Mexican rivers using macroinvertebrates-based biological indices; (3) in data sciences: a method to assess and guide on the selection of preprocessing procedures for data produced from the two previous steps as well as their analysis; and (4) the development of a fully integrated analytics environment in R for statistical analysis of environmental data in general, and for water quality data analytics, in particular. Finally, within the context of this thesis that was developed between Mexico and France, we have applied our methodological approaches on the specific case of water quality assessment of the Mexican rivers Tula, Tamazula, Humaya and Culiacan.
95

Získávání znalostí na webu - shlukování / Web Mining - Clustering

Rychnovský, Martin January 2008 (has links)
This work presents the topic of data mining on the web. It is focused on clustering. The aim of this project was to study the field of clustering and to implement clustering through the k-means algorithm. Then, the algorithm was tested on a dataset of text documents and on data extracted from web. This clustering method was implemented by means of Java technologies.
96

Metody pro získávání asociačních pravidel z dat / Methods for Mining Association Rules from Data

Uhlíř, Martin January 2007 (has links)
The aim of this thesis is to implement Multipass-Apriori method for mining association rules from text data. After the introduction to the field of knowledge discovery, the specific aspects of text mining are mentioned. In the mining process, preprocessing is a very important problem, use of stemming and stop words dictionary is necessary in this case. Next part of thesis deals with meaning, usage and generating of association rules. The main part is focused on the description of Multipass-Apriori method, which was implemented. On the ground of executed tests the most optimal way of dividing partitions was set and also the best way of sorting the itemsets. As a part of testing, Multipass-Apriori method was compared with Apriori method.
97

Entwicklung und Implementierung einer Finite-Elemente-Software für mobile Endgeräte

Goller, Daniel, Glenk, Christian, Rieg, Frank 30 June 2015 (has links)
In dem Vortrag wird die Entwicklung einer Finiten-Elemente-App für Android dargelegt, sowie die Vorteile im Postprozessing von einfachen Strukturen bei der Verwendung der Gestensteuerung erörtert.
98

Pattern analysis of the user behaviour in a mobile application using unsupervised machine learning / Mönsteranalys av användarbeteenden i en mobilapp med hjälp av oövervakad maskininlärning

Hrstic, Dusan Viktor January 2019 (has links)
Continuously increasing amount of logged data increases the possibilities of finding new discoveries about the user interaction with the application for which the data is logged. Traces from the data may reveal some specific user behavioural patterns which can discover how to improve the development of the application by showing the ways in which the application is utilized. In this thesis, unsupervised machine learning techniques are used in order to group the users depending on their utilization of SEB Privat Android mobile application. The user interactions in the applications are first extracted, then various data preprocessing techniques are implemented to prepare the data for clustering and finally two clustering algorithms, namely, HDBSCAN and KMedoids are performed to cluster the data. Three types of user behaviour have been found from both K-medoids and HDBSCAN algorithm. There are users that tend to interact more with the application and navigate through its deeper layers, then the ones that consider only a quick check of their account balance or transaction, and finally regular users. Among the resulting features chosen with the help of feature selection methods, 73 % of them are related to user behaviour. The findings can be used by the developers to improve the user interface and overall functionalities of application. The user flow can thus be optimized according to the patterns in which the users tend to navigate through the application. / En ständigt växande datamängd ökar möjligheterna att hitta nya upptäckter om användningen av en mobil applikation för vilken data är loggad. Spår som visas i data kan avslöja vissa specifika användarbeteenden som kan förbättra applikationens utveckling genom att antyda hur applikationen används. I detta examensarbete används oövervakade maskininlärningstekniker för att gruppera användarna beroende på deras bruk av SEB Privat Android mobilapplikation. Användarinteraktionerna i applikationen extraheras ut först, sedan används olika databearbetningstekniker för att förbereda data för klustringen och slutligen utförs två klustringsalgoritmer, nämligen HDBSCAN och Kmedoids för att gruppera data. Tre distinkta typer av användarbeteende har hittats från både K-medoids och HDBSCAN-algoritmen. Det finns användare som har en tendens att interagera mer med applikationen och navigera genom sitt djupare lager, sedan finns det de som endast snabbt kollar på deras kontosaldo eller transaktioner och till slut finns det vanliga användare. Bland de resulterande attributen som hade valts med hjälp av teknikerna för val av attribut, är 73% av dem relaterade till användarbeteendet. Det som upptäcktes i denna avhandling kan användas för att utvecklarna ska kunna förbättra användargränssnittet och övergripande funktioner i applikationen. Användarflödet kan därmed optimeras med hänsyn till de sätt enligt vilka användarna har en speciell tendens att navigera genom applikationen.
99

Förbehandling och Hantering av Användarmärkningar på E-handelsartiklar / Preprocessing and Treatment of User Tags on E-commerce Articles

Johansson, Viktor January 2023 (has links)
Plick is an online platform with the intention of being a marketplace where users may buy and sell second-hand fashion. The platform caters to younger users, and as such borrows many ideas from well-known social network platforms - such as putting more focus on user profiles and expression, rather than just the products themselves. One of these ideas is to allow users free reign over tagging their items, rather than having them select from some constrained, pre-approved, group of categories, styles, sizes - et cetera. A problem of letting users tag products however they see fit is that a subset of users will inevitably try to 'game' the system by knowingly tagging their products using incorrect labels - resulting in inaccurate search results for many of these incorrect tags.The aim of this project is to firstly develop a pre-processing algorithm to normalize the user generated tagging data - to handle situations such as a tag having multiple different (albeit possibly all correct) spellings, capitalizations, typos, languages etc. The processed data will then be used to develop two different approaches to solve the problem of incorrect tagging. The first approach involves using the normalized data to create a graph representation of the tags and their relations to each other. Each node in the graph will represent an individual tag, and each edge between nodes will explain how closely related those two tags are. An algorithm will then be developed to, utilizing the tag relation graph, describe the relatedness of an arbitrary group of tags. The algorithm should also be able to identify any tags that are outliers among the group. The second approach entails the development of a gaussian naive bayes classifier, with the goal of identifying whether an article is anomalistic or not - given the group of tags it's been assigned.
100

T-Distributed Stochastic Neighbor Embedding Data Preprocessing Impact on Image Classification using Deep Convolutional Neural Networks

Droh, Erik January 2018 (has links)
Image classification in Machine Learning encompasses the task of identification of objects in an image. The technique has applications in various areas such as e-commerce, social media and security surveillance. In this report the author explores the impact of using t-Distributed Stochastic Neighbor Embedding (t-SNE) on data as a preprocessing step when classifying multiple classes of clothing with a state-of-the-art Deep Convolutional Neural Network (DCNN). The t-SNE algorithm uses dimensionality reduction and groups similar objects close to each other in three-dimensional space. Extracting this information in the form of a positional coordinate gives us a new parameter which could help with the classification process since the features it uses can be different from that of the DCNN. Therefore, three slightly different DCNN models receives different input and are compared. The first benchmark model only receives pixel values, the second and third receive pixel values together with the positional coordinates from the t-SNE preprocessing for each data point, but with different hyperparameter values in the preprocessing step. The Fashion-MNIST dataset used contains 10 different clothing classes which are normalized and gray-scaled for easeof-use. The dataset contains 70.000 images in total. Results show minimum change in classification accuracy in the case of using a low-density map with higher learning rate as the data size increases, while a more dense map and lower learning rate performs a significant increase in accuracy of 4.4% when using a small data set. This is evidence for the fact that the method can be used to boost results when data is limited. / Bildklassificering i maskinlärning innefattar uppgiften att identifiera objekt i en bild. Tekniken har applikationer inom olika områden så som e-handel, sociala medier och säkerhetsövervakning. I denna rapport undersöker författaren effekten av att användat-Distributed Stochastic Neighbour Embedding (t-SNE) på data som ett förbehandlingssteg vid klassificering av flera klasser av kläder med ett state-of-the-art Deep Convolutio-nal Neural Network (DCNN). t-SNE-algoritmen använder dimensioneringsreduktion och grupperar liknande objekt nära varandra i tredimensionellt utrymme. Att extrahera denna information i form av en positionskoordinat ger oss en ny parameter som kan hjälpa till med klassificeringsprocessen eftersom funktionerna som den använder kan skilja sig från DCNN-modelen. Tre olika DCNN-modeller får olika in-data och jämförs därefter. Den första referensmodellen mottar endast pixelvärden, det andra och det tredje motar pixelvärden tillsammans med positionskoordinaterna från t-SNE-förbehandlingen för varje datapunkt men med olika hyperparametervärden i förbehandlingssteget. I studien används Fashion-MNIST datasetet som innehåller 10 olika klädklasser som är normaliserade och gråskalade för enkel användning. Datasetet innehåller totalt 70.000 bilder. Resultaten visar minst förändring i klassificeringsnoggrannheten vid användning av en låg densitets karta med högre inlärningsgrad allt eftersom datastorleken ökar, medan en mer tät karta och lägre inlärningsgrad uppnår en signifikant ökad noggrannhet på 4.4% när man använder en liten datamängd. Detta är bevis på att metoden kan användas för att öka klassificeringsresultaten när datamängden är begränsad.

Page generated in 0.0529 seconds