Global ETD Search

91	Metody stemmingu používané při dolování textu / Stemming Methods Used in Text Mining Adámek, Tomáš January 2010 The main theme of this master's thesis is text mining, with a focus on English texts and their automatic preprocessing. The main part of the thesis analyses three stemming algorithms (Lovins, Porter and Paice/Husk). Stemming is a procedure that automatically conflates semantically related terms by means of rule sets. The next part describes the design of an application for running the various stemming algorithms. The application is built on the Java platform, using the Swing graphics library and the MVC architecture. The following chapter describes the implementation of the application and of the stemming algorithms. The last part describes experiments with the stemming algorithms and compares them in terms of their effect on text classification results.
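As a rough illustration of what "conflating related terms via rule sets" means, the sketch below strips suffixes using a tiny rule list. The rules, the `MIN_STEM_LENGTH` guard and the function name `stem` are hypothetical choices made for this example; they are not the Lovins, Porter or Paice/Husk rule sets analysed in the thesis, which are considerably larger and apply conditions to the remaining stem.

```python
# Toy rule-based suffix stripper. The rule list is hypothetical and is NOT
# the Lovins, Porter or Paice/Husk rule set.
# Rules are tried in list order, which is sorted longest suffix first.
RULES = [
    ("ing", ""),
    ("ion", ""),
    ("ed", ""),
    ("s", ""),
]

MIN_STEM_LENGTH = 3  # assumed guard against over-stemming short words


def stem(word: str) -> str:
    """Apply the first suffix rule that leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    # No rule applied: the word is returned unchanged.
    return word


if __name__ == "__main__":
    for w in ["connected", "connecting", "connection", "connects", "sing"]:
        print(w, "->", stem(w))
```

With these rules the four variants of "connect" are mapped to one stem, while the length guard leaves a short word such as "sing" untouched.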
92	Získávání znalostí z webových logů / Knowledge Discovery from Web Logs Vlk, Vladimír January 2013 This master's thesis deals with the creation of an application whose goal is to preprocess web log data and find association rules in it. The first part introduces the concept of Web mining. The second part is devoted to Web usage mining and related notions. The third part covers the design of the application. The fourth part describes its implementation. The last part covers experiments with the application and the interpretation of the results.
93	Obrazová analýza mitotických chromosomů / Digital image analysis of mitotic chromosomes Jaroš, Luboš January 2017 Advances in modern medicine have made it possible to study the human genome and detect predispositions to several diseases. One very promising technique is the analysis of the human karyotype, i.e., the number and appearance of chromosomes in the cell nucleus. The most important step in karyotype analysis is chromosome detection and categorization. In this work, a new algorithm was developed for detecting chromosomes in a microscopic image of a DNA sample and categorizing them into seven groups. The algorithm was implemented in Matlab. The accuracy of segmentation and classification was tested on images from two databases with 117 and 38 images, respectively. The sensitivity of the developed segmentation reached 88%, while its positive predictivity reached 92%. The success rate of chromosome pairing reached 77%.
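For readers unfamiliar with the two segmentation metrics, the sketch below uses their standard definitions: sensitivity is TP / (TP + FN) and positive predictivity is TP / (TP + FP). The thesis was implemented in Matlab; Python is used here only for illustration, and the counts in the example are made up and are not the thesis's data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real objects that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def positive_predictivity(true_pos: int, false_pos: int) -> float:
    """Fraction of detections that are real objects: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    tp, fn, fp = 45, 5, 15
    print(f"sensitivity           = {sensitivity(tp, fn):.2f}")
    print(f"positive predictivity = {positive_predictivity(tp, fp):.2f}")
```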
94	Toward an on-line preprocessor for Swedish / Mot en on-line preprocessor för svenska Wemmert, Oscar January 2017 This bachelor thesis presents OPT (Open Parse Tool), a Java program that allows independent parsers and taggers to be run in sequence. For this thesis, the existing Java versions of Stagger and MaltParser were adapted for use as modules in this program, and OPT's performance was then compared to an existing alternative already in use (Språkbanken's Korp Corpus Pipeline, henceforth KCP). Execution speed was compared, and OPT's accuracy was coarsely tested as either comparable to or divergent from that of KCP. The same collection of natural-text documents was fed through OPT and through KCP in turn, and the execution time was recorded. The tagged output of OPT and KCP was then run through SCREAM (Sjöholm, 2012); if SCREAM produced comparable results for the two, the accuracy of OPT was considered comparable to that of KCP. The results show that OPT completed its tagging and parsing of the documents in around 35 minutes, while KCP took over four hours. SCREAM performed almost exactly the same on the outputs of either program, except for one case in which OPT's output gave better results than KCP's. The accuracy of OPT was thus considered comparable to that of KCP. The one divergent example cannot be fully understood or explained in this thesis, since the thesis treats SCREAM's internals largely as a black box. Natural Language Preprocessing Part-of-Speech-Tagging Dependency Parsing Readability Human Computer Interaction
95	Preprocessing and analysis of environmental data : Application to the water quality assessment of Mexican rivers / Pré-traitement et analyse des données environnementales : application à l'évaluation de la qualité de l'eau des rivières mexicaines Serrano Balderas, Eva Carmina 31 January 2017 Data obtained from environmental surveys are prone to various anomalies (i.e., incomplete, inconsistent, inaccurate or outlying data). These anomalies degrade the quality of environmental data and can have considerable consequences for the interpretation of results and the assessment of ecosystems. The selection of data preprocessing procedures is crucial for the validity of statistical analysis results, yet it is poorly defined. To address this question, the thesis focused on data acquisition and data preprocessing protocols in order to ensure the validity of data analysis results and, in particular, to recommend the most suitable sequence of preprocessing tasks. We propose to control every step of the data production process, from collection in the field to analysis. In the case of water quality assessment, this covers the chemical and hydrobiological analysis of samples, which produced the data that were subsequently analyzed with a set of statistical and data mining methods. The multidisciplinary contributions of the thesis are: (1) in environmental chemistry: a methodological procedure to determine the content of organochlorine pesticides in water samples collected in the field, using SPE-GC-ECD (Solid Phase Extraction - Gas Chromatography - Electron Capture Detector) techniques; (2) in hydrobiology: a methodological procedure to assess water quality in four Mexican rivers using biological indices based on macroinvertebrates; (3) in data science: a method to assess and guide the selection of preprocessing procedures for the data produced in the two previous steps, as well as their analysis; and (4) the development of a fully integrated analytics environment, in the form of an application written in R, for the statistical analysis of environmental data in general and of water quality data in particular. Finally, within the context of this thesis, which was carried out partly in Mexico and partly in France, we applied our methodological approaches to the specific case of water quality assessment of the Mexican rivers Tula, Tamazula, Humaya and Culiacan. Environmental data Data analysis Data preprocessing Water pollution Water quality assessment
96	Získávání znalostí na webu - shlukování / Web Mining - Clustering Rychnovský, Martin January 2008 This work addresses data mining on the web, with a focus on clustering. The aim of the project was to study the field of clustering and to implement clustering with the k-means algorithm. The algorithm was then tested on a dataset of text documents and on data extracted from the web. The clustering method was implemented using Java technologies.
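The k-means procedure mentioned above alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster. The following is a minimal sketch of that loop (Lloyd's algorithm), not the thesis's Java implementation. The function name `kmeans`, the use of Euclidean distance and the explicit initial centroids are assumptions made for this example; for text documents, each point would typically be some numeric vector representation of a document.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update.

    points    -- list of equal-length numeric tuples
    centroids -- initial centroids (passed in so the run is deterministic)
    """
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, groups = kmeans(data, centroids=[(0, 0), (10, 10)])
    print(centres)
    print(groups)
```

Passing the initial centroids explicitly keeps the example reproducible; in practice they are often chosen at random, and the result can depend on that choice.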
97	Metody pro získávání asociačních pravidel z dat / Methods for Mining Association Rules from Data Uhlíř, Martin January 2007 The aim of this thesis is to implement the Multipass-Apriori method for mining association rules from text data. After an introduction to the field of knowledge discovery, the specific aspects of text mining are discussed. Preprocessing is a very important step in the mining process; in this case, stemming and a stop-word dictionary are necessary. The next part of the thesis deals with the meaning, usage and generation of association rules. The main part describes the Multipass-Apriori method, which was implemented. Based on the tests carried out, the best way of dividing the partitions and the best way of sorting the itemsets were determined. As part of the testing, the Multipass-Apriori method was compared with the Apriori method.
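The sketch below illustrates the baseline Apriori method that the thesis uses as a comparison point: frequent itemsets are found level by level, and a candidate is kept only if all of its smaller subsets are already frequent. It does not implement Multipass-Apriori and its partitioning. The function name `apriori`, the absolute `min_support` count and the sample transactions are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    min_support is an absolute count of transactions (an assumption here;
    it is often given as a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are all single items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


if __name__ == "__main__":
    docs = [
        {"text", "mining", "rules"},
        {"text", "mining"},
        {"text", "rules"},
        {"mining", "stemming"},
    ]
    for itemset, count in apriori(docs, min_support=2).items():
        print(sorted(itemset), count)
```

Association rules would then be derived from the frequent itemsets by comparing the support of an itemset with the support of its subsets; that step is omitted here.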
98	Entwicklung und Implementierung einer Finite-Elemente-Software für mobile Endgeräte Goller, Daniel, Glenk, Christian, Rieg, Frank 30 June 2015 The talk presents the development of a finite element app for Android and discusses the advantages of gesture control for the postprocessing of simple structures. ddc:629 Software; Mobile device
99	Pattern analysis of the user behaviour in a mobile application using unsupervised machine learning / Mönsteranalys av användarbeteenden i en mobilapp med hjälp av oövervakad maskininlärning Hrstic, Dusan Viktor January 2019 The continuously increasing amount of logged data increases the chances of making new discoveries about how users interact with the application for which the data is logged. Traces in the data may reveal specific user behavioural patterns, which can show how the application is utilized and thereby how its development can be improved. In this thesis, unsupervised machine learning techniques are used to group users according to their use of the SEB Privat Android mobile application. The user interactions in the application are first extracted, then various data preprocessing techniques are applied to prepare the data for clustering, and finally two clustering algorithms, HDBSCAN and K-medoids, are run to cluster the data. Both the K-medoids and the HDBSCAN algorithm found three types of user behaviour: users who tend to interact more with the application and navigate through its deeper layers, users who only quickly check their account balance or transactions, and regular users. Among the features chosen with the help of feature selection methods, 73% are related to user behaviour. The findings can be used by the developers to improve the user interface and overall functionality of the application. The user flow can thus be optimized according to the patterns in which users tend to navigate through the application. Clustering HDBSCAN K-medoids data preprocessing user behaviour mobile application Computer and Information Sciences
100	Förbehandling och Hantering av Användarmärkningar på E-handelsartiklar / Preprocessing and Treatment of User Tags on E-commerce Articles Johansson, Viktor January 2023 Plick is an online marketplace where users can buy and sell second-hand fashion. The platform caters to younger users and therefore borrows many ideas from well-known social network platforms, such as putting more focus on user profiles and expression rather than just on the products themselves. One of these ideas is to give users free rein over tagging their items, rather than having them select from a constrained, pre-approved set of categories, styles, sizes, et cetera. A problem with letting users tag products however they see fit is that a subset of users will inevitably try to 'game' the system by knowingly tagging their products with incorrect labels, resulting in inaccurate search results for many of these tags. The aim of this project is first to develop a preprocessing algorithm that normalizes the user-generated tagging data, handling situations such as a tag having multiple different (albeit possibly all correct) spellings, capitalizations, typos, languages, etc. The processed data is then used to develop two different approaches to the problem of incorrect tagging. The first approach uses the normalized data to create a graph representation of the tags and their relations to each other. Each node in the graph represents an individual tag, and each edge between two nodes describes how closely related those two tags are. An algorithm is then developed that uses the tag relation graph to describe the relatedness of an arbitrary group of tags. The algorithm should also be able to identify any tags that are outliers within the group. The second approach is the development of a Gaussian naive Bayes classifier, with the goal of identifying whether an article is anomalous or not, given the group of tags it has been assigned. e-commerce tags tagging preprocessing NLP graph theory AI Other Computer and Information Science
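To make the tag normalization step more concrete, the sketch below folds case, unifies Unicode forms and separators, and maps known spelling variants to one canonical tag. It is only an illustration of the general idea and not the thesis's algorithm; the function name `normalize_tag` and the `ALIASES` table are hypothetical, and handling of typos and multiple languages is left out.

```python
import re
import unicodedata

# Hypothetical alias table mapping known variants to one canonical tag.
ALIASES = {
    "tshirt": "t-shirt",
    "tee": "t-shirt",
}


def normalize_tag(tag: str) -> str:
    """Map different spellings and capitalizations of a tag to one form."""
    # Unify Unicode representations, fold case, trim surrounding spaces.
    tag = unicodedata.normalize("NFKC", tag).casefold().strip()
    # Drop a leading hashtag marker, if any.
    tag = tag.lstrip("#")
    # Collapse runs of whitespace, underscores and hyphens into one hyphen.
    tag = re.sub(r"[\s_-]+", "-", tag).strip("-")
    # Replace known variants with their canonical form.
    return ALIASES.get(tag, tag)


if __name__ == "__main__":
    for raw in ["  T-Shirt ", "#TShirt", "t_shirt", "Vintage  Denim"]:
        print(repr(raw), "->", normalize_tag(raw))
```

Normalizing before building a tag graph or training a classifier keeps variants of the same tag from being treated as unrelated labels.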
