  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

ANFIS BASED MODELS FOR ACCESSING QUALITY OF WIKIPEDIA ARTICLES

Ullah, Noor January 2010 (has links)
Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Because Wikipedia is free and open for anyone to edit, article quality can suffer: contributors differ in their level of knowledge and in their opinions on a topic, so the contributions made by different authors can vary considerably. It is therefore important to classify articles so that good-quality articles can be separated from poor-quality ones, which can then be removed from the database. The aim of this study is to classify Wikipedia articles into two classes, class 0 (poor quality) and class 1 (good quality), using the Adaptive Neuro-Fuzzy Inference System (ANFIS) and data mining techniques. Two ANFIS models were built using the Fuzzy Logic Toolbox [1] available in Matlab: the first is based on rules obtained from the J48 classifier in WEKA, while the second was built using expert knowledge. The data used in this work consists of 226 article records taken from the German version of Wikipedia; the dataset has 19 inputs and one output, and was preprocessed to remove redundant attributes. The input variables relate to the editors, contributors, article length, and article lifecycle. Finally, the methods implemented in this study are analyzed to compare the performance of each classification method used.
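As a rough, hypothetical illustration of the rule-extraction step this abstract describes, the sketch below uses scikit-learn's DecisionTreeClassifier (CART, a relative of WEKA's C4.5/J48) on synthetic article features. The feature names, data, and labelling rule are invented for illustration and are not taken from the thesis:

```python
# Sketch only: scikit-learn's DecisionTreeClassifier implements CART rather
# than WEKA's J48 (C4.5), and the features below are synthetic stand-ins for
# the editor/length/lifecycle attributes described in the abstract.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 226  # same number of articles as the thesis dataset; the data is invented
X = rng.random((n, 3))  # hypothetical [num_editors, article_length, edits_per_month], scaled
# Invented labelling rule: "good" articles tend to be long with many editors.
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(int)  # 1 = good quality, 0 = poor

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
rules = export_text(
    tree, feature_names=["num_editors", "article_length", "edits_per_month"]
)
print(rules)  # human-readable if/else rules, analogous in spirit to J48 output
print(tree.score(X, y))
```

Rules printed this way could then seed a fuzzy rule base, which is roughly the role J48 plays in the first ANFIS above.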
22

Ανάπτυξη μεθόδων αυτόματης αναγνώρισης του φύλου χρηστών σε κείμενα του Παγκοσμίου ιστού

Μαλαγκονιάρη, Διονυσία 15 December 2014 (has links)
More and more people use the World Wide Web every day to carry out the wide range of activities it offers. The number of Internet users grows continuously, as does the variety of activities that can be performed through web pages. In recent years, beyond being a source of information, the Web has also become an important medium of personal expression and of communication between people. Millions of web users interact daily through Internet applications: each user can freely express an opinion on issues that concern them, comment on the opinions of other users, and communicate with them, choosing among many available channels such as blogs, forums, websites, and social media. The collection, analysis, and evaluation of user-generated data from the Web is of considerable research interest. Of particular interest is correlating a user with the text they have produced and recognizing some of their social characteristics, for example whether the author of a given text is a man or a woman. Such recognition is possible by identifying representative features of male or female writing and speech in users' textual data. The study of the characteristics of user-generated content is therefore a key point in a number of research fields; a characteristic example is work in text mining that relies on user content to extract opinions about a topic or a product.
A result of this ever-growing user activity is the continuous increase in the volume of User Generated Content (UGC) on the Web. The field of UGC ([1], [2]) deals with the recognition and extraction of web content produced by users, and studies in this area are useful for developing both research and commercial applications. The goal of this thesis is to study text data drawn from the Web, focusing on distinguishing features which will then support the implementation of a system that identifies, with the highest possible accuracy, the gender of the user who produced a given text. Such an effort is interesting and important, as it contributes to research in this relatively new field ([3], [4]). Recognizing a user's gender from textual data alone, without in any way compromising their anonymity or personal data, can be a valuable tool with many applications; one important application is targeted advertising. The thesis proceeds in the following steps. First, the relevant literature is studied to provide the necessary theoretical background on the scientific fields involved, as well as existing methodologies and techniques. Next, the data to be used in the work is collected. From the collected text data and the literature review, features are identified and extracted that help detect female or male discourse in a test dataset. The next step is the development of metrics for categorizing a user's texts according to gender. Finally, this categorization effort is evaluated in order to implement a suitable system for recognizing user gender in Web texts.
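A minimal sketch of the kind of feature extraction described above — counting simple stylometric cues in a text. The specific features and the example sentence are invented for illustration; the thesis's actual feature set would come from its corpus and literature review:

```python
# Toy stylometric features of the sort used in gender-attribution studies.
# The exact feature set here is an assumption, not the thesis's.
import re

def stylometric_features(text: str) -> dict:
    words = re.findall(r"[a-zA-Z']+", text.lower())
    pronouns = {"i", "me", "my", "we", "our", "you", "he", "she"}
    n = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "num_words": len(words),
        "avg_word_len": sum(map(len, words)) / n,
        "pronoun_rate": sum(w in pronouns for w in words) / n,
        "exclamations": text.count("!"),
    }

feats = stylometric_features("I really loved this! My friends agree.")
print(feats)
```

Feature vectors like this one would then be fed to a classifier trained on texts of known authorship.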
23

Integrace data miningových nástrojů do prostředí MS Visual Studio

Dvořan, Jan January 2014 (has links)
This work contains the design and implementation of importing the data mining tool Weka into Visual Studio and Microsoft SQL Server 2012 as a Managed Plug-in algorithm. The thesis describes how to create a new Managed Plug-in algorithm and how to import the Weka tool into it. To use Weka in the new Managed Plug-in algorithm, the IKVM port is used: IKVM can generate a C# library from the Weka tool, which can then be used by the Managed Plug-in algorithm.
24

Převod vybraných algoritmů data-mining z jazyka Java do binární (.exe) formy

Šrom, Jakub January 2015 (has links)
There are many successful data-mining systems (e.g. WEKA, RapidMiner) whose algorithms are implemented in Java, which allows them to be used on different operating systems. The disadvantage of interpreted code is slower computation and limited control over memory usage. This thesis focuses on converting several selected Java implementations of algorithms into binary (.exe) form by porting the source code to C++ under MS Windows 7 64-bit. The aim is to speed up the calculations and improve memory management; the binary form must give results identical to the original. Besides the conversion itself, the thesis also compares, on selected test data, the time and memory requirements of the original interpreted Java implementation (using the 64-bit Java Runtime Environment, JRE) and the resulting x64 binaries.
25

Klassificering av svenska nyhetsartiklar med hjälp av Support Vector Machines

Blomberg, Jossefin, Jansson Martén, Felicia January 2018 (has links)
The aim of this thesis is to help reduce the reach of influence campaigns by using the Support Vector Machine (SVM) machine learning model. The work comprises a literature study and two experiments. The literature study provides a frame of reference for text classification with Support Vector Machines. The first experiment involved training an SVM to classify Swedish news articles by reliability. The second experiment compared the trained SVM model with other standard text-classification methods. The results indicate that SVM is an effective tool for classifying Swedish news articles, but also that several other models are suitable for the same task.
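A hedged sketch of the text-classification pipeline discussed above, using scikit-learn rather than the thesis's actual tooling. The four toy "articles" and their reliable/unreliable labels are invented for illustration:

```python
# Sketch only: TF-IDF features plus a linear SVM, the standard baseline for
# the kind of reliability classification the abstract describes. The dataset
# is a fabricated four-document toy, not the thesis's Swedish news corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "The government confirmed the figures in an official report.",
    "SHOCKING secret they do not want you to know, share now!",
    "The central bank published its quarterly statistics today.",
    "You will not believe this one weird miracle cure!",
]
labels = [1, 0, 1, 0]  # 1 = reliable, 0 = unreliable (toy labels)

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
pred = model.predict(["The ministry released new employment statistics."])
print(pred)
```

For Swedish text, the vectorizer would additionally need language-appropriate tokenization and stop words.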
26

Porovnání výpočetní složitosti vybraných algoritmů pro dolování znalosti z dat

Matzke, Miroslav January 2018 (has links)
Matzke, M. Comparison of Computational Complexity of Selected Data Mining Algorithms, Diploma Thesis. Brno, 2018. This diploma thesis compares the time complexity and classification accuracy of selected data-mining algorithms, with a focus on neural networks and the optimal settings for their execution. The theoretical part surveys the categories of algorithms, their operation, and their complexity. This is followed by the selection of algorithms, focusing on neural networks and their settings, in particular hidden layers, momentum, and learning rate. The next part describes the data used for the experimental tests, which include both nominal and numerical attributes and both real and generated datasets, as well as the measurement accuracy and the performance of the two machines used to run the experiments. The third part tests the time complexity and percentage accuracy of the algorithms, presents the results mainly in graphical form, and closes with an analysis and recommendations on optimal settings compared with the automatic and default settings.
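The kind of experiment described above — timing neural-network training under different hidden-layer, learning-rate, and momentum settings — can be sketched as follows. The synthetic dataset and the two configurations are invented for illustration; the thesis's actual datasets and settings differ:

```python
# Sketch only: timing MLP training under two invented configurations,
# varying the hidden layers, learning rate, and momentum named above.
import time

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

configs = {
    "small": dict(hidden_layer_sizes=(5,), learning_rate_init=0.01, momentum=0.9),
    "large": dict(hidden_layer_sizes=(50, 50), learning_rate_init=0.001, momentum=0.5),
}
results = {}
for name, params in configs.items():
    clf = MLPClassifier(solver="sgd", max_iter=200, random_state=0, **params)
    t0 = time.perf_counter()
    clf.fit(X, y)
    results[name] = (time.perf_counter() - t0, clf.score(X, y))

for name, (secs, acc) in results.items():
    print(f"{name}: {secs:.3f}s, train accuracy {acc:.2f}")
```

Repeating such runs over real and generated data, and tabulating time against accuracy, is essentially the experimental protocol the abstract outlines.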
27

Ταξινόμηση καρκινικών όγκων εγκεφάλου με χρήση μεθόδων μηχανικής μάθησης

Κανάς, Βασίλειος 29 August 2011 (has links)
The objective of this study is to investigate the use of pattern classification methods for distinguishing different types of brain tumors, such as primary gliomas from metastases, and for grading gliomas, using magnetic resonance imaging (MRI) data. A computer-assisted classification method combining conventional MRI and perfusion MRI is developed and used for differential diagnosis. Accurate determination of brain tumor type and grade is very important because it guides the patient's treatment planning. The proposed scheme consists of several steps, including definition of regions of interest (ROIs), feature extraction, feature selection, and classification; this work focuses on the last two steps in order to give an overview of the effect of each method on the classification of the different tumor types. The extracted features include intensity and shape characteristics from conventional MRI sequences (T2, contrast-enhanced T1, FLAIR, T1) as well as non-conventional ones (perfusion MRI). Feature subset selection is performed using two filtering methods, correlation-based feature selection (CFS) and the consistency method, and a wrapper approach, in combination with three search algorithms (best first, greedy stepwise, and scatter), implemented with the Waikato Environment for Knowledge Analysis (WEKA) software [20]. The methods were applied to 101 patients with brain tumors diagnosed as metastasis (24), meningioma (4), grade 2 glioma (22), grade 3 glioma (17), or grade 4 glioma (34). The highest binary classification accuracy, assessed by leave-one-out (LOO) cross-validation on 102 brain tumors, is 94.1% for discriminating metastases from gliomas and 91.3% for discriminating high-grade from low-grade neoplasms. Multi-class classification is also performed, achieving 76.29% accuracy.
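The evaluation strategy described above — feature selection nested inside leave-one-out cross-validation — can be sketched with scikit-learn. The synthetic data stands in for the MRI-derived intensity and shape features, and univariate selection with a linear SVM is used here as a simple stand-in for the CFS/consistency/wrapper methods the thesis actually evaluates:

```python
# Sketch only: SelectKBest + linear SVM evaluated with leave-one-out CV.
# Putting the selector inside the pipeline means it is re-fit on each LOO
# training split, avoiding selection bias. The data is synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=30, n_informative=5,
                           random_state=0)

model = make_pipeline(SelectKBest(f_classif, k=5), SVC(kernel="linear"))
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.3f} over {len(scores)} folds")
```

Each LOO fold holds out a single case, mirroring the per-patient validation reported in the abstract.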
28

A Machine Learning Approach to Dialogue Act Classification in Human-Robot Conversations : Evaluation of dialogue act classification with the robot Furhat and an analysis of the market for social robots used for education / Maskininlärning för klassificering av talhandlingar i människa-robot-konversationer

Olofsson, Nina, Fakih, Nivin January 2015 (has links)
The interest in social robots has grown dramatically in the last decade. Several studies have investigated the potential markets for such robots and how to enhance their human-like abilities. Both of these subjects are investigated in this thesis using the company Furhat Robotics, and their robot Furhat, as a case study. The thesis explores how machine learning can be used to classify dialogue acts in human-robot conversations, which could help Furhat interact in a more human-like way. Dialogue acts are acts of natural speech, such as questions or statements. Several variables and their impact on the classification of dialogue acts were tested; the results showed that a combination of some of these variables could classify 73% of all dialogue acts correctly. Furthermore, the thesis analyzes the market for social robots used in education, where human-like abilities are preferable. A literature study and an interview were conducted, and the market was then analyzed using a SWOT matrix and Porter's Five Forces. Although the study showed that this market could be a suitable target for Furhat Robotics, there are several threats and obstacles that should be taken into account before entering it.
29

Visual Analytics como ferramenta de auxílio ao processo de KDD : um estudo voltado ao pré-processamento

Cini, Glauber 29 March 2017 (has links)
Visual Analytics combines intelligent, automatic methods with human visual perception, aiming to extract knowledge from data sets. This visual capability is supported by interactive interfaces, the most important for this work being the Parallel Coordinates visualization. However, tools that offer both the automatic methods (KDD) and the visual ones (Parallel Coordinates) in a generic, integrated way are essential. This work therefore presents a model that integrates the KDD process with Information Visualization using Parallel Coordinates, with an emphasis on making sense of the data by broadening the possibilities for exploring it already in the preprocessing stage. To demonstrate the model, a plugin was developed for the WEKA tool. This module extends the functionality of the chosen tool to the point where it can be characterized as a Visual Analytics tool. Alongside the Parallel Coordinates visualization, it supports interaction by permuting the dimensions (axes), interaction by selecting samples (brushing), and detailed inspection of samples within the visualization itself.
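The Parallel Coordinates view central to the model above can be sketched with pandas and matplotlib, rather than the WEKA plugin the thesis implements. The small Iris-like table is invented for illustration:

```python
# Sketch only: pandas' built-in parallel_coordinates draws one polyline per
# row across one vertical axis per numeric column, colored by class —
# the same idiom the thesis's WEKA plugin provides interactively.
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
from pandas.plotting import parallel_coordinates

df = pd.DataFrame({
    "sepal_len": [5.1, 7.0, 6.3, 4.9, 6.4],
    "petal_len": [1.4, 4.7, 6.0, 1.5, 4.5],
    "petal_wid": [0.2, 1.4, 2.5, 0.1, 1.5],
    "species":   ["a", "b", "c", "a", "b"],
})
ax = parallel_coordinates(df, "species")  # class column drives line color
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")  # render to memory instead of a file
print(f"rendered {len(buf.getvalue())} bytes")
```

A static plot like this lacks the brushing and axis-permutation interactions the thesis adds; those are precisely what makes the WEKA module a Visual Analytics tool.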
30

Classification Performance Between Machine Learning and Traditional Programming in Java

Alassadi, Abdulrahman, Ivanauskas, Tadas January 2019 (has links)
This study proposes a performance comparison between two Java applications built with two different programming approaches: machine learning and traditional programming. A case where both approaches can be applied is a classification problem with numeric values; the data is a heart disease dataset, since heart disease is the leading cause of death in the USA. The performance analysis of both applications covers four main points: the development time of each application, the code complexity and time complexity of the implemented algorithms, the classification accuracy, and the resource consumption of each application. The machine learning Java application is built with the WEKA library, using its NaiveBayes class to build the model and evaluate its accuracy, while the traditional programming Java application is built with the help of a cardiologist, as a domain expert, to identify the indicative injury values. The findings are that the traditional programming application scored better in development time, code complexity, and resource consumption, achieving a classification accuracy of 80.2%, while the Naive Bayes algorithm in the machine learning application achieved an accuracy of 85.51%, at the expense of higher resource consumption and execution time.
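The machine-learning side of the comparison above can be sketched with scikit-learn's GaussianNB standing in for WEKA's NaiveBayes class. The synthetic numeric features and labels are invented and are not the thesis's heart disease dataset:

```python
# Sketch only: Gaussian Naive Bayes on synthetic numeric features, as a
# stand-in for the WEKA NaiveBayes model described in the abstract.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for numeric clinical attributes (e.g. age, blood
# pressure, cholesterol — hypothetical here).
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = GaussianNB().fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

The traditional-programming counterpart would instead be a hand-written cascade of threshold checks encoding the cardiologist's rules.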
