Global ETD Search

31	Gestion de l’hétérogénéité d’un SI de classification documentaire multifacette et positionnement dans l’environnement des ECM. / Management of heterogeneity of a documentary multifaceted classification Information System and position in the ECM environment.
Ankoud, Manel. 19 December 2014.
Knowledge organization is a discipline pursued by librarians, documentalists, archivists, information specialists, computer scientists, and all other document professionals. It covers all the activities, studies, and research that develop and handle the processes for organizing and presenting the documentary resources an organization needs. In this context, the ANR Miipa-Doc project aims to explore new bottom-up indexing methods for organizing complex documentary content in large companies, in which descriptor terms are formulated by individuals rather than chosen from a predefined list, and to design the corresponding software architecture. Our contribution to this project is managing the heterogeneity of an information system (IS) for organizing documentary content that is based on a business-oriented approach and a faceted folksonomic KOS (knowledge organization system). To manage this heterogeneity, we propose an incremental, model-driven approach derived from MDE (model-driven engineering) and based on meta-models, in order to ensure evolvability. After implementing the HyperTaging prototype, which puts both approaches into practice, we propose an evaluation process, based on fine-grained and specific evaluation criteria, for positioning this prototype, and any documentary classification IS, within the ECM (enterprise content management) environment.
Subjects: Knowledge organization; 025.3
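The abstract describes bottom-up indexing in which people supply their own descriptor terms, grouped by facet, instead of picking them from a fixed list. The Python sketch below illustrates that idea in its simplest form only; it is not the HyperTaging prototype, and the class name, the facet names (`project`, `doc_type`), and the lowercase normalization rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class FacetedTagIndex:
    """Hypothetical minimal faceted folksonomy index.

    Users attach free-form terms to documents, but each term is filed under a
    facet, so searches can combine criteria across facets.
    """

    def __init__(self):
        # (facet, normalized term) -> set of document ids
        self._postings = defaultdict(set)

    @staticmethod
    def _normalize(term):
        # Assumed rule: ignore case and surrounding whitespace.
        return term.strip().lower()

    def tag(self, doc_id, facet, term):
        """Record that a user attached `term` under `facet` to `doc_id`."""
        self._postings[(facet, self._normalize(term))].add(doc_id)

    def terms(self, facet):
        """Return the vocabulary that has emerged for one facet, sorted."""
        return sorted(t for (f, t) in self._postings if f == facet)

    def search(self, **criteria):
        """Return sorted ids of documents matching every facet=term criterion."""
        result = None
        for facet, term in criteria.items():
            # .get() avoids creating empty postings for unknown terms.
            docs = self._postings.get((facet, self._normalize(term)), set())
            result = set(docs) if result is None else result & docs
        return sorted(result or set())


index = FacetedTagIndex()
index.tag("doc1", "project", "Miipa-Doc")
index.tag("doc1", "doc_type", "Specification")
index.tag("doc2", "project", "miipa-doc ")
index.tag("doc2", "doc_type", "Minutes")
print(index.search(project="MIIPA-DOC"))                      # ['doc1', 'doc2']
print(index.search(project="miipa-doc", doc_type="minutes"))  # ['doc2']
print(index.terms("doc_type"))                                # ['minutes', 'specification']
```

Keeping the facet in the index key is what lets the same free-form word mean different things under different facets, while still letting the vocabulary grow from users rather than from a predefined list.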
32	Maskininlärning för dokumentklassificering av finansiella dokument med fokus på fakturor / Machine Learning for Document Classification of Financial Documents with Focus on Invoices
Khalid Saeed, Nawar. January 2022.
Automated document classification is an essential technique for processing and managing documents in digital form. Many companies seek a text classification methodology that can solve a range of problems, one of which is classifying and organizing a large number of documents according to a set of predefined categories. This thesis aims to help Medius, a company that works with invoice workflows, to classify the documents processed in its invoice workflow as invoices or non-invoices. This was accomplished by implementing several machine learning classification methods and evaluating their accuracy and efficiency on financial document classification, where only invoices are of interest. The pre-processing steps needed to achieve good performance are also considered when evaluating these methods. Two document representation methods, term frequency-inverse document frequency (TF-IDF) and Doc2Vec, were used to represent the documents as fixed-length vectors; the representation aims to reduce the complexity of the documents and make them easier to handle. Three classification methods were then used to automate invoice classification: Logistic Regression, Multinomial Naïve Bayes, and Support Vector Machine. The results indicate that all three classification methods give high performance and accuracy when the documents are represented as TF-IDF vectors. All three exceed 90% accuracy, which was the criterion for the study to be considered successful. Logistic Regression appears to handle the task most easily, classifying the documents more efficiently than the other methods. A test on real documents flowing into Medius’ invoice workflow shows that Logistic Regression correctly classifies nearly 96% of them. In conclusion, Logistic Regression together with TF-IDF was found to be the most appropriate combination among those tested. Doc2Vec, by contrast, did not give good results because the data set was neither suited to the method nor large enough for it to work well.
Keywords: Document classification; Text classification; Invoices; NLP; TF-IDF; Doc2vec; Machine Learning; Logistic Regression; Multinomial Naïve Bayes; Support Vector Machine.
Subject: Computer Sciences
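To make the best-performing combination concrete, here is a dependency-free Python sketch of TF-IDF vectors fed to a binary logistic regression (1 = invoice). It is an illustration under stated assumptions, not the thesis's pipeline: the whitespace tokenizer, the toy corpus, and the hyperparameters (`lr`, `epochs`, `l2`) are all hypothetical. The IDF formula follows the smoothed variant that scikit-learn's `TfidfVectorizer` uses by default, with L2-normalized vectors.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical pre-processing: lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, x):
    return sum(weights.get(t, 0.0) * v for t, v in x.items()) + bias


def train_logreg(vectors, labels, lr=0.5, epochs=300, l2=1e-3):
    """Binary logistic regression fitted by batch gradient descent on log-loss."""
    weights, bias, n = {}, 0.0, len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(score(weights, bias, x)) - y  # d(log-loss)/d(score)
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
            grad_b += err
        for t, g in grad_w.items():
            w = weights.get(t, 0.0)
            weights[t] = w - lr * (g / n + l2 * w)  # mean gradient plus L2 penalty
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy corpus: 1 = invoice, 0 = non-invoice.
train_docs = [
    "invoice number 1042 amount due vat total",
    "invoice total amount due payment terms 30 days",
    "meeting notes project kickoff agenda",
    "newsletter company event agenda lunch",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_docs)
weights, bias = train_logreg([tfidf(d, idf) for d in train_docs], train_labels)
for text in ["invoice amount due", "project meeting agenda"]:
    p = sigmoid(score(weights, bias, tfidf(text, idf)))
    print(text, "->", "invoice" if p > 0.5 else "non-invoice")
# invoice amount due -> invoice
# project meeting agenda -> non-invoice
```

In practice one would reach for a library implementation; the hand-written version only shows what the two stages compute: IDF down-weights terms shared by many documents, and the classifier learns one weight per term.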
33	Využití metod dolování dat pro analýzu sociálních sítí / Using of Data Mining Method for Analysis of Social Networks
Novosad, Andrej. January 2013.
This thesis discusses data mining of social media. It introduces data mining and possible mining methods, then explores social media and social networks: what they can offer and what problems they bring. The APIs of three social networking sites are examined for the opportunities they provide for data mining, and techniques of text mining and document classification are explored. The thesis describes the implementation of a web application that mines data from the social networking site Twitter using the SVM algorithm. The application classifies tweets by their text, where the classes represent the tweets' continents of origin. Several experiments, run both in the RapidMiner software and in the implemented web application, are then presented and their results examined.
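The abstract says only that tweets were classified by continent of origin with SVM; the thesis itself used RapidMiner and its own web application. As a rough illustration of how a linear SVM can handle such a multi-class text task, the Python sketch below trains one binary SVM per continent (one-vs-rest) using the basic Pegasos stochastic sub-gradient method, without a bias term. The bag-of-words features, the toy tweets, `lam`, `epochs`, and the one-vs-rest scheme are all assumptions, not details from the thesis.

```python
from collections import Counter


def bag_of_words(text):
    """Raw term counts from a lowercase whitespace split (hypothetical pre-processing)."""
    return Counter(text.lower().split())


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_pegasos(samples, targets, lam=0.01, epochs=20):
    """Binary linear SVM (hinge loss, no bias) trained with Pegasos sub-gradient steps.

    targets are +1 / -1; samples are visited in a fixed order, so results are deterministic.
    """
    w, step = {}, 0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            step += 1
            eta = 1.0 / (lam * step)      # Pegasos step size 1 / (lambda * t)
            violated = y * dot(w, x) < 1  # hinge loss is active inside the margin
            for t in w:                   # regularization shrink: w <- (1 - eta * lambda) w
                w[t] *= 1.0 - eta * lam
            if violated:                  # sub-gradient step on the hinge term
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    """One binary SVM per class: that class is +1, every other class is -1."""
    samples = [bag_of_words(t) for t in texts]
    return {
        c: train_pegasos(samples, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, text):
    """Pick the class whose SVM gives the highest decision value."""
    x = bag_of_words(text)
    return max(models, key=lambda c: dot(models[c], x))


# Hypothetical toy tweets labelled with a continent of origin.
tweets = [
    "toronto snow hockey",
    "vancouver rain hockey",
    "paris metro strike",
    "berlin metro techno",
    "tokyo ramen sakura",
    "osaka ramen castle",
]
continents = ["North America", "North America", "Europe", "Europe", "Asia", "Asia"]
models = train_one_vs_rest(tweets, continents)
print(predict(models, "hockey in toronto"))      # North America
print(predict(models, "metro delays in paris"))  # Europe
```

One-vs-rest is only one way to extend a binary SVM to several classes; words never seen in training (such as "in" above) simply contribute nothing to any score.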
34	Visualization of live search / Visualisering av realtidssök
Nilsson, Olof. January 2013.
The classical search engine result page is used for many interactions with search results. While such pages are effective at communicating relevance, they do not present context well. Giving the user an overview in the form of a spatialized display, in a domain with a physical analog that the user is familiar with, should make context pre-attentive and obvious. A prototype has been built that takes public medical information articles and assigns them to parts of the human body. The articles are indexed and made searchable. A visualization shows the coverage of a query on the human body and lets the user interact with it to explore the results. The function and utility of the approach are demonstrated through use cases.
Keywords: search technology; search engine; information retrieval; live search; query completion; facet; recall; precision; machine learning; document classification; linear classifier; document categorization; document clustering; multi-label classification; labelling; Hamming loss; F-score; visualization; information visualization; distance-similarity metaphor; spatialized display; visual information-seeking mantra; user interface; spatialization; document processing
Subjects: Interaction Technologies; Computer Sciences; Information Systems; Human Computer Interaction
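The keyword list mentions multi-label classification, Hamming loss, precision, recall, and F-score, which fit a setup where one article may be assigned to several body parts at once. The abstract does not say how these metrics were computed, so this Python sketch just shows the standard definitions on hypothetical label sets: Hamming loss as the fraction of mismatched (article, label) decisions, and micro-averaged precision, recall, and F1 pooled over all decisions. The label names and predictions are made up for illustration.

```python
def hamming_loss(y_true, y_pred, labels):
    """Fraction of (sample, label) decisions where prediction and truth disagree."""
    mismatches = sum(len(t ^ p) for t, p in zip(y_true, y_pred))  # symmetric difference
    return mismatches / (len(y_true) * len(labels))


def micro_precision_recall_f1(y_true, y_pred):
    """Micro-averaged precision, recall and F1, pooling all (sample, label) decisions."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical body-part labels for three articles.
labels = ["head", "chest", "abdomen", "arm", "leg"]
y_true = [{"head"}, {"chest", "abdomen"}, {"leg"}]
y_pred = [{"head"}, {"chest"}, {"leg", "arm"}]

print(round(hamming_loss(y_true, y_pred, labels), 4))  # 0.1333 (2 wrong of 15 decisions)
print(micro_precision_recall_f1(y_true, y_pred))       # (0.75, 0.75, 0.75)
```

Hamming loss penalizes every wrong label slot equally, including missed and spurious labels, whereas F1 ignores true negatives, which matters when each article carries only a few of many possible labels.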
