541

Model pro ekonomickou simulaci procesů (sledování nákladů na nízkou jakost) / Model for Economical Process Simulation (Cost of Poor Quality Monitoring)

Janíková, Dita January 2011 (has links)
This work studies process simulation and its use for monitoring the costs of poor quality. Process simulation is still not widely used in many companies, and moreover, there is no software on the market that easily describes the economic efficiency of processes. The main purpose of this work is to propose such an application. The study contains detailed data models for each function of the application, a requirements specification, and a feasibility study that provides approximate costs for developing such an application.
542

WebKnox: Web Knowledge Extraction

Urbansky, David 26 January 2009 (has links)
This thesis focuses on entity and fact extraction from the web. Different knowledge representations and techniques for information extraction are discussed before the design of a knowledge extraction system, called WebKnox, is introduced. The main contributions of this thesis are the trust ranking of extracted facts with a self-supervised learning loop and the extraction system itself, with its composition of known and refined extraction algorithms. The techniques used show an improvement in precision and recall for most entity and fact extraction tasks compared to the chosen baseline approaches.
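The self-supervised trust-ranking idea can be illustrated with a toy co-ranking loop between extraction sources and extracted facts; the algorithm below is an assumption-laden sketch of the general idea, not WebKnox's actual method, and the sites and facts are made up.

```python
def rank_fact_trust(extractions, iterations=10):
    """Co-rank facts and their sources: a fact gains trust when trusted
    sources extract it, and a source gains trust when the facts it
    extracts are confirmed elsewhere. Toy sketch, not WebKnox itself."""
    sources = {s for s, _ in extractions}
    facts = {f for _, f in extractions}
    s_trust = dict.fromkeys(sources, 1.0)
    f_trust = dict.fromkeys(facts, 1.0)
    for _ in range(iterations):
        # fact trust: total trust of the sources that yielded it
        f_trust = {f: sum(s_trust[s] for s, g in extractions if g == f)
                   for f in facts}
        top = max(f_trust.values())
        f_trust = {f: v / top for f, v in f_trust.items()}  # normalize to [0, 1]
        # source trust: average trust of the facts it extracted
        s_trust = {s: sum(f_trust[g] for t, g in extractions if t == s) /
                      len([1 for t, _ in extractions if t == s])
                   for s in sources}
    return f_trust

extractions = [
    ("siteA", "Paris capital_of France"),
    ("siteB", "Paris capital_of France"),
    ("siteC", "Paris capital_of France"),
    ("siteC", "Lyon capital_of France"),   # a spurious extraction
]
trust = rank_fact_trust(extractions)
print(max(trust, key=trust.get))  # the widely-confirmed fact wins
```

Facts confirmed by many independent sources rise in trust, and sources that also emit unconfirmed facts see their influence diluted.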
543

Annotating Job Titles in Job Ads using Swedish Language Models

Ridhagen, Markus January 2023 (has links)
This thesis investigates automated annotation approaches to assist public authorities in Sweden in optimizing resource allocation and gaining valuable insights to enhance the preservation of high-quality welfare. The study uses pre-trained Swedish language models for the named entity recognition (NER) task of finding job titles in job advertisements from The Swedish Public Employment Service, Arbetsförmedlingen. Specifically, it evaluates the performance of the Swedish Bidirectional Encoder Representations from Transformers (BERT), developed by the National Library of Sweden (KB), referred to as KB-BERT. The thesis explores the impact of training data size on the models’ performance and examines whether active learning can enhance efficiency and accuracy compared to random sampling. The findings reveal that even with a small training dataset of 220 job advertisements, KB-BERT achieves a commendable F1-score of 0.770 in predicting job titles. The model’s performance improves further by augmenting the training data with an additional 500 annotated job advertisements, yielding an F1-score of 0.834. Notably, the highest F1-score of 0.856 is achieved by applying the active learning strategy of uncertainty sampling and the measure of mean entropy. The test data provided by Arbetsförmedlingen was re-annotated to evaluate the complexity of the task. The human annotator achieved an F1-score of 0.883. Based on these findings, it can be inferred that KB-BERT performs satisfactorily in classifying job titles from job ads.
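The uncertainty-sampling strategy with mean entropy described above can be sketched as follows. This is a minimal illustration with made-up per-token class distributions, not the thesis's actual KB-BERT pipeline: each document carries a matrix of token-level probabilities, and the documents whose predictions have the highest mean entropy are selected for annotation.

```python
import math

def mean_entropy(token_probs):
    """Mean per-token entropy of a model's class distributions
    for one document (higher = the model is less certain)."""
    total = 0.0
    for dist in token_probs:
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(token_probs)

def select_most_uncertain(pool, k):
    """Pick the k unlabeled documents the model is least sure about."""
    ranked = sorted(pool, key=lambda doc: mean_entropy(doc["probs"]), reverse=True)
    return [doc["id"] for doc in ranked[:k]]

# Hypothetical pool: each entry has per-token probabilities over 3 NER classes.
pool = [
    {"id": "ad1", "probs": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]},  # confident
    {"id": "ad2", "probs": [[0.40, 0.30, 0.30], [0.50, 0.25, 0.25]]},  # uncertain
    {"id": "ad3", "probs": [[0.90, 0.05, 0.05], [0.60, 0.20, 0.20]]},
]
print(select_most_uncertain(pool, 2))  # → ['ad2', 'ad3']
```

Annotating the selected ads and retraining closes the active-learning loop; the thesis's result suggests this beats random sampling at equal labeling budget.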
544

Prestanda jämförelse mellan webbaserade 3D-ramverk med olika arkitekturer / Performence comparison between webbased 3D frameworks using different architectures

Lindström, Emil January 2023 (has links)
This study compares two 3D frameworks, A-Frame with an entity component system and Babylon.js without one, to examine how a framework's architecture affects rendering time. Rendering time is measured as frames per second in a tower defense game with 3D graphics developed for each framework, running in the Google Chrome browser. The study takes the form of a technical experiment in order to exclude external variables. Three levels of complexity were developed within the games to examine how complexity affects rendering speed, but the measurements showed that the different complexity levels in the artifacts had no significant impact on the results. There was, however, a significant difference between the frameworks, with A-Frame and its entity component system performing worse. Future work should extend the experiment to more browsers and test higher levels of complexity to obtain more realistic results. / Spelling-corrected titles in Swedish and English: Prestandajämförelse mellan webbaserade 3D-ramverk med olika arkitekturer / Performance comparison between web-based 3D frameworks using different architectures
545

Exploring Construction of a Company Domain-Specific Knowledge Graph from Financial Texts Using Hybrid Information Extraction

Jen, Chun-Heng January 2021 (has links)
Companies do not exist in isolation. They are embedded in structural relationships with each other. Mapping a given company's relationships with other companies in terms of competitors, subsidiaries, suppliers, and customers is key to understanding its major risk factors and opportunities. Conventionally, obtaining and staying up to date with this key knowledge was achieved by reading financial news and reports, a task for highly skilled professionals such as financial analysts. However, with the development of Natural Language Processing (NLP) and graph databases, it is now possible to systematically extract and store structured information from unstructured data sources. The current go-to method for effectively extracting information uses supervised machine learning models, which require a large amount of labeled training data. The data labeling process is usually time-consuming, and labeled data is hard to obtain in a domain-specific area. This project explores an approach to construct a company domain-specific Knowledge Graph (KG) that contains company-related entities and relationships from U.S. Securities and Exchange Commission (SEC) 10-K filings by combining a pre-trained general-purpose NLP model with rule-based patterns for Named Entity Recognition (NER) and Relation Extraction (RE). This approach eliminates the time-consuming data labeling task of the statistical approach; evaluated on ten 10-K filings, the model achieves an overall recall of 53.6%, precision of 75.7%, and F1-score of 62.8%. The result shows it is possible to extract company information using this hybrid method, which does not require a large amount of labeled training data. However, the approach requires the time-consuming process of finding lexical patterns in sentences to extract company-related entities and relationships.
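A rule-based relation-extraction step of the kind described above can be sketched as follows. The lexical patterns, the company-name grammar, and the example sentence are all hypothetical stand-ins for the patterns actually mined from 10-K filings; in the thesis, company spans would come from the pre-trained NER model rather than a regex.

```python
import re

# Crude stand-in for NER output: capitalized multi-word company names.
COMPANY = r"[A-Z][A-Za-z&.]*(?:\s+[A-Z][A-Za-z&.]*)*"

# Hypothetical lexical patterns mapping sentence shapes to relations.
PATTERNS = [
    (re.compile(rf"(?P<head>{COMPANY}) is a wholly-owned subsidiary of (?P<tail>{COMPANY})"),
     "subsidiary_of"),
    (re.compile(rf"(?P<head>{COMPANY}) competes (?:directly )?with (?P<tail>{COMPANY})"),
     "competitor_of"),
]

def extract_triples(sentence):
    """Return (head, relation, tail) triples matched by any pattern."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(sentence):
            triples.append((m.group("head"), relation, m.group("tail")))
    return triples

print(extract_triples("Instagram is a wholly-owned subsidiary of Meta Platforms"))
# → [('Instagram', 'subsidiary_of', 'Meta Platforms')]
```

The extracted triples would then be loaded into a graph database as KG edges; the trade-off the abstract notes is that each new relation type needs its patterns hand-crafted.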
546

Implementation och utvärdering av fika-applikation : En Design Science Research studie / Implementation and evaluation of a Fika-application : A Design Science Research study

Lindblad, Adam January 2023 (has links)
At the company AFRY in Karlstad, one employee brings fika every Friday. The rotation is currently managed in an Excel file, which is problematic because not everyone has access to the file and therefore cannot see when it is their turn, with the result that bringing fika is sometimes forgotten. The author was commissioned by AFRY to develop a web-based application where staff can see the schedule for whose turn it is to bring fika. The client drew up a requirements specification together with a list of the technical stack to be used for the web application. The purpose of the study is to examine which complications arise during implementation and how users receive the application. Two research questions form the basis of the study: What do the evaluations result in, and which suggestions for improvement were received? Which complications arise during implementation, and how were they solved? Design Science Research was applied as the research strategy, using the activities Specify requirements, Implementation, and Evaluation to create the artifact under study. An agile working method drove the work forward, with development carried out in three sprints. The artifact was evaluated by test users drawn from the client's staff; black-box and manual tests were carried out, in which the test users expressed and reflected on the implementation. Data was collected through structured interviews, with the author taking notes on what was said. The evaluations showed that the implementation lacked functionality that the test users requested. Much of the collected data indicates that the stated requirements would have needed further specification to achieve higher satisfaction among the test users.
The complications encountered during implementation could be solved either by reading the documentation for the technology used or with the help of the test users, who have more experience than the author. The conclusion of the study is that requirements must be specified in more detail for the artifact to match the users' picture of it as closely as possible.
547

[pt] ESTRATÉGIAS PARA ENTENDER A CONECTIVIDADE DE PARES DE ENTIDADES EM BASES DE CONHECIMENTO / [en] STRATEGIES TO UNDERSTAND THE CONNECTIVITY OF ENTITY PAIRS IN KNOWLEDGE BASES

JAVIER GUILLOT JIMENEZ 04 November 2021 (has links)
[en] The entity relatedness problem refers to the question of exploring a knowledge base, represented as an RDF graph, to discover and understand how two entities are connected. This question can be addressed by implementing a path search strategy that combines an entity similarity measure with an entity degree limit and an expansion limit, to reduce the path search space, and a path ranking measure, to order the relevant paths between a given pair of entities in the RDF graph. This thesis first introduces a framework, called CoEPinKB, together with an implementation, to experiment with path search strategies. The framework features as hot spots the entity similarity measure, the entity degree limit, the expansion limit, the path ranking measure, and the knowledge base. The thesis then presents a performance evaluation of nine path search strategies using a benchmark from two entertainment domains over the OpenLink Virtuoso SPARQL protocol endpoint of DBpedia. Finally, the thesis introduces DCoEPinKB, a distributed version of the framework based on Apache Spark that supports the empirical evaluation of path search strategies, and presents an evaluation of six path search strategies in two entertainment domains over real data collected from DBpedia. The results provide insights into the performance of the path search strategies and suggest that the framework implementation, instantiated with the best performing pair of measures, can be used, for example, to expand the results of search engines over knowledge bases to include related entities.
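A path search with a degree limit and an expansion limit, of the kind combined in these strategies, might look like the sketch below over a toy adjacency-list graph. The graph, the limits, and ranking simply by path length are illustrative assumptions; the thesis pairs the search with entity similarity and dedicated path ranking measures.

```python
from collections import deque

def find_paths(graph, src, dst, max_len=3, degree_limit=4, expansion_limit=50):
    """Enumerate paths between two entities via BFS, pruning high-degree
    hubs (degree_limit) and capping how many nodes are expanded
    (expansion_limit) to keep the search space small."""
    paths, expanded = [], 0
    queue = deque([[src]])
    while queue and expanded < expansion_limit:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            paths.append(path)
            continue
        if len(path) > max_len or len(graph.get(node, ())) > degree_limit:
            continue  # prune overlong paths and hub nodes
        expanded += 1
        for nxt in graph.get(node, ()):
            if nxt not in path:  # avoid cycles
                queue.append(path + [nxt])
    return sorted(paths, key=len)  # naive ranking: shorter paths first

# Toy RDF-like neighborhood (entity labels are illustrative).
graph = {
    "Kill_Bill": ["Quentin_Tarantino", "Uma_Thurman"],
    "Quentin_Tarantino": ["Kill_Bill", "Pulp_Fiction"],
    "Uma_Thurman": ["Kill_Bill", "Pulp_Fiction"],
    "Pulp_Fiction": ["Quentin_Tarantino", "Uma_Thurman"],
}
for p in find_paths(graph, "Kill_Bill", "Pulp_Fiction"):
    print(" -> ".join(p))
```

Against a real endpoint, the neighbor lookup would be a SPARQL query instead of a dictionary access, which is where the degree and expansion limits pay off.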
548

On the discovery of relevant structures in dynamic and heterogeneous data

Preti, Giulia 22 October 2019 (has links)
We are witnessing an explosion of available data coming from a huge number of sources and domains, which is leading to the creation of ever larger and richer datasets. Understanding, processing, and extracting useful information from those datasets requires specialized algorithms that take into consideration both the dynamism and the heterogeneity of the data they contain. Although several pattern mining techniques have been proposed in the literature, most of them fall short in providing interesting structures when the data can be interpreted differently from user to user, when it can change from time to time, and when it has different representations. In this thesis, we propose novel approaches that go beyond the traditional pattern mining algorithms, and can effectively and efficiently discover relevant structures in dynamic and heterogeneous settings. In particular, we address the tasks of pattern mining in multi-weighted graphs, pattern mining in dynamic graphs, and pattern mining in heterogeneous temporal databases. For pattern mining in multi-weighted graphs, we consider the problem of mining patterns in a new category of graphs called multi-weighted graphs. In these graphs, nodes and edges can carry multiple weights that represent, for example, the preferences of different users or applications, and that are used to assess the relevance of the patterns. We introduce a novel family of scoring functions that assign a score to each pattern based on both the weights of its appearances and their number, and that respect the anti-monotone property, pivotal for efficient implementations. We then propose a centralized and a distributed algorithm that solve the problem both exactly and approximately. The approximate solution scales better with the number of edge weighting functions, while achieving good accuracy in the results found.
An extensive experimental study shows the advantages and disadvantages of our strategies, and proves their effectiveness. For pattern mining in dynamic graphs, we focus on the task of discovering structures that are both well-connected and correlated over time, in graphs where nodes and edges can change over time. These structures represent edges that are topologically close and exhibit a similar behavior of appearance and disappearance across the snapshots of the graph. To this aim, we introduce two measures for computing the density of a subgraph whose edges change in time, and a measure to compute their correlation. The density measures are able to detect subgraphs that are silent in some periods of time but highly connected in others, and thus they can detect events or anomalies that happened in the network. The correlation measure can identify groups of edges that tend to co-appear, as well as edges that are characterized by similar levels of activity. For both variants of the density measure, we provide an effective solution that enumerates all the maximal subgraphs whose density and correlation exceed given minimum thresholds, but can also return a more compact subset of representative subgraphs that exhibit high levels of pairwise dissimilarity. Furthermore, we propose an approximate algorithm that scales well with the size of the network, while achieving high accuracy. We evaluate our framework with an extensive set of experiments on both real and synthetic datasets, and compare its performance with the main competitor algorithm. The results confirm the correctness of the exact solution, the high accuracy of the approximate one, and the superiority of our framework over the existing solutions. In addition, they demonstrate the scalability of the framework and its applicability to networks of different nature.
Finally, we address the problem of entity resolution in heterogeneous temporal databases, which are datasets that contain records describing the status of real-world entities at different periods of time, and thus are characterized by different sets of attributes that can change over time. Detecting records that refer to the same entity in such a scenario requires a record similarity measure that takes the temporal information into account and that is aware of the absence of a common fixed schema between the records. However, existing record matching approaches either ignore the dynamism in the attribute values of the records, or assume that all the records share the same set of attributes throughout time. In this thesis, we propose a novel time-aware, schema-agnostic similarity measure for temporal records to find pairs of matching records, and integrate it into an exact and an approximate algorithm. The exact algorithm can find all the maximal groups of pairwise similar records in the database. The approximate algorithm, on the other hand, achieves higher scalability with the size of the dataset and the number of attributes by relying on a technique called meta-blocking. This algorithm can find a good-quality approximation of the actual groups of similar records by adopting an effective and efficient clustering algorithm.
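The correlation measure for co-appearing edges can be illustrated with a plain Pearson correlation over binary activity vectors, one entry per snapshot; this is an assumed stand-in for exposition, not necessarily the exact measure defined in the thesis.

```python
import math

def correlation(a, b):
    """Pearson correlation between two binary edge-activity vectors,
    one entry per graph snapshot (1 = edge present). Edges that appear
    and disappear together score close to 1."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - ma) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / (sd_a * sd_b)

e1 = [1, 1, 0, 0, 1, 1]  # active in the same snapshots as e2
e2 = [1, 1, 0, 0, 1, 1]
e3 = [0, 0, 1, 1, 0, 0]  # active exactly when e1 is silent
print(correlation(e1, e2))  # ≈ 1.0
print(correlation(e1, e3))  # ≈ -1.0
```

A mining algorithm would then keep only subgraphs whose edges are pairwise correlated above a threshold, in addition to satisfying the density requirement.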
549

Data Fusion and Text Mining for Supporting Journalistic Work

Zsombor, Vermes January 2022 (has links)
During the past several decades, journalists have been struggling with the ever-growing amount of data on the internet. Investigating the validity of sources or finding similar articles for a story can consume a lot of time and effort. These issues are amplified further by the shrinking staff of news agencies. The solution is to empower the remaining professional journalists with digital tools created by computer scientists. This thesis project is inspired by the idea of supporting journalistic work with interactive visual interfaces and artificial intelligence. More specifically, within the scope of this thesis project, we created a backend module that supports several text mining methods, such as keyword extraction, named entity recognition, sentiment analysis, and fake news classification, as well as data collection from various data sources, to help professionals in the field of journalism. To implement our system, we first gathered requirements from several researchers and practitioners in journalism, media studies, and computer science, then acquired knowledge by reviewing literature on current approaches. Results are evaluated both with quantitative methods, such as individual component benchmarks, and with qualitative methods, by analyzing the outcomes of semi-structured interviews with collaborating and external domain experts. Our results show that the domain experts' perceived value of each component tracks its performance in the individual evaluations. This shows us that there is potential in this research area, and future work would be welcomed by the journalistic community.
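A deliberately minimal version of one such component, keyword extraction by term frequency after stopword removal, is sketched below; the stopword list and the example article are made up, and the real backend presumably uses stronger methods (e.g. TF-IDF or graph-based ranking).

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on", "with"}

def extract_keywords(text, k=3):
    """Rank candidate keywords by raw term frequency, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

article = ("The journalists verified the sources of the story. "
           "Verified sources make the story trustworthy, and journalists "
           "value trustworthy sources.")
print(extract_keywords(article))  # 'sources' ranks first (most frequent)
```

Each text mining method in the backend can expose the same shape of interface, text in and ranked annotations out, which is what makes component-wise benchmarking straightforward.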
550

[en] AN END-TO-END MODEL FOR JOINT ENTITY AND RELATION EXTRACTION IN PORTUGUESE / [pt] MODELO END-TO-END PARA EXTRAÇÃO DE ENTIDADES E RELAÇÕES DE FORMA CONJUNTA EM PORTUGUÊS

LUCAS AGUIAR PAVANELLI 24 October 2022 (has links)
[en] Natural language processing (NLP) techniques have recently become popular. The range of applications that benefit from NLP is extensive, from building machine translation systems to helping market a product. Within NLP, the Information Extraction (IE) field is widespread; it focuses on processing texts to retrieve specific information about a particular entity or concept. Still, the research community mainly focuses on building models for English data. This thesis addresses three tasks in the IE domain: Named Entity Recognition, Relation Extraction, and Joint Entity and Relation Extraction. First, we created a novel Portuguese dataset in the biomedical domain, described the annotation process, and measured its properties. Also, we developed a novel model for the Joint Entity and Relation Extraction task, verifying that it is competitive with other models. Finally, we carefully evaluated the proposed models on non-English datasets and confirmed the dominance of neural-based models.
