331

Multilabel text classification of public procurements using deep learning intent detection / Textklassificering av offentliga upphandlingar med djupa artificiella neuronnät och avsiktsdetektering

Suta, Adin January 2019 (has links)
Textual data is one of the most widespread forms of data and the amount of such data available in the world increases at a rapid rate. Text can be understood as either a sequence of characters or words, where the latter approach is the most common. With the breakthroughs within the area of applied artificial intelligence in recent years, more and more tasks are aided by automatic processing of text in various applications. The models introduced in the following sections rely on deep-learning sequence processing in order to process the text and produce a regression algorithm for classification of what the text input refers to. We investigate and compare the performance of several model architectures along with different hyperparameters. The data set was provided by e-Avrop, a Swedish company which hosts a web platform for posting and bidding on public procurements. It consists of titles and descriptions of Swedish public procurements posted on the website of e-Avrop, along with the respective category/categories of each text. When the texts are described by several categories (the multi-label case), we suggest a deep-learning sequence-processing regression algorithm in which a set of deep-learning classifiers is used. Each model uses one of the several labels in the multi-label case, along with the text input, to produce a set of text-label observation pairs. The goal is to investigate whether these classifiers can exhibit different levels of intent, an intent which should theoretically be imposed by the different training data sets used by each of the individual deep-learning classifiers.
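The label-splitting step described above is essentially binary relevance: one classifier per category, each trained on (text, category-present) pairs. The sketch below illustrates that idea with TF-IDF features and logistic regression in scikit-learn instead of the thesis's deep sequence models; all texts and category names are invented placeholders.

```python
# Minimal binary-relevance sketch: one classifier per label, trained on
# (text, label-present?) pairs. Placeholder data; the thesis uses deep
# sequence models on e-Avrop procurement texts instead.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "road maintenance and snow removal services",
    "construction of a new school building",
    "IT consulting and software development",
    "renovation of municipal office buildings",
]
labels = [  # multi-label annotations (invented categories)
    {"services"},
    {"construction"},
    {"services", "it"},
    {"construction", "services"},
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# One binary classifier per category: its training set is the
# set of text-label observation pairs mentioned in the abstract.
all_categories = sorted(set.union(*labels))
classifiers = {}
for cat in all_categories:
    y = [1 if cat in lab else 0 for lab in labels]
    classifiers[cat] = LogisticRegression().fit(X, y)

# Predict the category set of a new procurement text.
new_text = vectorizer.transform(["snow removal for municipal roads"])
predicted = {cat for cat, clf in classifiers.items()
             if clf.predict_proba(new_text)[0, 1] > 0.5}
print(predicted)
```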
332

Constructing and representing a knowledge graph (KG) for Positive Energy Districts (PEDs)

Davari, Mahtab January 2023 (has links)
In recent years, knowledge graphs (KGs) have become essential tools for visualizing concepts and retrieving contextual information. However, constructing KGs for new and specialized domains like Positive Energy Districts (PEDs) presents unique challenges, particularly when dealing with unstructured texts and ambiguous concepts from academic articles. This study focuses on various strategies for constructing and inferring KGs, specifically incorporating entities related to PEDs, such as projects, technologies, organizations, and locations. We utilize visualization techniques and node embedding methods to explore the graph's structure and content and apply filtering techniques and t-SNE plots to extract subgraphs based on specific categories or keywords. One of the key contributions is using the longest-path method, which allows us to uncover intricate relationships, interconnectedness between entities, critical paths, and hidden patterns within the graph, providing valuable insights into the most significant connections. Additionally, community detection techniques were employed to identify distinct communities within the graph, providing further understanding of the structural organization and clusters of interconnected nodes with shared themes. The paper also presents a detailed evaluation of a question-answering system based on the KG, where the Universal Sentence Encoder was used to convert text into dense vector representations and cosine similarity was calculated to find similar sentences. We assess the system's performance through precision and recall analysis and conduct statistical comparisons of graph embeddings, with Node2Vec outperforming DeepWalk in capturing similarities and connections. For edge prediction, logistic regression focusing on pairs of neighbours that lack a direct connection was employed to effectively identify potential connections among nodes within the graph. Additionally, probabilistic edge predictions, threshold analysis, and the significance of individual nodes were discussed. Lastly, the advantages and limitations of using existing KGs (Wikidata and DBpedia) versus constructing new ones specifically for PEDs were investigated. It is evident that further research and data enrichment are necessary to address the scarcity of domain-specific information from existing sources.
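As a rough illustration of the question-answering step, the sketch below ranks corpus sentences by cosine similarity to a query vector. The thesis uses Universal Sentence Encoder embeddings; here a toy hashing-based embedding stands in so the example stays self-contained, and the corpus sentences are invented.

```python
# Sketch of cosine-similarity retrieval over sentence embeddings.
# The embed() function is a crude placeholder, not a real encoder.
import numpy as np

def embed(sentences):
    # Placeholder embedding: hashed character-trigram counts.
    # Replace with a real sentence encoder in practice.
    dim = 256
    vecs = np.zeros((len(sentences), dim))
    for i, s in enumerate(sentences):
        s = s.lower()
        for j in range(len(s) - 2):
            vecs[i, hash(s[j:j + 3]) % dim] += 1.0
    return vecs

corpus = [
    "Project A demonstrates a positive energy district in Sweden.",
    "Organization B funds smart grid technologies.",
    "District heating is deployed in the pilot location.",
]
query = "Which project is a positive energy district?"

C = embed(corpus)
q = embed([query])[0]

# Cosine similarity between the query vector and each corpus sentence.
sims = C @ q / (np.linalg.norm(C, axis=1) * np.linalg.norm(q) + 1e-12)
best = int(np.argmax(sims))
print(corpus[best], float(sims[best]))
```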
333

Feasibility, Efficiency, and Robustness of Secure Computation

Hai H Nguyen (14206922) 02 December 2022 (has links)
Secure computation allows mutually distrusting parties to compute over private data. Such collaborations have widespread applications in social, scientific, commercial, and security domains. However, the overhead of achieving security is a major bottleneck to the adoption of such technologies. In this context, this thesis aims to design the most secure protocol within budgeted computational or network resources by mathematically formulating it as an optimization problem.

With the rise in CPU power and cheap RAM, the offline-online model for secure computation has become the prominent model for real-world security systems. This thesis investigates the above-mentioned optimization problem in the information-theoretic offline-online model. In particular, this thesis presents the following selected sample of our research in greater detail.

Round and Communication Complexity: Chor-Kushilevitz-Beaver characterized the round and communication complexity of secure two-party computation. Since then, the case of functions with randomized output remained unexplored. We proved the decidability of determining these complexities. Next, if such a protocol exists, we construct the optimal protocol; otherwise, we present an obstruction to achieving security.

Rate and Capacity of secure computation: The efficiency of converting the offline samples into secure computation during the online phase is essential. However, investigating this "production rate" for general secure computations seems analytically intractable. Towards this objective, we introduce a new model of secure computation -- one without any communication -- that has several practical applications. We lay the mathematical foundations of formulating rate and capacity questions in this framework. Our research identifies the first tight rate and capacity results (a la Shannon) in secure computation.

Reverse multiplication embedding: We identify a new problem in algebraic complexity theory that unifies several efficiency objectives in cryptography. Reverse multiplication embedding seeks to implement as many (base field) multiplications as possible using one extension field multiplication. We present an optimal construction using algebraic function fields. This embedding has subsequently led to efficiency improvements in secure computation, homomorphic encryption, proof systems, and leakage-resilient cryptography.

Characterizing the robustness to side-channel attacks: Side-channel attacks present a significant threat to the offline phase. We introduce the cryptographic analog of common information to characterize the offline phase's robustness quantitatively. We build a framework for security and attack analysis. In the context of robust threshold cryptography, we present a state-of-the-art attack, threat assessment, and security fix for Shamir's secret-sharing.
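For context, the reverse multiplication embedding can be stated in the standard reverse multiplication-friendly embedding (RMFE) form sketched below; the notation and parameters are illustrative and this is not the thesis's exact construction.

```latex
% Sketch of a reverse multiplication-friendly embedding (RMFE), a standard
% formalization of "many base-field multiplications from one extension-field
% multiplication". Parameters (k, m) and notation are illustrative.
A $(k,m)_q$-RMFE is a pair of $\mathbb{F}_q$-linear maps
\[
  \varphi : \mathbb{F}_q^{\,k} \to \mathbb{F}_{q^m},
  \qquad
  \psi : \mathbb{F}_{q^m} \to \mathbb{F}_q^{\,k},
\]
such that for all $\mathbf{x}, \mathbf{y} \in \mathbb{F}_q^{\,k}$,
\[
  \mathbf{x} \ast \mathbf{y} \;=\; \psi\bigl(\varphi(\mathbf{x}) \cdot \varphi(\mathbf{y})\bigr),
\]
where $\ast$ denotes the componentwise (Hadamard) product. A single
multiplication in $\mathbb{F}_{q^m}$ thus recovers $k$ multiplications in
$\mathbb{F}_q$; constructions based on algebraic function fields achieve
$m = O(k)$.
```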
334

Using machine learning to visualize and analyze attack graphs

Cottineau, Antoine January 2021 (has links)
In recent years, the security of many corporate networks has been compromised by hackers who managed to obtain important information by leveraging the vulnerabilities of those networks. Such attacks can have a strong economic impact and affect the image of the entity whose network has been attacked. Various tools are used by network security analysts to study and improve the security of networks. Attack graphs are among these tools. Attack graphs are graphs that show all the possible chains of exploits an attacker could follow to access an important host on a network. While attack graphs are useful for network security, they may become hard to read because of their size when networks become larger. Previous work tried to deal with this issue by applying simplification algorithms on graphs. While experience shows that these algorithms help improve the visualization of attack graphs, we believe that improvements can still be made, especially by relying on Machine Learning (ML) algorithms. Thus, the goal of this thesis is to investigate how ML can help improve the visualization of attack graphs and the security analysis of networks based on their attack graph. To reach this goal, we focus on two main areas. First, we use graph clustering, which is the process of creating a partition of the nodes based on their position in the graph. This improves visualization by allowing network analysts to focus on a set of related nodes instead of visualizing the whole graph. We also design several metrics for security analysis based on attack graphs. We show that the ML algorithms are as effective as the non-ML algorithms in both areas. The ML clustering algorithms even produce better clusters than non-ML algorithms with respect to the coverage metric, at the cost of computation time. Moreover, the ML security evaluation algorithms show faster computation times on dense attack graphs than the non-ML baseline, while producing similar results. Finally, a user interface that permits the application of the methods presented in the thesis is also developed, with the goal of making the use of such methods easier for network analysts.
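The clustering idea can be pictured with a minimal sketch: treat the attack graph's adjacency matrix as an affinity matrix and apply spectral clustering, so an analyst can inspect groups of related nodes rather than the full graph. The toy graph, the choice of spectral clustering, and the cluster count are assumptions of this sketch, not the algorithms evaluated in the thesis.

```python
# Minimal sketch of clustering an attack graph's nodes into groups of
# related attacker states, to support focused visualization.
import networkx as nx
from sklearn.cluster import SpectralClustering

# Toy attack graph: nodes are attacker states, edges are exploit steps.
G = nx.Graph()
G.add_edges_from([
    ("internet", "web_server"), ("web_server", "app_server"),
    ("app_server", "database"), ("internet", "vpn_gateway"),
    ("vpn_gateway", "workstation"), ("workstation", "file_server"),
    ("file_server", "database"),
])

nodes = list(G.nodes)
A = nx.to_numpy_array(G, nodelist=nodes)  # adjacency used as affinity

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(A)

for cluster in set(labels):
    members = [n for n, c in zip(nodes, labels) if c == cluster]
    print(f"cluster {cluster}: {members}")
```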
335

Semantically Aligned Sentence-Level Embeddings for Agent Autonomy and Natural Language Understanding

Fulda, Nancy Ellen 01 August 2019 (has links)
Many applications of neural linguistic models rely on their use as pre-trained features for downstream tasks such as dialog modeling, machine translation, and question answering. This work presents an alternate paradigm: Rather than treating linguistic embeddings as input features, we treat them as common sense knowledge repositories that can be queried using simple mathematical operations within the embedding space, without the need for additional training. Because current state-of-the-art embedding models were not optimized for this purpose, this work presents a novel embedding model designed and trained specifically for the purpose of "reasoning in the linguistic domain".

Our model jointly represents single words, multi-word phrases, and complex sentences in a unified embedding space. To facilitate common-sense reasoning beyond straightforward semantic associations, the embeddings produced by our model exhibit carefully curated properties including analogical coherence and polarity displacement. In other words, rather than training the model on a smorgasbord of tasks and hoping that the resulting embeddings will serve our purposes, we have instead crafted training tasks and placed constraints on the system that are explicitly designed to induce the properties we seek. The resulting embeddings perform competitively on the SemEval 2013 benchmark and outperform state-of-the-art models on two key semantic discernment tasks introduced in Chapter 8.

The ultimate goal of this research is to empower agents to reason about low level behaviors in order to fulfill abstract natural language instructions in an autonomous fashion. An agent equipped with an embedding space of sufficient caliber could potentially reason about new situations based on their similarity to past experience, facilitating knowledge transfer and one-shot learning. As our embedding model continues to improve, we hope to see these and other abilities become a reality.
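The kind of query-by-arithmetic described above can be pictured with a tiny sketch: form an analogy vector by offset and look up its nearest neighbour by cosine similarity, with no additional training. The hand-made three-dimensional vectors below are placeholders for a trained model's embeddings.

```python
# Sketch of querying an embedding space with vector arithmetic:
# analogy by offset plus nearest-neighbour lookup.
import numpy as np

vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.1]),
    "man":   np.array([0.1, 0.8, 0.0]),
    "woman": np.array([0.1, 0.2, 0.0]),
    "apple": np.array([0.0, 0.1, 0.9]),
}

def nearest(vec, exclude):
    # Return the vocabulary word with the highest cosine similarity to vec.
    best, best_sim = None, -1.0
    for word, w in vocab.items():
        if word in exclude:
            continue
        sim = vec @ w / (np.linalg.norm(vec) * np.linalg.norm(w))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# king - man + woman is expected to land near queen.
query = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(query, exclude={"king", "man", "woman"}))
```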
336

New Mixed-Mode Chireix Outphasing Theory and Frequency-Agile Clockwise-Loaded Class-J Theory for High Efficiency Power Amplifiers

Chang, Hsiu-Chen January 2020 (has links)
No description available.
337

Natural Language Processing for Improving Search Query Results : Applied on The Swedish Armed Force's Profession Guide

Harju Schnee, Andreas January 2023 (has links)
Text has been the historical way of preserving and acquiring knowledge, and text data today is an increasingly growing part of the digital footprint, together with the need to query this data for information. Seeking information is a constant ongoing process and is a crucial part of many systems all around us. The ability to perform fast and effective searches is a must when dealing with vast amounts of data. This thesis implements an information retrieval system based on the Swedish Defence Force's profession guide, with the aim of producing a system that retrieves relevant professions based on user-defined queries of varying size. A number of Natural Language Processing techniques are investigated and implemented. In order to transform the gathered profession descriptions, a document embedding model, doc2vec, was implemented, resulting in document vectors that are compared to find similarities between documents. The final system was evaluated by domain experts, represented by active military personnel who quantified the relevancy of the profession retrievals into a measurable performance. The system managed to retrieve relevant information for 46.6% and 56.6% of the long and short text inputs, respectively. This results in a much more generalized and capable system compared to the search function available in the profession guide today.
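A minimal sketch of the doc2vec retrieval step, using gensim's Doc2Vec: train document vectors for the profession descriptions, infer a vector for a user query, and rank professions by similarity. The profession texts, tags, and hyperparameters below are invented placeholders rather than the thesis's actual data or settings.

```python
# Hedged doc2vec retrieval sketch with gensim; toy corpus only.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

professions = {
    "fighter_pilot": "flies and operates combat aircraft on air missions",
    "combat_engineer": "builds bridges clears mines and supports mobility",
    "military_chef": "plans and prepares meals for units in the field",
}

corpus = [
    TaggedDocument(words=text.split(), tags=[name])
    for name, text in professions.items()
]

model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=100)

# Infer a vector for a user query and rank professions by cosine similarity.
query = "I want to cook food for soldiers".split()
query_vec = model.infer_vector(query)
print(model.dv.most_similar([query_vec], topn=3))
```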
338

On the veridicality of perfective clause-embedding verbs in Polish / A unified aspect-based analysis of incremental theme verbs with nominal and propositional complements

Zuchewicz, Karolina 17 September 2020 (has links)
In my dissertation, I investigated a systematic interaction between the perfective aspect of a clause-embedding verb and a truth-oriented interpretation of embedded propositions in Polish. I demonstrated that the so-called reveal-type predicates (‘prove’, ‘reveal’, ‘show [that]’) are in complementary distribution with respect to triggering the truth-related meaning of their sentential complements. Whereas perfective variants enforce embedded propositions to be true, imperfective counterparts are almost only compatible with false (or neutral) propositions. I further showed that clause-embedding reveal-type predicates exhibit an incremental structure and can therefore be treated by analogy to verbs that combine with nominal incremental themes. In the former case, we have a gradual creation of a proof, whereas in the latter case, we have a gradual creation of an object like a ‘wardrobe’ (maximality of evidence = maximality of a wardrobe). I proposed a novel analysis of incremental theme verbs that combine with either nouns or clauses. According to my analysis, one possible realization of a partial-total affectedness of an incremental theme is a gradual creation of a proof for an embedded proposition. In order to obtain empirical evidence for the (non-)veridicality of (im)perfective reveal-type predicates in Polish, I conducted an acceptability judgement study with 51 Polish native speakers. I further conducted a corpus-based analysis of the frequency of the investigated lexemes, which complemented the interpretation of the results. Apart from Polish, I provided evidence from other Slavic languages (Czech, Russian) and some non-Slavic languages (Austronesian languages, French, Hungarian).
339

Measuring Group Separability in Geometrical Space for Evaluation of Pattern Recognition and Dimension Reduction Algorithms

Acevedo, Aldo, Duran, Claudio, Kuo, Ming-Ju, Ciucci, Sara, Schroeder, Michael, Cannistraci, Carlo Vittorio 22 January 2024 (has links)
Evaluating group separability is fundamental to pattern recognition. A plethora of dimension reduction (DR) algorithms has been developed to reveal the emergence of geometrical patterns in a low-dimensional space, where high-dimensional sample similarities are approximated by geometrical distances. However, statistical measures to evaluate the group separability attained by DR representations are missing. Traditional cluster validity indices (CVIs) might be applied in this context, but they present multiple limitations because they are not specifically tailored for DR. Here, we introduce a new rationale called projection separability (PS), which provides a methodology expressly designed to assess the group separability of data samples in a DR geometrical space. Using this rationale, we implemented a new class of indices named projection separability indices (PSIs) based on four statistical measures: Mann-Whitney U-test p-value, Area Under the ROC Curve, Area Under the Precision-Recall Curve, and Matthews Correlation Coefficient. The PSIs were compared to six representative cluster validity indices and one geometrical separability index using seven nonlinear datasets and six different DR algorithms. The results provide evidence that the implemented statistics-based measures designed on the basis of the PS rationale are more accurate than the other indices and can be adopted not only for evaluating and comparing the group separability of DR results but also for fine-tuning DR algorithms' hyperparameters. Finally, we introduce a second methodological innovation termed trustworthiness, a statistical evaluation that accounts for separability uncertainty and associates with each index's measure a p-value that expresses the significance level in comparison to a null model.
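A simplified sketch of the projection separability rationale: project two groups of DR-embedded samples onto the line joining their centroids and score the one-dimensional projections with the four statistics listed above. The toy Gaussian groups and the midpoint threshold used for the MCC are assumptions of this sketch, not the exact PSI definitions.

```python
# Simplified projection-separability sketch on toy 2-D "DR output".
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.metrics import roc_auc_score, average_precision_score, matthews_corrcoef

rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0, 0], scale=0.8, size=(40, 2))
group_b = rng.normal(loc=[3, 1], scale=0.8, size=(40, 2))

# Project both groups onto the unit vector joining the group centroids.
centroid_a, centroid_b = group_a.mean(axis=0), group_b.mean(axis=0)
direction = centroid_b - centroid_a
direction /= np.linalg.norm(direction)

proj_a, proj_b = group_a @ direction, group_b @ direction
scores = np.concatenate([proj_a, proj_b])
labels = np.concatenate([np.zeros(len(proj_a)), np.ones(len(proj_b))])

# The four separability statistics on the 1-D projections.
u_stat, p_value = mannwhitneyu(proj_a, proj_b)
auc_roc = roc_auc_score(labels, scores)
auc_pr = average_precision_score(labels, scores)
threshold = (centroid_a + centroid_b) / 2 @ direction  # midpoint (assumption)
mcc = matthews_corrcoef(labels, (scores > threshold).astype(int))

print(f"Mann-Whitney p={p_value:.3g}  AUC-ROC={auc_roc:.3f}  "
      f"AUC-PR={auc_pr:.3f}  MCC={mcc:.3f}")
```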
340

Experiments in speaker diarization using speaker vectors / Experiment med talarvektorer för diarisering

Cui, Ming January 2021 (has links)
Speaker Diarization is the task of determining ‘who spoke when?’ in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. It has emerged as an increasingly important and dedicated domain of speech research. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our research focuses on existing speaker diarization algorithms. In particular, the thesis targets the differences between supervised and unsupervised methods. The aim of this thesis is to examine state-of-the-art algorithms and analyze which algorithm is most suitable for our application scenarios. Its main contributions are (1) an empirical study of speaker diarization algorithms; (2) appropriate corpus data pre-processing; (3) an audio embedding network for creating d-vectors; (4) experiments on different algorithms and corpora and a comparison of them; (5) a recommendation suited to our requirements. The empirical study shows that, for the embedding extraction module, because the neural networks can be trained with big datasets, diarization performance can be significantly improved by replacing i-vectors with d-vectors. Moreover, the differences between supervised methods and unsupervised methods lie mostly in the clustering module. The thesis uses only d-vectors as the input to the diarization network and selects two main algorithms as comparison objects: Spectral Clustering, representing the unsupervised methods, and the Unbounded Interleaved-state Recurrent Neural Network (UIS-RNN), representing the supervised methods.
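The unsupervised branch can be sketched as spectral clustering of per-segment d-vectors over a cosine-similarity affinity matrix, as below. The synthetic "d-vectors" and the known number of speakers are assumptions; in the thesis the vectors come from a trained audio embedding network and the speaker count is generally unknown.

```python
# Sketch of the unsupervised branch: cluster per-segment speaker
# embeddings (d-vectors) with spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(1)
speaker_a = rng.normal(loc=1.0, size=(10, 64))   # segments from speaker A
speaker_b = rng.normal(loc=-1.0, size=(10, 64))  # segments from speaker B
d_vectors = np.vstack([speaker_a, speaker_b])

# Build a non-negative affinity matrix from cosine similarities.
affinity = np.clip(cosine_similarity(d_vectors), 0.0, 1.0)

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)

# Each segment index now carries a speaker label: "who spoke when".
print(labels)
```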
