171

Integrating dictionaries in a column-oriented database.

Vestberg, Melinda January 2023
In today's data-driven world, managing large volumes of data has become a common challenge. Data-driven businesses often face the task of effectively handling and analysing extensive datasets, where real-time analysis plays a crucial role in making informed decisions. Column-oriented databases have risen in popularity as a preferred storage and analytics solution. Elisa Polystar, for instance, uses ClickHouse, a column-oriented database, to provide network and service assurance solutions in their Kalix product. One of the advantages of column-oriented databases, including ClickHouse, is the availability of compression techniques. A dictionary is an in-memory key-value structure that can be stored completely or partially in RAM and referenced directly in queries. This thesis conducts a series of query-based experiments to evaluate the performance of Kalix when utilising dictionaries. Results show that, compared to the traditional left outer join, the dictionary version performed significantly better in five queries in both query duration and memory usage. At its best, the dictionary approach ran 26 times faster and consumed 1526 times less memory.
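The contrast the experiments are built around can be sketched as follows, assuming a ClickHouse server reachable through the clickhouse-driver Python client; the table (`events`), dimension table (`cell_dim`), dictionary (`cell_dict`) and column names are hypothetical stand-ins, not the schema used in Kalix. The dictionary variant replaces the join with a `dictGet` lookup into the in-memory key-value structure.

```python
# Sketch only: host, tables, dictionary and columns are made-up placeholders.
from clickhouse_driver import Client

client = Client(host="localhost")

# Traditional enrichment: LEFT OUTER JOIN against a dimension table.
join_query = """
    SELECT e.cell_id, d.cell_name, count() AS events
    FROM events AS e
    LEFT OUTER JOIN cell_dim AS d ON e.cell_id = d.cell_id
    GROUP BY e.cell_id, d.cell_name
"""

# Dictionary-based enrichment: the dimension data lives in an in-memory
# key-value dictionary, so the join becomes a dictGet() lookup per row.
dict_query = """
    SELECT cell_id,
           dictGet('cell_dict', 'cell_name', cell_id) AS cell_name,
           count() AS events
    FROM events
    GROUP BY cell_id, cell_name
"""

for label, query in (("join", join_query), ("dictionary", dict_query)):
    rows = client.execute(query)
    print(label, "rows:", len(rows))
```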
172

Mathematical Modeling of Gas Transport Across Cell Membrane: Forward and Inverse Problems

Bocchinfuso, Alberto 26 May 2023
No description available.
173

Méthodologie de conversion de dictionnaires spécialisés en dictionnaires d’apprentissage : application au domaine de l’informatique / Methodology for converting specialized dictionaries into learner's dictionaries: application to the field of computing

Alipour, Marjan 06 1900
This work describes a methodology for converting a specialized dictionary into a learner's dictionary. The dictionary to which we apply our conversion method is the DiCoInfo, Dictionnaire fondamental de l’informatique et de l’Internet, and we focus on changes affecting the presentation of data categories. In our case, a specialized dictionary for learners is a dictionary covering the field of computer science and the Internet that meets its users' needs in communicative and cognitive situations; it is aimed at learners of the language of computing. We start by presenting a detailed description of four learner's dictionaries and explain how the observations made on these resources helped us develop our methodology. First, following the work of Bergenholtz and Tarp (Bergenholtz 2003; Tarp 2008; Fuertes Olivera and Tarp 2011), we defined the types of users who may use our dictionary. Translators are the first intended users; other users working in fields related to translation are also targeted: proofreaders and revisers, technical writers, and interpreters. We also determined the use situations of the dictionary: it aims to assist learners in solving text reception and text production problems (communicative situations) and in studying the terminology of computing (cognitive situations). This allowed us to establish its lexicographical functions, namely communicative and cognitive functions. We then extracted 50 articles from the DiCoInfo and applied changes to several aspects: the layout, the presentation of data, the navigation and the use of multimedia. The changes were made according to two fundamental parameters: 1) simplification of the presentation; 2) the lexicographical functions (which include the intended users and the use situations). In this way, we exploited the tools offered by current technology to update the interface and layout. Strategies were developed to organize a large number of lexical links in a simpler way, and we associated these links with examples showing their use in specific contexts. Multimedia, in the form of audio pronunciations and illustrations, was also used. Each change was made in keeping with the lexicographical function to which it corresponds.
174

Information Theoretic Approach To Extractive Text Summarization

Ravindra, G 08 1900
Automatic text summarization techniques, which reduce a source text to a summary by content generalization or selection, have assumed significance in recent times due to the ever-expanding information explosion created by the World Wide Web. Summaries generated by generalization of information are called abstracts, and those generated by selecting portions of text (sentences, phrases, etc.) are called extracts. Further, summaries can be produced for each document separately, or multiple documents can be summarized together to produce a single summary. The challenges in making machines generate extracts or abstracts stem primarily from the limited understanding of human cognitive processes. Summaries generated by humans appear to be influenced by their moral, emotional and ethical stance on the subject and by their background knowledge of the content being summarized. These characteristics are poorly understood and difficult to model mathematically. Automatic summarization is further handicapped by the limitations of existing computing resources and the lack of good mathematical models of cognition. In view of this, the role of rigorous mathematical theory in summarization has hitherto been limited. The research reported in this thesis is a contribution towards bringing the power of well-established concepts from information theory to the field of summarization. Contributions of the thesis: the specific focus of this thesis is extractive summarization, spanning both multi-document and single-document summarization. Throughout the thesis, "summarization" and "summary" mean extract generation and sentence extracts respectively. Two new summarizers, ESCI (Extractive Summarization using Collocation Information) and De-ESCI (Dictionary-enhanced ESCI), are proposed, along with an automatic summary evaluation technique called DeFuSE (Dictionary-enhanced Fuzzy Summary Evaluator). The mathematical basis of the proposed scoring scheme and its relationship to other well-known summarization algorithms such as Latent Semantic Indexing (LSI) is also derived. The work is specific to extractive summarization of unstructured text and deliberately ignores data-set characteristics such as the positional importance of sentences; this is to ensure that the summarizer works well for a broad class of documents and to keep the proposed models as generic as possible. Central to the work is the concept of the "Collocation Information of a word", its quantification and its application to summarization. Collocation Information (CI) is the amount of information (in Shannon's sense) that a word and its collocations together contribute to the total information in the document(s) being summarized; it is computed from a joint probability distribution over words and their collocates. Further, a base value of CI called the "Discrimination Threshold" (DT) is derived. To determine DT, sentences from a large collection of documents covering various topics, including the topic of the document(s) being summarized, were broken down into sequences of word collocations, and the number of possible neighbors of a word within a specified collocation window was determined. This number is called the "cardinality of the collocating set" and is denoted |ℵ(w)|.
It is proved that if |ℵ(w)| determined from this large document collection is fixed for any word w, then the maximum value of the CI for w is proportional to |ℵ(w)|. This constrained maximum is the Discrimination Threshold and is used as the base value of CI. Experimental evidence detailed in the thesis shows that sentences containing words with CI greater than DT are the most likely to be useful in an extract. Words in every sentence of the document(s) being summarized are assigned scores based on the difference between their current value of CI and their respective DT; individual word scores are summed to obtain a score for each sentence. Sentences are ranked by score, and the first few in rank order are selected as the extract summary. Redundant and semantically similar sentences are excluded using a simple similarity-detection algorithm. This novel extraction method is called ESCI. In the second part of the thesis, the advantages of tagging words as nouns, verbs, adjectives and adverbs, without sense disambiguation, are explored. A hierarchical model for knowledge abstraction is proposed, and the cases where such a model can improve summarization accuracy are explained. Knowledge abstraction is achieved by converting collocations into their hypernymous versions. The number of levels of abstraction varies with the sense tag given to each word in the collocation being abstracted. Once the abstractions have been determined, the Expectation-Maximization algorithm is used to estimate the probability of each collocation at every level of abstraction. A combination of abstracted collocations from various levels is then chosen, and sentences are scored based on the collocation information of these abstractions. This summarization scheme is referred to as De-ESCI (Dictionary-enhanced ESCI). It has been observed in many human summary data sets that the factual attribute of the human summarizer determines the choice of noun and verb pairs, while the emotional attribute determines the number of noun and adjective pairs. To bring these attributes into machine-generated summaries, two variants of De-ESCI are proposed: De-ESCI-F, with the factual attribute, and De-ESCI-E, with the emotional attribute. Both create summaries in two parts. The first part of a De-ESCI-F summary is obtained by scoring and selecting only those sentences in which a fixed number of nouns and verbs occur; the second part is obtained by ranking and selecting the sentences that do not qualify for the first part, with sentence scoring and selection done exactly as in ESCI. Similarly, the first part of De-ESCI-E is generated by scoring and selecting only those sentences in which a fixed number of nouns and adjectives occur.
The second part of the summary produced by De-ESCI-E is generated exactly like the second part of De-ESCI-F. Because a model summary written by a human may or may not favour sentences containing qualifiers (adjectives), the automatic summarizer does not know a priori whether to prefer sentences with qualifiers over those without. Since De-ESCI-F and De-ESCI-E produce two candidate summaries, one of them should be closer to the human summarizer's point of view in terms of the importance given to qualifiers; the technique of choosing the better candidate is referred to as De-ESCI-F/E. Performance metrics: the focus of this thesis is on new models and sentence-ranking techniques aimed at improving the accuracy of the extract in terms of the sentences selected rather than the readability of the summary, so the order of sentences in the summary is not considered during evaluation. Automatic evaluation metrics are used, and the summarizer's performance is evaluated in terms of precision, recall and f-score obtained by comparing its output with model human-generated extract summaries. A novel summary evaluator called DeFuSE is proposed, and its scores are used alongside those of the standard evaluator ROUGE. DeFuSE evaluates an extract in terms of precision, recall and f-score, relying on the WordNet hypernymy structure to identify semantically similar sentences in a document; it also uses fuzzy set theory to compute the extent to which a sentence from the machine-generated extract belongs to the model summary. The performance of candidate summarizers is discussed in terms of percentage improvement in f-score relative to baselines: the average of the ROUGE and DeFuSE f-scores is computed for every summary, and the mean of these scores is used to compare improvements. Performance: the DUC 2002 and DUC 2003 multi-document data sets are used for illustration. Of these, only the 400-word summaries of DUC 2002 and the track-4 (novelty track) summaries of DUC 2003 are suitable for evaluating sentence extracts, and hence only these are used, with f-score as the measure of performance. Standard baselines such as coverage, size and lead, as well as probabilistic baselines, are used to measure the percentage improvement in f-score of the candidate summarizers. In addition, summaries generated by MEAD using centroid and length as ranking features (MEAD-CL), MEAD using position, centroid and length (MEAD-CLP), the Microsoft Word automatic summarizer (MS-Word) and a Latent Semantic Indexing (LSI) based summarizer are used for comparison with the proposed summarization schemes.
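As a rough illustration of the quantity the abstract defines, the sketch below estimates a word's Collocation Information from a joint distribution of (word, neighbour) pairs inside a fixed window. The tokenization, window size and the way pair information is credited to words are simplified assumptions, and the Discrimination Threshold derivation is not reproduced; this is not the thesis's actual procedure.

```python
# Minimal sketch: Shannon information of (word, neighbour) collocation pairs,
# accumulated per word as a stand-in for "Collocation Information" (CI).
import math
from collections import Counter

def collocation_information(sentences, window=4):
    """Estimate each word's CI from a joint distribution of (word, neighbour) pairs."""
    pair_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                pair_counts[(w, v)] += 1
    total = sum(pair_counts.values())
    ci = Counter()
    for (w, v), count in pair_counts.items():
        p = count / total
        contribution = -p * math.log2(p)   # Shannon information carried by this pair
        ci[w] += contribution              # credit it to both members of the collocation
        ci[v] += contribution
    return ci

scores = collocation_information([
    "automatic text summarization reduces a source text to a summary",
    "extractive summarization selects sentences from the source text",
])
print(scores.most_common(5))
```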
175

Description de pratiques d’enseignement visant à former les élèves à l’utilisation du dictionnaire électronique en classe de français au secondaire / A description of teaching practices aimed at training students to use the electronic dictionary in secondary-school French classes

Singcaster, Mélissa 10 1900
This research aims to better understand how some secondary-school teachers of French train their students to use the electronic dictionary in class, by determining the knowledge and skills related to its use that are taught in the classroom and by describing the practices related to the teaching of these dictionary skills. To achieve our goals, we conducted interviews with eight teachers, who also recorded in a diary, over one month, the activities and interventions requiring the use of the electronic dictionary that they carried out in class. In light of our data analysis, we drew a portrait of each teacher's teaching practices, and a comparison of the similarities and differences between the portraits enabled us to identify three distinct profiles of practices aimed at training students to use the electronic dictionary: 1) spontaneous teaching, focusing on a few elements related to its use; 2) teaching planned at the start of the year and spontaneous thereafter, focusing on a greater variety of elements; and, finally, 3) regular planned teaching, which also incorporates a wide variety of elements related to the use of the electronic dictionary but additionally includes knowledge that relates specifically to the electronic format. Our study shows that integrating a tool such as the electronic dictionary into teaching practices is a long and complex process, and that the wealth of dictionaries made available to teachers does not guarantee richer or more varied teaching. As such, we believe that training on the use of the electronic dictionary could be useful for practicing teachers as well as future teachers, because it would allow them to become familiar with its use first in a personal capacity, an essential step in integrating the electronic dictionary into teaching practices, and then for educational purposes.
176

Užití techniky lámání hesel u komprimačních formátů RAR, ZIP a 7z a extrakce hesel z samorozbalovacích archivů / Analysis of the Possibility of Password Break through for RAR, ZIP and 7z Formats

Prustoměrský, Milan January 2013
This thesis deals with the analysis of the possibility of breaking passwords of common compression formats and with password extraction from self-extracting archives used by malicious software. The structure of compression programs and ciphers, and the connection between cipher and archive, are described, along with common and specialized attacks on archives and ciphers. The structure of self-extracting archives and the location of the password within them are used to create an extractor of passwords from self-extracting archives.
177

Use Case Driven Evaluation of Database Systems for ILDA

Thapa, Shova 18 November 2022
No description available.
178

Acquisitions d’IRM de diffusion à haute résolution spatiale : nouvelles perspectives grâce au débruitage spatialement adaptatif et angulaire / High spatial resolution diffusion MRI acquisitions: new perspectives through spatially adaptive and angular denoising

St-Jean, Samuel January 2015
The early 2000s saw the mapping of the human genome completed after 13 years of research. The challenge of the coming century lies in constructing the human connectome, that is, mapping the connections of the brain using diffusion magnetic resonance imaging (MRI). This technique makes it possible to study the white matter of the brain in a completely non-invasive way. Although the challenge is monumental, the resolution of an MRI image lies at the macroscopic scale and is roughly 1000 times larger than the axons that must be mapped. To help mitigate this problem, this thesis proposes a new denoising technique designed specifically for diffusion imaging. The Non Local Spatial and Angular Matching (NLSAM) algorithm builds on the principles of block matching and dictionary learning to exploit the redundancy of diffusion MRI data. A thresholding over angular neighbours is also performed through sparse coding, where the l2 reconstruction error is bounded by the local variance of the noise. The algorithm is also designed to handle the bias of Rician and noncentral chi noise, since MRI images contain non-Gaussian noise. This makes it possible to acquire diffusion MRI data at a higher spatial resolution than is currently available in clinical settings. This work thus paves the way towards better acquisition schemes, which could help reveal new anatomical details that are not discernible at the spatial resolution currently used by the diffusion MRI community. It could also eventually contribute to identifying new biomarkers for understanding degenerative diseases such as multiple sclerosis, Alzheimer's disease and Parkinson's disease.
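To illustrate the sparse-coding step described above (not NLSAM itself, which adds block matching over angular neighbours and Rician/noncentral-chi bias correction), the sketch below learns a dictionary from synthetic patches with scikit-learn and reconstructs each noisy patch until the squared l2 residual falls below the assumed local noise variance. All sizes, names and the noise level are made-up assumptions.

```python
# Minimal sketch: dictionary learning + OMP with a noise-variance-bounded residual.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

# Synthetic "patches": sparse combinations of 128 random atoms of dimension 64.
atoms = rng.standard_normal((128, 64))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
codes = np.zeros((500, 128))
for row in codes:
    row[rng.choice(128, size=4, replace=False)] = rng.standard_normal(4)
clean = codes @ atoms

sigma = 0.05                                    # assumed local noise standard deviation
noisy = clean + sigma * rng.standard_normal(clean.shape)

# Learn an overcomplete dictionary from the noisy patches themselves.
dico = MiniBatchDictionaryLearning(n_components=128, batch_size=64, random_state=0)
D = dico.fit(noisy).components_                 # shape (128, 64)

# Sparse-code each patch with OMP; tol bounds the squared l2 norm of the residual,
# mirroring "reconstruction error bounded by the local variance of the noise".
omp = OrthogonalMatchingPursuit(tol=noisy.shape[1] * sigma ** 2, fit_intercept=False)
denoised = np.vstack([omp.fit(D.T, y).predict(D.T) for y in noisy])

print("RMSE noisy   :", float(np.sqrt(np.mean((noisy - clean) ** 2))))
print("RMSE denoised:", float(np.sqrt(np.mean((denoised - clean) ** 2))))
```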
179

Von Champollion bis Erman / From Champollion to Erman

Brose, Marc, Hensel, Josephine, Sperveslage, Gunnar 20 April 2016
The project "Altägyptische Wörterbücher im Verbund" is a sub-project, based at the Egyptological Institute of Leipzig University, of the project "Wissensrohstoff Text", in which seven humanities institutes in Leipzig and the Institute of Computer Science participate with ESF funding. Egyptian has a documented language history of more than 4000 years. After the decipherment of the hieroglyphs by J.-F. Champollion (1822), the 19th and early 20th centuries were devoted to recording the vocabulary and establishing word meanings. The end of this pioneering phase is marked by the Wörterbuch der ägyptischen Sprache of Erman and Grapow (main volumes 1926–1931), which remains a standard work to this day. It was preceded, however, by a large number of dictionaries, word lists and glossaries that are now largely forgotten yet of great importance for the history of the discipline: on the one hand, they document the step-by-step understanding of the Egyptian language and the methods applied to work it out; on the other, they reveal the foundation of our present lexicographical knowledge. By means of a dictionary portal, the project creates an infrastructure for linking the occurrences of words in ancient Egyptian dictionaries and other lexicographically relevant publications to the modern lemma entries of the digital word list of the Thesaurus Linguae Aegyptiae (TLA) (http://aaew.bbaw.de/tla). This enables an automated analysis of the dictionaries as a contribution to the history of Egyptian lexicography. In addition to the word list, the TLA contains a text database, so linking to the word list also provides links to Egyptian full texts and attestations. This article presents a short overview of the project "Altägyptische Wörterbücher im Verbund" hosted at Leipzig University. Its aim is to establish a digital infrastructure for linking the lexical material of selected dictionaries of Ancient Egyptian of the 19th and early 20th century to a modern standard wordlist, that of the Thesaurus Linguae Aegyptiae (TLA).
180

Provable alternating minimization for non-convex learning problems

Netrapalli, Praneeth Kumar 17 September 2014
Alternating minimization (AltMin) is a generic term for a widely popular approach in non-convex learning: often, it is possible to partition the variables into two (or more) sets so that the problem is convex/tractable in one set if the other is held fixed (and vice versa). This allows for alternating between optimally updating one set of variables and then the other. AltMin methods typically do not have associated global consistency guarantees, even though they are empirically observed to perform better than methods (e.g. those based on convex optimization) that do have guarantees. In this thesis, we obtain rigorous performance guarantees for AltMin in three statistical learning settings: low-rank matrix completion, phase retrieval and learning sparsely-used dictionaries. The overarching theme behind our results consists of two parts: (i) devising new initialization procedures (as opposed to initializing randomly, as is typical), and (ii) establishing exponential local convergence from this initialization. Our work shows that the pursuit of statistical guarantees can yield algorithmic improvements (initialization, in our case) that perform better in practice.
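As a generic illustration of the AltMin template for one of the three settings, low-rank matrix completion, the sketch below alternates least-squares updates of the two low-rank factors on synthetic data. It uses a plain random initialization for brevity, whereas the guarantees in the thesis rely on a carefully designed initialization procedure rather than a random one; all problem sizes here are arbitrary.

```python
# Minimal sketch of alternating minimization for low-rank matrix completion:
# with V fixed, solving for U is a set of least-squares problems, and vice versa.
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 60, 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # ground-truth low-rank matrix
mask = rng.random((n, m)) < 0.4                                  # observed entries

U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))

for _ in range(50):
    # Update each row of U from the observed entries in that row (V fixed).
    for i in range(n):
        obs = mask[i]
        U[i] = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)[0]
    # Update each row of V from the observed entries in that column (U fixed).
    for j in range(m):
        obs = mask[:, j]
        V[j] = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)[0]

rel_err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(f"relative recovery error: {rel_err:.3e}")
```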
