71 |
A Random Indexing Approach to Unsupervised Selectional Preference Induction. Hägglöf, Hillevi; Tengstrand, Lisa. January 2011.
A selectional preference is the relation between a head-word and plausible arguments of that head-word. Estimating the association between these words is important to natural language processing applications such as Word Sense Disambiguation. This study presents a novel approach to selectional preference induction within a Random Indexing word space: a spatial representation of meaning in which distributional patterns enable estimation of the similarity between words. The aim of this study is to develop a flexible method that uses only frequency statistics about words to estimate how strongly one word selects another, is not language dependent, and does not require any annotated resources, in contrast to methods from previous research. In order to optimize the performance of the selectional preference model, experiments including parameter tuning and variation of corpus size were conducted. The selectional preference model was evaluated in a pseudo-word evaluation, which lets the model decide which of two arguments has the stronger correlation to a given verb. Results show that varying parameters and corpus size does not affect the performance of the selectional preference model in a notable way. The conclusion of the study is that the language model used does not provide adequate tools for modelling selectional preferences. This might be due to a noisy representation of head-words and their arguments.
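The abstract does not give implementation details, but the core of a Random Indexing word space, and of comparing a verb to candidate arguments within it, can be sketched as follows. The dimensionality, window size, and toy sentences are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np
from collections import defaultdict

DIM, NONZERO = 2000, 8  # dimensionality and number of non-zero entries (assumed values)
rng = np.random.default_rng(0)

def index_vector():
    """Sparse ternary random index vector: a few +1/-1 entries, the rest zeros."""
    v = np.zeros(DIM)
    pos = rng.choice(DIM, NONZERO, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], NONZERO)
    return v

index_vecs = defaultdict(index_vector)            # one fixed random vector per word
context_vecs = defaultdict(lambda: np.zeros(DIM)) # accumulated distributional profile

def train(sentences, window=2):
    """Accumulate context vectors from plain co-occurrence counts."""
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    context_vecs[w] += index_vecs[sent[j]]

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def prefers(verb, arg1, arg2):
    """Decide which of two arguments the verb selects more strongly."""
    s1 = cosine(context_vecs[verb], context_vecs[arg1])
    s2 = cosine(context_vecs[verb], context_vecs[arg2])
    return arg1 if s1 >= s2 else arg2

train([["the", "dog", "ate", "the", "bone"],
       ["the", "dog", "ate", "the", "meat"],
       ["the", "child", "read", "the", "book"]])
print(prefers("ate", "bone", "book"))
```

In a pseudo-word evaluation of the kind described above, `prefers` would be called with the original argument of a verb and a randomly chosen confounder.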
|
72 |
Compound Processing for Phrase-Based Statistical Machine Translation. Stymne, Sara. January 2009.
In this thesis I explore how compound processing can be used to improve phrase-based statistical machine translation (PBSMT) between English and German/Swedish. Both German and Swedish generally use closed compounds, which are written as one word without spaces or other indicators of word boundaries. Compounding is both common and productive, which makes it problematic for PBSMT, mainly due to sparse data problems. The adopted strategy for compound processing is to split compounds into their component parts before training and translation. For translation into Swedish and German the parts are merged after translation. I investigate the effect of different splitting algorithms for translation between English and German, and of different merging algorithms for German. I also apply these methods to a different language pair, English--Swedish. Overall the studies show that compound processing is useful, especially for translation from English into German or Swedish. But there are improvements for translation into English as well, such as a reduction of unknown words. I show that for translation between English and German different splitting algorithms work best for different translation directions. I also design and evaluate a novel merging algorithm based on part-of-speech matching, which outperforms previous methods for compound merging, showing the need for information that is carried through the translation process, rather than only external knowledge sources such as word lists. Most of the methods for compound processing were originally developed for German. I show that these methods can be applied to Swedish as well, with similar results.
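The thesis compares several splitting algorithms; one widely used frequency-based baseline, splitting wherever the parts' corpus frequencies have the highest geometric mean (in the style of Koehn and Knight), can be sketched as below. The toy Swedish frequencies and the minimum part length are illustrative assumptions.

```python
from math import prod

# Toy corpus frequencies standing in for real monolingual counts (illustrative only).
freq = {"bord": 120, "tennis": 340, "bordtennis": 15, "boll": 500, "bordtennisboll": 2}

def segmentations(word, lexicon, min_len=3):
    """All ways of segmenting `word` into parts that occur in the lexicon."""
    if word in lexicon:
        yield [word]
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head in lexicon:
            for rest in segmentations(word[i:], lexicon, min_len):
                yield [head] + rest

def best_split(word, lexicon):
    """Pick the segmentation with the highest geometric mean of part frequencies,
    keeping the word unsplit if that scores best."""
    candidates = list(segmentations(word, lexicon)) or [[word]]
    def score(parts):
        return prod(lexicon.get(p, 0) for p in parts) ** (1.0 / len(parts))
    return max(candidates, key=score)

print(best_split("bordtennisboll", freq))   # e.g. ['bord', 'tennis', 'boll']
```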
|
73 |
Applying Memoization as an Approximate Computing Method for Transiently Powered Systems / Tillämpa Memoization i en Ungefärlig Beräkningsmetod för Transientdrivna System. Perju, Dragos-Stefan. January 2019.
Internet of Things (IoT) is becoming a more and more prevalent technology, as it not only makes the routines of our lives easier but also helps industry and enterprise become more efficient. The high potential of IoT can also help support our own population on Earth, through precision agriculture, smart transportation, smart cities and so on. It is therefore important that IoT is made scalable in a sustainable manner, in order to secure our own future as well. The current work concerns transiently powered systems (TPS), which are embedded systems that use energy harvesting as their only power source. In their basic form, TPS suffer frequent reboots due to the unreliable availability of energy from the environment, so their throughput is initially lower than that of their battery-powered counterparts. To improve this, TPS checkpoint RAM and processor state to non-volatile memory, so that computation progress is kept across power loss intervals. The aim of this project is to lower the number of checkpoints needed during an application run on a TPS in the generic case, by using approximate computing. Approximations lower the energy need of the TPS, meaning more results are produced while the system is running between power loss periods. For this study, the memoization technique is implemented in the form of a hash table. The Kalman filter is chosen as the test application, running on the Microchip SAM-L11 embedded platform. The memoization technique yields an improvement for the Kalman application considered, compared with the initial baseline version of the program. A user can "balance" between greater energy savings and less accurate results, or the opposite, by adjusting a "quality knob" variable epsilon (ϵ). For example, for ϵ = 0.7, 32% fewer checkpoints are needed than in the baseline version, with the output deviating by 42% on average and by 71% at its maximum point. This serves as a proof of concept that approximate computing can indeed improve the throughput of TPS and make them more feasible. It is pointed out, however, that only a single application type was tested, with one particular input trace. The hash table method implemented can behave differently depending on what application and/or data it works with. It is therefore suggested that a pre-analysis of the specific dataset and application be done at design time, in order to check the feasibility of applying approximations in the case considered.
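The thesis implements memoization in C on the SAM-L11; the Python sketch below only illustrates the idea of an approximate, epsilon-controlled lookup table. The bucketing scheme and the example function are assumptions, not the thesis' actual hash-table design.

```python
import math

class ApproxMemo:
    """Approximate memoization table: reuse a stored result when a new input
    falls in the same epsilon-wide bucket as a previously seen input
    (a coarse stand-in for the hash-table lookup described in the thesis)."""
    def __init__(self, fn, epsilon):
        self.fn, self.epsilon, self.table = fn, epsilon, {}

    def _bucket(self, x):
        # Quantise the input so nearby values hash to the same bucket.
        return round(x / self.epsilon) if self.epsilon > 0 else x

    def __call__(self, x):
        key = self._bucket(x)
        if key in self.table:
            return self.table[key]      # approximate hit: skip the computation
        result = self.fn(x)             # exact (expensive) computation
        self.table[key] = result
        return result

noisy_update = ApproxMemo(lambda z: math.exp(-z) * math.sin(z), epsilon=0.7)
print(noisy_update(1.0), noisy_update(0.8))  # second call reuses the first result
```

A larger epsilon trades accuracy for more cache hits, mirroring the "quality knob" behaviour reported above.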
|
74 |
A Study on Text Classification Methods and Text Features. Danielsson, Benjamin. January 2019.
When it comes to the task of classification, the data used for training is the most crucial part. It follows that how this data is processed and presented to the classifier plays an equally important role. This thesis investigates the performance of multiple classifiers depending on the features that are used, the type of classes to classify, and the optimization of said classifiers. The classifiers of interest are support-vector machines (SMO) and the multilayer perceptron (MLP); the features tested are word vector spaces and text complexity measures, along with principal component analysis (PCA) applied to the complexity measures. The features are created based on the Stockholm-Umeå Corpus (SUC) and DigInclude, a dataset containing standard and easy-to-read sentences. For the SUC dataset the classifiers attempted to classify texts into nine different text categories, while for the DigInclude dataset the sentences were classified as either standard or simplified. The classification tasks on the DigInclude dataset showed poor performance in all trials. The SUC dataset showed the best performance when using SMO in combination with word vector spaces. Comparing the SMO classifier on the text complexity measures with and without PCA showed that performance was largely unchanged between the two, although not using PCA gave slightly better performance.
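The abstract does not name a toolkit; the scikit-learn sketch below only illustrates the with/without-PCA comparison on complexity features. The feature matrix and the nine-category labels are random placeholders, not the SUC data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: rows are documents, columns are text complexity measures
# (e.g. sentence length, word length, LIX); labels are text categories.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 9, size=200)          # nine SUC-style categories (assumed)

with_pca = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="linear"))
without_pca = make_pipeline(StandardScaler(), SVC(kernel="linear"))

for name, model in [("with PCA", with_pca), ("without PCA", without_pca)]:
    acc = cross_val_score(model, X, y, cv=5).mean()   # mean cross-validated accuracy
    print(f"{name}: {acc:.3f}")
```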
|
75 |
Word embeddings and Patient records: The identification of MRI risk patients. Kindberg, Erik. January 2019.
Identification of risks ahead of MRI examinations is a cumbersome and time-consuming process at the Linköping University Hospital radiology clinic. The hospital staff often have to search through large amounts of unstructured patient data to find information about implants. Word embeddings have been identified as a possible tool to speed up this process. The purpose of this thesis is to evaluate this method, which is done by training a Word2Vec model on patient record data and analyzing the close neighbours of key search words by calculating cosine similarity. The 50 closest neighbours of each search word are categorized and annotated as relevant or not to the task of identifying risk patients ahead of MRI examinations. Ten search words were explored, leading to a total of 500 terms being annotated. In total, 14 different categories were observed in the results, and 8 of these were considered relevant. Out of the 500 terms, 340 (68%) were considered relevant. In addition, 48 implant models were observed, which is particularly interesting because if a patient has an implant, hospital staff need to determine its exact model and the MRI conditions of that model. Overall, these findings point towards a positive answer to the aim of the thesis, although further development is needed.
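The patient records themselves are not available, so the sketch below substitutes a few toy sentences; it only illustrates the pipeline of training a Word2Vec model and inspecting cosine-similar neighbours of a search term. The use of gensim and the model parameters are assumptions.

```python
from gensim.models import Word2Vec

# Toy stand-in for the clinical notes (the real patient records are not public).
sentences = [
    ["patient", "has", "pacemaker", "implanted", "2014"],
    ["mri", "contraindicated", "due", "to", "pacemaker"],
    ["no", "known", "implants", "mri", "approved"],
    ["cochlear", "implant", "left", "ear"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, seed=1)

# Cosine-similar neighbours of a search term, as in the manual annotation step.
for term, score in model.wv.most_similar("pacemaker", topn=5):
    print(f"{term}\t{score:.2f}")
```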
|
76 |
Is Simple Wikipedia simple? A study of readability and guidelines. Isaksson, Fabian. January 2018.
Creating easy-to-read text is an issue that has traditionally been solved with manual work. But with advancing research in natural language processing, automatic systems for text simplification are being developed. These systems often need parallel-aligned training data. For several years, Simple Wikipedia has been the main source of this data. In the current study, several readability measures have been tested on a popular simplification corpus. A selection of guidelines from Simple Wikipedia has also been operationalized and tested. The results imply that adherence to the guidelines is not greater in Simple Wikipedia than in standard Wikipedia. There are, however, differences in the readability measures: the syntactic structures of Simple Wikipedia seem to be less complex than those of standard Wikipedia. A continuation of this study would be to examine other readability measures and to evaluate the guidelines not covered within the current work.
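The abstract does not list which readability measures were tested; as one concrete example, the LIX index (average sentence length plus the percentage of long words) can be computed as below. The tokenization rules are simplified assumptions.

```python
import re

def lix(text):
    """LIX readability index: average sentence length plus the percentage of
    long words (more than 6 characters). Higher values indicate harder text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÅÄÖåäö']+", text)
    if not sentences or not words:
        return 0.0
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100.0 * long_words / len(words)

standard = "The encyclopedia article discusses phenomena of considerable complexity."
simple = "This article is about hard things. It uses short words."
print(lix(standard), lix(simple))   # the simpler text should score lower
```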
|
77 |
Hierarchical text classification of fiction books: With Thema subject categories. Reinaudo, Alice. January 2019.
Categorizing books and literature of any genre and subject area is a vital task for publishers who seek to distribute their books to the appropriate audiences. It is common that different countries use different subject categorization schemes, which makes international book trading more difficult because books need to be categorized from scratch once they reach another country. A solution to this problem has been proposed in the form of an international standard called Thema, which encompasses thousands of hierarchical subject categories. However, because this scheme is quite recent, many books published before its creation have yet to be assigned subject categories, and it is often the case that even recent books are not categorized. In this work, methods for automatic categorization of books are investigated, based on multinomial Naive Bayes and Facebook's classifier fastText. The results show some promise for both classifiers, but overall, due to data imbalance and a very long training time that made it difficult to use more data, it is not possible to determine with certainty which classifier is actually better.
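As a minimal illustration of the Naive Bayes side of the comparison, a flat multinomial Naive Bayes classifier over bag-of-words features might be set up as below. The use of scikit-learn is an assumption, the example texts and Thema-style codes are invented, and the thesis' hierarchical handling of the category tree and the fastText configuration are not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder book descriptions with Thema-style subject codes (illustrative only).
texts = [
    "a detective investigates a murder in a small coastal town",
    "two children discover a magical door in their grandmother's attic",
    "a spy races across europe to stop a conspiracy",
    "a young dragon learns to fly with help from his friends",
]
labels = ["FF", "YFH", "FH", "YFH"]   # not a real training set

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["an assassin hunts a witness through the city"]))
```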
|
78 |
Domain Knowledge Management in Information-providing Dialogue Systems. Flycht-Eriksson (Silvervarg), Annika. January 2001.
In this thesis a new concept called domain knowledge management for information-providing dialogue systems is introduced. Domain knowledge management includes issues related to the representation and use of domain knowledge as well as access to background information sources, issues that previously have been incorporated in dialogue management.

The work on domain knowledge management reported in this thesis can be divided into two parts. On a general theoretical level, knowledge sources and models used for dialogue management, including domain knowledge management, are studied and related to the capabilities they support. On a more practical level, domain knowledge management is examined in the context of a dialogue system framework and a specific instance of this framework, the ÖTRAF system. In this system domain knowledge management is implemented in a separate module, a Domain Knowledge Manager.

The use of a specialised Domain Knowledge Manager has a number of advantages. The first is that dialogue management becomes more focused, as it only has to consider dialogue phenomena, while domain-specific reasoning is handled by the Domain Knowledge Manager. Secondly, porting of a system to new domains is facilitated, since domain-related issues are separated out into specialised domain knowledge sources. The third advantage of a separate module for domain knowledge management is that domain knowledge sources can easily be modified, exchanged, and reused.

Report code: LiU-Tek-Lic-2001:27.
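The ÖTRAF interfaces are not described in the abstract, but the architectural separation it argues for can be sketched roughly as follows; all class and method names, and the timetable example, are invented for illustration.

```python
class DomainKnowledgeSource:
    """One pluggable source of domain facts (e.g. a timetable database)."""
    def __init__(self, facts):
        self.facts = facts
    def lookup(self, query):
        return [f for f in self.facts if all(f.get(k) == v for k, v in query.items())]

class DomainKnowledgeManager:
    """Encapsulates all domain-specific reasoning behind one interface."""
    def __init__(self):
        self.sources = []
    def register(self, source):
        self.sources.append(source)
    def answer(self, query):
        return [hit for src in self.sources for hit in src.lookup(query)]

class DialogueManager:
    """Handles only dialogue phenomena; delegates domain questions to the DKM."""
    def __init__(self, dkm):
        self.dkm = dkm
    def handle(self, request):
        hits = self.dkm.answer(request)
        return hits[0] if hits else "I could not find that, could you rephrase?"

timetable = DomainKnowledgeSource([{"line": "3", "to": "Ryd", "departs": "14:05"}])
dm = DialogueManager(DomainKnowledgeManager())
dm.dkm.register(timetable)
print(dm.handle({"to": "Ryd"}))
```

Swapping in a new domain then amounts to registering different knowledge sources, which is the porting advantage the thesis points to.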
|
79 |
Object Based Concurrency for Data Parallel Applications: Programmability and Effectiveness. Diaconescu, Roxana Elena. January 2002.
Increased programmability for concurrent applications in distributed systems requires automatic support for some of the concurrent computing aspects: the decomposition of a program into parallel threads, the mapping of threads to processors, the communication between threads, and synchronization among threads. Thus, a highly usable programming environment for data parallel applications strives to conceal data decomposition, data mapping, data communication, and data access synchronization.

This work investigates the problem of programmability and effectiveness for scientific, data parallel applications with irregular data layout. The complicating factor for such applications is the recursive, or indirection-based, data structure representation. That is, an efficient parallel execution requires a data distribution and mapping that ensure data locality, yet the recursive and indirect representations yield poor physical data locality. We examine techniques for efficient, load-balanced data partitioning and mapping for irregular data layouts. Moreover, in the presence of non-trivial parallelism and data dependences, a general data partitioning procedure complicates locating arbitrarily distributed data across address spaces. We formulate the general data partitioning and mapping problems and show how a general data layout can be used to access data across address spaces in a location-transparent manner.

Traditional data parallel models promote instruction-level or loop-level parallelism. Compiler transformations and optimizations for discovering and/or increasing parallelism in Fortran programs apply to regular applications. However, many data-intensive applications are irregular (sparse matrix problems, applications that use general meshes, etc.). Discovering and exploiting fine-grain parallelism for applications that use indirection structures (e.g. indirection arrays, pointers) is very hard, or even impossible.

The work in this thesis explores a concurrent programming model that enables coarse-grain parallelism in a highly usable, efficient manner. Hence, it explores the issues of implicit parallelism in the context of objects as a means for encapsulating distributed data. The computation model results in a trivial SPMD (Single Program Multiple Data) style, where the non-trivial parallelism aspects are solved automatically.

This thesis makes the following contributions:

- It formulates the general data partitioning and mapping problems for data parallel applications. Based on these formulations, it describes an efficient distributed data consistency algorithm.
- It describes a data parallel object model suitable for regular and irregular data parallel applications. Moreover, it describes an original technique to map data to processors so as to preserve locality. It also presents an inter-object consistency scheme that tries to minimize communication.
- It provides evidence of the efficiency of the data partitioning and consistency schemes. It describes a prototype implementation of a system supporting implicit data parallelism through distributed objects. Finally, it presents results showing that the approach is scalable on various architectures (e.g. Linux clusters, SGI Origin 3800).
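As a loose illustration of the locality-preserving mapping discussed above, a greedy partitioner that grows each partition from a seed over an irregular adjacency structure might look as follows; the strategy and the toy mesh are illustrative assumptions, not the thesis' algorithm.

```python
from collections import deque

def partition(adjacency, n_parts):
    """Greedy BFS partitioning of an irregular structure: grow each partition from
    a seed so that neighbouring elements tend to land on the same processor."""
    target = len(adjacency) / n_parts
    owner, part = {}, 0
    for seed in adjacency:
        if seed in owner:
            continue
        queue, size = deque([seed]), 0
        while queue and size < target:
            node = queue.popleft()
            if node in owner:
                continue
            owner[node] = part
            size += 1
            queue.extend(n for n in adjacency[node] if n not in owner)
        part = min(part + 1, n_parts - 1)
    return owner

# Toy irregular structure: node -> neighbours (an indirection-based layout).
mesh = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
owner = partition(mesh, 2)
print(owner)   # owner[i] says which process holds node i (location-transparent lookup)
```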
|
80 |
Webbdistribuerad pedagogisk multimediaproduktion: en studie i designarbetets tvärvetenskapliga natur (Web-distributed educational multimedia production: a study of the interdisciplinary nature of design work). Vaktel, Andreas; Ohlsson, Johannes. January 2005.
No description available.
|