1 |
SWordNet: Inferring Semantically Related Words from Software Context
Yang, Jinqiu (January 2013)
Code search is an integral part of software development and program comprehension. The difficulty of code search lies in the inability to guess the exact words used in the code. Therefore, it is crucial for keyword-based code search to expand queries with semantically related words, e.g., synonyms and abbreviations, to increase search effectiveness. However, relying on resources such as English dictionaries and WordNet to obtain semantically related words in software has limited value, because many words that are semantically related in software are not semantically related in English. Conversely, many words that are semantically related in English are not semantically related in software.
This thesis proposes a simple and general technique to automatically infer semantically related words (referred to as rPairs) in software by leveraging the context of words in comments and code. In addition, we propose a ranking algorithm on the rPair results and study cross-project rPairs on two sets of software with similar functionality, i.e., media browsers and operating systems. We achieve reasonable accuracy on nine large and popular code bases written in C and Java. Our further evaluation against the state of the art shows that our technique achieves higher precision and recall. In addition, the proposed ranking algorithm improves the rPair extraction accuracy by bringing correct rPairs to the top of the list. Our cross-project study successfully discovers overlapping rPairs among projects of similar functionality and finds that cross-project rPairs are more likely to be correct than project-specific rPairs. Since the cross-project rPairs are highly likely to be general for software of the same type, the discovered overlapping rPairs can benefit other projects of the same type that have not been analyzed.
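The abstract does not spell out how context is used, so the following is only a rough sketch of one plausible reading: two words become a candidate rPair when they occupy the same position in two otherwise identical word sequences taken from comments or identifiers. The function name `candidate_rpairs`, the `min_shared` threshold, and ranking by occurrence count are all assumptions made for illustration, not the thesis's actual extraction or ranking algorithm.

```python
from collections import Counter
from itertools import combinations


def candidate_rpairs(sequences, min_shared=2):
    """Return candidate related-word pairs, most frequent first.

    Hypothetical rule (an assumption, not the thesis's method): two
    sequences of equal length that differ in exactly one word, and
    share at least `min_shared` words, vote for the differing pair.
    """
    counts = Counter()
    for first, second in combinations(sequences, 2):
        words_a = first.lower().split()
        words_b = second.lower().split()
        if len(words_a) != len(words_b):
            continue
        # Collect the positions where the two sequences disagree.
        diffs = [(a, b) for a, b in zip(words_a, words_b) if a != b]
        shared = len(words_a) - len(diffs)
        if len(diffs) == 1 and shared >= min_shared:
            # Sort so that (x, y) and (y, x) count as the same pair.
            counts[tuple(sorted(diffs[0]))] += 1
    return counts.most_common()


# Made-up comment fragments, for illustration only.
comments = [
    "disable all interrupts",
    "mask all interrupts",
    "disable the timer",
    "mask the timer",
    "free the buffer",
]
print(candidate_rpairs(comments))
```

In this toy input, "disable" and "mask" are paired twice because they appear in two shared contexts, which a dictionary-based approach would be unlikely to suggest.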
|
2 |
Querying semantically heterogeneous data sources using ontologies
Breed, Aditi (January 1900)
Master of Science / Department of Computing and Information Sciences / Doina Caragea / In recent years, we have witnessed a significant increase in the number, size and diversity of the available data sources in many application domains. Data sources in a particular domain are autonomously created and maintained, and are therefore distributed and semantically heterogeneous. In this thesis, we focus on the problem of querying such semantically heterogeneous data sources from a user's perspective. We approach this problem by using ontologies and mappings between ontologies. A system for answering queries in a way that is transparent to the user has been designed and implemented. The main components of this system are an ontology mapping algorithm that maps user ontologies to data source ontologies, and a query processing engine that maps user queries to queries that can be answered by the data sources in the system. We have shown that machine learning algorithms can also be incorporated in the system, thus making it possible to train classifiers (in particular, generative models such as Naïve Bayes) from distributed, semantically heterogeneous data sources. Because many data sources today are relational in nature, in this work we have dealt specifically with relational data sources, as opposed to flat files, XML or object-oriented data sources. However, our system can be easily extended to other types of data sources.
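As a minimal sketch of the query-mapping idea, assume the ontology mapping has already been reduced to a plain dictionary from user terms to the column names of one relational source. The names `rewrite_query`, `to_sql`, `employees`, `job_title` and `annual_pay` are hypothetical and chosen only for illustration; the thesis's actual mapping algorithm and query engine are not described in enough detail here to reproduce.

```python
def rewrite_query(user_query, term_mapping):
    """Translate {user_term: value} into {source_column: value}.

    `term_mapping` stands in for the output of an ontology mapping
    step (an assumption; here it is just a hand-written dict).
    """
    rewritten = {}
    for user_term, value in user_query.items():
        if user_term not in term_mapping:
            raise KeyError(f"no mapping for user term: {user_term}")
        rewritten[term_mapping[user_term]] = value
    return rewritten


def to_sql(table, conditions):
    """Build a parameterized SELECT for a relational data source."""
    if not conditions:
        return f"SELECT * FROM {table}", []
    where = " AND ".join(f"{column} = ?" for column in conditions)
    return f"SELECT * FROM {table} WHERE {where}", list(conditions.values())


# Hypothetical user vocabulary mapped to a hypothetical source schema.
mapping = {"position": "job_title", "salary": "annual_pay"}
source_query = rewrite_query({"position": "engineer"}, mapping)
print(to_sql("employees", source_query))
```

The user only ever writes queries in their own vocabulary; the rewritten query is what the data source sees.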
|
3 |
Very Low Bitrate Video Communication: A Principal Component Analysis Approach
Söderström, Ulrik (January 2008)
A large part of the information in a conversation comes from non-verbal cues such as facial expressions and body gestures. These cues are lost when we do not communicate face-to-face. But face-to-face communication does not have to happen in person. With video communication we can at least deliver information about the facial mimic and some gestures. This thesis is about video communication over distances: communication that can be available over low-capacity networks because the bitrate needed for the video is low. A visual image needs high quality and resolution to be semantically meaningful for communication. Delivering such video over networks requires that the video be compressed. The standard way to compress video images, used by H.264 and MPEG-4, is to divide the image into blocks and represent each block with mathematical waveforms, usually frequency features. These waveforms are quite good at representing any kind of video since they do not resemble anything in particular; they are just frequency features. But because they are completely arbitrary, they cannot compress video enough to enable use over networks with limited capacity, such as GSM and GPRS. Another issue is that such codecs have high complexity because of the redundancy removal with positional shift of the blocks. High complexity and bitrate mean that a device has to consume a large amount of energy for encoding, decoding and transmitting such video, and energy is a very important factor for battery-driven devices. These drawbacks of standard video coding mean that video compressed with such codecs cannot be delivered anywhere and anytime. To resolve these issues we have developed a totally new type of video coding. Instead of using mathematical waveforms for representation, we use faces to represent faces. This makes the compression much more efficient than waveform-based coding, even though the faces are person-dependent.
We build a model of the changes in the face, the facial mimic, and use this model to encode the images. The model consists of representative facial images, and we extract it with a powerful mathematical tool, namely principal component analysis (PCA). This coding has very low complexity since encoding and decoding consist only of multiplication operations. The faces are treated as single encoding entities and all operations are performed on full images; no block processing is needed. These features mean that PCA coding can deliver high quality video at very low bitrates with low complexity for encoding and decoding. With the use of asymmetrical PCA (aPCA) it is possible to use only semantically important areas for encoding while decoding full frames or a different part of the frames. We show that a codec based on PCA can compress facial video to a bitrate below 5 kbps and still provide high quality. This bitrate can be delivered on a GSM network. We also show the possibility of extending PCA coding to encoding of high definition video.
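To make the "only multiplication operations" point concrete, here is a toy sketch of PCA coding with a single principal component, assuming each frame is flattened to a list of pixel values. It is not the thesis's codec: the component is found with plain power iteration (a swapped-in, simple technique), the four-pixel "frames" are made up, and the function names are hypothetical. Encoding projects the mean-subtracted frame onto the component, and decoding scales the component and adds the mean back.

```python
import math


def mean_frame(frames):
    """Pixel-wise mean over all training frames."""
    count = len(frames)
    return [sum(f[i] for f in frames) / count for i in range(len(frames[0]))]


def first_component(frames, mean, iterations=50):
    """Estimate the first principal component by power iteration."""
    centered = [[x - m for x, m in zip(f, mean)] for f in frames]
    # Arbitrary start vector; it must not be orthogonal to the component.
    v = [1.0 / (i + 1) for i in range(len(mean))]
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix.
        scores = [sum(c * w for c, w in zip(row, v)) for row in centered]
        v = [sum(s * row[i] for s, row in zip(scores, centered))
             for i in range(len(mean))]
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0.0:
            raise ValueError("frames have no variation along the start vector")
        v = [w / norm for w in v]
    return v


def encode(frame, mean, component):
    """One coefficient per frame: a dot product with the component."""
    return sum((x - m) * w for x, m, w in zip(frame, mean, component))


def decode(coefficient, mean, component):
    """Rebuild the frame as mean + coefficient * component."""
    return [m + coefficient * w for m, w in zip(mean, component)]


# Made-up 4-pixel frames that vary along a single direction.
frames = [[0.0, 2.0, 5.0, 4.0], [1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 1.0, 4.0]]
mean = mean_frame(frames)
component = first_component(frames, mean)
coefficient = encode(frames[0], mean, component)
print(coefficient, decode(coefficient, mean, component))
```

Only the coefficient needs to be transmitted per frame once both sides hold the mean and the component, which is where the very low bitrate comes from in this simplified picture. A real model would keep several components rather than one.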
|
4 |
漢語連動句研究 / Aspects of Serial Verb Constructions in Mandarin
周奎宜 (Chou, Kuei Yi), date unknown
This thesis studies serial verb constructions in Mandarin Chinese. We first give a clear definition of the serial verb construction and then, starting from the semantics of the verbs, describe and explain in detail the semantic relations between the verbs in the construction. In addition, based on two principles, the lexical restrictiveness of serial verb sentences and the semantic relatedness of the verbs, we further divide Mandarin serial verb constructions into four types. Finally, we examine Mandarin serial verb constructions from a syntactic perspective. We argue that different semantic interpretations result from different syntactic structures. / This thesis investigates serial verb constructions (SVCs) in Mandarin Chinese. Serial verb constructions are often confused with other superficially similar structures; thus, the first objective of this study is to explicitly delimit the definition of SVCs and to differentiate them from other structures. In addition, we further explore the semantic relationship between the serial verbs and classify SVCs into several types according to the V1-V2 correlations, the independence between sub-events, and the lexical restrictiveness of the verbs. We then analyze the syntactic relationship of the verbs. Finally, we propose that there are two possible structures for SVCs in Mandarin.
Chapter 1 is a brief introduction to the term “serial verb construction.” We first go over its definitions, the functions it can convey, and the geographical distribution of languages with SVCs. In Chapter 2, we present the distinguishing characteristics of SVCs and distinguish them from other similar structures. In Chapter 3, we present different semantic correlations between the VPs. In Chapter 4, we classify Mandarin SVCs into different subtypes based on lexical and semantic criteria. Chapter 5 presents tentative syntactic analyses of Mandarin SVCs. Chapter 6 concludes this paper.
|
5 |
Entity-level Event Impact Analytics / Analyse de l'impact des évenements au niveau des entités
Govind (12 December 2018)
Our society has been rapidly growing its presence on the Web; as a consequence, we are digitizing a large share of our daily happenings. In this scenario, the Web receives virtual occurrences of events that correspond to real-world occurrences from all around the world. The scale of these events can vary from locally relevant ones up to those that receive global attention. Today's news and social media provide the means to reach almost global diffusion. This big data on complex societal events opens up many research opportunities for analyzing and gaining insights into the state of our society.
In this thesis, we investigate a variety of societal event impact analytics tasks. Specifically, we address three facets in the context of events and the Web, namely, the diffusion of events in foreign-language communities, the automated classification of Web contents, and news virality assessment and visualization. We hypothesize that the named entities associated with an event or a piece of Web content carry valuable semantic information, which can be exploited to build accurate prediction models. We have shown through multiple studies that raising Web contents to the entity level captures their core essence and thus provides a variety of benefits in achieving better performance in diverse tasks. We report novel findings over disparate tasks in an attempt to fulfill our overall goal of societal event impact analytics.
|
6 |
Slovotvorba - gramaticky / sémanticky / pragmaticky - na příkladu vybraných politických projevů / Word-formation - grammatically / semantically / pragmatically - on the example of selected political speeches
Charvátová, Věra (January 2018)
This thesis examines selected political speeches with regard to word-formation and its processes from the grammatical, semantic, and pragmatic points of view. The analyzed speeches are Otto von Bismarck's speech delivered on 20 July 1870, Adolf Hitler's speech delivered on 1 September 1939, Willy Brandt's speech delivered on 10 November 1989, and Angela Merkel's speech of 14 December 2015. This is an interdisciplinary thesis which deals with politics, history, and linguistics. The thesis examines four different periods, namely Bismarck's era, Nazism, the Federal Republic of Germany between 1949 and 1990, and the contemporary Federal Republic of Germany. These periods are analyzed from the political, historical, and socio-cultural points of view. Subsequently, the selected political speeches are analyzed with respect to word-formation, its processes, and the period in which they were delivered. The individual results are then compared and certain conclusions are drawn from the comparison. The aim of this thesis is to highlight the importance of word-formation and its processes, which have been significant for political speeches from the 19th century onwards. This thesis shows and analyzes the motives, purposes, aims, and consequences of their usage in particular...
|