341 |
Généralisation de données textuelles adaptée à la classification automatique / Toward new features for text mining
Tisserant, Guillaume. 14 April 2015
The classification of textual documents is a relatively old task. Early on, many documents of different kinds were grouped together in order to centralize knowledge, and classification and indexing systems were created to make it easy to find documents according to readers' needs. With the growing number of documents and the arrival of computers and then the internet, building text classification systems has become a critical issue. However, textual data are complex and rich, and therefore difficult to process automatically. In this context, this thesis proposes an original methodology for organizing textual information so as to facilitate access to it. Our approaches to automatic text classification and to semantic information extraction make it possible to retrieve relevant information quickly.

More precisely, this manuscript presents new forms of text representation that facilitate processing for automatic classification tasks. A method for the partial generalization of textual data (the GenDesc approach), based on statistical and morphosyntactic criteria, is proposed. The thesis also addresses the construction of phrases and the use of semantic information to improve document representation. Numerous experiments demonstrate the relevance and genericity of our proposals, which improve classification results. Finally, in the context of fast-growing social networks, a method for automatically generating semantically meaningful hashtags is proposed. The approach relies on statistical measures, semantic resources and syntactic information. The generated hashtags can then be used for information retrieval over large volumes of data.
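To make the idea of partial generalization concrete, the sketch below keeps words that discriminate well between classes and replaces the others with their part-of-speech tag. The scoring function (the share of a word's occurrences that fall in its dominant class), the threshold, and the toy tagged corpus are all assumptions made for illustration; the abstract only states that GenDesc relies on statistical and morphosyntactic criteria.

```python
from collections import Counter, defaultdict


def class_concentration(tagged_docs, labels):
    """For each word, the share of its occurrences found in its most frequent class."""
    per_word = defaultdict(Counter)
    for doc, label in zip(tagged_docs, labels):
        for word, _pos in doc:
            per_word[word][label] += 1
    return {w: max(c.values()) / sum(c.values()) for w, c in per_word.items()}


def generalize(doc, scores, threshold=0.75):
    """Keep discriminative words; replace the others with their POS tag."""
    return [word if scores.get(word, 0.0) >= threshold else pos for word, pos in doc]


# Hypothetical toy corpus: (word, POS tag) pairs with one class label per document.
docs = [
    [("the", "DET"), ("match", "NOUN"), ("was", "VERB"), ("great", "ADJ")],
    [("the", "DET"), ("vote", "NOUN"), ("was", "VERB"), ("close", "ADJ")],
]
labels = ["sport", "politics"]

scores = class_concentration(docs, labels)
generalized = [generalize(d, scores) for d in docs]
```

Here "the" and "was" occur equally in both classes, so they are generalized to their tags, while class-specific words such as "match" and "vote" are kept as they are.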
|
342 |
Exploração de informações contextuais para enriquecimento semântico em representações de textos / Exploration of contextual information for semantic enrichment in text representations
Ribeiro, João Vítor Antunes. 14 November 2018
With the growing number of documents available in digital format, the computational analysis of large volumes of data has become ever more important. Although most of these documents are written in natural language, analyzing them through processes such as text mining remains a challenge. Traditional text representation approaches such as the bag of words disregard the semantic and contextual aspects of the text collections analyzed, ignoring information that could improve task performance. The main problems associated with these approaches are high sparsity and high dimensionality, which considerably impair performance. Since enriching text representations is one effective way to mitigate these problems, this dissertation investigates the joint application of semantic and contextual enrichment. To this end, a new text representation technique is proposed, whose main novelty is the approach used to calculate the frequency of attributes (contexts) based on their similarities. The attributes extracted by this technique are considered dependent because they are formed by sets of correlated terms that can share similar information. The effectiveness of the technique was evaluated on the automatic text classification task, exploring different textual enrichment procedures and versions of language models based on word embeddings. The results provide favorable evidence for the effectiveness and applicability of the proposed technique. According to the statistical significance tests performed, textual enrichment based on named entity recognition and word sense disambiguation can effectively improve automatic text classification performance, especially in approaches that also consider texts from external knowledge sources such as Wikipedia. It was found empirically that the proposed technique can outperform traditional approaches in application scenarios based on the semantic information of text collections, making it a promising alternative for generating text representations with a high density of semantic and contextual information that stand out for their interpretability.
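One way to picture a frequency computed from similarities rather than exact matches is the soft count sketched below: each token adds its cosine similarity to every attribute it is close enough to. The two-dimensional embeddings, the `min_sim` threshold and the function names are hypothetical, and the dissertation's actual weighting scheme may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_frequencies(tokens, attributes, embeddings, min_sim=0.5):
    """Add, for each attribute, the similarity of every token that is close enough to it."""
    freqs = {attr: 0.0 for attr in attributes}
    for tok in tokens:
        if tok not in embeddings:
            continue  # out-of-vocabulary tokens contribute nothing
        for attr in attributes:
            sim = cosine(embeddings[tok], embeddings[attr])
            if sim >= min_sim:
                freqs[attr] += sim
    return freqs


# Hypothetical toy embeddings, chosen only to make the arithmetic easy to follow.
embeddings = {
    "car": (1.0, 0.0),
    "truck": (0.8, 0.6),
    "banana": (0.0, 1.0),
    "vehicle": (1.0, 0.0),
    "fruit": (0.0, 1.0),
}
doc = ["car", "truck", "banana"]
freqs = soft_frequencies(doc, ["vehicle", "fruit"], embeddings)
```

A plain bag of words would give both "vehicle" and "fruit" a count of zero for this document, since neither word occurs in it; the soft count yields a denser representation in which related terms reinforce the same attribute.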
|
343 |
Avaliação e implementação de propostas de melhoria para o protocolo IRIS baseadas em tecnologias de web semântica / Evaluation and implementation of improvement proposals for the IRIS protocol based on semantic web technologies
Caires, Milena Constantino. 12 July 2007
The aim of this thesis is to evaluate whether Semantic Web technologies can contribute to the development of the Internet Registry Information Service (IRIS) protocol. IRIS is a new protocol for providing an information service about Internet registries. It is still under development by an Internet Engineering Task Force (IETF) working group, whose objective is to develop and standardize a new protocol to replace Whois. Whois is the standard protocol used today by information services for Internet resources, e.g. domain names, Internet Protocol (IP) addresses and autonomous systems. The main motivation for developing a new protocol was growing concern about the security of data stored in the Whois database, as the Whois protocol provides no security mechanism. Another motivation was the lack of support for distributed databases: the Whois protocol was designed for a centralized database and consequently does not meet the standard requirements for Internet protocols. So far, the working group has tackled and solved two of the main problems of the Whois protocol: (1) security and (2) support for distributed databases. However, developing a new standard demands a great investment from the community, in particular with respect to consensus-based policies. In addition, there is a major barrier to the new protocol: adoption by users. The new protocol must have longevity, without needing to be updated or replaced by another protocol. To reach this goal, it must not only satisfy immediate needs, such as security, but also anticipate future ones.

This study involved the following research activities: (1) a comparative analysis of the current protocols for retrieving Internet registry information, (2) an in-depth study of the IRIS protocol and (3) an evaluation of new technologies that could be incorporated into the new protocol, in particular Semantic Web technologies. The results demonstrate that Semantic Web technologies could provide the flexibility and extensibility the protocol needs to adapt to current and future requirements. To validate the theoretical results, a prototype based on the IRIS specification was implemented using Semantic Web technologies. Two types of experiments were conducted: (1) experiments comparing the performance of the prototype with that of a Whois client and (2) a performance evaluation of the prototype based on load tests. Finally, the prototype implementation and the subsequent experiments serve as a proof of concept that Semantic Web technologies can contribute to the success of the IRIS protocol.
|
344 |
Vers un meilleur accès aux informations pertinentes à l’aide du Web sémantique : application au domaine du e-tourisme / Towards a better access to relevant information with Semantic Web : application to the e-tourism domain
Lully, Vincent. 17 December 2018
This thesis starts from the observation that information overload ("infobesity") on the Web is growing. The two main types of tools designed to help us explore Web data, namely search engines and recommender systems, face several problems: (1) helping users express their explicit information needs, (2) selecting relevant documents, and (3) showcasing the selected documents. We propose approaches that use Semantic Web technologies to remedy these problems and improve access to relevant information. In particular, we propose: (1) a semantic auto-completion approach that helps users formulate longer and richer search queries, (2) recommendation approaches that use the hierarchical and transversal links of knowledge graphs to improve relevance, (3) a semantic affinity framework that integrates semantic and social data to yield recommendations that are qualitatively balanced in terms of relevance, diversity and novelty, (4) semantic approaches for improving the relevance, intelligibility and user-friendliness of recommendation explanations, (5) two approaches for semantic user profiling from images, and (6) an approach that selects the best images to accompany the recommended documents in recommendation banners. We implemented and applied our approaches in the e-tourism domain. They were evaluated quantitatively with ground-truth datasets and qualitatively through user studies.
|
345 |
Les prédicats idéophoniques serbes : syntaxe et sémantique / Serbian predicative ideophones : syntaxe and semantic
Milosavljević, Tanja. 15 November 2018
Serbian predicative ideophones form a very common class of words, especially in spoken language. These words, which resemble ideophones in their morphological form while also having a predicative function, are often classified as interjections. However, they do not behave like interjections. This thesis offers a first investigation of these forms, which remain very little studied in Serbian. It begins with a definition of the class of predicative ideophones and of their relation to interjections, onomatopoeia and verbs. The central part is devoted to a syntactic-semantic study of each of the 32 predicative ideophones identified in modern Serbian, in literature, the press and on the Internet. A synthesis then presents more general reflections on the phonological particularities of these forms, the specifics of how their components are realized and of the constructions they enter into, as well as the problems of predication and secondary predication raised by certain forms. Synonymous forms and the particularities of deriving verbs from ideophones are also studied. A finer-grained semantic analysis makes it possible to differentiate ideophones with very close meanings, mainly in the domain of "falling" or of "hitting". A general conclusion closes the thesis by summarizing the results and drawing some comparisons with how these forms function in Russian, which places the study in a typological perspective.
|
346 |
Similarity Reasoning over Semantic Context-Graphs
Boteanu, Adrian. 26 August 2015
"Similarity is a central cognitive mechanism for humans which enables a broad range of perceptual and abstraction processes, including recognizing and categorizing objects, drawing parallelism, and predicting outcomes. It has been studied computationally through models designed to replicate human judgment. The work presented in this dissertation leverages general purpose semantic networks to derive similarity measures in a problem-independent manner. We model both general and relational similarity using connectivity between concepts within semantic networks. Our first contribution is to model general similarity using concept connectivity, which we use to partition vocabularies into topics without the need of document corpora. We apply this model to derive topics from unstructured dialog, specifically enabling an early literacy primer application to support parents in having better conversations with their young children, as they are using the primer together. Second, we model relational similarity in proportional analogies. To do so, we derive relational parallelism by searching in semantic networks for similar path pairs that connect either side of this analogy statement. We then derive human readable explanations from the resulting similar path pair. We show that our model can answer broad-vocabulary analogy questions designed for human test takers with high confidence. The third contribution is to enable symbolic plan repair in robot planning through object substitution. When a failure occurs due to unforeseen changes in the environment, such as missing objects, we enable the planning domain to be extended with a number of alternative objects such that the plan can be repaired and execution to continue. To evaluate this type of similarity, we use both general and relational similarity. We demonstrate that the task context is essential in establishing which objects are interchangeable."
|
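As a rough illustration of the relational-similarity idea in the abstract of "Similarity Reasoning over Semantic Context-Graphs" (searching a semantic network for similar path pairs that connect either side of an analogy), the sketch below collects the relation-label paths linking each word pair and scores the analogy by their overlap. The toy graph, the relation names, the path-length limit and the Jaccard score are assumptions for illustration, not the dissertation's actual model.

```python
from collections import deque


def relation_paths(graph, start, goal, max_len=2):
    """Return the set of relation-label sequences (length <= max_len) linking start to goal."""
    found = set()
    queue = deque([(start, ())])
    while queue:
        node, labels = queue.popleft()
        if node == goal and labels:
            found.add(labels)
            continue  # do not extend a path beyond the goal
        if len(labels) == max_len:
            continue
        for relation, neighbour in graph.get(node, []):
            queue.append((neighbour, labels + (relation,)))
    return found


def analogy_score(graph, a, b, c, d, max_len=2):
    """Jaccard overlap between the relation paths of a->b and those of c->d."""
    left = relation_paths(graph, a, b, max_len)
    right = relation_paths(graph, c, d, max_len)
    union = left | right
    return len(left & right) / len(union) if union else 0.0


# Hypothetical miniature semantic network: node -> [(relation, neighbour), ...]
graph = {
    "bird": [("CapableOf", "fly"), ("HasA", "wing")],
    "fish": [("CapableOf", "swim"), ("HasA", "fin")],
    "wing": [("UsedFor", "fly")],
    "fin": [("UsedFor", "swim")],
}

score_good = analogy_score(graph, "bird", "fly", "fish", "swim")
score_bad = analogy_score(graph, "bird", "wing", "fish", "swim")
```

The shared label sequences, such as `("HasA", "UsedFor")`, can double as a readable explanation of why "bird is to fly as fish is to swim" holds in this toy network.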
347 |
Contributions to music semantic analysis and its acceleration techniques / Contributions à l'analyse sémantique de la musique et de ses techniques d'accélération
Gao, Boyang. 15 December 2014
The production and distribution of digital music have exploded in recent years. Such a volume of data calls for effective and efficient methods for automatic music analysis and retrieval. This thesis contributes to the semantic analysis of music, in particular the recognition of musical genre and induced mood (the emotion felt by the audience), using low-level and mid-level features, since genre and mood are among the most natural semantic concepts perceived by audiences. To derive semantics from low-level features, feature modeling techniques such as K-means and GMM based bag-of-words (BoW) and Gaussian super vectors are applied to generate dictionaries. Given the very large amount of data to process, both time efficiency and recognition accuracy are critical in low-level feature modeling.

Our first contribution therefore concerns the acceleration of the K-means, GMM and UBM-MAP frameworks, both on a single machine and on clusters of machines. To reach maximum speed on a single machine, we show that the dictionary learning procedures can be rewritten in matrix form, which can be accelerated efficiently by high-performance parallel computing infrastructures such as multi-core CPUs and GPUs. In particular, with GPU support and careful tuning, we achieved a speed-up of two orders of magnitude over a single-thread implementation. For data sets that cannot fit into the memory of a single computer, we show that the K-means and GMM training procedures can be split according to a map-reduce pattern and executed on Hadoop and Spark clusters. On such clusters, our matrix-format version runs 5 to 10 times faster than state-of-the-art libraries.

Besides low-level audio features, mid-level features such as musical harmony, the most natural semantics given by the composer, are also important, since they carry a higher level of abstraction than the raw waveform. Our second contribution therefore focuses on recovering note information from the music signal using musical knowledge. It relies on two levels of musical knowledge: instrument note sounds, and note co-occurrence and transition statistics. At the instrument note level, a musical dictionary of instrument notes is first built from the MIDI synthesizer of Logic Pro 9. With this dictionary, we propose a positive constraint matching pursuit (PCMP) algorithm to perform the decomposition. At the inter-note level, we propose a two-stage sparse decomposition approach that integrates note statistics. In the frame-level decomposition stage, note occurrence and co-occurrence probabilities are embedded to guide atom selection and to build a sparse multiple-candidate graph that provides backup choices for later selections. In the global optimal path search stage, note transition probabilities are incorporated. Experiments on multiple data sets show that the proposed approaches outperform the state of the art in terms of accuracy and recall for note recovery and music mood/genre classification.
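The map-reduce split of K-means training mentioned in this abstract can be sketched as follows: each partition computes per-cluster sums and counts against the current centroids (map), and the partial results are merged to produce the new centroids (reduce). This is a plain Python illustration of the general pattern only; it uses neither the thesis's matrix formulation nor any Hadoop or Spark API, and the function names and toy data are hypothetical.

```python
def closest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to point."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid)) for centroid in centroids]
    return dists.index(min(dists))


def map_partition(points, centroids):
    """Map step: per-cluster coordinate sums and point counts for one data partition."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in points:
        k = closest(point, centroids)
        counts[k] += 1
        for j, value in enumerate(point):
            sums[k][j] += value
    return sums, counts


def reduce_partials(partials, centroids):
    """Reduce step: merge the partial statistics and recompute every centroid."""
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centroids.append(list(old))  # an empty cluster keeps its previous centroid
            continue
        merged = [sum(sums[k][j] for sums, _ in partials) / total for j in range(len(old))]
        new_centroids.append(merged)
    return new_centroids


# Two partitions standing in for data held on two different machines.
partitions = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]
centroids = [(1.0, 1.0), (9.0, 9.0)]

# One K-means iteration: map over partitions, then reduce.
partials = [map_partition(part, centroids) for part in partitions]
centroids = reduce_partials(partials, centroids)
```

Only the small per-cluster sums and counts travel between machines, never the raw points, which is what makes the pattern suitable for data sets that do not fit in one computer's memory.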
|
348 |
Automaticity and the development of categorisation in preschool children : understanding the importance of play
Owen, Kay. January 2017
Categorisation is the process by which items, behaviours and events are compartmentalised according to their defining attributes or properties. This may be based on simple perceptual similarities or on more complex conceptual webs. Whatever their selection criteria, categories expedite inferential capabilities, facilitating behavioural predictions and subsequently enabling response. Categorisation waives conscious effort whilst preserving that which is salient and, as such, provides a highly efficient means of delineating and organising information within semantic memory. An ability to categorise is therefore fundamental to an individual’s capacity to understand the world and a necessary precursor to academic achievement. This thesis comprises a series of studies that were devised in order to investigate categorisational development in children. Study 1 involved the development of a theoretically and practically valid testing mechanism. A sample of 159 children, aged 30-50 months, participated in a series of investigations aimed at establishing the impact of test format and presentation dimensionality on categorisation performance. As a result of this, a new test battery was devised which enabled finer-grained differentiation than had been possible with the tests used by previous researchers. The battery measured four different aspects of preschool children’s categorisational abilities: categorising according to shape; according to colour; when presented with drawings of items; and when presented with the same items in the form of toys. Results showed that children’s ability to categorise differed significantly according to their sex, socio-economic background and the dimensionality of the item. Study 2 utilised the same battery with 190 participants from demographically diverse cohorts. Significant differences were found between high and low socio-economic groups and between boys and girls.
A mixed factorial ANOVA with post-hoc Bonferroni tests demonstrated a main effect of sex, a main effect of cohort, and an interaction between sex and cohort. A Kruskal-Wallis test also showed age to be significant, confirming the findings of previous researchers concerning a developmental trajectory. However, the study also found that relatively sophisticated conceptual webs emerge earlier than had previously been thought. Whilst the results from Study 2 had demonstrated relative homogeneity amongst socio-economic groups, it was noted that participants from the most disadvantaged neighbourhood performed better than those from the other low socio-economic cohort. As the two Nurseries employed different approaches, with one offering a formal curriculum and the other emphasising child-led play, it was decided that the final study would focus on categorical development in these two cohorts. The final study therefore investigated conceptual development during 96 participants’ first twelve weeks of nursery education. Forty-eight participants were drawn from a Community Nursery with a strong emphasis on child-led play and 48 were drawn from a Nursery attached to a Primary School, where the emphasis was on more formalised learning. Children’s categorisational abilities were measured during their first week in Nursery using the test battery devised for Study 1. They were then re-tested using a matched battery twelve weeks later. Change scores were calculated and analysed using a series of one-way ANOVAs. As anticipated, all participants made gains, but the children who had participated in play made significantly greater gains in three out of the four measures. It is thus asserted that play is a key conducer in cognitive development and a causal executant in establishing rudimentary automaticity and, as such, should be the polestar of preschool education. This is particularly important for boys from low socio-economic backgrounds who face contiguous disadvantage. 
Therefore, this research demonstrates that memory-based research with young children should be conducted with toys and objects, rather than images, and that the link between social and educational stratification has its roots in early childhood and is best addressed through the provision of high-quality play opportunities.
|
349 |
Semantic spaces for video analysis of behaviour
Xu, Xun. January 2016
There is ever-growing interest within the computer vision community in human behaviour analysis based on visual sensors. This interest generally covers: (1) behaviour recognition - given a video clip or a specific spatio-temporal volume of interest, classify it into one or more of a set of pre-defined categories; (2) behaviour retrieval - given a video or textual description as a query, search for video clips with related behaviour; (3) behaviour summarisation - given a number of video clips, summarise representative and distinct behaviours. Although countless efforts have been dedicated to the problems mentioned above, few works have attempted to analyse human behaviours in a semantic space. In this thesis, we define semantic spaces as a collection of high-dimensional Euclidean spaces in which semantically meaningful events, e.g. an individual word, phrase or visual event, can be represented as vectors or distributions, which are referred to as semantic representations. Within a semantic space, texts and visual events can be quantitatively compared by inner product, distance and divergence. The introduction of semantic spaces brings many benefits for visual analysis. For example, discovering semantic representations for visual data can facilitate semantically meaningful video summarisation, retrieval and anomaly detection. A semantic space can also seamlessly bridge categories and datasets which are conventionally treated as independent. This has encouraged the sharing of data and knowledge across categories and even datasets to improve recognition performance and reduce labelling effort. Moreover, a semantic space has the ability to generalise a learned model beyond known classes, which is usually referred to as zero-shot learning. Nevertheless, discovering such a semantic space is non-trivial because (1) a semantic space is hard to define manually. Humans always have a good sense of the semantic relatedness between visual and textual instances. 
But a measurable and finite semantic space can be difficult to construct with limited manual supervision. As a result, the semantic space is constructed from data and learned in an unsupervised manner; (2) it is hard to build a universal semantic space, i.e. such a space is always context-dependent. It is therefore important to build the semantic space upon selected data such that it is always meaningful within the context. Even with a well-constructed semantic space, challenges remain, including (3) how to represent visual instances in the semantic space; and (4) how to mitigate the misalignment of visual feature and semantic spaces across categories and even datasets when knowledge/data are generalised. This thesis tackles the above challenges by exploiting data from different sources and building contextual semantic spaces with which data and knowledge can be transferred and shared to facilitate general video behaviour analysis. To demonstrate the efficacy of semantic spaces for behaviour analysis, we focus on real-world problems including surveillance behaviour analysis, zero-shot human action recognition and zero-shot crowd behaviour recognition, with techniques specifically tailored to the nature of each problem. Firstly, for video surveillance scenes, we propose to discover semantic representations from the visual data in an unsupervised manner. This is motivated by the wide availability of unlabelled visual data in surveillance systems. By representing visual instances in the semantic space, data and annotations can be generalised to new events and even new surveillance scenes. Specifically, to detect abnormal events, this thesis studies a geometrical alignment between semantic representations of events across scenes. Semantic actions can thus be transferred to new scenes and abnormal events can be detected in an unsupervised way. 
To model multiple surveillance scenes simultaneously, we show how to learn a shared semantic representation across a group of semantically related scenes through a multi-layer clustering of scenes. With multi-scene modelling we show how to improve surveillance tasks including scene activity profiling/understanding, cross-scene query-by-example, behaviour classification, and video summarisation. Secondly, to avoid extremely costly and ambiguous video annotation, we investigate how to generalise recognition models learned from known categories to novel ones, which is often termed zero-shot learning. To exploit the limited human supervision, e.g. category names, we construct the semantic space via a word-vector representation trained on a large textual corpus in an unsupervised manner. The representation of a visual instance in the semantic space is obtained by learning a visual-to-semantic mapping. We note that blindly applying the mapping learned from known categories to novel categories can cause bias and degrade performance, a problem termed domain shift. To solve this problem we employ techniques including semi-supervised learning, self-training, hubness correction, multi-task learning and domain adaptation. In combination, these methods achieve state-of-the-art performance in the zero-shot human action recognition task. Finally, we study the possibility of re-using known and manually labelled semantic crowd attributes to recognise rare and unknown crowd behaviours. This task is termed zero-shot crowd behaviour recognition. Crucially, we point out that, given the multi-labelled nature of semantic crowd attributes, zero-shot recognition can be improved by exploiting the co-occurrence between attributes. To summarise, this thesis studies methods for analysing video behaviours and demonstrates that exploring semantic spaces for video analysis is advantageous and, more importantly, enables multi-scene analysis and zero-shot learning beyond conventional learning strategies.
|
350 |
Modelagem semântica de contexto aplicada em um histórico de alarmes de processo
Silva, Márcio José da. January 2016
Nowadays, technological advances, especially in the areas of control and automation, make it easy to include alarms in the supervision systems of industrial plants. It is possible to include an almost unlimited number of alarms, of varying types, for each measurement point of a process. Consequently, the volume of information grows significantly, which can be harmful, since it limits the operator's ability to manage anomalies and may exceed the operator's capacity to carry out effective actions during operation of the process. This work presents a study of semantic context modelling and uses a historical database of event information to analyse patterns. Thus, the intention is to use context data to obtain useful knowledge for inference and for determining the situation. A real application, in which events that occurred in a thermal power plant for electricity generation are investigated, is used as a case study to apply the ideas developed and to validate the proposal. As a result of this study, a domain-specific ontology implemented from a semantic context model is proposed. Finally, an implementation of semantic rules is presented.
|