21 |
What good is realism about 'natural kinds'?
Creţu, Ana-Maria, January 2018
Classifications are useful and efficient. We group things into kinds to facilitate the acquisition and transmission of important, often tacit, information about a particular entity qua member of some kind. Whilst it is universally acknowledged that classifications are useful, some scientific classifications (e.g. chemical elements) are held to higher epistemic standards than folk classifications (e.g. bugs). Scientific classifications in terms of 'natural kinds' are considered to be more reliable and successful because they are highly projectible and support law-like and inductive generalisations. What counts as a natural kind is, however, controversial: according to essentialists (e.g. Putnam, Kripke, Ellis) natural kinds are mind-independent and possess essential characteristics; according to promiscuous realists (e.g. Dupré) there are 'countless legitimate, objectively grounded ways of classifying objects in the world'; and according to scientific realists (e.g. Boyd, Psillos) natural kinds are grounded in the 'causal structure of the world'. More specifically, realism about kinds can be understood as a commitment to the existence of natural divisions (kinds) in the world that we come to know as a result of mature scientific investigation into the nature of such kinds. Realism about natural kinds is supported and articulated in terms of three main arguments: metaphysical, semantical, and epistemological. In the first part of my thesis I offer a sustained and systematic investigation of these three arguments, with their respective promises and prospects for the viability of realism about kinds, and I find them wanting. In the second part I pursue an unexplored line of inquiry regarding natural kinds and propose a mild realism about natural kinds via the ontology of real patterns.
|
22 |
Pulsar Search Using Supervised Machine Learning
Ford, John M., 01 January 2017
Pulsars are rapidly rotating neutron stars which emit a strong beam of energy through mechanisms that are not entirely clear to physicists. These very dense stars are used by astrophysicists to study many basic physical phenomena, such as the behavior of plasmas in extremely dense environments, the behavior of pulsar-black hole pairs, and tests of general relativity. Many of these tasks require information to answer the scientific questions posed by physicists. In order to provide more pulsars to study, several large-scale pulsar surveys are underway, which are generating a huge backlog of unprocessed data. Searching for pulsars is a very labor-intensive process, currently requiring skilled people to examine and interpret plots of data output by analysis programs. An automated system for screening the plots will speed up the search for pulsars by a very large factor. Research to date on using machine learning and pattern recognition has not yielded a completely satisfactory system: systems with the desired near-100% recall have false positive rates that are higher than desired, causing more manual labor in the classification of pulsars. This work set out to research, identify, and develop methods to overcome the barriers to building an improved classification system, with a false positive rate of less than 1% and a recall of near 100%, that will be useful for the current and next generation of large pulsar surveys. The results show that it is possible to generate classifiers that perform as needed from the available training data. While a false positive rate of 1% was not reached, recall of over 99% was achieved with a false positive rate of less than 2%. Methods of mitigating the imbalance in the training and test data were explored and found to be highly effective in enhancing classification accuracy.
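The two figures of merit quoted in this abstract, recall and false positive rate, can be computed from confusion-matrix counts. The sketch below is only an illustration: the function name and the counts are hypothetical and are not taken from the thesis. It also computes precision, to show why even a false positive rate under 2% can leave a large manual workload when true pulsars are rare.

```python
def screening_metrics(tp, fn, fp, tn):
    """Return (recall, false_positive_rate, precision) from raw counts.

    tp/fn: pulsar candidates correctly flagged / missed
    fp/tn: non-pulsar candidates wrongly flagged / correctly rejected
    """
    recall = tp / (tp + fn)               # share of real pulsars that are found
    false_positive_rate = fp / (fp + tn)  # share of non-pulsars that are flagged
    precision = tp / (tp + fp)            # share of flagged candidates that are real
    return recall, false_positive_rate, precision


# Hypothetical, heavily imbalanced candidate set: 1,000 pulsars among 101,000 candidates.
recall, fpr, precision = screening_metrics(tp=995, fn=5, fp=1800, tn=98200)
print(f"recall={recall:.3f}  fpr={fpr:.3f}  precision={precision:.3f}")
```

With these made-up counts, recall is 99.5% and the false positive rate is 1.8%, yet only about a third of the flagged candidates are real pulsars, because non-pulsars vastly outnumber pulsars.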
|
23 |
Combinação de classificadores simbólicos para melhorar o poder preditivo e descritivo de Ensembles / Combination of symbolic classifiers to improve predictive and descriptive power of ensembles
Bernardini, Flávia Cristina, 17 May 2002
The quality of the hypotheses induced by current machine learning systems depends mainly on the quantity and quality of the features and examples used in training. Experiments on large databases with many irrelevant features frequently yield hypotheses of low precision. Moreover, many well-known machine learning systems are not prepared to work with a very large number of examples. Thus, one of the most active research areas in machine learning investigates techniques that extend the capacity of learning algorithms to process large numbers of training examples, features and classes. Two approaches can be used to learn concepts from large databases. The first selects the most relevant features and examples; the second is the ensemble approach. An ensemble is a set of classifiers whose individual decisions are combined in some way to classify a new case. Although ensembles classify new examples better than each individual classifier, they behave like black boxes, since they offer the user no explanation of their classification. The purpose of this work is to propose a way of combining symbolic classifiers, that is, classifiers induced by symbolic machine learning algorithms in which knowledge is described as if-then rules or equivalent, in order to work with large databases. Given a large database, it is randomly divided into equally sized small databases that can feasibly be supplied to one or more symbolic machine learning algorithms. The rules that make up the classifiers induced by these algorithms are then combined into a single classifier. To analyse the viability of this proposal, a system called RuleSystem was implemented in the logic programming language Prolog. The system has two purposes: (a) to evaluate knowledge rules induced by symbolic machine learning algorithms, implemented by the Rule Analysis Module; and (b) to evaluate several ways of combining symbolic classifiers and to explain the classification of new examples by an ensemble of symbolic classifiers, implemented by the Combination and Explanation Module. These are the two principal modules of RuleSystem. This work describes the ensemble construction and classifier combination methods found in the literature; the design and documentation of RuleSystem; the methodology developed to document it; and the implementation of the Combination and Explanation Module, which is the object of study of this work. Two case studies using the Combination and Explanation Module are described. The first uses an artificial database, which revealed the need for modifications to the module and made it possible to improve several of its heuristics. The second uses a real database.
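The divide-and-combine scheme in this abstract can be sketched in a few lines. This is a loose illustration rather than RuleSystem itself (which the abstract says was written in Prolog): the function names, the representation of a rule as a (condition, label) pair, and the majority-vote combination are all assumptions made for the example. The abstract says only that several forms of combination were evaluated.

```python
import random
from collections import Counter


def partition(examples, n_parts, seed=0):
    """Shuffle the examples and deal them into n_parts near-equal subsets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def classify(rule_sets, example, default=None):
    """Let the first matching rule of each rule set cast one vote (assumed scheme)."""
    votes = Counter()
    for rules in rule_sets:
        for condition, label in rules:
            if condition(example):
                votes[label] += 1
                break  # only the first rule that fires counts for this rule set
    return votes.most_common(1)[0][0] if votes else default


# Each small database would be handed to a symbolic learner; here the induced
# rule sets are written by hand as hypothetical stand-ins.
parts = partition(range(10), n_parts=3)
rule_sets = [
    [(lambda e: e["temp"] > 38.0, "sick")],
    [(lambda e: e["temp"] > 37.5, "sick"), (lambda e: True, "healthy")],
    [(lambda e: True, "healthy")],
]
print([len(p) for p in parts])
print(classify(rule_sets, {"temp": 39.0}), classify(rule_sets, {"temp": 36.0}))
```

Because every vote traces back to one specific rule, the rules that fired can be reported alongside the decision, which is the kind of explanation a black-box ensemble does not provide.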
|
24 |
準確界定漢語中分類詞 / Identifying true classifiers in Mandarin Chinese
賴宛君 (Lai, Wan Chun), Unknown Date
The discrepancies among inventories of Mandarin Chinese classifiers stem from the lack of a common, agreed-upon set of tests for identifying classifiers. This thesis therefore adopts four linguistically based tests as criteria for identifying Mandarin Chinese classifiers, and takes as its data the classifier categorizations proposed in five representative studies (Chao 1968, Erbaugh 1986, Hu 1993, Huang et al. 1997, and Malt and Gao 2009).
The analysis aims to identify the true classifiers of Mandarin Chinese by re-examining these five categorizations against the four linguistically based tests, applying two mathematical methods, and conducting a questionnaire experiment. The resulting true classifiers are then further classified according to the meanings listed in the Mandarin Daily Dictionary of Chinese Classifiers (Huang et al.), yielding a complete and precise semantic categorization built bottom-up rather than in the traditional top-down direction.
|
25 |
Induction of Classifiers from Multi-labeled Examples: an Information-retrieval Point of View
Sarinnapakorn, Kanoksri, 21 December 2007
An important task of information retrieval is to induce classifiers capable of categorizing text documents. The fact that the same document can simultaneously belong to two or more categories is referred to as multi-label classification (or categorization). Domains of this kind have been encountered in diverse fields, even outside information retrieval. This dissertation discusses one challenging aspect of text categorization: the documents (i.e., training examples) are characterized by an extremely large number of features. As a result, many existing machine learning techniques are prohibitively expensive in such domains. This dissertation seeks to reduce these costs significantly. The proposed scheme consists of two steps. The first runs a so-called baseline induction algorithm (BIA) separately on different versions of the data, each time inducing a different subclassifier. More specifically, BIA is always run on the same training documents, which are each time described by a different subset of the features. The second step then combines the subclassifiers by a fusion algorithm: when a document is to be classified, each subclassifier outputs a set of class labels accompanied by its confidence in these labels; these outputs are then combined into a single multi-label recommendation. The dissertation investigates a few alternative fusion techniques, including an original one inspired by the Dempster-Shafer Theory. The main contribution is a mechanism for assigning the mass function to individual labels from subclassifiers. The system's behavior is illustrated on two real-world data sets. In each of them the examples are described by thousands of features, and each example is labeled with a subset of classes. Experimental evidence indicates that the method scales up well and achieves impressive computational savings in exchange for only a modest loss in classification performance. The proposed fusion method is also shown to be more accurate than other, more traditional fusion mechanisms. For a very large multi-label data set, the proposed mechanism not only reduces the total induction time, but also makes it possible to run the task on a small computer. The fact that subclassifiers can be constructed independently and more conveniently from small subsets of features provides an avenue for parallel processing that might further increase computational efficiency.
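For readers unfamiliar with the Dempster-Shafer Theory mentioned in this abstract, the sketch below implements the textbook rule of combination for two mass functions over a single yes/no label decision. It is not the dissertation's fusion mechanism: how masses are assigned to labels from subclassifier confidences is that work's own contribution and is not reproduced here, and the numbers are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize so the remaining masses sum to one.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


# Frame of discernment for one label: the document has it ("yes") or not ("no").
YES, NO = frozenset({"yes"}), frozenset({"no"})
EITHER = YES | NO  # mass placed here expresses ignorance

m1 = {YES: 0.6, EITHER: 0.4}           # hypothetical subclassifier 1
m2 = {YES: 0.5, NO: 0.2, EITHER: 0.3}  # hypothetical subclassifier 2
fused = dempster_combine(m1, m2)
print({tuple(sorted(k)): round(v, 4) for k, v in fused.items()})
```

In this example the conflicting mass is 0.6 × 0.2 = 0.12, so the fused belief in "yes" is 0.68 / 0.88, roughly 0.77, higher than either source alone.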
|
26 |
Effects of Teacher-mediated Repeated Viewings of Stories in American Sign Language on Classifier Production of Students who are Deaf or Hard of Hearing
Beal-Alvarez, Jennifer, 11 May 2012
Students who are deaf and use sign language frequently have language delays that affect their literacy skills. Students who use American Sign Language (ASL) often lack fluent language models in both the home and school settings, delaying both the development of a first language and the development of literacy in printed English. Mediated and scaffolded instruction presented by a More Knowledgeable Other (MKO; Vygotsky, 1978, 1994) may facilitate acquisition of a first foundational language. Repeated viewings of fluent ASL models on DVDs paired with adult mediation have resulted in increases in vocabulary skills for deaf or hard of hearing (DHH) students who used ASL (Cannon, Fredrick, & Easterbrooks, 2010; Golos, 2010; Mueller & Hurtig, 2010). Classifiers are a syntactic sub-category of ASL vocabulary that provides a critical link between ASL and the meaning of English phrases. The purpose of this study was to investigate the effects of teacher-mediated repeated viewings of ASL stories on DHH students’ classifier production during narrative retells. This study included 10 student participants in second, third, and fourth grades and three teacher participants from an urban day school for students who are DHH. The researcher used a multiple baseline across participants design followed by visual analysis and calculation of the percentage of non-overlapping data (PND; Scruggs, Mastropieri, & Casto, 1987) to examine the effects of the intervention. All students increased their classifier production during narrative retells following a combination of teacher mediation paired with repeated viewings of ASL models.
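The percentage of non-overlapping data (PND) cited in this abstract is commonly computed, when the intervention is expected to raise scores, as the share of intervention-phase points that exceed the highest baseline point. The sketch below follows that common definition; the function name and the session counts are hypothetical and are not data from the study.

```python
def percentage_non_overlapping(baseline, intervention):
    """PND for an expected increase: percent of intervention points above the baseline maximum."""
    highest_baseline = max(baseline)
    non_overlapping = sum(1 for score in intervention if score > highest_baseline)
    return 100.0 * non_overlapping / len(intervention)


# Hypothetical classifier counts per retell session for one student.
baseline_sessions = [2, 3, 1, 3]
intervention_sessions = [4, 5, 3, 6, 7]
print(percentage_non_overlapping(baseline_sessions, intervention_sessions))
```

Here four of the five intervention sessions exceed the baseline maximum of 3, giving a PND of 80%.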
|
27 |
Some aspects of the ontological development of nominal classifiers in Cantonese
Poon, Yuen-wai, Emma (潘婉蕙), January 1981
Language Studies / Master / Master of Arts
|
28 |
Closing the gap in WSD: supervised results with unsupervised methods
Brody, Samuel, January 2009
Word-Sense Disambiguation (WSD) holds promise for many NLP applications requiring broad-coverage language understanding, such as summarization (Barzilay and Elhadad, 1997) and question answering (Ramakrishnan et al., 2003). Recent studies have also shown that WSD can benefit machine translation (Vickrey et al., 2005) and information retrieval (Stokoe, 2005). Much work has focused on the computational treatment of sense ambiguity, primarily using data-driven methods. The most accurate WSD systems to date are supervised and rely on the availability of sense-labeled training data. This restriction poses a significant barrier to widespread use of WSD in practice, since such data is extremely expensive to acquire for new languages and domains. Unsupervised WSD holds the key to enabling such applications, as it does not require sense-labeled data. However, unsupervised methods fall far behind supervised ones in terms of accuracy and ease of use. In this thesis we explore the reasons for this and present solutions to remedy the situation. We hypothesize that one of the main problems with unsupervised WSD is its lack of the standard formulation and general-purpose tools common to supervised methods. As a first step, we examine existing approaches to unsupervised WSD, with the aim of detecting independent principles that can be utilized in a general framework. We investigate ways of leveraging the diversity of existing methods, using ensembles, a common tool in the supervised learning framework. This approach allows us to achieve accuracy beyond that of the individual methods, without the need for extensive modification of the underlying systems. Our examination of existing unsupervised approaches highlights the importance of using the predominant sense in case of uncertainty, and the effectiveness of statistical similarity methods as a tool for WSD. 
However, it also serves to emphasize the need for a way to merge and combine learning elements, and the potential of a supervised-style approach to the problem. Relying on existing methods does not take full advantage of the insights gained from the supervised framework. We therefore present an unsupervised WSD system which circumvents the question of actual disambiguation method, which is the main source of discrepancy in unsupervised WSD, and deals directly with the data. Our method uses statistical and semantic similarity measures to produce labeled training data in a completely unsupervised fashion. This allows the training and use of any standard supervised classifier for the actual disambiguation. Classifiers trained with our method significantly outperform those using other methods of data generation, and represent a big step in bridging the accuracy gap between supervised and unsupervised methods. Finally, we address a major drawback of classical unsupervised systems – their reliance on a fixed sense inventory and lexical resources. This dependence represents a substantial setback for unsupervised methods in cases where such resources are unavailable. Unfortunately, these are exactly the areas in which unsupervised methods are most needed. Unsupervised sense-discrimination, which does not share those restrictions, presents a promising solution to the problem. We therefore develop an unsupervised sense discrimination system. We base our system on a well-studied probabilistic generative model, Latent Dirichlet Allocation (Blei et al., 2003), which has many of the advantages of supervised frameworks. The model’s probabilistic nature lends itself to easy combination and extension, and its generative aspect is well suited to linguistic tasks. Our model achieves state-of-the-art performance on the unsupervised sense induction task, while remaining independent of any fixed sense inventory, and thus represents a fully unsupervised, general purpose, WSD tool.
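Two of the ideas in this abstract, combining existing methods in an ensemble and falling back on the predominant sense under uncertainty, can be illustrated together. The sketch below is a generic majority vote with a back-off and is not the thesis's system: the function name, the tie-based notion of uncertainty, and the sense labels are all assumptions made for the example.

```python
from collections import Counter


def vote_with_backoff(predictions, predominant_sense):
    """Majority vote over sense predictions; back off to the predominant sense
    when no method answers or the top two senses are tied."""
    counts = Counter(p for p in predictions if p is not None)
    ranked = counts.most_common(2)
    if not ranked:
        return predominant_sense  # every method abstained
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return predominant_sense  # tie: treat as uncertain
    return ranked[0][0]


# Hypothetical outputs of three unsupervised methods for one occurrence of "bank".
print(vote_with_backoff(["bank/river", "bank/finance", "bank/finance"], "bank/finance"))
print(vote_with_backoff(["bank/river", "bank/finance", None], "bank/finance"))
```

The second call is a one-to-one tie, so the result is the predominant sense rather than an arbitrary pick between the two votes.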
|
29 |
Effects of Teacher-mediated Repeated Viewings of Stories in American Sign Language on Classifier Production of Students who are Deaf or Hard of Hearing
Beal-Alvarez, Jennifer, 11 May 2012
Students who are deaf and use sign language frequently have language delays that affect their literacy skills. Students who use American Sign Language (ASL) often lack fluent language models in both the home and school settings, delaying both the development of a first language and the development of literacy in printed English. Mediated and scaffolded instruction presented by a More Knowledgeable Other (MKO; Vygotsky, 1978, 1994) may facilitate acquisition of a first foundational language. Repeated viewings of fluent ASL models on DVDs paired with adult mediation have resulted in increases in vocabulary skills for deaf or hard of hearing (DHH) students who used ASL (Cannon, Fredrick, & Easterbrooks, 2010; Golos, 2010; Mueller & Hurtig, 2010). Classifiers are a syntactic sub-category of ASL vocabulary that provides a critical link between ASL and the meaning of English phrases. The purpose of this study was to investigate the effects of teacher-mediated repeated viewings of ASL stories on DHH students’ classifier production during narrative retells. This study included 10 student participants in second, third, and fourth grades and three teacher participants from an urban day school for students who are DHH. The researcher used a multiple baseline across participants design followed by visual analysis and calculation of the percentage of non-overlapping data (PND; Scruggs, Mastropieri, & Casto, 1987) to examine the effects of the intervention. All students increased their classifier production during narrative retells following a combination of teacher mediation paired with repeated viewings of ASL models.
|
30 |
Boosting a Biologically Inspired Local Descriptor for Geometry-free Face and Full Multi-view 3D Object Recognition
Yokono, Jerry Jun; Poggio, Tomaso, 07 July 2005
Object recognition systems relying on local descriptors are increasingly used because of their perceived robustness with respect to occlusions and to global geometrical deformations. Descriptors of this type, based on a set of oriented Gaussian derivative filters, are used in our recognition system. In this paper, we explore a multi-view 3D object recognition system that does not use explicit geometrical information. The basic idea is to find discriminant features to describe an object across different views. A boosting procedure is used to select features out of a large pool of local features collected from the positive training examples. We describe experiments on face images with an excellent recognition rate.
|