Global ETD Search

171	Non-asymptotic bounds for prediction problems and density estimation. Minsker, Stanislav 05 July 2012 This dissertation investigates learning scenarios in which a high-dimensional parameter has to be estimated from a given sample of fixed size, often smaller than the dimension of the problem. The first part answers some open questions for the binary classification problem in the framework of active learning. Given a random couple (X,Y) with unknown distribution P, the goal of binary classification is to predict a label Y based on the observation X. A prediction rule is constructed from a sequence of observations sampled from P. The concept of active learning can be informally characterized as follows: on every iteration, the algorithm is allowed to request a label Y for any instance X which it considers to be the most informative. The contribution of this work consists of two parts: first, we provide minimax lower bounds for the performance of active learning methods. Second, we propose an active learning algorithm which attains nearly optimal rates over a broad class of underlying distributions and is adaptive with respect to the unknown parameters of the problem. The second part of this thesis is related to sparse recovery in the framework of dictionary learning. Let (X,Y) be a random couple with unknown distribution P. Given a collection of functions H, the goal of dictionary learning is to construct a prediction rule for Y given by a linear combination of the elements of H. The problem is sparse if there exists a good prediction rule that depends on a small number of functions from H. We propose an estimator of the unknown optimal prediction rule based on a penalized empirical risk minimization algorithm. We show that the proposed estimator is able to take advantage of the possible sparse structure of the problem by providing probabilistic bounds for its performance.
Active learning Sparse recovery Oracle inequality Confidence bands Infinite dictionary Estimation theory Asymptotic theory Distribution (Probability theory) Prediction theory Algorithms Mathematical optimization Chebyshev approximation
172	Active visual category learning Vijayanarasimhan, Sudheendra 02 June 2011 Visual recognition research develops algorithms and representations to autonomously recognize visual entities such as objects, actions, and attributes. The traditional protocol involves manually collecting training image examples, annotating them in specific ways, and then learning models to explain the annotated examples. However, this is a rather limited way to transfer human knowledge to visual recognition systems, particularly considering the immense number of visual concepts that are to be learned. I propose new forms of active learning that facilitate large-scale transfer of human knowledge to visual recognition systems in a cost-effective way. The approach is cost-effective in the sense that the division of labor between the machine learner and the human annotators respects any cues regarding which annotations would be easy (or hard) for either party to provide. The approach is large-scale in that it can deal with a large number of annotation types, multiple human annotators, and huge pools of unlabeled data. In particular, I consider three important aspects of the problem: (1) cost-sensitive multi-level active learning, where the expected informativeness of any candidate image annotation is weighed against the predicted cost of obtaining it in order to choose the best annotation at every iteration; (2) budgeted batch active learning, a novel active learning setting that perfectly suits automatic learning from crowd-sourcing services where there are multiple annotators and each annotation task may vary in difficulty; and (3) sub-linear time active learning, where one needs to retrieve those points that are most informative to a classifier in time that is sub-linear in the number of unlabeled examples, i.e., without having to exhaustively scan the entire collection.
Using the proposed solutions for each aspect, I then demonstrate a complete end-to-end active learning system for scalable, autonomous, online learning of object detectors. The approach provides state-of-the-art recognition and detection results, while using minimal total manual effort. Overall, my work enables recognition systems that continuously improve their knowledge of the world by learning to ask the right questions of human supervisors. Artificial intelligence Active learning Object recognition Object detection Cost-sensitive learning Multi-level learning Budgeted learning Large-scale active learning Live learning Machine learning Visual recognition system
173	The Effects Of Activities Based On Role-play On Ninth Grade Students Kucuker (tuncer), Yadikar 01 September 2004 This study intended to investigate the effects of activities based on role-play on ninth grade students' achievement in and attitudes toward simple electric circuits. In this study, a Physics Achievement Test was developed to evaluate students' achievement on simple electric circuits, and role-play activities about simple electric circuits were prepared. In addition, a Physics Attitude Scale was administered to explore students' attitudes towards physics. The present study was conducted at one of the high schools in Acipayam during the 2003-2004 spring semester with a total of 104 (51 female and 53 male) ninth grade students from four classes of two physics teachers. One class of each physics teacher was assigned as the experimental group and instructed with role-play activities, while the other class of each teacher served as the control group and was instructed by the traditional method. The teachers were trained in how to implement role-play activities in class before the study began. The Physics Attitude Scale and the Physics Achievement Test were administered twice to both groups, as a pre-test and, after a three-week treatment period, as a post-test, to assess and compare the effectiveness of the two types of teaching: role-play versus the traditional teaching method. Data were collected using the Physics Achievement Test and the Physics Attitude Scale and were analyzed using descriptive and inferential statistics. The post-test scores were analyzed by Multivariate Analysis of Covariance (MANCOVA). The results showed a significant difference in achievement in favor of the experimental group. However, the statistical analysis failed to show any significant difference between the experimental and control groups' attitudes towards physics at simple electric circuits.
174	How can a science educator incorporate field study into their advanced high school science courses? Apffel, Michael Alexis 01 January 2006 Organizes information and opportunities for high school level science field work and categorizes it to inform the educator of the field study possibilities. Assists educators in overcoming the obstacles of implementing field science into existing science courses. Several field study lesson plans are provided. Fieldwork (Educational method) Science Fieldwork Active learning Science and Mathematics Education
175	Deep Active Learning for Image Classification using Different Sampling Strategies Saleh, Shahin January 2021 Convolutional Neural Networks (CNNs) have been shown to deliver great results in the area of computer vision. However, one fundamental bottleneck with CNNs is that they are heavily dependent on the ground truth, that is, labeled training data. A labeled dataset is a group of samples that have been tagged with one or more labels. In this degree project, we mitigate the data-greedy behavior of CNNs by applying deep active learning with various kinds of sampling strategies. The main focus will be on the sampling strategies random sampling, least confidence sampling, margin sampling, entropy sampling, and K-means sampling. We choose to study the random sampling strategy since it serves as a baseline for the other sampling strategies. Moreover, the least confidence sampling, margin sampling, and entropy sampling strategies are uncertainty-based sampling strategies; hence, it is interesting to study how they perform in comparison with the geometry-based K-means sampling strategy. These sampling strategies help to find the most informative/representative samples amongst all unlabeled samples, thus allowing us to label fewer samples. Furthermore, the benchmark datasets MNIST and CIFAR10 will be used to verify the performance of the various sampling strategies. The performance will be measured in terms of accuracy and the reduction in labeled data needed. Lastly, we concluded that by using least confidence sampling and margin sampling we reduced the number of labeled samples by 79.25% in comparison with the random sampling strategy for the MNIST dataset. Moreover, by using entropy sampling we reduced the number of labeled samples by 67.92% for the CIFAR10 dataset.
/ Convolutional neural networks have been shown to deliver good results in the field of computer vision, but a fundamental bottleneck with convolutional neural networks is that they are heavily dependent on labeled data points. In this degree project, we address the convolutional neural networks' greedy need for labeled data points by using deep active learning with different types of sampling strategies. The main focus will be on the sampling strategies random sampling, least confidence sampling, margin-based sampling, entropy-based sampling, and K-means sampling. We choose to study the random sampling strategy because it will be used to measure the performance of the other sampling strategies. In addition, we chose the least confidence, margin-based, and entropy-based sampling strategies because these are uncertainty-based strategies that are interesting to compare with the geometry-based K-means strategy. These sampling strategies help to find the most informative/representative data points among all unlabeled data points, which means that we need to label fewer data points. Furthermore, the benchmark datasets MNIST and CIFAR10 will be used to verify the performance of the different sampling strategies. Finally, we concluded that by using least confidence sampling and margin-based sampling we reduced the amount of labeled data points by 79.25%, in comparison with the random sampling strategy, for the MNIST dataset. In addition, we reduced the amount of labeled data points by 67.92% using entropy-based sampling for the CIFAR10 dataset. Convolutional Neural Network Deep Active Learning Deep Learning Image Classification Sampling Strategies Semi-Supervised Learning. Computer and Information Sciences
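The three uncertainty-based strategies named in entry 175 (least confidence, margin, and entropy sampling) can be illustrated with a minimal sketch. This is not code from the thesis: the function names, the `select_batch` helper, and the assumption that the model outputs one list of class probabilities per unlabeled sample (with at least two classes) are all hypothetical choices made for illustration.

```python
import math


def least_confidence(probs):
    """1 - max probability: higher means the model is less sure of its top class."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes: smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy (natural log) of the prediction: higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_batch(pool_probs, k, strategy="entropy"):
    """Return the indices of the k unlabeled samples to send for labeling.

    pool_probs is a list of per-sample class-probability lists (hypothetical
    model output); every strategy is turned into a score where larger means
    "more uncertain", so one ranking step serves all three.
    """
    if strategy == "least_confidence":
        score = least_confidence
    elif strategy == "margin":
        score = lambda p: -margin(p)  # negate: a small margin should rank first
    elif strategy == "entropy":
        score = entropy
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy predictions for three unlabeled samples over three classes.
    pool = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.50, 0.50, 0.00]]
    for name in ("least_confidence", "margin", "entropy"):
        print(name, select_batch(pool, 1, name))
```

On the toy pool the strategies do not all agree on the single most uncertain sample, which is one reason a comparison across strategies and datasets, as in the thesis, is informative.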
176	Analyzing the performance of active learning strategies on machine learning problems Werner, Vendela January 2023 Digitalisation within industries is rapidly advancing and data possibilities are growing daily. Machine learning models need a large amount of well-annotated data for good performance. To get well-annotated data, an expert is needed, which is expensive, and the annotation itself can be very time-consuming. The performance of machine learning models depends on the size of the data set, since a large amount of annotation is required for good performance. Active learning has emerged as a solution to increase the size of the data through selective annotation. Instead of labelling data points at random, active learning strategies can be used to select data points based on informativeness or uncertainty. The challenge lies in determining the most effective active learning strategy for a combination of machine learning model and problem type. Although active learning has been around for a while, benchmarking of strategies has not been widely explored. The aim of the thesis was to benchmark different active learning (AL) strategies and analyse their performance on underlying machine learning (ML) problems and ML methods/models. For this purpose, an experiment was constructed to compare, in an unbiased way, different machine learning models in combination with different active learning strategies within the areas of computer vision, drug discovery, and natural language processing. Nine different active learning strategies were analysed in the thesis, with a random strategy working as the baseline, tested on six different machine learning methods/models. The result of this thesis was that active learning had a positive effect within all problem areas and worked especially well for unbalanced data.
The two main conclusions are that all active learning strategies work better for a smaller budget due to the importance of selecting informative data points and that prediction-based strategies are the most successful for all problem types. / Imagine having a tool to cure a genetic disease. Today data is everywhere; even your DNA is considered to be full of valuable information and mysteries ready to be explored. In our data there are endless connections and hidden relationships that not even the best human mind can find, and computing power has become a force to be reckoned with. One winning concept has proven to be human-in-the-loop programming, where human and computer work together. In machine learning this is called supervised learning. Normally, supervised learning requires a large amount of data and, for more complex tasks, an expert, since feedback from a human is expected. One can think of the computer as a detective and the expert as its boss, who points it in the right direction. The direction is indicated by annotating data: one tells the computer which answer is correct so that it learns to extract features. For example, if you want a program that distinguishes dogs from cats, it can be hard to know which is which if you have never seen an animal before. Both have two ears, two eyes, four legs and, in many cases, fur. A human can then tell the computer whether it is a dog or a cat that appears in the picture, and the computer will begin to learn to see patterns and distinguishing characteristics. Annotating data both takes a long time and costs a lot of money. What do you actually do when the amount of data is too small and/or the cost of an expert becomes too high? Sam is a person with a rare genetic disease. They have heard of a program based on supervised learning that can suggest which medical treatment they could try to relieve their symptoms.
Because of the unique genetic disease that Sam has, there is not much data about it, which means that the software will not work in Sam's case. Remember that supervised learning needs a lot of well-annotated data to give reliable output. How can the programmer help Sam? With active learning, of course! Active learning is a collective name for various strategies that select the most informative, or most uncertain, data points to annotate. Instead of making, for example, 2000 annotations, better performance can be achieved with only 100. The difference is that in supervised learning without active learning, a ready-made set of points is presented for the expert to annotate. With active learning, an interaction takes place to select the points for annotation. This results in more cost-effective learning that also performs well on a small data set. This thesis project has studied the performance of active learning in the pharmaceutical industry as well as on problems in computer vision and language technology. The results showed that at least one of the applied active learning strategies led to improved performance in all areas. Perhaps in the future we can actually use active learning to help people like Sam and have the tool to solve the mystery and cure their genetic disease. computer science bioinformatics machine learning active learning artificial intelligence supervised learning Astrazeneca Computer Engineering
177	Kompetenscenter : En genomlysning av Kompetenscenters digitala klassrum Dadayan, Tatevik, Englöv, Alice January 2023 This study examines the use of active learning in Kompetenscenter's digital classroom, focusing on areas such as creative climate, motivation, and digital teaching. Kompetenscenter is a municipal adult education institution located in Köping. The aim of the study is to improve students' engagement in distance studies and to create a more accessible and inclusive climate in the digital classroom. The innovation contribution lies in being able to create better conditions for the students in the digital environment with the help of the teaching method active learning. The chosen research method is a qualitative approach with a case study design. The researchers used an abductive approach, as they continuously compared new empirical findings with theory. For data collection, the researchers conducted seven semi-structured interviews with seven different respondents from Kompetenscenter. The analysis method used is thematic analysis of empirical data; the researchers coded the collected data to identify themes. Through the teaching method active learning, the study has produced guidelines for Kompetenscenter. Implementing these guidelines will help Kompetenscenter create a creative climate that is accessible and inclusive for the students. / This study investigates the use of active learning in the Competence Center’s digital classroom with a focus on areas such as creative climate, motivation, and digital teaching. Competence Center is a municipal adult education institution located in Köping. The aim of this study is to improve student engagement in distance learning and create a more accessible and inclusive climate in the digital classroom. The innovation contribution lies in being able to create better conditions for the students in the digital environment with the help of the teaching method active learning.
The chosen research method is a qualitative approach with a case study design. The researchers have used an abductive approach, in that they have continuously compared new empirical evidence with theory. As data collection methods, the researchers have used seven semi-structured interviews with seven different respondents from Competence Center. The analysis method carried out is thematic analysis of empirical data; the researchers have coded the collected data to identify themes. Through the teaching method active learning, the study has produced guidelines for Competence Center. The implementation of these guidelines will help Competence Center to create a creative climate that is accessible and inclusive for the students in the digital classroom. Active learning motivation creative climate digital studies Other Engineering and Technologies not elsewhere specified
178	Practical Cost-Conscious Active Learning for Data Annotation in Annotator-Initiated Environments Haertel, Robbie A. 12 August 2013 Many projects exist whose purpose is to augment raw data with annotations that increase the usefulness of the data. The number of these projects is rapidly growing and in the age of “big data” the amount of data to be annotated is likewise growing within each project. One common use of such data is in supervised machine learning, which requires labeled data to train a predictive model. Annotation is often a very expensive proposition, particularly for structured data. The purpose of this dissertation is to explore methods of reducing the cost of creating such data sets, including annotated text corpora. We focus on active learning to address the annotation problem. Active learning employs models trained using machine learning to identify instances in the data that are most informative and least costly. We introduce novel techniques for adapting vanilla active learning to situations wherein data instances are of varying benefit and cost, annotators request work “on-demand,” and there are multiple, fallible annotators of differing levels of accuracy and cost. In order to account for data instances of varying cost, we build a model of cost from real annotation data based on a user study. We also introduce a novel cost-conscious active learning algorithm, which we call return-on-investment, that selects instances for annotation that contain the most benefit per unit cost. To address the issue of annotators that request instances “on-demand,” we develop a parallel, “no-wait” framework that performs computation while the annotator is annotating. As a result, annotators need not wait for the computer to determine the best instance for them to annotate, a common problem with existing approaches.
Finally, we introduce a Bayesian model designed to simultaneously infer ground truth annotations from noisy annotations, infer each individual annotator's accuracy, and predict its own accuracy on unseen data, without the use of a held-out set. We extend ROI-based active learning and our annotation framework to handle multiple annotators using this model. As a whole, our work shows that the techniques introduced in this dissertation reduce the cost of annotation in scenarios that are more true-to-life than those considered in previous research. active learning cost-sensitive learning machine learning return-on-investment Bayesian models parallel active learning natural language processing part-of-speech tagging Computer Sciences
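The return-on-investment idea in entry 178, selecting the instances with the most benefit per unit cost, can be sketched as follows. This is an illustrative assumption rather than the dissertation's algorithm: the `roi_select` name, the tuple layout, and the benefit and cost numbers are hypothetical, and in the dissertation the cost estimates come from a model built on a user study.

```python
def roi_select(candidates, k=1):
    """Pick the k candidates with the highest estimated benefit per unit cost.

    candidates: list of (instance_id, estimated_benefit, estimated_cost)
    tuples. Both estimates are assumed to be supplied by separate models
    (hypothetical here); costs must be strictly positive.
    """
    for instance_id, _, cost in candidates:
        if cost <= 0:
            raise ValueError(f"non-positive cost for instance {instance_id!r}")
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    return [instance_id for instance_id, _, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical estimates: benefit in arbitrary utility units,
    # cost in seconds of annotator time.
    pool = [("a", 0.9, 30.0), ("b", 0.5, 10.0), ("c", 0.2, 2.0)]
    print(roi_select(pool, k=2))
```

Ranking by benefit alone would put instance "a" first, while the ratio prefers cheaper instances whose benefit is smaller in absolute terms; that trade-off is the point of a cost-conscious criterion.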
179	[pt] ESTRATÉGIAS PARA OTIMIZAR PROCESSOS DE ANOTAÇÃO E GERAÇÃO DE DATASETS DE SEGMENTAÇÃO SEMÂNTICA EM IMAGENS DE MAMOGRAFIA / [en] STRATEGIES TO OPTIMIZE ANNOTATION PROCESSES AND GENERATION OF SEMANTIC SEGMENTATION DATASETS IN MAMMOGRAPHY IMAGES BRUNO YUSUKE KITABAYASHI 17 November 2022 [pt] With the recent growth in the use of supervised deep learning in computer vision applications, industry and the academic community have been showing that one of the main difficulties for the success of these applications is the lack of datasets with a sufficient quantity of annotated data. In this sense, there is a need to leverage large amounts of labeled data so that these intelligent models can solve problems relevant to their context and achieve the desired results. Techniques for generating annotated data more efficiently are being explored more and more, together with techniques that support the generation of the datasets used as input for training artificial intelligence models. The purpose of this work is to propose strategies to optimize annotation processes and the generation of semantic segmentation datasets. Among the approaches used in this work, we highlight Interactive Segmentation and Active Learning. The first tries to improve the data annotation process, making it more efficient and effective from the point of view of the annotator or specialist responsible for labeling the data, by using a semantic segmentation model that tries to imitate the annotations made by the annotator. The second is an approach that makes it possible to consolidate a deep learning model using an intelligent criterion, selecting the most informative unannotated data for training the model by means of an acquisition function that relies on the network's uncertainty estimate to filter these data.
To apply and validate the results of both techniques, the work incorporates them into a use case involving mammography images for the segmentation of anatomical structures. / [en] With the recent advancement of the use of supervised deep learning in applications in the field of computer vision, the industry and the academic community have been showing that one of the main difficulties for the success of these applications is the lack of datasets with a sufficient amount of annotated data. In this sense, there is a need to leverage large amounts of labeled data so that these intelligent models can solve problems relevant to their context to achieve the desired results. The use of techniques to generate annotated data more efficiently is being increasingly explored, together with techniques to support the generation of datasets that serve as inputs for the training of artificial intelligence models. This work aims to propose strategies to optimize annotation processes and generation of semantic segmentation datasets. Among the approaches used in this work, we highlight Interactive Segmentation and Active Learning. The first one tries to improve the data annotation process, making it more efficient and effective from the point of view of the annotator or specialist responsible for labeling the data using a semantic segmentation model that tries to imitate the annotations made by the annotator. The second consists of an approach that allows consolidating a deep learning model using an intelligent criterion, aiming at the selection of more informative unannotated data for training the model from an acquisition function that is based on the uncertainty estimation of the network to filter these data. To apply and validate the results of both techniques, the work incorporates them in a use case in mammography images for segmentation of anatomical structures.
[en] DEEP LEARNING [en] MAMMOGRAPHY IMAGES [en] SEMANTIC SEGMENTATION [en] ASSISTED LABELLING [en] ACTIVE LEARNING
180	Active learning for text classification in cyber security / Aktiv inlärning för textklassificering i cyberdomänen Carp, Amanda January 2023 In the domain of cyber security, machine learning promises advanced threat detection. However, the volume of available unlabeled data poses challenges for efficient data management. This study investigates the potential for active learning, a subset of interactive machine learning, to reduce the effort required for manual data labelling. Through different query strategies, the most informative unlabeled data points were selected for manual labelling. The performance of different query strategies was assessed by testing a transformer model’s ability to accurately distinguish tweets mentioning names of advanced persistent threats. The findings suggest that the K-means diversity-based query strategy outperformed both the uncertainty-based approach and the random data point selection when the amount of labelled training data was limited. This study also evaluated the cost-effective active learning approach, which incorporates high-confidence data points into the training dataset. However, this was shown to be the least effective strategy. Lastly, the study acknowledges that the computational time taken varies significantly between query strategies. Hence, an optimal query strategy selection requires a balanced consideration of F-score performance taken together with time efficiency. / Machine learning could be used for advanced threat detection in the cyber domain. However, the need for training data, together with the large supply of unannotated data, poses a challenge. This work investigates whether active learning, a subset of interactive machine learning, can reduce the need for annotated data. Through different query strategies, the most informative data points were selected for human annotation.
The results for the different query strategies were then evaluated by testing a machine learning model's ability to correctly distinguish tweets containing names of cyber threat actors. The results indicate that when the amount of annotated data was limited, the diversity-based K-means strategy performed better than both the uncertainty-based query strategy and the strategy that selects data points at random. This study also evaluated cost-effective active learning, which adds data points that the model is already relatively certain about to the training dataset. However, this method proved to be the least effective strategy. Finally, the work shows that the computation time required for each query strategy varies considerably. Selecting the best query strategy therefore requires weighing both performance and time consumption. Interactive machine learning Active learning Cost-effective active learning Cyber environment Computer and Information Sciences
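Entry 180 describes cost-effective active learning as adding data points the model is already confident about to the training set, alongside the points sent for manual labelling. A minimal sketch of that split is given below. It is an assumption made for illustration, not the thesis implementation: the function name, the 0.95 threshold, and the use of least-confidence ranking for the queried points are hypothetical.

```python
def split_pool_cost_effective(pool_probs, query_k, confidence_threshold=0.95):
    """Split unlabeled pool indices into points to query and points to pseudo-label.

    pool_probs: list of per-sample class-probability lists (hypothetical
    model output). The query_k least confident samples go to a human
    annotator; of the rest, samples whose top probability reaches the
    threshold receive the predicted class as a pseudo-label.
    Returns (to_query, pseudo_labeled), where pseudo_labeled is a list of
    (index, predicted_class) pairs.
    """
    confidences = [max(p) for p in pool_probs]
    by_uncertainty = sorted(range(len(pool_probs)), key=lambda i: confidences[i])
    to_query = by_uncertainty[:query_k]
    queried = set(to_query)
    pseudo_labeled = [
        (i, pool_probs[i].index(confidences[i]))
        for i in range(len(pool_probs))
        if i not in queried and confidences[i] >= confidence_threshold
    ]
    return to_query, pseudo_labeled


if __name__ == "__main__":
    # Toy binary predictions for four unlabeled tweets.
    pool = [[0.98, 0.02], [0.55, 0.45], [0.03, 0.97], [0.70, 0.30]]
    print(split_pool_cost_effective(pool, query_k=1))
```

Pseudo-labels are only as good as the model's confident predictions, so a wrong confident prediction enters the training set unchecked; this is one plausible reason such a strategy can underperform, although the abstract does not state the cause.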

Search results