11

Deep Learning-Enabled Multitask System for Exercise Recognition and Counting

Yu, Qingtian 17 September 2021 (has links)
Exercise is a prevailing topic in modern society as more people pursue a healthy lifestyle, and physical activity provides substantial benefits to human well-being. The fields of 2D human pose estimation, action recognition and repetition counting have developed rapidly in the past several years. However, few works combine them into a single system that helps people evaluate body poses, recognize exercises and count repetitive actions. Existing methods estimate pose positions first and then use the human joint locations in the other two tasks. In this thesis, we propose a multitask system covering all three domains. Unlike the methodology used in the literature, heatmaps, which are byproducts of 2D human pose estimation models, are adopted for exercise recognition and counting. Recent heatmap processing methods have proven effective in extracting dynamic body pose information. Inspired by this, we propose a new deep-learning multitask model for exercise recognition and repetition counting, and apply these heatmap-based approaches to the multitask setting for the first time. To meet the needs of the multitask model, we create a new dataset, Rep-Penn, with action, counting and speed labels. A two-stage training strategy is applied in the training process. Our multitask system can estimate human pose, identify physical activities and count repeated motions. We achieved 95.69% accuracy in exercise recognition on the Rep-Penn dataset. The multitask model also performed well in repetition counting, with 0.004 Mean Absolute Error (MAE) and 0.997 Off-By-One (OBO) accuracy on the Rep-Penn dataset. Compared with existing frameworks, our method obtained state-of-the-art results.
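For readers who want a concrete picture of the heatmap-based multitask idea, the sketch below shows one way a shared encoder over pose heatmaps could feed separate recognition and counting heads. It is a minimal PyTorch illustration; the layer sizes, input shape and class counts are assumptions, not the architecture reported in the thesis.

```python
# Minimal PyTorch sketch of a heatmap-driven multitask head: one branch classifies
# the exercise, the other regresses a repetition count. All layer sizes, names and
# the input shape are illustrative assumptions, not the thesis architecture.
import torch
import torch.nn as nn

class HeatmapMultitaskNet(nn.Module):
    def __init__(self, num_joints=17, num_exercises=6):
        super().__init__()
        # Shared encoder over stacked per-joint heatmaps (e.g. output of a 2D pose model)
        self.encoder = nn.Sequential(
            nn.Conv3d(num_joints, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.recognition_head = nn.Linear(64, num_exercises)  # exercise class logits
        self.counting_head = nn.Linear(64, 1)                 # repetition count (regression)

    def forward(self, heatmaps):  # heatmaps: (batch, joints, frames, height, width)
        features = self.encoder(heatmaps)
        return self.recognition_head(features), self.counting_head(features)

# Example: two clips of 64 frames of 56x56 heatmaps for 17 joints
logits, count = HeatmapMultitaskNet()(torch.randn(2, 17, 64, 56, 56))
```
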
12

Leveraging Defects Life-Cycle for Labeling Defective Classes

Vandehei, Bailey R 01 December 2019 (has links) (PDF)
Data from software repositories are a very useful asset for building different kinds of models and recommender systems aimed at supporting software developers. Specifically, the identification of likely defect-prone files (i.e., classes in Object-Oriented systems) helps in prioritizing, testing, and analysis activities. This work focuses on automated methods for labeling a class in a version as defective or not. The most widely used methods for automated class labeling belong to the SZZ family and fail in various circumstances. Thus, recent studies suggest the use of the affected version (AV) as provided by developers and available in issue trackers such as JIRA. However, in many circumstances, the AV might not be used because it is unavailable or inconsistent. The aim of this study is twofold: 1) to measure the AV availability and consistency in open-source projects, 2) to propose, evaluate, and compare to SZZ, a new method for labeling defective classes which is based on the idea that defects have a stable life-cycle in terms of the proportion of versions needed to discover the defect and to fix the defect. Results related to 212 open-source projects from the Apache ecosystem, featuring a total of about 125,000 defects, show that the AV cannot be used in the majority (51%) of defects. Therefore, it is important to investigate automated methods for labeling defective classes. Results related to 76 open-source projects from the Apache ecosystem, featuring a total of about 6,250,000 classes that are affected by 60,000 defects and spread over 4,000 versions and 760,000 commits, show that the proposed method for labeling defective classes is, on average among projects and defects, more accurate, in terms of Precision, Kappa, F1 and MCC, than all previously proposed SZZ methods. Moreover, the improvement in accuracy from combining SZZ with defects' life-cycle information is statistically significant but practically irrelevant: overall and on average, labeling via the defects' life-cycle is more accurate than any SZZ method.
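The life-cycle idea can be illustrated with a small sketch: estimate the typical proportion of versions a defect stays alive before its fix, then label the class defective in the versions implied by that proportion. The estimator and field names below are illustrative assumptions, not the exact method evaluated in the thesis.

```python
# Illustrative sketch of the life-cycle idea: assume defects live for a roughly stable
# proportion of a project's versions before being fixed, and use that proportion to
# decide in which versions a fixed class is labeled defective.
from statistics import median

def estimate_lifecycle_proportion(known_defects):
    """known_defects: list of dicts with 'introduced' and 'fixed' version indices (ground truth)."""
    return median((d["fixed"] - d["introduced"]) / d["fixed"]
                  for d in known_defects if d["fixed"] > 0)

def label_defective_versions(fix_version, proportion):
    """Return the version indices in which the class is labeled defective,
    assuming the defect lived for `proportion` of the versions before the fix."""
    introduced = max(0, round(fix_version * (1 - proportion)))
    return list(range(introduced, fix_version))

p = estimate_lifecycle_proportion([{"introduced": 2, "fixed": 5}, {"introduced": 6, "fixed": 10}])
print(label_defective_versions(fix_version=12, proportion=p))
```
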
13

Can data in optometric practice be used to provide an evidence base for ophthalmic public health?

Slade, S.V., Davey, Christopher J., Shickle, D. 19 May 2016 (has links)
Purpose: The purpose of this paper is to investigate the potential of using primary care optometry data to support ophthalmic public health, research and policy making. Methods: Suppliers of optometric electronic patient record systems (EPRs) were interviewed to gather information about the data present in commercial software programmes and the feasibility of data extraction. Researchers were presented with a list of metrics that might be included in an optometric practice dataset via a survey circulated by email to 102 researchers known to have an interest in eye health. Respondents rated the importance of each metric for research. A further survey presented the list of metrics to 2000 randomly selected members of the College of Optometrists. The optometrists were asked to specify how likely they were to enter information about each metric in a routine sight test consultation. They were also asked if data were entered as free text, menus or a combination of these. Results: Current EPRs allowed the input of data relating to the metrics of interest. Most data entry was free text. There was a good match between high priority metrics for research and those commonly recorded in optometric practice. Conclusions: Although there were plenty of electronic data in optometric practice, they were highly variable and often not in an easily analysed format. To facilitate analysis of the evidence for public health purposes a UK based minimum dataset containing standardised clinical information is recommended. Further research would be required to develop suitable coding for the individual metrics included. The dataset would need to capture information from all sectors of the population to ensure effective planning of any future interventions.
14

Bounded Expectation of Label Assignment: Dataset Annotation by Supervised Splitting with Bias-Reduction Techniques

Herbst, Alyssa Kathryn 20 January 2020 (has links)
Annotating large unlabeled datasets can be a major bottleneck for machine learning applications. We introduce a scheme for inferring labels of unlabeled data at a fraction of the cost of labeling the entire dataset. We refer to the scheme as Bounded Expectation of Label Assignment (BELA). BELA greedily queries an oracle (or human labeler) and partitions a dataset to find data subsets that have mostly the same label. BELA can then infer labels by majority vote of the known labels in each subset. BELA makes the decision to split or label from a subset by maximizing a lower bound on the expected number of correctly labeled examples. BELA improves upon existing hierarchical labeling schemes by using supervised models to partition the data, therefore avoiding reliance on unsupervised clustering methods that may not accurately group data by label. We design BELA with strategies to avoid bias that could be introduced through this adaptive partitioning. We evaluate BELA on labeling four datasets and find that it outperforms existing strategies for adaptive labeling. / Master of Science / Most machine learning classifiers require data with both features and labels. The features of the data may be the pixel values of an image, the words in a text sample, the audio of a voice clip, and more. The labels of a dataset define the data: they place the data into one of several categories, such as determining whether an image is of a cat or a dog, or adding subtitles to YouTube videos. Labeling a dataset can be expensive and usually requires a human annotator. Human-labeled data can be even more expensive if the data require an expert labeler, as in the labeling of medical images, or when labeling is particularly time-consuming. We introduce a scheme for labeling data that aims to lessen the cost of human-labeled data by labeling a subset of an entire dataset and making an educated guess on the labels of the remaining unlabeled data. The labeled data generated from our approach may then be used to train a classifier, an algorithm that maps the features of data to a guessed label. This is based on the intuition that data with similar features will also have similar labels. Our approach uses a game-like process of, at any point, choosing between one of two possible actions: we may either label a new data point, thus learning more about the dataset, or we may split the dataset into multiple subsets of data. We eventually guess the labels of the unlabeled data by assigning each unlabeled data point the majority label of the data subset it belongs to. The novelty in our approach is that we use supervised classifiers, or splitting techniques that use both the features and the labels of data, to split a dataset into new subsets. We use bias-reduction techniques that enable us to use supervised splitting.
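The majority-vote inference step that BELA performs on each subset can be sketched in a few lines. The data layout below is assumed for illustration; BELA's actual split criterion and expectation bound are more involved than this.

```python
# Sketch of the label-inference step: each subset gets the majority label of its
# queried points, and every unlabeled point in the subset inherits that label.
# This illustrates only the voting idea, not BELA's bound or split criterion.
from collections import Counter

def infer_labels(subsets):
    """subsets: list of dicts {'queried': [labels...], 'unlabeled': [point ids...]}."""
    inferred = {}
    for subset in subsets:
        if not subset["queried"]:
            continue
        majority_label, _ = Counter(subset["queried"]).most_common(1)[0]
        for point_id in subset["unlabeled"]:
            inferred[point_id] = majority_label
    return inferred

print(infer_labels([{"queried": ["cat", "cat", "dog"], "unlabeled": [10, 11]},
                    {"queried": ["dog"], "unlabeled": [12]}]))
```
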
15

Duomenų atnaujinimo lygiagretumo konfliktų sprendimas prekybos ir klientų aptarnavimo sistemose / Data update concurrency conflict solutions in commerce and customer service systems

Kėsas, Marius 26 May 2004 (has links)
There are many benefits to upgrading your data access layer to ADO.NET, most of which involve using the intrinsic DataSet object. The DataSet object is basically a disconnected, in-memory replica of a database. DataSets provide many benefits, but also present a few challenges. Specifically, you can run into problems related to data concurrency exceptions. I've created a simple Windows® Forms customer service application that illustrates the potential pitfalls of this particular problem. I'll walk you through my research and show you ways to overcome the data concurrency issues that arose. DataSets provide a number of benefits. For example, you gain the ability to enforce rules of integrity in memory rather than at the database level. The most important benefit of using DataSets, however, is improved performance. Since the DataSet is disconnected from the underlying database, your code will make fewer calls to the database, significantly boosting performance. As with most performance optimizations, this one comes with a price. Since the DataSet object is disconnected from the underlying database, there is always a chance that the data is out of date. Since a DataSet doesn't hold live data, but rather a snapshot of live data at the time the DataSet was filled, problems related to data concurrency can occur.
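The concurrency problem the abstract describes is usually handled with optimistic concurrency: the update statement only matches rows that still contain the originally loaded values, so zero affected rows signals a conflict. Below is a conceptual sketch of that pattern in Python with SQLite rather than ADO.NET; the table and values are invented for illustration.

```python
# Conceptual sketch (Python/SQLite, not ADO.NET) of the optimistic-concurrency pattern:
# updates succeed only if the row still holds the values originally loaded into the
# disconnected snapshot; zero affected rows signals a concurrency conflict.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, phone TEXT)")
conn.execute("INSERT INTO customers VALUES (1, '555-0100')")

# "Fill the DataSet": take a disconnected snapshot of the row
original = conn.execute("SELECT id, phone FROM customers WHERE id = 1").fetchone()

# Meanwhile another user updates the same row in the live database
conn.execute("UPDATE customers SET phone = '555-0199' WHERE id = 1")

# "Update from the DataSet": the WHERE clause includes the original values
cursor = conn.execute(
    "UPDATE customers SET phone = ? WHERE id = ? AND phone = ?",
    ("555-0123", original[0], original[1]),
)
if cursor.rowcount == 0:
    print("Concurrency conflict: the row changed since the snapshot was taken")
```
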
16

Pokročilé metody detekce hran v obraze / Advanced Image Edge Detection

Mezírka, Martin January 2015 (has links)
The goal of this work is to investigate how the trainable Structured Forests algorithm for fast edge detection can be applied to information extraction from historical maps and medical images. An annotated dataset was created for this purpose and the detector was evaluated on it. Structured Forests achieved better results on map data than classical detectors, while the success rate of finding bone edges was similar for both approaches. The work also compares different image annotation styles and includes experiments with the dataset, covering parameter selection and evaluation of the methods.
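A pre-trained Structured Forests detector can be run through OpenCV's contrib module, roughly as sketched below; the model file and input image names are placeholders, and this is not the exact pipeline used in the thesis.

```python
# Sketch of running a pre-trained Structured Forests detector with OpenCV's ximgproc
# module (opencv-contrib-python). The model file ("model.yml.gz") and input image are
# assumptions; a trained model must be supplied separately.
import cv2
import numpy as np

detector = cv2.ximgproc.createStructuredEdgeDetection("model.yml.gz")  # assumed model path

image = cv2.imread("map_scan.png")                      # hypothetical scanned map image
if image is not None:
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    edges = detector.detectEdges(rgb)                   # per-pixel edge probability in [0, 1]
    cv2.imwrite("edges.png", (edges * 255).astype(np.uint8))
```
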
17

Automatiskt genererade dataset med SfM : En undersökning av SfM och dess egenskaper

Elmesten, Jonas January 2021 (has links)
Fler och fler industrier vänder blickarna mot A.I. (artificiell intelligens) för att undersöka om och hur det kan användas för att effektivisera olika processer. Men för att träna upp en A.I. krävs oftast stora mängder data där man kan behöva förbereda väldigt mycket manuellt innan man ens kan påbörja träningsprocessen. SCA Skog AB ser dock många fördelar med att göra A.I. till en naturlig del av sin digitaliseringsprocess, där man bland annat är intresserad av visuella bedömningar av träd. Dataset för visuella bedömningar kan se ut på olika sätt, men i detta fall var det relevant att skapa dataset i form av konturer för trädstammar. Med hjälp av en A.I. som skulle kunna visuellt segmentera och klassificera träd så skulle man öppna upp för många nya möjligheter inom skogsindustrin. Under detta projekt har jag undersökt hur man skulle kunna automatisera processen för skapandet av dataset i skogsmiljöer för just visuella bedömningar. Som ett resultat av att försöka uppnå detta fick jag experimentera med bildbaserade punktmoln som på olika sätt tillät projektet att avancera framåt. Ur dessa punktmoln kunde jag sedan segmentera träden för att i nästa steg skapa konturer längs alla träd med hjälp av utvunnen data ur segmenteringen. Jag tittade först och främst på hur man automatiskt skulle kunna skapa konturer för alla träd i bildsekvensen, för att sedan låta en användare gå in och finjustera konturerna. I resultatet kan man sedan tydligt se skillnaden i tidsåtgång mellan att använda programmet och inte. Programmet kan skapa och uppdatera pixel-masker snabbare än vad jag manuellt kunde utföra samma arbete, där jag dock hade önskat en mer markant skillnad i tidsåtgång jämfört med den rent manuella insatsen. Under projektets gång kunde jag identifiera några större problem som förhindrade detta, där man med lämplig utrustning skulle kunna uppnå ett mycket bättre resultat än vad som gjordes under detta projekt. Resultaten talar ändå för att det kan vara lönt att undersöka metoden mer ingående. / More and more industries are turning their eyes towards A.I. (artificial intelligence) and its rapid development, in the hope of utilizing it to remove labor-intensive operations. But large amounts of manually processed data are often required before starting the learning process, which can be a huge problem to deal with. SCA Skog AB is still very curious about how they could use A.I. in forestry, where visual inspection of trees is of particular interest. There are many visual problems that modern A.I. can solve; in this case it is a matter of finding the contours of trees and classifying them. If this were possible, a lot of interesting opportunities would open up to be experimented with. During this project I examined the possibility of reducing the time it takes to manually create datasets of forest environments for this particular visual problem. As a result of trying to achieve this, I had to examine image-based point clouds and their properties to find out how they could be used in this process. From the SfM point cloud I was able to segment all visible trees with a segmentation algorithm and isolate these points to extract the 2D→3D connection. I could then use that connection to create pixel masks and apply them to the image sequence to paint out the contours of the segmented trees. A method to automatically update these pixel masks, in terms of additions and removals, was also implemented, where any update would propagate through the image sequence and reduce the time for manual adjustment.
From testing the program, it is clear that time could be saved doing various kinds of contour-updating operations. The program could by itself create pixel masks that could then be updated in a way that greatly reduced the need for manual updating, though the result in terms of time saved was not as substantial as one would have hoped. Issues with the point cloud caused some major problems due to its low precision. Using better equipment for image gathering would most likely be the best way to improve the results of this project. The results still suggest that this method is worth researching further.
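The 2D-to-3D connection mentioned above boils down to projecting the 3D points of a segmented tree back through the camera model and marking the corresponding pixels. The sketch below illustrates that projection step under a simple pinhole model; the intrinsics, pose and points are placeholders rather than data from the project.

```python
# Sketch of the 2D<->3D step: project the 3D points of one segmented tree back into
# an image through a pinhole camera model and paint a pixel mask. The intrinsics,
# pose and point array are placeholders, not data from the project.
import numpy as np

def project_points(points_3d, K, R, t):
    """points_3d: (N, 3) world coordinates; returns (N, 2) pixel coordinates."""
    cam = (R @ points_3d.T + t.reshape(3, 1))   # world -> camera frame
    uv = (K @ cam)[:2] / cam[2]                 # perspective divide
    return uv.T

def paint_mask(points_3d, K, R, t, height, width):
    mask = np.zeros((height, width), dtype=np.uint8)
    uv = np.round(project_points(points_3d, K, R, t)).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    mask[uv[inside, 1], uv[inside, 0]] = 255    # mark projected tree points
    return mask

K = np.array([[1000.0, 0, 320], [0, 1000.0, 240], [0, 0, 1]])
points = np.random.rand(500, 3) * [2, 2, 8] + [0, 0, 4]   # synthetic points in front of the camera
mask = paint_mask(points, K, np.eye(3), np.zeros(3), 480, 640)
```
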
18

La recommandation des jeux de données basée sur le profilage pour le liage des données RDF / Profile-based Dataset Recommendation for RDF Data Linking

Ben Ellefi, Mohamed 01 December 2016 (has links)
Avec l’émergence du Web de données, notamment les données ouvertes liées, une abondance de données est devenue disponible sur le web. Cependant, les ensembles de données LOD et leurs sous-graphes inhérents varient fortement par rapport à leur taille, le thème et le domaine, les schémas et leur dynamicité dans le temps au niveau des données. Dans ce contexte, l'identification des jeux de données appropriés, qui répondent à des critères spécifiques, est devenue une tâche majeure, mais difficile à soutenir, surtout pour répondre à des besoins spécifiques tels que la recherche d'entités centriques et la recherche des liens sémantiques des données liées. Notamment, en ce qui concerne le problème de liage des données, le besoin d'une méthode efficace pour la recommandation des jeux de données est devenu un défi majeur, surtout avec l'état actuel de la topologie du LOD, dont la concentration des liens est très forte au niveau des graphes populaires multi-domaines tels que DBpedia et YAGO, alors qu'une grande liste d'autres jeux de données considérés comme candidats potentiels pour le liage est encore ignorée. Ce problème est dû à la tradition du web sémantique dans le traitement du problème de "identification des jeux de données candidats pour le liage". Bien que la compréhension de la nature du contenu d'un jeu de données spécifique soit une condition cruciale pour les cas d'usage mentionnés, nous adoptons dans cette thèse la notion de "profil de jeu de données" - un ensemble de caractéristiques représentatives pour un jeu de données spécifique, notamment dans le cadre de la comparaison avec d'autres jeux de données. Notre première direction de recherche était de mettre en œuvre une approche de recommandation basée sur le filtrage collaboratif, qui exploite à la fois les profils thématiques des jeux de données, ainsi que les mesures de connectivité traditionnelles, afin d'obtenir un graphe englobant les jeux de données du LOD et leurs thèmes. Cette approche a besoin d'apprendre le comportement de la connectivité des jeux de données dans le graphe LOD. Cependant, les expérimentations ont montré que la topologie actuelle de ce nuage LOD est loin d'être complète pour être considérée comme des données d'apprentissage. Face aux limites de la topologie actuelle du graphe LOD, notre recherche a conduit à rompre avec cette représentation de profil thématique et notamment du concept "apprendre pour classer" pour adopter une nouvelle approche pour l'identification des jeux de données candidats basée sur le chevauchement des profils intensionnels entre les différents jeux de données. Par profil intensionnel, nous entendons la représentation formelle d'un ensemble d'étiquettes extraites du schéma du jeu de données, et qui peut être potentiellement enrichi par les descriptions textuelles correspondantes. Cette représentation fournit l'information contextuelle qui permet de calculer la similarité entre les différents profils d'une manière efficace. Nous identifions le chevauchement de différents profils à l'aide d'une mesure de similarité sémantico-fréquentielle qui se base sur un classement calculé par le tf*idf et la mesure cosinus.
Les expériences, menées sur tous les jeux de données liés disponibles sur le LOD, montrent que notre méthode permet d'obtenir une précision moyenne de 53% pour un rappel de 100%. Afin d'assurer des profils intensionnels de haute qualité, nous introduisons Datavore - un outil orienté vers les concepteurs de métadonnées qui recommande des termes de vocabulaire à réutiliser dans le processus de modélisation des données. Datavore fournit également les métadonnées correspondant aux termes recommandés ainsi que des propositions de triples utilisant ces termes. L'outil repose sur l’écosystème des Vocabulaires Ouverts Liés (LOV) pour l'acquisition des vocabulaires existants et leurs métadonnées. / With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, their schemas and their data dynamicity (respectively schemas and metadata) over time. To this extent, identifying suitable datasets which meet specific criteria has become an increasingly important, yet challenging task to support issues such as entity retrieval, semantic search and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high amount of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the semantic web tradition in dealing with "finding candidate datasets to link to", where data publishers are used to identify target datasets for interlinking. While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile" - a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles, as well as traditional dataset connectivity measures, in order to link LOD datasets into a global dataset-topic-graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud is far from being complete to be considered as a ground truth and consequently as learning data. Facing the limits of the current topology of LOD (as learning data), our research has led to break away from the topic profiles representation of the "learn to rank" approach and to adopt a new approach for candidate datasets identification where the recommendation is based on the intensional profiles overlap between different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset and can be potentially enriched by retrieving the corresponding textual descriptions. This representation provides richer contextual and semantic information and allows similarities between profiles to be computed efficiently and inexpensively.
We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on the tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the LOD cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. Furthermore, our method returns the mappings between the schema concepts across datasets, a particularly useful input for the data linking step. In order to ensure high-quality, representative dataset schema profiles, we introduce Datavore - a tool oriented towards metadata designers that provides ranked lists of vocabulary terms to reuse in the data modeling process, together with additional metadata and cross-terms relations. The tool relies on the Linked Open Vocabularies (LOV) ecosystem for acquiring vocabularies and metadata and is made available for the community.
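The intensional-profile overlap can be pictured as a standard tf*idf/cosine ranking over bags of schema labels, as in the toy sketch below; the datasets and labels are invented, and the thesis's actual similarity measure is richer than this.

```python
# Toy sketch of the profile-overlap idea: represent each dataset by the bag of its
# schema concept labels, weight with tf*idf, and rank candidates by cosine similarity
# to a source dataset. The datasets and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

profiles = {
    "source":     "Person birthPlace populatedPlace country",
    "candidateA": "Person birthDate deathPlace country",
    "candidateB": "Protein gene enzyme pathway",
}
names = list(profiles)
tfidf = TfidfVectorizer().fit_transform(profiles.values())
scores = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()

for name, score in sorted(zip(names[1:], scores), key=lambda x: -x[1]):
    print(f"{name}: {score:.2f}")   # candidateA should rank above candidateB
```
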
19

Cloud intrusion detection based on change tracking and a new benchmark dataset

Aldribi, Abdulaziz 30 August 2018 (has links)
The adoption of cloud computing has increased dramatically in recent years due to attractive features such as flexibility, cost reductions, scalability, and pay per use. Shifting towards cloud computing is attracting not only industry but also government and academia. However, given their stringent privacy and security policies, this shift is still hindered by many security concerns related to the cloud computing features, namely shared resources, virtualization and multi-tenancy. These security concerns vary from privacy threats and lack of transparency to intrusions from within and outside the cloud infrastructure. Therefore, to overcome these concerns and establish a strong trust in cloud computing, there is a need to develop adequate security mechanisms for effectively handling the threats faced in the cloud. Intrusion Detection Systems (IDSs) represent an important part of such mechanisms. Developing cloud based IDS that can capture suspicious activity or threats, and prevent attacks and data leakage from both inside and outside the cloud environment is paramount. However, cloud computing is faced with a multidimensional and rapidly evolving threat landscape, which makes cloud based IDS more challenging. Moreover, one of the most significant hurdles for developing such cloud IDS is the lack of publicly available datasets collected from a real cloud computing environment. In this dissertation, we introduce the first public dataset of its kind, named ISOT Cloud Intrusion Dataset (ISOT-CID), for cloud intrusion detection. The dataset consists of several terabytes of data, involving normal activities and a wide variety of attack vectors, collected over multiple phases and periods of time in a real cloud environment. We also introduce a new hypervisor-based cloud intrusion detection system (HIDS) that uses online multivariate statistical change analysis to detect anomalous network behaviors. As a departure from the conventional monolithic network IDS feature model, we leverage the fact that a hypervisor consists of a collection of instances, to introduce an instance-oriented feature model that exploits individual as well as correlated behaviors of instances to improve the detection capability. The proposed approach is evaluated using ISOT-CID and the experiments along with results are presented. / Graduate / 2020-08-14
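As a rough illustration of online multivariate change analysis, the sketch below keeps a sliding window of per-instance traffic features and flags observations whose Mahalanobis distance from the window statistics exceeds a threshold. It is a generic example, not the detector proposed in the dissertation; the features and threshold are assumptions.

```python
# Generic sketch of online multivariate change detection: maintain a sliding window of
# feature vectors (e.g. per-instance packet rate, byte counts, flow counts), and score
# new observations by their squared Mahalanobis distance from the window statistics.
import numpy as np
from collections import deque

class SlidingWindowDetector:
    def __init__(self, dim, window=200, threshold=9.0):
        self.history = deque(maxlen=window)
        self.dim, self.threshold = dim, threshold

    def score(self, x):
        if len(self.history) < 2 * self.dim:            # not enough data for a stable estimate
            self.history.append(x)
            return 0.0
        data = np.array(self.history)
        mean = data.mean(axis=0)
        cov = np.cov(data, rowvar=False) + 1e-6 * np.eye(self.dim)  # ridge for numerical stability
        d = x - mean
        self.history.append(x)
        return float(d @ np.linalg.inv(cov) @ d)        # squared Mahalanobis distance

detector = SlidingWindowDetector(dim=3)
rng = np.random.default_rng(0)
for window_features in rng.normal(size=(300, 3)):       # normal baseline traffic
    detector.score(window_features)
print(detector.score(np.array([15.0, 12.0, 20.0])) > detector.threshold)  # True: anomalous change
```
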
20

Systematic generation of datasets and benchmarks for modern computer vision

Malireddi, Sri Raghu 03 April 2019 (has links)
Deep Learning is dominant in the field of computer vision thanks to its high performance, which is driven by large annotated datasets and proper evaluation benchmarks. However, two important areas in computer vision, depth-based hand segmentation and local features, respectively lack a large well-annotated dataset and a benchmark protocol that properly demonstrates their practical performance. Therefore, in this thesis, we focus on these two problems. For hand segmentation, we create a novel systematic way to easily create automatic semantic segmentation annotations for large datasets. We achieved this with the help of traditional computer vision techniques and a minimal hardware setup of one RGB-D camera and two distinctly colored skin-tight gloves. Our method allows easy creation of large-scale datasets with high annotation quality. For local features, we create a new modern benchmark that evaluates their different aspects, specifically wide-baseline stereo matching and Multi-View Stereo (MVS), in a more practical setup, namely Structure-from-Motion (SfM). We believe that through our new benchmark, we will be able to spur research on learned local features in a more practical direction. In this respect, the benchmark developed for the thesis will be used to host a challenge on local features. / Graduate
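The glove-based annotation idea reduces, at its core, to color thresholding on the RGB stream and transferring the resulting masks to the aligned depth frames. The sketch below shows that thresholding step; the HSV ranges and file name are placeholders that would need tuning to the actual gloves and lighting.

```python
# Sketch of the automatic annotation idea: with each hand wearing a distinctly colored
# glove, a color threshold on the RGB frame yields a per-hand segmentation mask that can
# label the aligned depth frame. HSV ranges and the input file are placeholder assumptions.
import cv2
import numpy as np

GLOVE_RANGES = {                                      # hypothetical HSV ranges for two gloves
    "left_hand":  ((35, 80, 60), (85, 255, 255)),     # green glove
    "right_hand": ((100, 80, 60), (130, 255, 255)),   # blue glove
}

def glove_masks(bgr_frame):
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    masks = {}
    for name, (low, high) in GLOVE_RANGES.items():
        mask = cv2.inRange(hsv, np.array(low), np.array(high))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))  # remove speckle
        masks[name] = mask                            # 255 where the glove (hand) is visible
    return masks

frame = cv2.imread("rgbd_color_frame.png")            # hypothetical captured RGB frame
if frame is not None:
    masks = glove_masks(frame)
```
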
