231
Charakterizace chodců ve videu / Pedestrian Attribute Analysis. Studená, Zuzana. January 2019.
This work deals with obtaining information about pedestrians captured by static external cameras in public outdoor or indoor spaces. The aim is to extract as much information as possible: attributes such as gender, age, type of clothing, accessories, fashion style, or overall personality are obtained using convolutional neural networks. One part of the work consists of creating a new dataset of pedestrians annotated with sex, age, and fashion style. Another part of the thesis is the design and implementation of convolutional neural networks that classify these pedestrian attributes. The networks are evaluated on the PETA, FashionStyle14, and BUT Pedestrian Attributes datasets. Experiments on the PETA and FashionStyle14 datasets compare the results to various convolutional neural networks described in the literature; further experiments are reported on the newly created BUT pedestrian attributes dataset.
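As an illustration of the classification setup described above, here is a minimal sketch of a shared-backbone CNN with separate attribute heads. The architecture and the class counts (six age groups) are assumptions for illustration, not the thesis's actual networks; only the 14 style classes follow from the FashionStyle14 name.

```python
# Illustrative multi-head CNN for pedestrian attributes (architecture assumed).
# A shared backbone feeds separate classification heads for gender, age group,
# and fashion style.
import torch
import torch.nn as nn
from torchvision import models

class PedestrianAttributeNet(nn.Module):
    def __init__(self, n_age_groups=6, n_styles=14):
        super().__init__()
        backbone = models.resnet18(weights=None)   # shared feature extractor
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # strip the ImageNet head
        self.backbone = backbone
        self.gender_head = nn.Linear(feat_dim, 2)
        self.age_head = nn.Linear(feat_dim, n_age_groups)
        self.style_head = nn.Linear(feat_dim, n_styles)  # FashionStyle14 classes

    def forward(self, x):
        f = self.backbone(x)
        return self.gender_head(f), self.age_head(f), self.style_head(f)

model = PedestrianAttributeNet()
dummy = torch.randn(4, 3, 224, 224)   # batch of pedestrian crops
gender, age, style = model(dummy)
print(gender.shape, age.shape, style.shape)  # (4, 2), (4, 6), (4, 14)
```

Training such a model would sum a cross-entropy loss per head, which is one common way to handle multiple attributes with a single backbone.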
232
Datová sada pro klasifikaci síťových zařízení pomocí strojového učení / Dataset for Classification of Network Devices Using Machine Learning. Eis, Pavel. January 2021.
Automatic classification of devices in a computer network can be used to detect anomalies and enables the application of security policies per device type. The key to creating a device classifier is a quality dataset, yet few such datasets are publicly available and creating a new one is difficult. The aim of this work is to create a tool that enables automated annotation of a network device dataset, and to build a classifier of network devices that uses only basic data from network flows. The result is a modular tool providing automated annotation of network devices using CESNET's ADiCT system, the search engines Shodan and Censys, information from PassiveDNS, TOR, WhoIs, a geolocation database, and blacklists. Based on the annotated dataset, several classifiers are built that categorize network devices according to the services they use. The results not only significantly simplify the process of creating new network device datasets but also demonstrate a non-invasive approach to network device classification.
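A minimal sketch of the flow-based classification idea follows. The flow features, labels, and model choice are illustrative assumptions; the thesis builds its own feature set and annotation pipeline.

```python
# Hedged sketch: classifying device type from basic per-device flow statistics
# with a random forest. The feature names and toy records are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical annotated flow summary: one row per device.
df = pd.DataFrame({
    "dst_port_entropy":   [0.2, 3.1, 0.4, 2.8],
    "mean_flow_duration": [1.2, 30.5, 0.9, 25.0],
    "bytes_per_flow":     [400, 52000, 350, 61000],
    "distinct_peers":     [2, 140, 3, 120],
    "device_type":        ["printer", "server", "printer", "server"],
})
X, y = df.drop(columns="device_type"), df["device_type"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```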
233
An integrated approach to groundwater exploration using remotely sensed imagery and geophysical techniques: a case study in the Archean basement and Karoo sedimentary basins of Limpopo Province of South Africa. Magakane, Ronald. 20 September 2019.
MESMEG / Department of Mining and Environmental Geology /
Many recent studies have shown that some of the greatest water needs occur in areas underlain by crystalline rocks with complex hydrogeology. Crystalline basement rocks underlie over 60% of the South African surface, and the Limpopo Province of South Africa is no exception. Previous attempts to develop the lithologies of Limpopo for groundwater abstraction without sound scientific methodologies resulted in low-yielding boreholes and a high rate of borehole failure. The complexity of the lithologies in the region necessitates the use of sound scientific methodologies for delineating promising groundwater potential zones. The principal objective of the present study was therefore to delineate groundwater potential zones through an integrated approach combining remote sensing, geophysics, and ancillary datasets.
The area of focus is located in the northeastern section of Limpopo Province, covering about 16 800 km². Geologically, it is underlain by three lithostratigraphic domains: Archean-aged basement rocks, the Soutpansberg volcano-sedimentary succession, and subsidiary basins of the main Karoo sedimentary cover. In general, the groundwater potential of a region is a function of factors such as lithology, lineaments, slope, climate, and land use/land cover. The present study therefore used lineaments, lithology, slope, and land use/land cover to produce a groundwater potential zone map. The thematic layers were prepared from raw datasets, including LANDSAT 8 OLI imagery, the ASTER DEM, aeromagnetic data, geological maps, and land use/land cover data, and were overlaid in a GIS environment.
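A minimal sketch of such a weighted overlay is shown below; the layer ratings and weights are assumptions for illustration, not the study's calibrated values.

```python
# Weighted-overlay sketch in numpy: thematic layers (rated on a common 1-5
# suitability scale) are combined with assumed weights into a groundwater
# potential index, then binned into five classes.
import numpy as np

rng = np.random.default_rng(0)
shape = (100, 100)   # stand-in for a rasterized study area
layers = {           # each cell rated 1 (poor) .. 5 (excellent) -- synthetic data
    "lineament_density": rng.integers(1, 6, shape),
    "lithology":         rng.integers(1, 6, shape),
    "slope":             rng.integers(1, 6, shape),
    "land_use":          rng.integers(1, 6, shape),
}
weights = {"lineament_density": 0.35, "lithology": 0.30,
           "slope": 0.20, "land_use": 0.15}   # assumed; must sum to 1

index = sum(w * layers[name] for name, w in weights.items())
# Bin the continuous index into the five potential classes used in the study.
classes = np.digitize(index, bins=[1.8, 2.6, 3.4, 4.2])  # 0=very low .. 4=excellent
labels = ["very low", "low", "moderate", "good", "excellent"]
for k, lab in enumerate(labels):
    print(f"{lab}: {100 * np.mean(classes == k):.1f}% of area")
```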
The resultant groundwater map revealed five distinct classes of groundwater potential zones, categorised as excellent, good, moderate, low, and very low. Interpretation of the results shows that the study area is dominated by moderate groundwater potential zones, covering about 52% of the total area, while low and good potential zones occur in almost equal proportions of 19.52% and 24%, respectively. The results were validated using the GRIP borehole dataset and a number of follow-up geophysical surveys.
Overlaying the borehole dataset on the map showed a positive correlation between borehole yields and groundwater potential zones. Follow-up Vertical Electrical Sounding surveys revealed the presence of conductive layers in selected target areas. The groundwater potential zone map and the validation results provide a meaningful regional assessment of groundwater distribution in the study area; the results of this study can thus serve as a guideline for future groundwater exploration projects. / NRF
234
Identifikace cover verzí skladeb pomocí harmonických příznaků, modelu harmonie a harmonické složitosti / Cover Song Identification using Music Harmony Features, Model and Complexity Analysis. Maršík, Ladislav. January 2019.
Title: Cover Song Identification using Music Harmony Features, Model and Complexity Analysis. Author: Ladislav Maršík. Department: Department of Software Engineering. Supervisor: Prof. RNDr. Jaroslav Pokorný, CSc., Department of Software Engineering. Abstract: Analysis of digital music and its retrieval based on audio features is one of the popular topics within the music information retrieval (MIR) field. Every musical piece has its characteristic harmony structure, but harmony analysis is seldom used for retrieval. Retrieval systems that do not focus on similarities in harmony progressions may consider two versions of the same song different, even though they differ only in instrumentation or the singing voice. This thesis takes various paths in exploring how music harmony can be used in MIR, and in particular in the cover song identification (CSI) task. We first create a music harmony model based on the knowledge of music theory. We define novel concepts: the harmonic complexity of a musical piece, as well as the chord and chroma distance features. We show how these concepts can be used for retrieval and complexity analysis, and how they compare with the state of the art of music harmony modeling. An extensive comparison of harmony features is then performed, using both the novel features and the...
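A simplified sketch of a chroma-based distance between two recordings follows. The file names are hypothetical, and this plain cosine distance on time-averaged chroma is a stand-in for illustration, not the thesis's exact chord and chroma distance definitions.

```python
# Sketch: compare the average harmonic (chroma) profiles of two recordings.
import numpy as np
import librosa

def mean_chroma(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)  # 12 pitch classes x frames
    v = chroma.mean(axis=1)                          # average harmonic profile
    return v / np.linalg.norm(v)

def chroma_distance(path_a, path_b):
    a, b = mean_chroma(path_a), mean_chroma(path_b)
    return 1.0 - float(np.dot(a, b))  # cosine distance; small for cover pairs

print(chroma_distance("original.wav", "cover.wav"))  # hypothetical files
```

A practical CSI system would additionally handle key transposition between versions, for example by comparing all twelve rotations of one chroma vector and keeping the minimum distance.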
235
Extraction of medical knowledge from clinical reports and chest x-rays using machine learning techniques. Bustos, Aurelia. 19 June 2019.
This thesis addresses the extraction of medical knowledge from clinical text using deep learning techniques. In particular, the proposed methods focus on cancer clinical trial protocols and chest x-ray reports. The main results are a proof of concept of the capability of machine learning methods to discern which short free-text clinical notes are regarded as inclusion or exclusion criteria, and a large-scale chest x-ray image dataset labeled with radiological findings, diagnoses and anatomic locations. Clinical trials provide the evidence needed to determine the safety and effectiveness of new medical treatments. These trials are the basis for clinical practice guidelines and greatly assist clinicians in their daily practice when making treatment decisions. However, the eligibility criteria used in oncology trials are too restrictive. Patients are often excluded on the basis of comorbidity, past or concomitant treatments, or because they are over a certain age, and the patients that are selected therefore do not mimic clinical practice. This means that the results obtained in clinical trials cannot be extrapolated to patients whose clinical profiles were excluded from the trial protocols; the efficacy and safety of new treatments for patients with these characteristics are therefore not defined. Given the clinical characteristics of a particular patient, their type of cancer and the intended treatment, discovering whether or not they are represented in the corpus of available clinical trials requires the manual review of numerous eligibility criteria, which is impracticable for clinicians on a daily basis. In this thesis, a large medical corpus comprising all cancer clinical trial protocols published by competent authorities in the last 18 years was used to extract medical knowledge in order to help automatically learn patients' eligibility in these trials. For this, a model was built to automatically predict whether short clinical statements were considered inclusion or exclusion criteria. A method based on deep neural networks was trained on a dataset of 6 million short free texts to classify them as eligible or not eligible, using pretrained word embeddings as inputs. The semantic reasoning of the resulting word-embedding representations was also analyzed, making it possible to identify equivalent treatments for a type of tumor by analogy with the drugs used to treat other tumors. Results show that representation learning using deep neural networks can be successfully leveraged to extract medical knowledge from clinical trial protocols and can potentially assist practitioners when prescribing treatments. The second main task addressed in this thesis concerns knowledge extraction from the medical reports associated with radiographs. Conventional radiology remains the most performed technique in radiodiagnosis services, with a share close to 75% (Radiología Médica, 2010). In particular, chest x-ray is the most common medical imaging exam, with over 35 million taken every year in the US alone (Kamel et al., 2017). Chest x-rays allow for inexpensive screening of several pathologies, including masses, pulmonary nodules, effusions, cardiac abnormalities and pneumothorax.
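A minimal sketch of the eligibility-classification setup (embeddings feeding a small neural network) is shown below. The tiny corpus, labels, and layer sizes are assumptions for illustration; the thesis's actual model was trained on 6 million statements with pretrained word embeddings.

```python
# Hedged sketch of a binary inclusion/exclusion classifier over short
# clinical statements, in the spirit of the approach described above.
import tensorflow as tf

texts = ["age over 75 years", "histologically confirmed adenocarcinoma",
         "prior chemotherapy within 6 months", "ECOG performance status 0-1"]
labels = [0, 1, 0, 1]   # hypothetical: 1 = inclusion, 0 = exclusion

vectorize = tf.keras.layers.TextVectorization(max_tokens=5000,
                                              output_sequence_length=20)
vectorize.adapt(texts)

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(5000, 64),  # pretrained vectors could be loaded here
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(tf.constant(texts), tf.constant(labels, dtype=tf.float32), epochs=5)
```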
For this task, all the chest x-rays that had been interpreted and reported by radiologists at the Hospital Universitario de San Juan (Alicante) from Jan 2009 to Dec 2017 were used to build a novel large-scale dataset in which each high-resolution radiograph is labeled with its corresponding metadata, radiological findings and pathologies. This dataset, named PadChest, includes more than 160,000 images obtained from 67,000 patients, covering six different position views and additional information on image acquisition and patient demographics. The free-text reports written in Spanish by radiologists were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and mapped onto standard Unified Medical Language System (UMLS) terminology. For this, a subset of the reports (27%) was manually annotated by trained physicians, whereas the remaining set was automatically labeled with deep supervised learning methods using attention mechanisms fed with the text reports. The generated labels were then validated on an independent test set, achieving a 0.93 micro-F1 score. To the best of our knowledge, this is one of the largest public chest x-ray databases suitable for training supervised models on radiographs, and also the first to contain radiographic reports in Spanish. The PadChest dataset can be downloaded on request from http://bimcv.cipf.es/bimcv-projects/padchest/. PadChest is intended for training image classifiers based on deep learning techniques to extract medical knowledge from chest x-rays. It is essential that automatic radiology reporting methods can be integrated, in a clinically validated manner, into radiologists' workflow in order to help specialists improve their efficiency and enable safer and actionable reporting. Computer vision methods capable of identifying the broad spectrum of thoracic abnormalities (and also normality) need to be trained on large-scale, comprehensively labeled x-ray datasets such as PadChest. The development of these computer vision tools, once clinically validated, could serve to fulfill a broad range of unmet needs. Beyond implementing and obtaining results for both clinical trials and chest x-rays, this thesis studies the nature of health data, the novelty of applying deep learning methods to obtain large-scale labeled medical datasets, and the relevance of their applications in medical research, which have contributed to the work's extramural diffusion and worldwide reach. The thesis describes this journey so that the reader is guided across multiple disciplines, from engineering to medicine, up to ethical considerations in artificial intelligence applied to medicine.
236
3D Object Detection Using Virtual Environment Assisted Deep Network Training. Dale, Ashley S. 12 1900.
Indiana University-Purdue University Indianapolis (IUPUI) /
An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score over all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score is σ*_F1 = 0.015 for the synthetically trained networks, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Varied backgrounds in the synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network; the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
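A sketch of the latent-space analysis step follows, using stand-in data. It assumes the umap-learn package is installed; the latent vectors here are random placeholders, not actual VAE outputs.

```python
# PCA followed by UMAP on (stand-in) VAE latent vectors, labeled real vs.
# synthetic, mirroring the analysis described above.
import numpy as np
from sklearn.decomposition import PCA
import umap  # provided by the umap-learn package

rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 128))     # stand-in VAE latent vectors
is_synthetic = rng.integers(0, 2, 500)    # 1 = synthetic, 0 = real

pca = PCA(n_components=50).fit(latents)
reduced = pca.transform(latents)
print("variance explained:", pca.explained_variance_ratio_[:5].round(3))

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(reduced)
# If real and synthetic points mix in this 2-D embedding (as reported above),
# the two domains are not separable in latent space.
print(embedding.shape)  # (500, 2)
```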
237
Political Trust and Its Determinants: Exploring the role of cultural and institutional related determinants of political trust in Sweden. Björebäck, Leonard. January 2021.
There is now widespread knowledge about which factors are important in explaining individuals' levels of political trust. Unfortunately, the same knowledge is not at hand as to whether these 'determinants' of political trust have changed over time and, if so, how. In other words, can we assume that citizens form their trust in a similar manner over time, or has there been a shift? With the purpose of contributing new knowledge about political trust, this thesis mainly explored whether there has been a change in the effects of commonly cited strong determinants of political trust, and secondly whether there are any trends in how these effects have changed over time. To answer these rarely posed questions, the theoretical framework departed from Mishler and Rose's division into cultural and institutional theory, which entail very different views on the origin and dynamics of political trust. The two theories were then operationalized into cultural and institutional variables according to the variables available in "The SOM Institute Cumulative Dataset 1986-2019". Numerous multiple linear regression analyses of Swedish data between 1998 and 2019 show that the effects of most explanatory variables on political trust change, but since these effects were small from the start there is reason to question how much weight the changes carry. Further, by performing interaction analyses, the thesis identified a handful of positive and negative linear trends arching over the 22-year period, meaning that some explanatory variables have become increasingly or decreasingly important in explaining the variation in political trust; this in turn indicates that the Swedish population on average tends to form its trust slightly differently in 2019 than in 1998.
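A hedged sketch of the interaction approach on synthetic stand-in data follows; the variable names and effect sizes are assumptions, while the thesis itself works with the SOM cumulative dataset.

```python
# Testing whether a determinant's effect on political trust changes linearly
# over time, via a year-by-determinant interaction term in OLS.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "year": rng.integers(1998, 2020, n),
    "inst_satisfaction": rng.normal(0, 1, n),    # institutional determinant
    "interpersonal_trust": rng.normal(0, 1, n),  # cultural determinant
})
# Simulate a determinant whose effect drifts upward over the period.
df["trust"] = (0.4 * df["inst_satisfaction"]
               + 0.01 * (df["year"] - 1998) * df["inst_satisfaction"]
               + 0.2 * df["interpersonal_trust"]
               + rng.normal(0, 1, n))

# The interaction coefficient captures the linear trend in the effect.
m = smf.ols("trust ~ inst_satisfaction * I(year - 1998) + interpersonal_trust",
            data=df).fit()
print(m.summary().tables[1])
```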
238
Aplikace pro zpracování dat z oblasti genového inženýrství / Application for the Data Processing in the Area of Genome Engineering. Brychta, Jan. January 2008.
This master's thesis has several objectives. One of them is to introduce the problems of genome engineering, especially the fragmentation of DNA, the DNA macromolecule, the methods for purification and separation of nucleic acids, the enzymes used to modify these acids, and amplification, as well as cluster and gradient analysis. The next aim is to review the existing application and compare it with the design of the proposed application, which is the third aim. The last objective is the implementation and a report on how the application was tested with real data. The results are discussed, along with possibilities for further extension.
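A small illustrative example of cluster analysis on hypothetical fragment-length profiles is shown below; the data and linkage method are assumptions, whereas the thesis's application works with real laboratory data.

```python
# Hierarchical clustering of DNA fragment-length profiles with scipy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical samples x fragment-length bins (e.g., from gel electrophoresis).
profiles = np.array([
    [1.0, 0.2, 0.0, 0.6],
    [0.9, 0.3, 0.1, 0.5],
    [0.1, 0.8, 0.9, 0.0],
    [0.0, 0.9, 0.8, 0.1],
])
Z = linkage(pdist(profiles, metric="euclidean"), method="average")
clusters = fcluster(Z, t=2, criterion="maxclust")
print(clusters)  # e.g. [1 1 2 2]: two groups with similar fragmentation patterns
```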
239
Data Collection and Layout Analysis on Visually Rich Documents using Multi-Modular Deep Learning. Stahre, Mattias. January 2022.
The use of Deep Learning methods for Document Understanding has been embraced by the research community in recent years. A requirement for Deep Learning methods, and especially Transformer Networks, is access to large datasets. The objective of this thesis was to evaluate a state-of-the-art model for Document Layout Analysis on a public and a custom dataset. Additionally, the objective was to build a pipeline for creating a dataset specifically for Visually Rich Documents. The research methodology consisted of a literature study to find the state-of-the-art model for Document Layout Analysis and a relevant dataset for evaluating the chosen model. The literature study also included research on how existing datasets in the domain were collected and processed. Finally, an evaluation framework was created. The evaluation showed that the chosen multi-modal transformer network, LayoutLMv2, performed well on the DocBank dataset. The custom-built dataset was limited by class imbalance, although performance was good for the larger classes. The annotator tool and its auto-tagging feature performed well, and the proposed pipeline showed great promise for creating datasets of Visually Rich Documents. In conclusion, this thesis answers the research questions and suggests two main opportunities. The first is to encourage others to build datasets of Visually Rich Documents using a pipeline similar to the one presented in this paper. The second is to evaluate the possibility of creating the visual token information for LayoutLMv2 as part of the transformer network rather than using a separate CNN.
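A hedged sketch of running LayoutLMv2 for layout token classification via the HuggingFace transformers API is shown below. It requires torch and detectron2 to be installed; the checkpoint name, words, boxes, and label count are assumptions, and the thesis's own fine-tuned weights and label set would replace them.

```python
# Layout token classification with LayoutLMv2 (sketch; OCR words/boxes supplied
# manually, so the feature extractor's built-in OCR is disabled).
from PIL import Image
from transformers import (LayoutLMv2FeatureExtractor, LayoutLMv2Tokenizer,
                          LayoutLMv2Processor, LayoutLMv2ForTokenClassification)

feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
tokenizer = LayoutLMv2Tokenizer.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(feature_extractor, tokenizer)
model = LayoutLMv2ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv2-base-uncased", num_labels=5)  # e.g. title/paragraph/...

image = Image.open("page.png").convert("RGB")        # hypothetical document page
words = ["Quarterly", "Report", "Revenue", "grew"]   # assumed OCR output
boxes = [[60, 40, 210, 70], [220, 40, 330, 70],      # boxes normalized to 0-1000
         [60, 120, 190, 145], [200, 120, 280, 145]]

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)  # one layout label per token
print(predictions.shape)
```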
240
Residues in Succession U-Net for Fast and Efficient Segmentation. Sultana, Aqsa. 11 August 2022.
No description available.