31 |
Testing Fuzzy Extractors for Face Biometrics: Generating Deep Datasets. Tambay, Alain Alimou, 11 November 2020
Biometrics can provide an alternative to conventional authentication methods for security. Much research has been done in the field of biometrics, and efforts have been made to make biometric systems more usable in practice. The initial application for our work is a proof of concept for a system that would expedite some low-risk travellers’ arrival into the country while preserving the user’s privacy. This thesis focuses on the subset of problems related to the generation of cryptographic keys from noisy data, biometrics in our case.
This thesis was built in two parts. In the first, we implemented a key-generating quantization-based fuzzy extractor scheme for facial feature biometrics based on the work by Dodis et al. and Sutcu, Li, and Memon. This scheme was modified to increase user privacy, address some implementation-based issues, and incorporate testing-driven changes to tailor it towards its expected real-world usage. We show that our implementation does not significantly affect the scheme's performance, while providing additional protection against malicious actors who may gain access to the biometric information stored on a server.
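To make the quantization-with-helper-data idea behind such a scheme concrete, the following is a minimal sketch (not the thesis implementation; the step size, feature values, and function names are illustrative assumptions, and a real scheme adds error correction and the privacy protections discussed above):

```python
import hashlib
import numpy as np

def enroll(features: np.ndarray, delta: float = 4.0):
    """Quantize each feature and derive a key; return (key, helper_data).

    delta is the quantization step: a noisy re-reading that stays within
    +/- delta/2 of the enrolled value maps back to the same bin."""
    bins = np.round(features / delta).astype(int)
    helper = bins * delta - features          # per-feature offset to the bin centre
    key = hashlib.sha256(bins.tobytes()).digest()
    return key, helper

def reproduce(noisy_features: np.ndarray, helper: np.ndarray, delta: float = 4.0):
    """Recover the same key from a noisy reading using the public helper data."""
    bins = np.round((noisy_features + helper) / delta).astype(int)
    return hashlib.sha256(bins.tobytes()).digest()

# Usage: a reading perturbed by noise smaller than delta/2 reproduces the same key.
enrolled = np.array([12.3, -4.7, 30.1, 8.8])
key, helper = enroll(enrolled)
assert reproduce(enrolled + np.random.uniform(-1.5, 1.5, size=4), helper) == key
```

A reading that drifts by less than half a quantization step reproduces the same bin indices and therefore the same hashed key.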
The second part consists of the creation of a process to automate the generation of deep datasets suitable for testing similar schemes. With minimal work, the process produced a dataset larger than those available for free online, and showed that such datasets can be expanded further with little additional effort. This larger dataset allowed for the creation of more representative recognition challenges. We were able to show that our implementation performed similarly to other non-commercial schemes. Further refinement will be necessary if it is to be compared to commercial applications.
|
32 |
Visualization of spatio-temporal data in two dimensional space. Baskaran, Savitha, 15 November 2016
Indiana University-Purdue University Indianapolis (IUPUI) / Spatio-temporal data has become very popular in recent times, as a large number of datasets collect both location and temporal information in real time. The main challenge is that extracting useful insights from such large datasets is extremely complex and laborious. In this thesis, we propose a novel 2D technique to visualize spatio-temporal big data. Visualizing the combined interaction between spatial and temporal data is of high importance for uncovering insights and identifying trends within the data.
Maps have been a successful way to represent spatial information. Additionally, in this work, colors are used to represent the temporal data. Every data point has time information, which is converted into a corresponding color based on the HSV color model. Variation in time is represented by a transition from one color to another, providing smooth interpolation. The proposed solution helps the user quickly understand the data and gain insights.
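As a rough illustration of the time-to-color mapping described here (a minimal sketch under assumed conventions; the thesis does not specify this exact formula), a timestamp can be normalized over the observed time range and used as the hue component of an HSV color:

```python
import colorsys
from datetime import datetime

def time_to_rgb(t: datetime, t_min: datetime, t_max: datetime):
    """Map a timestamp to an RGB color by using its normalized position in
    [t_min, t_max] as the HSV hue, with full saturation and value."""
    span = (t_max - t_min).total_seconds() or 1.0
    hue = 0.8 * (t - t_min).total_seconds() / span   # scaled to 0..0.8 so the latest
                                                     # time does not wrap back to red
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

# Usage: points early in the range are drawn in red hues, later points toward purple.
start, end = datetime(2016, 1, 1), datetime(2016, 12, 31)
print(time_to_rgb(datetime(2016, 7, 1), start, end))
```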
|
33 |
Extracting Symptoms from Narrative Text using Artificial Intelligence. Gandhi, Priyanka, 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Electronic health records collect an enormous amount of data about patients. However, the information about the patient’s illness is stored in progress notes that are in an unstructured format. It is difficult for humans to annotate symptoms listed in the free text. Recently, researchers have explored how advances in deep learning can be applied to processing biomedical data. The information in the text can be extracted with the help of natural language processing. The research presented in this thesis aims at automating the process of symptom extraction. The proposed methods use pre-trained word embeddings such as BioWord2Vec, BERT, and BioBERT to generate word vectors based on the semantics and syntactic structure of sentences. BioWord2Vec embeddings are fed into a BiLSTM neural network with a CRF layer to capture the dependencies between correlated terms in the sentence. The pre-trained BERT and BioBERT embeddings are fed into the BERT model with a CRF layer to analyze the output tags of neighboring tokens. The research shows that, with the help of the CRF layer in neural network models, longer phrases of symptoms can be extracted from the text. The proposed models are compared with the UMLS MetaMap tool, which uses various sources to categorize terms in the text into different semantic types, and Stanford CoreNLP, a dependency parser that analyzes syntactic relations in the sentence to extract information. The performance of the models is analyzed by using strict, relaxed, and n-gram evaluation schemes. The results show that BioBERT with a CRF layer can extract the majority of the human-labeled symptoms. Furthermore, the model is used to extract symptoms from COVID-19 tweets. The model was able to extract symptoms listed by the CDC as well as new symptoms.
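As a rough sketch of the BiLSTM-with-CRF tagging setup described above (illustrative PyTorch code, not the thesis implementation; the embedding dimension, hidden size, and BIO tag set are assumptions, and the CRF layer is only indicated in the comments):

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """BiLSTM that turns pre-trained word embeddings into per-token tag scores
    (emissions); a CRF layer would sit on top of these emissions to model the
    dependencies between neighbouring BIO tags."""

    def __init__(self, emb_dim=200, hidden_dim=128, num_tags=3):  # B-SYMPTOM, I-SYMPTOM, O
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, embedded_sentences):            # (batch, seq_len, emb_dim)
        hidden, _ = self.lstm(embedded_sentences)      # (batch, seq_len, 2*hidden_dim)
        return self.emissions(hidden)                  # (batch, seq_len, num_tags)

# Usage with a batch of 4 sentences of 20 tokens, each token embedded as a 200-d
# vector (dimensions are assumptions; BioWord2Vec vectors would be looked up first).
model = BiLSTMTagger()
scores = model(torch.randn(4, 20, 200))
print(scores.shape)  # torch.Size([4, 20, 3])
```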
|
34 |
A New SCADA Dataset for Intrusion Detection System Research. Turnipseed, Ian P, 14 August 2015
Supervisory Control and Data Acquisition (SCADA) systems monitor and control industrial control systems in many industrial and economic sectors that are considered critical infrastructure. In the past, most SCADA systems were isolated from all other networks, but recently connections to corporate enterprise networks and the Internet have increased. Security concerns have arisen from this newfound connectivity. This thesis makes one primary contribution to researchers and industry: two datasets have been introduced to support intrusion detection system research for SCADA systems. The datasets include network traffic captured on a gas pipeline SCADA system in Mississippi State University’s SCADA lab. IDS researchers lack a common framework to train and test proposed algorithms. This leads to an inability to properly compare IDS presented in the literature and limits research progress. The datasets created for this thesis are available to aid researchers in assessing the performance of SCADA intrusion detection systems.
|
35 |
Cyberthreats, Attacks and Intrusion Detection in Supervisory Control and Data Acquisition Networks. Gao, Wei, 14 December 2013
Supervisory Control and Data Acquisition (SCADA) systems are computer-based process control systems that interconnect and monitor remote physical processes. There have been many real-world documented incidents and cyber-attacks affecting SCADA systems, which clearly illustrate critical infrastructure vulnerabilities. These reported incidents demonstrate that cyber-attacks against SCADA systems can produce a variety of financial damage and harmful events to humans and their environment. This dissertation documents four contributions towards increased security for SCADA systems. First, a set of cyber-attacks was developed. Second, each attack was executed against two fully functional SCADA systems in a laboratory environment: a gas pipeline and a water storage tank. Third, signature-based intrusion detection system rules were developed and tested which can be used to generate alerts when the aforementioned attacks are executed against a SCADA system. Fourth, a set of features was developed for a decision-tree-based anomaly intrusion detection system. The features were tested using the datasets developed for this work.
This dissertation documents cyber-attacks on both serial-based and Ethernet-based SCADA networks. Four categories of attacks against SCADA systems are discussed: reconnaissance, malicious response injection, malicious command injection, and denial of service. In order to evaluate the performance of data mining and machine learning algorithms for intrusion detection systems in SCADA systems, a network dataset to be used for benchmarking intrusion detection systems was generated. This network dataset includes different classes of attacks that simulate different attack scenarios on process control systems. This dissertation describes four SCADA network intrusion detection datasets: a full and an abbreviated dataset for both the gas pipeline and water storage tank systems. Each feature in the dataset is captured from network flow records. The dataset groups two different categories of features that can be used as input to an intrusion detection system. First, network traffic features describe the communication patterns in a SCADA system. This research developed both a signature-based IDS and an anomaly-based IDS for the gas pipeline and water storage tank serial-based SCADA systems. The performance of both types of IDS was evaluated by measuring the detection rate and the prevalence of false positives.
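To illustrate how flow-record features of this kind could drive a decision-tree anomaly detector, and how detection rate and false positive prevalence are computed, here is a minimal sketch (the file name, column names, and labels are assumptions, not the published dataset format):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Hypothetical CSV of network-flow features with a per-record attack label.
df = pd.read_csv("gas_pipeline_flows.csv")            # assumed file name
X = df.drop(columns=["label"])                        # network traffic features
y = (df["label"] != "normal").astype(int)             # 1 = attack, 0 = normal

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(max_depth=10, random_state=0).fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
print("detection rate:", tp / (tp + fn))              # attacks correctly flagged
print("false positive rate:", fp / (fp + tn))         # normal traffic wrongly flagged
```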
|
36 |
Analysis and Comparison of a Detailed Land Cover Dataset versus the National Land Cover Dataset (NLCD) in Blacksburg, Virginia. White, Claire McKenzie, 19 January 2012
While many studies have completed accuracy assessments of the National Land Cover Dataset (NLCD), little research has utilized a detailed digitized land cover dataset, like that available for the Town of Blacksburg, for this comparison. This study aims to evaluate the information available from a detailed land cover dataset and compare it with the National Land Cover Dataset (NLCD) at a localized scale. More specifically, it utilizes the detailed land cover dataset for the Town of Blacksburg to analyze the land cover distribution for varying land uses, including single-family residential, multi-family residential, and non-residential. In addition, an application scenario assigns an area-weighted curve number to watersheds based on each land cover dataset. This study demonstrates the importance of obtaining detailed land cover datasets for cities and towns. Furthermore, it shows the comprehensive information and subsequent quantifications that can be derived from a detailed land cover dataset. / Master of Science
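The area-weighted curve number assignment used in the application scenario can be sketched as follows (the land cover classes, areas, and curve numbers below are illustrative assumptions, not values from the study):

```python
def area_weighted_curve_number(cover_areas, curve_numbers):
    """Composite curve number for a watershed: the sum of each land cover
    class's area times its curve number, divided by the total area."""
    total_area = sum(cover_areas.values())
    weighted = sum(area * curve_numbers[cover] for cover, area in cover_areas.items())
    return weighted / total_area

# Example: areas in hectares per land cover class within one watershed,
# with illustrative SCS curve numbers for each class.
areas = {"impervious": 40.0, "lawn": 25.0, "forest": 35.0}
cns = {"impervious": 98, "lawn": 61, "forest": 55}
print(round(area_weighted_curve_number(areas, cns), 1))  # 73.7
```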
|
37 |
Integrating Multiple Deep Learning Models for Disaster Description in Low-Altitude Videos. Wang, Haili, 12 1900
Computer vision technologies are rapidly improving and becoming more important in disaster response. The majority of disaster description techniques now focus either on identifying objects or on categorizing disasters. In this study, we trained multiple deep neural networks on low-altitude imagery with highly imbalanced and noisy labels. We utilize labeled images from the LADI dataset to formulate a solution to the general problem of disaster classification and object detection. Our research integrated and developed multiple deep learning models that perform both the object detection task and the disaster scene classification task. Our solution is competitive in the TRECVID Disaster Scene Description and Indexing (DSDI) task, demonstrating that it is comparable to other suggested approaches in retrieving disaster-related video clips.
|
38 |
Analys av prediktiv precision av maskininlärningsalgoritmer (Analysis of the predictive precision of machine learning algorithms). Remgård, Jonas, January 2017
In recent years, machine learning has become a popular subject. A question many users face is how much training data is needed to obtain as accurate a result as possible. This study investigates the relationship between the training data, in amount as well as structure, and how well the algorithm performs. Four different datasets (Iris, Digits, Symmetry, and Double symmetry) were studied using three different algorithms (Support Vector Classifier, K-Nearest Neighbor, and Decision Tree Classifier). The study establishes that all three algorithms perform better with more training data up to a certain limit, and that this limit differs for each algorithm. The structure of the data instances also affects the algorithms' performance, with double symmetry giving stronger performance than simple symmetry.
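A minimal sketch of this kind of experiment with scikit-learn (illustrative only; the thesis's synthetic Symmetry and Double symmetry datasets are not reproduced here, and the training-size grid and cross-validation settings are assumptions):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)   # the Iris dataset can be swapped in the same way
for name, clf in [("SVC", SVC()),
                  ("KNN", KNeighborsClassifier()),
                  ("DecisionTree", DecisionTreeClassifier(random_state=0))]:
    sizes, _, test_scores = learning_curve(
        clf, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
    # Mean cross-validated accuracy at each training-set size shows where
    # additional data stops improving the classifier.
    print(name, dict(zip(sizes, test_scores.mean(axis=1).round(3))))
```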
|
39 |
Hierarchical Bayesian Dataset Selection. Zhou, Xiaona, 05 1900
Despite the profound impact of deep learning across various domains, supervised model training critically depends on access to large, high-quality datasets, which are often challenging to identify. To address this, we introduce Hierarchical Bayesian Dataset Selection (HBDS), the first dataset selection algorithm that utilizes hierarchical Bayesian modeling, designed for collaborative data-sharing ecosystems. The proposed method efficiently decomposes the contributions of dataset groups and individual datasets to local model performance using Bayesian updates with small data samples. Our experiments on two benchmark datasets demonstrate that HBDS not only offers a computationally lightweight solution but also enhances interpretability compared to existing data selection methods, by revealing deep insights into dataset interrelationships through learned posterior distributions. HBDS outperforms traditional non-hierarchical methods by correctly identifying all relevant datasets, achieving optimal accuracy with fewer computational steps, even when initial model accuracy is low. Specifically, HBDS surpasses its non-hierarchical counterpart by 1.8% on DIGIT-FIVE and 0.7% on DOMAINNET, on average. In settings with limited resources, HBDS achieves a 6.9% higher accuracy than its non-hierarchical counterpart. These results confirm HBDS's effectiveness in identifying datasets that improve the accuracy and efficiency of deep learning models when collaborative data utilization is essential. / Master of Science / Deep learning technologies have revolutionized many domains and applications, from voice recognition in smartphones to automated recommendations on streaming services. However, the success of these technologies heavily relies on having access to large and high-quality datasets. In many cases, selecting the right datasets can be a daunting challenge. To tackle this, we have developed a new method that can quickly figure out which datasets or groups of datasets contribute most to improving the performance of a model with only a small amount of data needed. Our tests prove that this method is not only effective and light on computation but also helps us understand better how different datasets relate to each other.
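The abstract does not spell out the HBDS update rule, but the general idea of scoring dataset groups and individual datasets with Bayesian updates from small samples can be sketched roughly as follows (a toy Beta-Bernoulli approximation with a hand-picked pooling weight and hypothetical dataset names, not the HBDS algorithm itself):

```python
from collections import defaultdict

class HierarchicalBetaScores:
    """Toy hierarchical scoring: each dataset keeps Beta-style counts of whether
    small samples drawn from it improved local validation accuracy, and each
    group's pooled counts act as a shared prior for its member datasets."""

    def __init__(self, prior=(1.0, 1.0)):
        self.prior = prior
        self.dataset_counts = defaultdict(lambda: [0.0, 0.0])   # successes, failures
        self.group_counts = defaultdict(lambda: [0.0, 0.0])

    def update(self, group, dataset, improved: bool):
        idx = 0 if improved else 1
        self.dataset_counts[(group, dataset)][idx] += 1
        self.group_counts[group][idx] += 1

    def posterior_mean(self, group, dataset):
        a0, b0 = self.prior
        ga, gb = self.group_counts[group]        # group evidence shrinks the estimate
        da, db = self.dataset_counts[(group, dataset)]
        alpha = a0 + 0.5 * ga + da               # 0.5 is an arbitrary pooling weight
        beta = b0 + 0.5 * gb + db
        return alpha / (alpha + beta)

scores = HierarchicalBetaScores()
scores.update("digits", "mnist_m", improved=True)
scores.update("digits", "svhn", improved=False)
print(round(scores.posterior_mean("digits", "mnist_m"), 3))  # 0.625
```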
|
40 |
Detekce chodců ve snímku pomocí metod strojového učení / Pedestrians Detection in Traffic Environment by Machine Learning. Tilgner, Martin, January 2019
This thesis deals with pedestrian detection using convolutional neural networks from the perspective of an autonomous vehicle, in particular by testing them in order to identify good practice for building datasets for machine learning models. In total, ten machine learning models were trained, based on the Faster R-CNN meta-architecture with ResNet 101 as the feature extractor and on SSDLite with a MobileNet_v2 feature extractor. These models were trained on datasets of different sizes; the best results were achieved on a dataset of 5000 images. In addition to these models, a new dataset focusing on pedestrians at night was created, along with a library of Python functions for working with datasets and for automatic dataset creation.
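As a rough illustration of running a detector of this family on a single frame (a sketch using torchvision's pretrained Faster R-CNN with a ResNet-50 FPN backbone as a stand-in; the thesis used ResNet 101 and SSDLite/MobileNet_v2 models, and the weights argument assumes a recent torchvision):

```python
import torch
import torchvision

# Stand-in model: torchvision ships a Faster R-CNN with a ResNet-50 FPN backbone.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = torch.rand(3, 480, 640)                 # placeholder for a camera frame in [0, 1]
with torch.no_grad():
    output = model([image])[0]                  # dict with boxes, labels, scores

# Keep confident detections of the COCO "person" class (label 1) as pedestrians.
keep = (output["labels"] == 1) & (output["scores"] > 0.5)
print(output["boxes"][keep])
```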
|