101 |
Deep Learning for Code Generation using Snippet Level Parallel Data. Jain, Aneesh. 05 January 2023.
In the last few years, interest in applying deep learning methods to software engineering tasks has surged. A variety of approaches, including transformer-based methods, statistical machine translation models, and models inspired by natural language settings, have been proposed and shown to be effective at tasks like code summarization, code synthesis, and code translation. Multiple benchmark data sets have also been released, but all suffer from one limitation or another. Some data sets support only a select few programming languages, while others support only certain tasks. These limitations restrict researchers' ability to perform thorough analyses of their proposed methods. In this work we aim to alleviate some of the limitations faced by researchers who apply deep learning to software engineering tasks. We introduce a large, parallel, multi-lingual programming language data set that supports code summarization, code translation, code synthesis, and code search in 7 different languages. We provide benchmark results for current state-of-the-art models on all these tasks, and we explore some limitations of current evaluation metrics for code-related tasks. We provide a detailed analysis of the compilability of code generated by deep learning models, because compilability is a better measure of the usability of code than scores like BLEU and CodeBLEU. Motivated by our findings about compilability, we also propose a reinforcement learning based method that incorporates code compilability and syntax-level feedback as rewards, and we demonstrate its effectiveness in generating code with fewer syntax errors than baselines. In addition, we develop a web portal that hosts the models we have trained for code translation. The portal supports translation between 42 possible language pairs and also allows users to check the compilability of the generated code. The intent of this website is to give researchers and other audiences a chance to interact with and probe our work in a user-friendly way, without requiring them to write their own code to load the models and run inference. / Master of Science / Deep neural networks have become ubiquitous and find applications in almost every technology and service we use today. In recent years, researchers have also started applying neural network based methods to problems in the software engineering domain. Software engineering by its nature requires a lot of documentation, and automatically generating this natural language documentation from programs given as input to neural networks has been one of their first applications in this domain. Other applications include translating code between programming languages and searching for code using natural language, as one does on websites like Stack Overflow. All of these tasks now have the potential to be powered by deep neural networks. It is common knowledge that neural networks are data hungry, and in this work we present a large data set containing code in multiple programming languages: Java, C++, Python, C#, JavaScript, PHP, and C. Our data set is intended to foster more research into automating software engineering tasks using neural networks. We provide an analysis of the performance of multiple state-of-the-art models on our data set in terms of compilability, which measures the number of syntax errors in the code, as well as other metrics.
In addition, we propose our own deep neural network based model for code translation, which uses feedback from programming language compilers to reduce the number of syntax errors in the generated code. We also develop and present a website where some of our code translation models are hosted. The website allows users to interact with our work without any knowledge of deep learning and to get a sense of how these technologies are being applied to software engineering tasks.
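To make the compilability-reward idea concrete, the following is a minimal sketch of a syntax-level reward function, assuming Python as the generated language and a simple parse-based scoring scheme; the thesis's actual reward design and target languages are not detailed in the abstract, so both choices are assumptions.

```python
import ast

def compilability_reward(code: str) -> float:
    """Reward 1.0 if the generated snippet parses cleanly, otherwise a
    penalty scaled by how early the first syntax error occurs. This is an
    illustrative stand-in for the compiler feedback described above, not
    the thesis's actual reward function."""
    try:
        ast.parse(code)  # syntax check only; the code is never executed
        return 1.0
    except SyntaxError as err:
        # Partial credit: snippets that fail later are "closer" to compiling.
        frac_ok = (err.lineno or 1) / max(code.count("\n") + 1, 1)
        return -1.0 + frac_ok

# Example: a well-formed snippet vs. one missing a colon
print(compilability_reward("def add(a, b):\n    return a + b"))  # 1.0
print(compilability_reward("def add(a, b)\n    return a + b"))   # negative
```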
|
102 |
WiSDM: a platform for crowd-sourced data acquisition, analytics, and synthetic data generation. Choudhury, Ananya. 15 August 2016.
Human behavior is a key factor influencing the spread of infectious diseases. Individuals adapt their daily routine and typical behavior during the course of an epidemic -- the adaptation is based on their perception of the risk of contracting the disease and of its impact. As a result, it is desirable to collect behavioral data before and during a disease outbreak. Such data can help in creating better computer models that can, in turn, be used by epidemiologists and policy makers to better plan and respond to infectious disease outbreaks. However, traditional data collection methods are not well suited to the task of acquiring information about human behavior, especially as it pertains to epidemic planning and response.
Internet-based methods are an attractive complementary mechanism for collecting behavioral information. Systems such as Amazon Mechanical Turk (MTurk) and online survey tools provide simple ways to collect such information. This thesis explores new methods that leverage this recent technology for information acquisition, especially the acquisition of behavioral information.
Here, we present the design and implementation of a crowd-sourced surveillance data acquisition system -- WiSDM. WiSDM is a web-based application and can be used by anyone with access to the Internet and a browser. Furthermore, it is designed to leverage online survey tools and MTurk; WiSDM can be embedded within MTurk in an iFrame. WiSDM has a number of novel features, including: (i) support for a model-based abductive reasoning loop, a flexible and adaptive information acquisition scheme driven by causal models of epidemic processes; (ii) question routing, an important feature for increasing data acquisition efficacy and reducing survey fatigue; and (iii) integrated surveys, interactive surveys that provide additional information on the survey topic and improve user motivation.
We evaluate the framework's performance using Apache JMeter and present our results. We also discuss three extensions of WiSDM: the API Adapter, the Synthetic Data Generator, and WiSDM Analytics. The API Adapter is an ETL extension of WiSDM that enables extracting data from disparate data sources and loading it into the WiSDM database. The Synthetic Data Generator allows epidemiologists to build synthetic survey data using NDSSL's Synthetic Population as agents. WiSDM Analytics empowers users to perform analysis on the data by writing simple Python code using Versa APIs. We also propose a data model that is conducive to survey data analysis. / Master of Science
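As an illustration of the question-routing feature, here is a hypothetical sketch of answer-dependent routing; the survey content, the `depends_on` rule format, and the `route` function are invented for illustration and are not WiSDM's actual implementation.

```python
# Hypothetical routing rules: a follow-up question is asked only when an
# earlier answer makes it relevant, reducing survey fatigue.
SURVEY = [
    {"id": "q1", "text": "Did anyone in your household have flu-like symptoms this week?"},
    {"id": "q2", "text": "Did they visit a doctor?", "depends_on": ("q1", "yes")},
    {"id": "q3", "text": "Did you change your daily routine?"},
]

def route(answers: dict) -> list:
    """Return the questions still relevant given the answers collected so far."""
    pending = []
    for q in SURVEY:
        if q["id"] in answers:
            continue  # already answered
        dep = q.get("depends_on")
        if dep and answers.get(dep[0]) != dep[1]:
            continue  # prerequisite answer absent or different: skip
        pending.append(q)
    return pending

print([q["id"] for q in route({"q1": "no"})])  # ['q3'] -- q2 is skipped
```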
|
103 |
Topographic Effects in Strong Ground Motion. Rai, Manisha. 14 September 2015.
Ground motions from earthquakes are known to be affected by the Earth's surface topography. Topographic effects result from several physical phenomena, such as the focusing or defocusing of seismic waves reflected from a topographic feature and the interference between direct and diffracted seismic waves. This typically causes an amplification of ground motion on convex features, such as hills and ridges, and a de-amplification on concave features, such as valleys and canyons. Topographic effects are known to be frequency dependent, and the spectral accelerations can sometimes reach high values, causing significant damage to structures located on the feature. Topographically correlated damage patterns have been observed in several earthquakes, and topographic amplification has also been observed in several recorded ground motions. The phenomenon has also been extensively studied through numerical analyses. Even though different studies agree on the nature of topographic effects, quantifying these effects has been challenging, and the current literature has no consensus on how to predict topographic effects at a site. With population centers growing around regions of high seismicity and prominent topographic relief, such as California and Japan, quantitative estimation of these effects has become very important. In this dissertation, we address this shortcoming by developing empirical models that predict topographic effects at a site. These models are developed through an extensive empirical study of recorded ground motions from two large strong-motion datasets, namely the California small-to-medium-magnitude earthquake dataset and the global NGA-West2 dataset, and we propose topographic modification factors that quantify the expected amplification or de-amplification at a site.
To develop these models, we required a parameterization of topography. We developed two types of topographic parameters at each recording station. The first type is derived from the elevation data around the station and comprises parameters such as smoothed slope, smoothed curvature, and relative elevation. The second type is derived from a series of simplified 2D numerical analyses. These analyses compute an estimate of the expected 2D topographic amplification of a simple wave at a site in several different directions, and these 2D amplifications are used to develop a family of parameters at each site. We study the trends in the ground motion model residuals with respect to these topographic parameters to determine whether the parameters can capture topographic effects in the recorded data. We use statistical tests to determine whether the trends are significant, and we perform mixed-effects regression on the residuals to develop functional forms that can be used to predict topographic effects at a site. Finally, we compare the two types of parameters and their topographic predictive power. / Ph. D.
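For intuition, here is a minimal sketch of the first type of parameter, computed from a gridded elevation model with NumPy; the smoothing window, the box filter, and the Laplacian curvature proxy are assumptions for illustration, not the dissertation's exact definitions.

```python
import numpy as np

def topographic_params(dem: np.ndarray, cell: float, window: int = 5):
    """Illustrative versions of the elevation-based parameters described
    above: smoothed slope, smoothed curvature, and relative elevation.

    dem    -- 2D grid of elevations (m)
    cell   -- grid spacing (m)
    window -- smoothing window size in cells (assumed value)
    """
    # Box-filter smoothing of the elevation grid
    kernel = np.ones((window, window)) / window**2
    pad = window // 2
    padded = np.pad(dem, pad, mode="edge")
    smooth = np.zeros_like(dem, dtype=float)
    for i in range(dem.shape[0]):
        for j in range(dem.shape[1]):
            smooth[i, j] = (padded[i:i + window, j:j + window] * kernel).sum()

    gy, gx = np.gradient(smooth, cell)   # first derivatives (rows, cols)
    slope = np.degrees(np.arctan(np.hypot(gx, gy)))
    gyy, _ = np.gradient(gy, cell)       # second derivatives
    _, gxx = np.gradient(gx, cell)
    curvature = gxx + gyy                # Laplacian as a curvature proxy
    rel_elev = dem - smooth              # height above the local mean surface
    return slope, curvature, rel_elev
```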
|
104 |
LiDAR-Based Quantification of Indiana Lake Michigan Shoreline Changes. Tasmiah Ahsan (12503458). 18 April 2024.
<p dir="ltr">Recent high-water levels in Lake Michigan caused extensive shoreline changes along the Indiana coastline. To evaluate recent shoreline changes of the Indiana coastline along Lake Michigan, topographic LiDAR surveys available for the years 2008, 2012, 2013, 2018, 2020, and 2022 were analyzed. This study included LiDAR data of over 400 cross-shore transects, generated at 100 m spacing. Beach profiles were generated to detect the shoreline position and quantify beach width and nearshore volume change. The analysis revealed accretion of both shoreline and beach width from 2008 to 2013 during a low water level period. The beach was rebuilt with a median increased value of 4 m. On the contrary, the shoreline eroded during increasing and high-water periods. Both shoreline and beach width receded with median values of 41 m and 32 m respectively during the period of water level increase from 2013 to 2020. Consequently, the beach profiles lost a median sand volume of 21.6 m<sup>3</sup>/m. Overall, the Indiana shoreline moved with a median of 18 m landward from 2008 to 2022. However, there was a large amount of spatial variability in the shoreline changes. The shoreline movement varied spatially between 63 m recession to 29 m accretion. Similarly, beach profiles showed a loss of median sand volume of 10 m<sup>3</sup>/m. The volume change ranged from 918 m<sup>3</sup>/m loss to 296 m<sup>3</sup>/m accumulation varying spatially along the shoreline. The largest sand loss was experienced at the downdrift of Michigan city harbor near Mt. Baldy. In addition to the spatial variation, the recession also varied slightly with shoreline type. The natural and hardened beaches were mostly recessional. The recession along the hardened shoreline was influenced by the timing of construction and its proximity to inland areas. Buffered beaches, characterized by a swath of vegetation or dunes, experienced the least erosion.</p>
|
105 |
A Comparison of SVM Classifiers with Embedded Feature Selection. Johansson, Adam; Mattsson, Anton. January 2024.
Since their introduction in 1995, Support Vector Machines (SVMs) have become a widely employed machine learning model for binary classification, owing to their explainable architecture, efficient forward inference, and good ability to generalize. A common desire, not only for SVMs but for machine learning classifiers in general, is to have the model perform feature selection, using only a limited subset of the available attributes in its predictions. Various alterations to the SVM problem formulation exist that address this, and in this report we compare a range of such SVM models. We compare accuracy and the selected features across models on different datasets, both real and synthetic, and we also investigate the impact of dataset size on both quantities. Our conclusion is that models trained to classify samples based on a smaller subset of features tend to perform at a level comparable to dense models, with a particular advantage when the dataset is small. Furthermore, as the training dataset grows, the number of selected features also increases, giving a more complex classifier when a larger data supply is available.
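One representative embedded-selection formulation is the L1-penalized linear SVM, whose sparsity-inducing penalty drives irrelevant weights to exactly zero during training. The sketch below uses scikit-learn and synthetic data; it is only one of the model variants a comparison like this might include, not necessarily one studied in the report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data: only 5 of 50 features are informative
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

# The l1 penalty zeroes out irrelevant weights -> embedded feature selection
sparse_svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(X, y)
dense_svm = LinearSVC(penalty="l2", dual=False, C=0.1, max_iter=5000).fit(X, y)

n_selected = int(np.sum(sparse_svm.coef_ != 0))
print(f"l1-SVM uses {n_selected} of {X.shape[1]} features")
print(f"train accuracy: l1 {sparse_svm.score(X, y):.3f}, l2 {dense_svm.score(X, y):.3f}")
```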
|
106 |
Fundus-DeepNet: Multi-Label Deep Learning Classification System for Enhanced Detection of Multiple Ocular Diseases through Data Fusion of Fundus Images. Al-Fahdawi, S.; Al-Waisy, A.S.; Zeebaree, D.Q.; Qahwaji, Rami S.R.; Natiq, H.; Mohammed, M.A.; Nedoma, J.; Martinek, R.; Deveci, M. 29 September 2023.
Detecting multiple ocular diseases in fundus images is crucial in ophthalmic diagnosis. This study introduces Fundus-DeepNet, an automated multi-label deep learning classification system designed to identify multiple ocular diseases by integrating feature representations from pairs of fundus images (e.g., left and right eyes). The study begins with a comprehensive image pre-processing procedure, including circular border cropping, image resizing, contrast enhancement, noise removal, and data augmentation. Subsequently, discriminative deep feature representations are extracted using multiple deep learning blocks, namely the High-Resolution Network (HRNet) and an Attention Block, which serve as feature descriptors. An SENet Block is then applied to further enhance the quality and robustness of the feature representations from a pair of fundus images, ultimately consolidating them into a single feature representation. Finally, a Discriminative Restricted Boltzmann Machine (DRBM) is employed as the classification model; by incorporating a Softmax layer, the DRBM generates a probability distribution that identifies eight different ocular diseases. Extensive experiments were conducted on the challenging Ophthalmic Image Analysis-Ocular Disease Intelligent Recognition (OIA-ODIR) dataset, comprising diverse fundus images depicting eight different ocular diseases. The Fundus-DeepNet system achieved F1-scores, Kappa scores, AUC, and final scores of 88.56%, 88.92%, 99.76%, and 92.41% on the off-site test set, and 89.13%, 88.98%, 99.86%, and 92.66% on the on-site test set. In summary, the Fundus-DeepNet system exhibits outstanding proficiency in accurately detecting multiple ocular diseases, offering a promising solution for early diagnosis and treatment in ophthalmology. / Funded by the European Union under the REFRESH (Research Excellence for Region Sustainability and High-tech Industries) project, number CZ.10.03.01/00/22_003/0000048, via the Operational Program Just Transition, and by the Ministry of Education, Youth, and Sports of the Czech Republic - Technical University of Ostrava, Czechia, under Grants SP2023/039 and SP2023/042.
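For readers unfamiliar with the SENet component, a generic Squeeze-and-Excitation block in PyTorch looks roughly like the following; the channel count and reduction ratio are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block of the kind the pipeline above uses to
    re-weight feature channels; a generic sketch, not the paper's layer."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # excite
            nn.Sigmoid(),                                # per-channel gate in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # global average pool -> (b, c)
        return x * w.view(b, c, 1, 1)     # rescale each channel

# Example: gate a batch of fundus feature maps
features = torch.randn(2, 64, 28, 28)
print(SEBlock(64)(features).shape)        # torch.Size([2, 64, 28, 28])
```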
|
107 |
Počítání vozidel v statickém obraze / Counting Vehicles in Static Images. Zemánek, Ondřej. January 2020.
This thesis focuses on the problem of counting vehicles in static images without knowledge of the geometric properties of the scene. As part of the solution, five convolutional neural network architectures were implemented and trained. A large dataset was also collected, containing 19,310 images captured from 12 viewpoints and covering 7 different scenes. The convolutional networks map an input sample to a vehicle density map, from which the vehicle count and localization in the context of the input image can be obtained. The main contribution of this work is the comparison and application of the current best solutions for counting objects in images. Most of these architectures were designed for counting people in images, so they had to be adapted for counting vehicles in static images. The trained models are evaluated with the GAME metric on the TRANCOS dataset and on a large combined dataset. The results achieved by all models are then described and compared.
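For reference, the GAME (Grid Average Mean absolute Error) metric used in the evaluation can be sketched as follows for a single image: the image is split into 4^L grid cells and per-cell count errors are summed, so mislocalized counts are penalized as well as wrong totals. The example data below is invented for illustration.

```python
import numpy as np

def game(pred_density: np.ndarray, true_density: np.ndarray, L: int) -> float:
    """Per-image GAME at level L: sum of absolute count errors over a
    2**L x 2**L grid of non-overlapping cells. At L=0 this reduces to the
    absolute count error for the whole image."""
    n = 2 ** L
    h, w = pred_density.shape
    err = 0.0
    for i in range(n):
        for j in range(n):
            cell = (slice(i * h // n, (i + 1) * h // n),
                    slice(j * w // n, (j + 1) * w // n))
            err += abs(pred_density[cell].sum() - true_density[cell].sum())
    return err

# Example: a noisy predicted density map vs. one annotated vehicle
pred = np.random.rand(240, 320) * 0.001
true = np.zeros((240, 320)); true[100, 150] = 1.0
print(game(pred, true, L=0), game(pred, true, L=2))
```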
|
108 |
Detection of facade cracks using deep learning. Eriksson, Linus. January 2020.
Facade cracks are a common problem in the north of Sweden, where shifting temperatures create frost in facades and ultimately damage them, often in the form of cracks. To fix these cracks, workers must visually inspect the facades to find them, which is a difficult and time-consuming task. This project explores the possibility of creating an algorithm that can classify cracks on facades with the help of deep learning models. The idea is that, in the future, such an algorithm could be deployed on a drone that hovers around buildings, filming the facade and reporting back any damage. The work in this project is exploratory: convolutional neural networks are investigated, as is the possibility of simulating training data to compensate for the lack of real-world data. The experimental work led to some interesting conclusions for further work. Results on the relatively small amount of data used in this project point towards the viability of using simulated data as a complement to real data, and of using convolutional neural networks to classify facades for crack recognition. The data and conclusions collected in this report can serve as preparatory work for a working prototype algorithm.
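As a sketch of the classification approach explored here, a small binary convolutional network in PyTorch could look like the following; the architecture, layer sizes, and patch size are assumptions for illustration, since the report does not specify them.

```python
import torch
import torch.nn as nn

# A deliberately small binary CNN of the kind such a project might explore;
# the layer configuration is illustrative, not the report's actual model.
crack_classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 1),                      # logit: crack vs. no crack
)

patch = torch.randn(8, 3, 128, 128)        # batch of facade image patches
prob_crack = torch.sigmoid(crack_classifier(patch)).squeeze(1)
print(prob_crack.shape)                    # torch.Size([8])
```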
|
109 |
Enhancing Long-Term Human Motion Forecasting using Quantization-based Modelling: Integrating Attention and Correlation for 3D Motion Prediction / Förbättring av långsiktig prognostisering av mänsklig rörelse genom kvantiseringsbaserad modellering: Integrering av uppmärksamhet och korrelation för 3D-rörelseförutsägelse. González Gudiño, Luis. January 2023.
This thesis addresses the limitations of existing human motion prediction models by extending the prediction horizon to very long-term forecasts. The objective is to develop a model that achieves one of the best stable prediction horizons in the field, providing accurate predictions without significant error growth over time. Through the use of quantization-based models, our research achieves this objective as measured by a proposed aligned version of the Mean Per Joint Position Error. The first of the two proposed models, an attention-based Vector Quantized Variational AutoEncoder (VQ-VAE), demonstrates good performance in predicting beyond conventional time boundaries, maintaining low error rates as the prediction horizon extends. While slight discrepancies in joint positions are observed, the model effectively captures the underlying patterns and dynamics of human motion, which keeps it highly applicable in real-world scenarios. Furthermore, our investigation into a correlation-based VQ-VAE, as an alternative to the attention-based one, highlights the challenges of capturing complex relationships and meaningful patterns within the data: its tendency to predict flat outputs emphasizes the need for further exploration and innovative approaches to improve its performance. Overall, this thesis contributes to the field of human motion prediction by extending the prediction horizon and providing insights into model performance and limitations. The developed model introduces a novel option for long-term prediction applications across various domains and lays the foundation for future research to enhance performance in long-term scenarios.
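The core of any VQ-VAE, including the variants above, is the codebook lookup that snaps encoder outputs to discrete latent codes. The following generic PyTorch sketch shows that quantization step with a straight-through gradient; it is not the thesis's exact model, and the latent and codebook sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def quantize(z_e: torch.Tensor, codebook: torch.Tensor):
    """Nearest-neighbour codebook lookup at the heart of a VQ-VAE.

    z_e      -- encoder outputs, shape (batch, dim)
    codebook -- embedding table, shape (num_codes, dim)
    """
    dist = torch.cdist(z_e, codebook)            # pairwise distances
    idx = dist.argmin(dim=1)                     # index of nearest code
    z_q = codebook[idx]                          # quantized latents
    z_q = z_e + (z_q - z_e).detach()             # straight-through gradient
    commit_loss = F.mse_loss(z_q.detach(), z_e)  # commitment term
    return z_q, idx, commit_loss

z_e = torch.randn(16, 64)                # e.g., pose-sequence latents
codebook = torch.randn(512, 64)
z_q, idx, loss = quantize(z_e, codebook)
print(z_q.shape, idx.shape, loss.item())
```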
|
110 |
Image-classification for Brain Tumor using Pre-trained Convolutional Neural Network / Bildklassificering för hjärntumör med hjälp av förtränat konvolutionellt neuralt nätverk. Osman, Ahmad; Alsabbagh, Bushra. January 2023.
A brain tumor is a disease characterized by uncontrolled growth of abnormal cells in the brain. The brain is responsible for regulating the functions of all other organs; hence, any atypical growth of cells in the brain can have severe implications for its functions. Global mortality from brain cancer in 2020 was estimated at 251,329 deaths. Early detection of brain cancer is therefore critical for prompt treatment and for improving patients' quality of life as well as survival rates. Manual classification of medical images for diagnosing diseases has been shown to be extremely time-consuming and labor-intensive. Convolutional Neural Networks (CNNs) have proven to be a leading algorithm in image classification, outperforming humans. This paper compares five CNN architectures, namely VGG-16, VGG-19, AlexNet, EfficientNetB7, and ResNet-50, in terms of performance and accuracy using transfer learning. In addition, the authors discuss the economic impact of CNNs, as an AI approach, on the healthcare sector. The models' performance is demonstrated using loss and accuracy curves as well as confusion matrices. The conducted experiment showed VGG-19 achieving the best performance with 97% accuracy, while EfficientNetB7 achieved the worst performance with 93% accuracy.
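A minimal PyTorch sketch of the transfer-learning setup described above, using a pre-trained VGG-19 with a new classification head; the number of tumor classes (4) and the frozen convolutional base are assumptions for illustration, not the paper's exact training configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse VGG-19 features pre-trained on ImageNet; retrain only a new head.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
for p in vgg.features.parameters():
    p.requires_grad = False               # freeze the convolutional base

vgg.classifier[6] = nn.Linear(4096, 4)    # replace the final 1000-way layer

scans = torch.randn(2, 3, 224, 224)       # batch of preprocessed MRI slices
print(vgg(scans).shape)                   # torch.Size([2, 4])
```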
|