111 |
State of Charge and Range Estimation of Lithium-ion Batteries in Electric Vehicles
Khanum, Fauzia. January 2021
Switching from fossil-fuel-powered vehicles to electric vehicles has become an international focus in the effort to combat climate change. Nevertheless, the adoption of electric vehicles has been slow, in part due to range anxiety. One way to mitigate range anxiety is to provide more accurate state of charge (SOC) and range estimation. SOC estimation of lithium-ion batteries for electric vehicle applications is a well-researched topic, yet few tools and little code exist online for researchers and students. To that end, a publicly available Kalman filter-based SOC estimation function is presented. The MATLAB function uses a second-order resistor-capacitor equivalent circuit model. It requires the SOC-OCV (open circuit voltage) curve, the internal resistance, and the equivalent circuit model battery parameters. Users can choose an extended Kalman filter (EKF) or adaptive extended Kalman filter (AEKF) algorithm and can supply temperature-dependent battery data. A practical example is illustrated using the LA92 driving cycle of a Turnigy battery at multiple temperatures ranging from -10 °C to 40 °C.
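As a rough illustration of the second-order resistor-capacitor equivalent circuit model mentioned above, the following Python sketch simulates terminal voltage from a current profile. It is not the thesis's MATLAB function and contains no Kalman filter; it only shows the kind of battery model such a filter would build on. The linear SOC-OCV curve, the function names, and all parameter values are hypothetical placeholders.

```python
import math


def ocv_from_soc(soc):
    """Hypothetical linear SOC-OCV curve in volts (a real curve comes from cell tests)."""
    return 3.0 + 1.2 * soc


def simulate_terminal_voltage(currents, dt, soc0, capacity_ah, r0, r1, c1, r2, c2):
    """Simulate a second-order RC equivalent circuit model.

    currents: current samples in amperes, positive for discharge.
    Returns a list of (soc, terminal_voltage) tuples, one per sample.
    """
    soc, v1, v2 = soc0, 0.0, 0.0
    # Decay factors of the two RC branches over one time step.
    a1 = math.exp(-dt / (r1 * c1))
    a2 = math.exp(-dt / (r2 * c2))
    trace = []
    for i in currents:
        # Terminal voltage: OCV minus the ohmic drop and both RC branch voltages.
        v_t = ocv_from_soc(soc) - i * r0 - v1 - v2
        trace.append((soc, v_t))
        # Coulomb counting updates the state of charge.
        soc -= i * dt / (capacity_ah * 3600.0)
        # Exact discretization of each RC branch for a current held constant over dt.
        v1 = a1 * v1 + r1 * (1.0 - a1) * i
        v2 = a2 * v2 + r2 * (1.0 - a2) * i
    return trace
```

An EKF-style estimator would treat SOC and the two branch voltages as its state and compare the predicted terminal voltage against the measured one at each step.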
Current range estimation methods suffer from inaccuracy as factors including temperature, wind, driver behaviour, battery voltage, current, SOC, route/terrain, and much more make it difficult to model accurately. One of the most critical factors in range estimation is the battery. However, most models thus far are represented using equivalent circuit models as they are more widely researched. Another limitation is that any machine learning-based range estimation is typically based on historical driving data that require odometer readings for training.
A range estimation algorithm using a machine learning-based voltage estimation model is presented. Specifically, the long short-term memory cell in a recurrent neural network is used for the battery model. The model is trained with two datasets, classic and whole, from the experimental data of four Tesla/Panasonic 2170 battery cells. All network training is completed on SHARCNET, a resource provided to researchers by Compute Canada. The classically trained network achieved an average root mean squared error (RMSE) of 44 mV, compared to 34 mV for the network trained on the whole dataset. Based on the whole dataset, all test cases achieve an end range estimation error of less than 5 km, with an average of 0.29 km. / Thesis / Master of Applied Science (MASc)
|
112 |
Разработка системы автоматического распознавания автомобильных номеров в реальных дорожных условиях : магистерская диссертация / Development of a system for automatic recognition of license plates in real road conditions
Зайкис, Д. В., Zaikis, D. V. January 2023
Цель работы – разработка автоматической системы распознавания номерных знаков автомобилей, в естественных дорожных условиях, в том числе в сложных погодных и физических условиях, таких как недостаточная видимость, загрязнение, умышленное или непреднамеренное частичное скрытие символов. Объектом исследования являются цифровые изображения автомобилей в естественной среде. Методы исследования: сверточные нейронные сети, в том числе одноэтапные детекторы (SSOD), комбинации сетей с промежуточными связями между слоями - Cross Stage Partial Network (CSPNet) и сети, объединяющей информацию с разных уровней сети – Path Aggregation Network (PANet), преобразования изображений с помощью библиотеки OpenCV, включая фильтры Собеля и Гауса, преобразование Кэнни, методы глубокого машинного обучения для обработки последовательностей LSTM, CRNN, CRAFT. В рамках данной работы разработана система распознавания автомобильных номеров, переводящая графические данные из цифрового изображения или видеопотока в текст в виде файлов различных форматов. Задача детекции автомобильных номеров на изображениях решена с помощью глубокой нейронной сети YoLo v5, представляющая собой современную модель обнаружения объектов, основанную на архитектуре с использованием CSPNet и PANet. Она обеспечивает высокую скорость и точность при обнаружении объектов на изображениях. Благодаря своей эффективности и масштабируемости, YoLov5 стала популярным выбором для решения задач компьютерного зрения в различных областях. Для решения задачи распознавания текса на обнаруженных объектах используется алгоритм детектирования объектов, основанный на преобразованиях Кэнни, фильтрах Собеля и Гаусса и нейронная сеть keras-ocr, на основе фреймворка keras, представляющая собой комбинацию сверточной нейронной сети (CNN) и рекуррентной нейронной сети (RNN), решающая задачу распознавания печатного текста. Созданный метод способен безошибочно распознавать 85 % предоставленных номеров, преимущественно российского стандарта. 
Полученный функционал может быть внедрен в существующие системы фото- или видео-фиксации трафика и использоваться в рамках цифровизации систем трекинга и контроля доступа и безопасности на дорогах и объектах транспортной инфраструктуры. Выпускная квалификационная работа в теоретической и описательной части выполнена в текстовом редакторе Microsoft Word и представлена в электронном формате. Практическая часть выполнялась в Jupyter-ноутбуке на платформе облачных вычислений Google Colaboratory. / The goal of the work is to develop an automatic system for recognizing car license plates in natural road conditions, including difficult weather and physical conditions such as poor visibility, dirt, and intentional or unintentional partial concealment of characters. The object of the study is digital images of cars in their natural environment. Research methods: convolutional neural networks, including single-stage detectors (SSOD); combinations of a network with intermediate connections between layers, Cross Stage Partial Network (CSPNet), and a network that combines information from different levels of the network, Path Aggregation Network (PANet); image transformations using the OpenCV library, including Sobel and Gaussian filters and the Canny transform; and deep learning methods for sequence processing (LSTM, CRNN, CRAFT). As part of this work, a license plate recognition system has been developed that converts graphic data from a digital image or video stream into text in the form of files in various formats. The problem of detecting license plates in images is solved using the YOLOv5 deep neural network, a modern object detection model based on an architecture using CSPNet and PANet. It provides high speed and accuracy in detecting objects in images. Due to its efficiency and scalability, YOLOv5 has become a popular choice for solving computer vision problems in various fields.
To solve the problem of text recognition on the detected objects, two components are used: an object detection algorithm based on the Canny transform and Sobel and Gaussian filters, and the keras-ocr neural network, built on the Keras framework, which combines a convolutional neural network (CNN) with a recurrent neural network (RNN) to recognize printed text. The resulting method recognizes 85% of the provided license plates, mainly of the Russian standard, without error. The resulting functionality can be integrated into existing systems for photo or video recording of traffic and used as part of the digitalization of tracking, access control, and security systems on roads and transport infrastructure facilities. The theoretical and descriptive parts of the final qualifying work were written in the text editor Microsoft Word and presented in electronic format. The practical part was carried out in a Jupyter notebook on the Google Colaboratory cloud computing platform.
|
113 |
Deep Recurrent Q Networks for Dynamic Spectrum Access in Dynamic Heterogeneous Environments with Partial Observations
Xu, Yue. 23 September 2022
Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. Thus, the combination of sensing with deep reinforcement learning (DRL) has been shown to be a promising alternative to previously proposed simplistic approaches. Unlike traditional reinforcement learning methods, DRL does not require the explicit estimation of transition probability matrices or prohibitively large matrix computations. Further, since many learning approaches cannot solve the resulting online Partially-Observable Markov Decision Process (POMDP), Deep Recurrent Q-Networks (DRQN) have been proposed to determine the optimal channel access policy via online learning. The fundamental goal of this dissertation is to develop DRL-based solutions to address this POMDP-DSA problem. We mainly consider three aspects in this work: (1) optimal transmission strategies, (2) combined intelligent sensing and transmission strategies, and (3) learning efficiency or online convergence speed. Four key challenges in this problem are (1) the proposed DRQN-based node does not know the other nodes' behavior patterns a priori and must predict the future channel state based on previous observations; (2) the impact on primary user throughput during learning and even after learning must be limited; (3) resources can be wasted on sensing/observation; and (4) convergence speed must be improved without impacting performance.
We demonstrate in this dissertation, that the proposed DRQN can learn: (1) the optimal transmission strategy in a variety of environments under partial observations; (2) a sensing strategy that provides near-optimal throughput in different environments while dramatically reducing the needed sensing resources; (3) robustness to imperfect observations; (4) a sufficiently flexible approach that can accommodate dynamic environments, multi-channel transmission and the presence of multiple agents; (5) in an accelerated fashion utilizing one of three different approaches. / Doctor of Philosophy / With the development of wireless communication, such as 5G, global mobile data traffic has experienced tremendous growth, which makes spectrum resources even more critical for future networks. However, the spectrum is an exorbitant and scarce resource. Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. Thus, the combination of sensing with deep reinforcement learning (DRL) has been shown to be a promising alternative to previously proposed simplistic approaches. Compared with traditional reinforcement learning methods, DRL does not require explicit estimation of transition probability matrices and extensive matrix computations. Furthermore, since many learning methods cannot solve the resulting online partially observable Markov decision process (POMDP), a deep recurrent Q-network (DRQN) is proposed to determine the optimal channel access policy through online learning. The basic goal of this paper is to develop a DRL-based solution to this POMDP-DSA problem. This paper mainly focuses on improving performance from three directions. 1. Find the optimal (or sub-optimal) channel access strategy based on fixed partial observation mode; 2. 
Based on work 1, propose a more intelligent way to dynamically and efficiently find more reasonable (higher efficiency) sensing/observation policy and corresponding channel access strategy; 3. On the premise of ensuring performance, use different machine learning algorithms or structures to improve learning efficiency and avoid users waiting too long for expected performance. Through the research in these three main directions, we have found an efficient and diverse solution, namely DRQN-based technology.
|
114 |
Passive RFID Module with LSTM Recurrent Neural Network Activity Classification Algorithm for Ambient Assisted Living
Oguntala, George A., Hu, Yim Fun, Alabdullah, Ali A.S., Abd-Alhameed, Raed, Ali, Muhammad, Luong, D.K. 23 March 2021
Human activity recognition from sensor data is a critical research topic for achieving remote health monitoring and ambient assisted living (AAL). In AAL, sensors are integrated into conventional objects to support users' capabilities through digital environments that are sensitive, responsive and adaptive to human activities. Emerging technological paradigms that support AAL within the home or community setting offer people the prospect of more individually focused care and improved quality of living. In the present work, an ambient human activity classification framework is proposed that augments information from the received signal strength indicator (RSSI) of passive RFID tags to obtain detailed activity profiling. Key indices of position, orientation, mobility, and degree of activity, which are critical to guiding reliable clinical management decisions, are collected from 4 volunteers to simulate the research objective. A two-layer, fully connected sequence long short-term memory recurrent neural network model (LSTM RNN) is employed. The LSTM RNN model extracts RSSI features from the sensor data and classifies the sampled activities using SoftMax. The performance of the LSTM model is evaluated for different data sizes, and the hyper-parameters of the RNN are tuned, which results in an accuracy of 98.18%. The proposed framework is well suited to smart health and smart homes, which offer a pervasive sensing environment for the elderly and for persons with disabilities or chronic illness.
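The final classification step described above, turning the network's output scores into an activity label with SoftMax, can be sketched in Python as follows. This is only an illustration of the softmax step, not the authors' model: the activity labels and the example scores are hypothetical, and the LSTM layers that would produce the scores are omitted.

```python
import math

# Hypothetical activity labels; the paper's actual classes are not listed in the abstract.
ACTIVITIES = ["sitting", "standing", "walking"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels=ACTIVITIES):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, `classify([0.1, 2.0, -1.0])` picks the second label because it has the largest score.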
|
115 |
CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES
Agua Teba, Miguel Ángel del. 04 November 2019
Tesis por compendio / [ES] Durante los últimos años, los repositorios multimedia en línea se han convertido
en fuentes clave de conocimiento gracias al auge de Internet, especialmente en
el área de la educación. Instituciones educativas de todo el mundo han dedicado
muchos recursos en la búsqueda de nuevos métodos de enseñanza, tanto para
mejorar la asimilación de nuevos conocimientos, como para poder llegar a una
audiencia más amplia. Como resultado, hoy en día disponemos de diferentes
repositorios con clases grabadas que sirven como herramientas complementarias en
la enseñanza, o incluso pueden asentar una nueva base en la enseñanza a
distancia. Sin embargo, deben cumplir con una serie de requisitos para que la
experiencia sea totalmente satisfactoria y es aquí donde la transcripción de los
materiales juega un papel fundamental. La transcripción posibilita una búsqueda
precisa de los materiales en los que el alumno está interesado, se abre la
puerta a la traducción automática, a funciones de recomendación, a la
generación de resúmenes de las charlas y, además, el poder hacer
llegar el contenido a personas con discapacidades auditivas. No obstante, la
generación de estas transcripciones puede resultar muy costosa.
Con todo esto en mente, la presente tesis tiene como objetivo proporcionar
nuevas herramientas y técnicas que faciliten la transcripción de estos
repositorios. En particular, abordamos el desarrollo de un conjunto de herramientas
de reconocimiento automático del habla, con énfasis en las técnicas de aprendizaje
profundo que contribuyen a proporcionar transcripciones precisas en casos de
estudio reales. Además, se presentan diferentes participaciones en competiciones
internacionales donde se demuestra la competitividad del software comparada con
otras soluciones. Por otra parte, en aras de mejorar los sistemas de
reconocimiento, se propone una nueva técnica de adaptación de estos sistemas al
interlocutor basada en el uso de Medidas de Confianza. Esto además motivó el
desarrollo de técnicas para la mejora en la estimación de este tipo de medidas
por medio de Redes Neuronales Recurrentes.
Todas las contribuciones presentadas se han probado en diferentes repositorios
educativos. De hecho, el toolkit transLectures-UPV es parte de un conjunto de
herramientas que sirve para generar transcripciones de clases en diferentes
universidades e instituciones españolas y europeas. / [CA] Durant els últims anys, els repositoris multimèdia en línia s'han convertit
en fonts clau de coneixement gràcies a l'expansió d'Internet, especialment en
l'àrea de l'educació. Institucions educatives de tot el món han dedicat
molts recursos en la recerca de nous mètodes d'ensenyament, tant per
millorar l'assimilació de nous coneixements, com per poder arribar a una
audiència més àmplia. Com a resultat, avui dia disposem de diferents
repositoris amb classes gravades que serveixen com a eines complementàries en
l'ensenyament, o fins i tot poden assentar una nova base a l'ensenyament a
distància. No obstant això, han de complir amb una sèrie de requisits perquè la
experiència siga totalment satisfactòria i és ací on la transcripció dels
materials juga un paper fonamental. La transcripció possibilita una recerca
precisa dels materials en els quals l'alumne està interessat, s'obri la
porta a la traducció automàtica, a funcions de recomanació, a la
generació de resums de les xerrades i el poder fer
arribar el contingut a persones amb discapacitats auditives. No obstant, la
generació d'aquestes transcripcions pot resultar molt costosa.
Amb això en ment, la present tesi té com a objectiu proporcionar noves
eines i tècniques que faciliten la transcripció d'aquests repositoris. En
particular, abordem el desenvolupament d'un conjunt d'eines de reconeixement
automàtic de la parla, amb èmfasi en les tècniques d'aprenentatge profund que
contribueixen a proporcionar transcripcions precises en casos d'estudi reals. A
més, es presenten diferents participacions en competicions internacionals on es
demostra la competitivitat del programari comparada amb altres solucions.
D'altra banda, per tal de millorar els sistemes de reconeixement, es proposa una
nova tècnica d'adaptació d'aquests sistemes a l'interlocutor basada en l'ús de
Mesures de Confiança. A més, això va motivar el desenvolupament de tècniques per
a la millora en l'estimació d'aquest tipus de mesures per mitjà de Xarxes
Neuronals Recurrents.
Totes les contribucions presentades s'han provat en diferents repositoris
educatius. De fet, el toolkit transLectures-UPV és part d'un conjunt d'eines
que serveix per generar transcripcions de classes en diferents universitats i
institucions espanyoles i europees. / [EN] During the last years, on-line multimedia repositories have become key
knowledge assets thanks to the rise of Internet and especially in the area of
education. Educational institutions around the world have devoted big efforts
to explore different teaching methods, to improve the transmission of knowledge
and to reach a wider audience. As a result, online video lecture repositories
are now available and serve as complementary tools that can boost the learning
experience to better assimilate new concepts. In order to guarantee the success
of these repositories the transcription of each lecture plays a very important
role because it constitutes the first step towards the availability of many other
features. This transcription allows the searchability of learning materials,
enables translation into other languages, provides recommendation
functions, gives the possibility to provide content summaries, guarantees
the access to people with hearing disabilities, etc. However, the
transcription of these videos is expensive in terms of time and human cost.
To this purpose, this thesis aims at providing new tools and techniques that
ease the transcription of these repositories. In particular, we address the
development of a complete Automatic Speech Recognition Toolkit with a special
focus on the Deep Learning techniques that contribute to provide accurate
transcriptions in real-world scenarios. This toolkit is tested against many
others in different international competitions, showing comparable transcription
quality. Moreover, a new technique to improve the recognition accuracy has been
proposed which makes use of Confidence Measures, and constitutes the spark that
motivated the proposal of new Confidence Measures techniques that helped to
further improve the transcription quality. To this end, a new speaker-adapted
confidence measure approach was proposed for models based on Recurrent Neural
Networks.
The contributions proposed herein have been tested in real-life scenarios in
different educational repositories. In fact, the transLectures-UPV toolkit is
part of a set of tools for providing video lecture transcriptions in many
different Spanish and European universities and institutions. / Agua Teba, MÁD. (2019). CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/130198 / Compendio
|
116 |
Outlier detection with ensembled LSTM auto-encoders on PCA transformed financial data / Avvikelse-detektering med ensemble LSTM auto-encoders på PCA-transformerad finansiell data
Stark, Love. January 2021
Financial institutions today generate a large amount of data, data that can contain interesting information to investigate to further the economic growth of said institution. There exists an interest in analyzing these points of information, especially if they are anomalous from the normal day-to-day work. However, to find these outliers is not an easy task and not possible to do manually due to the massive amounts of data being generated daily. Previous work to solve this has explored the usage of machine learning to find outliers in these financial datasets. Previous studies have shown that the pre-processing of data usually stands for a big part in information loss. This work aims to study if there is a proper balance in how the pre-processing is carried out to retain the highest amount of information while simultaneously not letting the data remain too complex for the machine learning models. The dataset used consisted of Foreign exchange transactions supplied by the host company and was pre-processed through the use of Principal Component Analysis (PCA). The main purpose of this work is to test if an ensemble of Long Short-Term Memory Recurrent Neural Networks (LSTM), configured as autoencoders, can be used to detect outliers in the data and if the ensemble is more accurate than a single LSTM autoencoder. Previous studies have shown that Ensemble autoencoders can prove more accurate than a single autoencoder, especially when SkipCells have been implemented (a configuration that skips over LSTM cells to make the model perform with more variation). A datapoint will be considered an outlier if the LSTM model has trouble properly recreating it, i.e. a pattern that is hard to classify, making it available for further investigations done manually. The results show that the ensembled LSTM model proved to be more accurate than that of a single LSTM model in regards to reconstructing the dataset, and by our definition of an outlier, more accurate in outlier detection. 
The results from the pre-processing experiments reveal different methods of obtaining an optimal number of components for your data. One of those is to study the retained variance and accuracy of the PCA transformation against model performance for a given number of components. One of the conclusions from the work is that ensembled LSTM networks can prove very powerful, but that alternative pre-processing methods, such as categorical embedding instead of PCA, should be explored. / Finansinstitut genererar idag en stor mängd data, data som kan innehålla intressant information värd att undersöka för att främja den ekonomiska tillväxten för nämnda institution. Det finns ett intresse för att analysera dessa informationspunkter, särskilt om de är avvikande från det normala dagliga arbetet. Att upptäcka dessa avvikelser är dock inte en lätt uppgift och ej möjligt att göra manuellt på grund av de stora mängderna data som genereras dagligen. Tidigare arbete för att lösa detta har undersökt användningen av maskininlärning för att upptäcka avvikelser i finansiell data. Tidigare studier har visat på att förbehandlingen av datan vanligtvis står för en stor del i förlust av information från datan. Detta arbete syftar till att studera om det finns en korrekt balans i hur förbehandlingen utförs för att behålla den högsta mängden information samtidigt som datan inte förblir för komplex för maskininlärnings-modellerna. Det dataset som användes bestod av valutatransaktioner som tillhandahölls av värdföretaget och förbehandlades genom användning av Principal Component Analysis (PCA). Huvudsyftet med detta arbete är att undersöka om en ensemble av Long Short-Term Memory Recurrent Neural Networks (LSTM), konfigurerad som autoenkodare, kan användas för att upptäcka avvikelser i data och om ensemblen är mer precis i sina predikteringar än en ensam LSTM-autoenkodare.
Tidigare studier har visat att en ensembel av autoenkodare kan visa sig vara mer precisa än en singel autokodare, särskilt när SkipCells har implementerats (en konfiguration som hoppar över vissa av LSTM-cellerna för att göra modellerna mer varierade). En datapunkt kommer att betraktas som en avvikelse om LSTM-modellen har problem med att återskapa den väl, dvs ett mönster som nätverket har svårt att återskapa, vilket gör datapunkten tillgänglig för vidare undersökningar. Resultaten visar att en ensemble av LSTM-modeller predikterade mer precist än en singel LSTM-modell när det gäller att återskapa datasetet, och då enligt vår definition av avvikelser, mer precis avvikelsedetektering. Resultaten från förbehandlingen visar olika metoder för att uppnå ett optimalt antal komponenter för dina data genom att studera bibehållen varians och precision för PCA-transformation jämfört med modellprestanda. En av slutsatserna från arbetet är att en ensembel av LSTM-nätverk kan visa sig vara mycket kraftfulla, men att alternativ till förbehandling bör undersökas, såsom categorical embedding istället för PCA.
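The outlier definition used in this abstract, a data point that the autoencoder struggles to reconstruct, can be illustrated with a small Python sketch. It assumes the reconstructions from each ensemble member are already available, so no LSTM is trained here. Combining the members by the median error and thresholding at the mean plus k standard deviations are assumptions made for illustration; the abstract does not state how the thesis aggregates or thresholds errors.

```python
import statistics


def reconstruction_error(original, reconstructed):
    """Mean squared error between a data point and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def ensemble_errors(points, reconstructions_per_model):
    """Median reconstruction error per point across all ensemble members.

    reconstructions_per_model[m][i] is model m's reconstruction of points[i].
    """
    errors = []
    for idx, point in enumerate(points):
        per_model = [reconstruction_error(point, recs[idx])
                     for recs in reconstructions_per_model]
        errors.append(statistics.median(per_model))
    return errors


def flag_outliers(errors, k=3.0):
    """Flag points whose error exceeds mean + k * standard deviation (assumed rule)."""
    threshold = statistics.mean(errors) + k * statistics.pstdev(errors)
    return [e > threshold for e in errors]
```

Points flagged this way would then be passed on for manual investigation, as the abstract describes.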
|
117 |
Deep Learning in the Web Browser for Wind Speed Forecasting using TensorFlow.js / Djupinlärning i Webbläsaren för Vindhastighetsprognoser med TensorFlow.js
Moazez Gharebagh, Sara. January 2023
Deep Learning is a powerful and rapidly advancing technology that has shown promising results within the field of weather forecasting. Implementing and using deep learning models can however be challenging due to their complexity. One approach to potentially overcome the challenges with deep learning is to run deep learning models directly in the web browser. This approach introduces several advantages, including accessibility, data privacy, and the ability to access device sensors. The ability to run deep learning models on the web browser thus opens new possibilities for research and development in areas such as weather forecasting. In this thesis, two deep learning models that run in the web browser are implemented using JavaScript and TensorFlow.js to predict wind speed in the near future. Specifically, the application of Long Short-Term Memory and Gated Recurrent Units models are investigated. The results demonstrate that both the Long Short-Term Memory and Gated Recurrent Units models achieve similar performance and are able to generate predictions that closely align with the expected patterns when the variations in the data are less significant. The best performing Long Short-Term Memory model achieved a mean squared error of 0.432, a root mean squared error of 0.657 and a mean average error of 0.459. The best performing Gated Recurrent Units model achieved a mean squared error of 0.435, a root mean squared error of 0.660 and a mean average error of 0.461. / Djupinlärning är en kraftfull teknik som genomgår snabb utveckling och har uppnått lovande resultat inom väderprognoser. Att implementera och använda djupinlärningsmodeller kan dock vara utmanande på grund av deras komplexitet. Ett möjligt sätt att möta utmaningarna med djupinlärning är att köra djupinlärningsmodeller direkt i webbläsaren. Detta sätt medför flera fördelar, inklusive tillgänglighet, dataintegritet och möjligheten att använda enhetens egna sensorer. 
Att kunna köra djupinlärningsmodeller i webbläsaren bidrar därför med möjligheter för forskning och utveckling inom områden såsom väderprognoser. I denna studie implementeras två djupinlärningsmodeller med JavaScript och TensorFlow.js som körs i webbläsaren för att prediktera vindhastighet i en nära framtid. Specifikt undersöks tillämpningen av modellerna Long Short-Term Memory och Gated Recurrent Units. Resultaten visar att både Long Short-Term Memory och Gated Recurrent Units modellerna presterar lika bra och kan generera prediktioner som är nära förväntade mönster när variationen i datat är mindre signifikant. Den Long Short-Term Memory modell som presterade bäst uppnådde en mean squared error på 0.432, en root mean squared error på 0.657 och en mean average error på 0.459. Den Gated Recurrent Units modell som presterade bäst uppnådde en mean squared error på 0.435, en root mean squared error på 0.660 och en mean average error på 0.461.
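For readers unfamiliar with the error measures reported in this abstract, the Python sketch below computes the mean squared error, the root mean squared error, and the third measure, assumed here to be the mean absolute error. The thesis itself uses JavaScript and TensorFlow.js; the function name and the sample values used to exercise it are illustrative only.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE and MAE for two equally long sequences of numbers."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),  # RMSE is the square root of MSE
        "mae": sum(abs(e) for e in errors) / n,
    }
```

The reported figures are consistent with this relationship: the square root of 0.432 is about 0.657, and the square root of 0.435 is about 0.660.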
|
118 |
MahlerNet : Unbounded Orchestral Music with Neural Networks / Orkestermusik utan begränsning med neurala nätverk
Lousseief, Elias. January 2019
Modelling music with mathematical and statistical methods in general, and with neural networks in particular, has a long history and has been well explored in the last decades. Exactly when the first attempt at strictly systematic music took place is hard to say; some would say in the days of Mozart, others would say even earlier, but it is safe to say that the field of algorithmic composition has a long history. Even though composers have always had structure and rules as part of the writing process, implicitly or explicitly, following rules at a stricter level was well investigated in the middle of the 20th century at which point also the first music writing computer program based on mathematics was implemented. This work in computer science focuses on the history of musical composition with computers, also known as algorithmic composition, using machine learning and neural networks and consists of two parts: a literature survey covering in-depth the last decades in the field from which is drawn inspiration and experience to construct MahlerNet, a neural network based on the previous architectures MusicVAE, BALSTM, PerformanceRNN and BachProp, capable of modelling polyphonic symbolic music with up to 23 instruments. MahlerNet is a new architecture that uses a custom preprocessor with musical heuristics to normalize and filter the input and output files in MIDI format into a data representation that it uses for processing. MahlerNet, and its preprocessor, was written altogether for this project and produces music that clearly shows musical characteristics reminiscent of the data it was trained on, with some long-term structure, albeit not in the form of motives and themes. / Matematik och statistik i allmänhet, och maskininlärning och neurala nätverk i synnerhet, har sedan långt tillbaka använts för att modellera musik med en utveckling som kulminerat under de senaste decennierna. 
Exakt vid vilken historisk tidpunkt som musikalisk komposition för första gången tillämpades med strikt systematiska regler är svårt att säga; vissa skulle hävda att det skedde under Mozarts dagar, andra att det skedde redan långt tidigare. Oavsett vilket, innebär det att systematisk komposition är en företeelse med lång historia. Även om kompositörer i alla tider följt strukturer och regler, medvetet eller ej, som en del av kompositionsprocessen började man under 1900-talets mitt att göra detta i högre utsträckning och det var också då som de första programmen för musikalisk komposition, baserade på matematik, kom till. Den här uppsatsen i datateknik behandlar hur musik historiskt har komponerats med hjälp av datorer, ett område som också är känt som algoritmisk komposition. Uppsatsens fokus ligger på användning av maskininlärning och neurala nätverk och består av två delar: en litteraturstudie som i hög detalj behandlar utvecklingen under de senaste decennierna från vilken tas inspiration och erfarenheter för att konstruera MahlerNet, ett neuralt nätverk baserat på de tidigare modellerna MusicVAE, BALSTM, PerformanceRNN och BachProp. MahlerNet kan modellera polyfon musik med upp till 23 instrument och är en ny arkitektur som kommer tillsammans med en egen preprocessor som använder heuristiker från musikteori för att normalisera och filtrera data i MIDI-format till en intern representation. MahlerNet, och dess preprocessor, är helt och hållet implementerade för detta arbete och kan komponera musik som tydligt uppvisar egenskaper från den musik som nätverket tränats på. En viss kontinuitet finns i den skapade musiken även om det inte är i form av konkreta teman och motiv.
|
119 |
Dynamic Student Embeddings for a Stable Time Dimension in Knowledge Tracing
Tump, Clara January 2020
Knowledge tracing is concerned with tracking a student’s knowledge as they engage with exercises in an (online) learning platform. A commonly used state-of-the-art knowledge tracing model is Deep Knowledge Tracing (DKT), which models the time dimension as a sequence of completed exercises per student using a Long Short-Term Memory Neural Network (LSTM). However, a common problem in this sequence-based model is excessive instability in the time dimension of a student’s modelled knowledge. In other words, the student’s knowledge of a skill changes too quickly and unreliably. We propose dynamic student embeddings as a stable method for encoding the time dimension of knowledge tracing systems. In this method the time dimension is encoded in time slices of a fixed size, while the model’s loss function is designed to smoothly align subsequent time slices. We compare the dynamic student embeddings to DKT on a large-scale real-world dataset, and we show that dynamic student embeddings provide more stable knowledge tracing while retaining good performance. / Knowledge tracing is about modelling a student’s knowledge as they work on exercises in an (online) learning platform. A common state-of-the-art knowledge tracing model is Deep Knowledge Tracing (DKT), which models the time dimension as a sequence of completed exercises per student using a neural network called a Long Short-Term Memory Neural Network (LSTM). A common problem in these sequence-based models, however, is excessive instability in the time dimension of the student’s modelled knowledge. In other words, the student’s knowledge changes too quickly and unreliably. We therefore propose Dynamic Student Embeddings as a stable method for encoding the time dimension of knowledge tracing systems. In this method the time dimension is encoded in time slices of fixed size, while the model’s loss function is designed to smoothly align subsequent time slices. In this thesis we compare Dynamic Student Embeddings with DKT on a large-scale real-world dataset and show that Dynamic Student Embeddings provide more stable knowledge tracing while maintaining performance.
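The time-slice idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the use of timestamps in seconds, the squared-distance penalty and the `weight` parameter are all assumptions made for illustration. The sketch groups interactions into fixed-size time slices and adds a penalty that grows when the embeddings of consecutive slices drift apart.

```python
from collections import defaultdict


def slice_index(timestamp, slice_size):
    """Map a timestamp to the index of its fixed-size time slice."""
    return int(timestamp // slice_size)


def group_by_slice(interactions, slice_size):
    """Group (timestamp, exercise_id, correct) tuples by time slice."""
    slices = defaultdict(list)
    for timestamp, exercise_id, correct in interactions:
        slices[slice_index(timestamp, slice_size)].append((exercise_id, correct))
    return dict(slices)


def alignment_penalty(embeddings):
    """Sum of squared distances between embeddings of consecutive slices.

    Assumed form of the smoothing term; the abstract only says that the
    loss is designed to smoothly align subsequent time slices.
    """
    total = 0.0
    for prev, curr in zip(embeddings, embeddings[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total


def total_loss(prediction_loss, embeddings, weight=0.1):
    """Combine a prediction loss with the weighted alignment penalty."""
    return prediction_loss + weight * alignment_penalty(embeddings)
```

A larger `weight` in this sketch would favour student embeddings that change slowly between slices, at the possible cost of fitting individual answers less closely.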
|
120 |
Réseaux de neurones à relaxation entraînés par critère d'autoencodeur débruitant
Savard, François 08 1900
Machine learning is a vast field in which one seeks to learn the parameters of models from concrete data. The aim is to perform tasks requiring abilities attributed to human intelligence, such as the ability to process high-dimensional data exhibiting many variations. Artificial neural networks are one example of such models. In some neural networks, called deep networks, "abstract" concepts are learned automatically.
The work presented here draws its inspiration from deep neural networks, recurrent networks and the neuroscience of the visual system. Our test tasks are the classification and denoising of near-binary images. We allow feedback in which high-level (more "abstract") representations influence low-level representations. This influence takes place during what is called relaxation: iterations in which the different levels (or layers) of the model influence one another. We present two families of architectures: the fully connected architecture, which can in principle handle general data, and the convolutional architecture, which is more specifically suited to images. In all cases, the data used are images, mainly images of handwritten digits.
In one type of experiment, we seek to reconstruct data that have been corrupted. There we were able to observe the influence phenomenon described above by comparing the results with and without relaxation. We also note some numerical and visual gains in reconstruction performance when the influence of the upper layers is added. In another type of task, classification, few gains were observed. We were nevertheless able to see that in certain cases relaxation may help to learn representations useful for classifying corrupted images. The convolutional architecture developed, more uncertain at the outset, nonetheless yields reconstructions numerically and visually similar to those obtained with the other architecture, even though its connectivity is constrained. / Machine learning is a vast field where we seek to learn parameters for models from
concrete data. The goal is to execute various tasks requiring abilities normally associated more with human intelligence than with a computer program, such as the ability to process high-dimensional data containing a lot of variation. Artificial neural networks are a large class of such models. In some neural networks said to be deep, we can observe that high-level (or "abstract") concepts are learned automatically.
The work we present here takes its inspiration from deep neural networks, from recurrent networks and from the neuroscience of the visual system. Our test tasks are classification and denoising of near-binary images. We aim to take advantage of a feedback mechanism through which high-level representations, that is to say relatively abstract concepts, can influence lower-level representations. This influence happens during what we call relaxation, a series of iterations in which the different levels (or layers) of the model can influence each other. We present two families of architectures based on this mechanism. One, the fully connected architecture, can in principle accept generic data. The other, the convolutional one, is made specifically for images. Both were trained on images, though, mostly images of handwritten characters.
In one type of experiment, we want to reconstruct data that has been corrupted. In these tasks, we observed the feedback influence described above by comparing the results obtained with and without relaxation. We also note some numerical and visual improvement in reconstruction performance when we add the influence of the upper layers. In another type of task, classification, little gain was noted. Still, in one setting where we tried to classify noisy data with a representation trained without prior class information, relaxation did seem to improve results significantly. The convolutional architecture, a riskier choice at the outset, was shown to produce numerical and visual reconstruction results close to those obtained with the fully connected version, even though its connectivity is much more constrained.
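The relaxation mechanism described in this abstract can be illustrated with a small sketch. It is a toy example under stated assumptions, not the architecture from the thesis: the two-layer layout, the sigmoid units, the additive combination of bottom-up and top-down input, and all names (`relax`, `w_up1`, `w_up2`, `w_down`) are hypothetical. It starts from a plain feedforward pass and then runs a fixed number of iterations in which the upper layer feeds back into the lower one.

```python
import math


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def relax(x, w_up1, w_up2, w_down, steps):
    """Run `steps` relaxation iterations on a two-layer network.

    w_up1: input -> lower layer, w_up2: lower -> upper layer,
    w_down: upper -> lower layer (feedback). All are assumed shapes.
    """
    # Initial state: a purely feedforward pass, with no feedback.
    h1 = [sigmoid(a) for a in matvec(w_up1, x)]
    h2 = [sigmoid(a) for a in matvec(w_up2, h1)]
    for _ in range(steps):
        bottom_up = matvec(w_up1, x)
        top_down = matvec(w_down, h2)
        # The lower layer combines its input with feedback from above.
        h1 = [sigmoid(b + t) for b, t in zip(bottom_up, top_down)]
        # The upper layer is then recomputed from the updated lower layer.
        h2 = [sigmoid(a) for a in matvec(w_up2, h1)]
    return h1, h2
```

Calling `relax` with `steps=0` gives the feedforward result, so comparing it against a call with `steps > 0` mirrors the with-and-without-relaxation comparison mentioned in the abstract.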
|