  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Recognition of nonstationary signals with particular reference to the piano

Carrasco, Ana Cristina Pereira Rosa Pascoalinho January 2002 (has links)
No description available.
2

Automatic music transcription using structure and sparsity

O'Hanlon, Ken January 2014 (has links)
Automatic Music Transcription seeks a machine understanding of a musical signal in terms of pitch-time activations. One popular approach to this problem is the use of spectrogram decompositions, whereby a signal matrix is decomposed over a dictionary of spectral templates, each representing a note. Typically the decomposition is performed with gradient-descent-based methods, in particular the multiplicative updates of Non-negative Matrix Factorisation (NMF). The final representation may be expected to be sparse, as the musical signal itself is considered to consist of few active notes. In this thesis some concepts that are familiar in the sparse representations literature are introduced to the AMT problem. Structured sparsity assumes that certain atoms tend to be active together. In the context of AMT this affords the use of subspace modelling of notes, and non-negative group sparse algorithms are proposed in order to exploit the greater modelling capability introduced. Stepwise methods are often used for decomposing sparse signals, but their use for AMT has previously been limited. Some new approaches to AMT are proposed that incorporate stepwise optimal methods, with promising results. Dictionary coherence is used to provide recovery conditions for sparse algorithms. While such guarantees are not possible in the context of AMT, coherence is found to be a useful parameter to consider, affording improved performance in spectrogram decompositions.
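The spectrogram-decomposition idea described above can be sketched as follows. This is a minimal illustration of NMF with KL-divergence multiplicative updates and a fixed, pre-learned note dictionary — not the structured-sparse algorithms developed in the thesis:

```python
import numpy as np

def nmf_transcribe(V, W, n_iter=500, eps=1e-9):
    """Decompose a magnitude spectrogram V (freq x time) over a fixed
    dictionary W of per-note spectral templates (freq x notes), returning
    the note-activation matrix H (notes x time). Uses the standard
    KL-divergence multiplicative updates with W held fixed."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # Multiplicative update: H <- H * (W^T (V / WH)) / (W^T 1)
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
    return H
```

With W fixed, the KL objective is convex in H, so the activations converge to the best explanation of each time frame; thresholding H then yields a piano-roll-style transcription.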
3

Language of music : a computational model of music interpretation

McLeod, Andrew Philip January 2018 (has links)
Automatic music transcription (AMT) is commonly defined as the process of converting an acoustic musical signal into some form of musical notation, and can be split into two separate phases: (1) multi-pitch detection, the conversion of an audio signal into a time-frequency representation similar to a MIDI file; and (2) converting from this time-frequency representation into a musical score. A substantial amount of AMT research in recent years has concentrated on multi-pitch detection, and yet, in the case of the transcription of polyphonic music, there has been little progress. There are many potential reasons for this slow progress, but this thesis concentrates on the (lack of) use of music language models during the transcription process. In particular, a music language model would impart to a transcription system the background knowledge of music theory upon which a human transcriber relies. In the related field of automatic speech recognition, it has been shown that the use of a language model drawn from the field of natural language processing (NLP) is an essential component of a system for transcribing spoken word into text, and there is no reason to believe that music should be any different. This thesis will show that a music language model inspired by NLP techniques can be used successfully for transcription. In fact, this thesis will create the blueprint for such a music language model. We begin with a brief overview of existing multi-pitch detection systems, in particular noting four key properties which any music language model should have to be useful for integration into a joint system for AMT: it should (1) be probabilistic, (2) not use any data a priori, (3) be able to run on live performance data, and (4) be incremental. We then investigate voice separation, creating a model which achieves state-of-the-art performance on the task, and show that, used as a simple music language model, it improves multi-pitch detection performance significantly. 
This is followed by an investigation of metrical detection and alignment, where we introduce a grammar crafted for the task which, combined with a beat-tracking model, achieves state-of-the-art results on metrical alignment. This system's success adds more evidence to the long-existing hypothesis that music and language consist of extremely similar structures. We end by investigating the joint analysis of music, in particular showing that a combination of our two models running jointly outperforms each running independently. We also introduce a new joint, automatic, quantitative metric for the complete transcription of an audio recording into an annotated musical score, something which the field currently lacks.
4

New Approaches to Optical Music Recognition

Alfaro-Contreras, María 22 September 2023 (has links)
Optical Music Recognition (OMR) is a research field that studies how to computationally read the music notation present in documents and store it in a structured digital format. Traditional OMR approaches are typically structured around a multi-stage process: (i) image preprocessing, where issues related to the scanning process and paper quality are addressed; (ii) symbol segmentation and classification, where the different elements of the image are detected and labelled; (iii) music notation reconstruction, a postprocessing phase of the recognition process; and (iv) output encoding, where the recognized elements are stored in a suitable symbolic format. These systems achieve competitive recognition rates at the cost of relying on heuristics tailored to the cases for which they were designed. Scalability therefore becomes a major limitation, since a new set of heuristics must be designed for each collection or notation type. Another drawback of these traditional approaches is the need for detailed labelling, often obtained manually: since each symbol is recognized individually, the exact position of every symbol is required, along with its corresponding musical label.
The incorporation of Deep Learning (DL) into OMR has produced a shift towards holistic, or end-to-end, neural-network-based systems for the symbol segmentation and classification stage, treating the recognition process as a single step rather than splitting it into distinct subtasks. By learning the feature extraction and classification processes simultaneously, these solutions remove the need to design case-specific processes: the features needed for classification are inferred directly from the data. To achieve this, only training pairs consisting of an input image and its corresponding transcription are required. In other words, this approach avoids the need to annotate the exact positions of the symbols, which further simplifies the transcription process. The end-to-end approach has recently been explored in the literature, but always under the assumption that a preprocessing step has already segmented the different staves of a score. The goal is therefore to recover the series of music symbols that appear in an image of a single staff.
In this context, Convolutional Recurrent Neural Networks (CRNN) represent the state of the art: the convolutional block extracts relevant features from the input image, while the recurrent layers interpret these features as sequences of music symbols. CRNN are mostly trained using the Connectionist Temporal Classification (CTC) loss function, which enables training without explicit information about the location of the symbols in the image. At inference time, a greedy decoding policy is generally employed, i.e. the most probable sequence is retrieved. This thesis presents a series of contributions, organized into three distinct but interconnected groups, that advance the development of more robust and generalizable staff-level OMR systems. The first group of contributions focuses on reducing the human effort involved in using OMR systems. Transcription times with and without the aid of an OMR system are compared, showing that its use speeds up the process, although it requires a sufficient amount of labelled data, which itself implies human effort. Self-Supervised Learning (SSL) techniques are therefore proposed to pretrain a symbol classifier, achieving over 80% accuracy using only one example per class for training. This symbol classifier can speed up the data-labelling process. The second group of contributions improves the performance of OMR systems in two ways. On the one hand, a music encoding is proposed that allows both monophonic and homophonic music to be recognized.
On the other hand, system performance is improved by exploiting the two-dimensional nature of the agnostic representation, introducing three changes to the standard approach: (i) a new architecture with dedicated branches for capturing features related to the shape (event duration) or the height (pitch) of the music symbols; (ii) a split-sequence representation that requires the model to predict the shape and height attributes sequentially; and (iii) a custom greedy decoding algorithm that guarantees the predicted sequence complies with that representation. The third and final group of contributions explores the synergies between OMR and its audio counterpart, Automatic Music Transcription (AMT). These contributions confirm the existence of synergies between the two fields and evaluate different late-fusion approaches for multimodal transcription, yielding significant improvements in transcription accuracy. Finally, the thesis concludes by comparing early-fusion and late-fusion approaches, finding that late fusion offers more flexibility and better performance. / This thesis was funded by the Ministerio de Universidades through the university teacher training programme (Ref. FPU19/04957).
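The greedy CTC decoding policy used at inference time in CRNN-based recognition — take the most probable symbol per frame, collapse consecutive repeats, then drop blanks — can be sketched as follows. This is the standard decoding rule, not the custom constrained decoder proposed in the thesis:

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Greedy CTC decoding: argmax symbol per frame, collapse
    consecutive repeats, remove blank tokens. `logits` is a
    (time x classes) array of per-frame scores."""
    best = np.argmax(logits, axis=1)
    out, prev = [], None
    for s in best:
        if s != prev and s != blank:
            out.append(int(s))
        prev = s
    return out
```

Note that a blank frame between two identical symbols is what allows repeated symbols to survive the collapse step.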
5

Computational methods for the alignment and score-informed transcription of piano music

Wang, Siying January 2017 (has links)
This thesis is concerned with computational methods for the alignment and score-informed transcription of piano music. Firstly, several methods are proposed to improve alignment robustness and accuracy when various versions of one piece of music show complex differences with respect to acoustic conditions or musical interpretation. Secondly, score-to-performance alignment is applied to enable score-informed transcription. Although music alignment methods have considerably improved in accuracy in recent years, the task remains challenging. The research in this thesis aims to improve robustness in cases where there are substantial differences between versions and state-of-the-art methods may fail to identify a correct alignment. This thesis first exploits the availability of multiple versions of the piece to be aligned. By processing these jointly, the alignment process can be stabilised by exploiting additional examples of how a section might be interpreted or which acoustic conditions may arise. Two methods are proposed, progressive alignment and profile HMM, both adapted from the multiple biological sequence alignment task. Experiments demonstrate that these methods can indeed improve alignment accuracy and robustness over comparable pairwise methods. Secondly, this thesis presents a score-to-performance alignment method that improves robustness in cases where some musical voices, such as the melody, are played asynchronously to others - a stylistic device used in musical expression. The asynchronies between the melody and the accompaniment are handled by treating the voices as separate timelines in a multi-dimensional variant of dynamic time warping (DTW). The method measurably improves alignment accuracy for pieces with asynchronous voices and preserves the accuracy otherwise.
Once an accurate alignment between a score and an audio recording is available, the score information can be exploited as prior knowledge in automatic music transcription (AMT), for scenarios where the score is available, such as music tutoring. Score-informed dictionary learning is used to learn the spectral pattern of each pitch, describing the energy distribution of the associated notes in the recording. More precisely, the dictionary learning process in non-negative matrix factorization (NMF) is constrained using the aligned score. By adapting the dictionary to a given recording in this way, the proposed method improves accuracy over the state of the art.
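The dynamic time warping at the core of such alignment methods can be sketched as follows, assuming a precomputed pairwise cost matrix between feature frames of the two versions. This is the classic single-timeline DTW, not the multi-dimensional variant proposed in the thesis:

```python
import numpy as np

def dtw(cost):
    """Classic dynamic time warping over a pairwise cost matrix
    cost[i, j] between frame i of one version and frame j of another.
    Returns the total alignment cost and the optimal warping path."""
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        cands = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((c for c in cands if c[0] >= 1 and c[1] >= 1), key=lambda c: D[c])
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]
```

The returned path pairs up frames of the two versions; in a score-informed setting, it tells the transcription stage which notes of the score are sounding at each instant of the recording.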
6

Nástroj pro podporu cvičení klavírních skladeb / Tool for Piano Practising

Ustinov, Nikita January 2020 (has links)
The purpose of this work is to design and implement an application for piano practice. The main disadvantage of existing applications is the limited selection of pieces that can be practised. The application resulting from this work solves this problem: the user can upload any piano recording they want to practise, and the application takes care of creating a training process. The training process consists of showing the user the notes to play in a suitably arranged form while simultaneously monitoring, through a microphone, what the user is actually playing. The biggest challenge of the thesis was to find a way to classify the audio signal from the microphone accurately and quickly. This problem was solved using two independent neural networks with different architectures, trained on different data sets. To justify the chosen solution, all the necessary musical and scientific concepts and methods directly related to it are presented. The resulting application is tested in three respects: accuracy, speed, and usability.
7

Nástroj pro podporu cvičení klavírních skladeb / Tool for Piano Practising

Ustinov, Nikita January 2021 (has links)
The purpose of this work is to design and implement an application for piano practice. The main disadvantage of existing applications is the limited selection of pieces that can be practised. The application resulting from this work solves this problem: the user can upload any piano recording they want to practise, and the application takes care of creating a training process. The training process consists of showing the user the notes to play in a suitably arranged form while simultaneously monitoring, through a microphone, what the user is actually playing. The biggest challenge of the thesis was to find a way to classify the audio signal from the microphone accurately and quickly. This problem was solved using two independent neural networks with different architectures, trained on different data sets. To justify the chosen solution, all the necessary musical and scientific concepts and methods directly related to it are presented. The resulting application is tested in four respects: accuracy, speed, noise resistance, and usability.
8

Automatic Music Transcription with Convolutional Neural Networks Using Intuitive Filter Shapes

Sleep, Jonathan 01 October 2017 (has links) (PDF)
This thesis explores the challenge of automatic music transcription with a combination of digital signal processing and machine learning methods. Automatic music transcription is important for musicians who cannot do it themselves or find it tedious. We start with an existing model, designed by Sigtia, Benetos and Dixon, and develop it in a number of original ways. We find that by using convolutional neural networks with filter shapes more tailored to spectrogram data, we obtain better and faster transcription results when evaluating the new model on a dataset of classical piano music. We also find that adopting better training practices further improves results. Finally, we open-source our test bed for pre-processing, training, and testing the models to assist future research.
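To make the filter-shape idea concrete, here is a minimal 2D cross-correlation over a spectrogram patch. A kernel that is tall along the frequency axis (many bins, few time frames) spans harmonic structure, whereas a wide one spans temporal evolution; the specific kernel sizes below are illustrative, not those used in the thesis:

```python
import numpy as np

def conv2d_valid(x, k):
    """Minimal 2D cross-correlation (valid mode) of a spectrogram
    patch x, shape (freq, time), with a kernel k, shape (kf, kt)."""
    kf, kt = k.shape
    out = np.zeros((x.shape[0] - kf + 1, x.shape[1] - kt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kf, j:j + kt] * k)
    return out

# A "tall" filter covering 6 frequency bins but only 2 time frames:
# it responds to vertical (harmonic) structure in the spectrogram.
tall_kernel = np.ones((6, 2))
```

In a real model the kernels are of course learned; the point is only that their rectangular shape determines whether they see mostly frequency context or mostly time context.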
9

Automatic Music Transcription based on Prior Knowledge from Musical Acoustics. Application to the repertoires of the Marovany zither of Madagascar / Transcription automatique de musique basé sur des connaissances a prior issues de l'Acoustique Musicale. Application aux répertoires de la cithare marovany de Madagascar

Cazau, Dorian 12 October 2015 (has links)
Ethnomusicology is the study of music with an emphasis on its cultural, social, material, cognitive and/or biological aspects. This PhD subject, initiated by Pr. Marc Chemillier, ethnomusicologist at the laboratory CAMS-EHESS, deals with the development of an automatic transcription system dedicated to the repertoires of the traditional marovany zither of Madagascar. These repertoires are orally transmitted, resulting from a process of memorization/transformation of base musical motives. These motives represent an important cultural patrimony of the country and are evolving continually under the influence of other musical practices and genres, mainly due to globalization. Current ethnomusicological studies aim at understanding the evolution of the traditional repertoire through the transformation of its base motives, and at preserving this patrimony. Our objectives serve this cause by providing computational tools of musical analysis to organize and structure audio recordings of this instrument. Automatic Music Transcription (AMT) consists in automatically estimating the notes in a recording, through three attributes: onset time, duration and pitch. In the long term, AMT systems, whose purpose is to retrieve meaningful information from complex audio, could be used in a variety of user scenarios, such as searching and organizing music collections with barely any human labor. One common denominator of our different approaches to the AMT task lies in the use of explicit music-related prior knowledge in our computational systems. A first step of this PhD thesis was therefore to generate this knowledge and formalize it for incorporation. Rather than restricting ourselves to a specific class of prior knowledge, we explore the multi-modal characteristics of musical signals, including both timbre (i.e. modeling of the generic "morphological" features of the sound related to the physics of an instrument, e.g. intermodulation, sympathetic resonances, inharmonicity) and musicological (e.g. harmonic transitions, playing dynamics, tempo and rhythm) classes.
The research work on AMT performed in this PhD can be divided into a more "applied" axis (axis 1), developing ready-to-use operational transcription tools that meet the current needs of ethnomusicologists for reliable automatic transcriptions, and a more "fundamental" axis (axis 2), providing deeper insight into the functioning of these tools. The first axis of research requires a very high transcription accuracy (i.e. an average F-measure above 95% with standard error tolerances) so that the output can serve as analytical support in musicological studies. Despite a large enthusiasm for AMT challenges, and several audio-to-MIDI converters available commercially, perfect polyphonic AMT systems are out of reach of today's algorithms. In this PhD, we therefore explore the use of multichannel capturing sensory systems for the AMT of several acoustic plucked string instruments, including the following traditional African zithers: the marovany (Madagascar), the Mvet (Cameroon) and the N'Goni (Mali). These systems use one sensor per string to retrieve discriminatingly some physical features of the string vibrations, which has an obvious advantage for the AMT task: it allows a polyphonic musical signal to be broken down into a sum of monophonic signals, one per string, greatly simplifying the transcription. Different sensor types (optical, piezoelectric, electromagnetic) were tested; after experimentation, the piezoelectric sensors, although invasive, proved to have the best signal-to-noise ratios and inter-sensor separability. This technology also enabled the construction of a ground-truth database, indispensable for the quantitative evaluation of music transcription systems. The second axis of research investigates more deeply the incorporation of prior knowledge into automatic music transcription systems, using two statistical frameworks as theoretical grounding: Probabilistic Latent Component Analysis (PLCA) for multi-pitch estimation, and Hidden Markov Models (HMM).
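The per-string decomposition described above reduces transcription to monophonic pitch estimation on each sensor channel. A minimal autocorrelation-based sketch of that per-channel step (an illustration only, not the PLCA/HMM systems developed in the thesis):

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=60.0, fmax=1000.0):
    """Estimate the fundamental frequency of one monophonic
    string-sensor frame by picking the autocorrelation peak
    inside a plausible lag range [sr/fmax, sr/fmin]."""
    frame = frame - frame.mean()
    # Full autocorrelation; keep non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag
```

Running such an estimator frame by frame on each string channel, then thresholding on energy for onsets, yields the onset time, pitch and duration attributes that AMT targets.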
10

Electric Guitar to MIDI Conversion / Electric Guitar to MIDI Conversion

Klčo, Michal January 2018 (has links)
Automatic music transcription and multi-pitch estimation remain challenging tasks in the field of music information retrieval. Modern systems are based on various machine learning techniques to achieve the most accurate transcription possible. Some of them are also restricted to a specific instrument or musical genre in order to reduce the variability of the analysed audio. In this work, several systems for converting electric guitar recordings into MIDI files, based on different machine learning and spectral analysis techniques, are designed, evaluated and compared.
