Global ETD Search

11	New Approaches to Optical Music Recognition Alfaro-Contreras, María 22 September 2023 Optical Music Recognition (OMR) is a research field that studies how to computationally read the music notation present in documents and store it in a structured digital format. Traditional OMR approaches are usually structured around a multi-stage process: (i) image preprocessing, which addresses issues related to the scanning process and paper quality; (ii) symbol segmentation and classification, in which the different elements of the image are detected and labelled; (iii) music notation reconstruction, a postprocessing phase of the recognition process; and (iv) result encoding, in which the recognized elements are stored in a suitable symbolic format. These systems achieve competitive recognition rates at the cost of relying on specific heuristics tailored to the cases for which they were designed. Consequently, scalability becomes a major limitation, since a new set of heuristics must be designed for each collection or notation type. Another drawback of these traditional approaches is the need for detailed labelling, often obtained manually: because each symbol is recognized individually, the exact position of every symbol is required, together with its corresponding musical label.
The incorporation of Deep Learning (DL) into OMR has produced a shift towards holistic, or end-to-end, neural-network-based systems for the symbol segmentation and classification stage, treating recognition as a single step rather than dividing it into separate subtasks. By learning feature extraction and classification simultaneously, these solutions eliminate the need to design case-specific processes: the features needed for classification are inferred directly from the data. To achieve this, only training pairs consisting of the input image and its corresponding transcription are needed.
In other words, this approach avoids the need to annotate the exact positions of the symbols, which further simplifies the transcription process. The end-to-end approach has recently been explored in the literature, but always under the assumption that a prior preprocessing step has already segmented the individual staves of a score. The goal is therefore to retrieve the series of music symbols that appear in an image of a single staff. In this context, Convolutional Recurrent Neural Networks (CRNN) represent the state of the art: the convolutional block extracts relevant features from the input image, while the recurrent layers interpret these features as sequences of music symbols. CRNNs are mainly trained with the Connectionist Temporal Classification (CTC) loss function, which allows training without explicit information about the location of the symbols in the image. At inference time, a greedy decoding policy is generally used, that is, the most probable label is taken at each frame to form the output sequence. This thesis presents a series of contributions, organized into three distinct but interconnected groups, that advance the development of more robust and generalizable staff-level OMR systems. The first group of contributions focuses on reducing human effort when using OMR systems. Transcription times with and without the help of an OMR system are compared, showing that its use speeds up the process, although it requires a sufficient amount of labelled data, which itself implies human effort.
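The greedy CTC decoding policy mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the vocabulary, the token names and the per-frame probabilities are made up for illustration, and the blank label is assumed to sit at index 0. The function takes the most probable label at each frame, merges consecutive repeats and then drops blanks.

```python
from itertools import groupby


def ctc_greedy_decode(posteriors, vocabulary, blank=0):
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # 1. Pick the index of the most probable label at every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]
    # 2. Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove the blank label and map indices to symbols.
    return [vocabulary[label] for label in collapsed if label != blank]


# Hypothetical vocabulary and frame-wise posteriors (one row per image frame).
vocabulary = ["<blank>", "clef.G", "note.quarter", "rest.half"]
posteriors = [
    [0.10, 0.70, 0.10, 0.10],  # clef.G
    [0.10, 0.60, 0.20, 0.10],  # clef.G again (merged with the previous frame)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # note.quarter
    [0.60, 0.10, 0.20, 0.10],  # blank separates two identical symbols
    [0.20, 0.10, 0.60, 0.10],  # note.quarter
]
decoded = ctc_greedy_decode(posteriors, vocabulary)
print(decoded)
```

The blank between the last two frames is what lets the decoder output the same symbol twice in a row; without it the two quarter notes would be merged into one.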
Therefore, Self-Supervised Learning (SSL) techniques are proposed to pretrain a symbol classifier, achieving an accuracy above 80% when only one example per class is used for training. This symbol classifier can speed up the data labelling process. The second group of contributions improves the performance of OMR systems in two ways. On the one hand, a music encoding is proposed that makes it possible to recognize both monophonic and homophonic music. On the other hand, performance is improved by exploiting the two-dimensional nature of the agnostic representation, introducing three changes to the standard approach: (i) a new architecture with dedicated branches for capturing features related to the shape (event duration) or the height (pitch) of music symbols; (ii) a split-sequence representation, which requires the model to predict the shape and height attributes sequentially; and (iii) a custom greedy decoding algorithm that guarantees the predicted sequence complies with this representation. The third and final group of contributions explores the synergies between OMR and its audio counterpart, Automatic Music Transcription (AMT). These contributions confirm the existence of synergies between the two fields and evaluate several late-fusion approaches for multimodal transcription, which translate into significant improvements in transcription accuracy. Finally, the thesis concludes by comparing early-fusion and late-fusion approaches, and states that late fusion offers more flexibility and better performance. / This thesis was funded by the Ministerio de Universidades through its university teaching staff training grant programme (Ref. FPU19/04957). Keywords: Deep Learning; Optical Music Recognition; Automatic Music Transcription
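As a rough illustration of the split-sequence idea in record 11, the sketch below flattens (shape, height) pairs into an alternating token sequence and then applies a constrained greedy choice that only considers tokens of the expected kind at each step. Everything here is an assumption made for illustration: the token names, the "S"/"H" tags and the score dictionaries are hypothetical, CTC blanks are ignored, and the code is not the decoding algorithm of the thesis.

```python
def to_split_sequence(symbols):
    """Flatten (shape, height) pairs into shape, height, shape, height, ..."""
    split = []
    for shape, height in symbols:
        split.extend([("S", shape), ("H", height)])
    return split


def is_well_formed(split):
    """True if the sequence strictly alternates shape ("S") and height ("H") tokens."""
    if len(split) % 2 != 0:
        return False
    return all(kind == ("S" if i % 2 == 0 else "H") for i, (kind, _) in enumerate(split))


def constrained_greedy(step_scores):
    """Pick, at each step, the best-scoring token among those of the expected kind.

    Assumes every step offers at least one token of the expected kind.
    """
    decoded = []
    for i, scores in enumerate(step_scores):
        wanted = "S" if i % 2 == 0 else "H"
        allowed = {token: p for token, p in scores.items() if token[0] == wanted}
        decoded.append(max(allowed, key=allowed.get))
    return decoded


# Hypothetical agnostic symbols: a quarter note on line 2, a half rest on line 3.
symbols = [("note.quarter", "L2"), ("rest.half", "L3")]
split = to_split_sequence(symbols)

# Hypothetical scores in which an unconstrained argmax would break the alternation.
step_scores = [
    {("S", "note.quarter"): 0.4, ("H", "L2"): 0.6},
    {("S", "note.quarter"): 0.7, ("H", "L2"): 0.3},
]
prediction = constrained_greedy(step_scores)
print(split, prediction, is_well_formed(prediction))
```

Restricting the candidates at each step is one simple way to guarantee a well-formed output; an unconstrained argmax on the same scores would have produced a height token followed by a shape token.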
12	Computational methods for the alignment and score-informed transcription of piano music Wang, Siying January 2017 This thesis is concerned with computational methods for alignment and score-informed transcription of piano music. Firstly, several methods are proposed to improve alignment robustness and accuracy when different versions of one piece of music show complex differences in acoustic conditions or musical interpretation. Secondly, score-to-performance alignment is applied to enable score-informed transcription. Although music alignment methods have improved considerably in accuracy in recent years, the task remains challenging. The research in this thesis aims to improve robustness in cases where there are substantial differences between versions and state-of-the-art methods may fail to identify a correct alignment. This thesis first exploits the availability of multiple versions of the piece to be aligned. By processing these jointly, the alignment process can be stabilised by exploiting additional examples of how a section might be interpreted or which acoustic conditions may arise. Two methods are proposed, progressive alignment and profile HMM, both adapted from the multiple biological sequence alignment task. Experiments demonstrate that these methods can indeed improve alignment accuracy and robustness over comparable pairwise methods. Secondly, this thesis presents a score-to-performance alignment method that can improve robustness in cases where some musical voices, such as the melody, are played asynchronously to others, a stylistic device used in musical expression. The asynchronies between the melody and the accompaniment are handled by treating the voices as separate timelines in a multi-dimensional variant of dynamic time warping (DTW). The method measurably improves the alignment accuracy for pieces with asynchronous voices and preserves the accuracy otherwise.
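For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the standard two-sequence DTW recursion with backtracking. It is not the multi-dimensional variant proposed in the thesis, and the pitch sequences and the absolute-difference distance are illustrative assumptions only.

```python
def dtw(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Standard DTW: return (total cost, warping path as index pairs)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which elements were aligned.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal: advance both sequences
            (cost[i - 1][j], i - 1, j),          # advance only seq_a
            (cost[i][j - 1], i, j - 1),          # advance only seq_b
        )
    path.reverse()
    return cost[n][m], path


# Hypothetical example: score pitches vs. a performance that lingers on two notes.
score = [60, 62, 64, 65]
performance = [60, 60, 62, 64, 64, 65]
total_cost, path = dtw(score, performance)
print(total_cost, path)
```

In an audio setting the sequences would be frame-wise feature vectors (for example chroma) and `dist` a vector distance; scalar pitches are used here only to keep the example readable.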
Once an accurate alignment between a score and an audio recording is available, the score information can be exploited as prior knowledge in automatic music transcription (AMT) in scenarios where a score is available, such as music tutoring. Score-informed dictionary learning is used to learn the spectral pattern of each pitch, which describes the energy distribution of the associated notes in the recording. More precisely, the dictionary learning process in non-negative matrix factorization (NMF) is constrained using the aligned score. By adapting the dictionary to a given recording in this way, the proposed method improves accuracy over the state of the art.
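One common way to constrain NMF with an aligned score is to zero the activations of every pitch at frames where the score says that pitch is silent; multiplicative updates then keep those entries at zero. The sketch below illustrates that idea in plain Python under stated assumptions: the Euclidean cost, the binary mask, the tiny matrices and all function names are hypothetical, and the thesis may use a different cost function or constraint scheme.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    return sum((x - y) ** 2 for rv, rw in zip(v, matmul(w, h)) for x, y in zip(rv, rw))


def score_informed_nmf(v, mask, n_iter=50, eps=1e-12, seed=0):
    """Factorize V (bins x frames) as W H, with H zeroed where mask is 0."""
    rng = random.Random(seed)
    n_bins, n_frames, n_pitches = len(v), len(v[0]), len(mask)
    w = [[rng.random() + 0.1 for _ in range(n_pitches)] for _ in range(n_bins)]
    # Activations start at zero wherever the aligned score marks the pitch as silent.
    h = [[(rng.random() + 0.1) * mask[k][t] for t in range(n_frames)] for k in range(n_pitches)]
    for _ in range(n_iter):
        # Multiplicative update of the activations; zeros stay zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_frames)]
             for k in range(n_pitches)]
        # Multiplicative update of the dictionary (one spectral template per pitch).
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(n_pitches)]
             for f in range(n_bins)]
    return w, h


# Hypothetical toy data: 3 frequency bins, 2 pitches, 4 frames.
w_true = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0]]
h_true = [[1.0, 0.8, 0.0, 0.0], [0.0, 0.0, 0.6, 1.0]]
mask = [[1, 1, 0, 0], [0, 0, 1, 1]]  # which pitch the aligned score allows at each frame
v = matmul(w_true, h_true)
w, h = score_informed_nmf(v, mask)
print(squared_error(v, w, h))
```

Because the mask removes the ambiguity about which template explains which frame, each column of `w` is pushed towards the spectrum of a single pitch as it sounds in this particular recording.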
13	Nástroj pro podporu cvičení klavírních skladeb / Tool for Piano Practising Ustinov, Nikita January 2020 The purpose of this work is to design and implement an application for piano practice. The main disadvantage of existing applications is the limited selection of songs that can be practised. The application resulting from this work solves that problem: the user can upload any piano recording they want to practise, and the application takes care of creating a training process. The training process consists of showing the user the notes in an edited form while simultaneously checking, via a microphone, what the user is playing. The biggest challenge of the diploma thesis was to find a way to classify the audio signal from the microphone accurately and quickly. This problem was solved using two independent neural networks with different architectures, trained on different data sets. To justify the chosen solution, all the theoretical musical and scientific concepts and methods directly related to it are presented. The resulting application is tested in three respects: accuracy, speed, and usability.
14	Nástroj pro podporu cvičení klavírních skladeb / Tool for Piano Practising Ustinov, Nikita January 2021 The purpose of this work is to design and implement an application for piano practice. The main disadvantage of existing applications is the limited selection of songs that can be practised. The application resulting from this work solves that problem: the user can upload any piano recording they want to practise, and the application takes care of creating a training process. The training process consists of showing the user the notes in an edited form while simultaneously checking, via a microphone, what the user is playing. The biggest challenge of the diploma thesis was to find a way to classify the audio signal from the microphone accurately and quickly. This problem was solved using two independent neural networks with different architectures, trained on different data sets. To justify the chosen solution, all the theoretical musical and scientific concepts and methods directly related to it are presented. The resulting application is tested in the following respects: accuracy, speed, noise resistance, and usability.
15	Towards Interactive Multimodal Music Transcription Valero-Mas, Jose J. 11 July 2017 Computer-based music transcription is of vital importance for tasks in the field known as Music Information Retrieval, owing to its usefulness as a process for obtaining a symbolic abstraction that encodes the musical content of an audio file. This dissertation studies the problem from a perspective different from the one typically considered: an interactive and multimodal one. In this paradigm the user takes on particular importance as an active part of solving the problem (interactivity); multimodality, in turn, means that different sources of information extracted from the same signal are combined to help solve the task better. Keywords: Music transcription; Pattern recognition; Human-computer interaction; Music information retrieval; Lenguajes y Sistemas Informáticos
16	Computationally efficient methods for polyphonic music transcription Pertusa, Antonio 09 July 2010 This work proposes a series of efficient methods for converting a polyphonic musical audio signal (WAV, MP3) into a score (MIDI). Keywords: Music transcription; Onset detection; Audio signal processing; Machine learning; Lenguajes y Sistemas Informáticos
17	Devil in the Strawstack, Devil in the Details: A Comparative Study of Old-Time Fiddle Tune Transcriptions Yeagle, Kalia 01 May 2020 This thesis asks what transcriptions of old-time fiddle tunes might tell us about their underlying purposes and the nature of transcription. How could differing approaches to transcription reflect the intentions of the author, and what are those intentions? What does this suggest about how aural information is prioritized? Through a comparative analysis of three transcriptions of the same recording (Tommy Jarrell’s “Devil in the Strawstack”), this thesis examines how musical information is prioritized and how transcribers have adapted their methods to better reflect the nuances of old-time music. The three transcriptions come from Clare Milliner and Walt Koken (The Milliner-Koken Collection of American Fiddle Tunes), Drew Beisswenger (Appalachian Fiddle Tunes), and John Engle. The analysis of these transcriptions suggests new frameworks for interpreting old-time fiddling, furthers conversations about the possibilities and limitations of transcription, and provides insight into the underlying purposes of transcription. Keywords: Old-time music; fiddle tunes; music transcription; ethnomusicology; American folk music; American Studies; Appalachian Studies; Ethnomusicology; Musicology
18	Automatic Music Transcription with Convolutional Neural Networks Using Intuitive Filter Shapes Sleep, Jonathan 01 October 2017 This thesis explores the challenge of automatic music transcription with a combination of digital signal processing and machine learning methods. Automatic music transcription is important for musicians who cannot transcribe music themselves or find it tedious. We start with an existing model, designed by Sigtia, Benetos and Dixon, and develop it in a number of original ways. We find that convolutional neural networks with filter shapes better tailored to spectrogram data give better and faster transcription results when the new model is evaluated on a dataset of classical piano music. We also find that employing better practices shows improved results. Finally, we open-source our test bed for pre-processing, training, and testing the models to assist future research. Keywords: Music; Automatic Music Transcription; Machine Learning; Convolutional Neural Networks; Digital Signal Processing; Artificial Intelligence and Robotics; Music
19	A Comparison of the Transcription Techniques of Godowsky and Liszt as Exemplified in Their Transcriptions of Three Schubert Lieder Cloutier, David, 1948- 12 1900 This investigation sought to compare the transcription techniques of two pianist-composers, Godowsky and Liszt, using three Schubert lieder as examples. The lieder were "Das Wandern" from Die Schöne Müllerin, "Gute Nacht" from Winterreise, and "Liebesbotschaft" from Schwanengesang. They were compared using four criteria: tonality, counterpoint, timbral effects, and harmony. Liszt, following a practice common in the nineteenth century, was primarily concerned with bringing new music into the home of the domestic pianist. The piano transcription was the most widely used and successful medium for accomplishing this. Liszt also frequently transcribed pieces by a particular composer in order to promote them by featuring them in his recitals. The Schubert lieder fall into this category. Liszt did not drastically alter the original in these compositions. Indeed, in the cases of "Liebesbotschaft" and "Das Wandern," very little alteration occurs beyond the incorporation of the melody into the piano accompaniment. Godowsky, in contrast, viewed the transcription as a vehicle for composing a new piece. He intended to improve upon the original by adding his own inspiration to it. Godowsky was particularly ingenious in adding counterpoint, often chromatic, to the original. Examples of Godowsky's use of counterpoint can be found in "Das Wandern" and "Gute Nacht." While Liszt strove to remain faithful to Schubert's intentions, Godowsky exercised his ingenuity at will, being only loosely concerned with the texture and atmosphere of the lieder. "Gute Nacht" and "Liebesbotschaft" are two examples that show how far Godowsky could stray from the original through the addition of chromatic voicing and counterpoint.
Godowsky's compositions can be viewed as perhaps the final statement on the possibilities of piano writing in the traditional sense. As such, these works deserve to be investigated and performed. Keywords: piano music; composers; music transcription; piano transcription; Godowsky, Leopold, 1870-1938; Liszt, Franz, 1811-1886; Schubert, Franz, 1797-1828 -- Songs; Piano music -- History and criticism; Arrangement (Music)
20	Automatic Music Transcription based on Prior Knowledge from Musical Acoustics. Application to the repertoires of the Marovany zither of Madagascar / Transcription automatique de musique basé sur des connaissances a prior issues de l'Acoustique Musicale. Application aux répertoires de la cithare marovany de Madagascar Cazau, Dorian 12 October 2015 Ethnomusicology is the study of music with an emphasis on its cultural, social, material, cognitive and/or biological aspects. This thesis topic, initiated by Prof. Marc Chemillier, an ethnomusicologist at the CAMS-EHESS laboratory, concerns the development of an automatic transcription system dedicated to the music repertoires of the marovany zither of Madagascar. These repertoires are transmitted orally, through a process of memorization and transformation of basic musical motifs. These motifs are an important part of the country's cultural heritage and evolve continually under the influence of other musical practices and genres. Current ethnomusicological studies aim to understand the evolution of the traditional repertoire and to preserve this heritage. To serve this cause, our work provides computational music-analysis tools for organizing and structuring audio recordings of this instrument. Automatic music transcription consists of estimating the notes of a recording through three attributes: onset time, pitch and note duration. Our work on this topic relies on incorporating prior musical knowledge into computational systems. A first step of this thesis was therefore to generate this knowledge and formalize it for that purpose. This knowledge explores the multimodal characteristics of the musical signal, including timbre, musical language and playing techniques.
The research carried out in this thesis follows two axes: a first, more applied one, consisting of developing a music transcription system dedicated to the marovany, and a second, more fundamental one, consisting of a deeper analysis of the contributions of knowledge to automatic music transcription. The first axis requires very high transcription accuracy (i.e. an F-measure above 95% with standard error tolerances) so that the transcriptions can serve as analytical material in musicological studies. To this end, we use a multichannel capture technology applied to plucked string instruments. Systems built on this technology use one sensor per string, which makes it possible to decompose a polyphonic signal into a sum of monophonic signals, one per string, greatly simplifying the transcription task. Different types of sensors (optical, piezoelectric, electromagnetic) were tested. After experimentation, the piezoelectric sensors, although invasive, proved to have the best signal-to-noise ratios and inter-sensor separability. This technology also enabled the development of a ground-truth database, which is indispensable for the quantitative evaluation of music transcription systems. The second axis investigates in more depth the incorporation of prior knowledge into automatic music transcription systems. Two statistical methods were used as the theoretical foundation: PLCA (Probabilistic Latent Component Analysis) for multi-pitch estimation and HMMs (Hidden Markov Models). / Ethnomusicology is the study of musics around the world, with an emphasis on their cultural, social, material, cognitive and/or biological aspects. This PhD subject, initiated by Prof.
Marc Chemillier, ethnomusicologist at the CAMS-EHESS laboratory, deals with the development of an automatic transcription system dedicated to the repertoires of the traditional marovany zither from Madagascar. These repertoires are orally transmitted, resulting from a process of memorization/transformation of original base musical motifs. These motifs represent an important cultural heritage and evolve continually under the influence of other musical practices and genres, mainly due to globalization. Current ethnomusicological studies aim at understanding the evolution of the traditional repertoire through the transformation of its original base motifs, and at preserving this heritage. Our objectives serve this cause by providing computational tools of musical analysis to organize and structure audio recordings of this instrument. Automatic Music Transcription (AMT) consists in automatically estimating the notes in a recording through three attributes: onset time, duration and pitch. In the long run, AMT systems, whose purpose is to retrieve meaningful information from complex audio, could be used in a variety of user scenarios, such as searching and organizing music collections with barely any human labor. One common denominator of our different approaches to the task of AMT lies in the use of explicit music-related prior knowledge in our computational systems. One step of this PhD thesis was therefore to develop tools to generate this information automatically. We chose not to restrict ourselves to a specific class of prior knowledge, and rather to explore the multi-modal characteristics of musical signals, including both timbre (i.e. modeling of the generic "morphological" features of the sound related to the physics of an instrument, e.g. intermodulation, sympathetic resonances, inharmonicity) and musicological (e.g. harmonic transition, playing dynamics, tempo and rhythm) classes. This prior knowledge can then be used in computational transcription systems.
The research work on AMT performed in this PhD can be divided into more "applied research" (axis 1), with the development of ready-to-use operational transcription tools meeting the current needs of ethnomusicologists for reliable automatic transcriptions, and more "basic research" (axis 2), providing deeper insight into the functioning of these tools. Our first axis of research requires a transcription accuracy high enough (i.e. an average F-measure above 95% with standard error tolerances) to provide analytical support for musicological studies. Despite great enthusiasm for AMT challenges, and several commercially available audio-to-MIDI converters, perfect polyphonic AMT systems are out of reach of today's algorithms. In this PhD, we explore the use of multichannel capturing sensory systems for AMT of several acoustic plucked string instruments, including the following traditional African zithers: the marovany (Madagascar), the Mvet (Cameroon), the N'Goni (Mali). These systems use multiple string-dependent sensors to retrieve, separately for each string, some physical features of its vibrations. For the AMT task, such a system has an obvious advantage, as it allows breaking down a polyphonic musical signal into the sum of monophonic signals, one per string. Keywords: Automatic Music Transcription; Acoustic and statistical modelling; Machine learning; Plucked string instruments; Musical acoustics; Computer-assisted musicological analysis; Statistical modeling and learning; Musical Acoustics knowledge; 620.2
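The accuracy target quoted in record 20 is expressed as an F-measure with error tolerances. The sketch below shows one simple way such a note-level score can be computed. It rests on assumptions not stated in the abstract: a note counts as correct when its pitch matches and its onset falls within 50 ms of a reference note (a commonly used tolerance), offsets are ignored, and matching is greedy rather than an optimal assignment. The note lists are invented for illustration.

```python
def note_f_measure(reference, estimated, onset_tol=0.05):
    """Return (F-measure, precision, recall) for lists of (onset_seconds, midi_pitch)."""
    unmatched = list(reference)  # reference notes not yet claimed by an estimate
    true_pos = 0
    for onset, pitch in estimated:
        for ref in unmatched:
            # A hit needs the same pitch and an onset within the tolerance window.
            if ref[1] == pitch and abs(ref[0] - onset) <= onset_tol:
                unmatched.remove(ref)  # each reference note can be matched once
                true_pos += 1
                break
    precision = true_pos / len(estimated) if estimated else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    return 2 * precision * recall / (precision + recall), precision, recall


# Hypothetical ground truth and transcription output.
reference = [(0.00, 60), (0.50, 62), (1.00, 64), (1.50, 65)]
estimated = [(0.02, 60), (0.51, 62), (1.20, 64), (1.49, 65), (1.80, 67)]
f_measure, precision, recall = note_f_measure(reference, estimated)
print(f_measure, precision, recall)
```

In this toy case three of the five estimated notes are correct and three of the four reference notes are found, so the late note at 1.20 s and the spurious note at 1.80 s both lower the score.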
