31

Software pro digitální mixážní pult / Software for Digital Mixing Console

Zoň, Robin January 2018 (has links)
This thesis describes the design and implementation of software for a digital mixing console built on the Windows platform. The software offers real-time multi-channel audio processing with multiple input and output units, signal routing between these units, and insertion and management of VST plug-in modules. The audio interface is connected using ASIO. The project is split into several applications. The main application, which computes the audio samples and handles insertion and management of plug-ins, is written in C++ using the JUCE framework. It can be controlled either through its own local graphical interface or through a web control interface written in TypeScript using React. The web interface lets the user control VST plug-in modules through its own custom implementation of plug-in control.
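
To make the kind of real-time routing such a console performs more concrete, here is a hedged numpy sketch that applies per-channel gains and a routing matrix to one block of samples. The array names and shapes are illustrative assumptions, not code from the thesis (which is implemented in C++/JUCE):

```python
import numpy as np

def process_block(inputs, gains, routing):
    """Mix one audio block: inputs is (n_in, n_frames),
    gains is (n_in,), routing is (n_out, n_in).
    Returns (n_out, n_frames) of routed, gain-adjusted samples."""
    scaled = inputs * gains[:, None]   # apply per-input-channel gain
    return routing @ scaled            # route/sum inputs into output buses

# Example: 4 inputs, 2 output buses, one 512-frame block
block = np.random.randn(4, 512).astype(np.float32)
gains = np.array([1.0, 0.5, 0.8, 0.0], dtype=np.float32)
routing = np.array([[1, 0, 1, 0],     # bus 0 takes inputs 0 and 2
                    [0, 1, 0, 1]], dtype=np.float32)
out = process_block(block, gains, routing)
print(out.shape)  # (2, 512)
```
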
32

Tvorba grafické knihovny pro zásuvné moduly VST / Creation of the Graphic Library for VST Plug-Ins

Dufka, Filip January 2019 (has links)
This Master's thesis covers the use of graphical user interfaces in audio plug-ins. The first part describes the structure and rendering techniques of the graphical libraries used in audio plug-ins and questions their efficiency and memory utilization. The next part compares these techniques with state-of-the-art methods from computer graphics and the gaming industry and analyzes their possible use in audio graphical interfaces. The thesis then addresses inefficient methods in frequently used situations by introducing deferred shading into an audio parameter editor with the goal of photorealistic rendering. A second introduced technique, "Knob Normal Maps", reduces the number of images needed to render a turning knob from hundreds to six with comparable results. The goal of the thesis was to create a graphical library; the resulting library, named RealBox, has these techniques as its core features. It reduces the work needed to build graphical user interfaces for both 2D and 3D use cases. Full class and method documentation for the RealBox library was assembled, and the library was tested during the creation of three VST plug-ins with different approaches and emphasis on quick work and fine rendering. The graphical library offers a new, faster way of creating audio plug-in interfaces.
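
To illustrate why a normal map can replace hundreds of pre-rendered knob frames, the following is a hedged numpy sketch of generic Lambertian relighting: the stored normals are rotated with the knob and re-shaded at runtime. This is a textbook-style demo under assumed names, not the RealBox implementation:

```python
import numpy as np

def shade_knob(normal_map, angle_rad, light_dir=(0.3, 0.5, 0.8)):
    """Relight a knob image for a given rotation angle.
    normal_map: (H, W, 3) unit normals; the knob turns about the z (view)
    axis, so we rotate the stored normals instead of re-rendering frames."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    n = normal_map @ rot.T                  # rotate normals with the knob
    light = np.asarray(light_dir, dtype=float)
    light /= np.linalg.norm(light)
    return np.clip(n @ light, 0.0, 1.0)     # per-pixel Lambertian shading

# A flat 64x64 "normal map" facing the camera, turned by 45 degrees
nm = np.zeros((64, 64, 3)); nm[..., 2] = 1.0
img = shade_knob(nm, np.deg2rad(45.0))
```
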
33

Real-time audio processing for an embedded Linux system using a dual-kernel approach

Kulkarni, Nitin January 2017 (has links)
Professional audio processing systems such as digital musical instruments, audio mixers, etc. must operate under very tight constraints on overall processing latency and CPU performance. Consequently, traditional implementations are still mostly based on specialized hardware such as Digital Signal Processors (DSP) and Real-Time Operating Systems (RTOS) to meet such requirements. However, such systems are minimalistic in nature and lack many features (e.g. network connectivity, wide hardware support, etc.) that a general-purpose operating system such as Linux offers. Linux is a very popular choice of operating system for embedded devices, and many developers have started to use it for designing real-time systems with relaxed timing constraints. However, none of the available solutions using a standard Linux kernel can satisfy the low-latency requirements of professional audio systems. In this thesis, a dual-kernel approach is employed to enable an embedded Linux system to process audio with low roundtrip latency. The solution is developed using the Xenomai framework for real-time computation, which is based on a technique known as the interrupt pipeline (I-pipe). The I-pipe enables interrupt virtualization through a micro-kernel running between the Linux kernel and the interrupt controller hardware. The designed system includes an x86 Atom System-on-Chip (SoC), an XMOS microcontroller, and audio converters to and from the analog domain. Custom kernel drivers and libraries have been developed to expose the audio programming functionality to programs running in user space. As a result, the system achieves robust real-time performance appropriate for professional audio applications while retaining all the advantages of a traditional Linux solution, such as compatibility with external devices and ease of programming. The real-time capability is measured by evaluating the worst-case response time of the real-time tasks in comparison with the same metrics obtained under a standard Linux kernel. The overall roundtrip latency of audio processing is shown to improve by almost an order of magnitude (around 2.5 ms instead of 20 ms).
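
The reported roundtrip figures can be related to audio buffer sizes with simple arithmetic. The sketch below is a generic back-of-the-envelope estimate; the buffer sizes, period counts, and sample rate are illustrative assumptions, not values taken from the thesis:

```python
def roundtrip_ms(frames_per_period, sample_rate_hz, periods_in=2, periods_out=2):
    """Rough roundtrip latency: input buffering plus output buffering,
    ignoring converter and scheduling overhead."""
    period_ms = 1000.0 * frames_per_period / sample_rate_hz
    return (periods_in + periods_out) * period_ms

# e.g. 32-frame periods at 48 kHz, double-buffered each way -> ~2.7 ms
print(round(roundtrip_ms(32, 48000), 2))
# a larger, desktop-style configuration, e.g. 256 frames -> ~21.3 ms
print(round(roundtrip_ms(256, 48000), 2))
```
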
34

Robot s autonomním audio-vizuálním řízením / Robot with autonomous audio-video control

Dvořáček, Štěpán January 2019 (has links)
This thesis describes the design and realization of a mobile robot with autonomous audio-visual control. The robot is capable of movement based on sensors consisting of a camera and a microphone. The mechanical part consists of components produced by 3D printing and omnidirectional Mecanum wheels. The software uses the OpenCV library for image processing and computes MFCC and DTW for voice command detection.
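
As a hedged sketch of the MFCC-plus-DTW idea mentioned above, the following uses librosa for feature extraction and a small numpy dynamic-time-warping routine to match a recording against command templates. The file names are placeholders and this is not the robot's actual code:

```python
import numpy as np
import librosa

def mfcc_features(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def dtw_cost(a, b):
    """Classic dynamic time warping cost between two feature sequences."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

# Recognize a command by nearest DTW distance to stored templates
templates = {"forward": mfcc_features("forward.wav"),
             "stop": mfcc_features("stop.wav")}
query = mfcc_features("unknown.wav")
print(min(templates, key=lambda k: dtw_cost(query, templates[k])))
```
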
35

Smart Sheet Music Reader for Android / Smart Sheet Music Reader for Android

Smejkal, Vojtěch January 2014 (has links)
Areas such as automatic page turning and automatic musical accompaniment have been studied for several decades. This thesis summarizes current methods for real-time computer score following. It also deals with musical features such as chroma classes and synthesized spectral templates, and describes key parts of the system such as the Short-Time Fourier Transform and Dynamic Time Warping. Within the project, a custom system for tracking the player's position in the score was designed, developed, and subsequently implemented as a mobile application. The resulting system can follow even pieces played at a considerably different tempo, with pauses during the performance, or with minor deviations from the written notes.
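
To illustrate the chroma features mentioned in the abstract, the sketch below folds STFT magnitude bins into 12 pitch classes using plain numpy. It is a textbook-style computation under assumed parameters, not the application's implementation:

```python
import numpy as np

def chroma_frame(frame, sr, n_fft=4096, fmin=55.0):
    """Fold the magnitude spectrum of one audio frame into 12 pitch classes."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    chroma = np.zeros(12)
    for f, mag in zip(freqs, spec):
        if f < fmin:
            continue                          # skip DC and very low bins
        midi = 69 + 12 * np.log2(f / 440.0)   # frequency -> MIDI pitch
        chroma[int(round(midi)) % 12] += mag
    return chroma / (chroma.sum() + 1e-12)    # normalize to a distribution

sr = 22050
t = np.arange(2048) / sr
a4 = np.sin(2 * np.pi * 440.0 * t)            # pure A4 tone
print(np.argmax(chroma_frame(a4, sr)))        # pitch class 9 == A
```
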
36

Evaluation of Methods for Sound Source Separation in Audio Recordings Using Machine Learning

Gidlöf, Amanda January 2023 (has links)
Sound source separation is a popular and active research area, especially with modern machine learning techniques. This thesis focuses on single-channel separation of two speakers into individual streams, specifically the case where the two speakers are also accompanied by background noise. Three separation methods are evaluated: Conv-TasNet, DPTNet, and FaSNetTAC. The methods were used to train models that perform the sound source separation, and these models were evaluated and validated through three experiments. First, previous results for the chosen separation methods were reproduced. Second, models applicable to NFC's datasets and applications were created, to fulfill the aim of this thesis. Last, all models were evaluated on an independent dataset similar to NFC's datasets. The results were assessed using the metrics SI-SNRi and SDRi. The thesis provides recommended models and methods suitable for NFC applications, concluding in particular that Conv-TasNet and DPTNet are reasonable choices.
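
For reference, the SI-SNR improvement (SI-SNRi) metric named above can be computed as follows. This is a standard formulation written out in numpy as an assumption for illustration, not code from the thesis:

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB."""
    est = est - est.mean()
    ref = ref - ref.mean()
    target = np.dot(est, ref) * ref / (np.dot(ref, ref) + eps)  # projection onto ref
    noise = est - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def si_snr_improvement(est, ref, mixture):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(est, ref) - si_snr(mixture, ref)

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
mixture = ref + noise
print(si_snr_improvement(ref + 0.1 * noise, ref, mixture))  # positive dB gain
```
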
37

Towards a Nuanced Evaluation of Voice Activity Detection Systems : An Examination of Metrics, Sampling Rates and Noise with Deep Learning / Mot en nyanserad utvärdering av system för detektering av talaktivitet

Joborn, Ludvig, Beming, Mattias January 2022 (has links)
Recently, deep learning has revolutionized many fields, one of which is Voice Activity Detection (VAD). This is of great interest to sectors of society concerned with detecting speech in sound signals; one such sector is the police, where criminal investigations regularly involve analysis of audio material. Convolutional Neural Networks (CNNs) have recently become the state-of-the-art method for detecting speech in audio, but the impact of noise and sampling rates on such methods remains incompletely understood, and there are evaluation metrics from neighboring fields that have not yet been integrated into VAD. We trained on four different sampling rates and found that changing the sampling rate can have dramatic effects on the results. We therefore recommend explicitly evaluating CNN-based VAD systems on pertinent sampling rates. Further, with increasing amounts of white Gaussian noise, we observed better performance when increasing the capacity of our Gated Recurrent Unit (GRU). Finally, we discuss how careful consideration is necessary when choosing a main evaluation metric, leading us to recommend the Polyphonic Sound Detection Score (PSDS).
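
As a hedged sketch of how increasing amounts of white Gaussian noise can be injected when stress-testing a VAD model, the utility below scales noise to a requested SNR. It is a generic helper, not the authors' experimental code, and the model call is a placeholder:

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(signal.shape)
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise

# Evaluate a VAD model at progressively harder SNRs
clean = np.random.default_rng(1).standard_normal(16000)
for snr in (20, 10, 0, -5):
    noisy = add_white_noise(clean, snr)
    # scores = vad.predict(noisy)  # hypothetical model call, assumption only
```
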
38

Multimedia Forensics Using Metadata

Ziyue Xiang (17989381) 21 February 2024 (has links)
The rapid development of machine learning techniques makes it possible to manipulate or synthesize video and audio information while introducing nearly undetectable artifacts. Most media forensics methods analyze the high-level data (e.g., pixels from videos, temporal signals from audio) decoded from compressed media data. Since media manipulation or synthesis methods usually aim to improve the quality of such high-level data directly, acquiring forensic evidence from these data has become increasingly challenging. In this work, we focus on media forensics techniques using the metadata in media formats, which includes container metadata and coding parameters in the encoded bitstream. Since many media manipulation and synthesis methods do not attempt to hide metadata traces, it is possible to use them for forensics tasks. First, we present a video forensics technique using metadata embedded in MP4/MOV video containers. Our proposed method achieved high performance in video manipulation detection, source device attribution, social media attribution, and manipulation tool identification on publicly available datasets. Second, we present a transformer neural network based MP3 audio forensics technique using low-level codec information. Our proposed method can localize multiple compressed segments in MP3 files, with higher localization accuracy than other methods. Third, we present an H.264-based video device matching method. This method can determine whether two video sequences were captured by the same device even if the method has never encountered the device. Our proposed method achieved good performance in a three-fold cross-validation scheme on a publicly available video forensics dataset containing 35 devices. Fourth, we present a Graph Neural Network (GNN) based approach for the analysis of MP4/MOV metadata trees. The proposed method is trained using Self-Supervised Learning (SSL), which increases its robustness and makes it capable of handling missing/unseen data. Fifth, we present an efficient approach to compute the spectrogram feature with MP3 compressed audio signals. The proposed approach decreases the complexity of speech feature computation by ~77.6% and saves ~37.87% of MP3 decoding time. The resulting spectrogram features lead to higher synthetic speech detection performance.
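
To make the notion of container metadata concrete, here is a hedged sketch that lists the top-level boxes (atoms) of an MP4/MOV file in pure Python. It follows the publicly documented ISO base media file format layout and is not the authors' forensics tool:

```python
import struct

def list_top_level_boxes(path):
    """Yield (box_type, size_in_bytes) for the top-level boxes of an MP4/MOV file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            name = box_type.decode("latin-1")
            if size == 1:                        # 64-bit extended size follows
                size = struct.unpack(">Q", f.read(8))[0]
                body = size - 16
            elif size == 0:                      # box extends to end of file
                yield name, None
                break
            else:
                body = size - 8
            yield name, size
            f.seek(body, 1)                      # skip the box payload

# Typical output for a camera file: ftyp, moov, free, mdat (order varies by device)
# for name, size in list_top_level_boxes("clip.mp4"):
#     print(name, size)
```
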
39

Machine Learning for Speech Forensics and Hypersonic Vehicle Applications

Emily R Bartusiak (6630773) 06 December 2022 (has links)
Synthesized speech may be used for nefarious purposes, such as fraud, spoofing, and misinformation campaigns. We present several speech forensics methods based on deep learning to protect against such attacks. First, we use a convolutional neural network (CNN) and transformers to detect synthesized speech. Then, we investigate closed-set and open-set speech synthesizer attribution. We use a transformer to attribute a speech signal to its source (i.e., to identify the speech synthesizer that created it). Additionally, we show that our approach separates different known and unknown speech synthesizers in its latent space, even though it has not seen any of the unknown speech synthesizers during training. Next, we explore machine learning for an objective in the aerospace domain.

Compared to conventional ballistic vehicles and cruise vehicles, hypersonic glide vehicles (HGVs) exhibit unprecedented abilities. They travel faster than Mach 5 and maneuver to evade defense systems and hinder prediction of their final destinations. We investigate machine learning for identifying different HGVs and a conic reentry vehicle (CRV) based on their aerodynamic state estimates. We also propose an HGV flight phase prediction method. Inspired by natural language processing (NLP), we model flight phases as "words" and HGV trajectories as "sentences." Next, we learn a "grammar" from the HGV trajectories that describes their flight phase transition patterns. Given "words" from the initial part of an HGV trajectory and the "grammar", we predict future "words" in the "sentence" (i.e., future HGV flight phases in the trajectory). We demonstrate that this approach successfully predicts future flight phases for HGV trajectories, especially in scenarios with limited training data. We also show that it can be used in a transfer learning scenario to predict flight phases of HGV trajectories that exhibit new maneuvers and behaviors never seen before during training.
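
Following the flight-phases-as-words analogy described above, a minimal transition "grammar" could be sketched as a bigram (Markov) model like the one below. The phase labels and sequences are made up for illustration; this is not the thesis's method or data:

```python
from collections import Counter, defaultdict

def learn_transitions(trajectories):
    """Count phase-to-phase transitions across training trajectories."""
    counts = defaultdict(Counter)
    for phases in trajectories:
        for prev, nxt in zip(phases, phases[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, current_phase):
    """Most likely next flight phase given the current one."""
    if current_phase not in counts:
        return None
    return counts[current_phase].most_common(1)[0][0]

# Hypothetical phase sequences ("words" in the NLP analogy)
train = [["boost", "glide", "pull-up", "glide", "dive"],
         ["boost", "glide", "dive"],
         ["boost", "glide", "pull-up", "dive"]]
grammar = learn_transitions(train)
print(predict_next(grammar, "glide"))  # most frequent successor of "glide"
```
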
