Global ETD Search

1	Video Quality Assessment in Broadcasting Prytz, Anders January 2010 In broadcasting, the assessment of video quality is mostly done by a group of highly experienced people. This is a time-consuming task that demands a lot of resources. The goal of this thesis is to investigate whether perceived video quality can be assessed with objective quality assessment methods. The work is done in collaboration with Telenor Satellite Broadcasting AS, to improve their quality verification process from a broadcasting perspective. The material used is from the SVT Fairytale tape and a tape from the Norwegian cup final in football 2009. All material is in the native resolution of 1080i and is encoded in the H.264/AVC format. All chosen compression settings are more or less used in daily broadcasting. A subjective video quality assessment has been carried out to create a comparison basis of perceived quality. The subjective assessment sessions were carried out following ITU recommendations. Telenor SBc provided a video quality analysing system, the Video Clarity Clearview system, which contains the objective methods PSNR, DMOS and JND. DMOS and JND are two pseudo-subjective assessment methods that use objective methods mapped to subjective results. The hope is that these methods predict the perceived quality and ease quality assessment in broadcasting. The correlation between the subjective and objective results is tested with linear, exponential and polynomial fitting functions. None of the methods achieved a correlation that justifies using objective methods to assess perceived quality independently of content. The best correlation result is 0.75, for the objective DMOS method. The analysis shows that there are possible dependencies in the relationship between subjective and objective results. Content-dependent correlation is therefore investigated by measuring the spatial and temporal information of the sequences.
The results for content-dependent relationships between subjective and objective results are good. There are some indications that the two pseudo-subjective methods, JND and DMOS, can be used to assess perceived video quality. This applies when the mapping functions depend on the spatial and temporal information of the reference sequences. The correlations achieved for dependent fitting functions that have a suitable progression are in the range 0.9 to 0.98. In the subjective tests, the subjects were non-experts in quality evaluation. Some of the results indicate that subjects might have a problem assessing sequences with high spatial information. This thesis creates a basis for further research on the use of objective methods to assess perceived quality. ntnudaim SIE7 kommunikasjonsteknologi Lyd- og bildebehandling
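To make the comparison between objective and subjective scores concrete, the following is a minimal sketch of a PSNR computation and a Pearson correlation coefficient. It is not taken from the thesis: the function names, the 8-bit peak value of 255 and the choice of Pearson correlation are assumptions made for illustration, and the abstract does not state which correlation measure or fitting procedure was actually used.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel sequences.

    `peak` is the maximum pixel value; 255 assumes 8-bit samples (an assumption).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def pearson(x, y):
    """Pearson correlation between two score lists, e.g. objective vs. subjective."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

In a setup like the one described, `pearson` would be applied to one objective score and one mean subjective score per test sequence, possibly after mapping the objective scores through a fitted function.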
2	Voice Transformation based on Gaussian mixture models Gundersen, Terje January 2010 In this thesis, a probabilistic model for transforming a voice to sound like another specific voice is tested. The model is fully automatic and only requires some 100 training sentences from both speakers with the same acoustic content. The classical source-filter decomposition allows prosodic and spectral transformation to be performed independently. The transformations are based on a Gaussian mixture model (GMM) and a transformation function suggested by Y. Stylianou. Feature vectors of the same content from the source and target speaker, aligned in time by dynamic time warping, are fitted to a GMM. The short-time spectra, represented as cepstral coefficients derived from LPC, and the pitch periods, represented as fundamental frequency estimated with the RAPT algorithm, are transformed with the same probabilistic transformation function. Several techniques for spectrum and pitch transformation were assessed, in addition to some novel techniques for smoothing the fundamental frequency contour. The pitch transform was implemented on the excitation signal from the inverse LP filtering by time-domain PSOLA. The transformed spectrum parameters were used in the synthesis filter, with the transformed excitation as input, to yield the transformed voice. A listening test was performed with the best setup from the objective tests, and the results indicate that it is possible to recognise the transformed voice as the target speaker with a 72 % probability. However, the synthesised voice was affected by a muffling effect due to incorrect frequency transformation, and the prosody sounded somewhat robotic. ntnudaim SIE7 kommunikasjonsteknologi Lyd- og bildebehandling
5	Audiovisual Contents Segmentation Sundøy, Kristoffer Johan January 2010 The objective of this thesis is to detect high-level semantic ideas that help impose a structure on television talk shows. Indexing TV shows is a subject that, to our knowledge, is rarely discussed in the scientific community. There is no common understanding of what this imposed structure should look like. We can say that the purpose is to organise the audiovisual content into sections that convey specific information. It thus encompasses issues as diverse as scene segmentation, speech noise detection, speaker identification, etc. The basic problem of structuring is the gap between the information extracted from the visual data flow and the human interpretation made by the user of these data. Numerous studies have examined the organisation of highly structured video content, so the state of the art includes many studies on sport and newscast transmissions. Our goal is to detect key audiovisual events using a variety of descriptors and generic classifiers. We propose a generic approach that is able to address all TV-show indexing problems. This enables an operator to use a single tool to infer a logical structure. Our approach can be considered "semi-automatic" in the sense that the training data is collected on the fly by the operator, who is asked to arbitrarily select one video excerpt for each concept involved. We have assessed a wide selection of audio and video features, used MKL as a feature selection algorithm, and then built various content detectors and segmentors useful for imposing broad semantic classes on television data. This master's thesis was set forth by TELECOM ParisTech and was begun there March 1, 2010. This final report was submitted to TELECOM ParisTech, NTNU and Institute EURECOM August 29, 2010. ntnudaim:5746 SIE7 kommunikasjonsteknologi Lyd- og bildebehandling
6	Point to point wireless audio with limited bandwidth and processing Brenden, Even Steen January 2012 This project investigates the possibility of implementing a point-to-point wireless audio transmission system for real-time operation on a specific embedded platform. The platform is originally intended for wireless PC peripherals such as keyboards and mice, and uses a proprietary radio module for transmission. The report accompanying the project can be used as a guide for doing a similar project on other platforms. The report is organized as follows. Properties of the platform are presented, and an assessment of how it can perform in terms of computational power is carried out. ADPCM is chosen as the compression scheme and its underlying theory is briefly presented. As the platform does not do native floating-point operations, fixed-point number representation and operations are discussed. Error concealment for packet transmission is briefly discussed. An assessment of the performance of the on-chip analog-to-digital converter is carried out, and an approach to implementing a 1-bit digital-to-analog converter using pulse-width modulation is discussed. Implementation issues when designing a real-time audio communication system are discussed. Performance tests are carried out, and finally a set of real-world applications for the system is presented. The project finds that a point-to-point wireless audio transmission system for half-duplex, real-time operation is possible on the given platform. Because of radio limitations, full-duplex operation has yet to be proven possible. A separate project investigates the radio module for the same platform in detail. For confidentiality, the name of the platform is not referenced. ntnudaim:8133 MTKOM kommunikasjonsteknologi Lyd- og bildebehandling
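The fixed-point arithmetic mentioned in this abstract can be illustrated with a small sketch. The Q15 format (16-bit signed values with 15 fractional bits), the rounding rule and the helper names below are assumptions chosen for illustration; the abstract does not say which fixed-point format the project used. Python integers stand in for the 16-bit and 32-bit registers of an embedded target.

```python
Q = 15                      # number of fractional bits (Q15 is an assumed format)
INT16_MIN = -(1 << 15)
INT16_MAX = (1 << 15) - 1


def saturate(value):
    """Clamp a wide intermediate result to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def to_q15(x):
    """Convert a float in roughly [-1, 1) to a saturated Q15 integer."""
    return saturate(int(round(x * (1 << Q))))


def from_q15(v):
    """Convert a Q15 integer back to a float."""
    return v / (1 << Q)


def q15_mul(a, b):
    """Multiply two Q15 values using integer operations only.

    The full product has 30 fractional bits, so half an output LSB is added
    for rounding before shifting right by 15 bits, and the result is saturated.
    """
    product = a * b
    product += 1 << (Q - 1)
    return saturate(product >> Q)
```

For example, `q15_mul(to_q15(0.5), to_q15(0.5))` returns 8192, which `from_q15` maps back to 0.25. Saturation matters for the corner case of multiplying -1.0 by -1.0, whose exact result of +1.0 is not representable in Q15.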
7	A framework for the simulation and validation of acoustic fields from medical ultrasound transducers Bakstad, Ole January 2012 The unified simulation framework for medical ultrasound, FieldSim, currently supports linear and non-linear simulations by using Field II and Abersim, respectively. In this thesis the quasi-linear simulation tool, Propose, is incorporated in FieldSim and verified. The implementation uses Field II to generate the initial pressure propagated by Propose. It is shown to produce satisfactory results when compared to standalone Propose, Field II and Abersim for both the fundamental and second harmonic. The results are also verified in the water tank. The running time is found to be slower than standalone Propose, but still substantially quicker than Abersim for non-linear simulations. By combining core features of the FieldSim framework and Propose, new features are presented. It is now possible to easily simulate the second harmonic from a transducer with a measured impulse response and arbitrary excitation pulse using Propose in minutes, compared to hours with Abersim. By rotating the initial field, steering can now be achieved in Propose in a few lines of MATLAB code. A link between a research scanner and the FieldSim framework is presented. When finalized, a FieldSim simulation can be converted to a file read by PyTexo, making it possible to use the exact same setup for both simulations and measurements in the water tank. The current implementation supports single ultrasound beams and B-mode with fixed focus. ntnudaim:6800 MTKOM kommunikasjonsteknologi Lyd- og bildebehandling
8	Reliable Broadcast Contribution over the Public Internet Markman, Martin Alexander Jarnfeld, Tokheim, Stian January 2012 Broadcast contribution is point-to-point media transfer from recording sites to local editing studios, between studio facilities and to distribution centers. The contribution phase has strict Quality of Service (QoS) requirements for reliability and bandwidth: any error might degrade end users' Quality of Experience (QoE) in the subsequent distribution phase. Dedicated IP contribution networks have become the preferred technology for contribution from content creation sites. Occasionally, however, contribution happens at a site without access to a dedicated IP contribution network, in which case the broadcaster must use less optimal technologies. Being based on IP, the public Internet may be a superior solution in some scenarios due to its high bandwidth and geographical coverage, if the Internet can conform to the strict contribution requirements for QoS and QoE. This thesis attempts to give a clear answer to this question. Our investigation was done in three parts. First, we uncovered recent Internet QoS trends in Norway. We found that the Internet has become a video delivery platform, which in turn has resulted in a bandwidth increase in access networks. Bandwidth in residential access links now conforms to contribution requirements. ISPs make a profit according to the level of QoS offered, so broadcasters can expect high QoS. Also, broadcasters can buy QoS guarantees, which may be a viable and safe solution. Second, we recorded over 21 hours of Internet QoS statistics on a connection traversing 11 routers and one peering point. The measured level of every QoS metric (packet loss, jitter and re-ordering) conformed to professional contribution network requirements, except the rate of packet loss bursts. However, no burst longer than 200 ms was recorded and no two consecutive bursts happened within a 2-second time frame.
Based on this, we explained how simple error control strategies can correct or mask packet loss bursts with a 200-250 ms delay tradeoff and 15-30% bandwidth overhead. Third, we did subjective tests over the Internet with two professional JPEG2000 contribution gateways delivered by T-Vips. A full movie was encoded at 70 Mbit/s, a bitrate used for very high quality contribution, and shown to a test panel of 24 participants. By analyzing questionnaires, we proved that contribution over the Internet yields QoE equally good as cable TV. Also, we found that noticeable degradations due to packet loss happened once per hour on average. Furthermore, packet loss bursts below 4 ms were generally not visible to the viewers. Because the Internet provides both the required QoS and QoE, we concluded that broadcasters can do contribution over the Internet at the required level of quality whenever this is a favorable option. ntnudaim:6992 MTKOM kommunikasjonsteknologi Lyd- og bildebehandling
9	Improving the Visual Experience When Coding Computer Generated Content Using the H.264 Standard Berthelsen, Nicolai January 2011 The purpose of this Master's thesis was to improve the visual experience when coding computer generated content (CGC) using the H.264 standard. As H.264 is designed primarily to code natural video, it exhibits weaknesses when coding CGC at low bit rates. The thesis has focused on identifying and modifying the components in the H.264 algorithm responsible for the occurrence of unwanted noise artifacts. The research method was based on performing quantitative research to confirm or deny the hypothesis that the H.264 algorithm performs sub-optimally when coding CGC. Experiments were conducted using coders written specifically for the thesis. The results from these experiments were then analyzed, and conclusions were drawn based on empirical observations. An implementation of H.264 was used to identify the noise artifacts resulting from coding CGC at low rates. The results indicated that H.264 indeed performs sub-optimally when coding CGC. We learned that the reason for this was that the characteristics of CGC led to the signal being more compactly represented in the spatial domain than in the transform domain. We therefore proposed to omit the component transform and quantize the residual signal directly. This method, called residual scalar quantization (RSQ), was shown to outperform traditional H.264 coding for certain CGC in terms of quantified visual quality and bit rate. However, even when outperformed, the RSQ coder did not exhibit any of the noise artifacts present when coding with the traditional coder. We also introduced Rate-Distortion optimization, which allowed the coder to adaptively choose between traditional and RSQ coding, ensuring that each block is coded optimally, independent of the source content. This scheme was shown to outperform both stand-alone coders for all sample content.
A quantizer with representation levels tailored specifically for the characteristics of CGC was also presented, and experiments showed that it outperformed uniform quantization when coding CGC. The results in this thesis were produced by simplified versions of the actual coders, and may not be completely accurate. However, the accumulated results indicate that RSQ may indeed outperform traditional H.264 coding for CGC. To confirm the theories that have been presented, the proposed techniques should be implemented in a full-scale implementation of H.264 and the experiments repeated. ntnudaim:6284 MTKOM kommunikasjonsteknologi Lyd- og bildebehandling
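The idea of quantizing the residual directly in the spatial domain, and of letting the coder pick a mode per block, can be sketched as follows. This is not the coder from the thesis: the uniform quantizer, the Lagrangian cost J = D + lambda * R and all names are assumptions for illustration, and the distortion and rate figures for each mode are expected to come from elsewhere.

```python
def quantize(residual, step):
    """Uniform scalar quantization of residual samples, with no transform.

    Each sample is mapped to the index of the nearest multiple of `step`;
    ties are rounded away from zero.
    """
    levels = []
    for r in residual:
        magnitude = int(abs(r) / step + 0.5)
        levels.append(magnitude if r >= 0 else -magnitude)
    return levels


def dequantize(levels, step):
    """Reconstruct residual samples from their quantization indices."""
    return [level * step for level in levels]


def choose_mode(candidates, lam):
    """Pick the coding mode with the lowest cost J = D + lam * R.

    `candidates` maps a mode name to a (distortion, rate_in_bits) pair.
    The Lagrangian cost is a common formulation and is assumed here.
    """
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])
```

With a step size of 4, `quantize([0, 3, -3, 10], 4)` gives `[0, 1, -1, 3]`, which `dequantize` maps back to `[0, 4, -4, 12]`. A per-block decision then amounts to calling `choose_mode` with the measured distortion and rate of the transform-based and the direct-quantization candidates.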
10	Numerisk modellering av spredning fra kuler og sylindere anvendt i romakustikk / Numerical Modelling of Scattering from Spheres and Cylinders applied to Room Acoustics Waagø, Per January 2010 This thesis deals with the scattering of sound from spheres and cylinders. The diffusing effect of a constellation of scattering cylinders in a room is investigated by means of numerical simulations with the finite element method, carried out in COMSOL Multiphysics. The model of the room is a simplified two-dimensional model inspired by St. Olav's Cathedral (St. Olavs domkirke) in Trondheim, where several clusters of spherical glass lamps hang and probably help to make the sound field more diffuse. The scattering patterns from a sphere and a cylinder are studied and compared analytically. The analytical expression for scattering from a cylinder is also used to evaluate the numerical solution, and to evaluate different choices for a totally absorbing boundary condition in the model. The scattering from the collection of cylinders is compared with the scattering from a single cylinder, a square prism and a plane reflector, and with the scattering from a collection of spheres, a single sphere, a cube and a square reflector. The comparisons are made using diffusion coefficients calculated from numerical simulations. ntnudaim SIE7 kommunikasjonsteknologi Lyd- og bildebehandling

Search results