1 |
Speech Recognition Using a Synthesized Codebook
Smith, Lloyd A. (Lloyd Allen) 08 1900
Speech sounds generated by a simple waveform synthesizer were used to create a vector quantization codebook for use in speech recognition. Recognition was tested over the TI-20 isolated word data base using a conventional DTW matching algorithm. Input speech was band limited to 300 - 3300 Hz, then passed through the Scott Instruments Corp. Coretechs process, implemented on a VET3 speech terminal, to create the speech representation for matching. Synthesized sounds were processed in software by a VET3 signal processing emulation program. Emulation and recognition were performed on a DEC VAX 11/750.
The experiments were organized in 2 series. A preliminary experiment, using no vector quantization, provided a baseline for comparison.
The original codebook contained 109 vectors, all derived from 2 formant synthesized sounds. This codebook was decimated through the course of the first series of experiments, based on the number of times each vector was used in quantizing the training data for the previous experiment, in order to determine the smallest subset of vectors suitable for coding the speech data base. The second series of experiments altered several test conditions in order to evaluate the applicability of the minimal synthesized codebook to conventional codebook training.
The baseline recognition rate was 97%. The recognition rate for synthesized codebooks was approximately 92% for sizes ranging from 109 to 16 vectors. Accuracy for smaller codebooks was slightly less than 90%. Error analysis showed that the primary loss in dropping below 16 vectors was in coding of voiced sounds with high frequency second formants. The 16 vector synthesized codebook was chosen as the seed for the second series of experiments.
After one training iteration, and using a normalized distortion score, trained codebooks performed with an accuracy of 95.1%. When codebooks were trained and tested on different sets of speakers, accuracy was 94.9%, indicating that very little speaker dependence was introduced by the training.
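The usage-based codebook decimation described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the squared Euclidean distance, the function names `quantize` and `prune_codebook`, and the toy two-dimensional vectors are all assumptions introduced here.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index of the codebook entry closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def prune_codebook(codebook, training_vectors, keep):
    """Keep the `keep` entries used most often when quantizing the training data."""
    usage = Counter(quantize(v, codebook) for v in training_vectors)
    # Rank by usage count, breaking ties by original index.
    ranked = sorted(range(len(codebook)), key=lambda i: (-usage[i], i))
    survivors = sorted(ranked[:keep])  # preserve the original ordering
    return [codebook[i] for i in survivors]


# Hypothetical toy data: four code vectors, five training vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
training = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (5.1, 4.9), (0.0, 0.1)]
pruned = prune_codebook(codebook, training, keep=2)
```

Repeating the pruning step with a smaller `keep` each time mirrors the series of experiments that shrank the codebook from 109 vectors toward 16.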
|
2 |
Knowledge-based speech enhancement
Srinivasan, Sriram January 2005
Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. With advances in technology making mobile communication ubiquitous, communication anywhere has become a reality. The freedom and flexibility offered by mobile technology bring with them new challenges, one of which is robustness to acoustic background noise. Speech enhancement systems form a vital front-end for mobile telephony in noisy environments such as cars, cafeterias, and subway stations, for hearing aids, and for improving the performance of speech recognition systems. In this thesis, which consists of four research articles, we discuss both single and multi-microphone approaches to speech enhancement. The main contribution of this thesis is a framework to exploit available prior knowledge about both speech and noise. The physiology of speech production places a constraint on the possible shapes of the speech spectral envelope, and this information is captured using codebooks of speech linear predictive (LP) coefficients obtained from a large training database. Similarly, information about commonly occurring noise types is captured using a set of noise codebooks, which can be combined with sound environment classification to treat different environments differently. In paper A, we introduce maximum-likelihood estimation of the speech and noise LP parameters using the codebooks. The codebooks capture only the spectral shape. The speech and noise gain factors are obtained through a frame-by-frame optimization, providing good performance in practical nonstationary noise environments. The estimated parameters are subsequently used in a Wiener filter. Paper B describes Bayesian minimum mean squared error estimation of the speech and noise LP parameters and functions thereof, while retaining the instantaneous gain computation. Both memoryless and memory-based estimators are derived.
While papers A and B describe single-channel techniques, paper C describes a multi-channel Bayesian speech enhancement approach, where, in addition to temporal processing, the spatial diversity provided by multiple microphones is also exploited. In paper D, we introduce a multi-channel noise reduction technique motivated by blind source separation (BSS) concepts. In contrast to standard BSS approaches, we use the knowledge that one of the signals is speech and that the other is noise, and exploit their different characteristics.
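To illustrate how estimated LP parameters feed a Wiener filter, the sketch below builds speech and noise spectral envelopes from LP coefficients and gains and forms the per-frequency Wiener gain. It is only a sketch under stated assumptions: the codebook search and gain estimation from the thesis are not shown, and the function names, the first-order speech model, and the white-noise model are hypothetical choices made here.

```python
import cmath
import math


def ar_spectrum(lp_coeffs, gain, n_bins):
    """Spectral envelope gain / |A(e^{jw})|^2, with A(z) = 1 + sum_k a_k z^{-k}.

    Frequencies are sampled uniformly on [0, pi]; n_bins must be at least 2.
    """
    spectrum = []
    for b in range(n_bins):
        w = math.pi * b / (n_bins - 1)
        a = 1 + sum(c * cmath.exp(-1j * w * (k + 1))
                    for k, c in enumerate(lp_coeffs))
        spectrum.append(gain / abs(a) ** 2)
    return spectrum


def wiener_gain(speech_psd, noise_psd):
    """Per-frequency Wiener gain H = P_speech / (P_speech + P_noise)."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


# Hypothetical parameters: a low-pass first-order speech model and white noise.
speech_psd = ar_spectrum([-0.9], 1.0, 5)
noise_psd = ar_spectrum([], 1.0, 5)
gains = wiener_gain(speech_psd, noise_psd)
```

The gain stays close to 1 where the speech envelope dominates and falls toward 0 where the noise dominates, which is why the shape information held in the codebooks matters.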
|
3 |
Interference Management in Non-cooperative Networks
Motahari, Seyed Abolfazl 02 October 2009
Spectrum sharing is known as a key solution to accommodate the increasing number of users and the growing demand for throughput in wireless networks. While spectrum sharing improves the data rate in sparse networks, it suffers from interference between concurrent links in dense networks. In fact, interference is the primary barrier to enhancing the overall throughput of the network, especially at medium and high signal-to-noise ratios (SNRs). Managing interference to overcome this barrier has emerged as a crucial step in developing efficient wireless networks. This thesis deals with optimum and sub-optimum interference management and cancellation in non-cooperative networks.
Several techniques for interference management including novel strategies such as interference alignment and structural coding are investigated. These methods are applied to obtain optimum and sub-optimum coding strategies in such networks. It is shown that a single strategy is not able to achieve the maximum throughput in all possible scenarios and in fact a careful design is required to fully exploit all available resources in each realization of the system.
This thesis begins with a complete investigation of the capacity region of the two-user Gaussian interference channel. This channel models the basic interaction between two users sharing the same spectrum for data communication. New outer bounds outperforming known bounds are derived using Genie-aided techniques. It is proved that these outer bounds meet the known inner bounds in some special cases, revealing the sum capacity of this channel over a certain range of parameters which has not been known in the past.
A novel coding scheme applicable in networks with single antenna nodes is proposed next. This scheme converts a single antenna system to an equivalent Multiple Input Multiple Output (MIMO) system with fractional dimensions. Interference can be aligned along these dimensions and higher multiplexing gains can be achieved. Tools from the field of Diophantine approximation in number theory are used to show that the proposed coding scheme in fact mimics the traditional schemes used in MIMO systems where each data stream is sent along a direction and alignment happens when several streams are received along the same direction. Two types of constellation are proposed for the encoding part, namely the single layer constellation and the multi-layer constellation. Using single layer constellations, the coding scheme is applied to the two-user $X$ channel. It is proved that the total Degrees-of-Freedom (DOF), i.e. $\frac{4}{3}$, of the channel is achievable almost surely. This is the first example in which it is shown that a time invariant single antenna system does not fall short of achieving this known upper bound on the DOF.
Using multi-layer constellations, the coding scheme is applied to the symmetric three-user GIC. Achievable DOFs are derived for all channel gains. It is observed that the DOF is everywhere discontinuous (as a function of the channel gain). In particular, it is proved that for the irrational channel gains the achievable DOF meets the upper bound of $\frac{3}{2}$. For the rational gains, the achievable DOF has a gap to the known upper bounds. By allowing carry over from multiple layers, however, it is shown that higher DOFs can be achieved for the latter.
The $K$-user single-antenna Gaussian Interference Channel (GIC) is considered, where the channel coefficients are NOT necessarily time-variant or frequency selective. It is proved that the total DOF of this channel is $\frac{K}{2}$ almost surely, i.e. each user enjoys half of its maximum DOF. Indeed, we prove that the static time-invariant interference channels are rich enough to allow simultaneous interference alignment at all receivers. To derive this result, we show that single-antenna interference channels can be treated as \emph{pseudo multiple-antenna systems} with infinitely-many antennas. Such machinery enables us to prove that the real or complex $M \times M$ MIMO GIC achieves its total DOF, i.e., $\frac{MK}{2}$, $M \geq 1$. The pseudo multiple-antenna systems are developed based on a recent result in the field of Diophantine approximation which states that the convergence part of the Khintchine-Groshev theorem holds for points on non-degenerate manifolds. As a byproduct of the scheme, the total DOFs of the $K\times M$ $X$ channel and the uplink of cellular systems are derived.
Interference alignment requires perfect knowledge of channel state information at all nodes. This requirement is sometimes infeasible and users invoke random coding to communicate with their corresponding receivers. Alternative interference management needs to be implemented and this problem is addressed in the last part of the thesis. A coding scheme for a single user communicating in a shared medium is proposed. Moreover, polynomial time algorithms are proposed to obtain best achievable rates in the system. Successive rate allocation for a $K$-user interference channel is performed using polynomial time algorithms.
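For reference, the degrees-of-freedom values quoted in this abstract follow the usual high-SNR definition. The statement below is a sketch for a real-valued channel with unit noise power; the symbols $d_{\Sigma}$ (total DOF), $C_{\Sigma}$ (sum capacity) and $P$ (transmit power) are introduced here and do not appear in the original abstract.

$$
d_{\Sigma} \;=\; \lim_{P \to \infty} \frac{C_{\Sigma}(P)}{\tfrac{1}{2}\log_2 P}
$$

Under this definition, the result $d_{\Sigma} = \frac{K}{2}$ for the $K$-user GIC gives each user $\frac{1}{K}\cdot\frac{K}{2} = \frac{1}{2}$, which is the sense in which each user enjoys half of its interference-free DOF.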
|
5 |
Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification
Alqasrawi, Yousef T. N., Neagu, Daniel, Cowling, Peter I. January 2013
The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies, from automatically extracted image feature vectors, produces discriminative visual words, which can improve the accuracy of image categorization tasks. Most approaches that use the BOW model in categorizing images ignore useful information that can be obtained from image classes to build visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and improves accuracy in natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from training images of one scene image dataset can plausibly represent another scene image dataset on the same domain. This helps in reducing time and effort needed to build new visual vocabularies. The proposed approach is evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, using support vector machines with histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
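The BOW representation and the histogram intersection kernel mentioned in this abstract can be sketched as below. This is an illustrative sketch, not the paper's pipeline: forming the integrated vocabulary by concatenating per-class vocabularies is an assumption made here, and the two-dimensional "descriptors", the vocabulary names, and the function names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to `descriptor` (squared Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda i: sum((d - v) ** 2
                                 for d, v in zip(descriptor, vocabulary[i])))


def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Assumption: the integrated vocabulary is the concatenation of class vocabularies.
vocab_class_a = [(0.0, 0.0), (1.0, 0.0)]
vocab_class_b = [(0.0, 1.0), (1.0, 1.0)]
vocabulary = vocab_class_a + vocab_class_b

image1 = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.0, 0.2)]
image2 = [(0.9, 0.9), (1.0, 0.8), (0.1, 0.0), (0.8, 0.1)]
h1 = bow_histogram(image1, vocabulary)
h2 = bow_histogram(image2, vocabulary)
similarity = histogram_intersection(h1, h2)
```

For normalized histograms the kernel value lies between 0 and 1, and an image compared with itself scores 1.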
|
6 |
Multiple-antenna Communications with Limited Channel State Information
Khoshnevis, Behrouz 14 November 2011
Due to its significant advantage in spectral efficiency, multiple-antenna communication technology will undoubtedly be a major component in future wireless system implementations. However, the full exploitation of this technology also requires perfect feedback of channel state information (CSI) to the transmitter, something that is not practically feasible. This motivates the study of limited feedback systems, where CSI feedback is rate limited. This thesis focuses on the optimal design of limited feedback systems for three types of communication channels: the relay channel, the single-user point-to-point channel, and the multiuser broadcast channel. For the relay channel, we prove the efficiency of the Grassmannian codebooks as the source and relay beamforming codebooks, and propose a method for CSI exchange between the relay and the destination when global CSI is not available at the destination. For the single-user point-to-point channel, we study the joint power control and beamforming problem and address the channel magnitude and direction quantization codebook design problem. It is shown that uniform quantization of the channel magnitude (in dB scale) is asymptotically optimal regardless of the channel distribution. The analysis further derives the optimal split of feedback bandwidth between the magnitude and direction quantization codebooks. For the multiuser broadcast channel, we first prove the sufficiency of a product magnitude-direction quantization codebook for managing the multiuser interference. We then derive the optimal split of feedback bandwidth across the users and their magnitude and direction codebooks. The optimization results reveal an inherent structural difference between the single-user and multiuser quantization codebooks: a multiuser codebook should have a finer direction quantization resolution as compared to a single-user codebook.
It is further shown that the users expecting higher rates and requiring more reliable communication should provide a finer quantization of their CSI. Finally, we determine the minimum required total feedback rate based on users' quality-of-service constraints and derive the scaling of the system performance with the total feedback rate.
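Uniform quantization of the channel magnitude on a dB scale, as discussed in this abstract, can be sketched as follows. The sketch rests on assumptions not found in the original: the quantization range `min_db` to `max_db`, the use of `20 * log10` for an amplitude (rather than power) magnitude, midpoint reconstruction, and the function name itself are all hypothetical.

```python
import math


def quantize_magnitude_db(magnitude, bits, min_db=-20.0, max_db=10.0):
    """Uniformly quantize 20*log10(magnitude) over [min_db, max_db].

    Returns (index, reconstructed_magnitude). `magnitude` must be positive.
    """
    levels = 2 ** bits
    step = (max_db - min_db) / levels
    value_db = 20.0 * math.log10(magnitude)
    clipped = min(max(value_db, min_db), max_db)
    # The top edge of the range falls into the last cell.
    index = min(int((clipped - min_db) / step), levels - 1)
    # Reconstruct at the cell midpoint in dB, then return to linear scale.
    recon_db = min_db + (index + 0.5) * step
    return index, 10.0 ** (recon_db / 20.0)


# Hypothetical usage with 2 feedback bits (4 cells of 7.5 dB each).
index, recon = quantize_magnitude_db(1.0, bits=2)
```

Because the cells have equal width in dB, the relative (multiplicative) error of the reconstructed magnitude is bounded uniformly across the range, independent of how the channel magnitude is distributed.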
|
8 |
Contribution à l'analyse de la dynamique des écritures anciennes pour l'aide à l'expertise paléographique / Contribution to the analysis of the dynamics of ancient handwriting in support of palaeographic expertise
Daher, Hani 22 November 2012
My thesis work is part of the ANR GRAPHEM project (Grapheme-based Retrieval and Analysis for PaleograpHic Expertise of Middle Age Manuscripts). It presents a methodological contribution applicable to the automatic analysis of ancient handwriting, intended to assist experts in paleography in the delicate work of studying and deciphering scripts.
The main objective is to contribute to the instrumentation of the corpus of medieval manuscripts held by the Institut de Recherche en Histoire des Textes (IRHT, Paris), by helping paleographers specialized in this field understand the evolution of letter forms, through effective methods for accessing the content of manuscripts based on a fine analysis of forms described as small fragments (graphemes). In my doctoral work, I chose to study the dynamics of the most basic element of writing, called the ductus, which according to paleographers carries a great deal of information about the writing style and the period in which the manuscript was produced.
My major contributions lie at two levels. The first is a preprocessing step for severely degraded images that ensures an optimal decomposition of the forms into graphemes containing the ductus information. For this decomposition step, we established a complete methodology for stroke tracking based on a skeleton extracted after contrast enhancement and gradient diffusion. Complete tracking of the strokes was obtained by applying the fundamental rules of stroke execution taught to the scribes of the Middle Ages. These rules provide dynamic information on stroke formation, essentially in the form of preferred directions.
In a second step, we sought to characterize the graphemes with visual shape descriptors that are understandable by both paleographers and computer scientists and that ensure as complete a representation of the writing as possible from a geometrical and morphological point of view. From this characterization, we proposed a clustering approach that groups graphemes into homogeneous classes using an unsupervised classification algorithm based on graph coloring. The clustering of graphemes led to the formation of codebooks (dictionaries of forms) that characterize each processed manuscript individually and discriminatively. We also studied the discriminating power of the descriptors in order to obtain the best representation of a manuscript as a codebook. This study exploited genetic algorithms for their ability to produce a good feature selection.
All of these contributions were tested with a CBIR application on three manuscript databases: two medieval (manuscripts from the Oxford database and from the IRHT, the main database of the project) and one of contemporary manuscripts used in the ICDAR 2011 writer identification competition. Our description and classification method was applied to a contemporary database in order to position our contribution with respect to other work in the field of handwriting identification and to study its ability to generalize to other types of documents. The very encouraging results obtained on the medieval and contemporary databases showed the robustness of our approach to variations in shapes and styles and its ability to generalize to all types of handwritten documents.
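The abstract does not say which graph-coloring algorithm is used, so the sketch below swaps in a plain greedy coloring to show the general idea of clustering by coloring: items that are too dissimilar are joined by an edge, and a proper coloring then guarantees that no two dissimilar items share a class. The threshold, the scalar "descriptors", and the function name are hypothetical.

```python
def color_clusters(items, distance, threshold):
    """Cluster items by greedily coloring a dissimilarity graph.

    Two items are adjacent when their distance exceeds `threshold`;
    each color class is returned as one cluster.
    """
    n = len(items)
    neighbors = [
        {j for j in range(n)
         if j != i and distance(items[i], items[j]) > threshold}
        for i in range(n)
    ]
    colors = {}
    # Color the highest-degree vertices first (a common greedy ordering).
    for i in sorted(range(n), key=lambda v: (-len(neighbors[v]), v)):
        used = {colors[j] for j in neighbors[i] if j in colors}
        colors[i] = next(c for c in range(n) if c not in used)
    clusters = {}
    for i, c in colors.items():
        clusters.setdefault(c, []).append(items[i])
    return list(clusters.values())


# Hypothetical one-dimensional shape descriptors for five graphemes.
descriptors = [0.0, 0.1, 0.2, 5.0, 5.1]
clusters = color_clusters(descriptors, lambda a, b: abs(a - b), threshold=1.0)
```

Each resulting class could then contribute one representative entry to a manuscript's codebook, in the spirit of the dictionaries of forms described above.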
|