161

The development of FT-Raman techniques to quantify the hydrolysis of Cobalt (III) nitrophenylphosphate complexes using multivariate data analysis

Tshabalala, Oupa Samuel 03 1900 (has links)
FT-Raman techniques were developed to quantify the reactions that follow the mixing of aqueous solutions of the bis-(1,3-diaminopropane)diaquacobalt(III) ion ([Co(tn)2(OH)(H2O)]2+) and p-nitrophenylphosphate (PNPP). For the development and validation of the kinetic modelling technique, the well-studied inversion of sucrose was used. Rate constants and concentrations could be estimated from calibration solutions and modelling methods, and the results obtained were comparable to literature values. The technique could therefore be extended to the [Co(tn)2(OH)(H2O)]2+-assisted hydrolysis of PNPP. Rate constants obtained when the pH is maintained at 7.30 differ from those obtained when the pH starts at 7.30 and is allowed to change during the reaction. The average rate constant for the 2:1 ([Co(tn)2(OH)(H2O)]2+:PNPP) reactions was found to be approximately 3 × 10^4 times the unassisted PNPP hydrolysis rate. / Chemistry / M. Sc. (Chemistry)
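For context on the kind of quantitative treatment this record describes, the sketch below fits a pseudo-first-order rate constant to time-resolved substrate concentrations and forms a rate-enhancement ratio. It is only an illustration: the data are simulated, the numerical values are hypothetical, and in the thesis the concentrations would come from the FT-Raman calibration models rather than being generated directly.

```python
# Minimal sketch (simulated data): fit a pseudo-first-order rate constant to
# time-resolved concentrations; in practice these would be predicted from
# FT-Raman spectra by a multivariate calibration model.
import numpy as np
from scipy.optimize import curve_fit

def first_order(t, c0, k):
    """Pseudo-first-order decay of the substrate concentration."""
    return c0 * np.exp(-k * t)

rng = np.random.default_rng(0)
t = np.linspace(0, 3600, 25)                      # reaction time / s (hypothetical)
k_sim = 2.0e-3                                    # rate constant used for the simulation
c_obs = first_order(t, 1.0, k_sim) + rng.normal(0, 0.01, t.size)

(c0_hat, k_hat), _ = curve_fit(first_order, t, c_obs, p0=(1.0, 1.0e-3))
print(f"estimated k = {k_hat:.2e} s^-1 (simulated value {k_sim:.2e} s^-1)")

# A rate enhancement is the ratio of assisted to unassisted rate constants;
# the unassisted value below is purely illustrative.
k_unassisted = 6.7e-8
print(f"rate enhancement ~ {k_hat / k_unassisted:.1e}")
```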
162

Inférence statistique en grande dimension pour des modèles structurels. Modèles linéaires généralisés parcimonieux, méthode PLS et polynômes orthogonaux et détection de communautés dans des graphes. / Statistical inference for structural models in high dimension. Sparse generalized linear models, PLS through orthogonal polynomials and community detection in graphs

Blazere, Melanie 01 July 2015 (has links)
Cette thèse s'inscrit dans le cadre de l'analyse statistique de données en grande dimension. Nous avons en effet aujourd'hui accès à un nombre toujours plus important d'information. L'enjeu majeur repose alors sur notre capacité à explorer de vastes quantités de données et à en inférer notamment les structures de dépendance. L'objet de cette thèse est d'étudier et d'apporter des garanties théoriques à certaines méthodes d'estimation de structures de dépendance de données en grande dimension.La première partie de la thèse est consacrée à l'étude de modèles parcimonieux et aux méthodes de type Lasso. Après avoir présenté les résultats importants sur ce sujet dans le chapitre 1, nous généralisons le cas gaussien à des modèles exponentiels généraux. La contribution majeure à cette partie est présentée dans le chapitre 2 et consiste en l'établissement d'inégalités oracles pour une procédure Group Lasso appliquée aux modèles linéaires généralisés. Ces résultats montrent les bonnes performances de cet estimateur sous certaines conditions sur le modèle et sont illustrés dans le cas du modèle Poissonien. Dans la deuxième partie de la thèse, nous revenons au modèle de régression linéaire, toujours en grande dimension mais l'hypothèse de parcimonie est cette fois remplacée par l'existence d'une structure de faible dimension sous-jacente aux données. Nous nous penchons dans cette partie plus particulièrement sur la méthode PLS qui cherche à trouver une décomposition optimale des prédicteurs étant donné un vecteur réponse. Nous rappelons les fondements de la méthode dans le chapitre 3. La contribution majeure à cette partie consiste en l'établissement pour la PLS d'une expression analytique explicite de la structure de dépendance liant les prédicteurs à la réponse. Les deux chapitres suivants illustrent la puissance de cette formule aux travers de nouveaux résultats théoriques sur la PLS . Dans une troisième et dernière partie, nous nous intéressons à la modélisation de structures au travers de graphes et plus particulièrement à la détection de communautés. Après avoir dressé un état de l'art du sujet, nous portons notre attention sur une méthode en particulier connue sous le nom de spectral clustering et qui permet de partitionner les noeuds d'un graphe en se basant sur une matrice de similarité. Nous proposons dans cette thèse une adaptation de cette méthode basée sur l'utilisation d'une pénalité de type l1. Nous illustrons notre méthode sur des simulations. / This thesis falls within the context of high-dimensional data analysis. Nowadays we have access to an increasing amount of information. The major challenge relies on our ability to explore a huge amount of data and to infer their dependency structures.The purpose of this thesis is to study and provide theoretical guarantees to some specific methods that aim at estimating dependency structures for high-dimensional data. The first part of the thesis is devoted to the study of sparse models through Lasso-type methods. In Chapter 1, we present the main results on this topic and then we generalize the Gaussian case to any distribution from the exponential family. The major contribution to this field is presented in Chapter 2 and consists in oracle inequalities for a Group Lasso procedure applied to generalized linear models. These results show that this estimator achieves good performances under some specific conditions on the model. We illustrate this part by considering the case of the Poisson model. 
The second part concerns linear regression in high dimension, but the sparsity assumption is replaced by a low-dimensional structure underlying the data. We focus in particular on the PLS method, which attempts to find an optimal decomposition of the predictors given a response. We recall the main ideas in Chapter 3. The major contribution of this part is a new explicit analytical expression of the dependency structure that links the predictors to the response. The next two chapters illustrate the power of this formula through new theoretical results for PLS. The third and last part is dedicated to graph modelling and especially to community detection. After presenting the main trends on this topic, we turn our attention to spectral clustering, which clusters the nodes of a graph on the basis of a similarity matrix. In this thesis, we suggest an adaptation of this method based on an $l_1$ penalty. We illustrate this method through simulations.
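As background for the last contribution mentioned above, the sketch below implements plain (unpenalised) spectral clustering on a similarity matrix with NumPy and scikit-learn; the thesis proposes an $l_1$-penalised adaptation of this procedure, which is not reproduced here, and the toy graph is an assumption made for the illustration.

```python
# Standard spectral clustering on a similarity matrix (baseline procedure only).
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, n_clusters):
    """Partition nodes given a symmetric similarity (affinity) matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(W.shape[0]) - d_inv_sqrt @ W @ d_inv_sqrt      # normalised Laplacian
    eigvals, eigvecs = np.linalg.eigh(L_sym)                      # ascending eigenvalues
    U = eigvecs[:, :n_clusters]                                   # eigenvectors of the k smallest
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(U)

W = np.zeros((6, 6))
W[:3, :3] = 1.0                     # toy graph: two fully connected communities
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
print(spectral_clustering(W, 2))    # two communities, e.g. [0 0 0 1 1 1]
```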
163

Novas estratégias para seleção de variáveis por intervalos em problemas de classificação

Fernandes, David Douglas de Sousa 26 August 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / In Analytical Chemistry, the use of analytical signals recorded on multiple sensors, combined with subsequent chemometric modelling, has become recurrent in the literature for developing new analytical methodologies. For this purpose, multivariate instrumental techniques such as ultraviolet-visible or near-infrared spectrometry, voltammetry, etc. are generally used. In this scenario, the analyst is faced with the option of selecting individual variables or variable intervals so as to avoid or reduce multicollinearity problems. A well-known strategy for the selection of variable intervals is to divide the set of instrumental responses into equal-width intervals and select the best interval based on its prediction performance in a Partial Least Squares regression (iPLS). On the other hand, the use of interval selection for classification purposes has received relatively little attention. A common practice is to use the iPLS regression method with coded class indices as the response variables to be predicted, which is the basic idea behind the Partial Least Squares Discriminant Analysis (PLS-DA) approach to classification. In other words, interval selection for classification purposes lacks natively developed algorithms. Thus, this work proposes two new strategies for classification problems using interval selection by the Successive Projections Algorithm. The first strategy is named Successive Projections Algorithm for interval selection in Partial Least Squares Discriminant Analysis (iSPA-PLS-DA), while the second is called Successive Projections Algorithm for interval selection in Soft Independent Modelling of Class Analogy (iSPA-SIMCA). The performance of the proposed algorithms was evaluated in three case studies: classification of vegetable oils according to the type of raw material and the expiration date using data obtained by square-wave voltammetry; classification of unadulterated biodiesel/diesel blends (B5) and blends adulterated with soybean oil (OB5) using spectral data obtained in the ultraviolet-visible region; and classification of vegetable oils with respect to the expiration date using spectral data obtained in the near-infrared region. The proposed iSPA-PLS-DA and iSPA-SIMCA algorithms provided good results in the three case studies, with correct classification rates always greater than or equal to those obtained by PLS-DA and SIMCA models using all variables, by iPLS-DA and iSIMCA with a single selected interval, and by SPA-LDA and GA-LDA with selection of individual variables. Therefore, the proposed iSPA-PLS-DA and iSPA-SIMCA algorithms can be considered promising approaches for classification problems employing interval selection. From a more general point of view, the possibility of using interval selection without loss of classification accuracy can be a very useful tool for the construction of dedicated instruments (e.g. LED-based photometers) for use in routine and in situ analysis.
/ Em Química Analítica tem sido recorrente na literatura o uso de sinais analíticos registrados em múltiplos sensores combinados com posterior modelagem quimiométrica para desenvolvimento de novas metodologias analíticas. Para esta finalidade, geralmente se faz uso de técnicas instrumentais multivariadas como a espectrometrias no ultravioleta-visível ou no infravermelho próximo, voltametria, etc. Neste cenário, o analista se depara com a opção de selecionar variáveis individuais ou intervalos de variáveis de modo de evitar ou diminuir problemas de multicolinearidade. Uma estratégia bem conhecida para seleção de intervalos de variáveis consiste em dividir o conjunto de respostas instrumentais em intervalos de igual largura e selecionar o melhor intervalo com base no critério de desempenho de predição de um único intervalo em regressão por Mínimos Quadrados Parciais (iPLS). Por outro lado, o uso da seleção de intervalo para fins de classificação tem recebido relativamente pouca atenção. Uma prática comum consiste em utilizar o método de regressão iPLS com os índices de classe codificados como variáveis de resposta a serem preditos, que é a idéia básica por trás da versão da Análise Discriminante por Mínimos Quadrados Parciais (PLS-DA) para a classificação. Em outras palavras, a seleção de intervalos para fins de classificação não possui o desenvolvimento de funções nativas (algoritmos). Assim, neste trabalho são propostas duas novas estratégias em problemas de classificação que usam seleção de intervalos de variáveis empregando o Algoritmo das Projeções Sucessivas. A primeira estratégia é denominada de Algoritmo das Projeções Sucessivas para seleção intervalos em Análise Discriminante por Mínimos Quadrados Parciais (iSPA-PLS-DA), enquanto a segunda estratégia é denominada de Algoritmo das Projeções Sucessivas para a seleção de intervalos em Modelagem Independente e Flexível por Analogia de Classe (iSPA-SIMCA). O desempenho dos algoritmos propostos foi avaliado em três estudos de casos: classificação de óleos vegetais com relação ao tipo de matéria-prima e ao prazo de validade utilizando dados obtidos por voltametria de onda quadrada; classificação de misturas biodiesel/diesel não adulteradas (B5) e adulteradas com óleo de soja (OB5) empregando dados espectrais obtidos na região do ultravioleta-visível; e classificação de óleos vegetais com relação ao prazo de validade usando dados espectrais obtidos na região do infravermelho próximo. Os algoritmos iSPA-PLS-DA e iSPA-SIMCA propostos forneceram bons resultados nos três estudos de caso, com taxas de classificação corretas sempre iguais ou superiores àquelas obtidas pelos modelos PLS-DA e SIMCA utilizando todas as variáveis, iPLS-DA e iSIMCA com um único intervalo selecionado, bem como SPA-LDA e GA-LDA com seleção de variáveis individuais. Portanto, os algoritmos iSPA-PLS-DA e iSPA-SIMCA propostos podem ser consideradas abordagens promissoras para uso em problemas de classificação empregando seleção de intervalos de variáveis. Num contexto mais geral, a possibilidade de utilização de seleção de intervalos de variáveis sem perda da precisão da classificação pode ser considerada uma ferramenta bastante útil para a construção de instrumentos dedicados (por exemplo, fotômetros a base de LED) para uso em análise de rotina e de campo.
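To illustrate the interval-selection idea discussed in this record, here is a simplified iPLS-DA-style search (not the authors' iSPA algorithms): the spectral axis is split into equal-width intervals, a PLS model with coded class indices as the response is fit on each interval, and the interval with the best cross-validated classification rate is kept. The data, interval count and component number below are arbitrary choices made for the sketch.

```python
# Simplified iPLS-DA-style interval search on simulated spectra.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

def best_interval(X, y, n_intervals=10, n_components=2):
    """X: spectra (samples x wavelengths); y: binary class labels coded 0/1."""
    edges = np.linspace(0, X.shape[1], n_intervals + 1, dtype=int)
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        Xi = X[:, lo:hi]
        pls = PLSRegression(n_components=min(n_components, Xi.shape[1]))
        y_hat = cross_val_predict(pls, Xi, y.astype(float), cv=5).ravel()
        acc = np.mean((y_hat > 0.5).astype(int) == y)     # threshold the PLS-DA output
        results.append(((lo, hi), acc))
    return max(results, key=lambda r: r[1])

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, 50:70] += 0.8          # class difference confined to one spectral region
print(best_interval(X, y))       # expected to pick an interval overlapping channels 50-70
```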
164

"Calibração multivariada e cinética diferencial em sistemas de análises em fluxo com detecção espectrofotométrica" / "Multivariate calibration and differential kinetic analysis in flow systems with spectrophotometric detection"

Paula Regina Fortes 19 June 2006 (has links)
A associação dos métodos cinéticos de análises e dos sistemas de análises em fluxo foi demonstrada em relação à determinação espectrofotométrica de ferro e vanádio em ligas Fe-V. O método se baseia na influência de Fe2+ e VO2+ na taxa de oxidação de iodeto por dicromato sob condições ácidas; por esta razão o emprego do redutor de Jones foi necessário. Um sistema de análises por injeção em fluxo (FIA) e um sistema multi-impulsão foram dimensionados e avaliados. Em ambos os sistemas, a solução da amostra era inserida no fluxo transportador / reagente iodeto, e a solução de dicromato era adicionada por confluência. Sucessivas medidas eram realizadas durante a passagem da zona de amostra processada pelo detector, cada uma relacionada a uma diferente condição para o desenvolvimento da reação. O tratamento dos dados envolveu calibração multivariada, particularmente o algorítmo PLS. O sistema FIA se mostrou pouco adequado para as determinações multi-paramétricas, uma vez que os elementos de fluído resultantes da natureza de escoamento laminar não continham informações cinéticas suficientes para compor as etapas de modelagem. Por outro lado, MPFS mostrou que a natureza do fluxo pulsado resulta em melhorias nas figuras de mérito devido ao movimento caótico dos elementos de fluído. O sistema proposto é simples e robusto, capaz de analisar 50 amostras por hora, significando em um consumo de 48 mg KI por determinação. A duas primeiras variáveis latentes contém ca 94 % da informação analítica, mostrando que a dimensionalidade dupla intrínsica ao conjunto de dados. Os resultados se apresentaram concordantes com aqueles obtidos por espectrometria de emissão optica com plasma induzido em argônio. / Differential kinetic analysis can be implemented in a flow-system analyser, as demonstrated by designing an improved spectrophotometric catalytic determination of iron and vanadium in Fe-V alloys. The method relied on the influence of Fe2+ and VO2+ on the rate of iodide oxidation by dichromate under acidic conditions; therefore the Jones reductor was needed. To this end, a flow injection system (FIA) and a multi-pumping flow system (MPFS) were dimensioned and evaluated. In both systems, the alloy solution was inserted into an acidic KI solution that also acted as the carrier stream, and a dichromate solution was added by confluence. Successive measurements were performed during sample passage through the detector, each one related to a different yet reproducible condition for reaction development. Data treatment involved multivariate calibration by the PLS algorithm. The FIA system was less suitable for multi-parametric determination, as the laminar flow regime could not provide enough kinetic information. On the other hand, the MPFS demonstrated that pulsed flow led to enhanced figures of merit due to the chaotic movement of its fluid elements. The proposed MPFS system is very simple and rugged, allowing 50 samples to be run per hour and consuming 48 mg KI per determination. The first two latent variables carry ca 94 % of the analytical information, pointing out that the intrinsic dimensionality of the data set is two. Results are in agreement with those obtained by inductively coupled argon plasma optical emission spectrometry.
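The sketch below illustrates, on simulated data, the kind of two-analyte PLS calibration described in this record, including a check of how much signal variance the first two latent variables capture (compare the ca. 94 % reported above). The bilinear data model and all numerical settings are assumptions made for the illustration, not the thesis's actual data treatment.

```python
# Minimal sketch (simulated data): PLS calibration for two analytes, with a
# check of the X-variance captured by the first two latent variables.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
n_samples, n_channels = 30, 120
C = rng.uniform(0.1, 1.0, size=(n_samples, 2))            # columns: [Fe, V] concentrations
S = rng.normal(size=(2, n_channels))                      # hypothetical pure-signal profiles
X = C @ S + rng.normal(0, 0.05, size=(n_samples, n_channels))

pls = PLSRegression(n_components=2, scale=False).fit(X, C)
C_hat = pls.predict(X)
r2 = 1 - ((C - C_hat) ** 2).sum(axis=0) / ((C - C.mean(axis=0)) ** 2).sum(axis=0)
print("R2 per analyte:", np.round(r2, 3))

# Fraction of (centred) X variance explained by the two latent variables.
X_c = X - X.mean(axis=0)
X_hat = pls.x_scores_ @ pls.x_loadings_.T
print("X variance explained by 2 LVs:", round(float((X_hat ** 2).sum() / (X_c ** 2).sum()), 3))
```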
165

Direcionadores de preferencia para nectares de uva comerciais tradicionais e "lights" utilizando regressão por minimos quadrados parciais (PLSR) / Drivers of liking for grape nectars in the traditional commercial and light versions using partial least squares regression (PLSR)

Alves, Leonardo Rangel 07 October 2008 (has links)
Orientador: Helena Maria Andre Bolini / Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia de Alimentos / Resumo: Este estudo objetivou identificar direcionadores de preferência de oito amostras comerciais de néctar de uva (tradicionais e "light") utilizando metodologias estatísticas avançadas para relacionar dados de perfil sensorial, físico-químicos e aceitabilidade. Oito amostras comerciais de néctares de uva (quatro tradicionais e suas respectivas versões "light") foram analisadas. Um teste de Aceitação utilizando a escala hedônica híbrida foi realizado com 114 consumidores. Quatorze termos descritivos foram avaliados por uma equipe sensorial e seis atributos físico-químicos foram medidos. As amostras de néctar de uva A e C foram as mais aceitas e as amostras CL e DL ("light") foram as mais rejeitadas. Construiu-se um Mapa de Preferência Interno e em seguida uma Análise de "Cluster" foi realizada para o atributo Impressão Global. Dois grupos de consumidores foram encontrados. A principal diferença entre os grupos foi com relação à utilização de diferentes porções da escala pelos consumidores de cada grupo. A metodologia PLSR foi utilizada para relacionar a aceitação dos consumidores com os termos descritivos e atributos físico-químicos, fornecendo correlações entre eles. Os resultados mostraram que os atributos Sabor de Uva, Sabor Residual de Uva, Acidez Total Titulável, Aroma de Uva, Cor Vinho, °Brix, Viscosidade, Acidez, Turbidez, Adstringência, Fenóis Totais e Consistência nesta ordem de importância, estavam fortemente correlacionados com a Impressão Global dos consumidores sendo portanto os direcionadores de preferência encontrados / Abstract: This study used PLS regression to help identify drivers of liking for grape nectars. Eight commercial brands (four traditional and four light versions) were analyzed. An acceptance test using a hybrid hedonic scale was performed with 114 consumers. Fourteen descriptive attributes were evaluated by a fourteen-member sensory panel, and six physical-chemical attributes were measured. The most accepted samples were A and C, and the least accepted were CL and DL (light). Internal Preference Mapping followed by Cluster Analysis was performed on the consumer scores for Global Impression. Two clusters of consumers were found. The main difference between the clusters was the use of different portions of the scale by the consumers of each cluster. The PLSR methodology was used to relate acceptance to the sensory and physical-chemical attributes, providing correlations between them. The model showed the importance of each sensory or physical-chemical attribute for the model projection. The results showed that Grape Flavor, Residual Grape Flavor, Total Titratable Acidity, Grape Aroma, Wine Color, °Brix, Viscosity, Sourness, Turbidity, Astringency, Total Phenols and Consistency, in this order of importance, were positively correlated with the consumer scores for Global Impression and are therefore the drivers of liking found / Mestrado / Consumo e Qualidade de Alimentos / Mestre em Alimentos e Nutrição
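As a rough illustration of the PLSR step described above, the sketch below regresses simulated mean liking scores on autoscaled attribute intensities and ranks the attributes by their PLS regression coefficients. Attribute names, sample counts and data are hypothetical; the thesis's full preference-mapping workflow is not reproduced.

```python
# Toy sketch: rank candidate drivers of liking by PLS regression coefficients.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

attributes = ["grape flavor", "grape aroma", "sweetness", "sourness", "turbidity", "viscosity"]
rng = np.random.default_rng(3)
X = rng.normal(size=(8, len(attributes)))          # 8 samples x attribute intensities
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] - 0.5 * X[:, 3] + rng.normal(0, 0.2, 8)   # mean liking

pls = PLSRegression(n_components=2, scale=True).fit(X, y)
for name, c in sorted(zip(attributes, pls.coef_.ravel()), key=lambda p: -abs(p[1])):
    print(f"{name:>12s}: {c:+.2f}")
```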
166

Gestão de projetos de P&D no IPEN: diagnóstico e sugestões ao Escritório de Projetos (PMO) / Project management of R&D in IPEN - Diagnosis and Suggestions to the Project Management Office (PMO)

Egon Martins Hannes 12 March 2015 (has links)
O presente trabalho pretende entender a dinâmica do gerenciamento de projetos no IPEN. Para tal, decidiu-se pela pesquisa junto a literatura acadêmica de modelos que pudessem servir de base e que após modificações e ajustes pudessem refletir a realidade dos projetos de Institutos Públicos Pesquisa & Desenvolvimento. Após tratamento estatístico dos dados algumas hipóteses foram validadas e demonstraram sua influência positiva no desempenho do gerenciamento do projeto, tais como a influência das pessoas que compõem as equipes, o efeito da liderança, dentre outras. O modelo, inclusive mostrou-se válido para explicar quais fatores são relevantes para o sucesso dos projetos. Um das principais objetivos, foi exatamente o uso de modelo de avaliação de gestão projetos, que fossem passíveis de validação estatística, e não utilizar um dos disponíveis no mercado, tais como P3M3 e OPM3, para que houvesse um controle e confirmação estatística dos resultados. Outro objetivo foi utilizar um modelo cujas assertivas refletissem a natureza dos projetos de Pesquisa & Desenvolvimento gerenciados pelos pesquisadores do IPEN. Aliás, as referidas assertivas foram formuladas, e enviadas via pesquisa web, e respondidas por praticamente uma centena de profissionais do IPEN, envolvidos com projetos de P&D. A presente dissertação, acrescida das recomendações, ao final, tem como proposta servir de contribuição para os trabalhos desenvolvidos pelo Escritório de Projetos do IPEN. O modelo de avaliação, contido neste trabalho, pode ser aplicado em outras Instituições de P&D brasileiras, para que avaliem a forma e a maneira como gerenciam os seus respectivos projetos. / This work aims to understand the dynamics of project management at IPEN. To reach this goal, the academic literature was searched for models that could serve as a base and that, after modifications and adjustments, could reflect the reality of projects at public research and development institutes. After statistical treatment of the data, some hypotheses were validated and showed a positive influence on project management performance, such as the influence of the people who make up the teams and the leadership effect, among others. The model also proved valid for explaining which factors are relevant to the success of the projects. One of the main goals was precisely to use a project management evaluation model amenable to statistical validation, rather than one of those available on the market, such as P3M3 and OPM3, so that the results could be statistically controlled and confirmed. Another goal was to use a model whose statements reflected the nature of the research and development projects managed by IPEN researchers. These statements were formulated, sent via a web survey, and answered by almost one hundred IPEN professionals who work on R&D projects. This dissertation, together with the recommendations at the end, is intended as a contribution to the work developed by the IPEN Project Management Office (PMO). The evaluation model presented here can be applied in other Brazilian R&D institutions to assess the way their projects are managed.
167

Multivariate non-invasive measurements of skin disorders

Nyström, Josefina January 2006 (has links)
The present thesis proposes new methods for obtaining objective and accurate diagnoses in modern healthcare. Non-invasive techniques have been used to examine or diagnose three different medical conditions, namely neuropathy among diabetics, radiotherapy induced erythema (skin redness) among breast cancer patients and diagnoses of cutaneous malignant melanoma. The techniques used were Near-InfraRed spectroscopy (NIR), Multi Frequency Bio Impedance Analysis of whole body (MFBIA-body), Laser Doppler Imaging (LDI) and Digital Colour Photography (DCP). The neuropathy for diabetics was studied in papers I and II. The first study was performed on diabetics and control subjects of both genders. A separation was seen between males and females and therefore the data had to be divided in order to obtain good models. NIR spectroscopy was shown to be a viable technique for measuring neuropathy once the division according to gender was made. The second study on diabetics, where MFBIA-body was added to the analysis, was performed on males exclusively. Principal component analysis showed that healthy reference subjects tend to separate from diabetics. Also, diabetics with severe neuropathy separate from persons less affected. The preliminary study presented in paper III was performed on breast cancer patients in order to investigate if NIR, LDI and DCP were able to detect radiotherapy induced erythema. The promising results in the preliminary study motivated a new and larger study. This study, presented in papers IV and V, intended to investigate the measurement techniques further but also to examine the effect that two different skin lotions, Essex and Aloe vera have on the development of erythema. The Wilcoxon signed rank sum test showed that DCP and NIR could detect erythema, which is developed during one week of radiation treatment. LDI was able to detect erythema developed during two weeks of treatment. None of the techniques could detect any differences between the two lotions regarding the development of erythema. The use of NIR to diagnose cutaneous malignant melanoma is presented as unpublished results in this thesis. This study gave promising but inconclusive results. NIR could be of interest for future development of instrumentation for diagnosis of skin cancer.
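A minimal sketch of the kind of principal component inspection mentioned in this record: project spectra onto the first two PCs and compare the group score distributions. The spectra below are simulated and the group difference is injected artificially; this is not the thesis's data or preprocessing.

```python
# Toy sketch: PCA score inspection for two groups of simulated "spectra".
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
controls = rng.normal(size=(20, 150))        # reference subjects
patients = rng.normal(size=(20, 150))
patients[:, 40:80] += 1.5                    # artificial group-related difference

X = np.vstack([controls, patients])
scores = PCA(n_components=2).fit_transform(X)
print("mean PC1 score, controls:", round(float(scores[:20, 0].mean()), 2))
print("mean PC1 score, patients:", round(float(scores[20:, 0].mean()), 2))
```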
168

On the effective deployment of current machine translation technology

González Rubio, Jesús 03 June 2014 (has links)
Machine translation is a fundamental technology that is gaining more importance each day in our multilingual society. Companies and individuals are turning their attention to machine translation since it dramatically cuts down their expenses on translation and interpreting. However, the output of current machine translation systems is still far from the quality of translations generated by human experts. The overall goal of this thesis is to narrow down this quality gap by developing new methodologies and tools that enable a broader and more efficient deployment of machine translation technology. We start by proposing a new technique to improve the quality of the translations generated by fully-automatic machine translation systems. The key insight of our approach is that different translation systems, implementing different approaches and technologies, can exhibit different strengths and limitations. Therefore, a proper combination of the outputs of such different systems has the potential to produce translations of improved quality. We present minimum Bayes' risk system combination, an automatic approach that detects the best parts of the candidate translations and combines them to generate a consensus translation that is optimal with respect to a particular performance metric. We thoroughly describe the formalization of our approach as a weighted ensemble of probability distributions and provide efficient algorithms to obtain the optimal consensus translation according to the widespread BLEU score. Empirical results show that the proposed approach is indeed able to generate statistically better translations than the provided candidates. Compared to other state-of-the-art system combination methods, our approach achieves similar performance while requiring no data other than the candidate translations. Then, we focus our attention on how to improve the utility of automatic translations for the end-user of the system. Since automatic translations are not perfect, a desirable feature of machine translation systems is the ability to predict at run-time the quality of the generated translations. Quality estimation is usually addressed as a regression problem where a quality score is predicted from a set of features that represents the translation. However, although the concept of translation quality is intuitively clear, there is no consensus on which are the features that actually account for it. As a consequence, quality estimation systems for machine translation have to utilize a large number of weak features to predict translation quality. This involves several learning problems related to feature collinearity and ambiguity, and to the 'curse' of dimensionality. We address these challenges by adopting a two-step training methodology. First, a dimensionality reduction method computes, from the original features, the reduced set of features that best explains translation quality. Then, a prediction model is built from this reduced set to finally predict the quality score. We study various reduction methods previously used in the literature and propose two new ones based on statistical multivariate analysis techniques. More specifically, the proposed dimensionality reduction methods are based on partial least squares regression. The results of a thorough experimentation show that the quality estimation systems built following the proposed two-step methodology obtain better prediction accuracy than systems estimated using all the original features.
Moreover, one of the proposed dimensionality reduction methods obtained the best prediction accuracy with only a fraction of the original features. This feature reduction ratio is important because it implies a dramatic reduction of the operating times of the quality estimation system. An alternative use of current machine translation systems is to embed them within an interactive editing environment where the system and a human expert collaborate to generate error-free translations. This interactive machine translation approach has been shown to reduce the supervision effort of the user in comparison to the conventional decoupled post-editing approach. However, interactive machine translation considers the translation system as a passive agent in the interaction process. In other words, the system only suggests translations to the user, who then makes the necessary supervision decisions. As a result, the user is bound to exhaustively supervise every suggested translation. This passive approach ensures error-free translations but it also demands a large amount of supervision effort from the user. Finally, we study different techniques to improve the productivity of current interactive machine translation systems. Specifically, we focus on the development of alternative approaches where the system becomes an active agent in the interaction process. We propose two different active approaches. On the one hand, we describe an active interaction approach where the system informs the user about the reliability of the suggested translations. The hope is that this information may help the user to locate translation errors, thus improving the overall translation productivity. We propose different scores to measure translation reliability at the word and sentence levels and study the influence of such information on the productivity of an interactive machine translation system. Empirical results show that the proposed active interaction protocol is able to achieve a large reduction in supervision effort while still generating translations of very high quality. On the other hand, we study an active learning framework for interactive machine translation. In this case, the system is not only able to inform the user of which suggested translations should be supervised, but it is also able to learn from the user-supervised translations to improve its future suggestions. We develop a value-of-information criterion to select which automatic translations undergo user supervision. However, given its high computational complexity, in practice we study different selection strategies that approximate this optimal criterion. Results of a large-scale experimentation show that the proposed active learning framework is able to obtain better compromises between the quality of the generated translations and the human effort required to obtain them. Moreover, in comparison to a conventional interactive machine translation system, our proposal obtained translations of twice the quality with the same supervision effort. / González Rubio, J. (2014). On the effective deployment of current machine translation technology [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37888 / TESIS
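The following sketch illustrates the two-step quality-estimation idea summarised above, using partial least squares as the dimensionality reduction step and a ridge regressor as the prediction model. The feature construction, component count and choice of ridge regression are assumptions made for the illustration, not the thesis's exact methods or data.

```python
# Sketch (simulated data): two-step quality estimation — reduce a large,
# partly collinear feature set with PLS, then train the quality predictor
# on the reduced representation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
n_sent, n_feat = 500, 80
X = rng.normal(size=(n_sent, n_feat))
X[:, 40:] = X[:, :40] + 0.1 * rng.normal(size=(n_sent, 40))   # weak, collinear features
quality = X[:, :5].sum(axis=1) + rng.normal(0, 0.5, n_sent)   # hypothetical quality score

baseline = cross_val_score(Ridge(), X, quality, cv=5, scoring="r2").mean()
two_step = cross_val_score(make_pipeline(PLSRegression(n_components=5), Ridge()),
                           X, quality, cv=5, scoring="r2").mean()
print(f"R2, all {n_feat} features: {baseline:.3f}; R2, 5 PLS components: {two_step:.3f}")
```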
169

Quality by Design through multivariate latent structures

Palací López, Daniel Gonzalo 14 January 2019 (has links)
La presente tesis doctoral surge ante la necesidad creciente por parte de la mayoría de empresas, y en especial (pero no únicamente) aquellas dentro de los sectores farmacéutico, químico, alimentación y bioprocesos, de aumentar la flexibilidad en su rango operativo para reducir los costes de fabricación, manteniendo o mejorando la calidad del producto final obtenido. Para ello, esta tesis se centra en la aplicación de los conceptos del Quality by Design para la aplicación y extensión de distintas metodologías ya existentes y el desarrollo de nuevos algoritmos que permitan la implementación de herramientas adecuadas para el diseño de experimentos, el análisis multivariante de datos y la optimización de procesos en el ámbito del diseño de mezclas, pero sin limitarse exclusivamente a este tipo de problemas. Parte I - Prefacio, donde se presenta un resumen del trabajo de investigación realizado y los objetivos principales que pretende abordar y su justificación, así como una introducción a los conceptos más importantes relativos a los temas tratados en partes posteriores de la tesis, tales como el diseño de experimentos o diversas herramientas estadísticas de análisis multivariado. Parte II - Optimización en el diseño de mezclas, donde se lleva a cabo una recapitulación de las diversas herramientas existentes para el diseño de experimentos y análisis de datos por medios tradicionales relativos al diseño de mezclas, así como de algunas herramientas basadas en variables latentes, tales como la Regresión en Mínimos Cuadrados Parciales (PLS). En esta parte de la tesis también se propone una extensión del PLS basada en kernels para el análisis de datos de diseños de mezclas, y se hace una comparativa de las distintas metodologías presentadas. Finalmente, se incluye una breve presentación del programa MiDAs, desarrollado con la finalidad de ofrecer a sus usuarios la posibilidad de comparar de forma sencilla diversas metodologías para el diseño de experimentos y análisis de datos para problemas de mezclas. Parte III - Espacio de diseño y optimización a través del espacio latente, donde se aborda el problema fundamental dentro de la filosofía del Quality by Design asociado a la definición del llamado 'espacio de diseño', que comprendería todo el conjunto de posibles combinaciones de condiciones de proceso, materias primas, etc. que garantizan la obtención de un producto con la calidad deseada. En esta parte también se trata el problema de la definición del problema de optimización como herramienta para la mejora de la calidad, pero también para la exploración y flexibilización de los procesos productivos, con el objeto de definir un procedimiento eficiente y robusto de optimización que se adapte a los diversos problemas que exigen recurrir a dicha optimización. Parte IV - Epílogo, donde se presentan las conclusiones finales, la consecución de objetivos y posibles líneas futuras de investigación. En esta parte se incluyen además los anexos. / Aquesta tesi doctoral sorgeix davant la necessitat creixent per part de la majoria d'empreses, i especialment (però no únicament) d'aquelles dins dels sectors farmacèutic, químic, alimentari i de bioprocessos, d'augmentar la flexibilitat en el seu rang operatiu per tal de reduir els costos de fabricació, mantenint o millorant la qualitat del producte final obtingut.
La tesi se centra en l'aplicació dels conceptes del Quality by Design per a l'aplicació i extensió de diferents metodologies ja existents i el desenvolupament de nous algorismes que permeten la implementació d'eines adequades per al disseny d'experiments, l'anàlisi multivariada de dades i l'optimització de processos en l'àmbit del disseny de mescles, però sense limitar-se exclusivament a aquest tipus de problemes. Part I - Prefaci, en què es presenta un resum del treball de recerca realitzat i els objectius principals que pretén abordar i la seua justificació, així com una introducció als conceptes més importants relatius als temes tractats en parts posteriors de la tesi, com ara el disseny d'experiments o diverses eines estadístiques d'anàlisi multivariada. Part II - Optimització en el disseny de mescles, on es duu a terme una recapitulació de les diverses eines existents per al disseny d'experiments i anàlisi de dades per mitjans tradicionals relatius al disseny de mescles, així com d'algunes eines basades en variables latents, tals com la Regressió en Mínims Quadrats Parcials (PLS). En aquesta part de la tesi també es proposa una extensió del PLS basada en kernels per a l'anàlisi de dades de dissenys de mescles, i es fa una comparativa de les diferents metodologies presentades. Finalment, s'inclou una breu presentació del programari MiDAs, que ofereix la possibilitat als usuaris de comparar de forma senzilla diverses metodologies per al disseny d'experiments i l'anàlisi de dades per a problemes de mescles. Part III - Espai de disseny i optimització a través de l'espai latent, on s'aborda el problema fonamental dins de la filosofia del Quality by Design associat a la definició de l'anomenat 'espai de disseny', que comprendria tot el conjunt de possibles combinacions de condicions de procés, matèries primeres, etc. que garanteixen l'obtenció d'un producte amb la qualitat desitjada. En aquesta part també es tracta el problema de la definició del problema d'optimització com a eina per a la millora de la qualitat, però també per a l'exploració i flexibilització dels processos productius, amb l'objecte de definir un procediment eficient i robust d'optimització que s'adapti als diversos problemes que exigeixen recórrer a aquesta optimització. Part IV - Epíleg, on es presenten les conclusions finals i la consecució d'objectius i es plantegen possibles línies futures de recerca arran dels resultats de la tesi. En aquesta part s'inclouen a més els annexos. / The present Ph.D. thesis is motivated by the growing need in most companies, and especially (but not solely) those in the pharmaceutical, chemical, food and bioprocess fields, to increase the flexibility in their operating conditions in order to reduce production costs while maintaining or even improving the quality of their products. To this end, this thesis focuses on the application of the concepts of Quality by Design for the exploitation and development of already existing methodologies, and the development of new algorithms aimed at the proper implementation of tools for the design of experiments, multivariate data analysis and process optimization, especially (but not only) in the context of mixture design. Part I - Preface, where a summary of the research work done, of the main goals it aimed at, and of their justification is presented.
Some of the most relevant concepts related to the developed work in subsequent chapters are also introduced, such as those regarding design of experiments or latent variable-based multivariate data analysis techniques. Part II - Mixture design optimization, in which a review of existing mixture design tools for the design of experiments and data analysis via traditional approaches, as well as some latent variable-based techniques, such as Partial Least Squares (PLS), is provided. A kernel-based extension of PLS for mixture design data analysis is also proposed, and the different available methods are compared to each other. Finally, a brief presentation of the software MiDAs is done. MiDAs has been developed in order to provide users with a tool to easily approach mixture design problems for the construction of Designs of Experiments and data analysis with different methods and compare them. Part III - Design Space and optimization through the latent space, where one of the fundamental issues within the Quality by Design philosophy, the definition of the so-called 'design space' (i.e. the subspace comprised by all possible combinations of process operating conditions, raw materials, etc. that guarantee obtaining a product meeting a required quality standard), is addressed. The problem of properly defining the optimization problem is also tackled, not only as a tool for quality improvement but also when it is to be used for exploration of process flexibilisation purposes, in order to establish an efficient and robust optimization method in accordance with the nature of the different problems that may require such optimization to be resorted to. Part IV - Epilogue, where final conclusions are drawn, future perspectives suggested, and annexes are included. / Palací López, DG. (2018). Quality by Design through multivariate latent structures [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/115489 / TESIS
170

Faktoren für eine erfolgreiche Steuerung von Patentaktivitäten: Ergebnisse einer empirischen Studie

Günther, Thomas, Moses, Heike 12 September 2006 (has links)
Empirischen Studien zufolge können Patente sich positiv auf den Unternehmenserfolg auswirken. Allerdings wirkt dieser Effekt nicht automatisch, sondern Unternehmen müssen sich um den Aufbau und die gesteuerte Weiterentwicklung eines nachhaltigen und wertvollen Patentportfolios bemühen. Bisher ist jedoch nicht wissenschaftlich untersucht worden, welche Maßnahmen Unternehmen ergreifen können, um die unternehmensinternen Vorraussetzungen für eine erfolgreiche Steuerung von Patentaktivitäten zu schaffen. Um diese betrieblichen Faktoren zu identifizieren und deren Relevanz zu quantifizieren, wurden 2005 in einer breiten empirischen Untersuchung die aktiven Patentanmelder im deutschsprachigen Raum (über 1.000 Unternehmen) mit Hilfe eines standardisierten Fragebogens befragt. Auf der Basis von 325 auswertbaren Fragebögen (Ausschöpfungsquote 36,8 %) konnten zum einen Ergebnisse zum aktuellen Aufgabenspektrum der Patentabteilungen sowie zu deren organisatorischen und personellen Strukturen gewonnen werden. Ebenfalls wurde in dieser Status quo-Analyse der Bekanntheits- und Implementierungsgrad von Methoden und Systemen (z. B. Patentbewertungsmethoden, Patent-IT-Systeme) beleuchtet. Zum anderen wurden die betrieblichen Faktoren herausgestellt, auf die technologieorientierte Unternehmen achten sollten, um das Fundament für eine erfolgreiche Patentsteuerung zu legen. / Empirical studies have shown that patents can have a positive effect on corporate success. However, this effect does not occur by itself. Companies have to make an effort to create and to develop a sustainable patent portfolio. So far, no academic studies have investigated into which actions a company can take to establish the internal conditions for successful patent management. To identify and to quantify the relevance of these internal factors, a study was conducted using a standardized written questionnaire with more than 1,000 patent-oriented companies in the German-speaking countries (Germany, Austria, Switzerland, Liechtenstein). In total, 325 valid questionnaires were included in the analyses; this corresponds to an above-average response rate of 36.8 %. These analyses revealed insights into the current task profile of patent departments and their organizational and personnel structures. This status quo analysis also included the investigation into the awareness and implementation level of used methods and systems (e. g. patent evaluation methods, patent IT systems). Furthermore, the study could expose the internal determinants, which technology-oriented companies should focus on to ensure a successful patent management.
