261 |
Anotação automática de papéis semânticos de textos jornalísticos e de opinião sobre árvores sintáticas não revisadas / Automatic semantic role labeling on non-revised syntactic trees of journalistic and opinion texts
Hartmann, Nathan Siegle, 25 June 2015
Contexto: A Anotação de Papéis Semânticos (APS) é uma tarefa da área de Processamento de Línguas Naturais (PLN) que permite detectar os eventos descritos nas sentenças e os participantes destes eventos (Palmer et al., 2010). A APS responde perguntas como Quem?, Quando?, Onde?, O quê?, e Por quê?, dentre outras e, sendo assim, é importante para várias aplicações de PLN. Para anotar automaticamente um texto com papéis semânticos, a maioria dos sistemas atuais emprega técnicas de Aprendizagem de Máquina (AM). Porém, alguns papéis semânticos são previsíveis e, portanto, não necessitam ser tratados via AM. Além disso, a grande maioria das pesquisas desenvolvidas em APS tem dado foco ao inglês, considerando as particularidades gramaticais e semânticas dessa língua, o que impede que essas ferramentas e resultados sejam diretamente transportados para outras línguas. Revisão da Literatura: Para o português do Brasil, há três trabalhos finalizados recentemente que lidam com textos jornalísticos, porém com performance inferior ao estado da arte para o inglês. O primeiro (Alva- Manchego, 2013) obteve 79,6 de F1 na APS sobre o córpus PropBank.Br; o segundo (Fonseca, 2013), sem fazer uso de um treebank para treinamento, obteve 68,0 de F1 sobre o córpus PropBank.Br; o terceiro (Sequeira et al., 2012) realizou anotação apenas dos papéis Arg0 (sujeito prototípico) e Arg1 (paciente prototípico) no córpus CETEMPúblico, com performance de 31,3 pontos de F1 para o primeiro papel e de 19,0 de F1 para o segundo. Objetivos: O objetivo desse trabalho de mestrado é avançar o estado da arte na APS do português brasileiro no gênero jornalístico, avaliando o desempenho de um sistema de APS treinado com árvores sintáticas geradas por um parser automático (Bick, 2000), sem revisão humana, usando uma amostragem do córpus PLN-Br. Como objetivo adicional, foi avaliada a robustez da tarefa de APS frente a gêneros diferentes, testando o sistema de APS, treinado no gênero jornalístico, em uma amostra de revisões de produtos da web. Esse gênero não foi explorado até então na área de APS e poucas de suas características foram formalizadas. Resultados: Foi compilado o primeiro córpus de opiniões sobre produtos da web, o córpus Buscapé (Hartmann et al., 2014). A diferença de performance entre um sistema treinado sobre árvores revisadas e outro sobre árvores não revisadas ambos no gênero jornalístico foi de 10,48 pontos de F1. A troca de gênero entre as fases de treinamento e teste, em APS, é possível, com perda de performance de 3,78 pontos de F1 (córpus PLN-Br e Buscapé, respectivamente). Foi desenvolvido um sistema de inserção de sujeitos não expressos no texto, com precisão de 87,8% no córpus PLN-Br e de 94,5% no córpus Buscapé. Foi desenvolvido um sistema, baseado em regras, para anotar verbos auxiliares com papéis semânticos modificadores, com confiança de 96,76% no córpus PLN-Br. Conclusões: Foi mostrado que o sistema de Alva-Manchego (2013), baseado em árvores sintáticas, desempenha melhor APS do que o sistema de Fonseca (2013), independente de árvores sintáticas. Foi mostrado que sistemas de APS treinados sobre árvores sintáticas não revisadas desempenham melhor APS sobre árvores não revisadas do que um sistema treinado sobre dados gold-standard. Mostramos que a explicitação de sujeitos não expressos nos textos do Buscapé, um córpus do gênero de opinião de produtos na web, melhora a performance da sua APS. 
Também mostramos que é possível anotar verbos auxiliares com papéis semânticos modificadores, utilizando um sistema baseado em regras, com alta confiança. Por fim, mostramos que o uso do sentido do verbo, como feature de AM, para APS, não melhora a performance dos sistemas treinados sobre o PLN-Br e o Buscapé, por serem córpus pequenos. / Background: Semantic Role Labeling (SRL) is a Natural Language Processing (NLP) task that enables the detection of the events described in sentences and of the participants in these events (Palmer et al., 2010). SRL answers questions such as Who?, When?, Where?, What? and Why?, among others, which makes it important for several NLP applications. To automatically annotate a text with semantic roles, most current systems use Machine Learning (ML) techniques. However, some semantic roles are predictable and therefore do not need to be classified through ML. Although SRL is well advanced for English, grammatical and semantic particularities of that language prevent its tools and results from being reused directly in other languages. Related work: For Brazilian Portuguese, three recently concluded studies perform SRL on journalistic texts. The first (Alva-Manchego, 2013) obtained 79.6 F1 on the PropBank.Br corpus; the second (Fonseca, 2013), without using a treebank for training, obtained 68.0 F1 on the same corpus; and the third (Sequeira et al., 2012) annotated only the Arg0 (prototypical agent) and Arg1 (prototypical patient) roles on the CETEMPúblico corpus, with a performance of 31.3 F1 for the first role and 19.0 for the second. None of them, however, reached the state of the art for English. Purpose: The goal of this master's dissertation was to advance the state of the art of SRL for Brazilian Portuguese. The training corpus is from the journalistic genre, as in previous work, but the SRL annotation is performed on non-revised syntactic trees, i.e., trees generated by an automatic parser (Bick, 2000) without human revision, using a sample of the PLN-Br corpus. To evaluate the resulting SRL classifier on another text genre, a sample of product reviews from the web was used. Until now, product reviews had not been explored as a genre in SRL research, and few of their characteristics have been formalized. Results: The first corpus of web product reviews, the Buscapé corpus (Hartmann et al., 2014), was compiled. The difference in performance between a system trained on revised syntactic trees and one trained on non-revised trees, both from the journalistic genre, was 10.48 points of F1. Changing genres between the training and testing steps in SRL is possible, with a performance loss of 3.78 points of F1 (PLN-Br and Buscapé corpora, respectively). A system for inserting unexpressed subjects reached 87.8% precision on the PLN-Br corpus and 94.5% precision on the Buscapé corpus. A rule-based system was developed to annotate auxiliary verbs with modifier semantic roles (ArgMs), achieving 96.76% confidence on the PLN-Br corpus. Conclusions: First, we showed that Alva-Manchego's (2013) SRL system, which is based on syntactic trees, performs better annotation than Fonseca's (2013) system, which does not depend on syntactic trees. Second, an SRL system trained on non-revised syntactic trees performs better on non-revised trees than a system trained on gold-standard data. Third, making unexpressed subjects explicit in the Buscapé texts improves their SRL performance.
Additionally, we showed that it is possible to annotate auxiliary verbs with modifier semantic roles using a rule-based system, with high confidence. Finally, we showed that using the verb sense as an ML feature for SRL does not improve the performance of the systems trained on the PLN-Br and Buscapé corpora, since these corpora are small.
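To make the rule-based idea concrete, here is a minimal sketch of how an auxiliary verb chain preceding a predicate could receive a PropBank-style modifier tag. The AUXILIARIES word list, the example sentence, and the AM-AUX tag are illustrative assumptions, not the rules or tagset actually used in the dissertation.

```python
# Minimal, illustrative sketch of a rule for tagging auxiliary verbs with a
# PropBank-style modifier label; the word list and sentence are hypothetical.

AUXILIARIES = {"ter", "haver", "ser", "estar", "ir", "poder", "dever"}

def label_auxiliaries(tokens, predicate_index):
    """Attach an ArgM-like modifier label to auxiliaries that immediately
    precede the main predicate (a toy rule, not the dissertation's rule set)."""
    labels = {}
    i = predicate_index - 1
    while i >= 0 and tokens[i].lower() in AUXILIARIES:
        labels[i] = "AM-AUX"   # hypothetical modifier tag
        i -= 1
    return labels

# "O produto pode ter chegado atrasado" -> main predicate "chegado" at index 4
tokens = ["O", "produto", "pode", "ter", "chegado", "atrasado"]
print(label_auxiliaries(tokens, predicate_index=4))
# {3: 'AM-AUX', 2: 'AM-AUX'}
```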
|
262 |
Design, Fabrication, and Optimization of Miniaturized Devices for Bioanalytical Applications
Kumar, Suresh, 01 August 2015
My dissertation work integrates the techniques of microfabrication, micro/nanofluidics, and bioanalytical chemistry to develop miniaturized devices for healthcare applications. Semiconductor processing techniques, including photolithography, physical and chemical vapor deposition, and wet etching, are used to build these devices in silicon and polymeric materials. On-chip micro-/nanochannels, pumps, and valves are used to manipulate the flow of fluid in these devices. Analytical techniques such as size-based filtration, solid-phase extraction (SPE), sample enrichment, on-chip labeling, microchip electrophoresis (µCE), and laser-induced fluorescence (LIF) are utilized to analyze biomolecules. Such miniaturized devices offer the advantages of rapid analysis, low cost, and lab-on-a-chip scale integration that can potentially be used for point-of-care applications.
The first project involves construction of sieving devices on a silicon substrate, which can separate sub-100-nm biostructures based on their size. Devices consist of an array of 200 parallel nanochannels with a height step in each channel, an injection reservoir, and a waste reservoir. Height steps are used to sieve the protein mixture based on size as the protein solution flows through the channels via capillary action. Proteins smaller than the height step reach the end of the channels, while larger proteins stop at the height step, resulting in separation. A process is optimized to fabricate 10-100 nm tall channels with improved reliability and shorter fabrication time. Furthermore, a protocol is developed to reduce the electrostatic interaction between proteins and channel walls, which allows the study of size-selective trapping of five proteins in this system. The effects of protein size and concentration on protein trapping behavior are evaluated. A model is also developed to predict the trapping behavior of proteins of different sizes in these devices. Additionally, the influence of buffer ionic strength, which can change the effective cross-sectional area of the nanochannels and the trapping of proteins at height steps, is explored. The ionic strength inversely correlates with electric double layer thickness. Overall, this work lays a foundation for developing nanofluidic-based sieving systems with potential applications in lipoprotein fractionation, protein aggregate studies in biopharmaceuticals, and protein preconcentration.
The second project focuses on designing and developing a microfluidic-based platform for preterm birth (PTB) diagnosis. PTB is a pregnancy complication that involves delivery before 37 weeks of gestation and causes many newborn deaths and illnesses worldwide. Several serum PTB biomarkers have recently been identified, including three peptides and six proteins. To provide rapid analysis of these PTB biomarkers, an integrated SPE and µCE device is assembled that provides sample enrichment, on-chip labeling, and separation. The integrated device is a multi-layer structure consisting of polydimethylsiloxane valves with a peristaltic pump, and a porous polymer monolith in a thermoplastic layer. The valves and pump are fabricated using soft lithography to enable pressure-based sample actuation, as an alternative to electrokinetic operation. Porous monolithic columns are synthesized in the SPE unit using UV photopolymerization of a mixture consisting of monomer, cross-linker, photoinitiator, and various porogens.
The hydrophobic surface and porous structure of the monolith allow both protein retention and easy flow. I have optimized the conditions for ferritin retention, on-chip labeling, elution, and µCE in a pressure-actuated device. Overall functionality of the integrated device in terms of pressure-controlled flow, protein retention/elution, and on-chip labeling and separation is demonstrated using a PTB biomarker (ferritin). Moreover, I have developed a µCE protocol to separate four PTB biomarkers, including three peptides and one protein. In the future, an immunoaffinity extraction unit will be integrated with SPE and µCE to enable rapid, on-chip analysis of PTB biomarkers. This integrated system can be used to analyze other disease biomarkers as well.
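To put a number on the observation that ionic strength inversely correlates with electric double layer thickness, here is a small sketch evaluating the standard Debye length expression for a monovalent aqueous buffer at 25 °C. The buffer concentrations are illustrative values, not the conditions used in these devices.

```python
# Debye length (electric double layer thickness) vs. ionic strength for a
# 1:1 electrolyte in water at 25 °C; concentrations are illustrative only.
import math

EPS0 = 8.854e-12            # vacuum permittivity, F/m
EPS_R = 78.4                # relative permittivity of water at 25 °C
KB_T = 1.381e-23 * 298.15   # thermal energy, J
E = 1.602e-19               # elementary charge, C
NA = 6.022e23               # Avogadro's number, 1/mol

def debye_length_nm(ionic_strength_molar: float) -> float:
    """Debye length in nm: sqrt(eps * kB*T / (2 * NA * e^2 * I)), I in mol/m^3."""
    i_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    lam = math.sqrt(EPS0 * EPS_R * KB_T / (2 * NA * E**2 * i_si))
    return lam * 1e9

for conc in (0.001, 0.01, 0.1):  # mol/L
    print(f"I = {conc:>5} M  ->  lambda_D ≈ {debye_length_nm(conc):.1f} nm")
# Roughly 9.6, 3.0, and 1.0 nm: higher ionic strength compresses the EDL,
# enlarging the effective open cross-section of a 10-100 nm tall channel.
```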
|
263 |
La structure verbale en chinois mandarin : un problème d'étiquetage ? / The VP structure in Mandarin Chinese: a labeling problem?
Zhao, Chen, 29 May 2017
Ce travail de thèse porte sur deux phénomènes syntaxiques importants du chinois mandarin: la construction à copie du verbe et la construction en BA, chacun desquels pose un défi à l'approche générative de la linguistique. Nous avons proposé dans le travail une analyse par étiquetage qui permet éventuellement d'unifier les deux phénomènes qui sont à première vue très différents. Dans la partie sur le phénomène de copie du verbe, nous avons avancé l'idée que la forme de la copie du verbe est dérivée par le mouvement du VP qui fait partie d'une des stratégies grammaticales pour rendre labélisable la structure formée par la fusion entre le complément postverbal et AspP, qui est du type {XP, YP} dans les termes de Chomsky (2013). Dans la partie sur les constructions en BA, en nous basant sur l'hypothèse du mouvement, nous supposons que le mouvement du NP (le BA-NP) est pour satisfaire des exigences de sous-catégorisation de l'élément BA, que le Merge interne du NP avec le vP donne lieu à une structure nominale qui porte le label [φ], et que le rôle de BA est de fournir un label verbal à cette structure nominale sous T, qui autrement serait exclue à l'interface C-I. / The thesis mainly discusses two important syntactic phenomena of Mandarin Chinese: verb copying constructions and BA-constructions, each of which presents a challenge to the generative approach to linguistics. I provide in the thesis a labeling analysis that makes it possible to unify the two phenomena, which are very different on the surface. In the part on verb copying constructions, I put forward the idea that the verb copying form is derived by VP movement, one of the grammatical strategies to provide a label to the unlabelable structure formed by the internal Merge between the postverbal complement and AspP, which results in an {XP, YP} structure in the terms of Chomsky (2013). In the part on BA-constructions, based on the movement hypothesis, I propose that the NP (the BA-NP) moves to satisfy the subcategorization requirement of BA and that the internal Merge between the NP and the vP gives rise to a nominal structure labeled [φ]; I argue further that the role of BA is to provide a verbal label to this nominal structure under T, which would otherwise be ruled out at the C-I interface.
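As a rough illustration of the labeling problem the abstract refers to, the sketch below encodes the core cases of Chomsky's (2013) labeling algorithm: a head merged with a phrase projects its label, while an {XP, YP} structure is unlabelable unless one member has moved out or the two share a prominent feature. The data representation and feature names are my own simplifications, not the formalism used in the thesis.

```python
# Toy rendering of the labeling algorithm (Chomsky 2013); the phrase
# representation and feature names are simplifications for illustration.
from dataclasses import dataclass, field

@dataclass
class SO:                      # syntactic object
    name: str
    is_head: bool = False
    features: set = field(default_factory=set)
    moved: bool = False        # a lower copy is ignored by labeling

def label(alpha: SO, beta: SO):
    """Return the label of {alpha, beta}, or None if unlabelable."""
    # {H, XP}: the head projects.
    heads = [x for x in (alpha, beta) if x.is_head]
    if len(heads) == 1:
        return heads[0].name
    # {XP, YP}: a member whose copy has moved out is invisible to labeling.
    visible = [x for x in (alpha, beta) if not x.moved]
    if len(visible) == 1:
        return visible[0].name
    # {XP, YP}: shared prominent features (e.g. phi) can provide the label.
    shared = alpha.features & beta.features
    if shared:
        return "[{}]".format(",".join(sorted(shared)))
    return None                # unlabelable; some repair (e.g. movement) is needed

aspP = SO("AspP")
comp = SO("postverbal complement")
print(label(comp, aspP))                       # None -> motivates VP movement
print(label(SO("NP", features={"phi"}),
            SO("vP", features={"phi"})))       # [phi] -> nominal label
```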
|
264 |
Vad ska vi säga? : En studie om begreppsystem inom LSS i hemmiljö
Svärdstam, Jonas, January 2019
No description available.
|
265 |
Radio Number for Fourth Power Paths
Alegria, Linda V, 01 December 2014
A path on n vertices, denoted by P_n, is a simple graph whose vertices can be ordered so that two vertices are adjacent if and only if they are consecutive in the order. A fourth power path, P_n^4, is obtained from P_n by adding edges between any two vertices, u and v, whose distance in P_n, denoted by d_{P_n}(u,v), is less than or equal to four. The diameter of a graph G, denoted diam(G), is the greatest distance between any two distinct vertices of G. A radio labeling of a graph G is a function f that assigns to each vertex a label from the set {0, 1, 2, ...} such that |f(u) − f(v)| ≥ diam(G) − d(u,v) + 1 holds for any two distinct vertices u and v in G (i.e., u, v ∈ V(G)). The greatest value assigned to a vertex by f is called the span of the radio labeling f, i.e., span(f) = max{f(v) : v ∈ V(G)}. The radio number of G, rn(G), is the minimum span of f over all radio labelings f of G. In this paper, we provide a lower bound for the radio number of the fourth power path.
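The definitions above translate directly into a small checker. The sketch below builds the distance function of P_n^4, computes the diameter, and verifies the radio condition |f(u) − f(v)| ≥ diam(G) − d(u,v) + 1 for a candidate labeling; the example labeling is an arbitrary valid one and is not claimed to achieve the radio number.

```python
# Check whether a labeling f is a radio labeling of the fourth power path
# P_n^4; the sample labeling below is arbitrary, not an optimal one.
from itertools import combinations

def fourth_power_path_distances(n):
    """All-pairs distances in P_n^4: d(u, v) = ceil(|u - v| / 4)."""
    return {(u, v): -(-abs(u - v) // 4) for u, v in combinations(range(n), 2)}

def is_radio_labeling(f, n):
    d = fourth_power_path_distances(n)
    diam = max(d.values())
    return all(abs(f[u] - f[v]) >= diam - d[(u, v)] + 1
               for u, v in combinations(range(n), 2))

def span(f):
    return max(f)

n = 9                                   # P_9^4 has diameter 2
f = [0, 2, 4, 6, 8, 10, 12, 14, 16]     # a (wasteful) valid labeling
print(is_radio_labeling(f, n), span(f)) # True 16
```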
|
266 |
Employer Perceptions When Applying Criminal History Information to the Hiring Process
Levy McCanna, Karen S, 01 January 2019
In recent years, the state of Illinois has joined the "ban the box" movement, which typically prohibits employers from inquiring about a prospective employee's criminal history until it has been determined whether the candidate meets the core qualifications for the position. Little, however, is known about whether this legislative change has impacted how private employers use criminal history information and to what extent knowledge of criminal history impacts final hiring decisions. Using Kingdon's policy streams concept as a guide, the purpose of this general qualitative study was to understand whether implementation of "ban the box" principles impacts final hiring decisions. Data were collected through interviews with 27 hiring authorities in the state of Illinois. These data were transcribed, inductively coded, and then subjected to a thematic analysis procedure. Findings revealed that when previously convicted applicants were hired, the most common reasons were the quality and presentation of the candidate during the interview, possession of relevant job-related skills, and apparent remorse for past behavior. When candidates were rejected by employers, it was most commonly because of a perceived nexus between the convicting offense and essential job requirements. Implications for positive social change include recommendations for policy makers to consider future policy development that balances the positive consequences of successful offender reentry with concern for public safety. Doing so may encourage lower recidivism and prosocial behavior, including improved employment sustainability for those convicted of crimes, thereby promoting overall public safety objectives.
|
267 |
Mosquito popper: a multiplayer online game for 3D human body scan data segmentation
Nolte, Zachary, 01 May 2017
Game with a purpose (GWAP) is a concept that aims to utilize the hours that everyday people around the world spend playing video games to yield valuable data. The main objective of this research is to prove the feasibility of using the GWAP concept for the segmentation and labeling of massive amounts of 3D human body scan data. The rationale for using GWAP as a method for mesh segmentation and labeling is that current methods use expensive, time-consuming computational algorithms to accomplish this task. Furthermore, computer algorithms are not as detailed and specific as natural human ability in segmentation tasks. The method presented in this paper overcomes the shortcomings of computer algorithms by introducing the GWAP concept for human model segmentation. The actual process of segmenting and labeling the mesh becomes a form of entertainment rather than a tedious process, and segmentation data is produced as a by-product. In addition, the natural capabilities of the human visual processing system are harnessed to identify and label various parts of the 3D human body shape, which in turn gives greater detail and specificity in segmentation. The effectiveness of the proposed game play mechanism is proven by experiments conducted in this study.
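One plausible way to turn such crowd-generated labels into a single segmentation is per-face majority voting across players, sketched below under that assumption; the thesis does not specify this aggregation mechanism, and the part names and player data are made up.

```python
# Hypothetical aggregation of per-face body-part labels collected from
# several players into one consensus segmentation (majority vote).
from collections import Counter

def consensus_segmentation(player_labels):
    """player_labels: list of dicts mapping mesh face id -> part name."""
    votes = {}
    for labels in player_labels:
        for face, part in labels.items():
            votes.setdefault(face, Counter())[part] += 1
    return {face: counter.most_common(1)[0][0] for face, counter in votes.items()}

players = [
    {0: "head", 1: "torso", 2: "left arm"},
    {0: "head", 1: "torso", 2: "torso"},
    {0: "head", 1: "left arm", 2: "left arm"},
]
print(consensus_segmentation(players))
# {0: 'head', 1: 'torso', 2: 'left arm'}
```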
|
268 |
Ensembles for Distributed Data
Shoemaker, Larry, 21 October 2005
Many simulation data sets are so massive that they must be distributed among disk farms attached to different computing nodes. The data is partitioned into spatially disjoint sets that are not easily transferable among nodes due to bandwidth limitations. Conventional machine learning methods are not designed for this type of data distribution. Experts mark a training data set with different levels of saliency, emphasizing speed rather than accuracy due to the size of the task. The challenge is to develop machine learning methods that learn how the expert has marked the training data so that similar test data sets can be marked more efficiently.
Ensembles of machine learning classifiers are typically more accurate than individual classifiers. An ensemble of machine learning classifiers requires substantially less memory than the corresponding partition of the data set. This allows the transfer of ensembles among partitions. If all the ensembles are sent to each partition, they can vote for a level of saliency for each example in the partition. Different partitions of the data set may not have any salient points, especially if the data set has a time step dimension. This means the learned classifiers for such partitions cannot vote for saliency, since they have not been trained to recognize it.
In this work, we investigate the performance of different ensembles of classifiers on spatially partitioned data sets. Success is measured by the correct recognition of unknown and salient regions of data points.
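A minimal sketch of the scheme described above, under assumed details: one classifier is trained per spatial partition, the small models (rather than the raw data) are exchanged among nodes, and every model votes on the saliency of each example in a given partition. The synthetic data, features, and choice of decision trees are placeholders, not the classifiers or saliency levels used in this work.

```python
# Sketch: per-partition classifiers exchanged among nodes and combined by
# majority vote; data, features, and classifier choice are placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def make_partition(n=500):
    """Fake spatial partition: features X, expert saliency marks y (0/1)."""
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)  # toy 'salient' rule
    return X, y

partitions = [make_partition() for _ in range(4)]

# Train one classifier per partition; only these small models move between nodes.
ensemble = [DecisionTreeClassifier(max_depth=4).fit(X, y) for X, y in partitions]

def vote_saliency(X_local):
    """Every partition's model votes; the majority decides each example's saliency."""
    votes = np.stack([clf.predict(X_local) for clf in ensemble])
    return (votes.mean(axis=0) >= 0.5).astype(int)

X_test, y_test = make_partition(200)
pred = vote_saliency(X_test)
print("agreement with expert marks:", (pred == y_test).mean())
```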
|
269 |
The preparation and evaluation of N-acetylneuraminic acid derivatives as probes of sialic acid-recognizing proteins
Ciccotosto, Silvana, January 2004
Abstract not available
|
270 |
Syntax-driven argument identification and multi-argument classification for semantic role labeling
Lin, Chi-San Althon, January 2007
Semantic role labeling is an important stage in systems for Natural Language Understanding. The basic problem is one of identifying who did what to whom for each predicate in a sentence. Thus labeling is a two-step process: identify constituent phrases that are arguments to a predicate, then label those arguments with appropriate thematic roles. Existing systems for semantic role labeling use machine learning methods to assign roles one at a time to candidate arguments. There are several drawbacks to this general approach. First, more than one candidate can be assigned the same role, which is undesirable. Second, the search for each candidate argument is exponential with respect to the number of words in the sentence. Third, single-role assignment cannot take advantage of dependencies known to exist between semantic roles of predicate arguments, such as their relative juxtaposition. And fourth, execution times for existing algorithms are excessive, making them unsuitable for real-time use. This thesis seeks to obviate these problems by approaching semantic role labeling as a multi-argument classification process. It observes that the only valid arguments to a predicate are unembedded constituent phrases that do not overlap that predicate. Given that semantic role labeling occurs after parsing, this thesis proposes an algorithm that systematically traverses the parse tree when looking for arguments, thereby eliminating the vast majority of impossible candidates. Moreover, instead of assigning semantic roles one at a time, an algorithm is proposed to assign all labels simultaneously, leveraging dependencies between roles and eliminating the problem of duplicate assignment. Experimental results are provided as evidence to show that a combination of the proposed argument identification and multi-argument classification algorithms outperforms all existing systems that use the same syntactic information.
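As a rough sketch of the identification step described above, the code below walks a constituency parse tree and keeps only constituents that neither contain nor are contained in the predicate, i.e., unembedded phrases that do not overlap it. The tiny tree structure and the example parse are illustrative assumptions, not the author's actual data structures or traversal.

```python
# Illustrative candidate-argument identification: keep constituents that do
# not overlap the predicate; tree format and example are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    start: int                 # span of covered words [start, end)
    end: int
    children: list = field(default_factory=list)

def candidate_arguments(root, pred_start, pred_end):
    """Collect maximal constituents whose spans do not overlap the predicate."""
    found = []
    def visit(node):
        overlaps = not (node.end <= pred_start or node.start >= pred_end)
        if not overlaps:
            found.append(node)      # maximal non-overlapping constituent
            return                  # don't descend: its children are embedded
        for child in node.children:
            visit(child)
    visit(root)
    return found

# "The cat [chased] the mouse yesterday", predicate span = words 2..3
tree = Node("S", 0, 5, [
    Node("NP", 0, 2), Node("VP", 2, 5, [
        Node("V", 2, 3), Node("NP", 3, 4), Node("ADVP", 4, 5)])])
for c in candidate_arguments(tree, 2, 3):
    print(c.label, (c.start, c.end))
# NP (0, 2) / NP (3, 4) / ADVP (4, 5)
```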
|