261 |
Multiple Entity Reconciliation. Samoila, Lavinia Andreea, January 2015 (has links)
Living in the age of "Big Data" is both a blessing and a curse. On the one hand, the raw data can be analysed and then used for weather predictions, user recommendations, targeted advertising and more. On the other hand, when data is aggregated from multiple sources, there is no guarantee that each source has stored the data in a standardized format, or even one compatible with what the application requires. So there is a need to parse the available data and convert it to the desired form. Here is where the problems start to arise: the correspondences between data instances that belong to the same domain but come from different sources are often not straightforward. For example, in the film industry, information about movies (cast, characters, ratings etc.) can be found on numerous websites such as IMDb or Rotten Tomatoes. Finding and matching all the data referring to the same movie is a challenge. The aim of this project is to select the most efficient algorithm for automatically correlating movie-related information gathered from various websites. We have implemented a flexible application that allows us to compare the performance of multiple algorithms based on machine learning techniques. According to our experimental results, a well-chosen set of rules is on par with the results from a neural network; these two prove to be the most effective classifiers for records containing movie information.
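As a rough illustration of the rule-based side of this comparison, the sketch below decides whether two movie records refer to the same film using title similarity, release-year tolerance and cast overlap. The field names, thresholds and normalization step are assumptions for the example; the thesis's actual rule set is not reproduced here.

```python
import re
from difflib import SequenceMatcher

def norm_title(title: str) -> str:
    """Lower-case the title and strip a trailing '(YYYY)' year suffix."""
    return re.sub(r"\s*\(\d{4}\)\s*$", "", title).lower().strip()

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def same_movie(rec_a: dict, rec_b: dict) -> bool:
    """Hypothetical rules: near-identical title, close release year,
    and at least two shared leading cast members (when cast is known)."""
    title_ok = similarity(norm_title(rec_a["title"]), norm_title(rec_b["title"])) >= 0.9
    year_ok = abs(rec_a["year"] - rec_b["year"]) <= 1
    cast_a = {n.lower() for n in rec_a.get("cast", [])[:5]}
    cast_b = {n.lower() for n in rec_b.get("cast", [])[:5]}
    cast_ok = len(cast_a & cast_b) >= 2 if cast_a and cast_b else True
    return title_ok and year_ok and cast_ok

# Two records for the same film, as they might appear on different sites
rec_1 = {"title": "Heat", "year": 1995, "cast": ["Al Pacino", "Robert De Niro"]}
rec_2 = {"title": "Heat (1995)", "year": 1995, "cast": ["Robert De Niro", "Al Pacino"]}
print(same_movie(rec_1, rec_2))  # True
```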
|
262 |
Convolutional Neural Networks for Named Entity Recognition in Images of Documents. van de Kerkhof, Jan, January 2016 (has links)
This work researches named entity recognition (NER) with respect to images of documents with a domain-specific layout, by means of Convolutional Neural Networks (CNNs). Examples of such documents are receipts, invoices, forms and scientific papers, the latter of which are used in this work. An NER task is first performed statically, where a static number of entity classes is extracted per document. Networks based on the deep VGG-16 network are used for this task. Here, experimental evaluation shows that framing the task as a classification task, where the network classifies each bounding box coordinate separately, leads to the best network performance. Also, a multi-headed architecture is introduced, where the network has an independent fully-connected classification head per entity. VGG-16 achieves better performance with the multi-headed architecture than with its default, single-headed architecture. Additionally, it is shown that transfer learning does not improve performance of these networks. Analysis suggests that the networks trained for the static NER task learn to recognise document templates, rather than the entities themselves, and therefore do not generalize well to new, unseen templates. For a dynamic NER task, where the type and number of entity classes vary per document, experimental evaluation shows that, on large entities in the document, the Faster R-CNN object detection framework achieves comparable performance to the networks trained on the static task. Analysis suggests that Faster R-CNN generalizes better to new templates than the networks trained for the static task, as Faster R-CNN is trained on local features rather than the full document template. Finally, analysis shows that Faster R-CNN performs poorly on small entities in the image and suggestions are made to improve its performance.
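To make the multi-headed idea concrete, the following PyTorch sketch places an independent fully-connected classification head per entity on top of a VGG-16 convolutional trunk, with each head classifying discretised bounding-box coordinates. The entity names, number of coordinate bins and head sizes are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MultiHeadedDocNER(nn.Module):
    """Sketch: VGG-16 backbone with one classification head per entity.

    Each head predicts a discretised bounding-box coordinate class
    (4 coordinates per entity, `n_bins` classes per coordinate)."""
    def __init__(self, entities, n_bins=64):
        super().__init__()
        backbone = vgg16(weights=None)        # or ImageNet weights for transfer learning
        self.features = backbone.features     # convolutional trunk
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.heads = nn.ModuleDict({
            name: nn.Sequential(
                nn.Flatten(),
                nn.Linear(512 * 7 * 7, 1024), nn.ReLU(),
                nn.Linear(1024, 4 * n_bins),  # 4 coordinates, n_bins classes each
            )
            for name in entities
        })
        self.n_bins = n_bins

    def forward(self, x):
        feats = self.pool(self.features(x))
        # One independent prediction per entity head
        return {name: head(feats).view(-1, 4, self.n_bins)
                for name, head in self.heads.items()}

model = MultiHeadedDocNER(entities=["title", "authors", "abstract"])
logits = model(torch.randn(1, 3, 224, 224))
print({name: out.shape for name, out in logits.items()})
```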
|
263 |
Translation Memory System Optimization : How to effectively implement translation memory system optimization / Optimering av översättningsminnessystem : Hur man effektivt implementerar en optimering i översättningsminnessystem. Chau, Ting-Hey, January 2015 (has links)
Translation of technical manuals is expensive, especially when a larger company needs to publish manuals for its whole product range in over 20 different languages. When a text segment (i.e. a phrase, sentence or paragraph) is manually translated, we would like to reuse these translated segments in future translation tasks. A translated segment is stored together with its corresponding source segment, often called a language pair, in a Translation Memory System. A language pair in a Translation Memory represents a Translation Entry, also known as a Translation Unit. During a translation, when a text segment in a source document matches a segment in the Translation Memory, the available target languages in the Translation Unit do not require a human translation: the previously translated segment can be inserted into the target document. Such functionality is provided by the single-source publishing software Skribenta, developed by Excosoft. Skribenta requires text segments in source documents to find an exact or a full match in the Translation Memory in order to apply a translation to a target language. A full match can only be achieved if a source segment is stored in a standardized form, which requires manual tagging of entities and frequently recurring words such as model names and product numbers. This thesis investigates different ways to improve and optimize a Translation Memory System. One way is to aid users with the manual tagging of entities by developing heuristic algorithms that approach the problem of Named Entity Recognition (NER). The evaluation results of the developed heuristic algorithms were compared with the results of an off-the-shelf NER tool developed by Stanford. The results show that the developed heuristic algorithms achieve a higher F-measure than the Stanford NER and may be a good initial step towards helping Excosoft's users improve their Translation Memories. / Translation of technical manuals is very costly, especially when larger organizations need to publish product manuals for their entire range in over 20 different languages. When a text (e.g. a phrase, sentence or paragraph) has been translated, we want to be able to reuse the translated text in future translation projects and documents. The translated texts are stored in a Translation Memory. Each text is stored in its source language together with its translation into another language, the so-called target language; together they form a language pair in a Translation Memory System. A language pair stored in a translation memory constitutes a Translation Entry, also known as a Translation Unit. If a match is found when searching the translation memory for a given source-language text string, translations into all available target languages for that string are returned, and these can in turn be inserted into the target document. Such functionality is offered in the publishing software Skribenta, developed by Excosoft. To perform a translation into a target language, Skribenta requires the source-language text to find an exact match, or a so-called full match, in the translation memory. A full match can only be achieved if a text is stored in standardized form. This requires manual tagging of entities and frequently occurring words such as model names and product numbers. In this thesis I investigate how to effectively implement an optimization of a translation memory system, partly by facilitating the manual tagging of entities. This has been done with various heuristics that approach the problem of Named Entity Recognition (NER). The results from the developed heuristics have been compared with the results from the NER tool developed by Stanford. The results show that the heuristics I developed achieve a higher F-measure than the Stanford NER and can therefore be a good initial step towards helping Excosoft's users improve their translation memories.
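A minimal sketch of the full-match idea under assumed conditions: hypothetical regular expressions tag product numbers and model names, and segments are normalized to placeholder form so that a translation-memory lookup succeeds even when only those entities differ. The actual Skribenta heuristics and entity types are not specified in the abstract and are not reproduced here.

```python
import re

# Hypothetical heuristic patterns for entities that otherwise block full matches:
# product numbers like "AB-1234" and short model names like "XC90".
PATTERNS = {
    "PRODUCT_NO": re.compile(r"\b[A-Z]{2,4}-\d{3,6}\b"),
    "MODEL_NAME": re.compile(r"\b[A-Z]{1,3}\d{2,4}\b"),
}

def normalize(segment: str) -> str:
    """Replace recognised entities with placeholder tags so that segments
    differing only in model names or product numbers map to the same key."""
    for label, pattern in PATTERNS.items():
        segment = pattern.sub(f"<{label}>", segment)
    return segment.strip()

# A tiny in-memory translation memory: normalized source -> target translations
tm = {
    normalize("Tighten the bolts on model XC90 before use."): {
        "sv": "Dra åt bultarna på modell <MODEL_NAME> före användning.",
    }
}

def lookup(segment: str, lang: str):
    """Return a full-match translation if the normalized segment is stored."""
    return tm.get(normalize(segment), {}).get(lang)

# Different model number, same normalized segment, so the full match is found
print(lookup("Tighten the bolts on model XC60 before use.", "sv"))
```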
|
264 |
Fullstack e-handel applikation / Full-stack e-commerce application. Kushkbaghi, Nick, January 2022 (has links)
The goal of this project has been to create an online store for a music shop called Music for All. The company's business concept is to sell new and used music equipment online, with payment by mail or PayPal. The project consists of three separate parts: the first stores product information in a SQL Server database created with ASP.NET Core Web API and Entity Framework; the second is an admin interface with a login system built on ASP.NET Core Identity Framework, which makes it possible for a logged-in admin to create, delete and update products through a headless CMS built with ASP.NET Core MVC and Entity Framework Core; the third is the online store itself, developed with React.JS, Next.Js and Redux, which provides functionality for consuming the REST API as well as functions that increase the web application's availability and usability. The REST API was developed with ASP.NET Core. The design was implemented with React.JS and Sass, with Node.JS used to execute JS code for the browser. / The goal of this project was to create an online store for a music shop called Music for All, which was to establish itself during the summer. The company's business idea was to sell new and used music equipment online, with payment by mail or PayPal. The project was developed in three separate parts: one part stored product information in a SQL Server database created with ASP.NET Core Web API and Entity Framework; the second part consisted of an admin interface with a login system created with ASP.NET Core Identity Framework, so that logged-in employees could use a content management system (headless CMS) created with ASP.NET Core MVC and Entity Framework Core. The online store was developed with React.JS, Next.Js and Redux, which provided both functionality for consuming the REST API and functions that increased the web application's availability and usability. The REST API was developed with ASP.NET Core. The design was implemented with React.JS, Sass and Node.JS in order to execute JS code in the browser.
|
265 |
Přechod práv a povinností z pracovněprávních vztahů / Transfer of rights and obligations from the employment relationships. Nerad, Miroslav, January 2021 (has links)
In his thesis, the author deals with the transfer of rights and obligations arising from employment relationships. In Czech law, the transfer of rights and obligations is governed in substantial part by Act No. 262/2006 Coll., the Labour Code, as amended, and by other statutes. The Czech legislation is based on the harmonised regulation in European Union law, provided for in Council Directive 2001/23/EC of 12 March 2001 on the approximation of the laws of the Member States relating to the safeguarding of employees' rights in the event of transfers of undertakings, businesses or parts of undertakings or businesses. In his thesis, the author examines the current regulation in Czech law and the decision-making practice of the Czech courts and compares it with the decision-making practice of the Court of Justice of the European Union. The author also deals with the amendment of the transfer of rights and obligations implemented by Act No. 285/2020 Coll., amending Act No. 262/2006 Coll., the Labour Code, as amended, and several other related acts, and compares the applicable law with the preceding regulation. The thesis also deals with the history of the regulation of transfer of rights and obligations in the Czech Republic, then it deals with the general definition...
|
266 |
Karst Database Development in Minnesota: Design and Data Assembly. Gao, Y., Alexander, E. C., Tipping, R. G., 01 May 2005 (has links)
The Karst Feature Database (KFD) of Minnesota is a relational GIS-based Database Management System (DBMS). Previous karst feature datasets used inconsistent attributes to describe karst features in different areas of Minnesota. Existing metadata were modified and standardized to provide comprehensive metadata for all karst features in Minnesota. Microsoft Access 2000 and ArcView 3.2 were used to develop this working database. Existing county and sub-county karst feature datasets have been assembled into the KFD, which is capable of visualizing and analyzing the entire data set. By November 17, 2002, 11,682 karst features were stored in the KFD of Minnesota. Data tables are stored in a Microsoft Access 2000 DBMS and linked to corresponding ArcView applications. The current KFD of Minnesota has been moved from a Windows NT server to a Windows 2000 Citrix server accessible to researchers and planners through networked interfaces.
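The sketch below illustrates the kind of standardized relational table such a karst feature database relies on, using SQLite for brevity; the original KFD used Microsoft Access 2000 linked to ArcView 3.2, and the attribute names here are assumptions rather than the actual schema.

```python
import sqlite3

# Minimal sketch of a standardized karst-feature table with hypothetical attributes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE karst_feature (
    feature_id   INTEGER PRIMARY KEY,
    feature_type TEXT NOT NULL,        -- e.g. sinkhole, spring, stream sink
    county       TEXT NOT NULL,
    latitude     REAL,                 -- decimal degrees
    longitude    REAL,
    date_mapped  TEXT,                 -- ISO 8601 date
    remarks      TEXT
);
""")
conn.execute(
    "INSERT INTO karst_feature VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, "sinkhole", "Fillmore", 43.67, -92.09, "2002-11-17", "example record"),
)
# Consistent attributes make county-level summaries and GIS joins straightforward
for row in conn.execute("SELECT county, COUNT(*) FROM karst_feature GROUP BY county"):
    print(row)
```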
|
267 |
Building a Personally Identifiable Information Recognizer in a Privacy Preserved Manner Using Automated Annotation and Federated Learning. Hathurusinghe, Rajitha, 16 September 2020 (has links)
This thesis explores the training of a deep neural network based named entity recognizer in an end-to-end privacy-preserved setting, where dataset creation and model training happen with minimal manual intervention. With the improving accuracy of deep learning models on practical tasks, a rising concern is satisfying the demand for training data for these models amidst concerns about data privacy. Several data-protection schemes have been suggested in the recent past in response to public concern, along with legal guidelines to enforce them. A promising new development is decentralized model training on isolated datasets, which eliminates the privacy compromise of providing data to a centralized entity. However, in this federated setting, curating the data source is still a privacy risk, especially for unstructured data sources such as text. We explore the feasibility of automatic dataset annotation for a Named Entity Recognition (NER) task and of training a deep learning model with it in two federated learning settings. We explore the feasibility of utilizing a dataset created in this manner for fine-tuning a state-of-the-art deep learning language model for the downstream task of named entity recognition. We also examine this novel combination of deep learning NLP models and federated learning for its deviation from the classical centralized setting. We created an automatically annotated dataset containing around 80,000 sentences, a manually annotated test set and tools to extend the dataset with more manual annotations. We observed that the noise from automated annotation can be overcome to a degree by increasing the dataset size. We also contributed state-of-the-art NLP model developments to the federated learning framework. Overall, our NER model achieved an F1-score of around 0.80 for recognizing entities in sentences.
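A minimal sketch of federated averaging over isolated client datasets, with a stand-in linear token classifier instead of a full language model; the aggregation scheme, client count and model are illustrative assumptions, not the framework used in the thesis.

```python
import copy
import torch
import torch.nn as nn

def local_update(model, dataset, epochs=1, lr=0.01):
    """Train a copy of the global model on one client's private data."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in dataset:
            opt.zero_grad()
            loss = loss_fn(local(x), y)
            loss.backward()
            opt.step()
    return local.state_dict()

def federated_average(states):
    """FedAvg: element-wise mean of the clients' parameter tensors."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    return avg

# Stand-in token classifier: 32-dim token features -> 5 entity labels.
global_model = nn.Linear(32, 5)
# Three clients, each with its own private (features, labels) batches.
clients = [[(torch.randn(8, 32), torch.randint(0, 5, (8,)))] for _ in range(3)]
for _round in range(2):
    states = [local_update(global_model, data) for data in clients]
    global_model.load_state_dict(federated_average(states))
```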
|
268 |
Lze to říci jinak aneb automatické hledání parafrází / Automatic Identification of Paraphrases. Otrusina, Lubomír, January 2009 (has links)
Automatic paraphrase discovery is an important task in natural language processing. Many systems use paraphrases to improve performance, e.g. systems for question answering, information retrieval or document summarization. In this thesis, we explain basic concepts such as paraphrase and paraphrase pattern. Next, we present some methods for paraphrase discovery from various resources. Subsequently, we propose an unsupervised method for discovering paraphrases from large plain-text corpora based on the context and keywords between pairs of named entities. Finally, we explain evaluation methods in the paraphrase discovery area, then evaluate our system and compare it with similar systems.
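As a toy illustration of context-based paraphrase discovery between named-entity pairs, the sketch below collects the text spans that link each entity pair and groups them, so that contexts connecting the same pairs (e.g. "founded" and "was established by") surface as paraphrase candidates. The tagging format and grouping criterion are assumptions for the example, not the thesis's method.

```python
import re
from collections import defaultdict

# Toy corpus; <PER>/<ORG> tags stand in for the output of a named-entity tagger.
corpus = [
    "<PER>Jobs</PER> founded <ORG>Apple</ORG> in 1976.",
    "<ORG>Apple</ORG> was established by <PER>Jobs</PER>.",
    "<PER>Gates</PER> founded <ORG>Microsoft</ORG>.",
    "<ORG>Microsoft</ORG> was established by <PER>Gates</PER>.",
]

TAG = re.compile(r"<(PER|ORG)>(.*?)</\1>")

def entity_pair_contexts(sentence):
    """Yield ((entity_1, entity_2), middle_text) for each adjacent entity pair."""
    matches = list(TAG.finditer(sentence))
    for a, b in zip(matches, matches[1:]):
        middle = sentence[a.end():b.start()].strip(" .")
        yield (a.group(2), b.group(2)), middle

# Group contexts by unordered entity pair; contexts that share many pairs
# across the corpus are candidate paraphrase patterns.
contexts_by_pair = defaultdict(set)
for sent in corpus:
    for pair, middle in entity_pair_contexts(sent):
        contexts_by_pair[frozenset(pair)].add(middle)

for pair, contexts in contexts_by_pair.items():
    print(sorted(pair), "->", sorted(contexts))
```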
|
269 |
Entity-Centric Discourse Analysis and Its Applications / エンティティに注目した談話解析とその応用. Wang, Xun, 24 November 2017 (has links)
Kyoto University / 0048 / New-system doctoral course / Doctor of Informatics / Degree No. Kō 20777 / Informatics Doctorate No. 657 / New system||Informatics||113 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Sadao Kurohashi, Professor Tatsuya Kawahara, Professor Toru Ishida / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
|
270 |
Integration of corporate and individual income taxes : an equity justification. Friedland, Jonathan Brett, 25 February 2011 (has links)
No description available.
|