  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A Hybrid Approach to Cross-Linguistic Tokenization: Morphology with Statistics

Kearsley, Logan R. 01 June 2016
Tokenization, or word boundary detection, is a critical first step for most NLP applications. This is often given little attention in English and other languages which use explicit spaces between written words, but standard orthographies for many languages lack explicit markers. Tokenization systems for such languages are usually engineered on an individual basis, with little re-use. The human ability to decode any written language, however, suggests that a general algorithm exists. This thesis presents simple morphologically-based and statistical methods for identifying word boundaries in multiple languages. Statistical methods tend to over-predict, while lexical and morphological methods fail when encountering unknown words. I demonstrate that a generic hybrid approach to tokenization using both morphological and statistical information generalizes well across multiple languages and improves performance over morphological or statistical methods alone, and show that it can be used for efficient tokenization of English, Korean, and Arabic.
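The idea of combining lexical knowledge with a statistical fallback can be illustrated with a small sketch. This is not the thesis's algorithm: it is a generic Viterbi segmenter in which known words are scored from an assumed lexicon of log-probabilities and unknown stretches receive a per-character penalty. The names `segment`, `word_logprob`, and `unknown_logprob_per_char`, and the toy lexicon, are all hypothetical.

```python
import math


def segment(text, word_logprob, unknown_logprob_per_char=-10.0, max_len=10):
    """Split unspaced text into the highest-scoring sequence of pieces.

    Known words score their lexicon log-probability; any other piece is
    treated as unknown and penalized per character (an assumed fallback).
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best score for text[:i]
    back = [0] * (n + 1)             # back[i]: start of the last piece ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_logprob:
                score = word_logprob[piece]
            else:
                score = unknown_logprob_per_char * len(piece)
            candidate = best[start] + score
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the pieces.
    tokens = []
    pos = n
    while pos > 0:
        tokens.append(text[back[pos]:pos])
        pos = back[pos]
    return tokens[::-1]


# Toy lexicon with made-up probabilities, for illustration only.
lexicon = {"the": math.log(0.5), "cat": math.log(0.25), "sat": math.log(0.25)}
print(segment("thecatsat", lexicon))
```

With a purely lexical method the unknown character in an input such as "thecatzsat" would block segmentation; here the penalty term lets the known words around it still be recovered.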
2

Making Sense of Online Reviews: A Machine Learning Approach: An Abstract

Harrison, Dana E., Ajjan, Haya 01 January 2020
It is estimated that 80% of companies’ data is unstructured. Unstructured data, or data that is not predefined by numerical values, continues to grow at a rapid pace. Images, text, videos and voice are all examples of unstructured data. Companies can use this type of data to gain novel insights unavailable through more easily manageable, structured data. Unstructured data, however, creates a challenge since it often requires substantial coding prior to analysis. The purpose of this study is to describe the steps and introduce computational methods that can be adopted to further explore unstructured online reviews. The unstructured nature of online reviews requires extensive text analytics processing. This study introduces methods for text analytics, including tokenization at the sentence level, lemmatization or stemming to reduce inflectional forms of the words appearing in the text, and a ‘bag of n-grams’ approach. We will also introduce lexicon-based feature engineering and methods to develop new lexicons for capturing theoretically established constructs and relationships that are specific to the domain of study. The numeric features generated in the analysis will then be analyzed using machine learning algorithms. This process can be applied to the analysis of other unstructured data, such as dyadic information exchange between customer service, salespeople, customers and channel members. Companies can apply results from unstructured data analysis to examine a variety of outcomes, including (but not limited to) customer decisions, channel management, and the mitigation of potential crisis situations. Understanding interdisciplinary methods of analyzing unstructured data is critical as the availability of this type of data continues to accelerate, and it enables researchers to develop theoretical contributions within the marketing discipline.
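The preprocessing steps named in this abstract (sentence-level tokenization, stemming, and a bag of n-grams) can be sketched with the Python standard library alone. The sketch makes several assumptions that are not from the study: sentences are split with a simple regular expression, and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer or lemmatizer.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def crude_stem(word):
    """Strip one common English suffix (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_ngrams(text, n_max=2):
    """Count stemmed 1..n_max-grams; n-grams never cross sentence boundaries."""
    counts = Counter()
    for sentence in split_sentences(text):
        words = [crude_stem(w) for w in re.findall(r"[a-z']+", sentence.lower())]
        for n in range(1, n_max + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


features = bag_of_ngrams("Great phone. Loved the phones!")
print(features.most_common(3))
```

The resulting counts are the kind of numeric features that could then be passed to a machine learning model.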
3

Incremental Re-tokenization in BPE-trained SentencePiece Models

Hellsten, Simon January 2024
This bachelor's thesis in Computer Science explores the efficiency of an incremental re-tokenization algorithm in the context of BPE-trained SentencePiece models used in natural language processing. The thesis begins by underscoring the critical role of tokenization in NLP, particularly highlighting the complexities introduced by modifications in tokenized text. It then presents an incremental re-tokenization algorithm, detailing its development and evaluating its performance against a full text re-tokenization. Experimental results demonstrate that this incremental approach is more time-efficient than full re-tokenization, especially evident in large text datasets. This efficiency is attributed to the algorithm's localized re-tokenization strategy, which limits processing to text areas around modifications. The research concludes by suggesting that incremental re-tokenization could significantly enhance the responsiveness and resource efficiency of text-based applications, such as chatbots and virtual assistants. Future work may focus on predictive models to anticipate the impact of text changes on token stability and optimizing the algorithm for different text contexts.
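The localized strategy described here, limiting work to the text around a modification, can be shown with a toy example. The sketch below does not reproduce the thesis's algorithm and does not use BPE or SentencePiece: it assumes a hypothetical tokenizer in which every whitespace-delimited word is one token, so that an edit can only affect the words it touches. With a real BPE model the affected window would have to be determined differently.

```python
import re


def tokenize(text, offset=0):
    """Toy tokenizer: one (word, start, end) token per whitespace-delimited word."""
    return [(m.group(), m.start() + offset, m.end() + offset)
            for m in re.finditer(r"\S+", text)]


def retokenize_incremental(text, tokens, start, end, replacement):
    """Replace text[start:end] and re-tokenize only the affected window."""
    new_text = text[:start] + replacement + text[end:]
    delta = len(replacement) - (end - start)
    # Tokens that end strictly before the edit are separated from it by
    # whitespace, so they are reused unchanged.
    left = [t for t in tokens if t[2] < start]
    # Tokens that start strictly after the edit only need their offsets shifted.
    right = [(w, s + delta, e + delta) for (w, s, e) in tokens if s > end]
    window_start = left[-1][2] if left else 0
    window_end = right[0][1] if right else len(new_text)
    middle = tokenize(new_text[window_start:window_end], offset=window_start)
    return new_text, left + middle + right


text = "the quick brown fox"
tokens = tokenize(text)
# Replace "quick" (characters 4..9) with "very slow".
new_text, new_tokens = retokenize_incremental(text, tokens, 4, 9, "very slow")
print(new_tokens)
```

Only the window between the last untouched token on the left and the first untouched token on the right is passed to the tokenizer, which is why the cost depends on the size of the edit rather than the size of the document.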
4

Real Estate Tokenization Based on the Blockchain / Fastighetstokenisering baserad på Blockchain

Lu, Yanjiang January 2022
Real estate is one of the largest assets in the world, but it is also synonymous with low liquidity. Tokenization of real estate emerged in the era of real estate 4.0 as a technological innovation to counter the low liquidity, the long and expensive transaction process, and the high entry threshold of the real estate industry. Through fractionalization, tokenization of real estate reduces the capital that investors need. It is also based on blockchain and smart contract technology, which provides open and transparent information and greatly improves transaction efficiency. However, tokenization also faces the problems of low demand and insignificant economic benefits. The purpose of this study is to investigate the economic performance of publicly traded real estate tokens in the US market and the application prospects of tokenization in China. The results of this study indicate that real estate tokens always have higher returns than market portfolios, although the difference is not always pronounced. Correspondingly, real estate tokens carry higher risks, although these risks stem mainly from the tokens themselves rather than from the market. Even if tokenization can play an important role in promoting the reform of China’s real estate industry, there are still major obstacles to its adoption in China, including profitability, policy, financing needs, and risks. It is therefore unlikely to be widely adopted in China in the short term. In the future, more accurate conclusions about economic performance can be drawn from a larger historical data set of real estate tokens. If above-market economic performance continues, China’s real estate market is actively transformed, and legislation and technology improve, real estate tokenization could be widely applied in China and further promote the reform of China’s real estate industry.
5

Leveraging Blockchain Technology in Green Finance / Utnyttja blockchain-teknik i grön finans

Maleki, Hooman January 2023
To achieve net zero emissions by 2050, a $275 trillion investment is needed, with the private sector playing a significant role. Green corporate bonds are a popular financing method, having grown to 6% of global corporate bonds in 2021. Despite their potential, green bonds face challenges such as lack of standardization, high costs, and greenwashing risks. Tokenization through Security Token Offerings (STOs) can increase demand and supply for green bonds by enabling fractional ownership, eliminating intermediaries, and improving transparency with blockchain technology and IoT sensors. This drives demand, increases liquidity, and reduces greenwashing risk. STOs also allow smaller investments and give SMEs access to finance. This study employs a multi-criteria decision-making model to select a blockchain platform for STOs in green bonds. The process involves identifying platforms, defining features, evaluating suitability, and selecting the platform that best aligns with green bond STO requirements.
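The abstract does not say which multi-criteria decision-making model the study uses. As an illustration of the general idea only, the sketch below applies a simple weighted-sum model, one of the most basic techniques in that family. The platform names, criteria, weights, and scores are all hypothetical placeholders and are not taken from the thesis.

```python
def rank_platforms(scores, weights):
    """Return platform names sorted by weighted-sum score, best first."""
    totals = {
        name: sum(weights[criterion] * values[criterion] for criterion in weights)
        for name, values in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical criteria weights (summing to 1) and normalized scores in [0, 1].
weights = {"transparency": 0.5, "cost": 0.3, "throughput": 0.2}
scores = {
    "platform_a": {"transparency": 0.6, "cost": 0.9, "throughput": 0.5},
    "platform_b": {"transparency": 0.9, "cost": 0.6, "throughput": 0.7},
}

print(rank_platforms(scores, weights))
```

Changing the weights changes the ranking, which is why the step of defining features and their relative importance matters as much as the scoring itself.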
6

Real Estate Tokenization : Structure, Performance and Liquidity Implications / Fastighetstokenisering : Struktur, utveckling och implikation av likviditet

Kull, Felix, Naumann, Theodor January 2022
This thesis takes a quantitative and qualitative approach to studying real estate tokenization. Real estate tokens are a rapidly growing investment product founded on blockchain technology. Real estate tokenization platforms and market experts view this product as a solution to many drawbacks that real estate investments have traditionally faced. For instance, tokenization of real estate aims to increase liquidity through fractionalization of properties, enable secondary market trading, and decrease the need for intermediaries in the transaction process. Theoretically, from the asset owner’s perspective, this would imply a new platform for raising both debt and equity that could reach a new pool of investors from all over the world. This study examines the structure of real estate tokens and finds similarities to and differences from traditional existing products. Additionally, an empirical examination of the performance of real estate tokens is conducted, measured as aggregated return and benchmarked against specific reference indexes. The empirical results show that the token indexes at large did not outperform the benchmark indexes in terms of risk-adjusted return. Lastly, this thesis investigates the potential implications that increased liquidity could have for the residential real estate market as a consequence of tokenized properties; three main conjunctions to liquidity were selected and discussed from the perspective of real estate tokenization.
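The abstract does not state which risk-adjusted measure the authors used. One widely used measure is the Sharpe ratio, the mean excess return divided by the standard deviation of excess returns, and the sketch below shows how a comparison of that kind could be computed. The return series are invented for illustration and are not data from the thesis.

```python
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return per unit of sample standard deviation."""
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


# Made-up periodic returns for a token index and a benchmark index.
token_returns = [0.04, -0.03, 0.06, -0.02]
benchmark_returns = [0.01, 0.02, 0.01, 0.02]

print(sharpe_ratio(token_returns), sharpe_ratio(benchmark_returns))
```

In this invented example the benchmark has the higher ratio because its returns, though similar on average, vary far less; a higher raw return does not by itself imply a better risk-adjusted return.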
7

Telemetry Post-Processing in the Clouds: A Data Security Challenge

Kalibjian, J. R. October 2011
ITC/USA 2011 Conference Proceedings / The Forty-Seventh Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2011 / Bally's Las Vegas, Las Vegas, Nevada / As organizations move toward cloud [1] computing environments, data security challenges will begin to take precedence over network security issues. This will potentially impact telemetry post processing in a myriad of ways. After reviewing how data security tools like Enterprise Rights Management (ERM), Enterprise Key Management (EKM), Data Loss Prevention (DLP), Database Activity Monitoring (DAM), and tokenization are impacting cloud security, their effect on telemetry post-processing will also be examined. An architecture will be described detailing how these data security tools can be utilized to make telemetry post-processing environments in the cloud more robust.
8

Data Security Architecture Considerations for Telemetry Post Processing Environments

Kalibjian, Jeff 10 1900
Telemetry data has great value, as setting up a framework to collect and gather it involves significant costs. Further, the data itself has product diagnostic significance and may also have strategic national security importance if the product is defense or intelligence related. This potentially makes telemetry data a target for acquisition by hostile third parties. To mitigate this threat, the organization should employ data security principles to protect telemetry data. Data security is an important element of a layered security strategy for the enterprise. The value proposition centers on the argument that if the organization's perimeter and internal defenses (e.g., firewall, IDS) fail and hostile entities gain access to data on internal company networks, they will be unable to read the data because it is encrypted. After a review of important encryption background, including accepted practices, standards, and architectural considerations for disk, file, database, and application data protection encryption strategies, specific data security options applicable to telemetry post-processing environments are discussed, providing tangible approaches to better protect organization telemetry data.
9

Design for Addressing Data Privacy Issues in Legacy Enterprise Application Integration

Meddeoda Gedara, Kavindra Kulathilake January 2019
Electronic message transfer is the key element of enterprise application integration (EAI), and the privacy of the data transferred must be protected by every system involved in moving a message from origin to destination. Recent data privacy regulations such as the GDPR (General Data Protection Regulation) oblige organizations to ensure the privacy of the personal data they handle and to give the data owner visibility and control over it. Sensitive data embedded in messages and transferred through business-to-business (B2B) middleware platforms in an enterprise architecture is particularly at risk because of the legacy nature of the products and the complexity of the system integrations. This poses a serious challenge to organizations that process sensitive data over interconnected systems and must comply with regulatory requirements. This research proposes a solution design to address the data privacy issues related to personal data handled in an enterprise application integration framework in which electronic messages are used to transfer personally identifiable information (PII). The proposal consists of a design called “Safety Locker”, which issues unique tokens for encrypted PII elements stored in persistent data storage based on Apache Ignite and adds REST API interfaces for accessing application functionality such as tokenization, de-tokenization, token management, and audit logs. The Safety Locker can run as a standalone application, allowing clients to access its functionality remotely over the Hypertext Transfer Protocol (HTTP). The design allows data controllers to ensure the privacy of PII by embedding tokens generated by the application in the electronic messages transferred through interconnected systems. The solution design is evaluated through a proof-of-concept implementation, which can be adapted and enhanced for use in EAI implementations.
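The tokenization and de-tokenization operations described in this abstract can be sketched in a few lines. This is an illustration of the general token-vault pattern, not the Safety Locker implementation: the class name `TokenVault` and its methods are hypothetical, an in-memory dictionary stands in for the Apache Ignite store, the REST layer is omitted, and the stored value is not encrypted here although the described design stores encrypted PII.

```python
import secrets


class TokenVault:
    """Minimal in-memory token vault (illustrative stand-in for a real store)."""

    def __init__(self):
        self._store = {}      # token -> original value (unencrypted in this sketch)
        self.audit_log = []   # (operation, token) entries, in order

    def tokenize(self, value):
        """Issue a random token that carries no information about the value."""
        token = "tok_" + secrets.token_urlsafe(16)
        self._store[token] = value
        self.audit_log.append(("tokenize", token))
        return token

    def detokenize(self, token):
        """Look up the original value; raises KeyError for unknown tokens."""
        self.audit_log.append(("detokenize", token))
        return self._store[token]


vault = TokenVault()
token = vault.tokenize("jane.doe@example.com")
message = {"customer_email": token}   # the token travels in the message, not the PII
print(message, vault.detokenize(token))
```

Because the token is random rather than derived from the value, systems that only relay the message learn nothing about the underlying PII.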
10

Shlukování slov podle významu / Word Sense Clustering

Jadrníček, Zbyněk January 2015
This thesis focuses on the problem of semantic similarity of words in the English language. The reader is first introduced to the theory of word sense clustering, followed by a description of selected methods and tools related to the topic. In the practical part, we design and implement a system for determining semantic similarity using the Word2Vec tool, focusing in particular on biomedical texts from the MEDLINE database. At the end of the thesis, we discuss the results achieved and suggest some ideas for improving the system.
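Word embeddings such as those produced by Word2Vec are commonly compared with cosine similarity; whether the thesis uses exactly this measure is an assumption, since the abstract does not say. The sketch below computes it in plain Python. The three-dimensional vectors are invented for illustration and are not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors; real Word2Vec embeddings typically have hundreds of dimensions.
vectors = {
    "aspirin": [0.9, 0.1, 0.3],
    "ibuprofen": [0.8, 0.2, 0.4],
    "kidney": [0.1, 0.9, 0.2],
}

print(cosine_similarity(vectors["aspirin"], vectors["ibuprofen"]))
print(cosine_similarity(vectors["aspirin"], vectors["kidney"]))
```

Words whose vectors point in similar directions score close to 1, which is the property a clustering step can then exploit to group related word senses.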
