Global ETD Search

611	Multi-Channel Sentiment Analysis in Swedish as Basis for Marketing Decisions
Uhlander, Malin. January 2023.
In today's world, it is not enough for companies to consider any one social media channel in isolation. Instead, they must provide their customers with a unified experience across channels and consider the interdependencies between channels. Most marketing research that examines user-generated content focuses on a single channel and is limited to the English language. This thesis analyses Swedish-language content collected from eight social media platforms: Facebook, YouTube, Instagram, TikTok, Twitter, Tripadvisor, Trustpilot, and Google Reviews. The platforms were compared pairwise by the prevalence of positive, negative, and neutral sentiment in comments and reviews about the theme park Liseberg. Sentiment was predicted with a lexicon-based approach in which each word in a wordlist was assigned a weight denoting the positive or negative sentiment associated with it. The study found statistically significant differences in the positivity, negativity, and neutrality expressed by users across the social media channels. There was no difference in sentiment between YouTube and Instagram comments, but every other pair of platforms differed in at least one of the three sentiment categories. Understanding attitudes towards the brand in different channels can help marketers determine their optimal mix of social media channels. The results are also relevant to researchers, who should take the differences between social media platforms into account when designing studies of user-generated content.
Keywords: omni-channel marketing; sentiment analysis; Swedish; natural language processing; opinion mining; Computer Sciences
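The lexicon-based scoring described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: the six-word `LEXICON` and its weights, the `\w+` tokenizer, and the rule that a summed score of exactly zero counts as neutral are all invented for the example.

```python
from __future__ import annotations

import re

# Hypothetical miniature wordlist (not the thesis's): word -> weight, positive > 0, negative < 0.
LEXICON: dict[str, float] = {
    "fantastisk": 2.0,  # fantastic
    "rolig": 1.0,       # fun
    "bra": 1.0,         # good
    "dyr": -0.5,        # expensive
    "tråkig": -1.0,     # boring
    "dålig": -1.0,      # bad
}


def sentiment_score(text: str, lexicon: dict[str, float] = LEXICON) -> float:
    """Sum the weights of every lexicon word that occurs in the text."""
    # \w is Unicode-aware for str patterns, so å, ä and ö stay inside words.
    tokens = re.findall(r"\w+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


def classify(text: str) -> str:
    """Map the summed score to one of the three sentiment categories (assumed zero threshold)."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Fantastisk dag på Liseberg, men dyr entré."))  # positive (2.0 - 0.5 = 1.5)
print(classify("Tråkig och dålig kö"))  # negative
```

A realistic wordlist would hold thousands of weighted entries, and the tokenizer would need to deal with Swedish compounds, negation and emoji, all of which this sketch ignores.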
612	The DVL in the Details: Assessing Differences in Decoy, Victim, and Law Enforcement Chats with Online Sexual Predators
Tatiana Renae Ringenberg (11203656). 29 July 2021.
Online sexual solicitors are individuals who deceptively earn the trust of minors online with the goal of eventual sexual gratification. Despite the prevalence of online solicitation, conversations in this domain are difficult to acquire because of the sensitive nature of the data. As a result, researchers studying online solicitors often study conversations between solicitors and decoys, which are publicly available online. However, researchers have begun to believe that such conversations are not representative of solicitor-victim conversations. Decoys and law enforcement officers are restricted in that they cannot initiate contact, suggest meeting, or begin sexual conversations with an offender. Additionally, decoys and law enforcement officers both aim to gather evidence, which means they often respond positively in contexts that would normally be considered awkward or inappropriate. Multiple researchers have suggested that differences may exist between offender-victim and offender-decoy conversations, yet little research has sought to identify the differences and similarities among those who talk to solicitors. In this study, the author identifies differences between decoys, officers, and victims within grooming, the manipulative process online solicitors use to entrap victims. The author examines differences that occur within grooming stages and within the strategies used in those stages. The research has implications for the data choices of future researchers in this domain. It may also be used to inform the training of officers who will engage in online sex stings.
Keywords: Information Systems; Natural Language Processing; Computer System Security; Online grooming; Child exploitation; Annotation
613	Incorporating spatial relationship information in signal-to-text processing
Davis, Jeremy Elon. 13 May 2022.
This dissertation outlines the development of a signal-to-text system that incorporates spatial relationship information to generate scene descriptions. Existing signal-to-text systems generate accurate descriptions of the information contained in an image; however, to date, no signal-to-text system incorporates spatial relationship information. A survey of related work in object detection, signal-to-text, and spatial relationships in images is presented first. Three methodologies, each followed by an evaluation, were carried out to create the signal-to-text system: 1) generation of object localization results from a set of input images, 2) derivation of Level One Summaries from an input image, and 3) inference of Level Two Summaries from the derived Level One Summaries. Validation processes are described for the second and third evaluations, as the first evaluation was previously validated in the original related works. The goal of this research is to show that a signal-to-text system that incorporates spatial information produces more informative descriptions of image content. A further goal is to demonstrate that the system can be readily applied to data sets other than those used to train it and achieve results similar to those on the training sets. To that end, a validation study was conducted and is presented.
Keywords: Object detection; Signal-to-text; Scene understanding; Spatial reasoning; Natural language processing; Artificial Intelligence and Robotics
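As a rough illustration of what "spatial relationship information" can add to an object detector's output, the sketch below turns pairs of bounding boxes into simple relational sentences. It is a hypothetical example, not the dissertation's Level One or Level Two Summary method: the `Detection` class, the rule of comparing box centers along the dominant axis, and the sentence template are assumptions, and image coordinates are assumed to have y increasing downward.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: class label plus box corners in pixels."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def spatial_relation(a: Detection, b: Detection) -> str:
    """Coarse relation of a relative to b, decided by the dominant axis between box centers."""
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "to the left of" if dx < 0 else "to the right of"
    return "above" if dy < 0 else "below"  # y grows downward in image coordinates


def describe(detections: list[Detection]) -> list[str]:
    """One templated sentence (an assumed format) per unordered pair of detections."""
    sentences = []
    for i, a in enumerate(detections):
        for b in detections[i + 1:]:
            sentences.append(f"The {a.label} is {spatial_relation(a, b)} the {b.label}.")
    return sentences


scene = [
    Detection("person", 10, 50, 30, 100),
    Detection("car", 100, 60, 200, 100),
]
print(describe(scene))  # ['The person is to the left of the car.']
```

Real systems typically add overlap- and distance-based relations (near, on, inside); the four directional relations here only show the general idea.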
614	Sentiment Analysis for E-book Reviews on Amazon to Determine E-book Impact Rank
Alsehaimi, Afnan Abdulrahman A. 18 May 2021.
No description available.
Keywords: Computer Science; Sentiment Analysis; NLP; PTA; Natural Language Processing; Ranking System
615	Generating Paraphrases with Greater Variation Using Syntactic Phrases
Madsen, Rebecca Diane. 01 December 2006.
Given a sentence, a paraphrase generation system produces a sentence that says the same thing, usually in a different way. Paraphrase generation can be formulated within the machine translation (MT) paradigm: instead of translating English into a foreign language, the system translates an English sentence (for example) into another English sentence. Quirk et al. (2004) used this approach to generate paraphrases, almost 90% of which were acceptable. However, most of their output sentences varied little from the input sentence. Leveraging syntactic information, this thesis presents an approach that generates more varied paraphrases than that of Quirk et al. while maintaining the proportion of acceptable paraphrases. The ParaMeTer system (Paraphrasing by MT) identifies syntactic chunks in paraphrase sentences and substitutes labels for those chunks. This enables the system to generalize over more syntactically plausible reorderings, since syntactic chunks generally capture groups of words that can change position in the sentence without loss of grammaticality. ParaMeTer then uses statistical phrase-based MT techniques to learn alignments for words and chunk labels alike. The baseline followed the same design as the Quirk et al. system: a statistical phrase-based MT system. Human judgments showed that the syntactic approach and the baseline achieve approximately the same ratio of fluent, acceptable paraphrases to fluent sentences. These judgments also showed that ParaMeTer produces more phrase rearrangement than the baseline. Although the baseline shows more within-phrase alteration, future modifications such as a chunk-only translation model should improve ParaMeTer's within-phrase variation as well.
Keywords: paraphrase generation; paraphrase; sentential paraphrase; syntax; statistical machine translation; machine translation; natural language processing; Computer Sciences
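The chunk-label substitution step can be pictured as follows. This is a hypothetical sketch, not ParaMeTer's code: it assumes chunk spans are already available as non-overlapping `(start, end, label)` token ranges (end exclusive) from some chunker, and the `NP1`/`NP2` tag format and the function names `abstract_chunks` and `restore_chunks` are invented for illustration.

```python
from __future__ import annotations


def abstract_chunks(
    tokens: list[str], chunks: list[tuple[int, int, str]]
) -> tuple[list[str], dict[str, list[str]]]:
    """Replace each (start, end, label) chunk (end exclusive) with an indexed label.

    Hypothetical helper; assumes the chunks do not overlap. Returns the abstracted
    tokens and a mapping from each indexed label (e.g. "NP1") to the words it replaced.
    """
    out: list[str] = []
    mapping: dict[str, list[str]] = {}
    counts: dict[str, int] = {}
    pos = 0
    for start, end, label in sorted(chunks):
        out.extend(tokens[pos:start])  # words between chunks are kept as they are
        counts[label] = counts.get(label, 0) + 1
        tag = f"{label}{counts[label]}"
        out.append(tag)
        mapping[tag] = tokens[start:end]
        pos = end
    out.extend(tokens[pos:])
    return out, mapping


def restore_chunks(tokens: list[str], mapping: dict[str, list[str]]) -> list[str]:
    """Expand indexed labels back into their original words (hypothetical helper)."""
    restored: list[str] = []
    for token in tokens:
        restored.extend(mapping.get(token, [token]))
    return restored


words = "the old man quickly crossed the busy street".split()
abstract, mapping = abstract_chunks(words, [(0, 3, "NP"), (5, 8, "NP")])
print(abstract)  # ['NP1', 'quickly', 'crossed', 'NP2']
print(" ".join(restore_chunks(["NP2", "was", "quickly", "crossed", "by", "NP1"], mapping)))
# the busy street was quickly crossed by the old man
```

Because a translation model working over such output sees `NP1` and `NP2` as single units, a reordering like `NP2 was quickly crossed by NP1` moves whole phrases at once, which is the kind of syntactically plausible movement the abstract describes.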
616	Extending the Information Partition Function: Modeling Interaction Effects in Highly Multivariate, Discrete Data
Cannon, Paul C. 28 December 2007.
Because of the huge amounts of data made available by the technology boom of the late twentieth century, new methods are required to turn data into usable information. Much of this data is categorical, which makes estimation difficult in highly multivariate settings. In this thesis we review various multivariate statistical methods, discuss statistical methods used in natural language processing (NLP), and examine a general class of models, described by Erosheva (2002), called generalized mixed membership models. We then propose extensions of the information partition function (IPF) derived by Engler (2002), Oliphant (2003), and Tolley (2006) that allow discrete, highly multivariate data to be modeled in linear models. We report results of the modified IPF model on the World Health Organization's Survey on Global Aging (SAGE).
Keywords: Information Partition Function; interaction effects; multivariate analysis; discrete data; Natural Language Processing; Statistics and Probability
617	Traumatic Brain Injury Surveillance and Research with Electronic Health Records: Building New Capacities
McFarlane, Timothy D. 03 1900. Indiana University-Purdue University Indianapolis (IUPUI).
Between 3.2 and 5.3 million U.S. civilians live with traumatic brain injury (TBI)-related disabilities. Although the post-acute phase of TBI has been recognized as both a discrete disease process and a risk factor for chronic conditions, TBI is not recognized as a chronic disease. TBI epidemiology relies on administrative datasets that are untimely, incomplete, and cross-sectional. Electronic health records (EHRs) may supplement these traditional datasets for public health surveillance and research.
Methods: Indiana constructed a statewide clinical TBI registry from longitudinal EHRs (2004-2018). This dissertation includes three distinct studies to enhance, evaluate, and apply the registry: 1) development and evaluation of a natural language processing algorithm to identify TBI severity in free-text notes; 2) evaluation and comparison of the performance of the ICD-9-CM and ICD-10-CM surveillance definitions; and 3) estimation of the effect of mild TBI (mTBI) on the risk of post-acute chronic conditions, compared with individuals without mTBI.
Results: Automated extraction of Glasgow Coma Scale (GCS) scores from clinical notes was feasible, with F-scores (which balance recall and precision) of 99.8% for mild, 100% for moderate, and 99.9% for severe TBI. We observed poor sensitivity for ICD-10-CM TBI surveillance compared with ICD-9-CM (0.212 and 0.601, respectively), potentially resulting in 5-fold underreporting. ICD-10-CM was not statistically equivalent to ICD-9-CM for sensitivity (estimated difference 0.389, 95% CI [0.388, 0.405]) or positive predictive value (estimated difference -0.353, 95% CI [-0.362, -0.344]). Compared with a matched cohort, individuals with mTBI were more likely to be diagnosed with mental health, substance use, neurological, cardiovascular, and endocrine conditions.
Conclusion: The ICD-9-CM and ICD-10-CM surveillance definitions were not equivalent, and the transition resulted in underreporting of mTBI incidence. This has direct implications for existing and future TBI registries and for the Report to Congress on Traumatic Brain Injury in the United States. Supplementing state-based trauma registries with structured and unstructured EHR data is effective for studying TBI outcomes. Our findings support the classification of TBI as a chronic disease by funding bodies, which may increase public funding to replace legacy systems and improve the standardization, timeliness, and completeness of data on the epidemiology and post-acute outcomes of TBI.
Keywords: Chronic disease; Epidemiology; Natural language processing; Public health informatics; Public health surveillance; Traumatic brain injury
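A rule-based version of the GCS extraction step might look like the sketch below. It is illustrative only, since the abstract does not describe the dissertation's NLP algorithm in enough detail to reproduce it. The regular expression `GCS_PATTERN`, the choice to keep the lowest valid score in a note, and the function names are assumptions; the severity bands (13-15 mild, 9-12 moderate, 3-8 severe) are the conventional GCS categories.

```python
from __future__ import annotations

import re

# Hypothetical pattern for mentions such as "GCS 14", "GCS: 9", "GCS of 7",
# or "Glasgow Coma Scale score was 15"; real notes vary far more than this.
GCS_PATTERN = re.compile(
    r"\b(?:GCS|Glasgow\s+Coma\s+Scale)(?:\s+score)?\s*(?:of|is|was|=|:)?\s*(\d{1,2})\b",
    re.IGNORECASE,
)


def gcs_severity(score: int) -> str:
    """Conventional GCS bands: 13-15 mild, 9-12 moderate, 3-8 severe."""
    if not 3 <= score <= 15:
        raise ValueError(f"GCS total must be between 3 and 15, got {score}")
    if score >= 13:
        return "mild"
    if score >= 9:
        return "moderate"
    return "severe"


def extract_severity(note: str) -> str | None:
    """Severity implied by the lowest valid GCS total in a note (an assumed rule), or None."""
    scores = [int(s) for s in GCS_PATTERN.findall(note)]
    valid = [s for s in scores if 3 <= s <= 15]  # discard out-of-range matches
    return gcs_severity(min(valid)) if valid else None


print(extract_severity("Pt alert, GCS of 14 at triage."))  # mild
print(extract_severity("Glasgow Coma Scale score was 7; GCS 10 after intubation."))  # severe
```

Taking the lowest score is only one possible policy; an extractor could instead prefer the first score recorded after injury, which matters when a patient's GCS changes over time.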
618	Generation of Control Logic from Ordinary Speech
Haghjo, Hamed; Vahlberg, Elias. January 2022.
Automatic code generation is evolving remarkably fast, with companies and researchers competing to reach human-level accuracy and capability. Advances in this field focus primarily on machine learning models for end-to-end code generation. This project introduces CodeFromVoice, a system that explores an alternative approach: combining existing natural language processing (NLP) models with traditional parsing methods. CodeFromVoice shows that this approach can generate code from text, or from speech transcribed with automatic speech recognition (ASR). The generated code is limited in complexity and restricted to the context of an existing application, but achieves a word error rate (WER) below 25%.
Keywords: Code generation; generation of code; generation of control logic; natural language processing; Engineering and Technology
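Word error rate is conventionally computed as the word-level edit distance between a reference and a hypothesis, divided by the number of reference words: WER = (S + D + I) / N, for S substitutions, D deletions and I insertions. The minimal sketch below follows that standard definition; how the thesis tokenizes its generated code and aligns it with references is not stated, so whitespace tokenization and the function name `word_error_rate` are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()  # assumed whitespace tokenization
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("turn on the red lamp", "turn of the lamp"))  # 0.4 (1 substitution + 1 deletion, 5 words)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.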
619	Topic discovery and document similarity via pre-trained word embeddings
Chen, Simin. January 2018.
Throughout history, humans have generated an ever-growing volume of documents on a wide range of topics, and we now rely on computer programs to process these vast collections automatically in various applications. Many applications require a quantitative measure of document similarity. Traditional methods first learn a vector representation for each document from a large corpus and then use the distance between two document vectors as their similarity. In contrast to this corpus-based approach, we propose a straightforward model that directly discovers the topics of a document by clustering its words, without the need for a corpus. We define a vector representation called normalized bag-of-topic-embeddings (nBTE) to encapsulate the discovered topics, and compute the soft cosine similarity between two nBTE vectors as the document similarity. In addition, we propose a logistic word importance function that assigns words different importance weights based on their relative discriminating power. Our model is efficient in terms of average time complexity. The nBTE representation is also interpretable, as it supports topic discovery within each document. On three labeled public data sets, our model achieved k-nearest neighbor classification accuracy comparable to that of five state-of-the-art baseline models. Furthermore, from these three data sets we derived four multi-topic data sets in which each label refers to a set of topics. On these four challenging multi-topic data sets, our model consistently outperforms the state-of-the-art baselines by a large margin. Together, this work answers the research question of the thesis: can we construct an interpretable document representation by clustering the words in a document, and estimate document similarity effectively and efficiently?
Keywords: Computer and Information Sciences
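Soft cosine similarity generalizes ordinary cosine similarity with a feature-similarity matrix S: sim(a, b) = aᵀSb / (√(aᵀSa) · √(bᵀSb)). The sketch below applies it under assumptions that go beyond the abstract: each document is represented by a few topic vectors (for example, centroids of its clustered word embeddings) with non-negative weights, and S holds the plain cosine between topic vectors. The function names and this exact weighting are hypothetical, not the thesis's nBTE definition.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    """Ordinary cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def weighted_similarity(
    topics_x: list[list[float]], weights_x: list[float],
    topics_y: list[list[float]], weights_y: list[float],
) -> float:
    """x^T S y, where S[i][j] is the (assumed) cosine between topic i of x and topic j of y."""
    return sum(
        wx * wy * cosine(tx, ty)
        for tx, wx in zip(topics_x, weights_x)
        for ty, wy in zip(topics_y, weights_y)
    )


def soft_cosine(
    topics_a: list[list[float]], weights_a: list[float],
    topics_b: list[list[float]], weights_b: list[float],
) -> float:
    """Soft cosine similarity between two weighted sets of topic vectors (hypothetical helper)."""
    num = weighted_similarity(topics_a, weights_a, topics_b, weights_b)
    norm_a = math.sqrt(weighted_similarity(topics_a, weights_a, topics_a, weights_a))
    norm_b = math.sqrt(weighted_similarity(topics_b, weights_b, topics_b, weights_b))
    return num / (norm_a * norm_b)


# Toy 2-D "embeddings": document A has two equally weighted topics, document B shares one of them.
doc_a = ([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
doc_b = ([[1.0, 0.0]], [1.0])
print(round(soft_cosine(*doc_a, *doc_b), 4))  # 0.7071
```

Document A shares one of its two equally weighted topics with document B, so the score lands at 1/√2 rather than 0 or 1; with a single topic per document the measure reduces to the plain cosine between the two topic vectors.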
620	Automatic Reference Resolution for Pedestrian Wayfinding Systems
Kalpakchi, Dmytro. January 2018.
Imagine that you are in a new city and want to explore it. Navigating with maps leads to unnecessary confusion about street names and keeps you from enjoying a wonderful walk. A dialogue system that could guide you through a simple conversation, using salient landmarks in your immediate vicinity, would be much more helpful! Developing such a dialogue system is non-trivial and requires solving many complicated tasks. One such task, tackled in this thesis, is reference resolution (RR): resolving utterances to the geographical entities they refer to, called referents (if any). Utterances that have one or more referents are called referring expressions (REs). The RR task is decomposed into two subtasks: RE identification and resolution proper. Neural network models for both subtasks have been designed and extensively evaluated. The model for RE identification, called RefNet, uses recurrent neural networks (RNNs) to handle sequential input, i.e. phrases. For each word in an utterance, RefNet outputs a label indicating whether the word is at the beginning of an RE, inside one, or outside. The reference resolution model, called SpaceRefNet, uses RefNet's RNN layer to encode REs and a purpose-built feature extractor to represent geographical objects. Both encodings are fed to a simple feed-forward network with a softmax prediction layer, which yields the probability that the RE matches the geographical object. Both models outperformed their respective baselines and show promising results overall.
Keywords: Reference Resolution; Pedestrian Wayfinding Systems; Natural Language Processing; Computer Sciences
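RefNet's per-word output (beginning of an RE, inside it, or outside) corresponds to the widely used BIO tagging scheme. The sketch below shows only the conversion between RE spans and BIO labels, not the RNN itself; the function names, the `B`/`I`/`O` strings, and the end-exclusive span convention are assumptions for illustration.

```python
from __future__ import annotations


def spans_to_bio(num_tokens: int, spans: list[tuple[int, int]]) -> list[str]:
    """Label tokens B (first word of an RE), I (inside an RE) or O (outside); spans are end-exclusive."""
    labels = ["O"] * num_tokens
    for start, end in spans:
        labels[start] = "B"
        for k in range(start + 1, end):
            labels[k] = "I"
    return labels


def bio_to_spans(labels: list[str]) -> list[tuple[int, int]]:
    """Recover RE spans from per-word labels; a stray I with no open span starts a new one."""
    spans: list[tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # a new B closes the previous span
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


tokens = "turn left at the red church after the bridge".split()
labels = spans_to_bio(len(tokens), [(3, 6), (7, 9)])
print(list(zip(tokens, labels)))
print(bio_to_spans(labels))  # [(3, 6), (7, 9)]
```

Decoding model output back into spans in this way is what would hand each identified RE on to the resolution step.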