331

Accurately extracting information from a finite set of different report categories and formats

Holmbäck, Jonatan January 2023 (has links)
POC Sports (hereafter simply POC) is a company that manufactures gear and accessories for winter sports as well as cycling. Their mission is to “Protect lives and reduce the consequences of accidents for athletes and anyone inspired to be one”. To do so, a lot of care must be put into making their equipment as protective as possible while still maintaining the desired functionality. To aid in this, their vendor companies run standardized tests to evaluate their products, and the results of these tests are then compiled into a report for POC. The problem is that the different companies use different styles and formats to convey this information, which can be classified into distinct categories. This project therefore aimed to provide a tool that POC can use to identify a report’s category and then accurately extract the relevant data from it. An accuracy score was used as the metric to evaluate the tool with respect to extracting the relevant data. The development and evaluation of the tool were performed in two evaluation rounds. Additional metrics were used to evaluate a number of existing tools, including whether they were open source, how easy they are to set up, their pricing, and how much of the task they could cover. A proof-of-concept tool was realized and demonstrated an accuracy of 97%, which was considered adequate compared to the minimum required accuracy of 95%. However, due to the available time and resources, the sample size was limited, and thus this accuracy may not apply to the entire population with a confidence level higher than 75%. The results of evaluating the iterative improvements in the tool suggest that it is possible, by addressing issues as they are found, to achieve an acceptable score for a large fraction of the general population.
Additionally, it would be beneficial to keep a catalog of the recurring solutions that have been developed for different problems, so they can be reused for similar problems, allowing for better extensibility and generalizability. To build on the work performed in this thesis, the next steps might be to look into similar problems for other formats and to examine how different PDF generators may affect the ability to extract and process data present in PDF reports.
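As a rough illustration of the category-then-extract approach the abstract describes, the following sketch classifies a report by keyword evidence and then applies per-category extraction patterns, scoring accuracy against expected results. The categories, keywords, and field patterns are invented for illustration and are not POC's actual report formats:

```python
import re

# Hypothetical report categories, identified by keyword evidence.
CATEGORY_KEYWORDS = {
    "impact_test": ["impact", "acceleration", "g-force"],
    "penetration_test": ["penetration", "striker"],
}

# Per-category extraction patterns (illustrative field names).
FIELD_PATTERNS = {
    "impact_test": {"peak_g": re.compile(r"peak\s+acceleration:\s*([\d.]+)")},
    "penetration_test": {"depth_mm": re.compile(r"depth:\s*([\d.]+)\s*mm")},
}

def identify_category(text: str) -> str:
    """Pick the category whose keywords appear most often in the report."""
    scores = {cat: sum(kw in text.lower() for kw in kws)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    return max(scores, key=scores.get)

def extract_fields(text: str) -> dict:
    """Identify the category, then extract its fields with regex patterns."""
    cat = identify_category(text)
    out = {"category": cat}
    for name, pattern in FIELD_PATTERNS[cat].items():
        m = pattern.search(text.lower())
        if m:
            out[name] = float(m.group(1))
    return out

def accuracy(extracted: list[dict], expected: list[dict]) -> float:
    """Fraction of reports whose extracted fields exactly match expectations."""
    correct = sum(e == x for e, x in zip(extracted, expected))
    return correct / len(expected)
```

In practice, each vendor format would contribute its own keyword list and pattern set, and the accuracy score would be computed over a labeled sample of reports.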
332

Comparison of Touchscreen and Physical Keyboard with Nomadic Text Entry

Ross, Michael Tyler 07 May 2016 (has links)
Many research projects have compared standing text entry with nomadic text entry, and others have compared touchscreen and physical keyboard input while texting. Little literature, however, compares the two input types across both standing and nomadic text entry. This research investigated the differences in error rate and characters per minute for both input types under both conditions. Two devices were used, the iPhone 4 and the Blackberry Curve 9350, to type a phrase while standing and while walking. Both characters per minute and error rate were analyzed. The investigation showed no significant difference in error rate, but a significant difference in characters per minute. The touchscreen keyboard performed better in terms of characters per minute and arguably performed better in accuracy.
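The two metrics analyzed in this study can be computed as in the generic sketch below (characters per minute from typing time, and a character-level error rate via edit distance); this is not necessarily the exact formulation used in the thesis:

```python
def characters_per_minute(typed: str, seconds: float) -> float:
    """Typing speed: characters entered per minute of typing time."""
    return len(typed) / (seconds / 60.0)

def error_rate(typed: str, target: str) -> float:
    """Character-level error rate: edit distance to the target phrase,
    normalized by the target length (rolling-row Levenshtein)."""
    m, n = len(typed), len(target)
    dp = list(range(n + 1))  # distances for the previous row
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (typed[i - 1] != target[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(n, 1)
```

For example, a participant who typed an 11-character phrase in 30 seconds would score 22 characters per minute, regardless of which keyboard produced the input.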
333

Qualitative differences in levels of performance on a computer text-editing task

De Laurentiis, Emiliano C. January 1981 (has links)
334

Irony, Ideology, and Resistance: The Amazing Double Life of Harlequin Presents

Downey, Kristin 07 1900 (has links)
In Harlequin Presents, the recurrence of particular moments of resistance suggests that these structures of events have meaning apart from (and perhaps even antithetical to) the ideological outcome of any specific text. The ideological structures presented in the romance novel are not passively accepted, nor do they simply fulfill a single pre-existing need or desire in their reading public. Romance novels utilize ideologies in a self-consciously playful and ironic way. These texts offer multiple ways of understanding the worlds they depict, with the structures of understanding they contain being posited and discarded. This thesis proposes a means of interpreting the romance novel that captures the ambivalence of the reading experience. I will show how the paradox of the romance novel (the seemingly limitless potential of a feminine discourse of the private sphere set within the conservative confines of repetitive narratives of social integration) is incorporated into the structure of the texts themselves. While the texts in the series manifest standardized outcomes, they also exhibit recurrent patterns of resistance. The serial form of Harlequin Presents dictates that this tension between separate value sets and ideologies is never fully resolved. The appeal of the romance novel must then lie between these competing demands. Each of my chapters examines the ways that this ironic tension functions within a different intellectual space. In considering how the domestic sphere, the body, and the nation are overlaid with multiple, contested meanings, this thesis maps out the scope of ideological resistance and adherence throughout Harlequin Presents. / Thesis / Doctor of Philosophy (PhD)
335

Zero Syllable

Molestina, Camila 01 January 2010 (has links) (PDF)
The word, the corporeal word, the forgotten piece of word, the pause, the breath, the silence. Having said, heard, not at all. Through text, video, sound and installation, this thesis investigates the materiality of language, translation and memory in relationship to words as an attempt to reconfigure a certain inner space. This thesis is research on the theoretical context and cultural reference of the M.F.A. work. The play between the spoken words (multiple voices within one voice) and the image (body) echoes the geography of survival: moments translated to words, speech translated to body, to image, to void.
336

C-SALT: Conversational Style Attribution Given Legislative Transcriptions

Summers, Garrett D 01 June 2016 (has links) (PDF)
Common authorship attribution is well described by various authors, as summed up in Jacques Savoy’s work: authorship attribution is the process “whereby the author of a given text must be determined based on text samples written by known authors [48].” The field has been explored in various contexts, mostly on authors’ written texts. This work approaches a field similar to authorship attribution: we seek to attribute not a given author to a work based on style, but a style itself that is used by a group of people. Our work classifies an author into a category based on their spoken dialogue rather than text they have written down. Using this system, we differentiate California State Legislators from other entities in a hearing, working from audio transcripts of the hearing in question. As this is not authorship attribution, the work is better described as “Conversational Style Attribution”. Used as a tool in speaker identification classifiers, our approach increased the accuracy of audio recognition by 50.9% and of facial recognition by 51.6%. These results show that our research into Conversational Style Attribution provides a significant benefit to the speaker identification process.
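A toy sketch of the style-attribution idea: build a per-group function-word frequency profile from known utterances, then assign a new utterance to the most similar profile. The word list, group labels, and similarity measure here are illustrative assumptions, not the features or classifier actually used in C-SALT:

```python
from collections import Counter

# A small closed class of function words (illustrative subset).
FUNCTION_WORDS = {"the", "of", "and", "to", "i", "we", "you", "is", "that"}

def style_profile(utterances: list[str]) -> Counter:
    """Relative frequencies of function words across a group's utterances."""
    counts = Counter()
    total = 0
    for u in utterances:
        for w in u.lower().split():
            if w in FUNCTION_WORDS:
                counts[w] += 1
                total += 1
    return Counter({w: c / total for w, c in counts.items()}) if total else counts

def attribute(utterance: str, profiles: dict[str, Counter]) -> str:
    """Assign the utterance to the group whose profile overlaps it most."""
    target = style_profile([utterance])
    def similarity(p: Counter) -> float:
        # Histogram intersection over the function-word vocabulary.
        return sum(min(target[w], p[w]) for w in FUNCTION_WORDS)
    return max(profiles, key=lambda g: similarity(profiles[g]))
```

A real system would train the profiles on many transcript turns per group and feed the attribution score into the downstream speaker-identification classifier.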
337

WHISK: Web Hosted Information Into Summarized Knowledge

Wu, Jiewen 01 July 2016 (has links) (PDF)
Today’s online content increases at an alarming rate that exceeds users’ ability to consume it. Modern search techniques allow users to enter keyword queries to find content they wish to see. However, such techniques break down when users freely browse the internet without knowing exactly what they want. Users may have to invest an unnecessarily long time reading content to see if they are interested in it. Automatic text summarization helps relieve this problem by creating synopses that significantly reduce the text while preserving the key points. Steffen Lyngbaek created the SPORK summarization pipeline to solve the content overload in Reddit comment threads. Lyngbaek adapted the Opinosis graph model for extractive summarization and combined it with agglomerative hierarchical clustering and the Smith-Waterman algorithm to perform multi-document summarization on Reddit comments. This thesis presents WHISK as a pipeline for general multi-document text summarization based on SPORK. A generic data model in WHISK allows creating new drivers for different platforms to work with the pipeline. In addition to the existing Opinosis graph model adapted in SPORK, WHISK introduces two simplified graph models for the pipeline. The simplified models remove unnecessary restrictions inherited from the Opinosis graph’s abstractive summarization origins. Performance measurements and a study with Digital Democracy compare the two new graph models against the Opinosis graph model. Additionally, the study evaluates WHISK’s ability to generate pull quotes from political discussions as summaries.
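The generic data model mentioned above, in which platform-specific drivers normalize their content into a shared document shape that the pipeline consumes, might look like the minimal sketch below; the class and method names are assumptions for illustration, not WHISK's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Platform-agnostic unit of text the summarization pipeline consumes."""
    source: str
    texts: list[str] = field(default_factory=list)

class RedditDriver:
    """Normalizes raw Reddit comment dicts into a Document."""
    def fetch(self, raw_comments: list[dict]) -> Document:
        return Document(source="reddit",
                        texts=[c["body"] for c in raw_comments])

class TranscriptDriver:
    """Normalizes plain transcript lines into a Document."""
    def fetch(self, raw_lines: list[str]) -> Document:
        return Document(source="transcript", texts=list(raw_lines))
```

The point of the abstraction is that the summarization stages only ever see `Document`, so supporting a new platform means writing one new driver rather than touching the pipeline.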
338

SPORK: A Summarization Pipeline for Online Repositories of Knowledge

Lyngbaek, Steffen Slyngbae 01 June 2013 (has links) (PDF)
The web 2.0 era has ushered in an unprecedented amount of interactivity on the Internet, resulting in a flood of user-generated content. This content is often unstructured and comes in the form of blog posts and comment discussions. Users can no longer keep up with the amount of content available, leading developers to rely on natural language techniques to help mitigate the problem. Although many natural language processing techniques have been employed for years, automatic text summarization in particular has recently gained traction. This research proposes a graph-based, extractive text summarization system called SPORK (Summarization Pipeline for Online Repositories of Knowledge). The goal of SPORK is to identify important key topics presented in multi-document texts, such as online comment threads. While most other automatic summarization systems simply focus on finding the top sentences represented in the text, SPORK separates the text into clusters and identifies the different topics and opinions presented in the text. SPORK has managed to identify 72% of key topics present in any discussion and up to 80% of key topics in a well-structured discussion.
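The cluster-then-represent idea described above can be sketched crudely as follows, using word-overlap (Jaccard) similarity as a stand-in for SPORK's graph-based agglomerative clustering; the threshold and the representative-selection rule are illustrative assumptions:

```python
def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two sentences' word sets."""
    return len(a & b) / len(a | b)

def cluster_sentences(sentences: list[str], threshold: float = 0.3):
    """Greedily group sentences whose word overlap with a cluster's
    first sentence exceeds the threshold."""
    clusters: list[list[str]] = []
    for s in sentences:
        words = set(s.lower().split())
        for c in clusters:
            if jaccard(words, set(c[0].lower().split())) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def key_topics(sentences: list[str]) -> list[str]:
    """One representative per cluster (here: the longest sentence)."""
    return [max(c, key=len) for c in cluster_sentences(sentences)]
```

Unlike a top-sentences-only summarizer, this yields one representative per topic cluster, so minority opinions in a thread are not drowned out by the dominant one.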
339

Intertextual Readings of the Nyāyabhūṣaṇa on Buddhist Anti-Realism

Neill, Tyler 13 December 2022 (has links)
This two-part dissertation has two goals: 1) a close philological reading of a 50-page section of a 10th-century Sanskrit philosophical work (Bhāsarvajña's Nyāyabhūṣaṇa), and 2) the creation and assessment of a novel intertextuality research system (Vātāyana) centered on the same work. The first half of the dissertation encompasses the philology project in four chapters: 1) background on the author, work, and key philosophical ideas in the passage; 2) descriptions of all known manuscript witnesses of this work and a new critical edition that substantially improves upon the editio princeps; 3) a word-for-word English translation richly annotated with both traditional explanatory material and novel digital links to not one but two interactive online research systems; and 4) a discussion of the Sanskrit author's dialectical strategy in the studied passage. The second half of the dissertation details the intertextuality research system in a further four chapters: 5) why it is needed and what can be learned from existing projects; 6) the creation of the system consisting of curated textual corpus, composite algorithm in natural language processing and information retrieval, and live web-app interface; 7) an evaluation of system performance measured against a small gold-standard dataset derived from traditional philological research; and 8) a discussion of the impact such new technology could have on humanistic research more broadly. System performance was assessed to be quite good, with a 'recall@5' of 80%, meaning that most previously known cases of mid-length quotation and even paraphrase could be automatically found and returned within the system's top five hits. Moreover, the system was also found to return a 34% surplus of additional significant parallels not found in the small benchmark. 
This assessment confirms that Vātāyana can be useful to researchers by aiding them in the collection and organization of intertextual observations, leaving them more time to focus on interpretation. Seventeen appendices illustrate both these efforts and a number of side projects, the latter of which span translation alignment, network visualization of an important database of South Asian prosopography (PANDiT), and a multi-functional Sanskrit text-processing web application (Skrutable).
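The 'recall@5' figure reported for Vātāyana can be computed as in the sketch below: the fraction of known parallels for which the system returns the true source among its top k hits. The data shapes and names are invented for illustration:

```python
def recall_at_k(results: dict[str, list[str]],
                gold: dict[str, str], k: int = 5) -> float:
    """Fraction of gold-standard queries whose known source passage
    appears within the system's top-k returned hits."""
    hits = sum(gold[q] in results.get(q, [])[:k] for q in gold)
    return hits / len(gold)
```

A recall@5 of 80% thus means that for four out of five known quotations or paraphrases in the benchmark, the correct source was among the first five candidates the researcher saw.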
340

Some Aspects of Text-To-Speech Conversion by Rules

Ramasubramanian, Narayana 09 1900 (has links)
A critical survey of the important features and characteristics of some existing Text-to-Speech Conversion (TSC) systems by rules is given. The necessary algorithms, not available for these systems in the literature, have been formulated, providing the basic philosophies underlying these systems. A new algorithm, TESCON, for a TSC system by rules is developed without implementation details. TESCON is primarily concerned with the preprocessing and linguistic analysis of an input text in English orthography. For the first time, function-content word concepts are fully utilized to identify the potential head-words in phrases. Stress, duration modification and pause insertions are suggested as part of the rule schemes. TESCON is general in nature and is fully compatible with a true TSC system. / Thesis / Master of Science (MSc)
