171 |
All Purpose Textual Data Information Extraction, Visualization and Querying
January 2018
abstract: Since the advent of the internet, and even more so since the rise of social media platforms, the explosive growth and availability of textual data have made analysis a tedious task. Information extraction systems are available but are generally too specific, often extracting only the kinds of information they deem necessary and worth extracting. With data visualization theory and fast, interactive querying methods, leaving out information might not be necessary. This thesis explores textual data visualization techniques, intuitive querying, and a novel approach to all-purpose textual information extraction that encodes a large text corpus to improve human understanding of the information present in textual data.
This thesis presents a modified traversal algorithm over the dependency parse output of text that extracts all subject-predicate-object triples while ensuring that no information is missed. To support full-scale, all-purpose information extraction from large text corpora, a data preprocessing pipeline is recommended before the extraction is run. The output format is designed specifically to fit a node-edge-node model and to form the building blocks of a network, which makes understanding the text and querying information from the corpus quick and intuitive. It attempts to reduce reading time and enhance understanding of the text using an interactive graph and timeline. / Dissertation/Thesis / Masters Thesis Software Engineering 2018
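The node-edge-node model mentioned in this abstract can be pictured with a minimal sketch. The triples below are hypothetical and hand-written; the thesis obtains them from dependency parse output, which is not reproduced here. The function names `build_graph` and `query` are illustrative assumptions, not the author's API.

```python
from collections import defaultdict

# Hypothetical subject-predicate-object triples (illustrative only);
# a real system would derive these from a dependency parser.
triples = [
    ("thesis", "presents", "traversal algorithm"),
    ("traversal algorithm", "extracts", "triples"),
    ("thesis", "recommends", "preprocessing pipeline"),
]

def build_graph(triples):
    """Map each subject node to a list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def query(graph, subject):
    """Return all (predicate, object) edges leaving a subject node."""
    return graph.get(subject, [])

graph = build_graph(triples)
```

Storing each triple as an edge labelled with its predicate means a query such as `query(graph, "thesis")` is a single dictionary lookup, which is one plausible reason such a layout would feel quick to explore interactively.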
|
172 |
Aspectos semânticos na representação de textos para classificação automática / Semantic aspects in the representation of texts for automatic classification
Roberta Akemi Sinoara 24 May 2018
Text Mining applications are numerous and varied, since a huge amount of diverse textual data is created daily. The quality of the final solution of a Text Mining process depends, among other factors, on the adopted text representation model. Although syntactic and semantic relations influence the meaning of natural language, traditional text representation models are limited to words. Such models cannot differentiate documents that use the same vocabulary but present different views on the same subject. This work was motivated by the diversity of text classification applications, the potential of vector space model representations, and the gap in the treatment of the semantics inherent in natural language data. With the general aim of advancing Text Mining research on incorporating semantic aspects into the representation of document collections, we first conducted a systematic mapping study of the literature and categorized classification problems according to their semantic complexity.
Then, we approached semantic aspects of texts through the proposal, development, and evaluation of seven text representation models: (i) gBoED, which incorporates text semantics by the use of domain expressions; (ii) Uni-based, which takes advantage of word sense disambiguation and hypernym relations; (iii) SR-based Terms and SR-based Sentences, which make use of semantic role labels; (iv) NASARIdocs, Babel2Vec and NASARI+Babel2Vec, which take advantage of word sense disambiguation and embeddings of words and senses. We analyzed representations of document collections generated with the proposed models and with others from the literature, and evaluated them in automatic text classification on datasets of different levels of semantic complexity. The proposed models gBoED, Uni-based, SR-based Terms and SR-based Sentences have more expressive features and allow a better interpretation of the document representation, while NASARIdocs, Babel2Vec and NASARI+Babel2Vec latently incorporate the semantics of embeddings generated from a large number of external documents. This property has a positive impact on text classification performance.
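The limitation of word-only representations noted in this abstract can be shown with a small sketch. The two sentences are invented for illustration, and `bag_of_words` is a hypothetical helper using plain whitespace tokenization; it is not one of the models proposed in the thesis.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, ignoring word order."""
    return Counter(text.lower().split())

# Two invented documents with the same vocabulary but opposite opinions.
doc_a = "the reviewer praised the film and criticized the book"
doc_b = "the reviewer criticized the film and praised the book"

# The texts differ, yet their bag-of-words representations are identical.
same_vector = bag_of_words(doc_a) == bag_of_words(doc_b)
```

Here `same_vector` is `True` even though the documents say different things, which is the kind of case that motivates adding semantic information to the representation.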
|
173 |
The Functions of Textbooks: A Textbook Analysis of Text Genres and their Representation
Magnusson, Jennie January 2021
The aim of this study is to investigate which text genres are included in three textbooks intended for the course English 5, and how they are represented. This is done in order to discuss the functions that the textbooks Blueprint A, Solid Gold and Engelska 5 & 6: Outlooks on can have in the English classroom. The investigation is conducted through a textbook analysis that includes both a quantitative content analysis, to create an overview of the text genres, and a close-reading text analysis that investigates the representation of the texts with the help of Bloom’s Taxonomy. The investigation finds that all three books contain several different text genres and that each book uses some text genres more frequently than the other books do. This means that the three books focus on different texts. Since the study also shows that the texts are represented in many different ways, it concludes that the books can have many different functions, specific to each book; it can therefore be important to evaluate textbooks before using them and to use more than one. However, all three books can help teachers introduce pupils to many different text genres and improve learning by including most of the levels of learning that Bloom identifies as important.
|
174 |
Svět kolem nás jako hyperlink / Local Environment as Hyperlink
Mešár, Marek January 2013
This document describes selected techniques and approaches to the problem of text detection, extraction and recognition on modern mobile devices. It also describes how the recognized text is presented in the user interface and converted into hyperlinks that serve as a source of information about the surrounding world. The paper outlines a text detection and recognition technique based on MSER detection and also describes the use of an image feature tracking method for text motion estimation.
|
175 |
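Vad var det fysioterapeuten sa? : En studie i hur text och bild i samverkan kan hjälpa patienter med vulvodyni att minnas information under och från ett vårdbesök [What did the physiotherapist say? A study of how text and image in combination can help patients with vulvodynia remember information during and from a care visit]
Ljungwald, Emma January 2024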
The problem for patients with vulvodynia is that today they receive only verbal information from the physiotherapist at the care clinic examined. As a result, they do not remember all the information given during the first care visit. The region lacks knowledge of how to design patient information. The purpose of this degree project is to contribute knowledge about how patient information in text and images can be designed to increase the chances that the patient will remember the treatment information they receive during the first care visit. To answer the research questions, I used the methods of text analysis, interview, and survey.
An information designer needs to use language that as many people as possible can understand. The results from the empirical investigations show that text and image in combination increase the chances that the patient will remember the information. The most important thing for patients to remember is the exercises, and there pictures can help create an understanding of how each exercise is done. Based on the results of the empirical investigations, previous research, and the theories of human-centered design, text and image in combination, and cognition, a design proposal in text and image has been developed to increase the chances that patients will remember the information they receive from the physiotherapist.
|
176 |
Aplikace textovo-optimalizačních technik a jejich vliv na adekvátní pochopení textu / Application of text-optimizing techniques and their influence on adequate text comprehension
Ištván, Marcel January 2017
This dissertation describes research aimed at finding text-optimization techniques that would increase comprehension and improve readers' attitudes towards the text. The aim is to evaluate the effectiveness of these techniques with a view to implementing them in the text-production process of state and commercial institutions. The theoretical basis for text comprehension is the construction of mental representations of the text. Many factors, internal and external, can influence the construction of a correct mental representation of the text. This thesis investigates the factors of text quality and attitudes towards the text. The effects of the same manipulations of Slovak and Dutch texts were observed in three language groups: Slovak native speakers, Dutch native speakers, and people who learned Dutch as a foreign language. Two texts were selected for this research, namely the instructions provided with the tax declaration form and an instruction manual provided with a digital camera. Parts of the texts were optimized and rewritten into two variants. The first variant is based on the principle of rewriting the text into a dialogue. The instructions for such a transformation can be summarized in three points. This approach should be usable for not...
|
177 |
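När staten inte granskar : En kvalitativ studie om presentationen av texttyper i ämnesplaner och läroböcker för svenska i gymnasiet [When the state does not review: A qualitative study of the presentation of text types in syllabi and textbooks for Swedish in upper secondary school]
Lingemyr, Jesper, Åberg, Joakim January 2018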
Teaching usually starts from a syllabus that contains information about the content a course should cover. Teachers often use textbooks to carry out their teaching. Between 1938 and 1991, textbooks were reviewed by the Swedish state before entering the market. Since 1991, textbooks are no longer reviewed by the state, and it is up to each teacher to decide whether a textbook suits their teaching. The purpose of this study is to examine how the presentation of text types in three series of printed textbooks corresponds with the syllabi for Svenska 1, Svenska 2 and Svenska 3.
The text types in focus are argumentative text, investigative text and academic text. The textbooks examined are the series Svenska helt enkelt and Svenska impulser as well as the textbook Människans texter. Språket. The method used is a qualitative content analysis that examines how the selected textbooks explicitly and implicitly present the content from the syllabi. The results show that the series Svenska helt enkelt covers all the selected content from the syllabi, while the series Svenska impulser and the textbook Människans texter. Språket do not cover all of it. The conclusion is that textbooks can be used as support for teaching a course but not as its sole basis, since there may be content from the syllabi that the textbook does not cover.
|
178 |
Review
Traoré, Flavia Aiello 30 March 2016
Review
|
179 |
Auf dem Weg zu einem TEI-Austauschformat für ägyptisch-koptische Texte [Towards a TEI exchange format for Egyptian-Coptic texts]
Gerhards, Simone, Schweitzer, Simon 20 April 2016
Several large Egyptological projects (TLA: http://aaew.bbaw.de/tla; Ramses: http://ramses.ulg.ac.be/; Rubensohn: http://elephantine.smb.museum/; Karnak: http://www.cfeetk.cnrs.fr/karnak/) are creating annotated corpora. For data exchange, a standardized exchange format based on TEI is urgently needed. These projects have therefore joined forces to develop a common proposal. In our talk we would like to present the current state of the discussion: What is the base text in the markup, the hieroglyphic annotation or the transliteration of the text? How should the different script formats be handled? Can the metadata in the header be standardized with the help of shared thesauri? What is annotated inline, and what is annotated stand-off?
|
180 |
Deep learning for text spotting
Jaderberg, Maxwell January 2015
This thesis addresses the problem of text spotting - being able to automatically detect and recognise text in natural images. Developing text spotting systems, systems capable of reading and therefore better interpreting the visual world, is a challenging but wildly useful task to solve. We approach this problem by drawing on the successful developments in machine learning, in particular deep learning and neural networks, to present advancements using these data-driven methods. Deep learning based models, consisting of millions of trainable parameters, require a lot of data to train effectively. To meet the requirements of these data hungry algorithms, we present two methods of automatically generating extra training data without any additional human interaction. The first crawls a photo sharing website and uses a weakly-supervised existing text spotting system to harvest new data. The second is a synthetic data generation engine, capable of generating unlimited amounts of realistic looking text images, that can be solely relied upon for training text recognition models. While we define these new datasets, all our methods are also evaluated on standard public benchmark datasets. We develop two approaches to text spotting: character-centric and word-centric. In the character-centric approach, multiple character classifier models are developed, reinforcing each other through a feature sharing framework. These character models are used to generate text saliency maps to drive detection, and convolved with detection regions to enable text recognition, producing an end-to-end system with state-of-the-art performance. For the second, higher-level, word-centric approach to text spotting, weak detection models are constructed to find potential instances of words in images, which are subsequently refined and adjusted with a classifier and deep coordinate regressor. 
A whole word image recognition model recognises words from a huge dictionary of 90k words using classification, resulting in previously unattainable levels of accuracy. The resulting end-to-end text spotting pipeline advances the state of the art significantly and is applied to large scale video search. While dictionary based text recognition is useful and powerful, the need for unconstrained text recognition still prevails. We develop a two-part model for text recognition, with the complementary parts combined in a graphical model and trained using a structured output learning framework adapted to deep learning. The trained recognition model is capable of accurately recognising unseen and completely random text. Finally, we make a general contribution to improve the efficiency of convolutional neural networks. Our low-rank approximation schemes can be utilised to greatly reduce the number of computations required for inference. These are applied to various existing models, resulting in real-world speedups with negligible loss in predictive power.
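The idea that a low-rank approximation reduces the computation of a convolutional filter can be sketched in its simplest form. The example below assumes a rank-1 (separable) 3x3 filter, which can be applied as a 3x1 pass followed by a 1x3 pass; whether this matches the exact schemes in the thesis is an assumption, and the helper `correlate2d_valid`, the filter values, and the test image are all illustrative.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation,
    the operation CNN layers usually call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A rank-1 3x3 filter is the outer product of a column and a row vector.
col = [1, 2, 1]
row = [1, 0, -1]
full_kernel = [[c * r for r in row] for c in col]

# Small deterministic integer image (illustrative data only).
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(6)]

# Direct 2D filtering: 9 multiplications per output pixel.
direct = correlate2d_valid(image, full_kernel)

# Separable filtering: a 3x1 pass then a 1x3 pass, 6 multiplications per pixel.
vertical = correlate2d_valid(image, [[c] for c in col])
separable = correlate2d_valid(vertical, [row])
```

For an exactly rank-1 filter the two results are identical, and the cost per output pixel drops from k*k to 2k multiplications for a k x k filter. Filters learned by a network are generally not exactly low-rank, so in practice such a decomposition is an approximation and trades a small amount of accuracy for speed.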
|