Global ETD Search

1	Computer support for large character set languages Hsu, Siu Chi January 1991 (has links) No description available. 005 Chinese character set
2	Läsa ljud : Att formulera en texts auditiva kvaliteter visuellt / Reading Sound : To Express a Text's Auditive Qualities Visually Sahlén, Mattias, Hultberg, Lina January 2010 (has links) <p>In printed text, sound is a somewhat forgotten aspect. The recitation of text is vital in areas like poetry and oratory, but still has no distinct technique of being communicated. Emphasis of a word might be expressed through italics, but no canonic character set or system for vocal delivery of texts exists. With this essay we are creating a foundation for a development of such a character set or system. By studying existing visualisations of sound and comparing these with semiotic and perception-based theories we obtain useful insights for a prospective system for visualisations of vocal sounds.</p><p>We conclude that the aspects of sound one wants to visualise must be carefully defined since the viewer cannot process infinite amounts of information. A sound visualisation system does not have to consist of multiple characters or signs to be effective, but had better be built around a strong code to manage the signs into a working system. Creating a context for the signs is also recommended in order to be able to compare signs with eachother.</p> perception audio visualisation semiotics sensation notation synesthesia character set recitation Media and communication studies Medie- och kommunikationsvetenskap
3	Läsa ljud : Att formulera en texts auditiva kvaliteter visuellt / Reading Sound : To Express a Text's Auditive Qualities Visually Sahlén, Mattias, Hultberg, Lina January 2010 (has links) In printed text, sound is a somewhat forgotten aspect. The recitation of text is vital in areas like poetry and oratory, but still has no distinct technique of being communicated. Emphasis of a word might be expressed through italics, but no canonic character set or system for vocal delivery of texts exists. With this essay we are creating a foundation for a development of such a character set or system. By studying existing visualisations of sound and comparing these with semiotic and perception-based theories we obtain useful insights for a prospective system for visualisations of vocal sounds. We conclude that the aspects of sound one wants to visualise must be carefully defined since the viewer cannot process infinite amounts of information. A sound visualisation system does not have to consist of multiple characters or signs to be effective, but had better be built around a strong code to manage the signs into a working system. Creating a context for the signs is also recommended in order to be able to compare signs with eachother. perception audio visualisation semiotics sensation notation synesthesia character set recitation Media and communication studies Medie- och kommunikationsvetenskap
4	Ανάπτυξη μεθόδου με σκοπό την αναγνώριση και εξαγωγή θεματικών λέξεων κλειδιών από διευθύνσεις ιστοσελίδων του ελληνικού Διαδικτύου / Keyword identification within Greek URLs Βονιτσάνου, Μαρία-Αλεξάνδρα 16 January 2012 (has links) Η αύξηση της διαθέσιμης Πληροφορίας στον Παγκόσμιο Ιστό είναι ραγδαία. Η παρατήρηση αυτή παρότρυνε πολλούς ερευνητές να επικεντρώσουν το έργο τους στην εξαγωγή χρήσιμων γνωρισμάτων από διαδικτυακά έγγραφα, όπως ιστοσελίδες, εικόνες, βίντεο, με σκοπό τη ενίσχυση της διαδικασίας κατηγοριοποίησης ιστοσελίδων. Ένας πόρος που περιέχει πληροφορία και δεν έχει διερευνηθεί διεξοδικά για γλώσσες εκτός της αγγλικής, είναι η διεύθυνση ιστοσελίδας (URL- Uniform Recourse Locator). Το κίνητρο της διπλωματικής αυτής εργασίας είναι το γεγονός ότι ένα σημαντικό υποσύνολο των χρηστών του διαδικτύου δείχνει ενδιαφέρον για δικτυακούς πόρους, των οποίων οι διευθύνσεις URL περιλαμβάνουν όρους προερχόμενους από τη μητρική τους γλώσσα (η οποία δεν είναι η αγγλική), γραμμένους με λατινικούς χαρακτήρες. Προτείνεται μέθοδος η οποία θα αναγνωρίζει και θα εξάγει τις λέξεις-κλειδιά από διευθύνσεις ιστοσελίδων (URLs), εστιάζοντας στο ελληνικό Διαδίκτυο και συγκεκριμένα σε URLs που περιέχουν ελληνικούς όρους. Το κύριο ζήτημα της προτεινόμενης μεθόδου είναι ότι οι ελληνικές λέξεις μπορούν να μεταγλωττίζονται με λατινικούς χαρακτήρες σύμφωνα με πολλούς διαφορετικούς τρόπους, καθώς και το γεγονός ότι τα URLs μπορούν να περιέχουν περισσότερες της μιας λέξεις χωρίς κάποιο διαχωριστικό. Παρόλη την ύπαρξη προηγούμενων προσεγγίσεων για την επεξεργασία ελληνικού διαδικτυακού περιεχομένου, όπως αναζητήσεις στο ελληνικό διαδίκτυο και αναγνώριση οντότητας σε ελληνικές ιστοσελίδες, καμία από τις παραπάνω δεν βασίζεται σε διευθύνσεις URL. Επιπλέον, έχουν αναπτυχθεί πολλές τεχνικές για την κατηγοριοποίηση ιστοσελίδων με βάση κυρίως τις διευθύνσεις URL, αλλά καμία δεν διερευνά την περίπτωση του ελληνικού διαδικτύου. Η προτεινόμενη μέθοδος περιέχει δύο βασικά στοιχεία: το μεταγλωττιστή και τον κατακερματιστή. Ο μεταγλωττιστής, βασισμένος σε ένα ελληνικό λεξικό και ένα σύνολο κανόνων, μετατρέπει τις λέξεις που είναι γραμμένες με λατινικούς χαρακτήρες σε ελληνικούς όρους ενώ παράλληλα ο κατακερματιστής τμηματοποιεί τη διεύθυνση URL σε λέξεις με νόημα, εξάγοντας, έτσι τελικά ελληνικούς όρους που αποτελούν λέξεις κλειδιά. Η πειραματική αξιολόγηση της προτεινόμενης μεθόδου σε δείγμα ελληνικών URLs αποδεικνύει ότι μπορεί να αξιοποιηθεί εποικοδομητικά στην αυτόματη αναγνώριση λέξεων-κλειδιών σε ελληνικά URLs. / The available information on the WWW is increasing rapidly. This observation has triggered many researchers to focus their work on extracting useful features from web documents that would enhance the task of web classification. A quite informative resource that has not been thoroughly explored for languages other than English, is the uniform recourse locator (URL). Motivated by the fact that a significant part of the Web users is interested in web resources, whose URLs contain terms from their non English native languages,written using Latin characters, we propose a method that identifies and extracts successfully keywords within URLs focusing on the Greek Web and especially ons URLs, containing Greek terms. The main issue of this approach is that Greek words can be transliterated to Latin characters in many different ways based on how the words are pronounced rather than on how they are written. Although there are previous attempts on similar issues, like Greek web searches and entity recognition in Greek Web Pages, none of them is based on URLs. In addition, there are many techniques on web page categorization based mainly on URLs but noone explores the case of Greek terms. The proposed method uses a three-step approach; firstly, a normalized URL is divided into its basic components, according to URI protocol (scheme :// host / path-elements / document . extension). The domain part is splitted on the apperance of punctuation marks or numbers. Secondly, domain-tokens are segmented into meaningful tokens using a set of transliteration rules and a Greek dictionary. Finally, in order to identify useful keywords, a score is assigned to each extracted keyword based on its length and whether the word is nested in another word. The algorithm is evaluated on a random sample of 1,000 URLs collected manually. We perform a human-based evaluation comparing the keywords extracted automatically with the keywords extracted manually when no other additional information than the URL is available. The results look promising. Τμηματοποίηση λέξεων 025.042 2 Greeklish to Greek transliteration Keyword extraction Uniform resource locator Word segmentation
5	Elektronický informační štítek / Electronic information card Šafář, Viktor January 2011 (has links) Tato diplomová práce se zabývá návrhem elektronického informačního štítku, jehož základem je maticový LED displej. V teoretické části jsou probrány použitá rozhraní a periferie mikrokontroléru a PC a diskutuje se nad možnostmi kódování českých znaků. Dále se probírá krok za krokem návrh zařízení. Nejdříve jsou vytyčeny hlavní požadavky a funkcionalita zařízení, následuje výběr vhodných komponent a je navrženo elektrické schéma. Jádrem zařízení je mikrokontrolér Microchip PIC, pro který je dále navržen program. Nakonec je popsána a naprogramována aplikace pro MS Windows, která se se zařízením komunikuje.
6	Improving Retrieval Accuracy in Main Content Extraction from HTML Web Documents Mohammadzadeh, Hadi 17 December 2013 (has links) (PDF) The rapid growth of text based information on the World Wide Web and various applications making use of this data motivates the need for efficient and effective methods to identify and separate the “main content” from the additional content items, such as navigation menus, advertisements, design elements or legal disclaimers. Firstly, in this thesis, we study, develop, and evaluate R2L, DANA, DANAg, and AdDANAg, a family of novel algorithms for extracting the main content of web documents. The main concept behind R2L, which also provided the initial idea and motivation for the other three algorithms, is to use well particularities of Right-to-Left languages for obtaining the main content of web pages. As the English character set and the Right-to-Left character set are encoded in different intervals of the Unicode character set, we can efficiently distinguish the Right-to-Left characters from the English ones in an HTML file. This enables the R2L approach to recognize areas of the HTML file with a high density of Right-to-Left characters and a low density of characters from the English character set. Having recognized these areas, R2L can successfully separate only the Right-to-Left characters. The first extension of the R2L, DANA, improves effectiveness of the baseline algorithm by employing an HTML parser in a post processing phase of R2L for extracting the main content from areas with a high density of Right-to-Left characters. DANAg is the second extension of the R2L and generalizes the idea of R2L to render it language independent. AdDANAg, the third extension of R2L, integrates a new preprocessing step to normalize the hyperlink tags. The presented approaches are analyzed under the aspects of efficiency and effectiveness. We compare them to several established main content extraction algorithms and show that we extend the state-of-the-art in terms of both, efficiency and effectiveness. Secondly, automatically extracting the headline of web articles has many applications. We develop and evaluate a content-based and language-independent approach, TitleFinder, for unsupervised extraction of the headline of web articles. The proposed method achieves high performance in terms of effectiveness and efficiency and outperforms approaches operating on structural and visual features. / Das rasante Wachstum von textbasierten Informationen im World Wide Web und die Vielfalt der Anwendungen, die diese Daten nutzen, macht es notwendig, effiziente und effektive Methoden zu entwickeln, die den Hauptinhalt identifizieren und von den zusätzlichen Inhaltsobjekten wie z.B. Navigations-Menüs, Anzeigen, Design-Elementen oder Haftungsausschlüssen trennen. Zunächst untersuchen, entwickeln und evaluieren wir in dieser Arbeit R2L, DANA, DANAg und AdDANAg, eine Familie von neuartigen Algorithmen zum Extrahieren des Inhalts von Web-Dokumenten. Das grundlegende Konzept hinter R2L, das auch zur Entwicklung der drei weiteren Algorithmen führte, nutzt die Besonderheiten der Rechts-nach-links-Sprachen aus, um den Hauptinhalt von Webseiten zu extrahieren. Da der lateinische Zeichensatz und die Rechts-nach-links-Zeichensätze durch verschiedene Abschnitte des Unicode-Zeichensatzes kodiert werden, lassen sich die Rechts-nach-links-Zeichen leicht von den lateinischen Zeichen in einer HTML-Datei unterscheiden. Das erlaubt dem R2L-Ansatz, Bereiche mit einer hohen Dichte von Rechts-nach-links-Zeichen und wenigen lateinischen Zeichen aus einer HTML-Datei zu erkennen. Aus diesen Bereichen kann dann R2L die Rechts-nach-links-Zeichen extrahieren. Die erste Erweiterung, DANA, verbessert die Wirksamkeit des Baseline-Algorithmus durch die Verwendung eines HTML-Parsers in der Nachbearbeitungsphase des R2L-Algorithmus, um den Inhalt aus Bereichen mit einer hohen Dichte von Rechts-nach-links-Zeichen zu extrahieren. DANAg erweitert den Ansatz des R2L-Algorithmus, so dass eine Sprachunabhängigkeit erreicht wird. Die dritte Erweiterung, AdDANAg, integriert eine neue Vorverarbeitungsschritte, um u.a. die Weblinks zu normalisieren. Die vorgestellten Ansätze werden in Bezug auf Effizienz und Effektivität analysiert. Im Vergleich mit mehreren etablierten Hauptinhalt-Extraktions-Algorithmen zeigen wir, dass sie in diesen Punkten überlegen sind. Darüber hinaus findet die Extraktion der Überschriften aus Web-Artikeln vielfältige Anwendungen. Hierzu entwickeln wir mit TitleFinder einen sich nur auf den Textinhalt beziehenden und sprachabhängigen Ansatz. Das vorgestellte Verfahren ist in Bezug auf Effektivität und Effizienz besser als bekannte Ansätze, die auf strukturellen und visuellen Eigenschaften der HTML-Datei beruhen. Hauptinhalt Extraktion Information Retrieval UTF-8-Kodierung Form HTML-Dokumente rechts nach links Sprachen Web Mining Unicode-Zeichensatz Vektorraummodell Cosinus-Ähnlichkeit überlappen erzielte Ähnlichkeit Schlagzeile Extraktion Titel Extraktion Trend-Mining main content extraction information retrieval UTF-8 encoding form HTML documents right to left languages web mining Unicode character set vector space model cosine similarity overlap scoring similarity headline extraction title extraction trend mining ddc:500
7	Improving Retrieval Accuracy in Main Content Extraction from HTML Web Documents Mohammadzadeh, Hadi 27 November 2013 (has links) The rapid growth of text based information on the World Wide Web and various applications making use of this data motivates the need for efficient and effective methods to identify and separate the “main content” from the additional content items, such as navigation menus, advertisements, design elements or legal disclaimers. Firstly, in this thesis, we study, develop, and evaluate R2L, DANA, DANAg, and AdDANAg, a family of novel algorithms for extracting the main content of web documents. The main concept behind R2L, which also provided the initial idea and motivation for the other three algorithms, is to use well particularities of Right-to-Left languages for obtaining the main content of web pages. As the English character set and the Right-to-Left character set are encoded in different intervals of the Unicode character set, we can efficiently distinguish the Right-to-Left characters from the English ones in an HTML file. This enables the R2L approach to recognize areas of the HTML file with a high density of Right-to-Left characters and a low density of characters from the English character set. Having recognized these areas, R2L can successfully separate only the Right-to-Left characters. The first extension of the R2L, DANA, improves effectiveness of the baseline algorithm by employing an HTML parser in a post processing phase of R2L for extracting the main content from areas with a high density of Right-to-Left characters. DANAg is the second extension of the R2L and generalizes the idea of R2L to render it language independent. AdDANAg, the third extension of R2L, integrates a new preprocessing step to normalize the hyperlink tags. The presented approaches are analyzed under the aspects of efficiency and effectiveness. We compare them to several established main content extraction algorithms and show that we extend the state-of-the-art in terms of both, efficiency and effectiveness. Secondly, automatically extracting the headline of web articles has many applications. We develop and evaluate a content-based and language-independent approach, TitleFinder, for unsupervised extraction of the headline of web articles. The proposed method achieves high performance in terms of effectiveness and efficiency and outperforms approaches operating on structural and visual features. / Das rasante Wachstum von textbasierten Informationen im World Wide Web und die Vielfalt der Anwendungen, die diese Daten nutzen, macht es notwendig, effiziente und effektive Methoden zu entwickeln, die den Hauptinhalt identifizieren und von den zusätzlichen Inhaltsobjekten wie z.B. Navigations-Menüs, Anzeigen, Design-Elementen oder Haftungsausschlüssen trennen. Zunächst untersuchen, entwickeln und evaluieren wir in dieser Arbeit R2L, DANA, DANAg und AdDANAg, eine Familie von neuartigen Algorithmen zum Extrahieren des Inhalts von Web-Dokumenten. Das grundlegende Konzept hinter R2L, das auch zur Entwicklung der drei weiteren Algorithmen führte, nutzt die Besonderheiten der Rechts-nach-links-Sprachen aus, um den Hauptinhalt von Webseiten zu extrahieren. Da der lateinische Zeichensatz und die Rechts-nach-links-Zeichensätze durch verschiedene Abschnitte des Unicode-Zeichensatzes kodiert werden, lassen sich die Rechts-nach-links-Zeichen leicht von den lateinischen Zeichen in einer HTML-Datei unterscheiden. Das erlaubt dem R2L-Ansatz, Bereiche mit einer hohen Dichte von Rechts-nach-links-Zeichen und wenigen lateinischen Zeichen aus einer HTML-Datei zu erkennen. Aus diesen Bereichen kann dann R2L die Rechts-nach-links-Zeichen extrahieren. Die erste Erweiterung, DANA, verbessert die Wirksamkeit des Baseline-Algorithmus durch die Verwendung eines HTML-Parsers in der Nachbearbeitungsphase des R2L-Algorithmus, um den Inhalt aus Bereichen mit einer hohen Dichte von Rechts-nach-links-Zeichen zu extrahieren. DANAg erweitert den Ansatz des R2L-Algorithmus, so dass eine Sprachunabhängigkeit erreicht wird. Die dritte Erweiterung, AdDANAg, integriert eine neue Vorverarbeitungsschritte, um u.a. die Weblinks zu normalisieren. Die vorgestellten Ansätze werden in Bezug auf Effizienz und Effektivität analysiert. Im Vergleich mit mehreren etablierten Hauptinhalt-Extraktions-Algorithmen zeigen wir, dass sie in diesen Punkten überlegen sind. Darüber hinaus findet die Extraktion der Überschriften aus Web-Artikeln vielfältige Anwendungen. Hierzu entwickeln wir mit TitleFinder einen sich nur auf den Textinhalt beziehenden und sprachabhängigen Ansatz. Das vorgestellte Verfahren ist in Bezug auf Effektivität und Effizienz besser als bekannte Ansätze, die auf strukturellen und visuellen Eigenschaften der HTML-Datei beruhen. info:eu-repo/classification/ddc/500 ddc:500

1

Page generated in 0.0572 seconds