Efficient management of textual information: indexing, storage, processing and applications
Θεοδωρίδης, Ευάγγελος 03 July 2009
The basic goal of this thesis is to explore the capabilities of the field of computer science that deals with storing and processing information, within the environment shaped by modern applications. In recent years the amount of information available in electronic form has grown enormously, which makes it necessary to develop new techniques for storing and processing it efficiently. Two characteristic and important applications in which new problems constantly arise are the management of biological data, such as genome sequences, and the management of information from the World Wide Web, such as HTML and XML documents or URLs.
The objective is to develop indexing structures over the information so that queries on it can be answered efficiently, and much faster than by searching through it exhaustively. Typical queries are pattern matching and the identification of repeated motifs (motif extraction). In particular, the thesis focuses on the following topics:
- Locating periodicities in strings. This part gives a series of algorithms for extracting periodicities from strings: maximal repetitions, the period, the cover and the seed of a string. The algorithms are based on the suffix tree, and most of them run in linear time.
- Indexing weighted sequences. The next part focuses on indexing weighted sequences and on answering queries over them, such as finding patterns, repetitions and covers. A weighted sequence is a sequence in which every position contains all the symbols of the alphabet, each with a specific weight. Weighted sequences represent biological sequences of nucleotides or amino acids; in essence they describe the probability that a symbol of the alphabet appears at a given position, or certain biological properties that regulatory proteins have at each position. To manage these sequences, the thesis proposes the Weighted Suffix Tree as an indexing structure, a tree with structural features similar to those of the generalized suffix tree. The thesis gives the definition of the weighted suffix tree and algorithms that construct it in linear time and space.
- Extracting motifs from weighted sequences. Using the weighted suffix tree, a set of algorithms is given for extracting repetitive structures from weighted sequences. More specifically, the thesis gives algorithms for finding maximal pairs, repeated motifs, and motifs common to more than one weighted sequence.
- Web page recommendation algorithms using string processing techniques. Several web applications (recommendation systems or caching systems) try to predict the intentions of a visitor, either to suggest a page or to preload it. For this purpose they try to exploit any experience recorded in the system from previous accesses. The thesis proposes a new way of indexing and representing the information extracted from the available data, such as user accesses recorded in log files and the content of the pages. To extract knowledge from these data, they are represented as strings and then processed and indexed by a generalized weighted suffix tree. This tree compactly stores the most frequent and most meaningful access patterns and, once built, is used as a model for predicting the movements of a website's visitors.
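To make the notion of a weighted sequence concrete, here is a minimal sketch of pattern matching over one. It is a brute-force scan, not the weighted suffix tree proposed in the thesis, and it rests on two assumptions that the abstract does not state: positions are treated as independent, so the probability of an occurrence is the product of the per-position weights, and an occurrence counts only if that probability reaches a chosen threshold. The sequence values and the function names are hypothetical.

```python
from math import prod

# A weighted sequence: one dict per position, mapping symbol -> probability.
# The values are made up for illustration.
weighted_seq = [
    {"A": 1.0},
    {"A": 0.5, "C": 0.5},
    {"G": 1.0},
    {"A": 0.25, "T": 0.75},
]


def occurrence_probability(seq, pattern, start):
    """Probability that `pattern` occurs at `start` (assumes independent positions)."""
    return prod(seq[start + i].get(ch, 0.0) for i, ch in enumerate(pattern))


def find_occurrences(seq, pattern, threshold):
    """Return (position, probability) pairs with probability >= threshold."""
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        p = occurrence_probability(seq, pattern, start)
        if p >= threshold:
            hits.append((start, p))
    return hits


if __name__ == "__main__":
    print(find_occurrences(weighted_seq, "AG", 0.25))
```

The scan costs time proportional to the sequence length times the pattern length for every query; an index such as the weighted suffix tree is meant to avoid exactly this repeated scanning.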
Menu Recommendation System and Taste Database Constructed for Smartphone Users
林信廷, Lin, Shin Ting Unknown Date
Food and eating are basic elements of human life, and people have their own preferences when choosing food. Recommending food to people who face a dazzling array of choices is therefore an important problem.
With advances in technology, smartphones bring convenience to people and change their lifestyles. People can use smartphones to record many aspects of their lives. As memories become digitalized, analyzing these digital records and drawing suggestions from them has become popular.
Focusing on personal taste, this study combines dietary records and food taste in a Lifelog, using folksonomy and crowdsourcing to acquire taste data for specific foods from smartphone users, and links these data to the restaurant names and meal names in our database.
We implemented a smartphone application called "Foodtaste". It lets users record what they ate and how it tasted, look up their personal records, and use several recommendation functions. We also propose several methods that compute, from the taste data accumulated in the Lifelog, each user's personal taste preferences and taste ratios. These are then compared with the taste ratios of meals to produce personalized meal recommendations for restaurants.
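The abstract does not give the formulas behind the taste ratios, so the following is only one plausible sketch of the idea. It assumes that each logged meal carries a list of taste tags, that a taste ratio is the share of each tag among all logged tags, and that meals are ranked by cosine similarity to the user's ratio. The taste vocabulary, the function names and the sample data are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical taste vocabulary; the thesis may use a different one.
TASTES = ("sweet", "sour", "salty", "bitter", "spicy")


def taste_ratio(logged_tags):
    """Share of each taste among all tags in a list of tag lists."""
    counts = Counter(tag for tags in logged_tags for tag in tags if tag in TASTES)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in TASTES}


def cosine(a, b):
    """Cosine similarity between two taste-ratio dicts."""
    dot = sum(a[t] * b[t] for t in TASTES)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(user_ratio, meal_ratios):
    """Meal names ordered from most to least similar to the user's ratio."""
    return sorted(
        meal_ratios, key=lambda m: cosine(user_ratio, meal_ratios[m]), reverse=True
    )


if __name__ == "__main__":
    user = taste_ratio([["sweet", "sour"], ["sweet"], ["spicy"]])
    meals = {
        "espresso": taste_ratio([["bitter"]]),
        "hot pot": taste_ratio([["spicy", "salty"], ["spicy"]]),
        "mango pudding": taste_ratio([["sweet"], ["sweet", "sour"]]),
    }
    print(recommend(user, meals))
```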
Factors Associated with Clinicians’ Recommendation for Return to Work in Patients with Work-related Shoulder and Elbow Injury
Tabloie, Farshid 28 November 2013
Background: Return to work (RTW) after work-related injuries is a multifactorial process. The factors that influence clinicians’ RTW recommendations for patients with work-related shoulder and elbow injury (WRSEI) have not been studied in the literature.
Purpose: We investigated the associations between a group of factors chosen from different domains (personal/environmental) and clinicians’ RTW recommendations for patients with WRSEI.
Methods: The study design was cross-sectional. Data were collected from self-reported surveys and clinical charts of 130 adult workers with chronic (≥6 months) injuries who were not working at the time of the visit and had been referred to the WSIB Shoulder & Elbow Specialty Clinic in Toronto.
Results: The mean age of the population was 43.5 years, and 52% were female. The average time since injury was 20.4 months (45% > 12 months). 70% received an RTW recommendation (regular or modified job) and 30% received a no-RTW recommendation. 42% had college-level education or higher, and 18% had heavy (>20 kg) job demands. Higher MCS scores were significantly associated (p=0.0003) with clinicians’ RTW recommendations.
Conclusion: Among patients with chronic WRSEI, poor general health status and high disability, workers with better mental health were more likely to receive an RTW recommendation from clinicians.
Realising the objectives of the South African Schools Choral Eisteddfod: a case study / Theodore K.A. Dzorkpey
Dzorkpey, Theodore Kwadzo Agbelie January 2010
The realisation of the objectives of the South African Schools Choral Eisteddfod (SASCE) is influenced by the national education system and the environment it operates in. This thesis accordingly studies the SASCE within the organisational framework of the Department of National Education. It provides a comprehensive description of the factors that influence the achievement of the objectives of the SASCE in the FET band in the Motheo district of the Free State Province.
South African national education policy provides for a single unified democratic system for the organisation, governance and funding of schools. The Department of National Education formulates policy and provinces are responsible for its implementation by means of district offices. In this respect the education system is regarded as an organisation consisting of different sub-organisations that must provide effective education in line with the educational needs of the country. A generic five-point model of effective organisational structure was accordingly applied to determine the factors impacting on the realisation of the objectives of the SASCE.
Data were gathered and analysed by means of personal observations, document analysis and semi-structured interviews with education officials, school principals and choir conductors.
The challenges of the national education system with regard to appropriate facilities, equipment, funding, appropriately trained officials and educators, support staff and effective policy implementation are consistent with the challenges facing the Department of National Education’s enrichment programmes, of which the SASCE forms part.
Findings and recommendations are offered for all research questions. A general recommendation pertains to a proposed restructuring of the provincial enrichment programmes sub-directorate in order to address some of its organisational shortcomings and also the challenges facing the SASCE. / Thesis (Ph.D (Music))--North-West University, Potchefstroom Campus, 2011.
Infinitesimal reasoning in information retrieval and trust-based recommendation systems
Chowdhury, Maria 26 April 2010
We propose preferential and trust-based frameworks for Information Retrieval and Recommender Systems, which utilize the power of hyperreal numbers.
In the first part of our research, we propose a preferential framework for Information Retrieval that allows preference annotations to be expressed on search keywords and on document elements. Our framework is flexible and allows expressing preferences such as “A is infinitely more preferred than B,” which we capture by using hyperreal numbers. Because XML is widely used as a standard for representing documents, we consider XML documents in this research and propose a consistent preferential weighting scheme for nested document elements. We show how to naturally incorporate preferences on search keywords and document elements into an IR ranking process using the well-known TF-IDF (Term Frequency - Inverse Document Frequency) ranking measure.
In the second part of our research we propose a novel recommender system that enhances user-based collaborative filtering by using a trust-based social network. Again, we use hyperreal numbers and polynomials to capture natural preferences when aggregating the opinions of trusted users. We use these opinions to “help” users who are similar to an active user to come up with recommendations for items for which they might not have an opinion themselves. We argue that the proposed method better reflects people's real-life behaviour. Our method is justified by the experimental results; we are the first to break a stated “barrier” of 0.73 for the mean absolute error (MAE) of the predicted ratings. Our results are based on a large, real-life dataset from Epinions.com, for which we also achieve a prediction coverage that is significantly better than that of the state-of-the-art methods.
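A rough sketch of how an "infinitely more preferred" keyword can dominate a TF-IDF ranking follows. It does not reproduce the thesis's weighting scheme for nested XML elements, which the abstract does not spell out. Instead it assumes that each keyword is assigned a preference level, that a document's score is a tuple of per-level TF-IDF sums (the coefficients of a polynomial in an infinite unit), and that scores are compared lexicographically, so any positive amount at a higher level outweighs everything below it. The names `tf_idf`, `score` and `rank`, and the sample documents, are hypothetical.

```python
import math


def tf_idf(term, doc, docs):
    """Plain TF-IDF of `term` in `doc`, with `docs` as the collection."""
    df = sum(1 for d in docs if term in d)
    if df == 0 or not doc:
        return 0.0
    return (doc.count(term) / len(doc)) * math.log(len(docs) / df)


def score(doc, docs, prefs):
    """Tuple of per-level TF-IDF sums; level 0 is the most preferred.

    Python compares tuples lexicographically, which mimics comparing
    polynomials in an infinite unit by their leading coefficients.
    """
    totals = [0.0] * (max(prefs.values()) + 1)
    for term, level in prefs.items():
        totals[level] += tf_idf(term, doc, docs)
    return tuple(totals)


def rank(docs, prefs):
    """Document indices ordered from best to worst score."""
    return sorted(range(len(docs)), key=lambda i: score(docs[i], docs, prefs), reverse=True)


if __name__ == "__main__":
    docs = [
        ["trust", "index", "index", "index"],
        ["trust", "trust", "index", "query"],
        ["query", "query", "query", "query"],
    ]
    # "trust" is infinitely more preferred than "index".
    print(rank(docs, {"trust": 0, "index": 1}))
    # Both keywords at the same level: ordinary summed TF-IDF.
    print(rank(docs, {"trust": 0, "index": 0}))
```

With the two keywords on different levels, the second document ranks first because it has the higher TF-IDF for "trust", even though the first document has the larger summed score when both keywords share one level.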
Recommendation in Enterprise 2.0 Social Media Streams
Lunze, Torsten 15 October 2014
A social media stream allows users to share user-generated content and to aggregate different external sources into one single stream. In Enterprise 2.0 such social media streams empower co-workers to share their information and to work together efficiently and effectively while replacing email communication. As more users share information, it becomes impossible to read the complete stream, which leads to information overload. It is therefore crucial to provide users with a personalized stream that suggests important and unread messages. The main characteristic of an Enterprise 2.0 social media stream is that co-workers work together on projects represented by topics: the stream is topic-centered and not user-centered as in public streams such as Facebook or Twitter.
A lot of work has been done on recommendation in streams and on news recommendation. However, none of the current research approaches addresses the characteristics of an Enterprise 2.0 social media stream when recommending messages. The existing systems described in the research mainly deal with news recommendation for public streams and are not directly applicable to Enterprise 2.0 social media streams.
In this thesis a recommender concept is developed that allows the recommendation of messages in an Enterprise 2.0 social media stream. The basic idea is to extract features from a new message and use those features to compute a relevance score for a user. Additionally, those features are used to learn a user model and then use the user model for scoring new messages. This idea works without using explicit user feedback and assures a high user acceptance because no intense rating of messages is necessary. With this idea a content-based and collaborative-based approach is developed. To reflect the topic-centered streams a topic-specific user model is introduced which learns a user model independently for each topic.
There are constantly new terms that occur in the stream of messages. For improving the quality of the recommendation (by finding more relevant messages) the recommender should be able to handle the new terms. Therefore, an approach is developed which adapts a user model if unknown terms occur by using terms of similar users or topics. Also, a short- and long-term approach is developed which tries to detect short-term interests of users. Only if the interest of a user occurs repeatedly over a certain time span are terms transferred to the long-term user model.
The approaches are evaluated against a dataset obtained through an Enterprise 2.0 social media stream application. The evaluation shows the overall applicability of the concept. Specifically the evaluation shows that a topic-specific user model outperforms a global user model and also that adapting the user model according to similar users leads to an increase in the quality of the recommendation. Interestingly, the collaborative-based approach cannot reach the quality of the content-based approach.
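The topic-specific user model can be illustrated with a small sketch. It is not the implementation from the thesis. It assumes that messages are already reduced to lists of terms, that implicit feedback (for example, a message the user read or wrote) simply increments per-topic term weights, and that a new message is scored by the share of the topic's learned weight carried by its terms. The class name `TopicUserModel` and its methods are hypothetical.

```python
from collections import defaultdict


class TopicUserModel:
    """Per-user term weights, kept separately for each topic."""

    def __init__(self):
        # topic -> term -> accumulated weight
        self.weights = defaultdict(lambda: defaultdict(float))

    def learn(self, topic, terms):
        """Update the model from a message the user interacted with."""
        for term in terms:
            self.weights[topic][term] += 1.0

    def score(self, topic, terms):
        """Relevance in [0, 1]: share of the topic's weight matched by `terms`."""
        model = self.weights.get(topic)
        if not model or not terms:
            return 0.0
        total = sum(model.values())
        return sum(model.get(term, 0.0) for term in set(terms)) / total


if __name__ == "__main__":
    user = TopicUserModel()
    user.learn("release", ["deploy", "server"])
    user.learn("release", ["deploy", "bug"])
    print(user.score("release", ["deploy", "meeting"]))
    print(user.score("hiring", ["deploy"]))
```

Because weights are stored per topic, a term that matters to the user in one project contributes nothing to the score of a message in another, which is the separation a single global model cannot express.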
Employee Referral: What drives employees to recommend potential co-workers for a position? For companies with recruitment needs
Andersson, Caroline, Schmidinger, Fanny January 2014
Problem: To build a strong employer brand, an employer needs potential employees to hold positive associations with it, which means that fewer resources are required to find the right expertise. Since a recommendation means speaking favourably about something or someone, employee referrals are important for the company's employer brand. The outcomes of using employee referrals in the recruitment process are already known in research. However, the underlying factors and motives that lead employees to recommend people in their social networks have not been addressed to the same extent.
Purpose: The purpose of this thesis is to examine the underlying factors and motives held by employees when they recommend people in their social network. The study analyzes how companies with recruitment needs can develop their employer brand, which in turn benefits the recruitment process. Because a general understanding of referrals is lacking, the study can serve as guidance in developing a company's employee referral program. Based on the results, suggestions are given on what should be considered when developing such a program.
Method: The collected data are quantitative and were gathered through a survey sent to ABB employees.
Results: The study shows that the respondents were driven by prosocial motivation: they recommended someone for employment mainly to help an acquaintance, and secondly to help the organization. Respondents were least driven by external motivation, in the form of rewards or of a referral strengthening their own position. Respondents perceived barriers and uncertainties to some extent when making a referral, but in most cases these were not a reason to refrain from recommending. The main barrier was a lack of knowledge about how to make a referral using digital tools. The main uncertainty was the concern that the recommended person would not be a good fit for ABB. Most respondents, however, answered that no barrier or uncertainty would prevent them from making a referral.
Music recommendation and discovery in the long tail
Celma Herrada, Òscar 16 February 2009
Music consumption is biased towards a few popular artists. For instance, in 2007 only 1% of all digital tracks accounted for 80% of all sales. Similarly, 1,000 albums accounted for 50% of all album sales, and 80% of all albums sold were purchased fewer than 100 times. There is a need to help people filter, discover, personalise and recommend music from the huge amount of content available along the Long Tail.
Current music recommendation algorithms try to predict accurately what people want to listen to. However, these algorithms often tend to recommend music that is popular or already well known to the user, which decreases the effectiveness of the recommendations. These approaches focus on improving the accuracy of the recommendations; that is, they try to make accurate predictions about what a user could listen to or buy next, regardless of how useful the recommendations are to the user. In this thesis we stress the importance of the user's perceived quality of the recommendations. We model the Long Tail curve of artist popularity to predict potentially interesting and unknown music hidden in the tail of the popularity curve. Effective recommendation systems should promote novel and relevant material (non-obvious recommendations), taken primarily from the tail of a popularity distribution.
The main contributions of this thesis are: (i) a novel network-based approach for recommender systems, based on the analysis of the item (or user) similarity graph and the popularity of the items, (ii) a user-centric evaluation that measures the relevance and novelty of the recommendations as perceived by the user, and (iii) two prototype systems that implement the ideas derived from the theoretical work. Our findings have significant implications for recommender systems that help users explore the Long Tail, digging for content they might like.
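The kind of concentration figure quoted above (a small fraction of items accounting for most sales) can be computed from per-item sales counts as in the sketch below. It is only an illustration of the head/tail split, not the popularity-curve model developed in the thesis. The function names and the sample numbers are hypothetical.

```python
def head_share(sales, head_fraction):
    """Fraction of total sales captured by the top `head_fraction` of items."""
    ranked = sorted(sales, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    head_size = max(1, int(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / total


def tail_items(sales_by_item, head_fraction):
    """Names of the items outside the head, ordered from most to least sold."""
    ranked = sorted(sales_by_item, key=sales_by_item.get, reverse=True)
    head_size = max(1, int(len(ranked) * head_fraction))
    return ranked[head_size:]


if __name__ == "__main__":
    # Made-up catalogue: one hit and 99 items that each sold once.
    sales = [396] + [1] * 99
    print(head_share(sales, 0.01))
    print(tail_items({"hit": 396, "b-side": 3, "demo": 1}, 0.34))
```

A recommender that aims for novelty could restrict or re-weight its candidates using the output of a function like `tail_items`, while still checking relevance for the individual user.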
