  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Resolving Quasi-Synonym Relationships in Automatic Thesaurus Construction using Fuzzy Rough Sets and an Inverse Term Frequency Similarity Function

Davault, Julius Mack, III 01 January 2009
One of the problems in automatic thesaurus construction is determining the semantic relationship between word pairs. Quasi-synonyms provide a type of equivalence relationship: words are treated as similar only for the purposes of information retrieval. Determining such relationships in a thesaurus is hard to achieve automatically. The term vector space model and an inverse term frequency similarity function can provide a way to automatically determine the similarity between words in a thesaurus. A thesaurus constructed using this method can also improve precision and recall in information retrieval when it is constructed in conjunction with fuzzy rough set algorithms and used with tight upper approximation query expansion. This dissertation presents a method that combines fuzzy rough sets with a word weighting and inverse term frequency similarity function as a technique for automatic thesaurus construction.
2

A Novel Power Flow Method for Long Term Frequency Stability Analysis

Yan, Wenjin 03 October 2013
This thesis presents a novel approach for finding a practical power flow solution when all the generators in a power system have hit their real power output limits, for example after some generator units shut down or after load outages. The approach assumes that the system frequency cannot be kept at its rated value (usually 60 or 50 Hz) and that, accordingly, the generator real power outputs are affected by the system frequency deviation. The modification includes the system frequency deviation as a new state variable in the power flow, so that the power system can be described more precisely when the generation limits are hit and the system is not operating under normal conditions. A new mathematical formulation for power flow is given by modifying the conventional power flow mismatch equation and Jacobian matrix. The Newton-Raphson method is chosen for modification because it is the most widely used method and is fast-converging and accurate. The Jacobian matrix is augmented by adding a column and a row. Matlab is used to implement the Power Flow for Long Term Frequency Stability (PFLTFS) method for a simple 4-bus system and the IEEE 118-bus system. PSS/E dynamic simulation is used to verify that the steady-state solution from PFLTFS is reasonable, and the PSS/E dynamic simulation plots are used to analyze the long-term frequency response. The PFLTFS method provides a technique for solving the power flow of a system in an abnormal state. From the results we can conclude that the PFLTFS method is reasonable for solving the power flow of a system with unbalanced real power.
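The augmented Newton-Raphson step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual formulation: the linear droop model with constant R_i, the choice of the former reference bus's real power equation as the added row, and the zero frequency sensitivity of the reactive power mismatch are all assumptions introduced here.

```latex
% Assumed droop model (not given in the abstract): generator i responds to the
% frequency deviation \Delta f with a hypothetical droop constant R_i.
\[
P_{Gi}(\Delta f) = P_{Gi}^{0} - \frac{1}{R_i}\,\Delta f
\]

% Mismatch functions at bus i; P_i and Q_i are the usual injected-power expressions.
% Assumption: g^P also contains the real power equation of the former reference bus,
% which supplies the extra row.
\[
g^{P}_{i} = P_i(\theta, V) - P_{Gi}(\Delta f) + P_{Di}, \qquad
g^{Q}_{i} = Q_i(\theta, V) - Q_{Gi} + Q_{Di}
\]

% Newton-Raphson step with the state vector extended by \Delta f (the extra column).
% Assumption: reactive power mismatches do not depend on frequency.
\[
\begin{bmatrix}
  \partial g^{P}/\partial\theta & \partial g^{P}/\partial V & \partial g^{P}/\partial \Delta f \\
  \partial g^{Q}/\partial\theta & \partial g^{Q}/\partial V & 0
\end{bmatrix}
\begin{bmatrix} \Delta\theta \\ \Delta V \\ \Delta(\Delta f) \end{bmatrix}
= -\begin{bmatrix} g^{P} \\ g^{Q} \end{bmatrix},
\qquad
\frac{\partial g^{P}_{i}}{\partial \Delta f} = \frac{1}{R_i}
\]
```

Under these assumptions the new column contains 1/R_i at generator buses and zero at buses without generation.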
3

How does the market perceive ESG in IPOs : Investigating how ESG factors affect IPO Underpricing in the U.S. market

Bui, Thi Mai Anh, Frongillo, Alessandra January 2020
Environmental, Social and Governance (ESG) integration in financial activities is a topic of growing importance in financial markets. Over the years, many studies have examined Initial Public Offerings (IPOs) and underpricing, since they are fundamental aspects of a firm’s lifecycle. Nevertheless, none of these studies has appropriately related firms’ ESG characteristics to IPO underpricing. To fill this knowledge gap, this thesis investigates whether the ESG factors of a firm affect its IPO underpricing in the U.S. stock market. The U.S. was chosen because it is the biggest stock market in the world and because of the quality and reliability of the data available for this country. A quantitative study is applied to investigate the relationship between the ESG characteristics of firms and the level of underpricing. First, to measure the ESG level of pre-IPO firms, we conducted two textual analyses of IPO prospectuses, namely term frequency analysis and sentiment analysis. These indicators aim to show the disclosure level of ESG factors and whether ESG is perceived negatively or positively by the market. Next, a multiple regression is performed for each ESG indicator to find which measures can explain IPO underpricing. Based on the multiple regression results, we conclude that the frequency of environmental and governance terms in the IPO prospectus, the negative tone, and the overall sentiment of the environmental context significantly explain IPO underpricing. These results give meaningful answers to our research question. The market does not perceive the social factors of a firm in the IPO context. On the other hand, environmental and governance aspects attract the market’s attention in different ways.
The market is concerned about the disclosure level of governance activities and whether these activities are sufficiently mentioned in the prospectus. Meanwhile, the market takes the environmental activities of a firm into serious consideration by assessing the quality of these activities. Moreover, the market is more sensitive to negative information about environmental content than to positive information in the IPO context. The textual analysis methods applied in this thesis have some limitations. However, this study is reliable enough to confirm that some companies’ ESG factors affect IPO underpricing. Consequently, it is possible to state that the market cares about ESG issues.
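The term frequency indicator mentioned in this abstract can be illustrated with a minimal sketch. The word lists, the function name `esg_term_frequency` and the sample text below are hypothetical; the thesis's actual ESG dictionaries and preprocessing are not given in this listing.

```python
import re
from collections import Counter

# Hypothetical keyword lists for illustration only; not the thesis's dictionaries.
ESG_TERMS = {
    "environmental": {"emission", "emissions", "climate", "pollution", "renewable"},
    "social": {"diversity", "community", "safety", "employees"},
    "governance": {"board", "audit", "compliance", "shareholders"},
}


def esg_term_frequency(text: str) -> dict:
    """Return, per ESG pillar, the share of tokens that match the pillar's word list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {pillar: 0.0 for pillar in ESG_TERMS}
    return {
        pillar: sum(counts[word] for word in words) / total
        for pillar, words in ESG_TERMS.items()
    }


# Made-up prospectus excerpt.
sample = (
    "The board reviews climate risk and emissions. "
    "The audit committee reports to the board."
)
freqs = esg_term_frequency(sample)
print(freqs)
```

In a study of this kind, such per-pillar frequencies would then serve as explanatory variables in the regression; that step is omitted here.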
4

Sentiment Analysis Of IMDB Movie Reviews : A comparative study of Lexicon based approach and BERT Neural Network model

Domadula, Prashuna Sai Surya Vishwitha, Sayyaparaju, Sai Sumanwita January 2023
Background: Movies have become an important marketing and advertising tool that can influence consumer behaviour and trends. Reading film reviews is an important part of watching a movie, as it can help viewers gain a general understanding of the film and provide filmmakers with feedback on how their work is being received. Sentiment analysis is a method of determining whether a review has positive or negative sentiment, and this study investigates a machine learning method for classifying sentiment from film reviews. Objectives: This thesis aims to perform comparative sentiment analysis on textual IMDb movie reviews using lexicon-based and BERT neural network models. Different performance evaluation metrics are then used to identify the most effective learning model. Methods: This thesis employs a quantitative research technique, with data analysed using traditional machine learning. The labelled data set comes from the Kaggle website (https://www.kaggle.com/datasets), which contains movie review information. The lexicon-based approach and the BERT neural network are trained using the chosen IMDb movie reviews data set. To discover which model performs best at predicting sentiment, the constructed models are assessed on the test set using evaluation metrics such as accuracy, precision, recall and F1 score. Results: In the experiments conducted, the BERT neural network model is the most efficient algorithm for classifying the IMDb movie reviews into positive and negative sentiments. This model achieved the highest accuracy score of 90.67% over the trained data set, followed by the BoW model with an accuracy of 79.15%, whereas the TF-IDF model has 78.98% accuracy. The BERT model has the best precision and recall, with 0.88 and 0.92 respectively, followed by the BoW and TF-IDF models. The BoW model has a precision and recall of 0.79, and the TF-IDF model has a precision of 0.79 and a recall of 0.78.
The BERT model also has the highest F1 score of 0.88, followed by the BoW model with an F1 score of 0.79 and the TF-IDF model with 0.78. Conclusions: Of the two approaches evaluated, the lexicon-based approach and the BERT transformer neural network, the BERT neural network is the most efficient, with a good performance score on the measured performance criteria.
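The evaluation metrics named in this abstract (accuracy, precision, recall and F1 score) follow standard definitions, which the short sketch below computes for binary labels. The label vectors are made up for illustration and are unrelated to the thesis's data; the function names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(precision, recall, f1, acc)
```

Because F1 is a harmonic mean, it always lies between the precision and the recall it is computed from.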
5

Help Document Recommendation System

Vijay Kumar, Keerthi, Mary Stanly, Pinky January 2023
Help documents are important for an organization that uses technology applications licensed from a vendor. Customers and internal employees frequently use the help documents section to work with the applications and to learn about new features and developments in them. Help documents consist of various knowledge base materials, question-and-answer documents and help content. In day-to-day use, customers go through these documents to set up, install or use the product. Recommending similar documents to customers can increase their engagement with the product and help them proceed without hurdles. The main aim of this study is to build a recommendation system by exploring different machine-learning techniques to recommend the most relevant and similar help documents to the user. To achieve this, the study proposes a hybrid recommendation system for help documents in which documents are recommended based on the similarity of their content, using content-based filtering, and on the similarity between users, using collaborative filtering. Finally, the recommendations from content-based filtering and collaborative filtering are combined and ranked to form a comprehensive list of recommendations. The proposed approach is evaluated by internal employees of the company and by external users. Our experimental results demonstrate that the proposed approach is feasible and provides an effective way to recommend help documents.
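One simple way to combine and rank the two recommendation lists described above is a weighted sum of normalized scores. The sketch below assumes this scheme purely for illustration; the abstract does not say how the thesis combines the lists, and the document names, scores and weight `alpha` are hypothetical.

```python
def min_max(scores):
    """Rescale a {document: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (value - lo) / (hi - lo) for doc, value in scores.items()}


def hybrid_rank(content_scores, collab_scores, alpha=0.5):
    """Rank documents by a weighted sum of normalized scores.

    alpha weights the content-based scores and (1 - alpha) the collaborative
    ones. A document missing from one list contributes 0 for that list.
    """
    content = min_max(content_scores)
    collab = min_max(collab_scores)
    docs = set(content) | set(collab)
    combined = {
        doc: alpha * content.get(doc, 0.0) + (1 - alpha) * collab.get(doc, 0.0)
        for doc in docs
    }
    # Sort by descending combined score, breaking ties alphabetically.
    return sorted(docs, key=lambda doc: (-combined[doc], doc))


# Hypothetical scores from the two recommenders.
content = {"install_guide": 0.9, "faq": 0.4, "release_notes": 0.1}
collab = {"faq": 5.0, "release_notes": 3.0, "api_howto": 1.0}

ranking = hybrid_rank(content, collab)
print(ranking)
```

Normalizing first keeps one recommender's score scale from dominating the other; the weight `alpha` would normally be tuned on evaluation data.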
6

Klasifikace emailové komunikace / Classification of eMail Communication

Piják, Marek January 2018
This diploma thesis focuses on creating a classifier that can recognize the email communication received by Topefekt.s.r.o on a daily basis and assign it to a classification class. The project implements some of the most commonly used classification methods, including machine learning. The thesis also includes an evaluation comparing all the methods used.
7

Metody sumarizace textových dokumentů / Methods of Text Document Summarization

Pokorný, Lubomír January 2012
This thesis deals with single-document summarization of text data. Part of it is devoted to data preparation, mainly normalization. Several stemming algorithms are listed, and lemmatization is also described. The main part is devoted to Luhn's method for summarization and its extension using the WordNet dictionary. The Oswald summarization method is described and applied as well. The designed and implemented application automatically generates abstracts using these methods. A set of experiments was developed to verify the correct functionality of the application and of the extension of Luhn's summarization method.
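Luhn's method, mentioned in this abstract, scores sentences by how densely they contain frequent ("significant") words. The sketch below is a simplified variant written for illustration, not the thesis's implementation: it treats each sentence as a single cluster from its first to its last significant word, and the stopword list, the frequency threshold `min_freq` and the sample text are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "are", "was"}


def luhn_summary(text, num_sentences=1, min_freq=2):
    """Return the top-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]

    # Significant words: non-stopwords occurring at least min_freq times.
    freq = Counter(w for tokens in tokenized for w in tokens if w not in STOPWORDS)
    significant = {w for w, count in freq.items() if count >= min_freq}

    def score(tokens):
        positions = [i for i, w in enumerate(tokens) if w in significant]
        if not positions:
            return 0.0
        span = positions[-1] - positions[0] + 1
        # Luhn-style score: (significant words in the cluster)^2 / cluster length.
        return len(positions) ** 2 / span

    ranked = sorted(range(len(sentences)), key=lambda i: (-score(tokenized[i]), i))
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]


text = (
    "Cats chase mice. Cats sleep in the sun all day. "
    "The weather was mild. Mice fear cats and cats chase mice often."
)
summary = luhn_summary(text, num_sentences=1)
print(summary)
```

The original method also limits the gap allowed between significant words inside a cluster; that refinement is left out here to keep the sketch short.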
8

Studying the effectiveness of dynamic analysis for fingerprinting Android malware behavior / En studie av effektivitet hos dynamisk analys för kartläggning av beteenden hos Android malware

Regard, Viktor January 2019
Android is the second most targeted operating system for malware authors, and to counter the development of Android malware, more knowledge about its behavior is needed. There are two main approaches to analyzing Android malware, namely static and dynamic analysis. In 2017, a study and a well-labeled dataset named AMD (Android Malware Dataset), consisting of over 24,000 malware samples, were released. The dataset is divided into 135 varieties based on similar malicious behavior, retrieved through static analysis of the file classes.dex in the APK of each malware sample, whereas the labeled features were determined by manual inspection of three samples in each variety. However, static analysis is known to be weak against obfuscation techniques, such as repackaging or dynamic loading, which can be exploited to avoid the analysis. In this study the second approach is used, and all malware in the dataset is analyzed at run-time in order to monitor its dynamic behavior. However, analyzing malware at run-time has known weaknesses as well, as it can be evaded through, for instance, anti-emulator techniques. Therefore, the study aimed to explore the available sandbox environments for dynamic analysis, study the effectiveness of fingerprinting Android malware using one of the tools, and investigate whether static features from AMD and the dynamic analysis correlate. This was done, for instance, by attempting to classify the samples based on similar dynamic features and by calculating the Pearson Correlation Coefficient (r) for all combinations of features from AMD and the dynamic analysis. The comparison of tools for dynamic analysis showed a need for development, as the most popular tools were released a long time ago and share a lack of continuous maintenance. As a result, Droidbox was chosen as the sandbox environment for this study because of its ease of use and installation and its adaptability to large-scale analysis.
Based on the dynamic features extracted with Droidbox, it could be shown that Android malware samples are more similar to the varieties to which they belong. The best metric for classifying samples into varieties, out of four investigated metrics, turned out to be Cosine Similarity, which achieved an accuracy of 83.6% for the entire dataset. The high accuracy indicated a correlation between the dynamic features and the static features on which the varieties are based. Furthermore, the Pearson Correlation Coefficient confirmed that the manually extracted features used to describe the varieties and the dynamic features are correlated to some extent, which could be partially confirmed by a manual inspection at the end of the study.
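The two measures named in this abstract, Cosine Similarity and the Pearson Correlation Coefficient (r), can be sketched in a few lines. The nearest-centroid classification, the feature layout and the variety names below are assumptions made for illustration; the abstract does not describe the thesis's exact classification procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y) if std_x and std_y else 0.0


def classify(sample, centroids):
    """Assign the sample to the variety whose centroid is most similar (assumed scheme)."""
    return max(centroids, key=lambda name: cosine_similarity(sample, centroids[name]))


# Hypothetical dynamic feature counts: [file writes, SMS sent, network connections].
centroids = {
    "sms_fraud": [1.0, 8.0, 1.0],
    "spyware": [6.0, 0.0, 7.0],
}
label = classify([2.0, 5.0, 1.0], centroids)
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(label, r)
```

Cosine similarity ignores vector magnitude, so a sample that performs the same mix of actions at a different overall volume still matches its variety.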
9

Recommending digital books to children : A comparative study of different state-of-the-art recommendation system techniques / Att rekommendera digitala böcker till barn : En jämförelsestudie av olika moderna tekniker för rekommendationssystem

Lundqvist, Malvin January 2023
Collaborative filtering is a popular technique that uses behavior data, in the form of users’ interactions with or ratings of items in a system, to provide personalized item recommendations to the user. This study compares three different state-of-the-art Recommendation System models that implement this technique, Matrix Factorization, Multi-layer Perceptron and Neural Matrix Factorization, using behavior data from a digital book platform for children. The field of Recommendation Systems is growing, and many platforms can benefit from personalizing the user experience and simplifying the use of the platforms. To perform a more complex comparison and introduce a new take on the models, this study proposes a new way to represent the behavior data as input to the models, i.e., to use the Term Frequency-Inverse Document Frequency (TF-IDF) of occurrences of interactions between users and books, as opposed to the traditional binary representation (positive if there has been any interaction and negative otherwise). The performance is measured by extracting the last book read for each user and evaluating how the models would rank that book for recommendations to the user. To assess the value of the models for the children’s reading platform, the models are also compared to the existing Recommendation System on the digital book platform. The results indicate that the Matrix Factorization model performs best out of the three models when using children’s reading behavior data. However, due to the long training process and larger set of hyperparameters to tune for the other two models, these may not have reached an optimal hyperparameter tuning, thereby affecting the comparison among the three state-of-the-art models. This limitation is further discussed in the study. All three models perform significantly better than the current system on the digital book platform.
The models with the proposed representation using TF-IDF values show notable promise, performing better than the binary representation in almost all numerical metrics for all models. These results suggest future research on more ways of representing behavior data as input to these types of models.
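The TF-IDF representation of user-book interactions described in this abstract can be sketched as follows. The exact weighting used in the thesis is not given in this listing, so the sketch assumes one common form: term frequency is a book's share of a user's interactions, and inverse document frequency is the logarithm of the number of users divided by the number of users who interacted with the book. The user and book names are hypothetical.

```python
import math
from collections import defaultdict


def tfidf_interactions(counts):
    """Map {user: {book: interaction count}} to {user: {book: TF-IDF weight}}."""
    n_users = len(counts)

    # Number of users with at least one interaction per book (the "document frequency").
    users_per_book = defaultdict(int)
    for books in counts.values():
        for book, n in books.items():
            if n > 0:
                users_per_book[book] += 1

    weights = {}
    for user, books in counts.items():
        total = sum(books.values())
        weights[user] = {
            book: (n / total) * math.log(n_users / users_per_book[book])
            for book, n in books.items()
            if n > 0
        }
    return weights


# Hypothetical interaction counts.
counts = {
    "u1": {"dragons": 3, "space": 1},
    "u2": {"dragons": 1},
    "u3": {"space": 2, "pirates": 2},
}
w = tfidf_interactions(counts)
print(w)
```

Compared with a binary matrix, this weighting lets repeated interactions count for more while down-weighting books that nearly every user has opened.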
