Global ETD Search

1	運用文字探勘及財務資料探討中國市場營運概況文字敘述及財務表現之一致性 / Using Text Mining and Financial Data to Explore for Consistency between Narrative Disclosure and Financial Performance in China Market 鄭凱文 Unknown Date This study uses text mining to analyze the MD&A sections disclosed by companies listed in mainland China in 2011 and compares them with the companies' financial information to assess whether the MD&A disclosures are overstated. It then examines empirically which factors explain overstatement. The sample consists of the MD&A and related financial information disclosed by all companies listed in mainland China in 2011. The qualitative MD&A text is clustered using the Stanford Word Segmenter, a dictionary of positive and negative terms (NTUSD), TFIDF, and K-means. Combining this with a K-means cluster analysis of the financial information indicates whether a company's 2011 MD&A disclosure is overstated. Company size, management's risk preference, profitability, and liquidity (debt-paying ability) are then used as variables to analyze what influences overstatement. The results show that company size and management's risk preference are significantly and negatively related to a tendency not to overstate MD&A disclosure, whereas profitability and liquidity are significantly and positively related to it. The study offers investors another way to analyze MD&A and suggests that, when using the MD&A information disclosed by listed companies, investors consider whether the disclosure is overstated and adjust accordingly to reduce investment risk when making investment decisions. Text Mining K-means TFIDF
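The TFIDF weighting named in the abstract above can be illustrated with a minimal sketch. It assumes the plain formulation (term frequency times the logarithm of inverse document frequency) and already-segmented documents; the thesis may use a different variant, and the function name `tfidf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Weight = relative term frequency * log(N / document frequency).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy documents, not data from the thesis.
docs = [
    ["revenue", "growth", "strong"],
    ["revenue", "decline", "risk"],
    ["revenue", "growth", "risk"],
]
weights = tfidf(docs)
```

A term that occurs in every document ("revenue" here) gets weight zero, while a term unique to one document gets the highest weight. The resulting vectors are the kind of input a clustering step such as K-means would consume.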
2	Magellan : un agent pour simplifier les achats sur internet (Magellan: an agent to simplify shopping on the internet) Paturel, Jonathan January 2002 Thesis digitized by the Direction des bibliothèques de l'Université de Montréal. Intelligent Search Web Product E-commerce TFIDF
3	Comparing Feature Extraction Methods and Effects of Pre-Processing Methods for Multi-Label Classification of Textual Data / Utvärdering av Metoder för Extraktion av Särdrag och Förbehandling av Data för Multi-Taggning av Textdata Eklund, Martin January 2018 This thesis investigates how different feature extraction methods applied to textual data affect the results of multi-label classification. Two Bag of Words extraction methods are used, specifically the Count Vector and the TF-IDF approaches. A word embedding method, GloVe, is also investigated. Multi-label classification can be useful for categorizing items, such as pieces of music or news articles, that may belong to multiple classes or topics. The effect of different pre-processing methods is also investigated, such as the use of N-grams, stop-word elimination, and stemming. Two classifiers, an SVM and an ANN, are used for multi-label classification with a Binary Relevance approach. The results indicate that the choice of extraction method has a meaningful impact on the resulting classifications, but that no one method consistently outperforms the others. Instead, the results show that the GloVe extraction method performs best on the recall metrics, while the Bag of Words methods perform best on the precision metrics. Computer Sciences
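The Binary Relevance approach mentioned in the abstract above trains one independent binary classifier per label and returns every label whose classifier fires. The sketch below shows only that decomposition. The thesis uses an SVM and an ANN as base classifiers; the `OverlapClassifier` here is a hypothetical toy stand-in, and all names and data are illustrative assumptions.

```python
from collections import Counter

class OverlapClassifier:
    """Toy binary classifier (a stand-in, not an SVM or ANN)."""

    def fit(self, docs, targets):
        # Count words seen in positive and in negative training documents.
        self.pos, self.neg = Counter(), Counter()
        for doc, is_positive in zip(docs, targets):
            (self.pos if is_positive else self.neg).update(doc)
        self.n_pos = max(1, sum(targets))
        self.n_neg = max(1, len(targets) - sum(targets))
        return self

    def predict(self, doc):
        # Compare average word overlap with each class.
        pos_score = sum(self.pos[w] for w in doc) / self.n_pos
        neg_score = sum(self.neg[w] for w in doc) / self.n_neg
        return pos_score > neg_score

def fit_binary_relevance(docs, label_sets, make_classifier):
    """Train one binary model per label: 'has this label' vs. 'does not'."""
    labels = sorted(set().union(*label_sets))
    return {
        label: make_classifier().fit(docs, [label in s for s in label_sets])
        for label in labels
    }

def predict_labels(models, doc):
    """Return every label whose binary model predicts True."""
    return {label for label, model in models.items() if model.predict(doc)}

# Hypothetical toy data, not from the thesis.
docs = [["guitar", "concert"], ["election", "vote"],
        ["concert", "election"], ["vote", "budget"]]
label_sets = [{"music"}, {"politics"}, {"music", "politics"}, {"politics"}]
models = fit_binary_relevance(docs, label_sets, OverlapClassifier)
```

Because each label is handled separately, any binary classifier with `fit` and `predict` methods could be swapped in for the toy one; the trade-off is that correlations between labels are ignored.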
4	內控缺失與財務報導一致性之關聯性 / The Relationship between Internal Control Weakness and the Financial Reporting Consistency 許正昇 Unknown Date The main purpose of this study is to examine the relationship between internal control weakness and financial reporting consistency. TFIDF text mining is used to analyze the Management's Discussion & Analysis of Financial Condition and Results of Operations (MD&A) and the financial information in the sample companies' annual reports. The sample is drawn from the annual reports of US-listed companies from 2002 to 2014. The results show that material internal control weaknesses have a significant effect on financial reporting consistency and are negatively correlated with it: companies with effective internal control disclose MD&A narrative information that is more consistent with their financial information. TFIDF Text Mining MD&A Internal Control Weakness
5	Multidimensional Visualization of News Articles / Flerdimensionel Visualisering av Nyhetsartiklar Åklint, Richard, Khan, Muhammad Farhan January 2015 Large data sets are difficult to visualize. For a human to find structures and understand the data, good visualization tools are required. This project develops a technique that lets a user look at complex data at different scales. Such scaling is natural for geographical data, where zooming in and out gives a good feel for the spatial relationships in map data or satellite images. For other types of data, however, it is not obvious how much scaling should be done. In this project, an experimental application is developed that visualizes data from a large news article database in multiple dimensions. Using this application, the user can select multiple keywords on different axes and then create a visualization containing the news articles with those keywords. The user is able to move around the visualization. If the camera is far away from the document icons, they are clustered into red spheres. If the user moves the camera closer, the clusters break up into single document icons. If the camera is very close to the document icons, it is possible to read the news articles. Visualization TFIDF Octree Keywords Extractor News Articles Data Abstraction Big Data
6	Evaluation of Approaches for Representation and Sentiment of Customer Reviews / Utvärdering av tillvägagångssätt för representation och uppfattning om kundrecensioner Giorgis, Stavros January 2021 Sentiment classification of customer reviews is a real-world application for many companies that offer text analytics and opinion extraction for reviews in domains such as consumer electronics, hotels, restaurants, and car rental agencies. Recent progress in Natural Language Processing has produced many new state-of-the-art approaches for representing the meaning of sentences, phrases, and words in text using vector space models, so-called embeddings. In this thesis, we evaluated the most current and most popular text representation techniques against traditional methods as a baseline. The evaluation dataset, provided by a text analysis company, consists of customer reviews of different lengths from different domains, and automatic analyses are compared with human judgments. By exploring the training datasets, we evaluated which of them were the most suitable for this specific task. Furthermore, we explored different techniques that could be used to alter a language model's decisions without retraining it. Finally, all the methods were evaluated on their time performance and resource requirements to present an overall experimental assessment that could help the company decide which technique is the most appropriate replacement for its system in a production environment. machine learning nlp text analytics sentiment analysis transformers tfidf bow fasttext word2vec bert xlnet roberta Computer and Information Sciences
7	Clustering and Summarization of Chat Dialogues : To understand a company’s customer base / Klustring och Summering av Chatt-Dialoger Hidén, Oskar, Björelind, David January 2021 The Customer Success department at Visma handles about 200 000 customer chats each year. The chat dialogues are stored and contain both questions and answers. To get an idea of what customers ask about, the Customer Success department has to read a random sample of the chat dialogues manually. This thesis develops and investigates an analysis tool for the chat data, using clustering and summarization. The approach aims to decrease the time spent on the analysis and increase its quality. Models for clustering (K-means, DBSCAN and HDBSCAN) and extractive summarization (K-means, LSA and TextRank) are compared. Each algorithm is combined with three different text representations (TFIDF, S-BERT and FastText) to create models for evaluation. These models are evaluated against a test set created for the purpose of this thesis. The Silhouette Index and the Adjusted Rand Index are used to evaluate the clustering models. The ROUGE measure, together with a qualitative evaluation, is used to evaluate the extractive summarization models. In addition, the best clustering model is evaluated further to understand how different data sizes impact performance. TFIDF Unigram together with HDBSCAN or K-means obtained the best results for clustering, whereas FastText together with TextRank obtained the best results for extractive summarization. This thesis applies known models to the textual domain of customer chat dialogues, something that, to our knowledge, has not previously been done in the literature. Machine Learning NLP Text Representations Clustering Extractive summarization TFIDF S-BERT FastText K-means DBSCAN HDBSCAN LSA TextRank Word Mover's Distance (WMD) Computer Engineering
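The Adjusted Rand Index named in the abstract above scores how well a clustering agrees with reference labels, corrected for chance agreement. The sketch below follows the standard pair-counting definition; the function name and the convention of returning 1.0 for degenerate inputs are assumptions made for illustration, not details taken from the thesis.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting Adjusted Rand Index between two labelings."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # assumed convention: no pairs to disagree on
    # Pairs of items grouped together in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together in each labeling on its own.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # assumed convention for degenerate labelings
    return (sum_cells - expected) / (max_index - expected)
```

The score is 1.0 when the two labelings match up to a renaming of cluster ids, is close to 0 for random assignments, and can be negative when agreement is worse than chance.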
