1 |
The General TDM Exception: An Analysis of 15 a § of the Swedish Copyright Act (URL)
Tidhult, Ludvig January 2024 (has links)
No description available.
|
2 |
European Copyright Law and the Text and Data Mining Exceptions and Limitations: With a focus on the DSM Directive, is the EU Approach a Hindrance or Facilitator to Innovation in the Region?
Gerrish, Charlotte January 2019 (has links)
We are in a digital age with Big Data at the heart of our global online environment. Exploiting Big Data by manual means is virtually impossible, so we need to rely on innovative methods such as Machine Learning and AI to fully harness the value of the Big Data available in our digital society. One of the key processes enabling innovation with technologies such as Machine Learning and AI is TDM, which is carried out on large volumes of Big Data. Whilst there is no single definition of TDM, it is universally acknowledged that TDM involves the automated analytical processing of raw and unstructured data sets through sophisticated ICT tools in order to obtain valuable insights for society or to enable efficient Machine Learning and AI development. Some of the source text and data on which TDM is performed is likely to be protected by copyright, which creates difficulties in balancing the exclusive rights of copyright holders against the interests of innovators developing TDM technologies and performing TDM, for both research and commercial purposes, who need largely unfettered access to source material in order to create the most performant AI solutions. As technology has grown so rapidly over the last few decades, the copyright law framework must adapt to avoid becoming redundant. This paper looks at the European approach to copyright law in the era of Big Data, specifically its approach to TDM exceptions in light of the recent DSM Directive, and asks whether this approach furthers or hinders innovation in the EU.
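As a concrete (and heavily simplified) illustration of the "automated analytical processing of raw and unstructured data sets" described above, the sketch below counts term frequencies and co-occurrences over a tiny invented corpus using only the Python standard library; real TDM pipelines apply the same idea to large collections of potentially copyright-protected works.

```python
# Minimal text-and-data-mining sketch: tokenise a small corpus and count
# term frequencies and pairwise co-occurrences. The documents are invented
# placeholders; real TDM pipelines run the same idea over large collections.
import re
from collections import Counter
from itertools import combinations

corpus = [
    "Machine learning models are trained on large text corpora.",
    "Text and data mining extracts patterns from unstructured text.",
    "Copyright questions arise when mining protected text corpora.",
]

def tokenize(doc: str) -> list[str]:
    """Lower-case a document and split it into word tokens."""
    return re.findall(r"[a-z]+", doc.lower())

term_freq = Counter()
cooccurrence = Counter()
for doc in corpus:
    tokens = tokenize(doc)
    term_freq.update(tokens)
    # Count each unordered pair of distinct terms appearing in the same document.
    cooccurrence.update(combinations(sorted(set(tokens)), 2))

print(term_freq.most_common(5))
print(cooccurrence.most_common(5))
```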
|
3 |
Text Classification in Turkish Marketing Domain and Context-Sensitive Ad Distribution
Engin, Melih 01 February 2009 (has links) (PDF)
Online advertising is continuously growing in popularity, and the target audience of this new advertising method is huge. Web publishers form another rapidly growing and crowded group involved in internet advertising. Contextual advertising systems make it easier for publishers to present online ads on their web sites, since these online marketing systems automatically direct ads to web sites with related content. Web publishers join ad networks and gain revenue by enabling ads to be displayed on their sites. Therefore, the accuracy of automated ad systems in determining ad-context relevance is crucial.
In this thesis we construct a method for semantic classification of web site contexts in the Turkish language and develop an ad serving system to display context-related ads on web documents. The classification method uses both semantic and statistical techniques. The method is supervised and therefore needs processed sample data for learning classification rules, so we generate a Turkish marketing dataset and use it in our classification approaches. We build successful classification methods using different feature spaces and support vector machine configurations, and our results provide a good comparison between these methods.
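For readers unfamiliar with this setup, the sketch below shows a generic supervised text classifier in the spirit described above: TF-IDF features feeding a linear support vector machine via scikit-learn. The toy documents and category labels are invented placeholders, not the thesis's Turkish marketing dataset or its exact configuration.

```python
# Sketch of a supervised text classifier: TF-IDF features plus a linear
# support vector machine (scikit-learn). The tiny example corpus and the
# category labels are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = [
    "discount smartphone accessories free shipping",
    "new electric car models announced this spring",
    "cheap flights and hotel package deals",
    "latest laptop and tablet reviews",
]
train_labels = ["electronics", "automotive", "travel", "electronics"]

# TF-IDF turns each document into a sparse statistical feature vector;
# LinearSVC learns a separating hyperplane per category.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
classifier.fit(train_docs, train_labels)

print(classifier.predict(["best prices on wireless headphones"]))
```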
|
4 |
A European solution for Text and Data Mining in the development of creative Artificial Intelligence: With a specific focus on articles 3 and 4 of the Digital Single Market Directive
Christensen, Kristina January 2021 (has links)
In today’s data-driven society, also called the Fourth Industrial Revolution, Text and Data Mining (TDM) has become an essential tool in managing the booming Big Data in its different sizes and forms. It is also an inherent part of AI research using machine learning, where these techniques depend heavily on datasets derived from TDM to self-learn and to make autonomous decisions. Through the lens of copyright and related rights, TDM may be used to train AI for the purpose of AI-driven creativity, where AI has already helped to create paintings, compose music and produce movie trailers. However, since TDM typically involves extraction and/or copying of works and other subject matter protectable by copyright and related rights – in order to create datasets relevant to each AI project – it risks infringing the exclusive right of reproduction and the sui generis database right under the EU acquis. Still, TDM used for the purpose of AI-driven creativity does not necessarily amount to an infringement if the restricted act is prima facie covered by an available exception or limitation. Several pre-existing exceptions and limitations under the EU acquis, i.e. temporary acts of reproduction, scientific research, normal use of a database, extraction of insubstantial parts from a database and the mandatory exception for computer programs, have been examined as possible candidates to shield unlicensed TDM activities from copyright and related rights infringement. However, this thesis observes that, due to their narrow scope and the legal fragmentation caused by the voluntary implementation of some of the exceptions, these are not fully adapted to cover unlicensed TDM and thus create legal uncertainties for AI developers. In this regard, in order to transfer the fundamental principles of copyright and related rights into the digital age and to compete with legal systems that offer a more TDM-friendly environment (e.g. the US, Japan and the UK), the European legislator adopted the Digital Single Market Directive 2019/790 (DSM Directive), comprising two obligatory TDM exceptions in articles 3 and 4. However, despite reducing several legal uncertainties and the divergence of national implementations of the pre-existing exceptions and limitations, the adopted regime has significant shortcomings that may hinder AI development in Europe. Ultimately, this thesis concludes that, despite following an approach that better fits the digital environment, the DSM Directive fails to address the new era of the Fourth Industrial Revolution to which AI belongs.
|
5 |
Rough set-based reasoning and pattern mining for information filtering
Zhou, Xujuan January 2008 (has links)
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. Learning to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined on the basis of users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness, and they are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing semantic information and have shown encouraging results for improving the effectiveness of IF systems. On the other hand, pattern discovery from large data streams is not computationally efficient, and these approaches have to deal with low-frequency pattern issues. The measures used by the data mining techniques (for example, “support” and “confidence”) to learn the profile have turned out to be unsuitable for filtering and can lead to a mismatch problem. This thesis uses rough set-based reasoning (term-based) and pattern mining as a unified framework for information filtering to overcome the aforementioned problems. The system consists of two stages: a topic filtering stage and a pattern mining stage. The topic filtering stage is intended to minimize information overload by filtering out the most likely irrelevant information based on the user profiles. A novel user-profile learning method and a theoretical model for threshold setting have been developed using rough set decision theory. The second stage (pattern mining) aims at solving the information mismatch problem. This stage is precision-oriented: a new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy, and the most likely relevant documents are assigned higher scores by the ranking function. Because relatively few documents are left after the first stage, the computational cost is markedly reduced; at the same time, pattern discovery yields more accurate results, and the overall performance of the system is improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on well-known IR benchmarking processes, using the latest version of the Reuters dataset, namely the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system significantly outperforms the other IF systems, such as the traditional Rocchio IF model and state-of-the-art term-based models, including BM25, Support Vector Machines (SVM), and the Pattern Taxonomy Model (PTM).
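To make the two-stage idea concrete, the sketch below runs a threshold-based topic filter followed by a pattern-based re-ranking step. The profile weights, the threshold value and the patterns are invented placeholders; the thesis derives them from rough set decision theory and a pattern taxonomy, which this toy example does not reproduce.

```python
# Highly simplified two-stage filtering sketch: stage 1 drops documents whose
# topic score falls below a threshold, stage 2 re-ranks survivors by how many
# profile patterns (term sets) they contain. All weights, the threshold and
# the patterns are invented placeholders for illustration only.

profile_weights = {"rough": 2.0, "set": 1.5, "filtering": 1.2, "pattern": 1.0}
threshold = 1.5                     # stage-1 cut-off (illustrative value)
patterns = [{"pattern", "mining"}, {"information", "filtering"}]

def topic_score(tokens: list[str]) -> float:
    """Stage 1: sum the profile weights of terms occurring in the document."""
    return sum(profile_weights.get(t, 0.0) for t in set(tokens))

def pattern_rank(tokens: list[str]) -> int:
    """Stage 2: rank by how many profile patterns the document contains."""
    token_set = set(tokens)
    return sum(1 for p in patterns if p <= token_set)

documents = [
    "rough set based filtering of incoming news",
    "pattern mining for information filtering systems",
    "football results from the weekend",
]
survivors = [d for d in documents if topic_score(d.split()) >= threshold]
ranked = sorted(survivors, key=lambda d: pattern_rank(d.split()), reverse=True)
print(ranked)
```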
|
6 |
An ontological approach for monitoring and surveillance systems in unregulated markets
Younis Zaki, Mohamed January 2013 (has links)
Ontologies are a key component of information management as they provide a common representation of a domain. Historically, the finance domain has suffered from inefficiency in managing vast amounts of financial data and from a lack of communication and knowledge sharing between analysts. In particular, with the growth of fraud in financial markets, cases are challenging and complex and involve huge volumes of information, and gathering facts and evidence is often difficult. Thus, the impetus for building a financial fraud ontology arises from the continuous improvement and development of financial market surveillance systems with high analytical capabilities to capture fraud, which is essential to guarantee and preserve an efficient market. This thesis proposes an ontology-based approach for financial market surveillance systems. The proposed ontology acts as a semantic representation of concepts mined from unstructured resources and other internet sources (a corpus). The ontology contains a comprehensive concept system that can act as a semantically rich knowledge base for a market monitoring system. This could help fraud analysts understand financial fraud practices, support open investigations by managing relevant facts gathered for cases, provide techniques for early detection of fraudulent activities, support the development of prevention practices, and share manipulation patterns from prosecuted cases with investigators and relevant users. The usefulness of the ontology is evaluated through three case studies, which not only help to explain how market manipulation works but also demonstrate how the ontology can be used as a framework for extracting and capturing information related to financial fraud, improving the performance of surveillance systems in fraud monitoring. Given that most manipulation cases occur in unregulated markets, this thesis uses a sample of fraud cases from these markets. On the empirical side, the thesis presents examples of novel applications of text-mining tools and data-processing components, developing off-line surveillance systems that are fully working prototypes which could train the ontology on the most recent manipulation techniques.
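As a toy illustration of how an ontology-backed concept system can support monitoring, the sketch below tags a document with fraud-related concepts and their ancestors in a tiny hand-built taxonomy. The concept names, hierarchy and cue phrases are invented and far simpler than the ontology developed in the thesis.

```python
# Toy ontology-style concept system for fraud monitoring: each concept has a
# parent (taxonomy link) and lexical cues, and a document is tagged with every
# matched concept plus its ancestors. Concepts and cue phrases are invented.

ontology = {
    "MarketManipulation": {"parent": None, "cues": []},
    "PumpAndDump": {"parent": "MarketManipulation",
                    "cues": ["guaranteed returns", "act now", "hot stock tip"]},
    "WashTrading": {"parent": "MarketManipulation",
                    "cues": ["matched orders", "simultaneous buy and sell"]},
}

def ancestors(concept: str) -> list[str]:
    """Walk parent links up the taxonomy."""
    chain = []
    parent = ontology[concept]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = ontology[parent]["parent"]
    return chain

def tag(document: str) -> set[str]:
    """Return all concepts (and their ancestors) whose cues occur in the text."""
    text = document.lower()
    tagged = set()
    for concept, entry in ontology.items():
        if any(cue in text for cue in entry["cues"]):
            tagged.add(concept)
            tagged.update(ancestors(concept))
    return tagged

print(tag("Hot stock tip: guaranteed returns if you act now!"))
# expected: {'PumpAndDump', 'MarketManipulation'} (set order may vary)
```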
|
7 |
Text and Data Mining in EU Copyright Law
Svensson, Gabriella January 2020 (has links)
Text and data mining can be a useful tool in fields as diverse as scientific research, journalism, culture and, not least, the training of artificial intelligence, and its importance is likely only to grow in the future. Despite this huge potential, there are many indications that copyright law restricts the use of text and data mining, keeping users from applying it optimally. This thesis discusses possible barriers created by EU copyright law, in particular in light of the new exceptions provided by the Directive on Copyright and Related Rights in the Digital Single Market, and finds that, despite improvements in legal certainty, there are still obstacles to the efficient application of text and data mining.
|
8 |
Analysis of Opinion Development in Online Forums – Concept and Case Study
Kaiser, Carolin; Bodendorf, Freimut January 2010 (has links)
Web 2.0 is, among other things, a worldwide platform for expressing opinions. More and more customers discuss products online and exchange experiences. The analysis of online posts is therefore an important market research instrument. An approach for the automatic identification, aggregation and analysis of opinions by means of text mining is presented, and its application is demonstrated with an example from the sporting goods industry.
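As a rough illustration of the opinion mining step described above, the sketch below scores invented forum posts with a tiny sentiment lexicon and aggregates the scores per product. The lexicon, posts and products are placeholders; the approach presented in the paper is a full text mining pipeline rather than simple word counting.

```python
# Minimal lexicon-based opinion aggregation over forum posts: score each post
# by counting positive and negative cue words, then average per product.
# The lexicon and the posts are invented placeholders.
from collections import defaultdict

POSITIVE = {"great", "comfortable", "durable", "love"}
NEGATIVE = {"bad", "broke", "uncomfortable", "disappointed"}

posts = [
    ("running shoe X", "Love these shoes, very comfortable and durable."),
    ("running shoe X", "Sole broke after two weeks, quite disappointed."),
    ("jacket Y", "Great jacket for cold weather."),
]

def polarity(text: str) -> int:
    """Positive minus negative cue-word count for one post."""
    tokens = {t.strip(".,!").lower() for t in text.split()}
    return len(tokens & POSITIVE) - len(tokens & NEGATIVE)

scores = defaultdict(list)
for product, text in posts:
    scores[product].append(polarity(text))

for product, values in scores.items():
    print(product, sum(values) / len(values))
```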
|
9 |
Decision Support Systems for Financial Market Surveillance
Alic, Irina 30 November 2016 (links)
Decision support systems in finance are of great interest not only to research but also to practice. To ensure financial market surveillance, financial supervisory authorities are confronted, on the one hand, with the increasing amount of information available online, such as financial blogs and news. On the other hand, rapidly emerging trends, such as the steadily growing volume of data available online and the development of data mining methods, pose challenges for research. Decision support systems in finance make it possible to provide relevant information to financial supervisory authorities and to compliance officers of financial institutions in a timely manner. This thesis presents IT artifacts that support decision making in financial market surveillance. In addition, an explanatory design theory is presented that addresses the requirements of regulators and of compliance officers in financial institutions.
|
10 |
Digital Intelligence – Möglichkeiten und Umsetzung einer informatikgestützten Frühaufklärung: Digital Intelligence – opportunities and implementation of a data-driven foresight
Walde, Peter 15 December 2010
The goal of Digital Intelligence, i.e. data-driven strategic foresight, is to support the shaping of the future on the basis of valid and well-founded digital information, with comparatively little effort and enormous savings in time and cost. Innovative technologies for (semi-)automatic language and data processing help here, for example information retrieval, (temporal) data, text and web mining, information visualization, conceptual structures and informetrics. They make it possible to detect key topics and latent relationships in an unmanageably large, distributed and inhomogeneous body of data such as patents, scientific publications, press documents or web content in a timely manner, and to deliver them quickly and in a targeted way. Digital Intelligence thus makes intuitively sensed patterns and developments explicit and measurable.
This research aims, first, to show what computer science can contribute to data-driven foresight and, second, to implement these possibilities in a pragmatic context.
Its starting point is an introduction to the discipline of strategic foresight and its data-driven branch, Digital Intelligence.
The theoretical and, in particular, computer science foundations of foresight are discussed and classified, above all the possibilities of time-oriented data exploration.
Various methods and software tools are designed and developed that support the time-oriented exploration of, in particular, unstructured text data (temporal text mining). Only approaches that can be used pragmatically in the context of a large institution and under the specific requirements of strategic foresight are considered. Highlights include a platform for collective search and an innovative method for identifying weak signals.
Finally, a Digital Intelligence service is presented and discussed that was successfully implemented on this basis in a global technology-oriented corporation and enables systematic competitor, market and technology analysis based on the digital traces people leave behind.
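To give a flavour of the time-oriented text exploration described above, the sketch below builds per-year term frequency series from a few invented, timestamped documents and flags terms whose frequency has recently jumped: a crude emerging-topic heuristic, not the weak-signal method or the time series correlation developed in the thesis.

```python
# Crude time-oriented text exploration sketch: build per-year term frequency
# series from timestamped documents and flag terms whose frequency in the
# latest period is at least double their earlier average. A toy emerging-topic
# heuristic; documents, years and the doubling factor are invented.
import re
from collections import Counter, defaultdict

documents = [
    (2008, "classic data mining of patent databases"),
    (2009, "data mining and early text mining experiments"),
    (2010, "temporal text mining of patents and web content"),
    (2010, "temporal analysis reveals emerging web mining topics"),
]

freq_by_year = defaultdict(Counter)
for year, text in documents:
    freq_by_year[year].update(re.findall(r"[a-z]+", text.lower()))

years = sorted(freq_by_year)
latest, earlier = years[-1], years[:-1]

for term in freq_by_year[latest]:
    past = sum(freq_by_year[y][term] for y in earlier) / len(earlier)
    recent = freq_by_year[latest][term]
    if recent >= 2 * max(past, 0.5):   # flag terms that at least doubled
        print(f"emerging term candidate: {term!r} ({past:.1f} -> {recent})")
```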
Table of contents:
Abstract
Acknowledgements
Table of contents
List of tables
List of figures
A – INTRODUCTION
1 Background and motivation
2 Contribution and structure of the thesis
B – THEORY
B0 – Digital Intelligence
3 Derivation and definition of Digital Intelligence
4 Distinction from Business Intelligence
5 Overview of different text types
6 Informetrics: bibliometrics, scientometrics, webometrics
7 Information systems in the context of Digital Intelligence
B1 – Business foundations of Digital Intelligence
8 Strategic foresight
8.1 Facets and historical development
8.2 Methods
8.3 Process
8.4 Identification of recurring terms
8.5 Foundations of innovation and diffusion research
B2 – Computer science foundations of Digital Intelligence
9 From time, data, text and metadata to multidimensional time-oriented (text) data
9.1 Time – a definition
9.1.1 Basic temporal elements and operators
9.1.2 Linear, cyclic and branching developments
9.1.3 Temporal (in)determinacy
9.1.4 Temporal granularity
9.2 Text
9.2.1 Text and its linguistic-textual levels
9.2.2 From signals and data to information and knowledge
9.3 Data
9.3.1 Origin
9.3.2 Data size
9.3.3 Data type and value range
9.3.4 Data structure
9.3.5 Dimensionality
9.4 Metadata
9.5 Summary and multidimensional time-oriented data
10 Time-oriented data exploration methods
10.1 Time-oriented database queries and OLAP
10.2 Time-oriented information retrieval
10.3 Data mining and temporal data mining
10.3.1 Representations of time-oriented data
10.3.2 Tasks of temporal data mining
10.4 Text mining and temporal text mining
10.4.1 Foundations of text mining
10.4.2 Developed, used and licensed text mining applications
10.4.3 Forms of temporal text mining
10.4.3.1 Discovery of causal and time-oriented rules
10.4.3.2 Identification of deviations and volatility
10.4.3.3 Identification and time-oriented organization of topics
10.4.3.4 Time-oriented analysis based on conceptual structures
10.4.3.5 Time-oriented analysis of frequency, networks and hierarchies
10.4.3.6 Semi-automatic identification of trends
10.4.3.7 Handling dynamically updated data
10.5 Web mining and temporal web mining
10.5.1 Web content mining
10.5.2 Web structure mining
10.5.3 Web usage mining
10.5.4 Temporal web mining
10.6 Information visualization
10.6.1 Visualization techniques
10.6.1.1 Visualization techniques by data type
10.6.1.2 Visualization techniques by form of representation
10.6.1.3 Visualization techniques by type of interaction
10.6.1.4 Visualization techniques by type of visual task
10.6.1.5 Visualization techniques by visualization process
10.6.2 Time-oriented visualization techniques
10.6.2.1 Static representations
10.6.2.2 Dynamic representations
10.6.2.3 Event-based representations
10.7 Summary
11 Conceptual structures
12 Synopsis of time-oriented data exploration
C – IMPLEMENTATION OF A DIGITAL INTELLIGENCE SYSTEM
13 Determination of text-based indicators
14 Requirements for a Digital Intelligence system
15 Description of the implementation of a Digital Intelligence system
15.1 Concept of a Digital Intelligence service
15.1.1 Portal use
15.1.2 Profiles
15.1.3 In-depth analyses
15.1.4 Technology scanning
15.2 Relevant data for Digital Intelligence (example)
15.3 Foresight platform
15.4 WCTAnalyze and automatic extraction of topic-specific events
15.5 SemanticTalk
15.6 Semi-automatic identification of trends
15.6.1 Time series correlation
15.6.2 HD-SOM scanning
D – SUMMARY
Appendix A: Process diagrams of developed (temporal) text mining applications
Appendix B: Synopsis of time-oriented data exploration
Bibliography
Declaration of authorship
Academic career of the author
Publications
|