21 |
OPTIMALIZACE ALGORITMŮ A DATOVÝCH STRUKTUR PRO VYHLEDÁVÁNÍ REGULÁRNÍCH VÝRAZŮ S VYUŽITÍM TECHNOLOGIE FPGA / OPTIMIZATION OF ALGORITHMS AND DATA STRUCTURES FOR REGULAR EXPRESSION MATCHING USING FPGA TECHNOLOGY. Kaštil, Jan. Unknown Date (has links)
The dissertation deals with fast regular expression matching in network traffic using FPGA technology. Regular expression matching in network traffic is a computationally demanding operation used mainly in network security and in the monitoring of high-speed computer networks. Current solutions cannot reach the required multi-gigabit throughputs while meeting all the requirements placed on matching units. The highest throughputs are achieved by implementations based on innovative hardware architectures in FPGAs or ASICs. This dissertation describes new architectures of a matching unit that are suitable for implementation in both FPGA and ASIC. The basic idea of the proposed architectures is to use a perfect hash function to implement the transition table of a finite automaton. In addition, an architecture was proposed that allows the user to introduce a small probability of error during matching and thereby reduce the memory requirements of the matching unit. The dissertation analyses the influence of this error probability on the overall reliability of the system and compares it with currently used solutions. As part of the dissertation, the properties of regular expressions used in the analysis of traffic in modern computer networks were measured; the analysis shows that a large proportion of these regular expressions are suitable for implementation with the proposed architectures. To achieve high throughput of the matching unit, the thesis proposes a new alphabet transformation algorithm that allows the unit to process several characters in one step. Unlike current methods, the proposed algorithm allows the construction of an automaton processing an arbitrary number of symbols per clock cycle. Compared to current methods, the implemented architectures achieve memory savings of up to 200 MB.
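To make the underlying data structure concrete, here is a minimal Python sketch (the pattern, automaton, and default-transition scheme are invented for illustration and are not taken from the dissertation): a deterministic automaton whose transition table is a sparse hash table keyed by (state, symbol), with per-state defaults for everything not stored. The dissertation's architectures build on this kind of table, implementing it with a perfect hash function and, in one variant, accepting a small matching-error probability to reduce memory further; the sketch leaves both refinements out.

```python
# Hypothetical DFA for "the input contains the substring ab" (illustration only).
# Only "interesting" transitions are stored; everything else falls back to a
# per-state default, which keeps the table sparse and hash-friendly.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "a"): 1,
    (1, "b"): 2,
}
DEFAULT = {0: 0, 1: 0, 2: 2}   # state 2 is absorbing (match already found)
ACCEPTING = {2}
START = 0

def matches(text: str) -> bool:
    """Run the automaton one symbol at a time over the input."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch), DEFAULT[state])
    return state in ACCEPTING

if __name__ == "__main__":
    for s in ["xxabyy", "aaa", "ba"]:
        print(s, matches(s))   # True, False, False
```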
|
22 |
Systém pro správu jazykových verzí Portálu VUT / Language Version Tools for Web Portal of BUT. Pavlíček, Milan. Unknown Date (has links)
The main purpose of this master's thesis is to create language version tools for the web Portal of BUT. The thesis first surveys possible approaches, both with and without a database. It then analyses the current state of the web Portal of BUT: I describe the individual servers and tools, and above all the existing solution for providing multilingual sites. Next, I turn to the design and implementation of the required system, which consists of several standalone scripts and web applications for web developers and translators. Finally, I describe the procedure for integrating this system into the current web Portal of BUT.
|
23 |
Outomatiese Setswana lemma-identifisering / Jeanetta Hendrina Brits. Brits, Jeanetta Hendrina. January 2006 (has links)
Within the context of natural language processing, a lemmatiser is one of the
most important core technology modules that has to be developed for a particular
language. A lemmatiser reduces words in a corpus to the corresponding lemmas
of the words in the lexicon.
A lemma is defined as the meaningful base form from which other more complex
forms (i.e. variants) are derived. Before a lemmatiser can be developed for a
specific language, the concept "lemma" as it applies to that specific language
should first be defined clearly. This study concludes that, in Setswana, only
stems (and not roots) can act independently as words; therefore, only stems
should be accepted as lemmas in the context of automatic lemmatisation for
Setswana.
Five of the seven parts of speech in Setswana could be viewed as closed
classes, which means that these classes are not extended by means of regular
morphological processes. The two other parts of speech (nouns and verbs) require
the implementation of alternation rules to determine the lemma. Such alternation
rules were formalised in this study, for the purpose of development of a
Setswana lemmatiser. The existing Setswana grammars were used as the basis for
these rules. In this way, the precision with which these formalised grammars
lemmatise Setswana words could be determined.
The software developed by Van Noord (2002), FSA 6, is one of the best-known
applications available for the development of finite state automata and transducers.
Regular expressions based on the formalised morphological rules were
used in FSA 6 to create finite state transducers. The code subsequently generated
by FSA 6 was implemented in the lemmatiser.
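As a toy illustration of this approach (the prefixes, suffixes, and example words below are invented purely to show the mechanism and are not the thesis's formalised Setswana alternation rules, nor real FSA 6 output), ordered regular-expression rewrite rules can strip or rewrite affixes until a candidate lemma remains:

```python
import re

# Ordered rewrite rules: (pattern, replacement).  All affixes here are
# placeholders -- the thesis formalises real Setswana alternation rules and
# compiles them into finite state transducers with FSA 6.
RULES = [
    (re.compile(r"^(xo|re)(?=\w{3,})"), ""),   # strip a hypothetical prefix
    (re.compile(r"(ile|ela)$"), "a"),          # rewrite a hypothetical verbal suffix
    (re.compile(r"(ng)$"), ""),                # drop a hypothetical ending
]

def lemmatise(word: str) -> str:
    """Apply the rules in order; the first match of each rule is rewritten."""
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word, count=1)
    return word

if __name__ == "__main__":
    # toy output only: xobonile -> bona, rebatlang -> batla, pela -> pa
    for w in ["xobonile", "rebatlang", "pela"]:
        print(w, "->", lemmatise(w))
```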
The metric used to evaluate the lemmatiser is precision. On a test corpus of 1 000 words, the lemmatiser obtained a precision of 70,92%. In a further evaluation on 500 complex nouns and 500 complex verbs separately, it obtained 70,96% and 70,52% respectively, while on 500 complex and simplex nouns the precision was 78,45% and on complex and simplex verbs 79,59%. These quantitative results only give an indication of the relative precision of the grammars; they did, however, provide analysed data with which the grammars could be evaluated qualitatively. The study concludes with an overview of how these results might be improved in the future. / Thesis (M.A. (African Languages))--North-West University, Potchefstroom Campus, 2006.
|
24 |
Classifications et grammaires des invariants lexicaux arabes en prévision d’un traitement informatique de cette langue. Construction d’un modèle théorique de l’arabe : la grammaire des invariants lexicaux temporels / Classifications and grammars of Arab lexical invariants in anticipation of an automatic processing of this language. Construction of a theoretical model of Arabic : grammar of temporal lexical invariants. Ghoul, Dhaou. 07 December 2016 (has links)
This thesis focuses on the classification and treatment of Arabic lexical invariants that express a temporal aspect. Our aim is to create a grammar diagram (finite state machine) for each invariant. In this work, we limited our treatment to 20 lexical invariants. Our assumption is that the lexical invariants are located at the same structural (formal) level as the schemes in the quotient language (skeleton) of Arabic. They carry a great deal of information and involve syntactic expectations that make it possible to predict the structure of the sentence. In the first part, we present the concept of the "lexical invariant" by exposing the various levels of invariance, and then classify the invariants studied in this thesis according to several criteria. The second part is devoted to our own study of the temporal lexical invariants: we present our linguistic method and our modelling approach using grammar diagrams, and then analyse simple lexical invariants such as "ḥattā, baʿda" and complex ones such as "baʿdamā, baynamā". Finally, an experimental application, "Kawâkib", was used to detect and identify the lexical invariants, showing its strong points as well as its gaps. We also propose a new vision for the next version of "Kawâkib" that could serve as a teaching application for Arabic without a lexicon.
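A purely illustrative sketch in Python (the states, category labels, and expected continuations are invented; the thesis constructs its own grammar diagram for each of the 20 invariants): one temporal invariant is modelled as a small finite-state schema that, once the invariant is recognised, encodes a syntactic expectation about what may follow.

```python
# Illustrative only: a tiny finite-state schema for one temporal invariant.
# The transition labels are category tags assumed to come from a tokenizer.
SCHEMA = {
    "name": "baʿda",
    "start": "S0",
    "accepting": {"S2"},
    "transitions": {
        ("S0", "INVARIANT:baʿda"): "S1",   # the invariant itself
        ("S1", "NOUN"): "S2",              # hypothetical expected continuation
        ("S1", "VERB"): "S2",
    },
}

def accepts(schema, labels):
    """Run the schema over a sequence of category labels."""
    state = schema["start"]
    for label in labels:
        nxt = schema["transitions"].get((state, label))
        if nxt is None:
            return False
        state = nxt
    return state in schema["accepting"]

print(accepts(SCHEMA, ["INVARIANT:baʿda", "NOUN"]))      # True
print(accepts(SCHEMA, ["INVARIANT:baʿda", "PARTICLE"]))  # False
```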
|
26 |
Anomaly detection technique for sequential data / Technique de détection d'anomalies utilisant des données séquentiellesPellissier, Muriel 15 October 2013 (has links)
Nowadays, huge quantities of data are easily accessible, but these data are of little use if we cannot process them efficiently and extract the relevant information from them. Anomaly detection techniques are used in many domains to help process data in an automated way; the appropriate technique depends on the application domain, the type of data, and the type of anomaly. For this study we are interested only in sequential data. A sequence is an ordered list of items, also called events. Identifying irregularities in sequential data is essential for many application domains, such as DNA sequences, system calls, user commands, and banking transactions. This thesis presents a new approach for identifying and analysing irregularities in sequential data. The technique can detect anomalies in sequential data where the order of the items matters; moreover, it considers not only the order of the events but also their position within the sequences. A sequence is flagged as anomalous if it is quasi-identical to a usual behaviour, that is, if it differs only slightly from a frequent (common) sequence. The differences between two sequences are based on the order of the events and their position in the sequence.
In this thesis we applied the technique to maritime surveillance, but it can be used in any other domain that works with sequential data. For maritime surveillance, automated tools are needed to help customs target suspicious containers: 90% of world trade is transported in containers, yet only 1-2% of containers can be physically checked, because of the high financial cost and the human resources required to control a container. As the number of containers travelling around the world every day keeps growing, it is necessary to orient the controls in order to avoid illegal activities such as fraud, quota violations, illegal products, hidden activities, and drug or arms smuggling. In the maritime domain, we can use this technique to identify suspicious containers by comparing the container trips in the data set with itineraries that are known to be normal (common). A container trip, also called an itinerary, is an ordered list of actions performed on a container at specific geographical positions; the actions are loading, transshipment, and discharging, and for each action we know the container ID and its geographical position (port ID). The technique is divided into two parts. The first part detects the common (most frequent) sequences of the data set. The second part identifies the sequences that differ slightly from the common ones, using a distance-based method to classify a given sequence as normal or suspicious. The distance is calculated with a method that combines quantitative and qualitative differences between two sequences.
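A simplified Python sketch of the two-stage idea (the support threshold, the itineraries, and the use of plain edit distance are invented for the example; the thesis uses its own combined qualitative/quantitative distance): frequent itineraries are taken as normal, and any itinerary that is close to, but not identical with, a frequent one is reported as suspicious.

```python
from collections import Counter

def edit_distance(a, b):
    """Plain Levenshtein distance over event lists (a stand-in for the
    thesis's combined qualitative/quantitative distance)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (a[i - 1] != b[j - 1]))
            prev = cur
    return dp[n]

def find_suspicious(trips, min_support=3, max_distance=1):
    """Stage 1: itineraries seen at least `min_support` times are 'normal'.
    Stage 2: itineraries close to a normal one, but not identical, are suspicious."""
    counts = Counter(tuple(t) for t in trips)
    frequent = {t for t, c in counts.items() if c >= min_support}
    suspicious = [t for t in counts if t not in frequent and
                  any(0 < edit_distance(list(t), list(f)) <= max_distance for f in frequent)]
    return frequent, suspicious

# Invented container itineraries: action:port-code events.
trips = [
    ["load:CNSHA", "tranship:SGSIN", "discharge:NLRTM"],
    ["load:CNSHA", "tranship:SGSIN", "discharge:NLRTM"],
    ["load:CNSHA", "tranship:SGSIN", "discharge:NLRTM"],
    ["load:CNSHA", "tranship:MYPKG", "discharge:NLRTM"],   # one event differs
]
normal, odd = find_suspicious(trips)
print(odd)   # [('load:CNSHA', 'tranship:MYPKG', 'discharge:NLRTM')]
```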
|
27 |
Automatic Java Code Generator for Regular Expressions and Finite Automata. Memeti, Suejb. January 2012 (has links)
No description available.
|
28 |
Rychlejší než grep pomocí čítačů / Beat Grep with Counters, Challenge. Horký, Michal. January 2021 (has links)
Regular expression matching plays an irreplaceable role in software development, and matching speed can affect the usability of software, so great emphasis is placed on it. For certain kinds of regular expressions, standard matching approaches have high complexity and are therefore vulnerable to attacks based on the high cost of regular expression matching (so-called ReDoS attacks). Regular expressions with bounded repetition, which occur frequently in practice, are one such kind. Counting automata allow these regular expressions to be represented efficiently and matched quickly. This thesis presents a C++ implementation of regular expression matching based on counting automata. The matching is implemented within RE2, a fast, modern regular expression matching library. Experiments were performed on regular expressions used in practice, and the results show that the implementation within RE2 is faster than the original C# implementation.
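A minimal Python sketch of the counting idea (a toy pattern matched ad hoc, not the RE2-based implementation from the thesis): for a bounded repetition such as a{3,5}b, the matcher keeps a single integer counter instead of unrolling the repetition into separate automaton states, which is what keeps counting automata compact even for large repetition bounds.

```python
def match_counted(text: str, low: int = 3, high: int = 5) -> bool:
    """Match the toy pattern a{low,high}b as a prefix of `text`.

    A classical DFA needs roughly one state per repetition count; the counter
    keeps that information in one integer, so the automaton stays small even
    for bounds like {1,1000}.
    """
    count = 0
    for ch in text:
        if ch == "a":
            count += 1
            if count > high:        # counter overflow: the upper bound is exceeded
                return False
        elif ch == "b":
            return low <= count     # count <= high is guaranteed at this point
        else:
            return False            # any other symbol breaks the pattern
    return False                    # ran out of input before seeing "b"

if __name__ == "__main__":
    for s in ["aaab", "aaaaab", "aab", "aaaaaab"]:
        print(s, match_counted(s))  # True, True, False, False
```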
|
29 |
Detekce dynamických síťových aplikací / Detection of Dynamic Network Applications. Burián, Pavel. January 2013 (has links)
This thesis deals with the detection of dynamic network applications. It describes some of the existing protocols and the methods for identifying them from IP flows and packet contents. It presents the design of a detection system based on the automatic creation of regular expressions and describes its implementation. It then presents the regular expressions created for the BitTorrent and eDonkey protocols and compares their quality with the solution used by L7-filter.
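For illustration, here is a small Python sketch of payload classification with protocol regexes. The two patterns are simplified versions of widely known markers (the BitTorrent handshake begins with the byte 0x13 followed by the string "BitTorrent protocol", and eDonkey/eMule messages begin with a protocol marker byte such as 0xE3); they are not the automatically generated expressions or the L7-filter signatures discussed in the thesis.

```python
import re

# Simplified payload signatures (well-known handshake/marker bytes only).
SIGNATURES = {
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
    "edonkey":    re.compile(rb"^[\xe3\xc5\xd4]"),   # classic / eMule / compressed markers
}

def classify(payload: bytes) -> str:
    """Return the first protocol whose signature matches the start of the payload."""
    for proto, pattern in SIGNATURES.items():
        if pattern.match(payload):
            return proto
    return "unknown"

if __name__ == "__main__":
    print(classify(b"\x13BitTorrent protocol" + b"\x00" * 8))  # bittorrent
    print(classify(b"\xe3\x9a\x00\x00\x00\x01\x10"))           # edonkey
    print(classify(b"GET / HTTP/1.1\r\n"))                     # unknown
```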
|
30 |
Analýza systémových záznamů / System Log Analysis. Ščotka, Jan. January 2008 (has links)
The goal of this master's thesis is to make it possible to perform system log analysis in a more general way than well-known host-based intrusion detection systems (HIDS) do. The proposed means of achieving this goal is user-friendly regular expressions. The thesis deals with making regular expressions usable for log analysis, mainly by users unfamiliar with the formal aspects of computer science.
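The thesis's actual notation is not reproduced here; the following Python sketch invents a grok-style placeholder syntax (%{IP}, %{NUM}, %{USER}) that is expanded into an ordinary regular expression before scanning log lines, to illustrate how user-friendly patterns can hide the formal regex details.

```python
import re

# Hypothetical placeholder notation (invented for this sketch; the thesis
# defines its own user-friendly syntax): %{NAME} stands for a common field.
PLACEHOLDERS = {
    "IP":   r"(?:\d{1,3}\.){3}\d{1,3}",
    "NUM":  r"\d+",
    "USER": r"\w+",
}
TOKEN = re.compile(r"%\{(\w+)\}")

def compile_friendly(pattern: str):
    """Expand %{NAME} placeholders into named groups and escape the literal text."""
    parts, pos = [], 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))                     # literal text
        parts.append(f"(?P<{m.group(1)}>{PLACEHOLDERS[m.group(1)]})")       # placeholder
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))

if __name__ == "__main__":
    rule = compile_friendly("Failed password for %{USER} from %{IP} port %{NUM}")
    line = "Jun 1 12:00:01 host sshd[123]: Failed password for root from 10.0.0.5 port 4242 ssh2"
    hit = rule.search(line)
    if hit:
        print(hit.group("USER"), hit.group("IP"), hit.group("NUM"))  # root 10.0.0.5 4242
```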
|