1 |
Finding the Longest Increasing Subsequence of Every Substring
Tseng, Chiou-Ting 27 August 2006
Given a string S = {a1, a2, a3, ..., an}, the longest increasing subsequence (LIS) problem is to find a subsequence of S that is increasing and of maximal length. A previous result finds the longest increasing subsequences of every sliding window of fixed size w over a string of length n in O(w log log n+OUTPUT) time, where O(w log log n+ w^2) time is taken for preprocessing and OUTPUT is the sum of all output lengths. In this thesis, we solve the problem of finding the longest increasing subsequence of every substring of S. A straightforward implementation of the previous result would require O(n^3) preprocessing time. We modify the data structure used in the algorithm, which improves the preprocessing time to O(n^2). The time required for the report stage is linear in the size of the output. In other words, our algorithm finds the LIS of every substring in O(n^2+OUTPUT) time. If the LISs of all substrings are to be reported, our algorithm is optimal, since a string of length n has O(n^2) substrings.
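To make the problem statement concrete, here is a minimal baseline sketch in Python. It is not the thesis's data structure: it assumes a strictly increasing subsequence, runs the standard patience-sorting technique once per start position, and reports only LIS lengths, which costs O(n^2 log n) time. The function name is hypothetical.

```python
from bisect import bisect_left


def lis_lengths_all_substrings(s):
    """Return table where table[i][j] is the LIS length of s[i..j] (inclusive).

    Baseline sketch only: one patience-sorting pass per start index i,
    so the total cost is O(n^2 log n). Entries with j < i stay 0.
    """
    n = len(s)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        # tails[k] is the smallest possible last element of a strictly
        # increasing subsequence of length k + 1 within s[i..j].
        tails = []
        for j in range(i, n):
            pos = bisect_left(tails, s[j])
            if pos == len(tails):
                tails.append(s[j])   # s[j] extends the longest subsequence
            else:
                tails[pos] = s[j]    # s[j] gives a smaller tail for length pos + 1
            table[i][j] = len(tails)
    return table


if __name__ == "__main__":
    lengths = lis_lengths_all_substrings([3, 1, 4, 1, 5, 9, 2, 6])
    print(lengths[0][7])  # LIS length of the whole sequence
```

Reporting the subsequences themselves, rather than their lengths, is what the OUTPUT term in the abstract accounts for; this sketch does not attempt that.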
|
2 |
Casamento de padrão em strings privados, com aplicação em consultas seguras a banco de dados / Pattern matching on private strings, with application to secure database queries
Medeiros Vanderlei, Igor January 2006
Secure multi-party computation (MPC) is an area where research in cryptography and in distributed systems converges. In MPC, as in distributed computing, two or more participants collaborate to solve a given task. The task can be modeled as a function f(x1, ..., xn), where each input xi belongs to a different participant; at the end of the protocol execution, each participant obtains only its predetermined output, and the inputs xi remain secret. For example, consider a group of people who want to find out which of them is the richest, while none of them wants to reveal the size of their fortune (this is known as the millionaires' problem and was first discussed by Yao [Yao82]).
Researchers in the area have carried out many theoretical and practical studies, and much progress has been made toward solving MPC problems. On one side are the theoretical researchers, who seek generic schemes capable of solving any computable problem; on the other are the practical researchers, who seek specific solutions for each category of problem.
Theoretical studies have proved that generic solutions exist for any computable problem, but these solutions carry an enormous computational cost, which makes them infeasible to use.
By tackling each category of problem in isolation, practical research manages to produce feasible solutions, because it studies the characteristics inherent to each category and exploits them to reduce the computational cost of the protocol.
This work proposes to develop a feasible solution to the MPC substring problem, and to that end adopts a practical approach. Consider two participants E and R who hold the secret strings SE and SR, respectively; the MPC substring protocol lets participant R find out whether SR is a substring of SE, without R gaining access to SE and without E gaining access to SR.
Finally, the MPC substring protocol is used to build a secure database query protocol capable of performing queries such as "SELECT * FROM funcionarios WHERE nome LIKE '%joão%'" without revealing to the database server the value being searched for.
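As a point of reference, the result that the substring protocol delivers to R can be written as an ordinary plaintext predicate. The Python sketch below shows only that input/output behavior. It contains no cryptography and offers no privacy, and the function names and sample rows are hypothetical, not taken from the thesis.

```python
def is_substring(s_e: str, s_r: str) -> bool:
    """Plaintext reference: True if R's string s_r occurs inside E's string s_e.

    In the secure protocol neither party would see the other's string;
    here both are visible, so this only specifies the expected answer.
    """
    return s_r in s_e


def like_contains(rows, column, pattern):
    """Plaintext analogue of a `WHERE column LIKE '%pattern%'` filter."""
    return [row for row in rows if is_substring(row[column], pattern)]


if __name__ == "__main__":
    funcionarios = [{"nome": "joão silva"}, {"nome": "maria souza"}]
    print(like_contains(funcionarios, "nome", "joão"))
```

A secure implementation would have to return the same answers as this reference while keeping the searched value hidden from the database server.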
|
3 |
A Performance Analysis Framework for Coreference Resolution Algorithms
Patel, Chandankumar Johakhim 29 August 2016
No description available.
|
4 |
Substring Current-Voltage Measurement of PV Strings Using a Non-Contact I-V Curve Tracer
January 2020
In the current photovoltaic (PV) industry, operations and maintenance (O&M) personnel in the field primarily use three approaches to identify underperforming or defective modules in a string: i) electroluminescence (EL) imaging of all the modules in the string; ii) infrared (IR) thermal imaging of all the modules in the string; and iii) current-voltage (I-V) curve tracing of all the modules in the string. In the first approach, EL images are used to detect modules with broken cells; in the second, IR images are used to detect modules with hotspot cells. These two methods identify modules with defective cells only semi-qualitatively, not accurately and quantitatively. The third method, I-V curve tracing, is quantitative, but it is extremely time-consuming, labor-intensive, and highly dependent on ambient conditions. Since the I-V curves of individual modules in a string are obtained by disconnecting them one at a time, at different irradiance levels, module operating temperatures, angles of incidence (AOI) and air masses/spectra, all of the measured curves must be translated to a single reporting condition (SRC) of a single irradiance, single temperature, single AOI and single spectrum. These translations are not only time-consuming but also prone to inaccuracy due to inherent issues in the translation models.
Therefore, the current challenges in using traditional I-V tracers are: i) obtaining I-V curves of all the modules and substrings in a string simultaneously, at a single irradiance, operating temperature, irradiance spectrum and angle of incidence, despite changing weather parameters and sun positions during the measurements; ii) the safety of field personnel when disconnecting and reconnecting cables in high-voltage systems (especially with field-aged connectors); and iii) the enormous time and hardship for test personnel in harsh outdoor climatic conditions. In this thesis work, a non-contact I-V (NCIV) curve tracing tool has been integrated and implemented to address these three challenges of traditional I-V tracers.
This work compares I-V curves obtained using a traditional I-V curve tracer with those obtained using an NCIV curve tracer for the string, substrings and individual modules of crystalline silicon (c-Si) and cadmium telluride (CdTe) technologies. The NCIV curve tracer equipment used in this study was integrated from three commercially available components: non-contact voltmeters (NCV) with voltage probes to measure the voltages of substrings/modules in a string, a Hall sensor to measure the string current, and a data acquisition system (DAS) for simultaneous collection of the voltage data from the NCVs and the current data from the Hall sensor. This study demonstrates the concept and accuracy of the NCIV curve tracer by comparing the I-V curves obtained using a traditional capacitor-based tracer and the NCIV curve tracer in a three-module string of c-Si modules and of CdTe modules under natural sunlight, both with uniform light conditions on all the modules in the string and with one or more modules partially shaded to simulate, and quantitatively detect, underperforming module(s) in a string. / Dissertation/Thesis / Masters Thesis Engineering 2020
|
5 |
Wiederholungen in Texten / Repetitions in Texts
Golcher, Felix 16 December 2013
This thesis investigates the linguistic and application-specific content of complete character substring frequency distributions of natural language texts. On this basis, the first part develops an unsupervised learning algorithm for segmenting text into morphemes. The segmentation starts from the sentence level and uses all available context information. The result is a language-independent algorithm that arranges the morphemes it finds partly into tree-like structures. Evaluating the output with statistical models makes it possible to identify even small performance differences, which are open to linguistic interpretation. The second part of the thesis consists of stylometric investigations by means of a text similarity measure that is likewise based on complete substring frequencies. The similarity measure is defined in several variants and evaluated on various stylometric tasks and diverse corpora. In most of the case studies, the presented method can be compared with published performance figures from previous research. The performance of modern machine learning methods is reproduced by the conceptually simpler method developed in this thesis. While segmentation into morphemes is a local process, stylometry consists in the global comparison of texts. For this reason, investigating these two seemingly unconnected problems offers complementary perspectives on the frequency data explored. In addition, the discussion of the literature on both subjects shows how they are connected through related concepts and approaches. From the totality of the empirical studies on both problems, it can be concluded that long, and therefore rarer, character sequences carry considerably more information than previous research has generally assumed.
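To illustrate what a complete substring frequency distribution is, the following Python sketch counts every contiguous character substring of a text. It is a naive illustration, not the thesis's implementation; the function name and the optional `max_len` cap are assumptions added here to keep the quadratic number of substrings manageable.

```python
from collections import Counter


def substring_frequencies(text, max_len=None):
    """Count occurrences of every contiguous substring of `text`.

    Naive sketch: a text of length n has n * (n + 1) / 2 substring
    occurrences, so this is only practical for short inputs or with a
    length cap (max_len is a hypothetical convenience parameter).
    """
    n = len(text)
    limit = n if max_len is None else min(max_len, n)
    counts = Counter()
    for start in range(n):
        # Longest substring allowed from this start position.
        longest = min(limit, n - start)
        for length in range(1, longest + 1):
            counts[text[start:start + length]] += 1
    return counts


if __name__ == "__main__":
    freqs = substring_frequencies("abab")
    for substring, count in sorted(freqs.items()):
        print(repr(substring), count)
```

For realistic corpora, suffix-based index structures are the usual way to obtain such counts without enumerating every substring explicitly.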
|
6 |
Valorisation des analogies lexicales entre l'anglais et les langues romanes : étude prospective pour un dispositif plurilingue d'apprentissage du FLE dans le domaine de la santé / Emphasising lexical analogies between English and Romance languages : prospective study towards a plurilingual learning device of French for healthcare
Gilles, Fabrice 29 September 2017
This prospective lexicological study belongs to the field of L3 didactics. Its purpose is to build an English-Spanish-French-Italian-Portuguese interlexicon composed of the English adjectives, nouns and verbs that are frequent in scientific healthcare writing, together with their analogous translation equivalents in Spanish, French, Italian and Portuguese. Two words are analogous if they have the same meaning and a similar form.
The relationships between the concepts of analogy, similarity and identity are examined, the types of intralinguistic and cross-linguistic analogies are illustrated, and the main analogies and differences between English, French and the other Romance languages are set out. Their many analogies are explained by Indo-European origins and, above all, by intense language contact. After recalling the importance of analogy in learning, we show how this research connects with two types of didactic approaches to languages: intercomprehension, which develops comprehension of neighboring languages, and corpus-based approaches, which give closer insight into scientific phraseology and help to teach it.
The 2000 most frequent English lemmas were extracted from the ScienText English scientific corpus; their 2208 frequent senses were delimited on the basis of their combinatory profile and sorted into two semantic categories: subject-specific healthcare vocabulary and transdisciplinary scientific vocabulary. The English lemmas were translated into the four Romance languages, and similarity was measured using the longest common substring.
The interlexicon contains 47% of the frequent senses. By language pair, analogy is even higher: English-French, 66%; English-Italian, 65%; English-Spanish, 63%; English-Portuguese, 58%.
This analogous vocabulary could therefore serve as a transfer basis in learning activities of French as a foreign language (FLE) as L3 for healthcare professionals, and L2 English appears to be a possible bridge toward the Romance languages. Plurilingual activities are built on concordances extracted from the aligned multilingual corpora EMEA and Europarl. Metalinguistic questions in English draw attention to (morpho)syntactic features of French; the analogies between the two languages are systematically highlighted and, in cases of opacity, those between the other Romance languages and English.
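The longest common substring measurement mentioned in the abstract can be sketched with the standard dynamic-programming technique. The Python example below is illustrative only: the normalization by the longer word's length is an assumption made here, since the abstract does not state how the thesis turns the substring length into a similarity score or what threshold it uses. Function names and the example word pair are hypothetical.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b.

    Standard O(len(a) * len(b)) dynamic programme, keeping one row at a
    time. On ties, the first longest match found in `a` is returned.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def lcs_similarity(a, b):
    """Assumed normalization: common substring length over the longer word."""
    if not a or not b:
        return 0.0
    return len(longest_common_substring(a, b)) / max(len(a), len(b))


if __name__ == "__main__":
    print(longest_common_substring("infection", "infección"))
    print(lcs_similarity("infection", "infección"))
```

A word pair would then count as analogous when it has the same meaning and its score exceeds some cutoff; that cutoff is a design parameter not given in the abstract.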
|