51. Exploring State-of-the-Art Natural Language Processing Models with Regards to Matching Job Adverts and Resumes. Rückert, Lise; Sjögren, Henry (January 2022).
The ability to automate the process of comparing and matching resumes with job adverts is a growing research field. This can be done using Natural Language Processing (NLP), the area of machine learning that enables a model to learn human language. This thesis explores and evaluates the application of the state-of-the-art NLP model SBERT to the task of comparing and calculating a measure of similarity between text extracted from resumes and adverts. It also investigates what type of data produces the best-performing model on this task. The results show that SBERT can quickly be trained on unlabeled data from the HR domain using a Triplet network, and achieves high performance when tested on various tasks. The models are shown to be bilingual, to handle unseen vocabulary, and to capture the meaning and descriptive context of entire sentences rather than single words. The conclusion is therefore that the models have a sound understanding of semantic similarity and relatedness. In some cases, however, the models become binary in their similarity scores, and it is hard to tune a single model that exhaustively covers a domain as diverse as HR. A model fine-tuned on clean, generic data extracted from adverts shows the best overall performance in terms of loss and consistency.
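The thesis is not accompanied by code here; as a minimal illustration of the scoring step the abstract describes (cosine similarity between SBERT embeddings of an advert and a resume), the sketch below uses the sentence-transformers library. The checkpoint name and example texts are placeholders, not the models or data used in the thesis, and the triplet-loss fine-tuning step is omitted.

```python
# Minimal sketch: scoring advert/resume similarity with a pre-trained SBERT
# model (placeholder checkpoint and texts; fine-tuning with a triplet loss,
# as in the thesis, is not shown).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

advert = "We are looking for a backend developer with Python and SQL experience."
resume = "Five years of experience building Python services and relational databases."

# Encode both texts into fixed-size sentence embeddings.
emb_advert, emb_resume = model.encode([advert, resume], convert_to_tensor=True)

# Cosine similarity in [-1, 1]; higher means a closer semantic match.
score = util.cos_sim(emb_advert, emb_resume).item()
print(f"advert/resume similarity: {score:.3f}")
```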
52. Increasing CNN representational power using absolute cosine value regularization. Singleton, William S.
Indiana University-Purdue University Indianapolis (IUPUI) / The Convolutional Neural Network (CNN) is a mathematical model designed to distill input information into a more useful representation. This distillation process removes information through a series of dimensionality reductions, which ultimately grant the model the ability to resist noise and generalize effectively. However, CNNs often contain elements that contribute little toward useful representations. This thesis aims to provide a remedy for this problem by introducing Absolute Cosine Value Regularization (ACVR), a regularization technique hypothesized to increase the representational power of CNNs by using a gradient descent orthogonalization algorithm to force the vectors that constitute the filters of any given convolutional layer to occupy unique positions in their respective spaces. This method should, in theory, lead to a more effective balance between information loss and representational power, ultimately increasing network performance. The thesis develops the mathematics and intuition behind ACVR and goes on to propose Dynamic ACVR (D-ACVR). It also examines the effects of ACVR on the filters of a low-dimensional CNN, the effects of ACVR and D-ACVR on traditional convolutional filters in VGG-19, and, finally, regularization of the pointwise filters in MobileNetV1.
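The abstract does not give the regularizer's exact form; the sketch below is one plausible reading of ACVR — penalising the mean absolute pairwise cosine between the flattened filters of a convolutional layer so that gradient descent pushes them toward orthogonality. The function name, toy classification head, and weighting factor lambda_acvr are illustrative assumptions, not the thesis's reference implementation.

```python
# Illustrative sketch of an absolute-cosine-value penalty on conv filters
# (an interpretation of ACVR, not the thesis's reference implementation).
import torch
import torch.nn.functional as F

def acvr_penalty(conv_weight: torch.Tensor) -> torch.Tensor:
    """Mean absolute pairwise cosine similarity between flattened filters."""
    n = conv_weight.shape[0]                       # number of filters
    unit = F.normalize(conv_weight.reshape(n, -1), dim=1)
    cos = unit @ unit.t()                          # n x n pairwise cosines
    off_diag = ~torch.eye(n, dtype=torch.bool, device=cos.device)
    return cos[off_diag].abs().mean()              # 0 when filters are orthogonal

# Usage: add the penalty for a convolutional layer to the task loss.
conv = torch.nn.Conv2d(3, 16, kernel_size=3)
x, target = torch.randn(8, 3, 32, 32), torch.randint(0, 16, (8,))
lambda_acvr = 0.1                                  # assumed weighting factor
logits = conv(x).mean(dim=(2, 3))                  # toy head for illustration
loss = F.cross_entropy(logits, target) + lambda_acvr * acvr_penalty(conv.weight)
loss.backward()
```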
53. Design and analysis of Discrete Cosine Transform-based watermarking algorithms for digital images. Development and evaluation of blind Discrete Cosine Transform-based watermarking algorithms for copyright protection of digital images using handwritten signatures and mobile phone numbers. Al-Gindy, Ahmed M.N. (January 2011).
This thesis deals with the development and evaluation of blind discrete cosine transform-based watermarking algorithms for copyright protection of digital still images using handwritten signatures and mobile phone numbers. The new algorithms take into account the perceptual capacity of each low-frequency coefficient inside the Discrete Cosine Transform (DCT) blocks before embedding the watermark information, and they are suitable for grey-scale and colour images. Handwritten signatures are used instead of pseudo-random numbers. The watermark is inserted in the green channel of RGB colour images and the luminance channel of YCrCb images. Mobile phone numbers are used as watermarks for images captured by mobile phone cameras. The information is embedded multiple times, and a shuffling scheme is applied to ensure that no spatial correlation exists between the original host image and the multiple watermark copies. Multiple embedding increases the robustness of the watermark against attacks, since each watermark copy is individually reconstructed and verified before an averaging process is applied; the averaging reduces the number of errors in the extracted information. The developed watermarking methods are shown to be robust against JPEG compression, removal attacks, additive noise, cropping, scaling, small degrees of rotation, affine transformations, contrast enhancement, low-pass and median filtering, and StirMark attacks. The algorithms have been examined using a library of approximately 40 colour images of size 512 × 512 with 24 bits per pixel, together with their grey-scale versions. Several evaluation techniques were used in the experiments, with different watermarking strengths and different signature sizes, including the peak signal-to-noise ratio, normalized correlation and the structural similarity index. The performance of the proposed algorithms has been compared to other algorithms, and better invisibility with stronger robustness has been achieved.
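The algorithms themselves are developed in the thesis; the sketch below only illustrates the basic operation of additive embedding in a low-frequency DCT coefficient of each 8 × 8 block. It omits the perceptual-capacity weighting, shuffling, multiple embedding and averaging extraction described above, and the coefficient position and embedding strength are assumed values.

```python
# Generic sketch of additive watermarking in a low-frequency DCT coefficient
# of 8x8 blocks (illustrative only; not the thesis's exact algorithm, which
# also weights the embedding by each coefficient's perceptual capacity).
import numpy as np
from scipy.fft import dctn, idctn

def embed_bits(channel: np.ndarray, bits, strength: float = 8.0) -> np.ndarray:
    """Embed one bit per 8x8 block into a low-frequency coefficient."""
    out = channel.astype(float).copy()
    h, w = out.shape
    blocks = [(r, c) for r in range(0, h - 7, 8) for c in range(0, w - 7, 8)]
    for bit, (r, c) in zip(bits, blocks):
        block = dctn(out[r:r + 8, c:c + 8], norm="ortho")
        block[2, 1] += strength if bit else -strength   # assumed low-frequency slot
        out[r:r + 8, c:c + 8] = idctn(block, norm="ortho")
    return np.clip(out, 0, 255)

# Example: watermark the green channel of an RGB image (as in the thesis).
rgb = np.random.randint(0, 256, (64, 64, 3)).astype(float)
rgb[:, :, 1] = embed_bits(rgb[:, :, 1], bits=[1, 0, 1, 1, 0])
```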
54. Matrix Approximation And Image Compression. Padavana, Isabella R. (June 2024).
This thesis concerns the mathematics and application of various methods for approximating matrices, with a particular eye towards the role such methods play in image compression. An image is stored as a matrix in which each entry records the intensity of the corresponding pixel, so image compression is essentially equivalent to matrix approximation. First, we look at the singular value decomposition (SVD), one of the central tools for analyzing a matrix, and show that, in a precise sense, a truncated SVD gives the best low-rank approximation of any matrix. However, the SVD has some serious shortcomings as an approximation method in the context of digital images. The second method we consider is the discrete Fourier transform, which, unlike the SVD, does not require the storage of basis vectors. We describe the fast Fourier transform, a remarkably efficient method for computing the discrete Fourier transform, and how it can be used to reduce the information in a matrix. Finally, we look at the discrete cosine transform, which reduces the complexity of the calculation further by restricting to a real basis, and at how a filter can be applied to adjust the relative importance of the data encoded by the discrete cosine transform prior to compression. In addition, we developed code implementing the ideas explored in the thesis and demonstrating them on examples.
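As an illustration of the first method discussed, the sketch below computes a rank-k approximation from a truncated SVD; by the Eckart-Young theorem this is the best rank-k approximation in the Frobenius norm. The rank and the stand-in image matrix are arbitrary example values.

```python
# Sketch of rank-k matrix approximation via the truncated SVD
# (the stand-in image and the rank are placeholders).
import numpy as np

def low_rank(image: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A rank-k approximation stores k*(m + n + 1) numbers instead of m*n.
grey = np.random.rand(256, 256)          # stand-in for a greyscale image matrix
approx = low_rank(grey, k=20)
rel_err = np.linalg.norm(grey - approx) / np.linalg.norm(grey)
print(f"relative Frobenius error at rank 20: {rel_err:.3f}")
```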
55. Video extraction for fast content access to MPEG compressed videos. Jiang, Jianmin; Weng, Y. (June 2009).
As existing video processing technology is primarily developed in the pixel domain, yet digital video is stored in compressed format, any application of those techniques to compressed videos would require decompression. For discrete cosine transform (DCT)-based MPEG compressed videos, the computing cost of standard row-by-row and column-by-column inverse DCT (IDCT) transforms for a block of 8 × 8 elements requires 4096 multiplications and 4032 additions, although a practical implementation only requires 1024 multiplications and 896 additions. In this paper, we propose a new algorithm to extract videos directly from the MPEG compressed domain (DCT domain) without a full IDCT, described in three extraction schemes: 1) video extraction in 2 × 2 blocks with four coefficients; 2) video extraction in 4 × 4 blocks with four DCT coefficients; and 3) video extraction in 4 × 4 blocks with nine DCT coefficients. The computing cost incurred is only 8 additions and no multiplications for the first scheme, 2 multiplications and 28 additions for the second scheme, and 47 additions (no multiplications) for the third scheme. Extensive experiments were carried out, and the results reveal that: 1) the extracted video maintains competitive quality in terms of visual perception and inspection, and 2) the extracted videos preserve the content well in comparison with fully decompressed ones in terms of histogram measurement. As a result, the proposed algorithm provides useful tools for bridging the gap between the pixel domain and the compressed domain to facilitate content analysis with low latency and high efficiency, such as in surveillance video, interactive multimedia, and image processing applications.
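The paper's extraction formulas are not reproduced in the abstract; as a rough illustration of the underlying idea — recovering a reduced-resolution view directly from DCT coefficients instead of running a full 8 × 8 IDCT — the sketch below builds a standard "DC image", one value per block, from the DC coefficients. This is a simpler scheme than the three proposed above, and the orthonormal DCT scaling is an assumption.

```python
# Rough illustration of compressed-domain extraction: build a reduced
# "DC image" (one pixel per 8x8 block) straight from DCT coefficients,
# with no inverse DCT. This is simpler than the paper's 2x2 / 4x4
# extraction schemes and is shown only to convey the idea.
import numpy as np

def dc_image(dct_blocks: np.ndarray) -> np.ndarray:
    """dct_blocks: shape (H/8, W/8, 8, 8) of per-block DCT coefficients
    (orthonormal scaling assumed). Returns an (H/8, W/8) image."""
    # For an orthonormal 8x8 DCT, the DC coefficient equals 8 * block mean.
    return dct_blocks[:, :, 0, 0] / 8.0

# Example with random stand-in coefficients for a 720x480 frame.
coeffs = np.random.randn(60, 90, 8, 8)
thumb = dc_image(coeffs)      # 60 x 90 thumbnail, computed without any IDCT
```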
56. Bandwidth Limited 320 Mbps Transmitter. Anderson, Christopher (1996).
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California / With every new spacecraft that is designed comes a greater density of information to be stored once it is in operation. This, coupled with the desire to reduce the number of ground stations needed to download this information from the spacecraft, places new requirements on telemetry transmitters: they must be capable of data rates of 320 Mbps and beyond.

Although the necessary bandwidth is available for some non-bandwidth-limited transmissions in Ka-Band and above, many systems will continue to rely on narrower allocations down to X-Band. These systems will require filtering of the modulation to meet spectral limits, and this filtering must not introduce high levels of inter-symbol interference (ISI) into the transmission.

These constraints have been addressed at CE by implementing a DSP technique that pre-filters a QPSK symbol set to achieve bandwidth-limited 320 Mbps operation. The implementation operates within the speed range of currently available radiation-hardened digital technologies and consumes less power than traditional high-speed FIR techniques.
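The transmitter's DSP pre-filter is not described in the abstract; purely to illustrate the general principle of band-limiting a QPSK symbol stream with a pulse-shaping filter, the sketch below applies a root-raised-cosine response as a stand-in. The roll-off, oversampling factor and filter span are assumed values, not the design referred to above.

```python
# Illustrative QPSK pulse-shaping sketch (generic root-raised-cosine filter
# as a stand-in for the transmitter's DSP pre-filter; all parameters below
# are assumptions for illustration).
import numpy as np

def rrc_taps(beta: float, sps: int, span: int) -> np.ndarray:
    """Root-raised-cosine impulse response, `span` symbols at `sps` samples/symbol."""
    t = np.arange(-span * sps / 2, span * sps / 2 + 1) / sps
    taps = np.zeros_like(t)
    for i, ti in enumerate(t):
        if np.isclose(ti, 0.0):
            taps[i] = 1.0 - beta + 4 * beta / np.pi
        elif np.isclose(abs(ti), 1 / (4 * beta)):
            taps[i] = (beta / np.sqrt(2)) * ((1 + 2 / np.pi) * np.sin(np.pi / (4 * beta))
                                             + (1 - 2 / np.pi) * np.cos(np.pi / (4 * beta)))
        else:
            num = np.sin(np.pi * ti * (1 - beta)) + 4 * beta * ti * np.cos(np.pi * ti * (1 + beta))
            den = np.pi * ti * (1 - (4 * beta * ti) ** 2)
            taps[i] = num / den
    return taps / np.sqrt(np.sum(taps ** 2))

bits = np.random.randint(0, 2, 2000)
symbols = (1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])    # QPSK mapping
sps = 4                                                        # samples per symbol
upsampled = np.zeros(len(symbols) * sps, dtype=complex)
upsampled[::sps] = symbols
shaped = np.convolve(upsampled, rrc_taps(beta=0.35, sps=sps, span=8))  # band-limited waveform
```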
57. CFD investigation of the atmospheric boundary layer under different thermal stability conditions. Pieterse, Jacobus Erasmus (2013).
Thesis (MScEng)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: An accurate description of the atmospheric boundary layer (ABL) is a prerequisite for computational fluid dynamics (CFD) wind studies. This includes taking into account the thermal stability of the atmosphere, which can be stable, neutral or unstable, depending on the nature of the surface fluxes of momentum and heat. The diurnal variation between stable and unstable conditions in the Namib Desert interdune was measured and quantified using the wind velocity and temperature profiles that describe the thermally stratified atmosphere, as derived from Monin-Obukhov similarity theory. The implementation of this thermally stratified atmosphere in CFD was examined in this study using Reynolds-averaged Navier-Stokes (RANS) turbulence models. The temperature, velocity and turbulence profiles had to be maintained along an extensive computational domain while simultaneously allowing full variation in pressure and density through the ideal gas law. This included implementing zero heat transfer from the surface, through the boundary layer, under neutral conditions so that the adiabatic lapse rate could be sustained. Buoyancy effects were included by adding weight to the fluid, leading to the emergence of the hydrostatic pressure field and the resultant density changes expected in the real atmosphere. The CFD model was validated against measured data from the literature for the flow over a cosine hill in a wind tunnel. The standard k-ε and SST k-ω turbulence models, modified for gravity effects, represented the data most accurately. The flow over an idealised transverse dune immersed in the thermally stratified ABL was also investigated. It was found that flow recovery was enhanced and re-attachment occurred earlier under unstable conditions, while flow recovery and re-attachment took longer under stable conditions. Flow acceleration over the crest of the dune was also greater under unstable conditions, and the effect of the dune was felt to much greater heights in the atmosphere under unstable conditions through enhanced vertical velocities. Under stable conditions, vertical velocities were reduced and the influence on the flow higher up in the atmosphere was much smaller than for unstable or neutral conditions. This showed that the assumption of neutral conditions could lead to an incomplete picture of the flow conditions that influence any particular case of interest. / AFRIKAANSE OPSOMMING: 'n Akkurate beskrywing van die atmosferiese grenslaag (ABL) is 'n voorvereiste
vir wind studies met berekenings-vloeimeganika (CFD). Dit sluit in die
inagneming van die termiese stabiliteit van die atmosfeer, wat stabiel, neutraal of
onstabiel kan wees, afhangende van die aard van die oppervlak vloed van
momentum en warmte. Die daaglikse variasie tussen stabiele en onstabiele
toestande in die Namib Woestyn interduin is gemeet en gekwantifiseer deur
gebruik te maak van die wind snelheid en temperatuur profiele wat die termies
gestratifiseerde atmosfeer, soos afgelei deur Monin-Obukhov teorie, beskryf. Die
implementering van hierdie termies gestratifiseerde atmosfeer in CFD is in hierdie
studie aangespreek deur gebruik te maak van RANS turbulensie modelle. Die
handhawing van die temperatuur, snelheid en turbulensie profiele in die lengte
van 'n uitgebreide berekenings domein is nodig, en terselfdertyd moet toegelaat
word vir volledige variasie in die druk en digtheid, deur die ideale gaswet. Dit
sluit in die implementering van zero hitte-oordrag vanaf die grond onder neutrale
toestande sodat die adiabatiese vervaltempo volgehou kan word. Drykrag effekte
is ingesluit deur die toevoeging van gewig na die vloeistof, wat lei tot die
ontwikkeling van die hidrostatiese druk veld, en die gevolglike digtheid
veranderinge, wat in die werklike atmosfeer verwag word. Die CFD-model is
gevalideer teen gemete data, vanaf die literatuur, vir die vloei oor 'n kosinus
heuwel in 'n windtonnel. Die standaard k-ε en SST k-ω turbulensie modelle, met
veranderinge vir swaartekrag effekte, het die data mees akkuraat voorgestel. Die
vloei oor 'n geïdealiseerde transversale duin gedompel in die termies
gestratifiseerde ABL is ook ondersoek. Daar is bevind dat die vloei herstel is
versterk en terug-aanhegging het vroeër plaasgevind in onstabiele toestande,
terwyl vloei herstel en terug-aanhegging langer gevat het in stabiele toestande.
Daar is ook bevind dat vloei versnelling oor die kruin van die duin groter was
onder onstabiele toestande. Die effek van die duin op die vloei hoër op in die
atmosfeer is ook op hoër afstande onder onstabiele toestande gevoel, deur middel
van verhoogte vertikale snelhede. Onder stabiele toestande, is vertikale snelhede
verminder, en die invloed op die vloei hoër op in die atmosfeer was veel minder
as vir onstabiel of neutrale toestande. Dit het getoon dat die aanname van neutrale
toestande kan lei tot 'n onvolledige beeld van die vloei toestande wat 'n invloed op
'n bepaalde geval kan hê.
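The thesis derives its inlet profiles from Monin-Obukhov similarity theory; as a hedged illustration, the sketch below evaluates the log-law wind profile with Businger-Dyer stability corrections, with stable, neutral and unstable regimes selected through the Obukhov length. The friction velocity, roughness length and Obukhov lengths are arbitrary example values, not the Namib measurements or the exact profile forms used in the thesis.

```python
# Illustrative Monin-Obukhov velocity profile with Businger-Dyer stability
# corrections (example parameter values only; not the thesis's measured data).
import numpy as np

KAPPA = 0.41  # von Karman constant

def psi_m(zeta):
    """Integrated stability correction for momentum (Businger-Dyer forms)."""
    zeta = np.asarray(zeta, dtype=float)
    stable = -5.0 * zeta                                   # stable branch (zeta >= 0)
    x = (1 - 16 * np.minimum(zeta, 0.0)) ** 0.25           # unstable branch (zeta < 0)
    unstable = (2 * np.log((1 + x) / 2) + np.log((1 + x**2) / 2)
                - 2 * np.arctan(x) + np.pi / 2)
    return np.where(zeta < 0, unstable, stable)

def wind_profile(z, u_star=0.3, z0=1e-3, L=np.inf):
    """Mean wind speed at heights z for friction velocity u_star, roughness z0,
    Obukhov length L (L > 0 stable, L < 0 unstable, |L| -> inf neutral)."""
    z = np.asarray(z, dtype=float)
    return (u_star / KAPPA) * (np.log(z / z0) - psi_m(z / L) + psi_m(z0 / L))

z = np.array([2.0, 10.0, 50.0, 100.0])
print(wind_profile(z, L=np.inf))    # neutral
print(wind_profile(z, L=200.0))     # weakly stable
print(wind_profile(z, L=-200.0))    # weakly unstable
```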
58. Sumarizace českých textů z více zdrojů / Multi-source Text Summarization for Czech. Brus, Tomáš (January 2012).
This work focuses on the summarization task for a set of articles on the same topic. It discusses several possible approaches to summarization and ways to assess the quality of the resulting summaries. The implementation of the described algorithms and their application to selected texts form part of the work. The input texts come from several Czech news servers and are represented as deep syntactic trees (the so-called tectogrammatical layer).
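The abstract does not name the summarization algorithms; purely as an illustration of one common extractive baseline for multi-source input (not necessarily the approach taken in the thesis, which works on tectogrammatical trees rather than surface sentences), the sketch below scores sentences by TF-IDF similarity to the centroid of all articles and keeps the highest-scoring ones.

```python
# One common extractive baseline for multi-source summarization, shown only
# for illustration (the thesis's actual algorithms are not reproduced here).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centroid_summary(sentences, n_keep=3):
    """Keep the sentences closest (in TF-IDF space) to the centroid of all input."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(sentences)
    centroid = np.asarray(X.mean(axis=0))                  # 1 x vocab centroid vector
    scores = cosine_similarity(X, centroid).ravel()
    best = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_keep])
    return [sentences[i] for i in best]                    # preserve original order

articles = [
    "The government approved the new budget on Tuesday.",
    "Opposition parties criticised the spending plan.",
    "The budget increases funding for public transport.",
    "Weather forecasts predict rain for the weekend.",
]
print(centroid_summary(articles, n_keep=2))
```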
59. Word2vec2syn: Synonymidentifiering med Word2vec / Word2vec2syn: Synonym Identification using Word2vec. Pettersson, Tove (January 2019).
Inom NLP (eng. natural language processing) är synonymidentifiering en av de språkvetenskapliga utmaningarna som många antar. Fodina Language Technology AB är ett företag som skapat ett verktyg, Termograph, ämnad att samla termer inom företag och hålla den interna språkanvändningen konsekvent. En metodkombination bestående av språkteknologiska strategier utgör synonymidentifieringen och Fodina önskar ett större täckningsområde samt mer dynamik i framtagningsprocessen. Därav syftade detta arbete till att ta fram en ny metod, utöver metodkombinationen, för just synonymidentifiering. En färdigtränad Word2vec-modell användes och den inbyggda funktionen för cosinuslikheten användes för att få fram synonymer och skapa kluster. Modellen validerades, testades och utvärderades i förhållande till metodkombinationen. Valideringen visade att modellen skattade inom ett rimligt mänskligt spann i genomsnitt 60,30 % av gångerna och Spearmans korrelation visade på en signifikant stark korrelation. Testningen visade att 32 % av de bearbetade klustren innehöll matchande synonymförslag. Utvärderingen visade att i de fall som förslagen inte matchade så var modellens synonymförslag korrekta i 5,73 % av fallen jämfört med 3,07 % för metodkombinationen. Den interna reliabiliteten för utvärderarna visade på en befintlig men svag enighet, Fleiss Kappa = 0,19, CI(0,06, 0,33). Trots viss osäkerhet i resultaten påvisas ändå möjligheter för vidare användning av word2vec-modeller inom Fodinas synonymidentifiering. / One of the main challenges in the field of natural language processing (NLP) is synonym identification. Fodina Language Technology AB is the company behind the tool Termograph, which aims to collect terms and provide a consistent language within companies. A combination of multiple methods from the field of language technology constitutes the synonym identification, and Fodina would like to improve the coverage and increase the dynamics of the working process. The focus of this thesis was therefore to evaluate a new method for synonym identification beyond the combination already in use. A pre-trained Word2vec model was used, and for the synonym identification the built-in function for cosine similarity was applied in order to create clusters. The model was validated, tested and evaluated relative to the method combination. The validation indicated that the model made estimations within a fair human-based range in an average of 60.30% of cases, and Spearman's correlation indicated a strong significant correlation. The testing showed that 32% of the processed synonym clusters contained matching synonym suggestions. The evaluation showed that, in the cases where the clusters did not match, the synonym suggestions from the model were correct in 5.73% of all cases compared to 3.07% for the method combination. The inter-rater reliability indicated a slight agreement, Fleiss' Kappa = 0.19, CI(0.06, 0.33). Despite some uncertainty in the results, opportunities for further use of Word2vec models within Fodina's synonym identification are nevertheless demonstrated.
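As an illustration of the core operation described above — cosine similarity between word vectors as a synonym signal — the sketch below uses gensim with a downloadable pre-trained model as a stand-in. The model name, similarity threshold and query words are placeholders, not the Swedish model or Fodina's pipeline.

```python
# Sketch of cosine-similarity synonym candidates from a pre-trained Word2vec
# model via gensim (model name, threshold, and words are placeholders).
import gensim.downloader as api

model = api.load("word2vec-google-news-300")     # stand-in pre-trained vectors

def synonym_candidates(term, top_n=10, threshold=0.7):
    """Return neighbours whose cosine similarity to `term` exceeds the threshold."""
    return [(word, sim) for word, sim in model.most_similar(term, topn=top_n)
            if sim >= threshold]

print(synonym_candidates("car"))
print(model.similarity("car", "automobile"))     # raw cosine similarity
```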
60. As funções seno e cosseno: diagnóstico de dificuldades de aprendizagem através de sequências didáticas com diferentes mídias / The sine and cosine functions: diagnosing learning difficulties through didactic sequences with different media. Souza, Edílson Paiva de (December 2010).
This research aims to diagnose the learning difficulties of high school students in relation to the concepts of the trigonometric functions sine and cosine. The investigation is based on the principles of Didactic Engineering and on the theory of Semiotic Representation Registers developed by Raymond Duval. The didactic sequence presented is informed by an analysis of high school textbooks and by research on the use of graphing software in the teaching and learning process. The tools used in applying the sequence were pencil and paper and the software Graphmatic. The sequence was applied with second-year students at a public high school in the city of São Paulo. The protocols produced by eight pairs of students over four sessions were analysed and led to the conclusion that the use of technology, through a dynamic teaching process supported by the graphing software, increased the students' knowledge of the concepts of the sine and cosine functions. / Esta pesquisa tem como objetivo diagnosticar as dificuldades de aprendizagem de alunos do Ensino Médio em relação aos conceitos das funções trigonométricas seno e cosseno. A investigação está fundamentada nos princípios da Engenharia Didática e embasada na Teoria dos Registros de Representação Semiótica de Raymond Duval. A sequência didática apresentada orienta-se nas análises de livros didáticos do Ensino Médio e pesquisas que utilizaram o software gráfico no processo de ensino aprendizagem para melhoria do conhecimento. As ferramentas utilizadas na aplicação da sequência foram o lápis e o papel e o software Graphmatic. A sequência foi aplicada com alunos do segundo ano do Ensino Médio, de uma escola pública da capital de São Paulo. Foram analisados os protocolos de oito duplas que participaram de quatro sessões. Os dados coletados foram analisados e levaram a concluir que a utilização da tecnologia, através de um processo de ensino dinâmico proporcionado pelo software gráfico Graphmatic, propiciou um aumento no conhecimento sobre os conceitos das funções seno e cosseno.