  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

HYPERLINKS IN THE TWITTERVERSE: ANALYZING THE URL USAGE IN SOCIAL MEDIA POSTS

Aljebreen, Abdullah, 0009-0008-1925-818X 05 1900 (has links)
An important means of disseminating information on social media platforms is including URLs that point to external sources in user posts. On X, formerly known as Twitter, we estimate that about 21% of the daily stream of English-language posts contain URLs. Given this prevalence, we assert that studying URLs in social media holds significant importance, as they play a pivotal part in shaping the flow of information and influencing user behavior. Examining hyperlinked posts can help us gain valuable insights into online discourse and detect emerging trends. The first aspect of our analysis is the study of users' intentions behind including URLs in social media posts. We argue that gaining insights into users' motivations for posting with URLs has multiple applications, including the appropriate treatment and processing of these posts in other tasks. Hence, we build a comprehensive taxonomy of the various intentions behind sharing URLs on social media, and we explore the labeling of intentions via crowdsourcing. Beyond the intentions behind hyperlinked posts, we analyze their structure relative to the content of the web documents pointed to by the URLs. Hence, we define and analyze the segmentation problem for hyperlinked posts and develop an effective algorithm to solve it. We show that our solution can benefit sentiment analysis on social media. In the final aspect of our analysis, we investigate the emergence of news outlets posing as local sources, known as "pink slime", and their spread on social media. We conduct a comprehensive study of hyperlinked posts featuring pink slime websites. Through our analysis of the patterns and origins of these posts, we discover and extract syntactic features and use them to develop a classification approach to detect such posts. Our approach achieves an accuracy of 92.5%. / Computer and Information Science
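The 21% estimate above suggests a simple measurement: scan a stream of posts for URLs and compute the hyperlinked share. A minimal sketch, with an illustrative regex and made-up sample posts (not the authors' pipeline):

```python
import re

# Naive URL pattern for illustration; real pipelines would use the
# platform's entity metadata rather than regex matching.
URL_RE = re.compile(r"https?://\S+")

def url_share(posts):
    """Fraction of posts containing at least one URL."""
    if not posts:
        return 0.0
    with_url = sum(1 for p in posts if URL_RE.search(p))
    return with_url / len(posts)

posts = [
    "Breaking: full story at https://example.com/article",
    "Just thoughts, no link here",
    "Thread with source https://news.example.org/x and context",
    "Another plain post",
]
print(url_share(posts))  # 0.5
```

Run over a day's stream instead of this toy list, the same ratio yields the kind of prevalence figure quoted in the abstract.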
2

Using Machine Learning to Detect Malicious URLs

Cheng, Aidan 01 January 2017 (has links)
There is a need for a better predictive model that reduces the number of malicious URLs being sent through emails. This system should learn from existing metadata about URLs. The ideal solution would also be able to learn from its own predictions. For example, if it predicts a URL to be malicious and that URL is deemed safe by the sandboxing environment, the predictor should refine its model to account for this data. The problem, then, is to construct a model with these characteristics that can make predictions for the vast number of URLs being processed. Given that the current system does not employ machine learning methods, we intend to investigate multiple such models and summarize which of them might be worth pursuing on a large scale.
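The feedback loop described here (predict, compare against the sandbox verdict, refine) can be sketched as a simple online learner. The features, learning rate, and perceptron-style update below are illustrative assumptions, not the thesis's actual model:

```python
# Minimal online-learning sketch: a perceptron over simple URL features,
# updated whenever the sandbox verdict contradicts the prediction.
# The feature set and learning rate are assumptions for illustration.

def features(url):
    return [
        1.0,                          # bias term
        len(url) / 100.0,             # URL length, scaled
        url.count(".") / 10.0,        # dot count (subdomain depth)
        1.0 if "@" in url else 0.0,   # embedded-credentials marker
        1.0 if url.startswith("https") else 0.0,
    ]

class OnlineURLClassifier:
    def __init__(self, n_features=5, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, url):
        score = sum(wi * xi for wi, xi in zip(self.w, features(url)))
        return 1 if score > 0 else 0   # 1 = malicious

    def update(self, url, sandbox_label):
        """Refine the model when the sandbox verdict disagrees."""
        if self.predict(url) != sandbox_label:
            sign = 1 if sandbox_label == 1 else -1
            self.w = [wi + sign * self.lr * xi
                      for wi, xi in zip(self.w, features(url))]

clf = OnlineURLClassifier()
url = "http://evil.example@phish.bad.tld/very/long/path"
clf.update(url, sandbox_label=1)   # sandbox says malicious; model corrects
print(clf.predict(url))            # 1
```

In a production pipeline the sandbox verdicts would arrive asynchronously, but the update rule is the same: only mispredictions move the weights.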
3

Uma investigação do uso de características na detecção de URLs

Bezerra, Maria Azevedo 11 September 2015 (has links)
Malicious URLs have become a channel for criminal activities on the Internet, such as spam and phishing. Current solutions for validating and verifying malicious URLs consider themselves, or are considered, to be accurate, with well-adjusted results. However, is it really feasible to reach 100% accuracy in these solutions? This work describes a simple and direct investigation of features, datasets, and URL formats, aiming to show that URL validation and verification results depend heavily on certain aspects and factors. The idea is to extract URL features (lexical, DNS, and others) to obtain the maximum information from the URLs and to employ machine learning algorithms to question the influence of these features throughout the process. To support this idea, we formulated four hypotheses, which showed that it is possible to disagree with the results of several studies from the literature.
4

CONTEXTUAL DECOMPOSITION OF WEB RESOURCES: APPLYING SEMANTIC GRAPH ANALYSIS TO PERSONAL URL SETS

JOSHI, ABHIJIT PURSHOTTAM January 2003 (has links)
No description available.
5

Intelligent Event Focused Crawling

Farag, Mohamed Magdy Gharib 23 September 2016 (has links)
There is a need for an integrated event-focused crawling system to collect Web data about key events. When an event occurs, many users try to locate the most up-to-date information about it, yet there is little systematic collecting and archiving of information about events. We propose intelligent event-focused crawling for automatic event tracking and archiving, as well as effective access. We extend traditional focused (topical) crawling techniques in two directions: modeling and representing events, and modeling and representing webpage source importance. We developed an event model that captures key event information (topical, spatial, and temporal) and incorporated it into the focused-crawler algorithm. For the focused crawler to leverage the event model in predicting a webpage's relevance, we developed a function that measures the similarity between two event representations based on textual content. Although the textual content provides a rich set of features, we proposed an additional source of evidence that allows the focused crawler to better estimate the importance of a webpage by considering its website. We estimated webpage source importance as the ratio of relevant to non-relevant webpages found while crawling a website, and combined the textual content information and source importance into a single relevance score. For the focused crawler to work well, it needs a diverse set of high-quality seed URLs (URLs of relevant webpages that link to other relevant webpages). Although manual curation of seed URLs guarantees quality, it requires exhaustive manual labor. We therefore proposed an automated approach for curating seed URLs from social media content, leveraging the richness of social media posts about events to extract URLs that can serve as seeds for further focused crawling.
We evaluated our system through four series of experiments, using recent events: Orlando shooting, Ecuador earthquake, Panama papers, California shooting, Brussels attack, Paris attack, and Oregon shooting. In the first experiment series our proposed event model representation, used to predict webpage relevance, outperformed the topic-only approach, showing better results in precision, recall, and F1-score. In the second series, using harvest ratio to measure ability to collect relevant webpages, our event model-based focused crawler outperformed the state-of-the-art focused crawler (best-first search). The third series evaluated the effectiveness of our proposed webpage source importance for collecting more relevant webpages. The focused crawler with webpage source importance managed to collect roughly the same number of relevant webpages as the focused crawler without webpage source importance, but from a smaller set of sources. The fourth series provides guidance to archivists regarding the effectiveness of curating seed URLs from social media content (tweets) using different methods of selection. / Ph. D.
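The source-importance idea above (the ratio of relevant to non-relevant pages seen per site, blended with a content-based score) can be sketched as follows; the blend weight `alpha` and the neutral prior for unseen sites are assumptions, not values from the dissertation:

```python
# Sketch of combining textual relevance with webpage source importance.

def source_importance(relevant, non_relevant):
    """Ratio of relevant pages seen while crawling a site."""
    total = relevant + non_relevant
    return relevant / total if total else 0.5  # neutral prior for unseen sites

def combined_relevance(content_score, relevant, non_relevant, alpha=0.7):
    """Blend textual evidence with source evidence; alpha is an assumption."""
    return (alpha * content_score
            + (1 - alpha) * source_importance(relevant, non_relevant))

# A page with a strong textual match from a mostly-relevant site should
# outrank an equally matching page from a noisy site.
good_site = combined_relevance(0.8, relevant=40, non_relevant=10)
noisy_site = combined_relevance(0.8, relevant=5, non_relevant=45)
print(good_site > noisy_site)  # True
```

The crawler would rank its frontier by this combined score, so URLs from productive sites are fetched earlier.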
6

O direito ao esquecimento na era digital: desafios da regulação da desvinculação de urls prejudiciais a pessoas naturais nos índices de pesquisa dos buscadores horizontais

Gonçalves, Luciana Helena 15 April 2016 (has links)
In the 'Recurso Especial' arising from the suit filed by the television presenter Xuxa Meneghel to compel Google Search to delist from its index the results for the expression 'Xuxa pedophile', or any other term linking her name to this criminal act, the Reporting Judge, Nancy Andrighi, precisely defined the controversy this dissertation addresses: the daily life of thousands of people now depends on information that is on the web and that would not be easily found without the databases provided by search engines. On the other hand, these same search engines can be used to locate web pages with harmful information, i.e., URLs returned by searches on people's names. What, then, can be done?
Could a right to be forgotten really exist, that is, the possibility of requiring the delisting from a search engine's index of a URL returned by a search on a person's name? Some affirm that the most appropriate measure would be to go after the party who originally published the content on the web. Others argue that protecting a right to be forgotten would pose too great a threat to freedom of expression. Against this background, this dissertation aims to establish what the characteristics and limits of the right to be forgotten in the digital era might be, taking into account the current state of the Brazilian legal system on the topic. This right is weighed against other rights and public and private interests (especially the right to freedom of expression and the right to information), also considering how the global computer network, and search tools in particular, operate. Given the importance of search engines for the exercise of access to information, and the difficulty of delisting information from every site on which it has been published, the dissertation focuses on the potential, and the difficulties, of using the regulation of these search mechanisms for the effective protection of the right to be forgotten in the digital era.
7

Removing DUST using multiple alignment of sequences

Rodrigues, Kaio Wagner Lima 21 September 2016 (has links)
FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas / A large number of URLs collected by web crawlers correspond to pages with duplicate or near-duplicate content. These duplicate URLs, generically known as DUST (Different URLs with Similar Text), adversely impact search engines, since crawling, storing, and using such data waste resources, build low-quality rankings, and degrade the user experience.
To deal with this problem, several methods have been proposed to detect and remove duplicate documents without fetching their contents. To accomplish this, the proposed methods learn normalization rules that transform all duplicate URLs into the same canonical form, information crawlers can use to avoid fetching DUST. A challenging aspect of this strategy is to efficiently derive the minimum set of rules that achieves the largest reduction with the smallest false-positive rate. As most methods are based on pairwise analysis, the quality of the rules is affected by the criterion used to select the examples and by the availability of representative examples in the training sets. To avoid processing large numbers of URLs, they employ techniques such as random sampling or searching for DUST only within sites, preventing the generation of rules involving multiple DNS names. As a consequence, current methods are very susceptible to noise and, in many cases, derive rules that are overly specific. In this thesis, we present a new approach to deriving quality rules that takes advantage of a multi-sequence alignment strategy. We demonstrate that a full multi-sequence alignment of URLs with duplicated content, performed before rule generation, leads to very effective rules. Experimental results show that our approach achieved larger reductions in the number of duplicate URLs than our best baseline on two different web collections, while being much faster. We also present a distributed version of our method, using the MapReduce framework, and demonstrate its scalability by evaluating it on a set of 7.37 million URLs.
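The normalization rules such methods learn can be pictured as URL rewrites that collapse DUST onto one canonical form. A minimal sketch with invented rules (the real rules are derived automatically from aligned duplicate-URL examples, not hand-written):

```python
import re

# Illustrative normalization rules of the kind such methods learn:
# each maps a family of duplicate URLs to one canonical form.
# These specific patterns are assumptions for the sketch.
RULES = [
    (re.compile(r"^http://www\."), "http://"),   # drop the www prefix
    (re.compile(r"/index\.html?$"), "/"),        # default page == directory
    (re.compile(r"\?(utm_[^&]+&?)+$"), ""),      # strip tracking parameters
]

def canonicalize(url):
    """Apply each learned rule in order; duplicates converge to one form."""
    for pattern, replacement in RULES:
        url = pattern.sub(replacement, url)
    return url

dust = [
    "http://www.example.com/index.html",
    "http://example.com/",
    "http://example.com/index.htm",
]
print({canonicalize(u) for u in dust})  # {'http://example.com/'}
```

A crawler applies `canonicalize` to each frontier URL and skips any canonical form it has already fetched, avoiding the duplicate downloads the abstract describes.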
8

Ultrafast Raman Loss Spectroscopy (URLS)

Mallick, Babita 08 1900 (has links) (PDF)
Contemporary laser research involves the development of spectroscopic techniques to understand the microscopic structural aspects of systems ranging from simple molecules in chemistry and materials science to complex biological systems such as cells. In particular, Raman spectroscopy, which provides bond-specific information, has attracted considerable attention. With the advent of femtosecond (fs) lasers, the recent trend in fs chemistry is to develop nonlinear Raman techniques that acquire vibrational structural information with both fs temporal resolution and good spectral resolution. Among the many advanced nonlinear Raman techniques, the development of fs stimulated Raman scattering (SRS) has gathered momentum in the past decade due to its ability to (1) provide vibrational structural information for various systems, including fluorescent molecules, with a good signal-to-noise ratio, and (2) circumvent the limit that the necessary pulse durations impose on spectral resolution through the time-bandwidth uncertainty relation (Δν·Δt ≈ K, where 'K' is a constant that depends on the pulse shape), unlike fs normal resonance Raman spectroscopy. We have developed a technique named "Ultrafast Raman loss spectroscopy (URLS)" that is analogous to SRS but more advantageous, with the potential to be an alternative, if not competitive, tool for elucidating vibrational structure. The concept and the design of this novel technique, URLS, form the core of the thesis entitled "Ultrafast Raman Loss Spectroscopy (URLS)". Chapter 1 lays the theoretical groundwork for ultra-short pulses and nonlinear spectroscopy, which form the heart of URLS.
It presents a detailed discussion of the elementary experimental problems associated with ultra-short laser pulses as they travel through a medium, the characterization of these ultrashort pulses, and the various nonlinear phenomena induced in a medium by the propagation of these pulses. Chapter 2 focuses on the concept of SRS, which formed the foundation of URLS; it illustrates the theoretical and experimental aspects of SRS and demonstrates its sensitivity over normal Raman spectroscopy. Chapter 3 introduces the conceptual and technical basis that led to the development of URLS, while Chapter 4 demonstrates its application and its efficiency relative to its analogue, SRS. URLS involves the interaction of two laser sources, viz. a picosecond (ps) pulse and a fs white light (WL), with a sample, leading to the generation of a loss signal on the higher-energy (blue) side of the ps pulse wavelength, unlike the gain signal observed on the lower-energy (red) side in SRS. These loss signals are at least 1.5 times more intense than SRS signals. Moreover, because the protocol by design detects the signal on the higher-energy side, interference from fluorescence, which appears on the red side, is eliminated. Thus, rapid data acquisition, 100% natural fluorescence rejection, and experimental ease establish "Ultrafast Raman Loss Spectroscopy (URLS)" as a uniquely valuable structure-determining technique. Further, the effect of resonance on the line shape of the URLS signal has been studied, which forms the subject of Chapter 5. The objective of that study is to verify whether the variation of resonance Raman line shapes in URLS can provide an understanding of the mode-specific response on ultrafast excitation.
It is found that the URLS signal's line shape is mode dependent and can provide information similar to the Raman excitation profile (REP) in normal Raman studies. This can have an impact on the study of various dynamical processes involving vibrational modes, such as structural dynamics and coherent control. Chapter 6 demonstrates the application of URLS as a structure-elucidating technique for monitoring ultrafast structural and reaction dynamics in chemical and biological systems, using α-terthiophene (3T) as the model system. The objective is to understand the mechanism of the molecular-structure-dependent electronic relaxation of the first singlet excited state, S1, of α-terthiophene using fs URLS. The URLS data, along with ab initio calculations, indicate that the electronic transition is associated with a structural rearrangement from a non-planar to a planar configuration in the singlet manifold along the ring-deformation coordinate. The experimental findings suggest that the singlet state decays exponentially with a 1/e decay time constant of about 145 ps, and this decay can be assigned to the intersystem crossing (ISC) pathway from the relaxed S1 state to the vibrationally hot triplet state, T1*. Lastly, Chapter 7 summarizes the entire thesis and presents possible future prospects for URLS.
Considering the advantages of URLS, it is proposed that URLS can be exploited [1] to determine the structure of fluorescent or non-fluorescent condensed materials and biological systems with very good spectral resolution (10-40 cm-1); [2] to obtain the vibrational signature of weak Raman-scattering molecules and of vibrational modes with relatively small Raman cross-sections, owing to its high detection sensitivity and good signal-to-noise ratio; [3] to perform fs time-resolved studies by introducing an additional fs pulse for photo-excitation of the molecule and using URLS to probe the excited-state dynamics with good temporal (fs) and spectral (10-40 cm-1) resolution; and lastly, [4] to exploit the high chemical selectivity of URLS, and the fact that the signal is generated only within the focal volume where all the beams overlap, to develop the method into a microscopy technique for label-free vibrational study of biological samples. Consequently, it is hoped that this technique, "Ultrafast Raman Loss Spectroscopy (URLS)", will be a suitable alternative to other nonlinear Raman methods, such as coherent anti-Stokes Raman spectroscopy (CARS), which has made major inroads into biology, medicine, and materials.
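The 1/e decay-time analysis mentioned above can be illustrated on a synthetic trace; the data below are simulated with a 145 ps lifetime, so this only shows how such a constant is read off a decay curve, not the actual experiment:

```python
import math

# Simulated exponential decay with a 145 ps lifetime (an assumption
# mirroring the singlet-state analysis described in the abstract).
TAU_PS = 145.0

def decay(t_ps):
    return math.exp(-t_ps / TAU_PS)

def one_over_e_time(times, signal):
    """First sample time at which the signal falls to 1/e of its start."""
    threshold = signal[0] / math.e
    for t, s in zip(times, signal):
        if s <= threshold:
            return t
    return None

times = [float(i) for i in range(600)]      # 0..599 ps, 1 ps steps
signal = [decay(t) for t in times]
print(one_over_e_time(times, signal))
```

On this noiseless trace, the printed time lands at the sample nearest the 145 ps constant; real data would instead be fit to an exponential.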
9

A MACHINE LEARNING BASED WEB SERVICE FOR MALICIOUS URL DETECTION IN A BROWSER

Hafiz Muhammad Junaid Khan (8119418) 12 December 2019 (has links)
Malicious URLs pose serious cyber-security threats to Internet users, so it is critical to detect them and block user access. In the past few years, several techniques have been proposed to differentiate malicious URLs from benign ones with the help of machine learning. Machine learning algorithms learn trends and patterns in a data-set and use them to identify anomalies. In this work, we attempt to find generic features for detecting malicious URLs by analyzing two publicly available malicious-URL data-sets. To achieve this, we identify a list of substantial features that can be used to classify all types of malicious URLs. We then select the most significant lexical features using Chi-Square and ANOVA statistical tests. The effectiveness of these feature sets is tested with a combination of single and ensemble machine learning algorithms. We build a machine-learning-based real-time malicious URL detection system, deployed as a web service, to detect malicious URLs in a browser. We implement a Chrome extension that intercepts a browser's URL requests and sends them to the web service for analysis, and the web service itself, which classifies a URL as benign or malicious using the saved ML model. We also evaluate the performance of the web service to test whether it is scalable.
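The lexical-feature-plus-chi-square pipeline can be sketched in a few lines. The feature list and toy data here are assumptions, and the statistic is the standard 2x2 chi-square rather than the thesis's exact test setup:

```python
# Sketch: lexical URL features, and a 2x2 chi-square score relating one
# binary feature to the malicious/benign label. Features are assumptions.

def lexical_features(url):
    host = url.split("//")[-1].split("/")[0]
    return {
        "length": len(url),
        "num_digits": sum(c.isdigit() for c in url),
        "num_dots": url.count("."),
        "has_ip": host.replace(".", "").isdigit(),  # bare-IP host
        "has_at": "@" in url,
    }

def chi_square_binary(feature_on, labels):
    """Chi-square for a binary feature vs. binary labels (no Yates correction)."""
    n = len(labels)
    a = sum(1 for f, y in zip(feature_on, labels) if f and y)      # on, malicious
    b = sum(1 for f, y in zip(feature_on, labels) if f and not y)  # on, benign
    c = sum(1 for f, y in zip(feature_on, labels) if not f and y)  # off, malicious
    d = n - a - b - c                                              # off, benign
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

urls = ["http://10.0.0.1/login", "http://example.com/home",
        "http://192.168.1.9/pay", "http://news.example.org/a"]
labels = [1, 0, 1, 0]  # 1 = malicious (toy labels)
has_ip = [lexical_features(u)["has_ip"] for u in urls]
print(chi_square_binary(has_ip, labels))  # 4.0 on this toy split
```

Features with the highest chi-square scores against the label would be kept for the classifier, which is the role the Chi-Square test plays in the abstract.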
10

Ultrafast Raman Loss Spectroscopic Investigations of Excited State Structural Dynamics of Bis(phenylethynyl)benzene and trans-Stilbene

Mallick, Babita January 2017 (has links) (PDF)
The subject of this thesis is the design and development of a unified setup for femtosecond transient absorption and ultrafast Raman loss spectroscopy, and the demonstration of its potential for capturing ultrafast photophysical and photochemical processes with excellent time and frequency resolution. Ultrafast spectroscopy has long served as a powerful tool for understanding the structural and dynamical properties of molecules in the condensed and gas phases. The advent of ultrashort pulses with high peak power enables the laser spectroscopy community to study molecular reaction dynamics and photophysics occurring on extremely short timescales, from picoseconds to femtoseconds. These processes can be measured with extremely high time resolution, which helps resolve the underlying molecular processes, but understanding their global mechanism also requires resolving the nuclear dynamics with proper frequency resolution. Achieving both time and frequency resolution simultaneously is not possible according to the Heisenberg uncertainty principle. This limitation was overcome by femtosecond stimulated Raman spectroscopy (FSRS), a third-order nonlinear Raman spectroscopy. In this thesis we introduce the ultrafast Raman loss spectroscopy (URLS) technique, which is analogous to FSRS and offers the ultrafast community a way to resolve molecular processes with a better signal-to-noise ratio along with proper time and frequency resolution. We demonstrate the experimental procedure, including the single-shot detection scheme to measure white-light background, ground-state Raman, transient absorption, and transient Raman in shot-to-shot fashion. URLS has been applied to understand the excited-state planarization dynamics of 1,4-bis(phenylethynyl)benzene (BPEB) in different solvents.
In addition, the excitation-wavelength-dependent conformational reorganization dynamics of different subsets of the thermally activated ground-state population of BPEB are discussed. Using the same techniques along with femtosecond transient absorption, we demonstrate ultrafast vibrational energy transfer and the role of coherent oscillations of low-frequency vibrations in the solution-phase photo-isomerization of trans-stilbene from an optically excited state. The effects of solvents on the coherent nuclear motion are also discussed in the context of reaction rates.
