41 |
Visual object category discovery in images and videos
Lee, Yong Jae, 1984- 12 July 2012 (has links)
The current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori.
I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and learn models from them for category grouping, segmentation, and summarization.
To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects. The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary that is driven by those regions.
I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that we can discover visual concepts more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision.
|
42 |
Αυτόματη εξαγωγή περίληψης από ελληνικό κείμενο / Automatic summary extraction from Greek text
Κυριάκου, Ερωτόκριτος 20 October 2009 (has links)
This diploma thesis addresses automatic text summarization for the Greek language. Information retrieval is a field of natural language processing, which is itself a subfield of Artificial Intelligence. Its purpose is to retrieve important information from large collections of data. The specific area that concentrates on extracting short summaries from texts is called automatic text summarization. The summarizer removes redundant information from the input text and produces a shorter, non-redundant output text. The output text is an extract of the original: no sentence is generated from scratch; instead, unmodified original sentences are used to form the summary. The most important sentences are chosen by applying criteria specially designed to score each sentence. The results are compared with human-made summaries and with some well-known automatic summarization programs.
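The sentence-selection idea described in this abstract can be sketched as follows. This is only an illustration, not the thesis's scoring criteria: the single criterion used here (average corpus frequency of a sentence's words) and the whitespace tokenization are assumptions, and the function name is hypothetical.

```python
from collections import Counter

def extract_summary(sentences, k):
    """Return the k highest-scoring sentences, unmodified and in original order."""
    words = [s.lower().split() for s in sentences]
    # Assumed criterion: a sentence scores highly if its words are frequent overall.
    freq = Counter(w for ws in words for w in ws)
    scores = [sum(freq[w] for w in ws) / len(ws) if ws else 0.0 for ws in words]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore document order so the extract reads naturally.
    return [sentences[i] for i in sorted(top)]
```

Because the output is assembled only from original sentences, it is an extract in the sense used above; a real system would likely combine several criteria such as position, length, and keyword weight.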
|
43 |
Automatic Multi-word Term Extraction and its Application to Web-page Summarization
Huo, Weiwei 20 December 2012 (has links)
In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification.
We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
|
44 |
IMPROVED DOCUMENT SUMMARIZATION AND TAG CLOUDS VIA SINGULAR VALUE DECOMPOSITION
Provost, James 25 September 2008 (has links)
Automated summarization is a difficult task. World-class summarizers can provide only "best guesses" of which sentences encapsulate the important content from within a set of documents. As automated systems continue to improve, users are still not given the means to observe complex relationships between seemingly independent concepts. In this research we used singular value decompositions to organize concepts and determine the best candidate sentences for an automated summary. The results from this straightforward attempt were comparable to world-class summarizers. We then included a clustered tag cloud, using a singular value decomposition to measure term "interestingness" with respect to the set of documents. The combination of best candidate sentences and tag clouds provided a more inclusive summary than a traditionally-developed summarizer alone. / Thesis (Master, Computing) -- Queen's University, 2008-09-24 16:31:25.261
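As a rough illustration of using a singular value decomposition to pick candidate sentences, the sketch below ranks sentences by their weight in the dominant singular direction of a term-by-sentence matrix. It is not the method of this thesis: the raw term counts, the whitespace tokenization, and the use of power iteration to approximate only the first right singular vector are assumptions chosen to keep the example free of dependencies.

```python
import math

def term_sentence_matrix(sentences):
    """Build a term-by-sentence count matrix (rows: terms, columns: sentences)."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(sentences) for _ in vocab]
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            matrix[index[w]][j] += 1.0
    return matrix

def dominant_right_singular_vector(a, iterations=200):
    """Approximate the first right singular vector of `a` by power iteration on A^T A."""
    n_cols = len(a[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in a]            # u = A v
        w = [sum(a[i][j] * u[i] for i in range(len(a))) for j in range(n_cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def rank_sentences(sentences):
    """Return sentence indices ordered by weight in the dominant singular direction."""
    v = dominant_right_singular_vector(term_sentence_matrix(sentences))
    return sorted(range(len(sentences)), key=lambda j: abs(v[j]), reverse=True)
```

Sentences that share vocabulary with the dominant "concept" of the collection rise to the top, while off-topic sentences receive weights near zero. A fuller treatment would use several singular vectors and a weighting scheme such as TF-IDF.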
|
45 |
Multi-modal Video Summarization Using Hidden Markov Models For Content-based Multimedia Indexing
Yasaroglu, Yagiz 01 January 2003 (has links) (PDF)
This thesis deals with scene-level summarization of story-based videos. Two different approaches to story-based video summarization are investigated. The first approach probabilistically models the input video and identifies scene boundaries using the same model. The second approach models scenes and classifies scene types by evaluating the likelihood values of these models. In both approaches, hidden Markov models are used as the probabilistic modeling tools. The first approach also exploits the relationship between video summarization and video production, which is briefly explained, by means of content types. Two content types are defined, dialog-driven and action-driven content, and the need to define such content types is demonstrated by simulations. Different content types use different hidden Markov models and features. The selected model segments the input video as a whole. The second approach models scene types. Two types, dialog scene and action scene, are defined with different features and models. The system classifies fixed-size partitions of the video as either of the two scene types, and segments the partitions separately according to their scene types. The performance of these two systems is compared against a deterministic video summarization method employing clustering based on visual properties and video-structure-related rules. Hidden Markov model based video summarization using content types achieves the highest performance.
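To make the role of the hidden Markov model concrete, here is a minimal sketch of Viterbi decoding over two hidden states named after the scene types mentioned above. Everything numeric is hypothetical: the thesis's features, model topologies, and probabilities are not given in the abstract, so the example assumes a single quantized per-shot feature and hand-picked probabilities.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence (all probabilities must be > 0)."""
    # Log space avoids numerical underflow on long sequences.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

def boundaries(path):
    """Indices at which the decoded state changes, i.e. candidate scene boundaries."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]

# Hypothetical model: two scene types, one binary feature per shot.
STATES = ("dialog", "action")
START = {"dialog": 0.5, "action": 0.5}
TRANS = {"dialog": {"dialog": 0.9, "action": 0.1},
         "action": {"dialog": 0.1, "action": 0.9}}
EMIT = {"dialog": {"low_motion": 0.8, "high_motion": 0.2},
        "action": {"low_motion": 0.2, "high_motion": 0.8}}
```

Decoding a sequence of per-shot observations with `viterbi(obs, STATES, START, TRANS, EMIT)` labels each shot, and `boundaries` turns the label changes into segment boundaries.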
|
46 |
Sumarização de dados no nodo por parâmetros : fusão de dados local em ambiente internet das coisas / Data summarization in the node by parameters (DSNP) : local data fusion in an IoT environment
Maschi, Luis Fernando Castilho 28 February 2018 (has links)
With the advent of the Internet of Things, billions of objects and devices are connected to the Internet, generating data in a volume never before imagined. This work proposes a way to collect and process data locally through the data fusion technique called data summarization. The main feature of the proposal is local data fusion driven by parameters provided by the application and/or database, ensuring the quality of the data collected by the sensor node. In the tests, a sensor node using the proposed technique, identified here as Data Summarization in the Node by Parameters (DSNP), performed data summarization and was then compared with another node that continuously recorded the collected data. Two sets of nodes were created for these tests: one with a sensor node that analyzed the luminosity of classrooms, which obtained a 97% reduction in the volume of data generated, and another that analyzed the temperature of those rooms, obtaining an 80% reduction in data volume. These tests verified that local data summarization at the node can be used to reduce the volume of data generated, consequently decreasing the volume of messages generated by IoT environments.
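One simple way a node could summarize readings using an application-supplied parameter is sketched below. The abstract does not say which parameters DSNP uses, so the single `threshold` parameter (record a reading only when it moves far enough from the last recorded value) is an assumption, as are the function names and the sample data.

```python
def summarize_readings(readings, threshold):
    """Keep a (timestamp, value) reading only if it differs from the last kept
    value by more than `threshold` (a hypothetical application parameter)."""
    kept = []
    for timestamp, value in readings:
        if not kept or abs(value - kept[-1][1]) > threshold:
            kept.append((timestamp, value))
    return kept

def reduction_percent(original_count, kept_count):
    """Percentage reduction in the number of stored readings."""
    return 100.0 * (original_count - kept_count) / original_count

# Made-up temperature samples for illustration only.
SAMPLES = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.0), (4, 21.1),
           (5, 21.05), (6, 22.5), (7, 22.4), (8, 22.6), (9, 22.5)]
```

With a threshold of 0.5 on these made-up samples, only the readings at timestamps 0, 3, and 6 are kept. The reductions reported in the abstract come from the author's own experiments and cannot be reproduced from this sketch.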
|
47 |
Algoritmos rápidos para estimativas de densidade hierárquicas e suas aplicações em mineração de dados / Fast algorithms for hierarchical density estimates and its applications in data mining
Joelson Antonio dos Santos 29 May 2018 (has links)
Clustering is an unsupervised learning task that describes a set of objects as groups (clusters), such that objects in the same cluster are more similar to each other than to objects in other clusters. Clustering techniques are divided into two main categories: partitional and hierarchical. Partitional techniques divide a dataset into a given number of distinct clusters, while hierarchical techniques provide a nested sequence of partitional clusterings separated by different levels of granularity. Furthermore, hierarchical density-based clustering is a particular clustering paradigm that detects clusters with different concentrations or densities of objects. One of the most popular techniques of this paradigm is known as HDBSCAN*. In addition to providing hierarchies, HDBSCAN* is a framework that provides outlier detection, semi-supervised clustering, and visualization of results. However, most hierarchical techniques, including HDBSCAN*, have high computational complexity, which makes them prohibitive for the analysis of large datasets. This master's work proposes two approximate variations of HDBSCAN* that are computationally more scalable for clustering large amounts of data. The first variation follows the parallel and distributed computing concept known as MapReduce. The second follows the context of parallel computing using shared memory. Both variations are based on an efficient data division concept known as Recursive Sampling, which allows the data to be processed in parallel. Like HDBSCAN*, the proposed variations can also provide a complete unsupervised analysis of patterns in data, including outlier detection. Experiments were carried out to evaluate the quality of the proposed variations. Specifically, the MapReduce-based variation was compared with a parallel and exact version of HDBSCAN* known as Random Blocks, while the shared-memory parallel version was compared with the state of the art (HDBSCAN*). In terms of clustering quality and outlier detection, the MapReduce-based and shared-memory variations showed results close to the exact parallel version of HDBSCAN* and to the state of the art, respectively. In terms of computational time, the proposed variations showed greater scalability and speed when processing large amounts of data than the versions they were compared with.
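For readers unfamiliar with HDBSCAN*, the quantity it builds its hierarchy on can be sketched in a few lines. The code below follows the commonly cited definitions of core distance and mutual reachability distance; it is not taken from this thesis, and it assumes the convention in which a point counts itself among its `min_pts` nearest neighbors. The brute-force neighbor search is exactly the kind of cost the proposed variations aim to avoid.

```python
import math

def core_distance(points, i, min_pts):
    """Distance from point i to its min_pts-th nearest neighbor, counting i itself."""
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_pts - 1]

def mutual_reachability(points, i, j, min_pts):
    """max(core(i), core(j), d(i, j)): plain distance, inflated in sparse regions."""
    return max(core_distance(points, i, min_pts),
               core_distance(points, j, min_pts),
               math.dist(points[i], points[j]))

# Made-up one-dimensional layout: three nearby points and one far away.
POINTS = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
```

HDBSCAN* then computes a minimum spanning tree over the complete graph weighted by these distances, which is where the quadratic cost on large datasets comes from.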
|
48 |
Sběr dat z velkého počtu počítačů pomocí hierarchické sumarizace / Data acquisition from a great number of computers using hierarchical summarization
Jelínek, Mojmír January 2008 (has links)
This paper deals with feedback transmission in IPTV (Internet Protocol Television) and presents options and methods for the design of a signaling protocol, along with its problems and optimization. It first describes IPTV and the terminology this technology uses, including information about classical TV (television) broadcasting and a comparison with IPTV, the advantages and disadvantages of IPTV, and the reasons this solution has a future. The next parts cover the history of IPTV and its present-day use over ADSL (Asymmetric Digital Subscriber Line). All the necessary units are explained, such as the Head-End, feedback target, root feedback target, ADSL, and DSLAM (Digital Subscriber Line Access Multiplexer), as well as methods of data stream transmission. The techniques of video stream compression (MPEG-2 and MPEG-4) and the options for data transmission (broadcast, unicast, and multicast) are also described. An important part concerns transmission speed and requirements. The implementation also contains applications, written in C++, for transmitting and receiving packets over UDP (User Datagram Protocol). The task of these applications is to load the main server, where packet loss and CPU (Central Processing Unit) load are measured. The result is a table of measured values for specified packet sizes and for specified time intervals between packets. The purpose of this measurement is to find the maximum number of computer nodes that the feedback target is able to process. The last part covers the implementation of two applications in Java that gather information about end nodes. Both algorithms use two threads to speed up information gathering. The client has a few random generators within one thread, which will later be replaced by special algorithms for obtaining real values.
|
49 |
Protokol TTP pro správu hierarchických stromů zpětné vazby RTCP kanálu / TTP protocol for managing hierarchy trees of RTCP feedback channel
Müller, Jakub January 2008 (has links)
The TTP protocol for managing hierarchy trees of the RTCP feedback channel provides a mechanism for transferring a large amount of data from end users over the "narrow" feedback channel. The scale in question is not thousands of users but millions of users of services such as IPTV. For this purpose, data summarization is performed in selected network nodes. The summarized message is then transferred and summarized again at higher levels of the hierarchical tree. Both methods reduce the amount of data and help increase the information content transferred over the feedback channel. Finding the correct position of the end user in the network is also a very important aspect. With this information, the user must be able to find the closest summarization node and start sending messages to that node for processing. This work introduces several methods for constructing and managing the asynchronous feedback channel.
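The idea of summarizing feedback at each level of a tree can be sketched as follows. This is not the TTP message format or the RTCP report layout: the `Summary` fields (receiver count, summed and maximum packet-loss fraction) and the class names are hypothetical, chosen only to show that each node forwards one fixed-size summary upward regardless of how many end users sit below it.

```python
from dataclasses import dataclass, field

@dataclass
class Summary:
    receivers: int = 0
    loss_sum: float = 0.0
    loss_max: float = 0.0

    def merge(self, other):
        """Combine two summaries into one of the same fixed size."""
        return Summary(self.receivers + other.receivers,
                       self.loss_sum + other.loss_sum,
                       max(self.loss_max, other.loss_max))

    @property
    def mean_loss(self):
        return self.loss_sum / self.receivers if self.receivers else 0.0

@dataclass
class Node:
    reports: list = field(default_factory=list)   # loss fractions from attached end users
    children: list = field(default_factory=list)  # lower-level summarization nodes

    def summarize(self):
        """Summarize local reports, then merge in each child's summary."""
        summary = Summary()
        for loss in self.reports:
            summary = summary.merge(Summary(1, loss, loss))
        for child in self.children:
            summary = summary.merge(child.summarize())
        return summary
```

Calling `summarize()` on the root yields a single record describing all receivers below it, which illustrates why the load on the narrow feedback channel can stay roughly constant as the audience grows.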
|
50 |
Surfacing Personas from Enterprise Social Media to Enhance Engagement Visibility
Venkatachalam, Ramiya 28 August 2013 (has links)
No description available.
|