Global ETD Search

21	ANDROID SECURE DEPLOYMENT & NFC BASED E-LIBRARY IMPLEMENTATION HASSAN, FARRUKH January 2015 This thesis presents a new approach to the future library system using secure NFC technology. With NFC and Android-based mobile phones, a modern library system can be built in which users borrow and return books directly instead of standing in a queue. The NFC technology used in this thesis can store a small amount of information, and this storage is used to maintain the book records. Although NFC works only in close proximity, attacks are still possible, and because the communication is contactless the victim may not notice them. Possible attacks include modification of data and eavesdropping on the communication by an unknown user. The thesis therefore examines how the system can be protected from these kinds of attacks. The motivation behind the thesis is to introduce a scalable cloud-based infrastructure as the library backbone. Current systems using bar code technology are not secure, so an infrastructure needs to be built that includes a cloud-based server for key distribution and data storage. Furthermore, the thesis includes a study of encryption and decryption schemes for close-proximity communications. A novel algorithm is introduced and implemented as the encryption scheme for this work: the Huffman scheme is modified and 16-bit keys are used for the key exchange. The new approach is compared with existing techniques and found to be reliable in comparison. Cloud based Server Attacks Heterogeneous Networks Authentication RFID tags NFC Partial keys.
22	Cooperative End-to-end Congestion Control in Heterogeneous Wireless Networks Mohammadizadeh, Neda 20 August 2013 (has links) Sharing the resources of multiple wireless networks with overlapped coverage areas has a potential of improving the transmission throughput. However, in the existing frameworks, the improvement cannot be achieved in congestion scenarios because of independent congestion control procedures among the end-to-end paths. Although various network characteristics make the congestion control complex, this variety can be useful in congestion avoidance if the networks cooperate with each other. When congestion happens in an end-to-end path, it is inevitable to have a packet transmission rate less than the minimum requested rate due to congestion window size adjustments. Cooperation among networks can help to avoid this problem for better service quality. When congestion is predicted for one path, some of the on-going packets can be sent over other paths instead of the congested path. In this way, the traffic can be shifted from a congested network to others, and the overall transmission throughput does not degrade in a congestion scenario. However, cooperation is not always advantageous since the throughput of cooperative transmission in an uncongested scenario can be less than that of non-cooperative transmission due to cooperation costs such as cooperation setup time, additional signalling for cooperation, and out-of-order packet reception. In other words, a trade-off exists between congestion avoidance and cooperation cost. Thus, cooperation should be triggered only when it is beneficial according to congestion level measurements. In this research, our aim is to develop an efficient cooperative congestion control scheme for a heterogeneous wireless environment. 
To this end, a cooperative congestion control algorithm is proposed, in which the state of an end-to-end path is provided at the destination terminal by measuring the queuing delay and estimating the congestion level. The decision on when to start/stop cooperation is made based on the network characteristics, instantaneous traffic condition, and the requested quality of service (QoS). Simulation results demonstrate the throughput improvement of the proposed scheme over non-cooperative congestion control. Congestion Control Heterogeneous Networks Wireless Networks Cooperation TCP Transport Layer Multipath MPTCP
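The start/stop decision described in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the class name `CooperationTrigger`, the thresholds, the smoothing weight, and the idea of approximating queuing delay as the excess over the smallest delay seen so far are all assumptions made here for illustration. The hysteresis (separate start and stop thresholds) is one simple way to avoid paying the cooperation cost when the path is only briefly loaded.

```python
from dataclasses import dataclass


@dataclass
class CooperationTrigger:
    """Hysteresis trigger driven by measured delay (all default values are assumed)."""
    start_threshold: float = 0.6       # start cooperating above this congestion level
    stop_threshold: float = 0.3        # stop cooperating below this congestion level
    max_queuing_delay: float = 0.1     # seconds of queuing delay treated as fully congested
    smoothing: float = 0.25            # weight of each new sample in the moving average
    base_delay: float = float("inf")   # smallest path delay observed so far
    level: float = 0.0                 # smoothed congestion level in [0, 1]
    cooperating: bool = False

    def update(self, measured_delay: float) -> bool:
        """Feed one delay measurement (seconds); return whether to cooperate."""
        # Queuing delay is approximated as the excess over the minimum observed delay.
        self.base_delay = min(self.base_delay, measured_delay)
        queuing_delay = measured_delay - self.base_delay
        sample = min(queuing_delay / self.max_queuing_delay, 1.0)
        # Exponentially weighted moving average of the congestion level.
        self.level = (1 - self.smoothing) * self.level + self.smoothing * sample
        if not self.cooperating and self.level > self.start_threshold:
            self.cooperating = True    # congestion predicted: shift traffic to other paths
        elif self.cooperating and self.level < self.stop_threshold:
            self.cooperating = False   # path recovered: stop paying the cooperation cost
        return self.cooperating
```

With these assumed defaults, a path whose delay rises from 50 ms to 200 ms switches to cooperation only after several consecutive high samples, and switches back only after the smoothed level has dropped below the lower threshold.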
23	Tiered Networks: Modeling, Resource and Interference Management Erturk, Mustafa Cenk 01 January 2012 The wireless networks of the future are likely to be tiered, i.e., a heterogeneous mixture of overlaid networks that have different power, spectrum, hardware, coverage, mobility, complexity, and technology requirements. The focus of this dissertation is to improve the performance and increase the throughput of tiered networks with resource/interference management methods, node densification schemes, and transceiver designs, with applications to advanced tiered network structures such as heterogeneous networks (i.e., picocells, femtocells, relay nodes, and distributed antenna systems), device-to-device (D2D) networks, and aeronautical communication networks (ACN). Over the last few decades, there has been an enormous increase in the demand for wireless services in various applications throughout the world. This increase has led to the emergence of a number of advanced wireless systems and networks whose common goal is to provide a very high data rate to countless users and applications. With the traditional macrocellular network architectures, it will be extremely challenging to meet such demand for high data rates in the upcoming years. Therefore, a mixture of networks with different capabilities has started to be built in a tiered manner. While the number and capabilities of networks are increasing to satisfy higher requirements, modeling, managing, and maintaining the entire structure has become more challenging. Between 1950 and 2000, the capacity of wireless networks increased through various advanced technologies and methodologies, which can be summarized under three main headings: spectrum increase (x25), spectrum efficiency increase (x25), and network density (spectrum reuse) increase (x1600). 
It is vital to note that among different schemes, the most important gain is explored with increasing the reuse and adding more nodes/cells into the system, which will be the focus of this dissertation. Increasing the reuse by adding nodes into the network in an uncoordinated (irregular in terms of power, spectrum, hardware, coverage, mobility, complexity, and technology) manner brought up heterogeneity to the traditional wireless networks: multi-tier resource management problems in uncoordinated interference environments. In this study, we present novel resource/interference management methods, node densification schemes, and transceiver designs to improve the performance of tiered networks; and apply our methodologies to heterogeneous networks, D2D networks, and ACN. The focus and the contributions of this research involve the following perspectives: 1. Resource Management in Tiered Networks: Providing a fairness metric for tiered networks and developing spectrum allocation models for heterogeneous network structures. 2. Network Densification in Tiered Networks: Providing the signal to interference plus noise ratio (SINR) and transmit power distributions of D2D networks for network density selection criteria, and developing gateway scheduling algorithms for dense tiered networks. 3. Mobility in Tiered Networks: Investigation of mobility in a two-tier ACN, and providing novel transceiver structures for high data rate, high mobility ACN to mitigate the effect of Doppler. Aeronautical Communication Networks Device-to-device Networks Femtocell Networks Heterogeneous Networks Communication Electrical and Computer Engineering
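As a worked check of the capacity figures quoted in this abstract, and assuming the three gains are independent multiplicative factors (the abstract lists them but does not state how they combine), the overall capacity increase between 1950 and 2000 comes to a factor of one million, of which network densification contributes by far the largest share:

```latex
\underbrace{25}_{\text{spectrum}}
\times
\underbrace{25}_{\text{spectrum efficiency}}
\times
\underbrace{1600}_{\text{network density}}
= 625 \times 1600 = 10^{6}
```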
24	Interference Modeling and Performance Analysis of 5G MmWave Networks Niknam, Solmaz January 1900 Doctor of Philosophy / Department of Electrical and Computer Engineering / Balasubramaniam Natarajan / Triggered by the popularity of smart devices, wireless traffic volume and device connectivity have been growing exponentially during recent years. The next generation of wireless networks, i.e., 5G, is a promising solution to satisfy the increasing data demand through a combination of key enabling technologies such as deployment of a high density of access points (APs), referred to as ultra-densification, and utilization of a large amount of bandwidth in millimeter wave (mmWave) bands. However, due to unfavorable propagation characteristics, this portion of the spectrum has been under-utilized. As a solution, large antenna arrays that coherently direct the beams will help overcome the hostile characteristics of mmWave signals. Building networks of directional antennas has given rise to many challenges in wireless communication design. One of the main challenges is how to incorporate 5G technology into current networks and design uniform structures that bring about higher network performance and quality of service. In addition, interference behavior can be severely affected, mainly because narrow beams are highly vulnerable to obstacles in the environment. Motivated by these factors, the present dissertation addresses some key challenges associated with the utilization of mmWave signals. As a first step towards this objective, we propose a framework for how 5G mmWave access points can be integrated into the current wireless structures and offer higher data rates. The related resource sharing problem is also formulated and solved within such a framework. 
Secondly, to better understand and quantify the interference behavior, we propose interference models for mmWave networks with directional beams for both large-scale and finite-sized networks. The interference model is based on our proposed blockage model, which captures the average number of obstacles that cause a complete link blockage, given a specific signal beamwidth. The main insight from our analysis is that considering the effect of blockages leads to a different interference profile. Furthermore, we investigate how to model interference considering not only physical layer specifications but also upper-layer constraints. In fact, upper network layers, such as the medium access control (MAC) protocol, control the number of terminals transmitting simultaneously and how resources are shared among them, which in turn affects the interference power level. An interesting result from this analysis is that, from the receiving terminal's standpoint, even in mmWave networks with directional signals and high attenuation effects, some form of sensing is still needed so that not all terminals are allowed to transmit their packets simultaneously. The level of such sensing depends on the terminal density. Lastly, we provide a framework to detect the network regime and its relation to various key deployment parameters, leveraging the proposed interference and blockage models. Such regime detection is important from a network management and design perspective. Based on our findings, mmWave networks can exhibit either an interference-limited regime or a noise-limited regime, depending on various factors such as access point density, blockage density, and signal beamwidth. Interference modeling Millimeter wave communication 5G Stochastic geometry Multi-band heterogeneous networks Blockage effect
25	Optimization of user association and resource allocation in heteregeneous networks / Optimisation de l'association des utilisateurs et de l'allocation des ressources dans les réseaux sans fil hétérogènes Zalghout, Mohamad 23 October 2017 (has links) Aujourd'hui, l'extension des exigences du trafic de données sans fil dépasse le taux de croissance de la capacité des nouvelles technologies d'accès sans fil. Par conséquent, les réseaux sans fil mobiles de la future génération proposent des architectures hétérogènes, généralement appelées réseaux sans fil hétérogènes (HWN). HWN se caractérisent par l'intégration des réseaux cellulaires et des réseaux locaux sans fil (WLAN) pour répondre aux besoins des utilisateurs et améliorer la capacité du système. En fait, l'intégration de différents types de technologies d'accès sans fil dans HWN offre des choix flexibles pour que les utilisateurs soient associés au réseau qui répond le mieux à leurs besoins. Dans ce contexte, cette thèse traite le problème d'association d'utilisateurs et le problème d'allocation de ressources dans un système sans fil hétérogène basé sur des points d'accès Wi-Fi intégrés et des stations de base L TE. Les contributions de cette thèse pourraient être divisées en trois parties principales. Dans la première partie, un nouveau problème d'association d'utilisateurs et d'optimisation de l'allocation des ressources est formulé pour maximiser la satisfaction globale des utilisateurs dans le système. La satisfaction de l'utilisateur est basée sur une fonction de profit pondérée qui vise à améliorer la puissance relative du signal reçu et la diminution de la consommation d'énergie des terminaux mobiles (MT). Étant donné qu'un MT n'est autorisé à être associé qu'à un seul réseau à la fois, le problème d'optimisation formulé est binaire avec une complexité NP complète. Ensuite, plusieurs solutions centralisées avec une complexité à temps polynomial sont proposées pour résoudre le problème formulé. 
Les solutions proposées sont basées sur des approches heuristiques et sur la relaxation continue du problème d'optimisation binaire formulé. La deuxième partie de la thèse vise à fournir une solution distribuée pour le problème formulé. La solution distribuée proposée déploie la technique de détente lagrangienne pour convertir le problème global formulé en plusieurs problèmes de Knapsack distribués, chaque réseau traite son problème Knapsack correspondant. La méthode de sous gradient est utilisée pour trouver les multiplicateurs lagrangiens optimaux ou sous optimaux. Enfin, la troisième partie de la thèse étudie de nouvelles perspectives de la formulation du problème d'optimisation et ses solutions centralisées et distribuées correspondantes. Un problème d'association d'utilisateurs et d'allocation de ressources basé sur la priorité est formulé. Le problème est ensuite réduit en plusieurs problèmes résolus à l'aide des solutions proposées réparties et centralisées. En outre, une nouvelle solution de maximisation de l'efficacité énergétique est proposée en modifiant les objectifs du problème d'optimisation originalement formulé. / It is indicated that the expansion of the wireless data traffic requirements exceeds the capacity growth rate of new wireless access technologies. Therefore, next-generation mobile wireless networks are moving toward heterogeneous architectures usually referred to as heterogeneous wireless networks (HWNs). HWNs are usually characterized by the integration of cellular networks and wireless local area networks (WLANs) to meet user requirements and enhance system capacity. In fact, integrating different types of wireless access technologies in HWNs provides flexible choices for users to be associated with the network that best satisfies their needs. 
In this context, this thesis discusses the user association and downlink resource allocation problem in a heterogeneous wireless system that is based on integrated Wi-Fi access points (APs) and long-term evolution (LTE) base stations (BSs). The contributions of this thesis can be divided into three main parts. In the first part, a novel user association and resource allocation optimization problem is formulated to maximize the overall user satisfaction in the system. The user satisfaction is based on a weighted profit function that aims at enhancing the relative received signal strength and decreasing the power consumption of mobile terminals (MTs). Since an MT is only allowed to be associated with a single network at a time, the formulated optimization problem is binary and NP-complete. Multiple centralized solutions with polynomial-time complexity are then proposed to solve the formulated problem. The proposed centralized solutions are based on heuristic approaches and on the continuous relaxation of the formulated binary optimization problem. The second part of the thesis aims at providing a distributed solution for the formulated problem. The proposed distributed solution uses the Lagrangian relaxation technique to convert the global formulated problem into multiple distributed knapsack problems, with each network processing its corresponding knapsack problem. The sub-gradient method is used to find the optimal, or near-optimal, Lagrangian multipliers. Finally, the third part of the thesis studies new perspectives on the formulated optimization problem and its corresponding centralized and distributed solutions. Mainly, a generalized priority-aware user association and resource allocation problem is formulated. The priority-aware problem is then reduced to multiple problems that are solved using the proposed centralized and distributed solutions. 
Moreover, a novel power efficiency maximization solution is proposed by altering the objectives of the main formulated optimization problem. Allocation de ressources Consommation d'énergie User association Resource allocation Heterogeneous networks Power consumption 621.382
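The distributed approach summarized in this abstract (Lagrangian relaxation into one knapsack problem per network, with sub-gradient updates of the multipliers) can be sketched on a toy model. The sketch is an illustration under assumptions and not the thesis's formulation: it assumes a profit `profit[i][j]` for user `i` on network `j`, an integer resource demand `demand[i][j]`, and an integer budget `capacity[j]`, and it relaxes only the "each user joins at most one network" constraint. The function names, the diminishing step size `step0 / k`, and the dynamic-programming knapsack solver are likewise choices made here.

```python
from typing import List, Tuple


def knapsack(values: List[float], weights: List[int], capacity: int) -> Tuple[float, List[int]]:
    """0/1 knapsack by dynamic programming; returns (best value, chosen item indices)."""
    n = len(values)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            # Take the item only if it has positive value and strictly improves the total.
            if v > 0 and w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v
    chosen, c = [], capacity
    for i in range(n, 0, -1):          # backtrack to recover the chosen items
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], chosen


def lagrangian_association(profit: List[List[float]], demand: List[List[int]],
                           capacity: List[int], iterations: int = 50,
                           step0: float = 1.0) -> Tuple[float, List[float]]:
    """Return (best dual upper bound found, final multipliers) for the toy model."""
    n_users, n_nets = len(profit), len(capacity)
    lam = [0.0] * n_users              # one multiplier per relaxed single-network constraint
    best_bound = float("inf")
    for k in range(1, iterations + 1):
        picks = [0] * n_users          # how many networks selected each user
        bound = sum(lam)
        for j in range(n_nets):        # each network solves its own knapsack independently
            reduced = [profit[i][j] - lam[i] for i in range(n_users)]
            weights = [demand[i][j] for i in range(n_users)]
            value, chosen = knapsack(reduced, weights, capacity[j])
            bound += value
            for i in chosen:
                picks[i] += 1
        best_bound = min(best_bound, bound)
        # The sub-gradient for user i is 1 - picks[i]; step against it and keep lam >= 0.
        step = step0 / k
        lam = [max(0.0, lam[i] - step * (1 - picks[i])) for i in range(n_users)]
    return best_bound, lam
```

In this sketch a user claimed by more than one network sees its multiplier rise until only the most profitable network still selects it. The returned bound is an upper bound on the best achievable total profit, so it can be used to judge how close a feasible association is to optimal.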
26	5G Backhauling with Software-defined Wireless Mesh Networks Santos, Ricardo January 2018 (has links) Current technological advances have caused an exponential growth of the number of mobile Internet-connected devices, along with their respective traffic demands. To cope with this increase of traffic demands, fifth generation (5G) network architectures will need to provide multi-gigabit capacity at the access base stations (BSs), through the deployment of ultra-dense small cells (SCs) operating with millimeter-wave (mmWave) frequencies, e.g. 60 GHz. To connect the BSs to the core network, a robust and high capacity backhaul infrastructure is required. As it is unfeasible to connect all the SCs through optical fiber links, a solution for the future 5G backhaul relies on the usage of mmWave frequencies to interconnect the SCs, forming multi-hop wireless mesh topologies. In this thesis, we explore the application of the Software-defined Networking (SDN) paradigm for the management of a SC wireless backhaul. With SDN, the data and control planes are separated and the network management is done by a centralized controller entity that has a global network view. To that end, we provide multiple contributions. Firstly, we provide an SDN-based architecture to manage SC backhaul networks, which include an out-of-band Long Term Evolution (LTE) control channel and where we consider aspects such as energy efficiency, resiliency and flexible backhaul operation. Secondly, we demonstrate the benefit of the wireless backhaul configuration using the SDN controller, which can be used to improve the wireless resource allocation and provide resiliency mechanisms in the network. Finally, we investigate how a SC mesh backhaul can be optimally reconfigured between different topologies, focusing on minimizing the network disruption during the reconfiguration. / The growth of mobile devices, along with their traffic demands, is expected to saturate the current mobile networks soon. 
To cope with such demand increase, fifth generation (5G) network architectures will need to provide multi-gigabit capacity at the access level, through the deployment of a massive amount of ultra-dense small cells (SCs). To connect the access and core networks, a robust and high capacity backhaul is required. To that end, mmWave links that operate at e.g. 60 GHz, can be used to interconnect the SCs, forming multi-hop wireless mesh topologies. In this thesis, we study the application of the Software-defined Networking (SDN) paradigm for the management of a SC wireless backhaul. Firstly, we provide an SDN-based architecture to manage SC backhaul networks, which includes an out-of-band control channel and where we consider aspects such as energy efficiency, resiliency and flexible backhaul operation. Secondly, we show the benefits of the wireless backhaul configuration using the SDN controller, which can be used to improve the wireless resource allocation and provide network resiliency. Finally, we investigate how a SC mesh backhaul can be optimally reconfigured between different topologies, while minimizing the network disruption during the reconfiguration. SDN wireless backhaul heterogeneous networks mmWave 5G resiliency Computer Sciences Datavetenskap (datalogi)
27	Optimisation of traffic steering for heterogeneous mobile networks Frei, Sandra January 2015 (has links) Mobile networks have changed from circuit switched to IP-based mobile wireless packet switched networks. This paradigm shift led to new possibilities and challenges. The development of new capabilities based on IP-based networks is ongoing and raises new problems that have to be tackled, for example, the heterogeneity of current radio access networks and the wide range of data rates, coupled with user requirements and behaviour. A typical example of this shift is the nature of traffic, which is currently mostly data-based; further, forecasts based on market and usage trends indicate a data traffic increase of nearly 11 times between 2013 and 2018. The majority of this data traffic is predicted to be multimedia traffic, such as video streaming and live video streaming combined with voice traffic, all prone to delay, jitter, and packet loss and demanding high data rates and a high Quality of Service (QoS) to enable the provision of valuable service to the end-user. While the demands on the network are increasing, the end-user devices become more mobile and end-user demand for the capability of being always on, anytime and anywhere. The combination of end-user devices mobility, the required services, and the significant traffic loads generated by all the end-users leads to a pressing demand for adequate measures to enable the fulfilment of these requirements. The aim of this research is to propose an architecture which provides smart, intelligent and per end-user device individualised traffic steering for heterogeneous mobile networks to cope with the traffic volume and to fulfil the new requirements on QoS, mobility, and real-time capabilities. The proposed architecture provides traffic steering mechanisms based on individual context data per end-user device enabling the generation of individual commands and recommendations. 
In order to provide valuable services for the end-user, the commands and recommendations are distributed to the end-user devices in real time. The proposed architecture does not require any proprietary protocols, which facilitates its integration into the existing network infrastructure of a mobile network operator. The proposed architecture has been evaluated through a number of use cases. A proof-of-concept of the proposed architecture, including its core functionality, was implemented using the ns-3 network simulator. The simulation results have shown that the proposed architecture achieves improvements for traffic steering, including traffic offload and handover. Further use cases have demonstrated that it is possible to achieve benefits in multiple other areas, such as improving energy efficiency, improving frequency interference management, and providing additional or more accurate data to third parties to improve their services. 004.6
28	Detection and Analysis of Online Extremist Communities Benigni, Matthew Curran 01 May 2017 Online social networks (OSNs) have become a powerful venue for political activism. In many cases large, insular online communities form that have been shown to be powerful diffusion mechanisms of both misinformation and propaganda. In some cases these groups' users advocate actions or policies that could be construed as extreme along nearly any distribution of opinion, and are thus called Online Extremist Communities (OECs). Although these communities appear increasingly common, little is known about how these groups form or the methods used to influence them. The work in this thesis provides researchers with a methodological framework to study these groups by answering three critical research questions: How can we detect large dynamic online activist or extremist communities? What automated tools are used to build, isolate, and influence these communities? What methods can be used to gain novel insight into large online activist or extremist communities? These group members' social ties can be inferred based on the various affordances offered by OSNs for group curation. By developing heterogeneous, annotated graph representations of user behavior I can efficiently extract online activist discussion cores using an ensemble of unsupervised machine learning methods. I call this technique Ensemble Agreement Clustering. Through manual inspection, these discussion cores can then often be used as training data to detect the larger community. I present a novel supervised learning algorithm called Multiplex Vertex Classification for network bipartition on heterogeneous, annotated graphs. This methodological pipeline has also proven useful for social botnet detection, and a study of large, complex social botnets used for propaganda dissemination is provided as well. 
Throughout this thesis I provide Twitter case studies including communities focused on the Islamic State of Iraq and al-Sham (ISIS), the ongoing Syrian Revolution, the Euromaidan Movement in Ukraine, as well as the alt-Right. Covert Network Detection Community Detection Annotated Networks Multilayer Networks Heterogeneous Networks Spectral Clustering
29	Interference Mitigation, Resource Allocation and Channel Control Techniques for 4G and Beyond Systems Yilmaz, Mustafa Harun 21 March 2017 The usage of wireless communication technologies has been increasing due to the benefits they provide in our daily life. These technologies are used in various fields such as military communication, public safety, and cellular communication. The current systems might not be sufficient to meet the increasing demand. Therefore, new solutions such as the usage of smart antennas have been proposed to satisfy this demand. Among different solutions, cognitive heterogeneous networks (HetNets) have recently been introduced as a promising one to meet the high user demand. In cognitive HetNets, there are secondary base stations (SBSs) with secondary users (SUs) and primary base stations (PBSs) with primary users (PUs) in a given area without any coordination between SBS-SBS and SBS-PBS. Due to the physical coexistence of SBSs and the lack of available spectrum, interference caused by the SBSs becomes a significant issue. Therefore, there is a need for interference mitigation techniques that allow users to share the same spectrum while maintaining the required performance level for each user. In this dissertation, we focus on resource allocation, interference coordination/mitigation and channel control techniques in 4G and beyond systems. As resource allocation techniques, we propose two studies. In the first study, we present the random subcarrier selection algorithm, in which each SU selects a specific number of subcarriers determined by its needs. In contrast to approaches where, at each iteration of the game, the SU searches all the subcarriers to maximize its payoff, our algorithm selects the subcarriers randomly and checks only those subcarriers that achieve a higher payoff. 
In the second study, we utilize reconfigurable antennas (RAs), which allow wireless devices to alter their antenna states, determined by different radiation patterns, to maximize received signal strength, and present the joint subcarrier and antenna state selection algorithm. Each SU selects the subcarriers whose capacity values are the highest among the available ones. Since SUs employ RAs, i.e., multiple antenna states, they obtain the reports for all subcarriers from each antenna state, and select the state with the subcarriers which provide the highest capacity gain. As an interference coordination/mitigation technique, we propose a game theoretical partially overlapping filtered multitone (POFMT) scheme. Partial overlapping is performed in both the frequency and space domains. While an intentional carrier frequency shift is introduced in frequency, RAs are utilized to achieve partial overlapping in the space domain. Within a game theoretical framework, when SUs search for the frequency shift ratio, they also select the antenna state to increase the system utility. We also combine the resource allocation technique with POTs and present the game theoretical resource allocation with POFMT. To achieve the resource allocation, an SBS slides a group of consecutive subcarriers through all available ones and computes the utility for each selected group of subcarriers. It picks the consecutive ones which give the highest capacity result. Our results show that our algorithms reach Nash equilibrium and increase the system gain substantially in terms of the corresponding utility. As a channel control technique, we propose wireless channel control using spatially adaptive antenna arrays. This technique simultaneously utilizes beam-steering and spatial adaptation to enhance the wireless channel gain and system capacity. While the interference is reduced via the beam-steering feature of the proposed antenna, the wireless channel can be controlled by spatially moving the antenna in one axis. 
Simulated realized gain patterns at various array positions and phase shifter states are subsequently utilized in link and system level simulations to demonstrate the advantages of the proposed concept. It is shown that the system gain can be increased with the spatial adaptation capability of the antenna. Cognitive Radio Game Theory Scheduling Filtered Multitones Heterogeneous Networks Electrical and Computer Engineering
30	Classificação automática de textos por meio de aprendizado de máquina baseado em redes / Text automatic classification through machine learning based on networks Rafael Geraldeli Rossi 26 October 2015 Nowadays a massive amount of textual data is produced and stored every day in the form of e-mails, reports, articles, and posts on social networks or blogs. Processing, organizing, or managing this volume of textual data manually demands great human effort and is often impossible. Moreover, textual data contain embedded knowledge, and analyzing and extracting that knowledge manually also becomes infeasible given the large number of texts. As a result, computational techniques that require little human intervention and that support the organization, management, and extraction of knowledge from large amounts of text have gained prominence in recent years and are being applied in academia as well as in companies and organizations. Prominent among these techniques is automatic text classification, whose goal is to assign labels (identifiers of predefined categories) to text documents or portions of text. A viable way to perform automatic text classification is through machine learning algorithms, which can learn, generalize, or extract patterns of the classes in a collection based on the content and labels of text documents. 
Machine learning for automatic classification can be of three types: (i) supervised inductive, which considers only labeled documents to induce a classification model and classify new documents; (ii) semi-supervised transductive, which classifies the unlabeled documents of a collection based on labeled documents; and (iii) semi-supervised inductive, which considers labeled and unlabeled documents to induce a classification model and uses that model to classify new documents. Regardless of the type, the collections of text documents must be represented in a structured format for the machine learning algorithms. Documents are normally represented in a vector space model, in which each document is represented by a vector and each position of that vector corresponds to a term or attribute of the document collection. Algorithms based on the vector space model assume that both the documents and the terms or attributes are independent, which can degrade classification quality. An alternative to the vector space model is a network representation, which makes it possible to model relations between the entities of a text collection, such as documents and terms. This type of representation allows the extraction of class patterns that are hard to extract with algorithms based on the vector space model, and so can increase classification performance. In addition, a network representation can describe text collections using different types of objects and different types of relations, which makes it possible to capture different characteristics of the collections. However, the literature points to some challenges in combining machine learning algorithms with network representations of text collections to perform automatic text classification effectively. 
The main challenges addressed in this doctoral project are (i) the development of network representations that can be generated efficiently and that also allow efficient learning; (ii) networks that consider different types of objects and relations; (iii) network representations of text collections in different languages and domains; and (iv) efficient machine learning algorithms that make better use of network representations to increase the quality of automatic classification. In this doctoral project, methods were proposed and developed to generate networks that represent text collections, independent of domain and language, considering different types of objects and relations among those objects. Supervised inductive, semi-supervised inductive, and semi-supervised transductive machine learning algorithms were also proposed and developed, since no algorithms were found in the literature to handle certain types of relations, and to remedy the shortcomings of existing algorithms in classification performance and/or classification time. This thesis presents (i) an extensive empirical evaluation demonstrating the benefit of network representations for text classification relative to the vector space model, (ii) the impact of combining different types of relations in a single network, and (iii) evidence that the proposed network-based algorithms can surpass the classification performance of traditional and state-of-the-art algorithms, for both supervised and semi-supervised learning. The solutions proposed in this thesis proved useful and advisable for a variety of applications involving classification of texts from different domains, with different characteristics, or with different numbers of labeled documents. 
/ A massive amount of textual data, such as e-mails, reports, articles and posts on social networks or blogs, is generated and stored on a daily basis. Manually processing, organizing and managing this huge amount of text requires considerable human effort, and these tasks are sometimes impossible to carry out in practice. Moreover, manually extracting the knowledge embedded in textual data is also infeasible due to the large amount of text. Thus, computational techniques that require little human intervention and allow the organization, management and extraction of knowledge from large amounts of text have gained attention in recent years and have been applied in academia, companies and organizations. The tasks mentioned above can be carried out through automatic text classification, in which labels (identifiers of predefined categories) are assigned to texts or portions of texts. A viable way to perform automatic text classification is through machine learning algorithms, which are able to learn, generalize or extract patterns from the classes of text collections based on the content and labels of the texts. There are three types of machine learning algorithms for automatic classification: (i) inductive supervised, in which only labeled documents are considered to induce a classification model and this model is used to classify new documents; (ii) transductive semi-supervised, in which all known unlabeled documents are classified based on some labeled documents; and (iii) inductive semi-supervised, in which labeled and unlabeled documents are considered to induce a classification model in order to classify new documents. Regardless of the type of learning algorithm, the texts of a collection must be represented in a structured format that the algorithms can interpret. 
Usually, the texts are represented in a vector space model, in which each text is represented by a vector and each dimension of the vector corresponds to a term or feature of the text collection. Algorithms based on the vector space model assume that texts, terms or features are independent, and this assumption can degrade classification performance. Networks can be used as an alternative to vector space model representations. Networks allow the representation of relations among the entities of a text collection, such as documents and terms. This type of representation allows the extraction of patterns that are not extracted by algorithms based on the vector space model. Moreover, text collections can be represented by networks composed of different types of entities and relations, which enables the extraction of different patterns from the texts. However, some challenges must be solved before machine learning algorithms and network-based representations can be combined to perform automatic text classification efficiently. The main challenges addressed in this doctoral project are (i) the development of network-based representations that can be generated efficiently and that also allow efficient learning; (ii) the development of networks that represent different types of entities and relations; (iii) the development of networks that can represent texts written in different languages and about different domains; and (iv) the development of efficient learning algorithms that make better use of the network-based representations and increase classification performance. In this doctoral project we proposed and developed methods to represent text collections as networks that consider different types of entities and relations and that can represent texts written in any language or from any domain. 
We also proposed and developed supervised inductive, semi-supervised transductive and semi-supervised inductive learning algorithms to interpret and learn from the proposed network-based representations, since there were no algorithms to handle certain types of relations considered in this thesis. The proposed algorithms also aim for higher classification performance and faster classification than the existing network-based algorithms. In this doctoral thesis we present (i) an extensive empirical evaluation demonstrating the benefits of using network-based representations for text classification, (ii) the impact of combining different types of relations in a single network and (iii) evidence that the proposed network-based algorithms are able to surpass the classification performance of traditional and state-of-the-art algorithms, considering both supervised and semi-supervised learning. The solutions proposed in this doctoral project proved advisable for many applications involving classification of texts from different domains, areas or characteristics, or with different numbers of labeled documents. Aprendizado de máquina Classificação de textos Propagação de rótulos Redes heterogêneas Heterogeneous networks Label propagation Machine learning Text classification
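To make the abstract's idea of a network representation and transductive classification concrete, here is a minimal sketch in Python. It is not the thesis's algorithm: it assumes a plain document-term bipartite network weighted by term frequency and a simple averaging form of label propagation. All names (`build_bipartite`, `propagate_labels`, the toy documents and classes) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def build_bipartite(docs):
    """Return {doc_id: Counter(term -> frequency)}, i.e. weighted document-term edges."""
    return {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}

def propagate_labels(edges, seeds, classes, iterations=10):
    """Toy transductive label propagation over a document-term bipartite network.

    Assumes every document contains at least one term.
    """
    # One class-score vector per document; labeled (seed) documents stay clamped.
    doc_scores = {d: {c: 0.0 for c in classes} for d in edges}
    for d, c in seeds.items():
        doc_scores[d][c] = 1.0

    for _ in range(iterations):
        # Documents -> terms: a term takes the frequency-weighted average
        # of the scores of the documents in which it occurs.
        term_scores = defaultdict(lambda: {c: 0.0 for c in classes})
        term_weight = defaultdict(float)
        for d, terms in edges.items():
            for t, w in terms.items():
                term_weight[t] += w
                for c in classes:
                    term_scores[t][c] += w * doc_scores[d][c]
        for t in term_scores:
            for c in classes:
                term_scores[t][c] /= term_weight[t]

        # Terms -> documents: an unlabeled document takes the
        # frequency-weighted average of the scores of its terms.
        for d, terms in edges.items():
            if d in seeds:
                continue
            total = sum(terms.values())
            doc_scores[d] = {
                c: sum(w * term_scores[t][c] for t, w in terms.items()) / total
                for c in classes
            }

    # Assign each document the class with the highest score.
    return {d: max(classes, key=lambda c: doc_scores[d][c]) for d in edges}

# Hypothetical toy collection: two labeled and two unlabeled documents.
docs = {
    "d1": "football match goal",
    "d2": "election vote parliament",
    "d3": "goal scored in the match",
    "d4": "parliament vote on budget",
}
seeds = {"d1": "sports", "d2": "politics"}
classes = ["sports", "politics"]

edges = build_bipartite(docs)
predicted = propagate_labels(edges, seeds, classes)
print(predicted)
```

In this sketch the unlabeled documents receive labels only through the terms they share with labeled documents, which is the kind of relation a vector space model that treats documents as independent does not exploit directly.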
