Global ETD Search

671	Sharing the love : a generic socket API for Hadoop Mapreduce Yee, Adam J. 01 January 2011 (has links) Hadoop is a popular software framework written in Java that performs data-intensive distributed computations on a cluster. It includes Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS has known scalability limitations due to its single NameNode, which holds the entire file system namespace in RAM on one computer. The NameNode can therefore store only a limited number of file names, depending on its RAM capacity. The solution to furthering scalability is distributing the namespace, similar to how file data is divided into chunks and stored across cluster nodes. Hadoop has an abstract file system API which is extended to integrate HDFS, and which has also been extended to integrate the file systems S3, CloudStore, Ceph and PVFS. Ceph and PVFS already distribute the namespace, while others such as Lustre are making the conversion. Google announced in 2009 that it had been implementing a distributed namespace for the Google File System to achieve greater scalability. The Generic Hadoop API is created from Hadoop's abstract file system API. It speaks a simple communication protocol that can integrate any file system which supports TCP sockets. By providing a file-system-agnostic API, future work with other file systems might provide ways of surpassing Hadoop's current scalability limitations. Furthermore, the new API eliminates the need for customizing Hadoop's Java implementation, and instead moves the implementation to the file system itself. Thus, developers wishing to integrate their new file system with Hadoop are not responsible for understanding the details of Hadoop's internal operation. The API is tested on a homogeneous, four-node cluster with OrangeFS. Initial OrangeFS I/O throughputs are 67% of HDFS' write throughput and 74% of HDFS' read throughput. Compared with an alternate method of integrating with OrangeFS (a POSIX kernel interface), however, write and read throughput are increased by 23% and 7%, respectively. Apache Hadoop (Computer file) MapReduce (Computer program) Cluster analysis Data processing Computer algorithms Computer Sciences
672	New Primitives for Tackling Graph Problems and Their Applications in Parallel Computing Zhong, Peilin January 2021 (has links) We study fundamental graph problems under parallel computing models. In particular, we consider two parallel computing models: Parallel Random Access Machine (PRAM) and Massively Parallel Computation (MPC). The PRAM model is a classic model of parallel computation. The efficiency of a PRAM algorithm is measured by its parallel time and the number of processors needed to achieve the parallel time. The MPC model is an abstraction of modern massively parallel computing systems such as MapReduce, Hadoop and Spark. The MPC model captures coarse-grained computation on large data well: data is distributed to processors, each of which has a sublinear (in the input data) amount of local memory, and we alternate between rounds of computation and rounds of communication, where each machine can communicate an amount of data as large as the size of its memory. We usually desire fully scalable MPC algorithms, i.e., algorithms that can work for any local memory size. The efficiency of a fully scalable MPC algorithm is measured by its parallel time and the total space usage (the local memory size times the number of machines). Consider an 𝑛-vertex 𝑚-edge undirected graph 𝐺 (either weighted or unweighted) with diameter 𝐷 (the largest diameter of its connected components). Let 𝑁=𝑚+𝑛 denote the size of 𝐺. We present a series of efficient (randomized) parallel graph algorithms with theoretical guarantees. Several results are listed as follows: 1) Fully scalable MPC algorithms for graph connectivity and spanning forest using 𝑂(𝑁) total space and 𝑂(log 𝐷 · log log_{𝑁/𝑛} 𝑛) parallel time. 2) Fully scalable MPC algorithms for 2-edge and 2-vertex connectivity using 𝑂(𝑁) total space, where the 2-edge connectivity algorithm needs 𝑂(log 𝐷 · log log_{𝑁/𝑛} 𝑛) parallel time, and the 2-vertex connectivity algorithm needs 𝑂(log 𝐷 · log² log_{𝑁/𝑛} 𝑛 + log 𝐷' · log log_{𝑁/𝑛} 𝑛) parallel time. Here 𝐷' denotes the bi-diameter of 𝐺. 3) PRAM algorithms for graph connectivity and spanning forest using 𝑂(𝑁) processors and 𝑂(log 𝐷 · log log_{𝑁/𝑛} 𝑛) parallel time. 4) PRAM algorithms for (1 + 𝜖)-approximate shortest path and (1 + 𝜖)-approximate uncapacitated minimum cost flow using 𝑂(𝑁) processors and poly(log 𝑛) parallel time. These algorithms are built on a series of new graph algorithmic primitives which may be of independent interest. Computer science Computer algorithms Parallel programming (Computer science) SPARK (Computer program language) MapReduce (Computer file) Apache Hadoop
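For intuition about why the diameter 𝐷 appears in the connectivity bounds of record 672, the following is a minimal sketch of naive synchronous min-label propagation. It is not one of the thesis's algorithms: it is the simple baseline in which every vertex repeatedly adopts the smallest label in its neighbourhood, which can need up to 𝐷 rounds, whereas the results listed above depend only logarithmically on 𝐷. The function name and the example graph are illustrative assumptions.

```python
def label_propagation_components(n, edges):
    """Naive baseline (assumed for illustration, not the thesis's method).

    Every round, each vertex takes the minimum label among itself and its
    neighbours. Returns the final labels and the number of rounds in which
    at least one label changed.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    labels = list(range(n))  # each vertex starts with its own id as label
    rounds = 0
    while True:
        new_labels = [
            min([labels[v]] + [labels[u] for u in adj[v]]) for v in range(n)
        ]
        if new_labels == labels:  # fixed point: labels identify components
            return labels, rounds
        labels = new_labels
        rounds += 1


# A path 0-1-2-3 (diameter 3) plus a separate edge 4-5.
labels, rounds = label_propagation_components(
    6, [(0, 1), (1, 2), (2, 3), (4, 5)]
)
print(labels, rounds)
```

On this example the smallest label has to travel the full length of the path, so the number of rounds equals the diameter of the largest component.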
673	Object Parallel Spatio-Temporal Analysis and Modeling System Rex, David Bruce 01 January 1993 (has links) The dissertation will outline an object-oriented model from which a next-generation GIS can be derived. The requirements for a spatial information analysis and modeling system can be broken into three primary functional classes: data management (data classification and access), analysis (modeling, optimization, and simulation) and visualization (display of data). These three functional classes can be considered as the primary colors of the spectrum from which the different shades of spatial analysis are composed. Object classes will be developed which will be designed to manipulate the three primary functions as required by the user and the data. Computer algorithms Neural networks (Computer science)
674	Image Analytic Tools for Optical Coherence Tomography Tissue Characterization and Robust Learning Huang, Ziyi January 2023 (has links) Computer-aided analysis is poised to play an increasingly prominent role in medicine and healthcare. Benefiting from increasing computing power, various machine learning frameworks have been developed in the biomedical field, bringing significant improvements in real-world clinical applications. However, for many diseases, the development of these life-supporting algorithms is still in its infancy. To bridge this gap, this thesis is dedicated to developing efficient algorithms for better image intervention and to addressing data quality challenges in machine learning algorithms, in order to provide direct guidance for real-world clinical applications. With these goals, three topics are explored in depth. First, we develop a novel tissue analysis framework for cardiac substrate identification and tissue heterogeneity assessment. In particular, we use model uncertainty to measure tissue structure information, offering a means of extracting tissue heterogeneity information non-invasively for real-time imaging and processing. The tissue analysis framework in the first aim is based on a fully supervised technique, which relies heavily on the availability of large-scale datasets with accurate annotations. Such high-quality datasets are extremely time-consuming to acquire, especially for biomedical segmentation tasks. To lessen the need for labeling, we further develop three weakly supervised learning frameworks to address data and labeling challenges caused by limited data resources. Finally, we develop an in-vivo tissue analysis framework on cardiac datasets, aiming to provide real-time guidance for clinical ablation procedures. Our models could contribute to the improvement of ablation treatment by identifying the ablation targets and avoiding critical structures within the heart. 
Electrical engineering Optical coherence tomography Computer algorithms Tissues--Analysis Medical care--Computer programs
675	Support vector machines, generalization bounds, and transduction Kroon, Rodney Stephen 12 1900 (has links) Thesis (MComm)--University of Stellenbosch, 2003. / Please refer to full text for abstract. Machine learning Computer algorithms PAC bounds Support vector machine (SVM) Transductive bounds Model selection Theses -- Computer science Dissertations -- Computer science Theses -- Mathematics Dissertations -- Mathematics
676	Reinforcement learning for routing in communication networks Andrag, Walter H. 04 1900 (has links) Thesis (MSc)--Stellenbosch University, 2003. / ENGLISH ABSTRACT: Routing policies for packet-switched communication networks must be able to adapt to changing traffic patterns and topologies. We study the feasibility of implementing an adaptive routing policy using the Q-Learning algorithm, which learns sequences of actions from delayed rewards. The Q-Routing algorithm adapts a network's routing policy based on local information alone and converges toward an optimal solution. We demonstrate that Q-Routing is a viable alternative to other adaptive routing methods such as Bellman-Ford. We also study variations of Q-Routing designed to better explore possible routes, to take limited buffer size into consideration, and to optimize multiple objectives. / AFRIKAANS SUMMARY (translated): Routing in communication networks must be able to adapt to changes in network topology and traffic distributions. We study the usefulness of an adaptive routing algorithm based on the Q-Learning algorithm, which makes it possible to take a sequence of decisions based on delayed rewards. The routing algorithm uses only local information to make routing decisions and converges to an optimal solution. We demonstrate that the routing algorithm is a good alternative for adaptive routing, since in many respects it performs better than the Bellman-Ford algorithm. We also study variations of the routing algorithm that can discover better paths, use less memory at network elements, and optimize more than one objective function. Computer networks Computer algorithms Telecommunication Routing policies Packet-switched communication networks Dissertations -- Computer science Theses -- Computer science Dissertations -- Mathematical sciences Theses -- Mathematical sciences
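The idea in record 676 of learning routes from local information alone can be illustrated with the standard Q-Routing update of Boyan and Littman: a node keeps, for each destination and neighbour, an estimate of the remaining delivery time, and nudges it toward the delay it observed plus the neighbour's own best estimate. The sketch below assumes that textbook rule; the table layout, function names and the numbers in the example are hypothetical and are not taken from the thesis.

```python
def q_routing_update(q_table, node, dest, neighbor,
                     queue_delay, link_delay, learning_rate=0.5):
    """One Q-Routing update at `node` after forwarding a packet to `neighbor`.

    q_table[x][d][y] is node x's estimate of the time to deliver a packet
    bound for d when it is sent via neighbour y (assumed table layout).
    """
    if neighbor == dest:
        remaining = 0.0  # the packet has arrived
    else:
        # The neighbour reports its own best estimate for the rest of the trip.
        remaining = min(q_table[neighbor][dest].values())
    old = q_table[node][dest][neighbor]
    target = queue_delay + link_delay + remaining
    q_table[node][dest][neighbor] = old + learning_rate * (target - old)
    return q_table[node][dest][neighbor]


def choose_next_hop(q_table, node, dest):
    """Greedy policy: forward via the neighbour with the lowest estimate."""
    estimates = q_table[node][dest]
    return min(estimates, key=estimates.get)


# Hypothetical three-node neighbourhood: A can reach D via B or via C.
q_table = {
    "A": {"D": {"B": 4.0, "C": 4.0}},
    "B": {"D": {"D": 1.0}},
    "C": {"D": {"D": 3.0}},
}
via_b = q_routing_update(q_table, "A", "D", "B", queue_delay=1.0, link_delay=1.0)
via_c = q_routing_update(q_table, "A", "D", "C", queue_delay=1.0, link_delay=1.0)
print(via_b, via_c, choose_next_hop(q_table, "A", "D"))
```

Each update uses only quantities available at the sending node and its immediate neighbour, which is what makes the scheme decentralized.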
677	Aplicativo computacional para obtenção de probabilidades a priori de classificação errônea em experimentos agronômicos [Computer application for obtaining a priori probabilities of misclassification in agronomic experiments] / Padovani, Carlos Roberto Pereira, 1975- January 2007 (has links) Advisor: Flávio Ferrari Aragon / Committee: Adriano Wagner Ballarin / Committee: Luís Fernando Nicolosi Bravin / Committee: Rui Vieira de Moraes / Committee: Sandra Fiorelli de Almeida P. Simeão / Summary (translated from Portuguese): In the Agronomic Sciences there are many situations in which several response variables are observed on the plots or experimental units. In these situations, a case of practical interest for agronomic experimentation is the construction of similarity regions among the plots, for discriminating between experimental groups and/or classifying new experimental units into one of those regions. To be usable in practice, classification or discrimination methods must retain a considerable amount of information about the variability structure of the data and, above all, must allocate new individuals to the groups with high reliability and competence, as shown by the correct assignment of those individuals. There are several procedures for measuring the degree of correct decision (accuracy) of the information provided by classification methods. Practically all of them use the probability of misclassification as the quality indicator; some are frequentist (the probability is estimated by the relative frequency of occurrences: nonparametric methods) and others are based on the probability density functions of the populations (parametric methods). The main difference among these procedures is how the calculation of the misclassification probability is conceptualized. The present study aims to present some procedures for estimating these probabilities and to develop software that obtains the estimates, taking the generalized Mahalanobis distance as the procedure associated with the probability density function for populations with a multivariate normal distribution. The software will be freely accessible and easy to use for researchers in applied areas, and is complemented by a user's manual and an application example involving genetic divergence in sunflower. / Abstract: In the Agronomical Sciences, mainly in studies involving biomass production and rational use of energy, there are many situations in which several response variables are observed on the plots or experimental units. In these situations, a case of practical interest to agronomical experimentation is the construction of similarity regions among plots and/or the classification of new experimental units. To be useful, classification methods must retain a considerable amount of the information in the data and, above all, allocate new individuals with high fidelity and competence. There are several procedures for measuring the accuracy of the information supplied by a discrimination method. Practically all of them use the misclassification probability (erroneous classification) as the quality indicator. The main difference among these evaluation methods is how they characterize the misclassification probability. The aim is therefore to present some procedures for estimating misclassification probabilities, involving relative-frequency and distribution-based methods, and to develop software to obtain the estimates that is accessible and easy to use for researchers in applied areas, complementing the study with a user's manual and examples on rational energy use and biomass energy. / Doctorate Data structures (Computing). Computer algorithms. Linear discriminant function. Multivariate analysis. Mahalanobis distance.
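To make the parametric approach in record 677 concrete, the sketch below computes the squared Mahalanobis distance between two group means and the resulting misclassification probability. It assumes the textbook setting, which the record does not spell out: two multivariate normal populations with a common covariance matrix, equal prior probabilities, and the linear discriminant rule, for which the probability of misclassification is Φ(−Δ/2), where Δ is the Mahalanobis distance and Φ the standard normal distribution function. The code is restricted to two variables to stay dependency-free; the function names and data are hypothetical and this is not the software described in the thesis.

```python
import math


def mahalanobis_sq_2d(mu1, mu2, cov):
    """Squared Mahalanobis distance between two bivariate means.

    cov is the common 2x2 covariance matrix as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix must be invertible")
    dx = mu1[0] - mu2[0]
    dy = mu1[1] - mu2[1]
    # Quadratic form with the inverse (1/det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def misclassification_probability(mu1, mu2, cov):
    """P(error) = Phi(-Delta / 2) under the assumptions in the lead-in."""
    delta = math.sqrt(mahalanobis_sq_2d(mu1, mu2, cov))
    return normal_cdf(-delta / 2.0)


# Hypothetical groups: means two units apart, identity covariance.
identity = ((1.0, 0.0), (0.0, 1.0))
delta_sq = mahalanobis_sq_2d((0.0, 0.0), (2.0, 0.0), identity)
p_error = misclassification_probability((0.0, 0.0), (2.0, 0.0), identity)
print(delta_sq, round(p_error, 4))
```

The further apart the group means are relative to the within-group variability, the larger Δ and the smaller the probability of allocating a new unit to the wrong group.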
678	Smart Broadcast Protocol Design For Vehicular Ad hoc Networks Unknown Date (has links) Multi-hop broadcast is one of the main approaches to disseminating data in VANETs. It is therefore important to design a reliable multi-hop broadcast protocol that satisfies both reachability and bandwidth consumption requirements. In a dense network, where vehicles are very close to each other, the number of vehicles needed to rebroadcast the message should be small enough to avoid a broadcast storm, but large enough to meet the reachability requirement. If the network is sparse, a higher number of vehicles is needed to retransmit to provide a higher reachability level. There is thus a tradeoff between reachability and bandwidth consumption. In this work, considering the above-mentioned challenges, we design a number of smart broadcast protocols and evaluate their performance in various network density scenarios. We use a fuzzy logic technique to determine the qualification of vehicles to be forwarders, resulting in reachability enhancement. Then we design a bandwidth-efficient fuzzy logic-assisted broadcast protocol which aggressively suppresses the number of retransmissions. We also propose an intelligent hybrid protocol that adapts to local network density. In order to avoid packet collisions and enhance reachability, we design a cross-layer statistical broadcast protocol, in which the contention window size is adjusted based on the local density information. We also look into the multi-hop broadcast problem in a game-theoretic setting. In this scenario, vehicles are players and their strategy is either to volunteer and rebroadcast the received message or to defect and wait for others to rebroadcast. We introduce a volunteer dilemma game-inspired broadcast scheme to estimate the probability of forwarding for the set of potential forwarding vehicles. In this scheme we also introduce a fuzzy logic-based contention window size adjustment system. Finally, based on the estimated spatial distribution of vehicles, we design a transmission range adaptive scheme with a fuzzy logic-assisted contention window size system, in which a Bloom filter method is used to mitigate overhead. Extensive experiments are carried out with simulation tools to evaluate the performance of the proposed schemes. The results confirm the relative advantages of the proposed protocols for different density scenarios. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2017. / FAU Electronic Theses and Dissertations Collection Mobile communication systems. Wireless sensor networks. Computer algorithms.
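The game-theoretic view in record 678 can be illustrated with the classic symmetric volunteer's dilemma. Assume each of N candidate forwarders gains a benefit b if at least one vehicle rebroadcasts and pays a cost c < b if it rebroadcasts itself. In the symmetric mixed-strategy equilibrium each player volunteers with probability 1 − (c/b)^(1/(N−1)). The sketch below uses that textbook result; the record does not state the payoff model actually used in the dissertation, so the function names, the parameters and the values are assumptions.

```python
def volunteer_probability(num_candidates, cost, benefit):
    """Equilibrium rebroadcast probability in the symmetric volunteer's dilemma.

    Assumed payoffs: benefit if anyone forwards, minus cost for forwarding.
    """
    if num_candidates < 1:
        raise ValueError("need at least one candidate forwarder")
    if not 0 < cost < benefit:
        raise ValueError("require 0 < cost < benefit")
    if num_candidates == 1:
        return 1.0  # a lone vehicle must forward or the message dies
    return 1.0 - (cost / benefit) ** (1.0 / (num_candidates - 1))


def prob_message_forwarded(num_candidates, cost, benefit):
    """Probability that at least one candidate rebroadcasts."""
    p = volunteer_probability(num_candidates, cost, benefit)
    return 1.0 - (1.0 - p) ** num_candidates


# Hypothetical cost/benefit ratio of 1/4 in a sparse and a denser neighbourhood.
p_sparse = volunteer_probability(2, cost=1.0, benefit=4.0)
p_dense = volunteer_probability(5, cost=1.0, benefit=4.0)
print(p_sparse, round(p_dense, 4))
```

The individual forwarding probability falls as the number of candidates grows, which matches the intuition in the abstract that dense networks should suppress rebroadcasts while sparse networks need more of them.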
679	Mechanisms for prolonging network lifetime in wireless sensor networks Unknown Date (has links) Sensors are used to monitor and control the physical environment. A Wireless Sensor Network (WSN) is composed of a large number of sensor nodes that are densely deployed either inside the phenomenon or very close to it [18][5]. Sensor nodes measure various parameters of the environment and transmit the collected data to one or more sinks, using hop-by-hop communication. Once a sink receives sensed data, it processes the data and forwards it to the users. Sensors are usually battery powered and hard to recharge, so they run for only a limited time before they deplete their energy and stop functioning. Optimizing energy consumption to prolong network lifetime is an important issue in wireless sensor networks. In mobile sensor networks, sensors can self-propel via springs [14] or wheels [20], or they can be attached to transporters, such as robots [20] and vehicles [36]. In static sensor networks with uniform deployment (uniform density), sensors closest to the sink die first, which causes uneven energy consumption and limits network lifetime. In the dissertation, nonuniform density is studied and analyzed so that the energy consumption within the monitored area is balanced and the network lifetime is prolonged. Several mechanisms are proposed to relocate the sensors after the initial deployment to achieve the desired density while minimizing the total moving cost. Using mobile relays for data gathering is another energy-efficient approach. Mobile sensors can be used as ferries, which carry data to the sink for static sensors so that expensive multi-hop communication and long-distance communication are reduced. In this thesis, we propose a mobile relay-based routing protocol that considers both energy efficiency and data delivery delay. It can be applied to both event-based reporting and periodic reporting applications. / Another mechanism used to prolong network lifetime is sensor scheduling. One of the major components that consume energy is the radio. One method to conserve energy is to put sensors into sleep mode when they are not actively participating in sensing or data relaying. This dissertation studies sensor scheduling mechanisms for composite event detection. It chooses a set of active sensors to perform sensing and data relaying, and all other sensors go to sleep to save energy. After some time, another set of active sensors is chosen. Thus sensors work alternately to prolong network lifetime. / by Yinying Yang. / Vita. / Thesis (Ph.D.)--Florida Atlantic University, 2010. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2010. Mode of access: World Wide Web. Sensor networks--Design and construction Computer algorithms Computer network protocols
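The claim in record 679 that sensors closest to the sink die first under uniform deployment can be checked with a small back-of-the-envelope model. The sketch assumes a simplified ring model that is not taken from the dissertation: the field is split into K concentric rings of equal width around the sink, node count per ring is proportional to ring area (2i − 1 for ring i), every node generates one packet per round, and packets are relayed inward ring by ring with the load shared evenly within a ring.

```python
def relay_load_per_node(num_rings):
    """Packets each node transmits per round, for rings 1 (innermost) to K.

    Simplified ring model assumed for illustration only.
    """
    loads = []
    for i in range(1, num_rings + 1):
        nodes_in_ring = 2 * i - 1  # proportional to the area of ring i
        # Ring i sends everything generated in rings i..K:
        # sum of (2j - 1) for j = i..K equals K^2 - (i - 1)^2.
        packets_through_ring = num_rings ** 2 - (i - 1) ** 2
        loads.append(packets_through_ring / nodes_in_ring)
    return loads


loads = relay_load_per_node(4)
print(loads)
```

With four rings, a node next to the sink transmits sixteen times as many packets as a node on the outer edge, so its battery drains far sooner. Under this model, balancing energy use would require placing more nodes where the per-node load is highest, which is the motivation for the nonuniform density studied in the dissertation.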
680	An Ant Inspired Dynamic Traffic Assignment for VANETs: Early Notification of Traffic Congestion and Traffic Incidents Unknown Date (has links) Vehicular Ad hoc NETworks (VANETs) are a subclass of Mobile Ad hoc NETworks and represent a relatively new and very active field of research. In the near future, VANETs will enable applications that dramatically improve roadway safety and traffic efficiency. There is a need to increase traffic efficiency as the gap between traveled miles and physical lane miles keeps increasing. The Dynamic Traffic Assignment problem tries to dynamically distribute vehicles efficiently on the road network in accordance with their origins and destinations. We present a novel dynamic, decentralized and infrastructure-less algorithm to alleviate traffic congestion on road networks and to fill the void left by current algorithms, which are either static, centralized, or require infrastructure. The algorithm follows an online approach that seeks stochastic user equilibrium and assigns traffic as it evolves in real time, without prior knowledge of the traffic demand or the schedule of the cars that will enter the road network in the future. The Reverse Online Algorithm for the Dynamic Traffic Assignment inspired by Ant Colony Optimization for VANETs follows a metaheuristic approach that uses reports from other vehicles to update the vehicle's perceived view of the road network and change route if necessary. To alleviate the broadcast storm, spontaneous clusters are created around traffic incidents, and a threshold system based on the level of congestion is used to limit the number of incidents to be reported. Simulation results for the algorithm show a great improvement in travel time over routing based on shortest distance. Because VANET transceivers have a limited range, which would keep messages from reaching beyond 1,000 meters, we present a modified version of this algorithm that uses a rebroadcasting scheme. This rebroadcasting scheme has been successfully tested on roadways with segments of up to 4,000 meters. This is accomplished for the case of traffic flowing in a single direction on the roads. It is anticipated that future simulations will show further improvement when traffic in the other direction is introduced and vehicles travelling in that direction are allowed to use a store-carry-and-forward mechanism. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2016. / FAU Electronic Theses and Dissertations Collection Artificial intelligence. Intelligent transportation systems. Intelligent control systems. Mobile computing. Computer algorithms. Combinatorial optimization.
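As a rough illustration of the Ant Colony Optimization idea behind record 680, the sketch below applies the generic ACO rules to route choice: a route is picked with probability proportional to its pheromone level times a heuristic term (here the inverse of the reported travel time), and pheromone evaporates while fresh reports reinforce the route they describe. This is the textbook metaheuristic, not the Reverse Online Algorithm of the dissertation; the function names, the parameters alpha, beta and evaporation, and the route data are all hypothetical.

```python
def route_probabilities(pheromone, travel_time, alpha=1.0, beta=2.0):
    """Generic ACO transition rule over candidate routes (assumed form)."""
    weights = {
        route: (pheromone[route] ** alpha) * ((1.0 / travel_time[route]) ** beta)
        for route in pheromone
    }
    total = sum(weights.values())
    return {route: w / total for route, w in weights.items()}


def update_pheromone(pheromone, route, reported_travel_time, evaporation=0.1):
    """Evaporate all trails, then reinforce the route a vehicle reported on."""
    updated = {r: (1.0 - evaporation) * tau for r, tau in pheromone.items()}
    updated[route] += 1.0 / reported_travel_time  # faster trips deposit more
    return updated


# Two hypothetical routes with equal pheromone; the arterial is twice as slow.
pheromone = {"highway": 1.0, "arterial": 1.0}
travel_time = {"highway": 10.0, "arterial": 20.0}
probs = route_probabilities(pheromone, travel_time)

# A vehicle reports a fast 5-minute trip on the arterial.
pheromone_after = update_pheromone(pheromone, "arterial", 5.0)
print(probs, pheromone_after)
```

Because the choice stays probabilistic, some vehicles still take the slower route, which spreads load across the network instead of sending everyone onto the currently fastest road.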

Search results