511 |
Layout Detection and Table Recognition: Recent Challenges in Digitizing Historical Documents and Handwritten Tabular Data
Lehenmeier, Constantin, Burghardt, Manuel, Mischka, Bernadette 11 June 2024
In this paper, we discuss the computer-aided processing of handwritten tabular
records of historical weather data. The observationes meteorologicae, which are housed by the
Regensburg University Library, are one of the oldest collections of weather data in Europe.
Starting in 1771, meteorological data was consistently documented in a standardized form
over almost 60 years by several writers. The tabular structure, as well as the unconstrained
textual layout of comments and the use of historical characters, pose various challenges
in layout and text recognition. We present a customized strategy to digitize tabular and
handwritten data by combining various state-of-the-art methods for OCR processing to fit
the collection. Since the recognition of historical documents still poses major challenges,
we provide lessons learned from experimental testing during the first project stages. Our
results show that deep learning methods can be used for text recognition and layout detection.
However, they are less efficient for the recognition of tabular structures. Furthermore,
a tailored approach had to be developed for the historical meteorological characters during
the manual creation of ground truth data. The customized system achieved an accuracy
rate of 82% for the text recognition of the heterogeneous handwriting and 87% accuracy
for layout recognition of the tables.
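
The abstract does not say how the text recognition accuracy was measured. A common convention for handwriting recognition is character accuracy, defined as one minus the edit distance between the recognized text and the ground truth, divided by the ground-truth length. The sketch below assumes that convention; the function names `levenshtein` and `char_accuracy` are illustrative and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(ground_truth: str, recognized: str) -> float:
    """Character accuracy = 1 - (edit distance / ground-truth length), floored at 0."""
    if not ground_truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - levenshtein(ground_truth, recognized) / len(ground_truth))
```

Under this definition, a line of four characters with one misread character scores 0.75.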
|
512 |
Release and monitoring of Laricobius nigrinus (Coleoptera: Derodontidae) for biological control of the hemlock woolly adelgid in the eastern US
Mausel, Dave L. 10 December 2007
Different Laricobius nigrinus Fender release locations, numbers of predators, and timing of release were evaluated for biological control of the hemlock woolly adelgid (HWA), Adelges tsugae Annand (Hemiptera: Adelgidae). The predator established at 59% of the sites, and location was the most important factor related to establishment and abundance, HWA density, and hemlock vigor index. Cold locations had poor establishment or low abundance, declines in HWA density, and increases in hemlock vigor over time. Paired release and control sites detected a predator impact on HWA density, but densities remained high and tree vigor declined. The phenology of L. nigrinus, L. rubidus LeConte, and HWA was studied at a field insectary, and the species were highly synchronized. A cage exclusion study showed that HWA survival and density were lower and ovisac disturbance was higher when exposed to predation. To improve L. nigrinus monitoring, we compared beat sheets for adults and branch clipping for immatures, and the host searching behavior of L. nigrinus was studied to understand how it locates a tree and HWA. In the Appalachians, beat sheet sampling resulted in false negatives, as larvae were collected by branch clipping. Adults orientated to a tree visually, fed when prey were present and flew when absent, and showed different search patterns on infested versus uninfested trees. In Seattle, both sampling methods detected L. nigrinus because the predator was common. Predator:prey ratios were high at heavily infested sites in Seattle and low in the eastern US, where it has been released recently. Partial life tables were constructed for HWA sistentes at four sites for 2 yr in Seattle. Unspecified causes of nymph and adult mortality were high and L. nigrinus was the dominant predator of ovisacs. Adult L. nigrinus abundance was positively related to HWA density and immature abundance was related to ovisac density, indicating an aggregation and numerical response to its prey.
Laricobius nigrinus has not demonstrated complete biological control of HWA to date, but it may do so in the future and continued release is justified. / Ph. D.
|
513 |
A Linear Programming Method for Synthesizing Origin-Destination (O-D) Trip Tables from Traffic Counts for Inconsistent Systems
Lei, Peng 10 August 1998
Origin-Destination (O-D) trip tables represent the demand-supply information of each directed zonal-pair in a given region during a given period of time. This research develops a linear programming methodology for estimating O-D trip tables based on observed link volumes. In order to emphasize the nature of uncertainty in the data and in the problem, the developed model permits the user's knowledge of path travel time to vary within a band-width of values, and accordingly modifies the user-optimality principle. The data on the observed flows might also not be complete and need not be perfectly matched. In addition, a prior trip table could also be specified in order to guide the updating process via the model solution. To avoid the excessive computational demands required by a total enumeration of all possible paths between each O-D pair, a Column Generation Algorithm (CGA) is adopted to exploit the special structures of the model. Based on the known capacity of each link, a simple formula is suggested to calculate the cost for the links having unknown volumes. An indexed cost is introduced to avoid the consideration of unnecessary passing-through-zone paths, and an algorithm for solving the corresponding minimum-cost-path problem is developed. General principles on the design of an object-oriented code are presented, and some useful programming techniques are suggested for this special problem. Some test results on the related models are presented and compared, and different sensitivity analyses are performed based on different scenarios. Finally, several topics are recommended for future research. / Master of Science
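
For orientation, a generic path-based linear program for fitting path flows to possibly inconsistent link counts can be written as below. This is a textbook-style sketch and not the model developed in the thesis, which additionally handles travel-time bands, a prior trip table, and link costs. All symbols are introduced here for illustration: $A_o$ is the set of links with observed volumes $v_a$, $P_k$ is the set of paths for O-D pair $k$, $x_p$ is the flow on path $p$, $\delta_{ap}$ equals 1 if link $a$ lies on path $p$ and 0 otherwise, and $y_a^{+}, y_a^{-}$ absorb any mismatch between modeled and observed volumes.

```latex
\begin{aligned}
\min_{x,\,y^{+},\,y^{-}} \quad & \sum_{a \in A_o} \left( y_a^{+} + y_a^{-} \right) \\
\text{s.t.} \quad & \sum_{k} \sum_{p \in P_k} \delta_{ap}\, x_p + y_a^{+} - y_a^{-} = v_a, && a \in A_o, \\
& x_p \ge 0, \quad y_a^{+} \ge 0, \quad y_a^{-} \ge 0, \\
\text{with} \quad & T_k = \sum_{p \in P_k} x_p \quad \text{(estimated trips for O-D pair } k\text{)}.
\end{aligned}
```

Because the number of paths $x_p$ grows quickly with network size, column generation starts from a small subset of paths and adds a new path only when a minimum-cost-path subproblem shows that it can improve the objective.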
|
514 |
Preferences of Social Interaction for Environmental Attributes Among Grandparents Who Are Taking Care of Grandchildren in Two Chinese Residential Communities Located in Shanghai, China
Cao, Fan 21 June 2016
The present thesis examines questionnaire responses regarding optimal environmental attributes of public outdoor spaces for Chinese grandparents who are taking care of their grandchildren within selected urban residential communities in Shanghai, China. This thesis also assesses the needs of these grandparents providing childcare against the environmental attributes of urban public spaces. It uses the results to formulate design recommendations that will facilitate increased social interaction between grandparents with grandchildren and other persons in open public spaces of residential communities. Public spaces are often excellent locations for social interaction between grandparents and other persons within communities. Recently, there has been an increase in the number of Chinese grandparents providing childcare for their grandchildren, and many choose to spend time with grandchildren in these public open spaces. However, the needs and preferences of this demographic do not necessarily align with those of the general population.
The current literature has identified five primary environmental attributes (access, comfort, opportunities of meeting, potential sensory elements, visibility) related to social interaction, each composed of a variety of landscape elements and characteristics. A framework was constructed based on these five environmental attributes and a variety of landscape elements and characteristics, and used to formulate a questionnaire. A total of 46 grandparents who take care of their grandchildren and live in high-rise buildings were surveyed. The selected participants were observed watching over their grandchildren in open spaces or the accompanying facilities and were asked to express a level of preference for a series of landscape elements presented in a questionnaire. The survey also included questions regarding demographic information. Descriptive and inferential analyses were then carried out on the survey data.
The intended result of the study was a set of landscape architectural design recommendations that could be used to meet the preferences of this portion of society. Ideally, the findings will assist those involved in designing and managing outdoor environments in identifying the most salient environmental attributes for this growing sector of the Chinese community. The study could also help to prioritize interventions aimed at improving the use of open spaces and promoting social interaction among grandparents or between grandparents and other neighbors. The approach also identified which landscape elements were most likely to attract grandparents to visit and stay in neighborhoods' open spaces longer with their grandchildren. Ideally, an outdoor public space designed following this set of design recommendations would contain the preferred environmental attributes and landscape elements of grandparents and their grandchildren and would provide more opportunities for social interaction. / Master of Landscape Architecture
|
515 |
Applicability of Stormwater Best Management Practices in the Virginia Coastal Plain
Johnson, Rachael Diane 06 June 2016
The Virginia Runoff Reduction Method (RRM) was adopted in 2014 as a compliance tool for evaluation of stormwater volume and quality, and necessitates use of urban stormwater best management practices (BMPs) to meet regulatory standards. Coastal Virginia is characterized by flat terrain, shallow water tables, and low-permeability soils that may limit the application of BMPs as recommended by state regulations. Soil morphological features are often used to estimate the seasonal high water table (SHWT) for initial feasibility, but existing soil data misrepresented expected SHWT depths in the Virginia Beach, VA, study area. A GIS-based methodology relying on perennial surface water elevations and USGS groundwater monitoring data was developed to estimate the SHWT depth in Virginia Beach. The SHWT map was shown to be consistently more reliable than available predictions based on soil morphology, and was used as input to a BMP siting tool. The tool, known as BMP Checker, was developed to explore how flat terrain, shallow water tables, and poor soils influence BMP siting in coastal Virginia. The BMP Checker algorithm was validated on 11 Virginia Beach sites before application on 10,000 ft² (929 m²) area sections across the city. Citywide application showed that the most widely applicable BMPs in the study area include wet ponds that intercept groundwater and constructed wetlands. Conversely, sheet flow to conservation area and infiltration practices are the least applicable. Because the RRM assigns more credit to infiltration-based practices, sites in Virginia Beach may find it difficult to meet regulatory standards. / Master of Science
|
516 |
EdgeFn: A Lightweight Customizable Data Store for Serverless Edge Computing
Paidiparthy, Manoj Prabhakar 01 June 2023
Serverless Edge Computing is an extension of the serverless computing paradigm that enables the deployment and execution of modular software functions on resource-constrained edge devices. However, it poses several challenges due to the edge network's dynamic nature and serverless applications' latency constraints. In this work, we introduce EdgeFn, a lightweight distributed data store for the serverless edge computing system. While serverless computing platforms simplify the development and automated management of software functions, running serverless applications reliably on resource-constrained edge devices poses multiple challenges. These challenges include a lack of flexibility, minimal control over management policies, and high data-shipping and cold-start latencies. EdgeFn addresses these challenges by providing distributed data storage for serverless applications and by allowing users to define custom policies that affect the life cycle of serverless functions and their objects. First, we study the challenges existing serverless systems face in adapting to the edge environment. Second, we propose a distributed data store on top of a Distributed Hash Table (DHT) based Peer-to-Peer (P2P) overlay, which achieves data locality by co-locating the function and its data. Third, we implement programmable callbacks for storage operations, which users can leverage to define custom policies for their applications. We also define some use cases that can be built using the callbacks. Finally, we evaluate EdgeFn's scalability and performance using an industry-generated trace workload and real-world edge applications. / Master of Science
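
The two mechanisms named in the abstract, key placement on a DHT and programmable callbacks on storage operations, can be illustrated with a toy single-process store. This is a minimal sketch under stated assumptions and not EdgeFn's actual design or API: the class `TinyStore`, its methods, and the use of a SHA-1 hash ring with successor placement are all hypothetical choices made for illustration.

```python
import bisect
import hashlib
from collections import defaultdict


def ring_hash(key: str) -> int:
    """Map a string to a point on the hash ring (SHA-1 digest as an integer)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class TinyStore:
    """Toy DHT-style store: a key lives on the first node clockwise from its hash."""

    def __init__(self, node_names):
        self.ring = sorted((ring_hash(n), n) for n in node_names)
        self.points = [point for point, _ in self.ring]
        self.data = {n: {} for n in node_names}   # per-node local storage
        self.callbacks = defaultdict(list)        # operation name -> user callbacks

    def owner(self, key: str) -> str:
        """Return the node responsible for `key`, wrapping around the ring."""
        i = bisect.bisect_right(self.points, ring_hash(key)) % len(self.ring)
        return self.ring[i][1]

    def on(self, op: str, fn) -> None:
        """Register a user-defined policy callback for the 'put' or 'get' operation."""
        self.callbacks[op].append(fn)

    def put(self, key: str, value):
        """Store the value on the owning node, then run every 'put' callback."""
        node = self.owner(key)
        self.data[node][key] = value
        for fn in self.callbacks["put"]:
            fn(node, key, value)
        return node

    def get(self, key: str):
        """Read the value from the owning node, then run every 'get' callback."""
        node = self.owner(key)
        value = self.data[node].get(key)
        for fn in self.callbacks["get"]:
            fn(node, key, value)
        return value
```

In this sketch, scheduling a function on `store.owner(key)` is what would co-locate it with its data, and a callback registered with `store.on("get", ...)` is where a user policy such as access logging or cache warming would hook in.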
|
517 |
GraphDHT: Scaling Graph Neural Networks' Distributed Training on Edge Devices on a Peer-to-Peer Distributed Hash Table Network
Gupta, Chirag 03 January 2024
This thesis presents an innovative strategy for distributed Graph Neural Network (GNN) training, leveraging a peer-to-peer network of heterogeneous edge devices interconnected through a Distributed Hash Table (DHT). As GNNs become increasingly vital in analyzing graph-structured data across various domains, they pose unique challenges in computational demands and privacy preservation, particularly when deployed for training on edge devices like smartphones. To address these challenges, our study introduces the Adaptive Load-Balanced Partitioning (ALBP) technique in the GraphDHT system. This approach optimizes the division of graph datasets among edge devices, tailoring partitions to the computational capabilities of each device. By doing so, ALBP ensures efficient resource utilization across the network, significantly improving upon traditional participant selection strategies that often overlook the potential of lower-performance devices. Our methodology's core is weighted graph partitioning and model aggregation in GNNs, based on partition ratios, improving training efficiency and resource use. ALBP promotes inclusive device participation in training, overcoming computational limits and privacy concerns in large-scale graph data processing. Utilizing a DHT-based system enhances privacy in the peer-to-peer setup. The GraphDHT system, tested across various datasets and GNN architectures, shows ALBP's effectiveness in distributed GNN training and its broad applicability in different domains and structures. This contributes to applied machine learning, especially in optimizing distributed learning on edge devices. / Master of Science / Graph Neural Networks (GNNs) are a type of machine learning model that focuses on analyzing data structured like a network, such as social media connections or biological systems.
These models can help identify patterns and make predictions in various tasks, but training them on large-scale datasets can require significant computing power and careful handling of sensitive data. This research proposes a new method for training GNNs on small devices, like smartphones, by dividing the data into smaller pieces and using a peer-to-peer (p2p) network for communication between devices. This approach allows the devices to work together and learn from the data while keeping sensitive information private. The main contributions of this research are threefold: (1) examining existing ways to divide network data and how they can be used for training GNNs on small devices, (2) improving the training process by creating a localized, decentralized network of devices that can communicate and learn together, and (3) testing the method on different types of datasets and GNN models, showing that it works well across a variety of situations. To sum up, this research offers a novel way to train GNNs on small devices, allowing for more efficient learning and better protection of sensitive information.
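
The idea of sizing each device's partition by its capability and then weighting its model by the same ratio can be sketched as follows. This is an illustrative simplification and not the ALBP algorithm itself: the capability scores, the function names, and the contiguous slicing of node IDs are assumptions, and a real graph partitioner would also try to minimize the edges cut between partitions.

```python
def partition_ratios(capabilities):
    """Share of the graph for each device, proportional to its capability score."""
    total = sum(capabilities.values())
    return {dev: cap / total for dev, cap in capabilities.items()}


def split_nodes(node_ids, ratios):
    """Assign contiguous slices of node_ids to devices according to their ratios."""
    parts, start, cumulative = {}, 0, 0.0
    devices = list(ratios)
    for i, dev in enumerate(devices):
        cumulative += ratios[dev]
        # The last device takes whatever remains so that no node is dropped.
        end = len(node_ids) if i == len(devices) - 1 else round(cumulative * len(node_ids))
        parts[dev] = node_ids[start:end]
        start = end
    return parts


def aggregate(local_weights, ratios):
    """Average per-device parameter vectors, weighting each by its partition ratio."""
    dim = len(next(iter(local_weights.values())))
    return [sum(ratios[dev] * w[i] for dev, w in local_weights.items()) for i in range(dim)]
```

With capability scores of 1 and 3, the weaker device receives a quarter of the nodes and contributes a quarter of the weight in the aggregated model, so it still participates without slowing the round as much as an equal split would.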
|
518 |
Scaled: Scalable Federated Learning via Distributed Hash Table Based Overlays
Kim, Taehwan 14 April 2022
In recent years, Internet-of-Things (IoT) devices have generated a large amount of personal data. However, due to privacy concerns, collecting this private data in cloud centers for training Machine Learning (ML) models has become unrealistic. To address this problem, Federated Learning (FL) has been proposed. Yet the central bottleneck has become a severe concern, since the central node in traditional FL is responsible for the communication and aggregation of millions of edge devices. In this paper, we propose Scalable Federated Learning via Distributed Hash Table Based Overlays (Scaled) to conduct multiple concurrently running FL-based applications over edge networks. Specifically, Scaled adopts a fully decentralized multiple-master and multiple-slave architecture by exploiting Distributed Hash Table (DHT) based overlay networks. Moreover, Scaled improves scalability and adaptability by involving all edge nodes in training, aggregating, and forwarding. Overall, we make the following contributions in the paper. First, we investigate the existing FL frameworks and discuss their drawbacks. Second, we improve on the centralized master-slave architecture of existing FL frameworks by using DHT-based Peer-to-Peer (P2P) overlay networks. Third, we implement the subscription-based application-level hierarchical forest for FL training. Finally, we demonstrate Scaled's scalability and adaptability in large-scale experiments. / Master of Science /
In recent years, Internet-of-Things (IoT) devices have generated a large amount of personal data. However, due to privacy concerns, collecting this private data in central servers for training Machine Learning (ML) models has become unrealistic. To address this problem, Federated Learning (FL) has been proposed. In traditional ML, data from edge devices (e.g., phones) must be collected at the central server before model training can start. In FL, training results, instead of the data, are collected to perform training. The benefit of FL is that private data can never be leaked during training. However, there is a major problem in traditional FL: a single point of failure. When power to a central server goes down or the central server is disconnected from the system, it will lose all the data. To address this problem, Scaled: Scalable Federated Learning via Distributed Hash Table Based Overlays is proposed. Instead of having one powerful main server, Scaled launches many different servers to distribute the workload. Moreover, since Scaled is able to build and manage multiple trees at the same time, it allows multi-model training.
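
The tree-per-application idea described above can be sketched in a few lines. This is a hypothetical illustration rather than Scaled's implementation: it assumes each application's tree is rooted at the node whose hashed ID is numerically closest to the hash of the application name (one common DHT convention), and that every node adds its own update to its children's before forwarding the partial sum to its parent. The function names and the tree encoding are invented for the example.

```python
import hashlib


def node_id(name: str) -> int:
    """Hash a node or application name into the shared identifier space."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


def pick_root(app: str, nodes):
    """Root of an application's tree: the node whose ID is closest to hash(app)."""
    target = node_id(app)
    return min(nodes, key=lambda n: abs(node_id(n) - target))


def aggregate_up(tree, updates, node):
    """Return (sum_vector, count) for the subtree rooted at `node`.

    `tree` maps a node to its list of children; `updates` maps a node to its
    local model update. Intermediate nodes forward partial sums, so the root
    never has to receive every device's update directly.
    """
    total, count = list(updates[node]), 1
    for child in tree.get(node, []):
        child_sum, child_count = aggregate_up(tree, updates, child)
        total = [a + b for a, b in zip(total, child_sum)]
        count += child_count
    return total, count
```

Dividing the root's sum by its count gives the same average a single central server would compute, while different application names generally hash to different roots, which is what spreads the aggregation load across nodes.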
|
519 |
Toward a periodic table of personality: mapping personality scales between the five-factor model and the circumplex model
Woods, S.A., Anderson, Neil 04 1900
In this study we examine the structures of ten personality inventories widely used for personnel assessment, by mapping the scales of personality inventories (PIs) to the lexical Big Five circumplex model, resulting in a ‘Periodic Table of Personality’. Correlations between 273 scales from ten internationally popular PIs and independent markers of the lexical Big Five are reported, based on data from samples in two countries (UK N = 286; USA N = 1,046), permitting us to map these scales onto the AB5C framework. Emerging from our findings, we propose a common facet framework derived from the scales of the PIs in our study. These results provide important insights into the literature on criterion-related validity of personality traits, and enable researchers and practitioners to understand how different PI scales converge and diverge and how compound PI scales may be constructed or replicated. Implications for research and practice are considered.
|
520 |
Table Understanding for Information Retrieval
Pande, Ashwini K. 03 September 2002
This thesis proposes a novel approach for finding tables in text files containing a mixture of unstructured and structured text. Tables may be arbitrarily complex because the data in the tables may themselves be tables and because the grouping of data elements displayed in a table may be very complex. Although investigators have proposed competence models to explain the structure of tables, there are no computationally feasible performance models for detecting and parsing general structures in real data. Our emphasis is placed on the investigation of a new statistical procedure for detecting basic tables in plain text documents. The main task here is defining and testing this theory in the context of the Odessa Digital Library. / Master of Science
|