291. Unsupervised Categorical Clustering on Labor Markets. Steffen, Matthew James, 10 April 2023.
During this "white collar recession," the labor market is flooded with workers. Employers seeking to hire need to identify potentially qualified candidates for each job. The current state of the art, LinkedIn Recruiting or elastic search over resumes, lacks efficiency and scalability, along with an intuitive ranking of candidates. We believe this can be fixed with multi-layer categorical clustering via modularity maximization. To test this, we gathered a dataset that is extensive and representative of the job market. Our data comes from PeopleDataLabs and LinkedIn and is sampled from 153 million individuals; as such, it represents one of the most informative datasets for the task of ranking and clustering job titles and skills. Properly grouping individuals will help identify more candidates to fill the multitude of vacant positions. We implement a novel framework for categorical clustering over these attributes to deliver a reliable pool of candidates, and we develop a commonality-based metric for ranking clustering algorithms. The metric prefers modularity-based clustering algorithms such as the Louvain algorithm, which allows us to use such algorithms to outperform other unsupervised methods for categorical clustering. Our implementation accurately clusters emergency services, healthcare, and other fields, while managerial positions are interestingly swamped by soft or uninformative features, resulting in dominant, ambiguous clusters.
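Not from the thesis itself, but a minimal sketch of the core step it describes: building a co-occurrence graph over categorical attributes and maximizing modularity with the Louvain algorithm. The toy profiles and the use of networkx's built-in Louvain implementation are illustrative assumptions.

```python
import networkx as nx

# Hypothetical toy profiles: each individual has a job title and skills.
individuals = [
    {"title": "paramedic",   "skills": ["cpr", "triage"]},
    {"title": "er nurse",    "skills": ["cpr", "patient care"]},
    {"title": "firefighter", "skills": ["cpr", "rescue"]},
    {"title": "accountant",  "skills": ["excel", "auditing"]},
    {"title": "auditor",     "skills": ["excel", "compliance"]},
]

# Nodes are categorical attribute values; an edge is added (or its weight
# increased) whenever two values co-occur in the same profile.
G = nx.Graph()
for person in individuals:
    attrs = [person["title"]] + person["skills"]
    for i, a in enumerate(attrs):
        for b in attrs[i + 1:]:
            w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
            G.add_edge(a, b, weight=w + 1)

# Modularity maximization via Louvain groups related titles and skills.
for community in nx.community.louvain_communities(G, weight="weight", seed=42):
    print(sorted(community))
```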
292. A SYSTEMATIC STUDY OF SPARSE DEEP LEARNING WITH DIFFERENT PENALTIES. Xinlin Tao (13143465), 25 April 2023.
Deep learning has been the driving force behind many successful data science achievements. However, the deep neural network (DNN) that forms the basis of deep learning is often over-parameterized, leading to training, prediction, and interpretation challenges. To address this issue, it is common practice to apply an appropriate penalty to each connection weight, limiting its magnitude; from a Bayesian perspective, this is equivalent to imposing a prior distribution on each connection weight. This project offers a systematic investigation into the selection of the penalty function or prior distribution. Specifically, under the general theoretical framework of posterior consistency, we prove that consistent sparse deep learning can be achieved with a variety of penalty functions or prior distributions. Examples include amenable regularization penalties (such as MCP and SCAD), spike-and-slab priors (such as the mixture Gaussian distribution and the mixture Laplace distribution), and polynomially decayed priors (such as the Student-t distribution). Our theory is supported by numerical results.
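As context for the penalties named above, a small sketch of the standard MCP and SCAD penalty functions (textbook definitions; the parameter names lam and gamma are generic, not taken from the thesis):

```python
import numpy as np

def mcp(w, lam, gamma):
    """Minimax concave penalty (MCP), applied elementwise."""
    w = np.abs(w)
    inner = lam * w - w**2 / (2 * gamma)        # region |w| <= gamma*lam
    outer = gamma * lam**2 / 2                  # constant beyond gamma*lam
    return np.where(w <= gamma * lam, inner, outer)

def scad(w, lam, gamma):
    """Smoothly clipped absolute deviation (SCAD) penalty, elementwise."""
    w = np.abs(w)
    p1 = lam * w                                              # |w| <= lam
    p2 = (2 * gamma * lam * w - w**2 - lam**2) / (2 * (gamma - 1))
    p3 = lam**2 * (gamma + 1) / 2                             # flat tail
    return np.where(w <= lam, p1, np.where(w <= gamma * lam, p2, p3))

weights = np.linspace(-3, 3, 7)
print(mcp(weights, lam=1.0, gamma=3.0))
print(scad(weights, lam=1.0, gamma=3.7))
```

Both penalties grow like the lasso near zero but flatten out for large weights, which is what leaves large connection weights essentially unpenalized while shrinking small ones toward zero.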
293. Extraction of Loudspeaker and Room Impulse Responses under Overlapping Conditions. Gustafsson, Felix, January 2022.
A loudspeaker is often considered a Linear Time Invariant (LTI) system, which can be completely characterized by its impulse response. What sets loudspeakers apart from other LTI systems is the acoustical aspect, including echoes, which makes it much harder to take accurate, noise-free measurements than for other LTI systems such as a simple RC circuit. There are two main challenges in loudspeaker measurement: high-frequency reflections off surrounding surfaces, and low-frequency modal resonances in the room stemming from the initial echoes. A straightforward way of dealing with this is simply truncating the measured impulse response before the arrival of the first high-frequency reflection. This is not without problems, however, as truncation results in high uncertainty in the low-frequency content of the measurement; the longer the time until the first reflection arrives, the better the measurement. The ideal measurement environment would be noise-free with infinite distance to the nearest reflective surface. This is of course not possible in practice, but it can be approximated using an anechoic chamber. This thesis investigates the possibility of creating pseudo-anechoic measurements in a general room using optimization with information extracted from measurement data, in combination with linear time-varying (LTV) filtering. Algorithms have been developed for extracting information such as the time delay between reflections, as well as for compensating for distortion in the reflections. This information is then used to minimize a cost function in order to estimate the loudspeaker's impulse response from multiple measurements, and the resulting estimate is filtered with the LTV filter to obtain the pseudo-anechoic impulse response. The thesis evaluates the developed methods on two different loudspeakers in two ordinary rooms as well as in an anechoic chamber. The overall results seem promising, but due to inconsistencies in the anechoic-chamber measurements, which changed the direct wave of the loudspeakers, the developed methods are unable to reproduce a true anechoic impulse response. It is concluded that, to achieve true pseudo-anechoic results, measurements in ordinary rooms must better resemble those taken inside the anechoic chamber; this, in combination with tuning the hyperparameters of the LTV filter, looks promising for achieving pseudo-anechoic impulse responses with high correlation to the true anechoic measurements.
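A minimal sketch of the truncation baseline described above, assuming a measured impulse response array and a known (or crudely detected) arrival time of the first reflection; the function name, the detection heuristic, and the fade length are illustrative, not taken from the thesis:

```python
import numpy as np

def truncate_ir(ir, fs, reflection_time=None, fade_ms=1.0):
    """Window an impulse response before the first reflection arrives.

    ir: measured impulse response (1-D array)
    fs: sample rate in Hz
    reflection_time: arrival of the first reflection in seconds; if None,
        a crude heuristic picks the first late peak after the direct sound.
    """
    if reflection_time is None:
        direct = np.argmax(np.abs(ir))
        skip = direct + int(0.002 * fs)            # skip 2 ms past the direct wave
        reflection_time = (skip + np.argmax(np.abs(ir[skip:]))) / fs
    n = int(reflection_time * fs)
    window = np.ones(n)
    fade = int(fade_ms * 1e-3 * fs)                # short raised-cosine fade-out
    window[-fade:] = 0.5 * (1 + np.cos(np.linspace(0, np.pi, fade)))
    out = np.zeros_like(ir)
    out[:n] = ir[:n] * window
    return out

fs = 48_000
ir = np.random.randn(fs) * np.exp(-np.arange(fs) / 2000)  # stand-in measured IR
clean = truncate_ir(ir, fs, reflection_time=0.012)        # 12 ms to first echo
```

A window of length T seconds carries little information below roughly 1/T Hz, which is the low-frequency uncertainty the abstract refers to; for the 12 ms example above that is on the order of 80 Hz.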
294. Taming Wild Faces: Web-Scale, Open-Universe Face Identification in Still and Video Imagery. Ortiz, Enrique, 01 January 2014.
With the increasing pervasiveness of digital cameras, the Internet, and social networking, there is a growing need to catalog and analyze large collections of photos and videos. In this dissertation, we explore unconstrained still-image and video-based face recognition in real-world scenarios, e.g. social photo sharing and movie trailers, where people of interest are recognized and all others are ignored. In such a scenario, we must obtain high precision in recognizing the known identities while accurately rejecting those of no interest. Recent advancements in face recognition research have seen Sparse Representation-based Classification (SRC) advance to the forefront of competing methods. However, its drawbacks, slow speed and sensitivity to variations in pose, illumination, and occlusion, have hindered its widespread applicability. The contributions of this dissertation are three-fold:

1. For still-image data, we propose a novel Linearly Approximated Sparse Representation-based Classification (LASRC) algorithm that uses linear regression to perform sample selection for l1-minimization, thus harnessing the speed of least-squares and the robustness of SRC. On our large dataset collected from Facebook, LASRC performs on par with standard SRC at a speedup of 100-250x.

2. For video, applying the popular l1-minimization for face recognition on a frame-by-frame basis is computationally prohibitive, so we propose a new algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition through a joint optimization leveraging all of the available video data and the knowledge that the frames of a face track belong to the same individual. MSSRC yields an average speedup of 5x over frame-by-frame SRC (a minimal sketch follows this list).

3. Finally, we observe that MSSRC sometimes assigns inconsistent identities to the same individual in a scene, which can be corrected based on visual similarity. We therefore construct a probabilistic affinity graph combining appearance and co-occurrence similarities to model the relationships between face tracks in a video. Using this graph, we employ random walk analysis to propagate strong class predictions among similar face tracks while dampening weak predictions. Our method gains 15.8% in average precision over MSSRC alone.
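A minimal sketch of the MSSRC idea in item 2, under the simplifying reading that the joint optimization over a track reduces to sparse-coding the track's mean feature vector against the training dictionary. scikit-learn's Lasso stands in for the exact l1 solver, and all names and thresholds are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def mssrc_identify(track, dictionary, labels, alpha=0.01, reject=0.5):
    """Classify a face track by sparse-coding its mean frame.

    track: (n_frames, d) feature vectors of one face track
    dictionary: (d, n_train) column-stacked training face features
    labels: (n_train,) identity label of each dictionary column
    reject: residual threshold above which the track is "unknown"
    """
    y = track.mean(axis=0)                 # pool the whole track into one query
    y = y / np.linalg.norm(y)
    solver = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    solver.fit(dictionary, y)              # l1-regularized coding of y
    x = solver.coef_
    # Residual per identity: reconstruct y using only that class's atoms.
    best, best_res = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)
        res = np.linalg.norm(y - dictionary @ xc)
        if res < best_res:
            best, best_res = c, res
    return best if best_res < reject else "unknown"
```

The rejection threshold is what gives the open-universe behavior the abstract describes: tracks whose best class residual is still large are labeled as of no interest rather than forced onto a known identity.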
295. Sparse Aperture Speckle Interferometry Telescope Active Optics Control System. Clause, Matthew, 01 December 2015.
The large-aperture telescope conventionally required for binary star research is typically cost-prohibitive. A prototype active optics system was therefore created and fitted to a telescope frame using relatively low-cost components. The active optics system was capable of tipping, tilting, and elevating the mirrors to align reflected starlight. The low-cost mirror position actuators have a resolution of 31 nm, repeatable to within 16 nm, which is accurate enough to perform speckle analysis across the visible light spectrum. The mirrors used in testing were not supported by a whiffletree and produced trefoil-like aberrations, which made phasing the two mirrors difficult.
The active optics system successfully focused and aligned the mirrors through manual adjustment. Interference patterns could not be found because there was no method of measuring the mirror surfaces, which prevented proper mirror alignment and phasing; interference from air turbulence and the trefoil-like aberrations further complicated this task. With some future additions, the system has the potential to be completely automated, and the success of the active optics actuators marks a significant step toward a fully automated sparse aperture telescope.
296. Methods for Characterizing Groundwater Resources with Sparse In-Situ Data. Nishimura, Ren, 14 June 2022.
Groundwater resources must be accurately characterized in order to be managed sustainably. Due to the cost of installing monitoring wells and the challenges of collecting and managing in-situ data, groundwater data is sparse in space and time, especially in developing countries. In this study, we analyzed long-term groundwater storage changes with limited time-series data in which each well had only one groundwater measurement in time. We developed methods to synthetically create time-series water table elevation (WTE) data by clustering wells, using either a uniform grid or k-means-constrained clustering, and creating pseudo wells. The pseudo wells, carrying the WTE values of their member wells, were temporally and spatially interpolated to analyze groundwater changes. We applied the methods to the Beryl-Enterprise aquifer in Utah, where other researchers had previously quantified the groundwater storage depletion rate, and the methods yielded a similar rate. The methods were then applied to southern Niger, where the result showed a groundwater storage change that partially matched the trend calculated from GRACE data. On a data set so limited that regression or machine learning approaches did not work, our method captured the groundwater storage trend correctly, and it can be used in areas where in-situ data is highly limited in time and space.
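A condensed sketch of the pseudo-well idea, assuming a hypothetical table of one-off well readings; the column names, numbers, and linear trend fit are illustrative, and plain scikit-learn KMeans stands in for the size-constrained k-means variant the thesis describes:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical records: one WTE reading per well, taken at different dates.
rng = np.random.default_rng(7)
wells = pd.DataFrame({
    "lon":  rng.uniform(5.0, 9.0, 300),
    "lat":  rng.uniform(12.0, 14.5, 300),
    "date": pd.to_datetime("2000-01-01")
            + pd.to_timedelta(rng.integers(0, 7300, 300), unit="D"),
    "wte_m": rng.normal(350.0, 4.0, 300),
})

# Group nearby wells; each cluster becomes one pseudo well whose scattered
# single readings form a synthetic time series at the cluster location.
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(wells[["lon", "lat"]])
wells["pseudo"] = km.labels_

# Long-term storage trend per pseudo well: linear fit of WTE against time.
for pid, grp in wells.groupby("pseudo"):
    if len(grp) < 2:
        continue  # need at least two readings to fit a trend
    t = (grp["date"] - grp["date"].min()).dt.days / 365.25
    slope = np.polyfit(t, grp["wte_m"], 1)[0]   # metres per year
    print(f"pseudo well {pid:2d}: {slope:+.3f} m/yr from {len(grp)} readings")
```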
297. Coded Acquisition of High Speed Videos with Multiple Cameras. Pournaghi, Reza, 10 April 2015.
High frame rate video (HFV) is an important investigational tool in science, engineering, and the military. In ultrahigh-speed imaging, the obtainable temporal, spatial, and spectral resolutions are limited by the sustainable throughput of in-camera mass memory, the lower bound of exposure time, and illumination conditions. To break these bottlenecks, we propose a new coded video acquisition framework that employs K>1 cameras, each of which makes random measurements of the video signal in both the temporal and spatial domains. For each of the K cameras, this multi-camera strategy greatly relaxes the stringent requirements on memory speed, shutter speed, and illumination strength. The recovery of HFV from these random measurements is posed and solved as a large-scale l1-minimization problem by exploiting the joint temporal and spatial sparsity of the 3D signal (see the sketch below). Three coded video acquisition techniques with varied trade-offs between performance and hardware complexity are developed: frame-wise coded acquisition, pixel-wise coded acquisition, and column-row-wise coded acquisition. The performance of these techniques is analyzed in relation to the sparsity of the underlying video signal.
To make ultrahigh-speed cameras with coded exposure more practical and affordable, we develop a coded exposure video/image acquisition system through an innovative assembly of multiple rolling shutter cameras. Each constituent rolling shutter camera adopts a random pixel read-out mechanism by simply changing the read-out order of pixel rows from sequential to random.
Simulations of these new image/video coded acquisition techniques are carried out and experimental results are reported.
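The following toy sketch illustrates only the recovery principle above, namely l1-regularized least squares solved by ISTA on a generic compressed-sensing model y = Ax with sparse x; the random Gaussian sensing matrix is an assumption standing in for the thesis's temporal/spatial coded-exposure operators:

```python
import numpy as np

def ista(A, y, lam=0.01, n_iter=500):
    """Iterative shrinkage-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L          # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
n, m, k = 400, 120, 10                         # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement operator
x_hat = ista(A, A @ x_true)
print(f"relative error: {np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true):.3f}")
```

The same principle scales to the video setting: the sparser the 3D signal in its transform domain, the fewer coded measurements each camera must contribute.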
298. GRAPH-BASED ANALYSIS OF NON-RANDOM MISSING DATA PROBLEMS WITH LOW-RANK NATURE: STRUCTURED PREDICTION, MATRIX COMPLETION AND SPARSE PCA. Hanbyul Lee (17586345), 09 December 2023.
In most theoretical studies of missing data analysis, data is typically assumed to be missing according to a specific probabilistic model. However, such assumptions may not accurately reflect real-world situations, and sometimes missingness is not purely random. In this thesis, our focus is on analyzing incomplete data matrices without relying on any probabilistic model assumptions for the missing scheme. To characterize a missing scheme deterministically, we employ a graph whose adjacency matrix is the binary matrix indicating whether each matrix entry is observed or not. Leveraging its graph properties, we mathematically represent the missing pattern of an incomplete data matrix and conduct a theoretical analysis of how this non-random missing pattern affects the solvability of specific problems related to incomplete data. This dissertation focuses on three types of incomplete data problems characterized by their low-rank nature: structured prediction, matrix completion, and sparse PCA.

First, we investigate a basic structured prediction problem: recovering binary node labels on a fixed undirected graph from noisy binary observations corresponding to its edges. Essentially, this setting parallels a simple binary rank-1 symmetric matrix completion problem in which the missing entries are determined by a fixed undirected graph. Our aim is to establish the fundamental limits of this problem, revealing a close association between those limits and graph properties such as connectivity.

Second, we move on to the general low-rank matrix completion problem. In this study, we establish provable guarantees for exact and approximate low-rank matrix completion that apply to any non-random missing pattern, by utilizing the observation graph corresponding to the missing scheme. We show theoretically and experimentally that the standard constrained nuclear norm minimization algorithm can successfully recover the true matrix when the observation graph is well-connected and has similar node degrees. We also verify that matrix completion is achievable with a near-optimal sample complexity rate when the observation graph has uniform node degrees and its adjacency matrix has a large spectral gap.

Finally, we address the sparse PCA problem, which features an approximate low-rank attribute. Missing data is common in situations where sparse PCA is useful, such as single-cell RNA sequencing data analysis. We propose a semidefinite relaxation of the non-convex $\ell_1$-regularized PCA problem to solve sparse PCA on incomplete data. We demonstrate that the method is particularly effective when the observation pattern has favorable properties. Our theory is substantiated through synthetic and real data analysis, showcasing the superior performance of our algorithm compared to other sparse PCA approaches, especially when the observed data pattern has specific characteristics.
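A compact sketch of the standard constrained nuclear-norm-minimization baseline that the second study analyzes, run on a synthetic rank-2 matrix with a deterministic (non-random) observation mask; cvxpy's generic solver and the banded mask are illustrative assumptions, not the thesis's setup:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 30, 30, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # true rank-2 matrix

# Deterministic, non-random observation mask: a band around the diagonal,
# which gives every row and column a similar number of observed entries.
mask = (np.abs(np.subtract.outer(np.arange(m), np.arange(n))) <= 8).astype(float)

X = cp.Variable((m, n))
problem = cp.Problem(
    cp.Minimize(cp.normNuc(X)),                       # nuclear norm objective
    [cp.multiply(mask, X) == cp.multiply(mask, M)],   # agree on observed entries
)
problem.solve()
print(f"relative recovery error: {np.linalg.norm(X.value - M) / np.linalg.norm(M):.2e}")
```

The banded mask keeps node degrees in the observation graph nearly uniform and the graph well-connected, which is the regime the abstract identifies as favorable for recovery.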
299. Process Monitoring and Control of Advanced Manufacturing based on Physics-Assisted Machine Learning. Chung, Jihoon, 05 July 2023.
With advances in equipment and technology, manufacturing processes are becoming increasingly sophisticated. This has given rise to advanced manufacturing processes that use innovative technologies, including robotics, artificial intelligence, and autonomous systems. Additive manufacturing (AM), also known as 3D printing, is a representative advanced manufacturing technology that creates 3D geometries layer by layer from various types of materials.
However, as these processes advance, so do the expectations placed on quality assurance. The objective of this dissertation is therefore to propose innovative methodologies for process monitoring and control that achieve quality assurance in advanced manufacturing.
Developments in sensor technology and computational power provide abundant process data, creating opportunities to achieve effective quality assurance through machine learning. Exploring the connections between sensor data and process quality with machine learning methodologies is therefore advantageous. Although this direction is promising, constraints and complex process dynamics in actual processes prevent existing machine learning methods from achieving quality assurance.
To address these challenges, this dissertation proposes several machine learning approaches assisted by physics knowledge obtained from the process. These approaches are successfully validated on various manufacturing processes, including AM and multistage assembly processes. Specifically, three new methodologies are proposed and developed, as listed below.
-To detect process anomalies from imbalanced process data, caused by differing rates of occurrence between process states, a new Generative Adversarial Network (GAN)-based method is proposed. The method jointly optimizes the GAN and a classifier to augment realistic, state-distinguishable images and thereby provide balanced data. Specifically, it utilizes the knowledge and features of normal process data to generate effective abnormal process data. The benefits of the proposed approach have been confirmed in both polymer AM and metal AM processes.
-To diagnose process faults with a limited number of sensors, a restriction imposed by the physical constraints of the multistage assembly process, a novel sparse Bayesian learning method is proposed. The method builds on the practical assumption that only a few process faults are likely to occur at once (sparsity), while the temporal correlation of process faults and prior knowledge about them are incorporated through the Bayesian framework. With this method, process faults can be accurately identified from limited sensors (a minimal sketch of the sparsity idea follows this list).
-To mitigate, online, new defects that occur during printing due to the complex process dynamics of the AM process, a novel Reinforcement Learning (RL)-based algorithm is proposed. The method learns machine-parameter adjustments that mitigate the new defects during printing, transferring knowledge learned from various sources in the AM process into the RL agent. With a theoretical guarantee, it therefore learns the mitigation strategy from fewer training samples than traditional RL.
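Not the estimator developed in the dissertation, but a minimal sketch of the sparsity idea behind the second bullet: with fewer sensors than potential fault modes, y = Phi f is underdetermined, yet a sparsity-promoting solver can still pin down the few active faults. scikit-learn's Lasso stands in here for sparse Bayesian learning, and all names and numbers are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n_sensors, n_faults = 12, 40        # fewer sensors than potential fault modes
Phi = rng.standard_normal((n_sensors, n_faults))  # physics-based fault-to-sensor model

f_true = np.zeros(n_faults)
f_true[[5, 22]] = [1.5, -2.0]       # only two faults active (sparse)
y = Phi @ f_true + 0.01 * rng.standard_normal(n_sensors)  # noisy sensor readings

est = Lasso(alpha=0.05, fit_intercept=False).fit(Phi, y)
active = np.flatnonzero(np.abs(est.coef_) > 0.1)
print("estimated active faults:", active)   # expect indices 5 and 22
```

The Bayesian formulation in the dissertation additionally carries temporal correlation and prior fault knowledge, which a plain Lasso cannot express.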
By overcoming these process challenges, the proposed methodologies successfully achieve quality assurance in the advanced manufacturing process. Furthermore, the methods are not tied to these particular processes and can easily be applied to other domains, such as healthcare systems. / Doctor of Philosophy / The development of equipment and technologies has led to advanced manufacturing processes, and with that, quality assurance in manufacturing has become a critical issue. The objective of this dissertation is therefore to accomplish quality assurance by developing advanced machine learning approaches.
In this dissertation, several advanced machine learning methodologies that use physics knowledge from the process are proposed. These methods overcome constraints and complex process dynamics of actual processes that degrade the performance of existing machine learning methodologies in achieving quality assurance. Their effectiveness is validated on various advanced manufacturing processes, including additive manufacturing and multistage assembly. The proposed methodologies deliver superior results for achieving quality assurance across various scenarios compared to existing state-of-the-art machine learning methods.
The achievements of this dissertation are not limited to the manufacturing process; the proposed machine learning approaches can be further extended to other application areas, such as healthcare systems.
300. Airfoil analysis and design using surrogate models. Michael, Nicholas Alexander, 01 May 2020.
A study was performed to compare two different methods for generating surrogate models for the analysis and design of airfoils. Initial research compared the accuracy of surrogate models for predicting airfoil lift and drag using data collected from high-fidelity simulations with a modern CFD code along with lower-order models from a panel code. This was followed by an evaluation of the Class Shape Transformation (CST) method for parameterizing airfoil geometries as a prelude to the use of surrogate models for airfoil design optimization, and by the implementation of software that uses CST to modify airfoil shapes as part of the airfoil design process. Optimization routines were coupled with surrogate modeling techniques to study the accuracy and efficiency with which the surrogate models produce optimal airfoil shapes. Finally, the results of the current research are summarized, and suggestions are made for future research.
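For context, a brief sketch of the standard CST (Kulfan) parameterization the thesis evaluates: a class function x^N1 (1-x)^N2 multiplied by a Bernstein-polynomial shape function, plus a trailing-edge thickness term. The coefficient values below are arbitrary illustrations, not airfoils from the study:

```python
import numpy as np
from math import comb

def cst_surface(x, coeffs, dz_te=0.0, n1=0.5, n2=1.0):
    """Evaluate a CST (Kulfan) airfoil surface at chordwise stations x in [0, 1].

    coeffs: Bernstein shape coefficients; n1=0.5, n2=1.0 gives the standard
    round-nose, sharp-trailing-edge airfoil class.
    """
    n = len(coeffs) - 1
    class_fn = x**n1 * (1.0 - x)**n2
    shape_fn = sum(
        a * comb(n, i) * x**i * (1.0 - x)**(n - i)   # Bernstein basis expansion
        for i, a in enumerate(coeffs)
    )
    return class_fn * shape_fn + x * dz_te           # dz_te: trailing-edge offset

x = np.linspace(0.0, 1.0, 101)
upper = cst_surface(x, coeffs=[0.17, 0.16, 0.15, 0.14])    # arbitrary example
lower = cst_surface(x, coeffs=[-0.14, -0.12, -0.10, -0.08])
print(f"max thickness ~ {np.max(upper - lower):.4f} chord")
```

The appeal of CST for surrogate-based optimization is that a smooth airfoil is reduced to a handful of coefficients, giving the optimizer a compact, well-behaved design space.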