21. Parallel Algorithms for Real-time Motion Planning. McNaughton, Matthew. 01 July 2011.
For decades, humans have dreamed of making cars that could drive themselves, so that travel would be less taxing, and the roads safer for everyone. Toward this goal, we have made strides in motion planning algorithms for autonomous cars, using a powerful new computing tool, the parallel graphics processing unit (GPU).
We propose a novel five-dimensional search space formulation that includes both spatial and temporal dimensions, and respects the kinematic and dynamic constraints on a typical automobile. With this formulation, the search space grows linearly with the length of the path, compared to the exponential growth of other methods. We also propose a parallel search algorithm, using the GPU to tackle the curse of dimensionality directly and increase the number of plans that can be evaluated by an order of magnitude compared to a CPU implementation. With this larger capacity, we can evaluate a dense sampling of plans combining lateral swerves and accelerations that represent a range of effective responses to more on-road driving scenarios than have previously been addressed in the literature.
We contribute a cost function that evaluates many aspects of each candidate plan and ranks them all, allowing the behavior of the vehicle to be fine-tuned by adjusting the ranking. We show that the cost function can be changed on-line by a behavioral planning layer to express preferred vehicle behavior without the brittleness induced by top-down planning architectures. Our method is particularly effective at generating robust merging behaviors, which have traditionally required a delicate and failure-prone coordination between multiple planning layers. Finally, we demonstrate our proposed planner in a variety of on-road driving scenarios, in both simulation and on an autonomous SUV, and make a detailed comparison with prior work.
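A minimal sketch of the dense-sampling idea, not the thesis implementation: candidate plans are enumerated as combinations of lateral swerve and acceleration, and a weighted cost is evaluated over all of them at once. NumPy vectorization stands in for the GPU kernel here, and all feature terms and weights are hypothetical stand-ins for the real kinematic and dynamic costs.

```python
import numpy as np

# Candidate plans: every combination of lateral offset and acceleration.
lateral_offsets = np.linspace(-2.0, 2.0, 41)     # metres from lane centre
accelerations = np.linspace(-3.0, 2.0, 26)       # m/s^2
lat, acc = np.meshgrid(lateral_offsets, accelerations, indexing="ij")

# Per-plan cost features (stand-ins for the real dynamic/kinematic terms).
comfort = acc ** 2          # penalise hard braking and acceleration
deviation = lat ** 2        # penalise leaving the lane centre
progress = -acc             # reward making headway

# Tunable weight vector; a behavioural layer could swap these at runtime.
w_comfort, w_deviation, w_progress = 1.0, 0.5, 0.3
cost = w_comfort * comfort + w_deviation * deviation + w_progress * progress

best = np.unravel_index(np.argmin(cost), cost.shape)
print(f"best plan: lateral offset {lat[best]:+.2f} m, accel {acc[best]:+.2f} m/s^2")
```

Because the cost is a plain weighted sum, swapping in a different weight vector re-ranks the same candidate set, in the spirit of the on-line tuning described above.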
22. Attention-guided Algorithms to Retarget and Augment Animations, Stills, and Videos. Jain, Eakta. 13 May 2012.
Still pictures, animations, and videos are used by artists to tell stories visually. Computer graphics algorithms create visual stories too, either automatically or by assisting artists. Why is it so hard to create algorithms that perform like a trained visual artist? The reason is that artists think about where viewers will look and how their attention will flow across the scene, but algorithms have no similarly sophisticated understanding of the viewer.
Our key insight is that computer graphics algorithms should be designed to take into account how viewer attention is allocated. We first show that designing optimization terms based on viewers’ attentional priorities allows an algorithm to handle artistic license in the input data, such as geometric inconsistencies in hand-drawn shapes. We then show that measurements of viewer attention enable algorithms to infer high-level information about a scene, for example, the object of storytelling interest in every frame of a video.
All the presented algorithms retarget or augment the traditional form of a visual art. Traditional art includes artwork such as printed comics, i.e., pictures that were created before computers became mainstream. It also refers to artwork that can be created the way it was before computers, for example, hand-drawn animation and live-action films. Connecting traditional art with computational algorithms allows us to leverage the unique strengths of each side. We demonstrate these ideas on three applications:
Retargeting and augmenting animations: Two widely practiced forms of animation are two-dimensional (2D) hand-drawn animation and three-dimensional (3D) computer animation. To apply the techniques of the 3D medium to 2D animation, researchers have attempted to compute 3D reconstructions of the shape and motion of the hand-drawn character, which are meant to act as the character’s ‘proxy’ in the 3D environment. We argue that a perfect reconstruction is excessive because it does not leverage the characteristics of viewer attention. We present algorithms to generate a 3D proxy with different levels of detail, such that at each level the error terms account for quantities that will attract viewer attention. These algorithms allow a hand-drawn animation to be retargeted to a 3D skeleton and augmented with physically simulated secondary effects.
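A hedged sketch of the attention-weighted fitting idea: proxy parameters are recovered by weighted least squares, with each residual weighted by how likely a viewer is to attend to the corresponding part. The linear system below is an invented stand-in, not the thesis formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))              # 20 linearised image constraints
x_true = rng.normal(size=5)               # 5 proxy parameters to recover
b = A @ x_true + rng.normal(scale=0.1, size=20)   # noisy hand-drawn input

attention = rng.uniform(0.1, 1.0, size=20)  # e.g. high on face and hands
w = np.sqrt(attention)                      # weight residuals by attention
x_fit, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
print("attention-weighted proxy parameters:", np.round(x_fit, 3))
```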
Augmenting stills: Moves-on-stills is a technique to engage the viewer while presenting still pictures on television or in movies. This effect is widely used to augment comics into ‘motion comics’. Though state-of-the-art software such as iMovie allows a user to specify the parameters of a camera move, it does not solve the problem of how those parameters should be chosen. We believe that a good camera move respects the visual route designed by the artist who crafted the still picture; if we record the gaze of viewers looking at composed still pictures, we can reconstruct the artist’s intention. We show, through a perceptual study, that the artist succeeds in directing viewer attention in comic book pictures, and we present an algorithm to predict the parameters of camera moves-on-stills from statistics derived from eyetracking data.
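An illustrative sketch, under strong simplifying assumptions, of deriving a camera move from gaze: fixations are split into early and late groups, and the move pans between the group centroids while zooming to each group's spread. The fixation data and the windowing rule are invented for the example.

```python
import numpy as np

# (t, x, y) fixations from several viewers, in normalised image coordinates.
fix = np.array([[0.1, 0.20, 0.30], [0.3, 0.25, 0.35], [0.5, 0.22, 0.28],
                [1.2, 0.70, 0.60], [1.4, 0.75, 0.65], [1.6, 0.72, 0.58]])
early, late = fix[fix[:, 0] < 1.0], fix[fix[:, 0] >= 1.0]

def window(group, pad=3.0):
    centre = group[:, 1:].mean(axis=0)          # where this phase of gaze lands
    size = pad * group[:, 1:].std(axis=0).max() + 1e-3   # zoom from gaze spread
    return centre, size

(c0, s0), (c1, s1) = window(early), window(late)
print(f"pan {np.round(c0, 2)} -> {np.round(c1, 2)}; zoom {s0:.3f} -> {s1:.3f}")
```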
Retargeting video: Video retargeting is the process of altering the original video to fit the new display size, while best preserving content and minimizing artifacts. Recent techniques define content as color, edges, faces and other image-based saliency features. We suggest that content is, in fact, what people look at. We introduce a novel operator that extends the classic “pan-and-scan” to introduce cuts in addition to automatic pans based on viewer eyetracking data. We also present a gaze-based evaluation criterion to quantify the performance of our operator.
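A minimal sketch of a gaze-based pan-and-scan operator with cuts: the crop window follows smoothed gaze, and a gaze jump larger than a threshold triggers a cut rather than an implausibly fast pan. The thresholds, smoothing constant, and gaze track are illustrative only.

```python
import numpy as np

gaze_x = np.array([0.20, 0.21, 0.22, 0.23, 0.80, 0.81, 0.80, 0.79])  # per frame
crop_w, cut_threshold, alpha = 0.4, 0.3, 0.5  # window width, cut trigger, smoothing

centre, centres, cuts = gaze_x[0], [], []
for t, g in enumerate(gaze_x):
    if abs(g - centre) > cut_threshold:
        centre = g                                 # gaze jump: cut, don't pan
        cuts.append(t)
    else:
        centre = alpha * g + (1 - alpha) * centre  # smooth pan toward gaze
    centres.append(float(np.clip(centre, crop_w / 2, 1 - crop_w / 2)))

print("crop centres:", [round(c, 2) for c in centres], "cuts at frames:", cuts)
```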
23. Toward an Automated System for the Analysis of Cell Behavior: Cellular Event Detection and Cell Tracking in Time-lapse Live Cell Microscopy. Huh, Seungil. 01 February 2013.
Time-lapse live cell imaging has been increasingly employed by biological and biomedical researchers to understand the mechanisms underlying cell physiology and development by investigating the behavior of cells. This trend has produced a huge amount of image data, whose analysis has become a bottleneck in related research. Consequently, how to efficiently analyze these data is emerging as one of the major challenges in the field.
Computer vision analysis of non-fluorescent microscopy images, most representatively phase-contrast microscopy images, promises to enable long-term monitoring of live cell behavior with minimal perturbation and human intervention. As a step toward such a system, this thesis proposes computer vision algorithms that monitor cell growth, migration, and differentiation by detecting three cellular events (mitosis, or cell division; apoptosis, or programmed cell death; and differentiation) and by tracking individual cells. Among these events, to the best of our knowledge, apoptosis and one type of differentiation, the formation of muscle myotubes, have never been detected without fluorescent labeling. We address these challenging problems by developing computer vision algorithms for phase-contrast microscopy. We also significantly improve on previous methods in the accuracy of mitosis detection and cell tracking in phase-contrast microscopy, particularly under non-trivial conditions such as high cell density or confluence. We demonstrate the usefulness of our methods for biological research by analyzing cell behavior in scratch wound healing assays. The automated system we are pursuing would enable a new paradigm of biological research by allowing quantitative and individualized assessment of the behavior of large populations of intact cells.
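As an illustration of the candidate-detection stage for mitosis (a sketch, not the thesis system): dividing cells round up and appear bright in phase-contrast images, so bright blobs can be extracted as candidates and handed to a learned classifier, stubbed out below. SciPy's ndimage is assumed to be available.

```python
import numpy as np
from scipy import ndimage

frame = np.random.rand(128, 128)        # stand-in for a phase-contrast frame
frame[40:46, 60:66] += 2.0              # synthetic bright (rounded-up) cell

mask = frame > frame.mean() + 3 * frame.std()
labels, n = ndimage.label(mask)
centroids = ndimage.center_of_mass(mask, labels, range(1, n + 1))

def classify_candidate(patch):
    # Stub for the learned mitosis classifier applied to each candidate patch.
    return patch.max() > 1.5

for cy, cx in centroids:
    y, x = int(cy), int(cx)
    patch = frame[max(0, y - 8):y + 8, max(0, x - 8):x + 8]
    print(f"candidate at ({y},{x}): mitosis={classify_candidate(patch)}")
```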
24. Graph-based Trajectory Planning through Programming by Demonstration. Melchior, Nik A. 01 July 2011.
Autonomous robots are becoming increasingly commonplace in industry, space exploration, and even domestic applications. These diverse fields share the need for robots to perform increasingly complex motion behaviors for interacting with the world. As the robots’ tasks become more varied and sophisticated, though, the challenge of programming them becomes more difficult and domain-specific. Robotics experts without domain knowledge may not be well suited to communicating task-specific goals and constraints to the robot, while domain experts may not possess the skills for programming robots through conventional means. Ideally, any person capable of demonstrating the necessary skill should be able to instruct the robot to perform it. In this thesis, we examine the use of demonstration to program or, more aptly, to teach a robot to perform precise motion tasks.
Programming by Demonstration (PbD) offers an expressive means for teaching while being accessible to domain experts who may be novices in robotics. This learning paradigm relies on human demonstrations to build a model of a motion task. This thesis develops an algorithm for learning from examples that is capable of producing trajectories that are collision-free and that preserve non-geometric constraints such as end effector orientation, without requiring special training for the teacher or a model of the environment. This approach is capable of learning precise motions, even when the precision required is on the same order of magnitude as the noise in the demonstrations. Finally, this approach is robust to the occasional errors in strategy and jitter in movement inherent in imperfect human demonstrations.
The approach contributed in this thesis begins with the construction of a neighbor graph, which determines the correspondences between multiple imperfect demonstrations. This graph permits the robot to plan novel trajectories that safely and smoothly generalize the teacher’s behavior. Finally, like any good learner, a robot should assess its knowledge and ask questions about any detected deficiencies. The learner presented here detects regions of the task in which the demonstrations appear to be ambiguous or insufficient, and requests additional information from the teacher. This algorithm is demonstrated in example domains with a 7 degree-of-freedom manipulator, and user trials are presented.
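A minimal sketch of the neighbor-graph idea under simplifying assumptions: points sampled from several noisy demonstrations are connected when they lie close together, and a novel trajectory is planned through the graph with Dijkstra's algorithm. The radius and toy demonstrations are invented; the thesis additionally preserves non-geometric constraints and queries the teacher in ambiguous regions.

```python
import heapq
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 20)
# Three noisy demonstrations of the same 2D motion.
demos = [np.c_[t, np.sin(2 * t)] + rng.normal(scale=0.02, size=(20, 2))
         for _ in range(3)]
pts = np.vstack(demos)

radius = 0.25                            # neighbourhood for correspondence
edges = {i: [] for i in range(len(pts))}
for i in range(len(pts)):
    d = np.linalg.norm(pts - pts[i], axis=1)
    for j in np.where((d > 0) & (d < radius))[0]:
        edges[i].append((int(j), float(d[j])))

def dijkstra(start, goal):
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        c, u = heapq.heappop(pq)
        if u == goal:
            break
        if c > dist.get(u, float("inf")):
            continue
        for v, w in edges[u]:
            if c + w < dist.get(v, float("inf")):
                dist[v], prev[v] = c + w, u
                heapq.heappush(pq, (dist[v], v))
    path, node = [goal], goal            # walk back from goal to start
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

print("planned node sequence:", dijkstra(0, len(pts) - 1))
```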
25. Cross-Cultural Believability of Robot Characters. Makatchev, Maxim. 01 February 2013.
Believability of characters is an objective in literature, theater, animation, film, and other media. Virtual characters that users believe share their ethnic background improve users’ perception of the character and, sometimes, even the users’ task performance. Social scientists refer to this phenomenon as homophily: humans tend to associate and bond with similar others. Homophily based on ethnic similarity between humans and robots, however, has not previously been tested, in part due to the difficulty of endowing a robot with ethnicity. We tackle this task by avoiding blatant markers of ethnicity such as clothing, accent, or ethnic appearance (although we control for the latter), and instead aim to evoke ethnicity via more subtle verbal and nonverbal behaviors.
When designing ethnically specific virtual agents, behaviors have typically been borrowed from anthropological studies and cultural models. Other approaches collect corpora of human interactions in target contexts and select maximally distinctive behaviors for implementation on a virtual character. In this thesis, we argue that both behaviors that signal differences between an anthropologist and the target ethnicity (rich points) and maximally distinctive behaviors between target ethnicities may vary in their ability to evoke ethnic attribution. We address this discrepancy by additionally evaluating the candidate behaviors for their salience as ethnic cues via online crowdsourcing. The most salient ethnic cues are then implemented on the robot for a study with colocated participants.
This methodology has allowed us to design robot characters that elicit associations between the robot’s behaviors and ethnic attribution of the characters as native speakers of American English, or as native speakers of Arabic speaking English as a foreign language, by members of both of these ethnic communities. Although we did not find evidence of ethnic homophily, we believe that the suggested pathway can be used to create robot characters with a higher degree of perceived similarity and better chances of evoking a homophily effect.
26. Data-Driven Geometric Scene Understanding. Satkin, Scott. 01 July 2013.
In this thesis, we describe a data-driven approach that leverages repositories of 3D models for scene understanding. The ability to relate what we see in an image to a large collection of 3D models allows us to transfer information from those models, creating a rich understanding of the scene. We develop a framework for auto-calibrating a camera, rendering 3D models from the viewpoint from which an image was taken, and computing a similarity measure between each 3D model and the input image. We demonstrate this data-driven approach in the context of geometry estimation and show the ability to find the identities, poses, and styles of objects in a scene.
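A hedged sketch of the render-and-compare loop: each 3D model is scored against the image over a set of viewpoint hypotheses via a feature similarity. Real features would come from rendered geometry (surface normals, depth edges) and image cues; here both sides are random stand-ins and the model ids are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
image_feat = rng.normal(size=64)          # features of the input photograph

def render_features(model_id, viewpoint):
    # Stand-in for rendering `model_id` at `viewpoint` and extracting
    # geometric features (surface normals, depth edges, ...).
    seed = hash((model_id, viewpoint)) % (2 ** 32)
    return np.random.default_rng(seed).normal(size=64)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

models = ["bed_01", "sofa_03", "table_07"]               # hypothetical ids
viewpoints = [(yaw, pitch) for yaw in range(0, 360, 45) for pitch in (0, 15)]

best = max(((m, v, cosine(image_feat, render_features(m, v)))
            for m in models for v in viewpoints), key=lambda h: h[2])
print("best (model, viewpoint, score):", best)
```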
We begin by presenting a proof-of-concept algorithm for matching 3D models with input images. Next, we present a series of extensions to this baseline approach. Our goals here are three-fold. First, we aim to produce more accurate reconstructions of a scene by determining both the exact style and size of objects as well as precisely localizing their positions. In addition, we aim to increase the robustness of our scene-matching approach by incorporating new features and expanding our search space to include many viewpoint hypotheses. Lastly, we address the computational challenges of our approach by presenting algorithms for more efficiently exploring the space of 3D scene hypotheses, without sacrificing the quality of results.
We conclude by presenting various applications of our geometric scene understanding approach. We start by demonstrating the effectiveness of our algorithm for traditional applications such as object detection and segmentation. In addition, we present two novel applications incorporating our geometry estimates: affordance estimation and geometry-aware object insertion for photorealistic rendering.
27. Shape For Contact. Rodriguez Garcia, Alberto. 01 August 2013.
Given a desired function for an effector, what is its appropriate shape? This thesis addresses the problem of designing the shape of a rigid end effector to perform a given manipulation task. It presents three main contributions: First, it describes the contact kinematics of an effector as the product of both its shape and its motion, and assumes a fixed motion model to explore the role of shape in satisfying a certain manipulation task. Second, it formulates that manipulation task as a set of constraints on the geometry of contact between the effector and the world. Third, it develops tools to transform those contact constraints into an effector shape for general 1-DOF planar mechanisms and general 1-DOF spatial mechanisms, and discusses the generalization to mechanisms with more than one degree of freedom.
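A minimal sketch of the shape-from-contact idea for a 1-DOF rotating effector, under invented task constraints: if the effector must touch the world point p(theta) at joint angle theta, then its shape, expressed in the effector frame, must contain R(-theta) p(theta); sweeping theta traces the required profile.

```python
import numpy as np

def R(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def desired_contact(theta):
    # Hypothetical task constraint: maintain contact on a vertical wall at
    # x = 1 while the contact height rises with the joint angle.
    return np.array([1.0, 0.2 * theta])

thetas = np.linspace(0.0, np.pi / 3, 8)
# Undo the motion: the effector shape must contain R(-theta) @ p(theta).
profile = np.array([R(-th) @ desired_contact(th) for th in thetas])
for th, pt in zip(thetas, profile):
    print(f"theta={th:.2f} rad -> effector-frame point ({pt[0]:+.3f}, {pt[1]:+.3f})")
```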
We describe the case studies of designing grippers with invariant grasp geometry, grippers with improved grasp stability, and grippers with extended grasp versatility. We further showcase the techniques with the design of the fingers of the MLab hand, a three-fingered gripper actuated with a single motor, capable of exerting any combination of geometrically correct enveloping or fingertip grasps of spherical, cylindrical, and prismatic objects of varying size.
28. Interactive Learning for Sequential Decisions and Predictions. Ross, Stephane. 01 June 2013.
Sequential prediction problems arise commonly in many areas of robotics and information processing: e.g., predicting a sequence of actions over time to achieve a goal in a control task, interpreting an image through a sequence of local image patch classifications, or translating speech to text through an iterative decoding procedure.
Learning predictors that can reliably perform such sequential tasks is challenging. Specifically, because predictions influence future inputs in the sequence, the data-generation process and the executed predictor are inextricably intertwined. This can lead to a significant mismatch between the distribution of examples observed during training (induced by the predictor used to generate training instances) and during test executions (induced by the learned predictor). As a result, naively applying standard supervised learning methods, which assume independently and identically distributed training and test examples, often leads to poor test performance and compounding errors: inaccurate predictions lead to untrained situations where more errors are inevitable.
This thesis proposes general iterative learning procedures that leverage interactions between the learner and teacher to provably learn good predictors for sequential prediction tasks. Through repeated interactions, our approaches can efficiently learn predictors that are robust to their own errors and predict accurately during test executions. Our main approach uses existing no-regret online learning methods to provide strong generalization guarantees on test performance.
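A toy sketch of the interactive loop described (the DAgger algorithm from this line of work is the canonical instance): roll out the current predictor, query the expert on the states the predictor actually visits, aggregate the labeled data, and retrain. The 1D dynamics, expert, and linear learner are all invented stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)

def expert_action(s):
    return -0.8 * s                       # stabilising expert policy

def rollout(policy, s0=2.0, horizon=20):
    states, s = [], s0
    for _ in range(horizon):
        states.append(s)
        s = s + policy(s) + rng.normal(scale=0.05)   # toy 1D dynamics
    return states

X, Y, theta = [], [], 0.0                 # linear policy: action = theta * s
for it in range(5):
    visited = rollout(lambda s: theta * s)          # states *we* induce
    X.extend(visited)
    Y.extend(expert_action(s) for s in visited)     # expert labels them
    theta = float(np.dot(X, Y) / np.dot(X, X))      # least-squares retrain
    print(f"iteration {it}: theta = {theta:+.3f}")
# Here the expert is exactly realisable, so theta snaps to -0.8 at once;
# with a restricted learner, retraining on self-induced states is what
# prevents the compounding errors described above.
```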
We demonstrate how to apply our main approach in various sequential prediction settings: imitation learning, model-free reinforcement learning, system identification, structured prediction, and submodular list prediction. Its efficiency and wide applicability are exhibited across a large variety of challenging learning tasks, ranging from learning video game playing agents from human players and accurate dynamic models of a simulated helicopter for controller synthesis, to learning predictors for scene understanding in computer vision, news recommendation, and document summarization. We also demonstrate the applicability of our technique on a real robot, using pilot demonstrations to train an autonomous quadrotor to avoid trees seen through its onboard camera (monocular vision) when flying at low altitude in natural forest environments.
Our results throughout show that unlike typical supervised learning tasks where examples of good behavior are sufficient to learn good predictors, interaction is a fundamental part of learning in sequential tasks. We show formally that some level of interaction is necessary, as without interaction, no learning algorithm can guarantee good performance in general.
29. Physics-Based Manipulation Planning in Cluttered Human Environments. Dogar, Mehmet R. 01 July 2013.
This thesis presents a series of planners and algorithms for manipulation in cluttered human environments. The focus is on using physics-based predictions, particularly for pushing operations, as an effective way to address the manipulation challenges posed by these environments.
We introduce push-grasping, a physics-based action that grasps an object by first pushing it and then closing the fingers. We analyze the mechanics of push-grasping and demonstrate its effectiveness under clutter and object pose uncertainty. We integrate a planning system based on push-grasping with the geometric planners traditionally used in grasping. We then show that a similar approach can be used to perform manipulation with environmental contact in cluttered environments. We present a planner with which the robot can simultaneously push multiple obstacles out of the way while grasping an object through clutter.
In the second part of this thesis we focus on planning a sequence of actions to manipulate clutter. We present a planning framework to rearrange clutter using prehensile and nonprehensile primitives. We show that our planner succeeds in environments where planners which only use prehensile primitives fail. We then explore the problem of manipulating clutter to search for a hidden object. We formulate the problem as minimizing the expected time to find the target, present two algorithms, and analyze their complexity and optimality.
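For the search formulation, a small sketch of one classical ordering rule under a simplified model (not the thesis algorithms): if the target hides behind object i with independent prior probability p_i and removing that object costs c_i, examining objects in decreasing p_i/c_i order minimizes the expected time to find it. The numbers below are made up, and the thesis treats the full manipulation problem rather than this abstraction.

```python
# Hypothetical priors p_i and removal costs c_i for three occluding objects.
objects = {"box": (0.5, 4.0), "mug": (0.2, 1.0), "book": (0.3, 2.0)}

# Greedy rule: examine in decreasing p/c order (optimal for this toy model).
order = sorted(objects, key=lambda o: objects[o][0] / objects[o][1], reverse=True)

expected, elapsed = 0.0, 0.0
for name in order:
    p, c = objects[name]
    elapsed += c                 # time spent removing this object
    expected += p * elapsed      # contribution if the target is behind it
print("search order:", order, f"| expected time: {expected:.2f}")
```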
30. Addressing Ambiguity In Object Instance Detection. Hsiao, Edward. 01 June 2013.
In this thesis, we study the topic of ambiguity in detecting object instances in scenes with severe clutter and occlusions. Our work focuses on three key areas: (1) objects that have ambiguous features, (2) objects from which discriminative point-based features cannot be reliably extracted, and (3) occlusions.
Current approaches to object instance detection rely heavily on matching discriminative point-based features such as SIFT. While one-to-one correspondences between an image and an object can often be generated, these correspondences cannot be obtained when objects have ambiguous features due to similar and repeated patterns. We present the Discriminative Hierarchical Matching (DHM) method, which uses vector quantization to preserve feature ambiguity at the matching stage until hypothesis testing. We demonstrate that combining our quantization framework with Simulated Affine features can significantly improve the performance of 3D point-based recognition systems.
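A hedged sketch of ambiguity-preserving matching: descriptors are vector-quantized against a codebook, and every model feature sharing a codeword is retained as a candidate until geometric hypothesis testing, instead of forcing an early one-to-one match. The codebook and descriptors are random stand-ins for real SIFT-like features.

```python
import numpy as np

rng = np.random.default_rng(3)
codebook = rng.normal(size=(32, 16))      # 32 codewords, 16-D descriptors
model_desc = rng.normal(size=(100, 16))   # features on the 3D object model
image_desc = rng.normal(size=(10, 16))    # features detected in the image

def quantize(desc):
    return int(np.argmin(np.linalg.norm(codebook - desc, axis=1)))

model_words = np.array([quantize(d) for d in model_desc])
for i, d in enumerate(image_desc):
    candidates = np.where(model_words == quantize(d))[0]
    print(f"image feature {i}: {len(candidates)} model candidates kept")
# A later geometric verification stage (e.g. pose hypothesis testing)
# disambiguates among the retained candidates.
```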
While discriminative point-based features work well for many objects, they cannot be stably extracted on smooth objects which have large uniform regions. To represent these feature-poor objects, we first present Gradient Networks, a framework for robust shape matching without extracting edges. Our approach incorporates connectivity directly on low-level gradients and significantly outperforms approaches which use only local information or coarse gradient statistics. Next, we present the Boundary and Region Template (BaRT) framework which incorporates an explicit boundary representation with the interior appearance of the object. We show that the lack of texture in the object interior is actually informative and that an explicit representation of the boundary performs better than a coarse representation.
While many approaches work well when objects are entirely visible, their performance decreases rapidly under occlusion. We introduce two methods for increasing the robustness of object detection in these challenging scenarios. First, we present a framework for capturing the occlusion structure under arbitrary object viewpoint by modeling the Occlusion Conditional Likelihood that a point on the object is visible given the visibility of all other points. Second, we propose a method to predict the occluding region and score a probabilistic matching pattern by searching for a set of valid occluders. We demonstrate a significant increase in detection performance under severe occlusions.