Learning to Singulate Objects in Packed Environments using a Dexterous Hand

Hanyang ZHOU
Research
October 1, 2024

Table of Contents

This paper presents the Singulating Objects in Packed Environments (SOPE) framework, which utilizes a novel displacement-based state representation and a multi-phase RL approach to enable effective singulation of target objects in cluttered environments using a 16-DOF Allegro Hand.

The proposed method demonstrates high success rates in both simulation and real-world experiments, outperforming alternative techniques.

See more excellent videos on our Project Website

About this Project

Robotic object singulation, where a robot must isolate, grasp, and retrieve a target object in a cluttered environment, is a fundamental challenge in robotic manipulation.

This task is difficult due to occlusions and how other objects act as obstacles for manipulation. A robot must also reason about the effect of object-object interactions as it tries to singulate the target.

Prior work has explored object singulation in scenarios where there is enough free space to perform relatively long pushes to separate objects, in contrast to when space is tight and objects have little separation from each other.

In this paper, we propose the Singulating Objects in Packed Environments (SOPE) framework. We propose a novel method that involves a displacement-based state representation and a multi-phase reinforcement learning procedure that enables singulation using the 16-DOF Allegro Hand.

We demonstrate extensive experiments in Isaac Gym simulation, showing the ability of our system to singulate a target object in clutter. We directly transfer the policy trained in simulation to the real world.

Over 250 physical robot manipulation trials, our method obtains success rates of 79.2%, outperforming alternative learning and non-learning methods.

Method Overview

This paper introduces a novel approach for robotic object singulation - the process of isolating, grasping, and retrieving a target object from a cluttered environment.

We train a policy in simulation using phase-dependent reward functions. The policy uses the hand joints, the block positions, and the robot arm in the state representation. We directly transfer the trained policy to the real world.

The research presents several key contributions:

A framework called Singulating Objects in Packed Environments (SOPE) for training dexterous manipulators to singulate objects in tight spaces.
A novel state representation focused on object displacements and a multi-phase reinforcement learning procedure.
Extensive simulation experiments in Isaac Gym and real-world validation using an Allegro Hand.

Key Methodological Innovations

State Representation

Develops a displacement-based state representation that focuses on the target object and its neighbors.
Uses keypoints to represent object states, reducing the sim-to-real gap compared to image-based representations.
The representation is scalable and invariant to the total number of objects in the scene.

Multi-Phase Reinforcement Learning

Proposes a 3-phase approach: isolating, grasping, and retrieving.
Each phase has a specific reward function tailored to its objectives.
This design encourages more robust grasping and lifting behaviors.

Sim-to-Real Transfer

Achieves zero-shot sim-to-real transfer from Isaac Gym simulation to a physical robot.
Uses AprilTag markers for accurate state estimation in the real world.
Employs domain randomization in simulation to enhance robustness.

Experimental Setup

Setup	Component	Details
Simulation	Environment	Isaac Gym (GPU-accelerated reinforcement learning)
	Objects	Rigid container with rigid blocks
	Manipulator	16-DOF Allegro Hand
Real-World	Robot	Franka Panda arm with Allegro Hand
	State Estimation	AprilTag markers on blocks and container
	Perception	Two Intel RealSense D435 RGBD cameras

Results

Simulation Performance

Method	1-out-of-1	1-out-of-2	1-out-of-5	1-out-of-10
Tactile	.84±.02	.84±.02	.81±.02	.78±.03
Naive	.58±.25	.57±.27	.50±.27	.45±.26
Two Phase	.84±.02	.86±.03	.81±.03	.79±.03
Ours	.86±.03	.86±.02	.82±.02	.78±.01

Achieves 86% success rate for singulating 1 out of 1-2 objects.
Maintains 78% success rate even when singulating 1 out of 10 objects.

Real-World Performance

Environment	Method	Front	Middle-1	Middle-2	Back	Overall	Blocks
Normal	Two Phase	2/15	3/15	-	5/15	10/45	3
Normal	Ours	13/15	14/15	-	13/15	40/45	3
Normal	S2SS&P	10/10	6/10	10/10	10/10	36/40	4
Normal	Ours	9/10	8/10	9/10	7/10	31/40	4
Constrained	S2SS&P	0/10	0/10	0/10	0/10	0/40	4
Constrained	Ours	7/10	6/10	5/10	10/10	28/40	4

88.9% success rate (40/45 trials) for singulating 1 out of 3 objects.
77.5% success rate (31/40 trials) for singulating 1 out of 4 objects.
Demonstrates consistent performance across different block positions (front, middle, back).

Significance and Future Directions

This work represents a significant advancement in dexterous manipulation capabilities for robots, enabling them to handle complex singulation tasks in tightly packed environments. The approach shows promising generalization and robustness, paving the way for more flexible and capable robotic manipulation systems.

Future work could focus on:

Reducing the real-world engineering required for sim-to-real transfer.
Extending the approach to handle deformable objects or fragile items.
Incorporating more sophisticated tactile sensing for enhanced manipulation.

By addressing these challenges, the SOPE framework could lead to robots capable of performing intricate manipulation tasks in diverse real-world scenarios, from warehouse logistics to household assistance.

Learning to Singulate Objects in Packed Environments using a Dexterous Hand

About this Project

Method Overview