Hierarchical Planning with Foundation Models

To make effective decisions in novel environments with long-horizon goals, it is crucial to reason hierarchically across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Hierarchical Planning with Foundation Models (HiP), a framework that leverages different modalities of knowledge to capture the information needed at each level of decision-making. We use a large language model to construct symbolic plans, which are grounded in the environment through a large video diffusion model. The generated video plans are then grounded in visual-motor control through an inverse dynamics model that infers actions from the generated videos. To enable effective reasoning within this hierarchy, we enforce consistency between the models via iterative refinement. We illustrate the efficacy and adaptability of our approach on two long-horizon table-top manipulation tasks.
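The three-level hierarchy described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`hierarchical_plan`, `toy_propose_subgoals`, `toy_generate_video`, `toy_infer_action`, `toy_consistency`) are hypothetical, the "models" are toy stand-ins operating on a 1-D block position rather than images, and consistency between levels is approximated by sampling several candidate videos and keeping the best-scoring one, which is a simplification and not necessarily the iterative refinement procedure used by HiP.

```python
from typing import Callable, List, Tuple

Frame = float   # toy 1-D "image": the position of a single block (assumption)
Action = float  # toy action: a displacement along that axis (assumption)


def hierarchical_plan(
    goal: str,
    obs: Frame,
    propose_subgoals: Callable[[str], List[str]],
    generate_video: Callable[[str, Frame, int], List[Frame]],
    infer_action: Callable[[Frame, Frame], Action],
    consistency: Callable[[str, List[Frame]], float],
    n_candidates: int = 4,
) -> Tuple[List[str], List[Action]]:
    """Task plan -> video plan -> actions, with a simple consistency check."""
    subgoals = propose_subgoals(goal)  # stand-in for the language model
    actions: List[Action] = []
    for subgoal in subgoals:
        # Sample several candidate video plans (stand-in for the video model)
        # and keep the one most consistent with the subgoal.
        candidates = [generate_video(subgoal, obs, seed) for seed in range(n_candidates)]
        video = max(candidates, key=lambda v: consistency(subgoal, v))
        # Stand-in for the inverse dynamics model: one action per frame pair.
        actions.extend(infer_action(a, b) for a, b in zip(video, video[1:]))
        obs = video[-1]  # the next subgoal is planned from the imagined end state
    return subgoals, actions


# ---- Toy stand-ins (hypothetical, for illustration only) ----
def _target(subgoal: str) -> float:
    return float(subgoal.rsplit(" ", 1)[1])


def toy_propose_subgoals(goal: str) -> List[str]:
    return [f"move to {x}" for x in goal.split(",")]


def toy_generate_video(subgoal: str, obs: Frame, seed: int, steps: int = 4) -> List[Frame]:
    target = _target(subgoal) + 0.1 * seed  # nonzero seeds overshoot the target
    return [obs + (target - obs) * t / steps for t in range(steps + 1)]


def toy_infer_action(frame: Frame, next_frame: Frame) -> Action:
    return next_frame - frame


def toy_consistency(subgoal: str, video: List[Frame]) -> float:
    return -abs(video[-1] - _target(subgoal))  # higher is better


subgoals, actions = hierarchical_plan(
    "2.0,-1.0", 0.0,
    toy_propose_subgoals, toy_generate_video, toy_infer_action, toy_consistency,
)
final_position = 0.0 + sum(actions)  # replay the actions from the start state
```

In this toy setup each level only consumes the output of the level above it, so any of the stand-ins could in principle be swapped for a learned model without changing the control flow.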



Paint Block Results


Successful execution trajectories of HiP on novel long-horizon tasks in the paint-block environment.


Goal: Place purple block left of yellow block and cyan block right of yellow block

Goal: Stack red block on top of brown block and place yellow block to the left of the stack

Goal: Stack brown block on top of pink block and place cyan block to the left of the stack

Goal: Stack orange block on top of red block and place purple block to the right of the stack



Object Arrange Results


Successful execution trajectories of HiP on novel long-horizon tasks in the object-arrange environment.


Goal: Pack spiderman figure, frypan, nintendo 3ds, red and white striped towel in brown box

Goal: Pack butterfinger chocolate, porcelain salad plate, porcelain spoon, green and white striped towel in brown box

Goal: Pack spiderman figure, porcelain salad plate, nintendo cartridge, hammer in brown box

Goal: Pack crayon box, ball puzzle, hammer, red and white striped towel in brown box