Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning

Yantian Zha1,*, Lin Guan2,*, Subbarao Kambhampati2
1University of Maryland, College Park, 2Arizona State University

https://github.com/YantianZha/SERLfD

Robot Task Learning

How to convey the task knowledge to robots?


Convey the task via demonstrations

Learning from Demonstrations (LfD)


Affordance-Aware Imitation Learning,
Zha et al., IROS, 2022

Coarse-to-Fine Imitation Learning,
Edward Johns, ICRA, 2021

One-Shot Imitation Learning,
Yu et al., RSS, 2018

Demonstrations provide a robust learning signal,
contributing to
sample-efficient learning


1) Distribution-shift issues;
2) Learners cannot outperform their demonstrators;
3) High demonstration-collection costs

Reinforcement Learning

Convey the task via rewards

Reinforcement Learning (RL)


SOLAR, Zhang et al., ICML, 2019

SAC-X, Riedmiller, et al., ICML, 2018

Learners can outperform their teachers;
more robust to distribution shift; no need for demonstrations

Learning is not sample-efficient (especially in sparse-reward environments)

RL + LfD?

RL + LfD: Reinforcement Learning from Demonstrations (RLfD)


DAPG, Rajeswaran et al., RSS, 2018

DDPGfD, Vecerik et al., 2017

Combine the benefits of RL and LfD – making RL more sample-efficient


Still inefficient at handling ambiguity in demonstrations and environments
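The core RLfD mechanism in methods like DDPGfD is to seed an off-policy replay buffer with demonstration transitions and draw mini-batches from both demonstrations and the agent's own experience. A minimal sketch of that idea (class and parameter names here are illustrative, not the papers' actual APIs):

```python
import random

class MixedReplayBuffer:
    """Replay buffer that mixes demonstration and agent transitions (DDPGfD-style)."""

    def __init__(self, demos, demo_ratio=0.25, capacity=100_000):
        self.demos = list(demos)      # fixed demonstration transitions
        self.agent = []               # transitions collected by the RL agent
        self.demo_ratio = demo_ratio
        self.capacity = capacity

    def add(self, transition):
        self.agent.append(transition)
        if len(self.agent) > self.capacity:
            self.agent.pop(0)         # drop the oldest agent transition

    def sample(self, batch_size):
        n_demo = min(int(batch_size * self.demo_ratio), len(self.demos))
        n_agent = min(batch_size - n_demo, len(self.agent))
        batch = random.sample(self.demos, n_demo) + random.sample(self.agent, n_agent)
        random.shuffle(batch)
        return batch

# Usage: transitions are (state, action, reward, next_state, done) tuples
demos = [((0,), 1, 0.0, (1,), False), ((1,), 0, 1.0, (2,), True)]
buf = MixedReplayBuffer(demos, demo_ratio=0.5)
buf.add(((2,), 1, 0.0, (3,), False))
batch = buf.sample(2)   # one demonstration transition, one agent transition
```

Sampling demonstrations alongside fresh experience is what lets the sparse environment reward be propagated from demonstrated successes, improving sample efficiency.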

Insights from Human Cognition

Humans are aware of and reflective about their own learning

Humans instinctively self-explain their experiences: problem-solving, mistakes, and the actions and outcomes of others

Self-Explanation:
object-location > object-color



Self-Explanation:
object-location < object-color



Background Knowledge: Shared Vocabulary
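The shared vocabulary can be thought of as a fixed set of grounded predicates that both the human and the robot can evaluate on any state. A hypothetical sketch (the predicate names and the dict-based state encoding are made up for illustration; real domains would ground predicates in sensor data):

```python
# Hypothetical shared vocabulary: predicate name -> grounding function over states.
VOCABULARY = {
    "object_at_goal": lambda s: s["obj_pos"] == s["goal_pos"],
    "object_is_red":  lambda s: s["obj_color"] == "red",
    "gripper_closed": lambda s: s["gripper"] == "closed",
}

def ground_state(state):
    """Map a raw state to truth values over the shared predicate vocabulary."""
    return {name: pred(state) for name, pred in VOCABULARY.items()}

state = {"obj_pos": (1, 2), "goal_pos": (1, 2), "obj_color": "blue", "gripper": "open"}
symbols = ground_state(state)
# -> {"object_at_goal": True, "object_is_red": False, "gripper_closed": False}
```

Because the vocabulary is shared, a self-explanation expressed over these predicates (e.g. object-location > object-color) is meaningful to both parties.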

Self-Explanation for RLfD (SERLfD)


Robots take human advice

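One way to read the SERLfD idea is that a learned self-explainer assigns utility weights to the vocabulary's predicates, and those weights shape the sparse environment reward. A simplified sketch under the assumption that the shaped reward adds the utility-weighted change in predicate truth values (the paper's actual formulation may differ):

```python
def shaped_reward(env_reward, weights, prev_preds, next_preds):
    """Add the utility-weighted change in predicate truth values to the env reward.

    weights:    predicate -> utility weight from the self-explainer
                (high weight = task-relevant, e.g. object-location over
                object-color in an ambiguous demonstration)
    prev_preds, next_preds: predicate -> bool, before and after the action
    """
    bonus = sum(
        w * (float(next_preds[p]) - float(prev_preds[p]))
        for p, w in weights.items()
    )
    return env_reward + bonus

# The self-explainer has resolved the ambiguity: location matters, color does not.
weights = {"object_at_goal": 1.0, "object_is_red": 0.0}
prev_preds = {"object_at_goal": False, "object_is_red": True}
next_preds = {"object_at_goal": True,  "object_is_red": True}
r = shaped_reward(0.0, weights, prev_preds, next_preds)   # 0.0 + 1.0*(1 - 0) = 1.0
```

The agent thus receives dense guidance from its own explanations even when the environment reward is sparse and the demonstrations are ambiguous.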

Evaluation Domains


Robot-Push-Simple

6 Predicates
Continuous Action Space

Robot-Push

10 Predicates
Continuous Action Space

Robot-Remove-and-Push

20 Predicates
Continuous Action Space

Pacman

2 Predicates
Discrete Action Space

Our work pioneers the integration of self-explanation into robot learning using deep neural networks.

Future Directions


Limitations:

1. The symbolic vocabulary, representing the robots' background knowledge, is pre-defined and finite.

Opportunities:

1. Enhance the self-explanation-guided learning mechanism.
2. Use self-explanation-guided learning to strengthen other learning frameworks.
3. Apply self-explanation-guided learning to a broader range of problems and challenges.