https://github.com/YantianZha/SERLfD
Affordance-Aware Imitation Learning, Zha et al., IROS 2022
Coarse-to-Fine Imitation Learning, Edward Johns, ICRA 2021
One-Shot Imitation Learning, Yu et al., RSS 2018
Learning from Demonstrations (LfD)
✅ Demonstrations provide a robust learning signal, contributing to sample-efficient learning (a minimal sketch follows below)
❌ 1) Distribution-shift (covariate-shift) issues; 2) Learners cannot outperform their demonstrators; 3) High demonstration-collection costs
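To make the "robust learning signal" concrete, here is a minimal behavioral-cloning sketch in PyTorch: demonstrated actions are regressed directly from states as a supervised target. The network shape, dataset, and hyperparameters are illustrative assumptions, not the setup from any of the papers above.

```python
# Minimal behavioral cloning: regress demonstrated actions from states.
# All shapes and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 10, 4  # hypothetical dimensions

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Hypothetical demonstration dataset: (state, expert_action) pairs.
demo_states = torch.randn(256, STATE_DIM)
demo_actions = torch.randn(256, ACTION_DIM)

for epoch in range(100):
    pred = policy(demo_states)
    loss = nn.functional.mse_loss(pred, demo_actions)  # direct supervised signal
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Every demonstration state yields a gradient toward the expert's action, which is why LfD needs far fewer environment interactions than exploration-driven learning.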
Reinforcement Learning
✅ Learners can outperform their teachers; more robust to distribution shift; no demonstrations needed
❌ Learning is not sample-efficient (especially in sparse-reward environments)
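To see why sparse rewards hurt sample efficiency, consider this hedged sketch of tabular Q-learning on a hypothetical chain environment: the only reward sits at the far end, so most early episodes return no learning signal at all. The environment and hyperparameters are assumptions for illustration.

```python
# Hedged sketch: tabular Q-learning on a hypothetical sparse-reward
# chain. Reward is 0 everywhere except +1 at the terminal state, so
# most early episodes produce no informative updates.
import random

N_STATES, N_ACTIONS = 20, 2          # illustrative sizes
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.2

def step(state, action):
    """Action 1 moves right, action 0 moves left; reward only at the end."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

for episode in range(1000):
    state = 0
    for _ in range(200):                       # cap episode length
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        nxt, reward, done = step(state, action)
        target = reward + (0.0 if done else GAMMA * max(Q[nxt]))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = nxt
        if done:
            break
```

Until the agent stumbles onto the terminal state by chance, every update has a zero target, which is exactly the inefficiency noted above.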
RL + LfD?
✅ Combine the benefits of RL and LfD, making RL more sample-efficient
❌ Still inefficient when handling ambiguity in demonstrations and environments
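One common way to combine the two, sketched here in the spirit of DQfD/DDPGfD rather than as the method of any specific paper above, is to seed the off-policy replay buffer with demonstration transitions so that early updates see rewarded data. The buffer layout and sampling scheme are illustrative assumptions.

```python
# Hedged sketch: seed an off-policy replay buffer with demonstration
# transitions so early RL updates see informative, rewarded data.
import random
from collections import deque

replay = deque(maxlen=100_000)

# Hypothetical demonstration transitions: (state, action, reward, next_state, done).
demo_transitions = [
    ((0.1, 0.2), 1, 0.0, (0.3, 0.2), False),
    ((0.3, 0.2), 0, 1.0, (0.5, 0.2), True),   # the demo reaches the sparse reward
]
replay.extend(demo_transitions)                # demonstrations go in first

def sample_batch(batch_size=32):
    """Uniform sampling; DQfD-style variants over-sample demo transitions."""
    return random.sample(list(replay), min(batch_size, len(replay)))

# During environment interaction, agent transitions are appended alongside:
# replay.append((state, action, reward, next_state, done))
```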
Humans instinctively self-explain their experiences, including their own problem solving, their mistakes, and the actions and outcomes of others
Self-Explanation: object-location > object-color (the location predicate better explains the demonstrated behavior)
Self-Explanation: object-location < object-color (the color predicate better explains the demonstrated behavior)
Environment              Predicates   Action Space
Robot-Push-Simple        6            Continuous
Robot-Push               10           Continuous
Robot-Remove-and-Push    20           Continuous
Pacman                   2            Discrete
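For reference, the benchmark summary above can be captured as a simple config structure; the field names are assumptions for illustration, not identifiers from the repository.

```python
# The benchmark table above as a simple config structure.
ENVIRONMENTS = {
    "Robot-Push-Simple":     {"num_predicates": 6,  "action_space": "continuous"},
    "Robot-Push":            {"num_predicates": 10, "action_space": "continuous"},
    "Robot-Remove-and-Push": {"num_predicates": 20, "action_space": "continuous"},
    "Pacman":                {"num_predicates": 2,  "action_space": "discrete"},
}
```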