The study investigates human decision-making behaviour in a game-based context and attempts to replicate that behaviour using Generative Adversarial Imitation Learning (GAIL). In this gamified environment, inspired by a hunter-gatherer scenario, players must ensure their survival while managing their episodic homeostasis and factoring in current and future climatic conditions, which requires estimating stochastic trade-offs. The first phase of the study centres on collecting and analysing data from healthy participants, yielding insights into their gameplay dynamics and the cognitive processes underpinning their decision-making. An overarching observation is that participants can seemingly differentiate between cases where simple heuristics advance the game and cases where prior and present information is needed for informed action selection. The study then turns to predictive modelling, first framing the task as supervised learning and performing behavioural cloning with a white-box decision tree algorithm. The trained decision tree attained a survival rate of 20%, close to the human benchmark of 21%, and scored 8.92 on an imitation-based evaluation metric, Monte Carlo Distance (MCD), a reasonable result given the stochasticity of the game and the variability in human behaviour. GAIL, in contrast, frames the task as inverse reinforcement learning and attempts to imitate the behaviour directly; it achieved a survival rate of 16% and an MCD of 8.83, showing competitive performance and effective imitation.
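The behavioural-cloning baseline described above amounts to a standard supervised pipeline: fit a classifier that maps observed game states to the actions humans chose. The snippet below is a minimal sketch, not the thesis's actual code; the feature layout (energy, weather, day) and the toy labelling rule are assumptions for illustration only.

```python
# Minimal sketch of behavioural cloning with a white-box decision tree,
# assuming the game state flattens into a tabular feature vector.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the human gameplay log: rows are observed states,
# labels are the action the player chose (0 = rest, 1 = forage).
X = rng.random((500, 3))            # illustrative columns: energy, weather, day
y = (X[:, 0] < 0.5).astype(int)     # toy rule: forage when energy is low

# A shallow tree keeps the cloned policy inspectable (white-box).
clone = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def act(state):
    """Return the cloned policy's action for one game state."""
    return int(clone.predict(np.asarray(state, dtype=float).reshape(1, -1))[0])
```

The shallow depth is the point of the design choice: the resulting policy can be read off the tree directly, which is what makes it a useful interpretable baseline against GAIL.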
The study also examines how the GAIL model works using post hoc explainability, specifically Shapley analysis and a decision tree trained on GAIL's synthetic behavioural data, suggesting that GAIL can learn complex strategies similar to those used by humans. This research contributes to the understanding of resource management, risk assessment, and strategic thinking within a game environment, and demonstrates the potential of GAIL for imitating human behaviour in a tabular setting. Code can be found here: https://github.com/faizankshaikh/ForaGym
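The Shapley analysis mentioned above can be illustrated without the full model. The sketch below estimates Shapley values by Monte Carlo permutation sampling against a hypothetical stand-in for the policy's output; the linear scoring function and feature values are invented for illustration and are not taken from the thesis.

```python
# Monte Carlo estimate of Shapley values: average each feature's marginal
# contribution over random orders in which features are "switched on".
import numpy as np

def shapley_values(f, x, baseline, n_perm=200, seed=0):
    """Estimate each feature's Shapley value for the prediction f(x)."""
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = baseline.copy()
        prev = f(z)
        for i in order:
            z[i] = x[i]           # add feature i to the coalition
            cur = f(z)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return phi / n_perm

# Toy "policy score": linear, so the exact values are w * (x - baseline).
w = np.array([2.0, -1.0, 0.5])
f = lambda z: float(w @ z)
x = np.array([1.0, 1.0, 1.0])
base = np.zeros(3)
phi = shapley_values(f, x, base)
```

The efficiency property holds by construction: the attributions sum to f(x) minus f(baseline), which is what makes per-feature attributions comparable across game states.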
Keywords
Deep Reinforcement Learning, Imitation Learning, GAIL, Decision Neuroscience, Shapley Values
Length of the accompanying thesis
37 pages
Language
English
Guidelines for preparation
-
List of recommended literature
-
Loose attachments
-
Attachments bound in the thesis
graphs
Borrowed from the library
No
Full text of the thesis
Attachments
Opponent's review(s)
Supervisor's evaluation
Record of the defence proceedings
Presentation by the student:
- introduction of the case: comparative survival study, survival rates
- introduction of the theory
- problem: it is group work and it is not clear what his own contribution is; the level is similar to a bachelor's thesis
- comparison: heuristic, backward induction, human benchmark
- human behaviour in ML, decision tree
- imitation learning
- behavioural cloning
- comparative graphs of the methods used
Reading of the supervisor's and opponents' reviews (proposed grades):
Supervisor: 2
Opponent 1: 1
Opponent 2: 2
Student: the Monte Carlo method, implemented in his own Python code.
Rudolf Vohnout: Where were the data obtained? Answer: from Heidelberg University.
Jan Valdman: One mentioned participant is unknown; it is not clear what was done by the student himself. Another participant and his role are not clearly described.
Jakub Geyer: insufficient citations and comparison with similar works.