When developing an imitation learning method, several design choices have to be made to formalize the problem. Two central choices are whether a policy is learned directly from demonstrations or via a recovered reward function, and whether a model of the system dynamics is used.
Behavioral Cloning (BC) learns a policy that directly maps input states/contexts to actions using supervised learning on demonstrated trajectories. Given a dataset of demonstrated trajectories consisting of states, contexts, and control inputs $\mathcal{D} = \{(\mathbf{x}_t, \mathbf{s}_t, \mathbf{u}_t)\}$, we can directly learn a mapping from states and/or contexts to control inputs as
$$ \mathbf{u}_t = \pi(\mathbf{x}_t, \mathbf{s}_t). \tag{2.1} $$
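As an illustration, the sketch below fits such a mapping with a small feed-forward network trained by mean-squared-error regression on demonstrated $(\mathbf{x}_t, \mathbf{s}_t, \mathbf{u}_t)$ tuples; the dimensions, architecture, hyperparameters, and stand-in data are placeholder assumptions, not choices made here.

```python
# Minimal behavioral-cloning sketch of Eq. (2.1), assuming the dataset D is
# available as tensors of states x, contexts s, and control inputs u.
import torch
import torch.nn as nn

state_dim, context_dim, action_dim = 7, 3, 7   # assumed dimensions

# Policy u_t = pi(x_t, s_t): a feed-forward regressor on the concatenated
# state and context.
policy = nn.Sequential(
    nn.Linear(state_dim + context_dim, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, action_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_step(x, s, u):
    """One supervised-learning step on demonstrated (x_t, s_t, u_t) tuples."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(policy(torch.cat([x, s], dim=-1)), u)
    loss.backward()
    optimizer.step()
    return loss.item()

# Random stand-in data in place of real demonstrations, for illustration only.
x = torch.randn(256, state_dim)
s = torch.randn(256, context_dim)
u = torch.randn(256, action_dim)
for _ in range(100):
    train_step(x, s, u)
```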
Alternatively, given a reward signal, a policy can be obtained by optimizing the expected return under the learned reward function:
$$ \pi = \argmax_{\hat{\pi}} J(\hat{\pi}), \tag{2.2} $$
where $J(\hat{\pi})$ is the expected accumulated reward under the policy $\hat{\pi}$. In the imitation learning setting, however, the reward function is unknown and needs to be recovered from expert demonstrations, under the assumption that the demonstrations are (approximately) optimal w.r.t. this reward function. Recovering the reward function from demonstrations is commonly referred to as Inverse Reinforcement Learning (IRL).
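To make the alternation between reward recovery and the policy optimization of Eq. (2.2) concrete, the toy sketch below performs feature-expectation matching on a small chain MDP with a linear reward model; the MDP, the value-iteration solver, and the simple weight update are illustrative assumptions, not the specific IRL method discussed here.

```python
# Toy sketch of reward recovery (IRL) by feature-expectation matching on a
# small chain MDP, assuming a linear reward r(s) = w . phi(s) with one-hot
# state features. The MDP, solver, and update rule are illustrative only.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9

# Deterministic chain dynamics: action 0 moves left, action 1 moves right.
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, n_states - 1)] = 1.0
phi = np.eye(n_states)               # one-hot state features

def solve_policy(w, n_iters=200):
    """Value iteration: greedy policy for argmax_pi J(pi) under r(s) = w . phi(s)."""
    r = phi @ w
    V = np.zeros(n_states)
    for _ in range(n_iters):
        V = (r[None, :] + gamma * P @ V).max(axis=0)
    return (r[None, :] + gamma * P @ V).argmax(axis=0)

def feature_expectation(policy, start=0, horizon=200):
    """Discounted feature expectation mu_pi = E[sum_t gamma^t phi(s_t)]."""
    P_pi = np.stack([P[policy[s], s] for s in range(n_states)])
    d = np.zeros(n_states)
    d[start] = 1.0
    mu = np.zeros(phi.shape[1])
    for t in range(horizon):
        mu += (gamma ** t) * (d @ phi)
        d = d @ P_pi
    return mu

# "Expert" demonstrations: a policy that always moves right.
mu_expert = feature_expectation(np.ones(n_states, dtype=int))

# Alternate between re-solving Eq. (2.2) under the current reward estimate
# and nudging the reward weights towards the expert's feature expectations.
w = np.zeros(n_states)
for _ in range(50):
    w += 0.1 * (mu_expert - feature_expectation(solve_policy(w)))

print("recovered reward weights:", np.round(w, 2))
print("learned greedy policy (0 = left, 1 = right):", solve_policy(w))
```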
Model-free imitation learning methods learn a policy that replicates expert behavior directly, without attempting to model the underlying system dynamics. This avoids the complexity of estimating a dynamics model and is particularly effective for fully actuated robotic systems, such as industrial robots with reliable position and velocity controllers, where the low-level controllers largely compensate for the dynamics and smooth trajectories can be planned easily. Consequently, model-free BC has been widely adopted for such applications.