There is an older paper related to this, Self-RAG [1], in which the model outputs reflection tokens that either trigger a retrieval step or a critique step. The idea is that the model recognizes, after generating some factual content, that it needs grounding; it then fetches relevant passages and reviews what it previously generated against them.
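The control flow can be sketched roughly like this. Every function and token string here is a hypothetical placeholder for illustration, not the paper's actual vocabulary or implementation:

```python
# Sketch of a reflection-token generation loop.
# generate(), retrieve(), and critique() are illustrative stubs.

RETRIEVE_TOKEN = "[Retrieve]"

def generate(prompt):
    # Stub model call: emits a text segment, optionally followed by a
    # reflection token signaling that the claim needs grounding.
    if "fact" in prompt:
        return "Paris is the capital of France. " + RETRIEVE_TOKEN
    return "Done."

def retrieve(query):
    # Stub retriever returning grounding passages for the claim.
    return ["Paris has been France's capital for centuries."]

def critique(segment, passages):
    # Stub critic: keep the segment only if the passages support it.
    supported = any("Paris" in p for p in passages)
    return segment if supported else "[unsupported claim removed]"

def reflective_generate(prompt):
    output = []
    segment = generate(prompt)
    while RETRIEVE_TOKEN in segment:
        text = segment.replace(RETRIEVE_TOKEN, "").strip()
        passages = retrieve(text)        # fetch grounding for the claim
        text = critique(text, passages)  # review the claim against it
        output.append(text)
        segment = generate("continue")   # move on to the next segment
    output.append(segment)
    return " ".join(output)

print(reflective_generate("state a fact"))
```

The point of the sketch is only the shape of the loop: generation is interleaved with retrieval and critique whenever the model emits a reflection token, rather than retrieving once up front.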
The problem with this approach is that it does not generalize well out of distribution. I'm not aware of any follow-up work, but I still think it's an interesting area of research.
[1] https://arxiv.org/abs/2310.11511