So check this out: there’s some really interesting research out of OpenAI. They’re looking at language models like ChatGPT that are trained with reinforcement learning from human feedback (RLHF), where the model is rewarded or penalized based on its outputs so it improves over time. But here’s the catch – this only works if the human evaluator can actually tell whether the model’s behavior is good or bad.
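Just to ground that a little: the “reward and penalize” part usually means first training a reward model on pairwise human preferences, then optimizing the chat model against it. Here’s a minimal sketch of that pairwise preference loss; the function name and toy numbers are my own illustration, not code from OpenAI:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen, reward_rejected):
    # Bradley-Terry pairwise loss: push the reward model to score the
    # human-preferred response above the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards the model assigned to two candidate responses.
chosen = torch.tensor([1.2, 0.7])
rejected = torch.tensor([0.3, 0.9])
print(reward_model_loss(chosen, rejected).item())  # lower when chosen > rejected
```

The key point is that the whole pipeline bottoms out in those human preference judgments – which is exactly what breaks down once the model’s behavior gets too complex to judge.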
Now imagine a superhuman model doing impressive things far beyond human understanding. How do you align a model like that? The OpenAI researchers propose an analogy: can a smaller, less capable model supervise a larger, more capable one? They created weak supervisors by finetuning small models on ground-truth labels, then used those weak models’ predictions as labels to finetune a stronger model.
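Here’s a toy, runnable analogue of that three-step pipeline, using scikit-learn stand-ins (logistic regression as the weak supervisor, a bigger MLP as the strong model); the real experiments use GPT-scale language models, so treat this as a sketch of the setup, not of the result:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=4500, n_features=20, random_state=0)
X_gt, X_transfer, X_test = X[:1500], X[1500:3000], X[3000:]
y_gt, y_test = y[:1500], y[3000:]

# 1. Finetune the small weak supervisor on ground-truth labels.
weak = LogisticRegression(max_iter=1000).fit(X_gt, y_gt)

# 2. Use the weak supervisor to label data it was never trained on.
weak_labels = weak.predict(X_transfer)

# 3. Finetune the stronger model on those (possibly noisy) weak labels.
strong = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                       random_state=0).fit(X_transfer, weak_labels)

# The question the paper asks: does the strong student generalize beyond
# its weak teacher on held-out ground truth?
print("weak supervisor accuracy:", weak.score(X_test, y_test))
print("strong student accuracy: ", strong.score(X_test, y_test))
```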
They tested this approach on three kinds of tasks – NLP benchmarks, chess puzzles, and reward modeling – and found some genuinely promising results. For example, when they supervised GPT-4 with a GPT-2-level model on NLP tasks, the stronger model consistently outperformed its weak supervisor, and a simple auxiliary loss that encourages the strong model to stay confident in its own predictions, even when they disagree with the weak labels, recovered performance close to GPT-3.5 level.
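That auxiliary confidence loss is the headline “simple method”: instead of training the strong model to purely imitate the weak labels, you mix in a term that rewards it for sticking with its own confident predictions. Here’s a hedged PyTorch sketch; hardening by argmax and the alpha value are my simplifications of the paper’s formulation:

```python
import torch
import torch.nn.functional as F

def confidence_loss(strong_logits, weak_labels, alpha=0.5):
    # Blend the weak supervisor's labels with the strong model's own
    # hardened predictions, letting the student confidently disagree
    # with a noisy teacher.
    hardened = strong_logits.argmax(dim=-1).detach()
    return ((1 - alpha) * F.cross_entropy(strong_logits, weak_labels)
            + alpha * F.cross_entropy(strong_logits, hardened))

# Toy usage: a batch of 4 examples over 3 classes.
logits = torch.randn(4, 3, requires_grad=True)
weak = torch.tensor([0, 2, 1, 1])
loss = confidence_loss(logits, weak)
loss.backward()
print(loss.item())
```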
But it’s not all perfect. The researchers note that their methods don’t work consistently across all settings, and practical solutions are still an open problem. They’re encouraged by the results, though, and they’ve open-sourced their code to kickstart more research in this area.
It’s really exciting research, and I’m looking forward to seeing where it goes. If you want more details, check out the Paper and OpenAI Blog. All credit for this research goes to the researchers of this project.