In “An Opinionated Guide to ML Research”, John Schulman endorses what he calls goal-driven rather than idea-driven research. A month as an intern observing goal-driven biology language model research at the Arc Institute has cemented my impression that this is good advice.

Idea-driven research, in Schulman’s categorization, involves “[following] some sector of the literature”, and “[a]s you read a paper showing how to do X, you have an idea of how to do X even better. Then you embark on a project to test your idea.” On the other hand, in goal-driven research, you “[d]evelop a vision of some new AI capabilities you’d like to achieve, and solve problems that bring you closer to that goal.” You “test a variety of existing methods from the literature, and then you develop your own methods that improve on them”.

Idea-driven research, he says, comes with “a high risk of getting scooped”, as “[r]esearchers around the world are reading the same literature, which leads them to similar ideas”. But goal-driven research “will give you a perspective that’s differentiated from the rest of the community”. Schulman cites his own example: during his PhD, he was focused on robotic locomotion. When the DeepMind Atari DQN paper came out, “many people jumped on the bandwagon and tried to develop better versions of Q-learning and apply them to the Atari domain”. But Schulman had already explored Q-learning and concluded it wasn’t a good fit for the locomotion capabilities he was targeting. So, he continued working on policy gradient methods and eventually developed the extremely successful PPO, whose variants are still the RL algorithms used in frontier LLMs.

Goal-driven research comes with two other benefits:

  • It’s more motivating: “You can wake up every morning and imagine achieving your goal—what the result would look like and how you would feel.”
  • It enables “a team of researchers to work together and attack different aspects of the problem, whereas idea-driven research is most effectively carried out by ‘teams’ of 1-2 people”.

On Abhishaike Mahajan’s podcast, Sergey Ovchinnikov also (implicitly) attributes his creativity to goal-driven research:

Abhi: I’m curious. I think you’re often… whenever people think of Sergey Ovchinnikov, they often think of deeply creative papers. I think papers that you wouldn’t really expect to come from any other people. Do you think there’s an aspect to… do you think a lot of your quote unquote, alpha as a researcher comes from your background in Phylogenetics and that people who work at Isomorphic and EvoScale could stand to learn a little bit more about phylogeny?

Sergey: I don’t know if that’s where it’s coming from, but I guess for me, I guess my big goal in life has always been to come up with a unified model of protein evolution that accounts for all these different effects. And so what may appear to be creativity is just trying to tackle every part of the problem. Like for example, we’re trying to extract evolution signal, but then we also need to think about alignments of sequences, right? So for example, maybe we’re extracting the wrong coevolution signal because sequences are misaligned. And so we venture into the alignment problem. But then once you start thinking about alignments, then you’re like, well, how do you know you got the right alignment? Yeah. Well there, that’s where it’s like, well, maybe a structure prediction model could tell you that the alignment’s correct or not. Right. And so… I guess what you could say, what may look like creativity, it’s all just trying to solve this unified model problem, I guess. That would be one way to put it.

And finally, on Dwarkesh Patel’s podcast, Dario Amodei says that the germ of his wildly counterintuitive ML scaling-laws finding was planted while doing “make number go up” research:

So I joined Andrew Ng’s group at Baidu. I had been in a different field and this was my first experience with AI and it was a bit different from a lot of the academic style research that was going on elsewhere in the world.

I kind of got lucky in that the task that was given to me and the other folks there was just to make the best speech recognition system that you can.

There was a lot of data available, there were a lot of GPUs available. It posed the problem in a way that was amenable to discovering that kind of scaling was a solution. That’s very different from being a postdoc whose job is to come up with an idea that seems clever and new and makes your mark as someone who’s invented something.

I just tried the simplest experiments. I was just fiddling with some dials. I was like, try adding more layers to the RNN, try training it for longer, what happens? How long does it take to overfit? What if I add new data and repeat it less times? And I just saw these very consistent patterns.

I didn’t really know that this was unusual or that others weren’t thinking in this way.
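None of this is a recipe, but to make the flavor of that “fiddling with dials” concrete, here is a minimal sketch of the kind of sweep Amodei describes. It is my own illustration, not his or Baidu’s setup: the model sizes are arbitrary and the losses are simulated stand-ins for real training runs. Turn one knob, record the validation loss, and check whether the points fall on a clean power law.

```python
import numpy as np

def train_and_eval(num_params: int) -> float:
    """Hypothetical stand-in for 'train a speech model of this size and
    return its validation loss'. Here the loss is simulated to follow a
    power law plus noise; in real life this is the expensive part."""
    rng = np.random.default_rng(num_params)
    return 8.0 * num_params ** -0.3 * (1.0 + rng.normal(0.0, 0.05))

# Sweep one dial (model size) and record the resulting losses.
sizes = np.array([1e5, 3e5, 1e6, 3e6, 1e7, 3e7])
losses = np.array([train_and_eval(int(n)) for n in sizes])

# On a log-log plot a power law L(N) = a * N**(-alpha) is a straight line,
# so a linear fit in log space recovers the exponent.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
print(f"fitted exponent alpha ~ {-slope:.2f}")
for n, loss in zip(sizes, losses):
    print(f"params={int(n):>10,}  val_loss={loss:.4f}")
```

The point is not the code but the posture: when the goal is simply to make the number go up, the consistent patterns are hard to miss.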