System 2 - Training for Test-time compute


The fifty-fifth MeetUp of the Machine Learning Singapore Group was titled "System 2 - Training for Test-time compute".

My Presentation

My talk was titled "Ignore All Previous Instructions", and covered the following topics:

  • Fine-tuning Models
  • LoRA with Variations
  • Simple RL idea : "Do-Over" training
    • How is this different?
  • Training with RL
    • ... and inference-time compute
    • Is this something New?

This ordering of topics was deliberate, since it broke neatly into sections aimed at different levels of the audience:

  • Beginner
    • What are LLMs?
    • How can we fine-tune them?
    • Practical steps to get good model results
  • Intermediate
    • A quick illustration of how RL is a different 'mode'
    • Illustration using "Do-Over Training"
  • Advanced
    • Discussion of test-time vs train-time computation
    • Whether o1's test-time scaling was impressive
    • What o1's training regime might involve
  • Conclusions
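As a deliberately simplified illustration of the test-time-compute idea discussed in the Advanced section, the sketch below spends extra inference compute via majority voting over repeated samples (self-consistency). The "model" here is a hypothetical noisy stub, not anything from the talk:

```python
import random
from collections import Counter

def sample_answer(question, rng):
    # Hypothetical stand-in for one sampled LLM completion: a noisy
    # solver that returns the right answer ~70% of the time and an
    # off-by-one answer otherwise.
    correct = sum(question)
    return correct if rng.random() < 0.7 else correct + rng.choice([-1, 1])

def self_consistency(question, n=25, seed=0):
    # Spend more test-time compute: sample n answers independently
    # and return the majority vote.
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n))
    return votes.most_common(1)[0][0]

question = [2, 3, 4]  # toy task: add the numbers (true answer: 9)
print(self_consistency(question, n=25))
```

A single sample is wrong roughly 30% of the time, but the 25-sample majority vote recovers the correct sum with high probability - the same "more compute at inference time buys better answers" trade-off, in miniature.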

In particular, the section on fine-tuning referred to work I had done with local LLMs - initially using Llama3.1-8B, and then proving out that Gemma2-9B was substantially better at learning my sample task. This involved training over a dozen variations of the models/environment, with the highlights being given in the presentation.
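The slides carry the actual fine-tuning details; purely as background, here is a minimal NumPy sketch of the core LoRA idea that parameter-efficient fine-tuning relies on - freeze the pretrained weight W and learn only a rank-r update B·A (all dimensions here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight matrix (d_out x d_in) -- toy sizes
d_out, d_in, r = 8, 16, 2
W = rng.normal(size=(d_out, d_in))

# LoRA adapter: only A and B are trained.  B starts at zero, so the
# adapted layer initially computes exactly the same output as W alone.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))

def adapted_forward(x, alpha=16.0):
    # y = W x + (alpha / r) * B A x : the update to W has rank at most r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
assert np.allclose(adapted_forward(x), W @ x)  # B = 0, so no change yet

# The payoff: far fewer trainable parameters than full fine-tuning.
lora_params = r * (d_in + d_out)
full_params = d_in * d_out
print(lora_params, full_params)  # 48 vs 128 at these toy sizes
```

At real model scale the gap is far larger (millions of adapter parameters versus billions of frozen ones), which is what makes fine-tuning 8-9B models feasible on modest GPUs.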

Many thanks to Google for supporting the GCP usage for this project, which was part of their September 2024 #AISprint. My contribution there was titled: "Gemma Fine-tuning with ablations".

The slides for my talk, which contain links to all of the reference materials and sources, are here:

Presentation Screenshot

If there are any questions about the presentation please ask below, or contact me using the details given on the slides themselves.

Presentation Content Example

Other Presentations

We were also proud to host a talk by Gabriel Chua from one of Singapore's Government departments, who presented a weekend project he built (using Gemini, interfaced with GitHub Actions) to automatically generate and host daily summaries of AI papers.

In his talk "What's behind o1 and can it be replicated", Sam Witteveen talked about OpenAI's recent o1 model launch, and demonstrated two different ways in which he had built his own "poor man's" version, using Microsoft's Phi-3 as a building block within an agentic framework.

Acknowledgements

Many thanks to the Google team, who not only allowed us to use Google's Developer Space, but were also kind enough to provide pizza for the attendees!