System 2 - Training for Test-time compute


The fifty-fifth MeetUp of the Machine Learning Singapore Group was titled "System 2 - Training for Test-time compute".

My Presentation

My talk was titled "Ignore All Previous Instructions", and covered the following topics:

  • Fine-tuning Models
  • LoRA with Variations
  • Simple RL idea : "Do-Over" training
    • How is this different?
  • Training with RL
    • ... and inference-time compute
    • Is this something New?

This ordering of topics was deliberate, since it broke neatly into sections aimed at different levels of the audience:

  • Beginner
    • What are LLMs?
    • How can we fine-tune them?
    • Practical steps to get good model results
  • Intermediate
    • A quick illustration of how RL is a different 'mode'
    • Illustration using "Do-Over Training"
  • Advanced
    • Discussion of test-time vs train-time computation
    • Whether o1's test-time scaling was impressive
    • What o1's training regime might involve
  • Conclusions
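As a deliberately simplified illustration of the test-time-compute idea discussed in the Advanced section, the sketch below spends extra inference compute via majority voting over repeated samples (self-consistency). The "model" here is a hypothetical noisy stub, not anything from the talk:

```python
import random
from collections import Counter

def sample_answer(question, rng):
    # Hypothetical stand-in for one sampled LLM completion: a noisy
    # solver that returns the right answer ~70% of the time and an
    # off-by-one answer otherwise.
    correct = sum(question)
    return correct if rng.random() < 0.7 else correct + rng.choice([-1, 1])

def self_consistency(question, n=25, seed=0):
    # Spend more test-time compute: sample n answers independently
    # and return the majority vote.
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n))
    return votes.most_common(1)[0][0]

question = [2, 3, 4]  # toy task: add the numbers (true answer: 9)
print(self_consistency(question, n=25))
```

A single sample is wrong roughly 30% of the time, but the 25-sample majority vote recovers the correct sum with high probability - the same "more compute at inference time buys better answers" trade-off, in miniature.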

In particular, the section on fine-tuning referred to work I had done with local LLMs - initially using Llama3.1-8B, and then proving out that Gemma2-9B was substantially better at learning my sample task. This involved training over a dozen variations of the models/environment, with the highlights being given in the presentation.
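The slides carry the actual fine-tuning details; purely as background, here is a minimal NumPy sketch of the core LoRA idea that parameter-efficient fine-tuning relies on - freeze the pretrained weight W and learn only a rank-r update B·A (all dimensions here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight matrix (d_out x d_in) -- toy sizes
d_out, d_in, r = 8, 16, 2
W = rng.normal(size=(d_out, d_in))

# LoRA adapter: only A and B are trained.  B starts at zero, so the
# adapted layer initially computes exactly the same output as W alone.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))

def adapted_forward(x, alpha=16.0):
    # y = W x + (alpha / r) * B A x : the update to W has rank at most r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
assert np.allclose(adapted_forward(x), W @ x)  # B = 0, so no change yet

# The payoff: far fewer trainable parameters than full fine-tuning.
lora_params = r * (d_in + d_out)
full_params = d_in * d_out
print(lora_params, full_params)  # 48 vs 128 at these toy sizes
```

At real model scale the gap is far larger (millions of adapter parameters versus billions of frozen ones), which is what makes fine-tuning 8-9B models feasible on modest GPUs.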

Many thanks to Google for supporting the GCP usage for this project, which was part of their September 2024 #AISprint. My contribution there was titled: "Gemma Fine-tuning with ablations".

The slides for my talk, which contain links to all of the reference materials and sources, are here:

Presentation Screenshot

If there are any questions about the presentation please ask below, or contact me using the details given on the slides themselves.

Presentation Content Example

Other Presentations

We were also proud to host a talk by Gabriel Chua from one of Singapore's Government departments, who presented a weekend project he built (using Gemini, interfaced with GitHub Actions) to automatically generate and host daily summaries of AI papers.

In his talk "What's behind o1 and can it be replicated", Sam Witteveen talked about OpenAI's recent o1 model launch, and demonstrated two different ways in which he had built his own "poor man's" version, using Microsoft's Phi-3 as a building block within an agentic framework.

Acknowledgements

Many thanks to the Google team, who not only allowed us to use Google's Developer Space, but were also kind enough to provide pizza for the attendees!