I saw an interesting tweet from Dimitris Papailiopoulos on getting models to run their own training loop for an ML experiment end to end:
![[Pasted image 20260205111628.png]]
I thought I would try it myself.
This is interesting for a few reasons:
1. I wanted to see how far a model could go, especially how many different inputs it can take into account. Running ML experiments is a domain where you need to integrate information from many sources: not just the evolving loss curves, but also planning the different training cycles around the compute budget you have.
2. If agents can do this well (as well as the tweet suggests), it should mean there are other tasks we can automate end-to-end: for example, economics research or other experiments where the analysis gives you a signal you can use to improve the next iteration.
I let the model choose its own experiment. The only advice I gave was to choose something from the Dive Into Deep Learning book, as it's the DL resource I have the most experience with, and probably the most comprehensive online textbook available.
The experiment it chose was character-level language modeling on Shakespeare, ideal for this kind of hyperparameter search because one can set an obvious goal: hit a loss threshold under a given parameter budget.
The goal was to achieve validation loss < 1.8 with < 500K parameters.
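A budget this tight makes it worth estimating a model's parameter count before training anything. Here is a rough sketch of that arithmetic for a small decoder-only transformer; every config value below is an assumption for illustration (the actual architecture Claude used isn't shown), though this particular combination happens to land close to the ~430K baseline reported below.

```python
# Hypothetical sketch: estimate the parameter count of a small
# decoder-only transformer from its config, so candidate configs
# can be rejected before training if they blow the <500K budget.
# All config values are illustrative assumptions.

def transformer_param_count(vocab_size, d_model, n_layers, d_ff, block_size):
    embed = vocab_size * d_model           # token embedding table
    pos = block_size * d_model             # learned positional embedding
    per_layer = (
        4 * d_model * d_model + 4 * d_model    # q/k/v + output projections (with biases)
        + 2 * d_model * d_ff + d_model + d_ff  # feed-forward weights + biases
        + 4 * d_model                          # two LayerNorms (scale + shift)
    )
    head = d_model * vocab_size            # untied output head
    final_ln = 2 * d_model                 # final LayerNorm
    return embed + pos + n_layers * per_layer + head + final_ln

# Assumed config: Shakespeare has ~65 unique characters.
n = transformer_param_count(vocab_size=65, d_model=128,
                            n_layers=2, d_ff=512, block_size=128)
print(n, n < 500_000)  # → 429824 True
```

Counting this way also shows where the budget goes: at this scale the two transformer blocks dominate, while the embedding and head are cheap because the character vocabulary is tiny.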
Claude was hyped. Here it is describing how it will take the reins, choose the problem, and begin setting up the infra.
![[Pasted image 20260205111137.png]]
The harness was set up and first results for a baseline experiment came back:
> Experiment 1 Results:
> - Params: 430,785 ✓ (under 500K)
> - Val Loss: 2.19 (need < 1.8)
> - Gap: 0.39
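The check the harness performs on each run can be sketched in a few lines. This is a hypothetical stand-in for the actual harness, not Claude's code; only the thresholds and the baseline numbers come from above.

```python
# Minimal sketch of the success check the harness applies to each run.
# `evaluate_run` and its report format are hypothetical; the two
# thresholds are the stated goals (val loss < 1.8, params < 500K).

TARGET_VAL_LOSS = 1.8
PARAM_BUDGET = 500_000

def evaluate_run(param_count: int, val_loss: float) -> dict:
    """Compare one run's results against both success criteria."""
    return {
        "params_ok": param_count < PARAM_BUDGET,
        "loss_ok": val_loss < TARGET_VAL_LOSS,
        "gap": round(max(0.0, val_loss - TARGET_VAL_LOSS), 2),
    }

# Experiment 1's numbers: under budget, but a 0.39 loss gap remains.
report = evaluate_run(430_785, 2.19)
print(report)  # → {'params_ok': True, 'loss_ok': False, 'gap': 0.39}
```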
In response to the baseline results, the model came up with a strategy to improve them:
![[Pasted image 20260205112232.png]]
> Experiment 2 Strategy:
> 1. Double training steps (4000)
> 2. Reduce dropout (0.05)
> 3. Increase block_size (256 for more context)
> 4. Higher peak LR (5e-4)
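The strategy above can be expressed as a config delta over the baseline. The Experiment 2 values come from the list; the baseline values are assumptions (the strategy only says steps were "doubled", implying 2000 before).

```python
# Hedged sketch of Experiment 2 as a config delta over Experiment 1.
# Baseline values are assumed for illustration; only the Experiment 2
# values are taken from the strategy above.

baseline = {
    "max_steps": 2000,   # assumed: strategy says "double" to 4000
    "dropout": 0.1,      # assumed baseline
    "block_size": 128,   # assumed baseline context length
    "peak_lr": 3e-4,     # assumed baseline peak learning rate
}

experiment_2 = {
    **baseline,
    "max_steps": 4000,   # 1. double training steps
    "dropout": 0.05,     # 2. reduce dropout
    "block_size": 256,   # 3. more context per example
    "peak_lr": 5e-4,     # 4. higher peak LR
}

# Diff the two configs to see exactly what changed between runs.
changed = {k: (baseline[k], v) for k, v in experiment_2.items()
           if baseline[k] != v}
print(changed)
```

Keeping each experiment as an explicit config dict like this makes the diff between runs trivial to log, which matters when the model itself is the one deciding what to change next.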