Discussion about this post

Jason Benn

Thank you for the fascinating writeup!

Tim Hughes

I was also intrigued by your thoughts about deduction. In particular, that LLMs are clearly mimicking deduction (quite successfully), that CoT is a way to prompt the LLM to mimic deduction, but that *real* deductive functionality will require another approach.

But as you say (and show), very good mimicry of deduction can get us a long way. So, while we wait for true deductive functionality, I wonder if an LLM's deductive abilities could be greatly enhanced by fine-tuning it on vast amounts of deductive reasoning (both correct and incorrect). We could prompt an LLM to write code (general principles) to carry out a transformation of an input_1; it would not actually matter whether the generated code correctly implements the requested transformation. We would then run the code on input_1 and record the output as the correct output (an example of correct deduction: applying the code's general principles to a specific input to deduce a specific output). We could also modify this output and record the modified versions as incorrect outputs for that program and input_1 (examples of erroneous deduction). We can repeat this for a whole set of inputs: input_2, input_3, etc.
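
A minimal Python sketch of what this generation step could look like. The grid representation, the `transform` function standing in for LLM-generated code, and the single-cell perturbation scheme are all illustrative assumptions, not anything prescribed above:

```python
# Sketch of generating correct and incorrect "deduction" examples from one program.
import copy
import random

def transform(grid):
    # Placeholder for an LLM-generated transformation ("general principles").
    # Whether it matches the originally requested transformation is irrelevant;
    # it only needs to be deterministic so its output defines "correct".
    return [row[::-1] for row in grid]  # e.g. mirror each row

def make_examples(program, inputs, n_wrong=3):
    """For each input, record the program's output as the correct deduction,
    plus a few perturbed copies labelled as erroneous deductions."""
    examples = []
    for x in inputs:
        correct = program(x)
        examples.append({"input": x, "output": correct, "label": "correct"})
        for _ in range(n_wrong):
            wrong = copy.deepcopy(correct)
            # Corrupt one cell at random so the output no longer follows from the program
            # (cells assumed to be ARC-style digits 0-9).
            i = random.randrange(len(wrong))
            j = random.randrange(len(wrong[i]))
            wrong[i][j] = (wrong[i][j] + 1) % 10
            examples.append({"input": x, "output": wrong, "label": "incorrect"})
    return examples

inputs = [[[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 0]]]
dataset = make_examples(transform, inputs)
```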

The above can be repeated for a vast number of transformations (not limited to those relevant for ARC). The resulting dataset of programs and their associated sets of correct and incorrect input-output pairs could then be used to fine-tune an LLM into a top-notch deductive reasoning mimic. I would have thought that creating this fine-tuning dataset would be relatively straightforward. It would be interesting to see whether it materially improves LLM performance on ARC or other tests requiring deductive reasoning.
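
Continuing the sketch above, the generated examples could be serialized into supervised fine-tuning records. The JSONL prompt/completion layout and the `to_finetune_records` helper are hypothetical choices, not a format implied by the comment:

```python
# Sketch of turning (program, input, candidate output, label) triples into
# fine-tuning records for a "deduction verifier" style objective.
import inspect
import json

def to_finetune_records(program, examples, path="deduction_examples.jsonl"):
    """Write one labelled record per candidate output, showing the model the
    program source, the input, and the candidate output to judge."""
    source = inspect.getsource(program)  # the "general principles" shown to the model
    with open(path, "a") as f:
        for ex in examples:
            record = {
                "prompt": (
                    "Apply the following program to the input and say whether "
                    f"the candidate output is correct.\n\n{source}\n"
                    f"Input: {ex['input']}\nCandidate output: {ex['output']}"
                ),
                "completion": ex["label"],
            }
            f.write(json.dumps(record) + "\n")

# Reusing `transform` and `dataset` from the previous sketch.
to_finetune_records(transform, dataset)
```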

