These notes are my attempt to distill and present materials in an on-policy manner (see this and this). I also try to use different mediums to communicate my research.
- Learning by teaching - RL
- Multi-Armed Bandits #reinforcement-learning
- Tản mạn (Ramblings) #daily
- Just know stuff #research
- Thinking in Language Models - The mechanistic questions
- Scaling compute #reasoning #LLM
- Learning to search #reasoning #LLM
- Optimization in Deep Learning: From convexity to invexity
- Learning as optimization #math
- Beyond convexity #math