Emotion Concepts and their Function in a Large Language Model
transformer-circuits.pub
Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We fin… [+318427 chars]