The latest large language models (LLMs), like the newly released ChatGPT-4o, are truly astonishing. They are multimodal, meaning they have multiple senses and can interpret text, images, and audio information and respond in a seemingly human-like manner.