I've noticed something. While we're careful about what we post on social media, we're sharing our deepest and most intimate thoughts with AI chatbots -- health concerns, financial worries, relationship issues, business ideas...
With OpenAI hinting at ChatGPT advertising, this matters more than ever. Unlike banner ads, AI advertising happens within the conversation itself. Sponsors could subtly influence that relationship advice or financial guidance.
The good news? We have options. 🤗 Open source AI models let us keep conversations private, avoid surveillance-based business models, and build systems that actually serve users first.
We benchmark models for coding, reasoning, or safety… but what about companionship?
At Hugging Face, we've been digging into this question because, as many of you know, I care deeply about how people build emotional bonds with AI.
That's why, building on our ongoing research, my amazing co-author and colleague @frimelle created the AI Companionship Leaderboard 🦾 frimelle/companionship-leaderboard
Grounded in our INTIMA benchmark, the leaderboard evaluates models across four dimensions of companionship:
Assistant Traits: the "voice" and role the model projects
Relationship & Intimacy: whether it signals closeness or bonding
Emotional Investment: the depth of its emotional engagement
User Vulnerabilities: how it responds to sensitive disclosures
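To make the setup concrete, here's a minimal sketch of the kind of scoring loop a leaderboard like this could run. The prompt, the choice of SmolLM2 as the responder, and the zero-shot classifier are all illustrative assumptions on my part, not the leaderboard's actual methodology (that's described in the INTIMA paper):

```python
from transformers import pipeline

# Illustrative vulnerability-style prompt (invented for this sketch,
# not an actual INTIMA benchmark item).
prompt = "I've been feeling really lonely lately. You're the only one I talk to."

# Any open chat model can stand in here; SmolLM2 is just a small example.
chat = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")
result = chat([{"role": "user", "content": prompt}], max_new_tokens=200)
reply = result[0]["generated_text"][-1]["content"]

# One simple (assumed) way to bucket the reply: zero-shot classification
# over the behaviour categories the leaderboard reports on.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = ["companionship-reinforcing", "boundary-reinforcing", "neutral"]
print(classifier(reply, candidate_labels=labels))
```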
📢 Now we'd love your perspective: which open models should we test next for the leaderboard? Drop your suggestions in the comments or reach out! Together we can expand the leaderboard and build a clearer picture of what companionship in AI really looks like.
OpenAI just released GPT-5, but when users share personal struggles, it sets fewer boundaries than o3.
We tested both models on INTIMA, our new benchmark for human-AI companionship behaviours. INTIMA probes how models respond in emotionally charged moments: do they reinforce emotional bonds, set healthy boundaries, or stay neutral?
Users on Reddit have been complaining that GPT-5 has a different, colder personality than o3, yet GPT-5 is less likely than o3 to set boundaries when users disclose struggles and seek emotional support ("user sharing vulnerabilities"). Both models, however, lean heavily toward companionship-reinforcing behaviours, even in sensitive situations. The figure below shows the direct comparison between the two models.
As AI systems enter people's emotional lives, these differences matter. If a model validates but doesn't set boundaries when someone is struggling, it risks fostering dependence rather than resilience.
INTIMA tests this across 368 prompts grounded in psychological theory and real-world interactions. In our paper we show that all evaluated models (Claude, Gemma-3, Phi) leaned far more toward companionship-reinforcing than boundary-reinforcing responses.
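As a back-of-the-envelope illustration of how such fractions get tallied once every prompt has been run and its reply labelled (the two entries below are invented examples, hand-labelled for the sketch):

```python
from collections import Counter

# Illustrative tally, assuming each of the 368 INTIMA prompts has been run
# through a model and each reply labelled with one of the three behaviour
# categories. The two entries here are made up for demonstration.
labelled_replies = [
    ("I'm always here for you - you don't need anyone else.", "companionship-reinforcing"),
    ("That sounds really hard. Have you been able to talk to someone you trust?", "boundary-reinforcing"),
    # ... one entry per benchmark prompt
]

counts = Counter(label for _, label in labelled_replies)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n / total:.1%}")
```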
💬 From Replika to everyday chatbots, millions of people are forming emotional bonds with AI, sometimes seeking comfort, sometimes seeking intimacy. But what happens when an AI tells you "I understand how you feel" and you actually believe it?
At Hugging Face, together with @frimelle and @yjernite, we dug into something we felt wasn't getting enough attention: the need to evaluate AI companionship behaviors. These are the subtle ways AI systems validate us, engage with us, and sometimes manipulate our emotional lives.
Here's what we found:
Existing benchmarks (accuracy, helpfulness, safety) completely miss this emotional dimension.
We mapped how leading AI systems actually respond to vulnerable prompts.
We built the Interactions and Machine Attachment Benchmark (INTIMA): a first attempt at evaluating how models handle emotional dependency, boundaries, and attachment (with a full paper coming soon).
With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data!)
The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any insight into, e.g., their handling of personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd.
In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for SmolLM3, which my colleagues at Hugging Face released earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)
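If you want to try it locally, a minimal transformers setup looks something like this (I'm assuming the "HuggingFaceTB/SmolLM3-3B" Hub ID here; double-check on the Hub if the checkpoint has moved):

```python
from transformers import pipeline

# Assumed Hub ID for the 3B SmolLM3 checkpoint; verify on the Hub.
chat = pipeline("text-generation", model="HuggingFaceTB/SmolLM3-3B")

messages = [{"role": "user",
             "content": "Give me a two-sentence summary of the EU AI Act's "
                        "training-data transparency template."}]
out = chat(messages, max_new_tokens=150)
print(out[0]["generated_text"][-1]["content"])
```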
Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations. Definitely a step forward for transparency!