
meg posted an update 5 days ago
🤖 👾 Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about the US <—> China relationship in AI, and what it means for AI ethics.
yjernite posted an update 8 days ago
First GPAI Model with EU Data Transparency Template? 🇪🇺

With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)

The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀


In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for SmolLM3 - which my colleagues at Hugging Face released earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)

Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency 🔍

To learn more, have a look at:

- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled-out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency
yjernite posted an update about 2 months ago
meg posted an update 4 months ago
yjernite posted an update 4 months ago
Today in Privacy & AI Tooling - introducing a nifty new tool to examine where data goes in open-source apps on 🤗

HF Spaces have tons (100Ks!) of cool demos leveraging or examining AI systems - and because most of them are OSS we can see exactly how they handle user data 📚🔍

That requires actually reading the code though, which isn't always easy or quick! Good news: code LMs have gotten pretty good at automatic review, so we can offload some of the work - here I'm using Qwen/Qwen2.5-Coder-32B-Instruct to generate reports and it works pretty OK 🙌

The app works in four stages:
1. Download all code files
2. Use the Code LM to generate a detailed report pointing to code where data is transferred/(AI-)processed (screen 1)
3. Summarize the app's main functionality and data journeys (screen 2)
4. Build a Privacy TLDR with those inputs
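The four stages above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the Space's actual code: `review_fn` and `summarize_fn` are hypothetical stand-ins for calls to a code LM such as Qwen2.5-Coder-32B-Instruct, and stage 1 (file download) is assumed to have already produced a `{filename: source}` dict.

```python
# Minimal sketch of the four-stage privacy-review pipeline.
# `review_fn` and `summarize_fn` are hypothetical (prompt: str) -> str
# callables standing in for code-LM calls; they are NOT the real app's API.

def build_privacy_report(code_files, review_fn, summarize_fn):
    """code_files: {filename: source}, as produced by stage 1 (download)."""
    # Stage 2: per-file report pointing to code where data is
    # transferred or (AI-)processed
    file_reports = {
        name: review_fn(f"Review data flows in {name}:\n{src}")
        for name, src in code_files.items()
    }
    # Stage 3: summarize the app's main functionality and data journeys
    summary = summarize_fn("\n\n".join(file_reports.values()))
    # Stage 4: condense those inputs into a short Privacy TLDR
    tldr = summarize_fn(f"Write a privacy TLDR from:\n{summary}")
    return {"file_reports": file_reports, "summary": summary, "tldr": tldr}
```

Swapping the stub callables for real LM calls (and the dict for downloaded Space files) recovers the shape of the app's flow.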

It comes with a bunch of pre-reviewed apps/Spaces - great to see how many process data locally or through (private) HF endpoints 🤗

Note that this is a POC, lots of exciting work to do to make it more robust, so:
- try it: yjernite/space-privacy
- reach out to collab: yjernite/space-privacy
meg posted an update 7 months ago
💫...And we're live!💫 Seasonal newsletter from ethicsy folks at Hugging Face, exploring the ethics of "AI Agents"
https://huggingface.co/blog/ethics-soc-7
Our analyses found:
- There's a spectrum of "agent"-ness
- *Safety* is a key issue, leading to many other value-based concerns
Read for details & what to do next!
With @evijit , @giadap , and @sasha
yjernite posted an update 7 months ago
🤗👤 💻 Speaking of AI agents ...
...Is easier with the right words ;)

My colleagues @meg @evijit @sasha and @giadap just published a wonderful blog post outlining some of the main relevant notions with their signature blend of value-informed analysis and risk-benefit contrasts. Go have a read!

https://huggingface.co/blog/ethics-soc-7
yjernite posted an update 8 months ago
🇪🇺 Policy Thoughts in the EU AI Act Implementation 🇪🇺

There is a lot to like in the first draft of the EU GPAI Code of Practice, especially as regards transparency requirements. The Systemic Risks part, on the other hand, is concerning for both smaller developers and for external stakeholders.

I wrote more on this topic ahead of the next draft. TLDR: more attention to immediate large-scale risks and to collaborative solutions supported by evidence can help everyone - as long as developers disclose sufficient information about their design choices and deployment contexts.

Full blog here, based on our submitted response with @frimelle and @brunatrevelin :

https://huggingface.co/blog/yjernite/eu-draft-cop-risks#on-the-proposed-taxonomy-of-systemic-risks
christopher posted an update 8 months ago
The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places) - and here they all are on a plot
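For scale, scattering 104.5 million individual markers isn't practical; a common trick is to bin the (longitude, latitude) pairs into a density grid first and render that as an image. A minimal sketch with synthetic coordinates (the real ones would come from the foursquare/fsq-os-places dataset):

```python
import numpy as np

# Synthetic stand-in coordinates; in practice these columns would be
# loaded from the foursquare/fsq-os-places dataset.
rng = np.random.default_rng(0)
lon = rng.uniform(-180, 180, 100_000)
lat = rng.uniform(-90, 90, 100_000)

# Bin points into a 1-degree grid: one cell per (lon, lat) degree.
density, _, _ = np.histogram2d(
    lon, lat, bins=(360, 180), range=[[-180, 180], [-90, 90]]
)
# `density` can then be rendered with e.g. matplotlib's imshow
# (often on a log scale, since point counts span orders of magnitude).
```

Binning keeps memory and draw time constant in the grid size rather than the point count, which is what makes a 100M-point "plot" feasible.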
christopher posted an update 8 months ago
christopher posted an update 11 months ago
4 million chess puzzles
yjernite posted an update over 1 year ago
👷🏽‍♀️📚🔨 Announcing the Foundation Model Development Cheatsheet!

My first 🤗Post🤗 ever to announce the release of a fantastic collaborative resource to support model developers across the full development stack: The FM Development Cheatsheet, available here: https://fmcheatsheet.org/

The cheatsheet is a growing database of the many crucial resources coming from open research and development efforts to support the responsible development of models. This new resource highlights essential yet often underutilized tools to make it as easy as possible for developers to adopt best practices, covering among other aspects:
🧑🏼‍🤝‍🧑🏼 data selection, curation, and governance;
📖 accurate and limitations-aware documentation;
⚡ energy efficiency throughout the training phase;
📊 thorough capability assessments and risk evaluations;
🌍 environmentally and socially conscious deployment strategies.

We strongly encourage developers working on creating and improving models to make full use of the tools listed here, and to help keep the resource up to date by adding the resources that you yourself have developed or found useful in your own practice 🤗

Congrats to all the participants in this effort for the release! Read more about it from:
@Shayne - https://twitter.com/ShayneRedford/status/1763215814860186005
@hails and @stellaathena - https://blog.eleuther.ai/fm-dev-cheatsheet/
@alon-albalak - http://nlp.cs.ucsb.edu/blog/a-new-guide-for-the-responsible-development-of-foundation-models.html

And also to @gabrielilharco @sayashk @kklyman @kylel @mbrauh @fauxneticien @avi-skowron @Bertievidgen Laura Weidinger, Arvind Narayanan, @VictorSanh @Davlan @percyliang Rishi Bommasani, @breakend @sasha 🔥