AI & ML interests
None defined yet.
Recent Activity
Add Bookshop (~200M words)
#99 opened 17 days ago by KennethEnevoldsen (1 comment)

wiki-comments
#101 opened 13 days ago by robvanderg (1 comment)

Wikipedia Comments
#78 opened about 2 months ago by robvanderg (9 comments)

Add instruct variant on Danish datasets (~1M words)
#40 opened 8 months ago by KennethEnevoldsen (2 comments)

[for newcomers] Overview issues: Adding or updating datasets
#100 opened 16 days ago by KennethEnevoldsen

Add Common Crawl with permissive licenses
#80 opened about 1 month ago by KennethEnevoldsen

Add Danske Aviser (waiting)
#66 opened 3 months ago by perdalum (3 comments)

Add domain distribution plot
#55 opened 6 months ago by KennethEnevoldsen (1 comment)

Add hvadvilduhelst from hyg.dk (~16k words)
#64 opened 3 months ago by KennethEnevoldsen

Check public data from snakmodel-pretraining-data-v0.1
#49 opened 6 months ago by KennethEnevoldsen (2 comments)

Add CoRAL (~1-3M tokens)
#32 opened 8 months ago by KennethEnevoldsen

Add dataset quality metrics and default filtering
#39 opened 8 months ago by KennethEnevoldsen

Add reference to license for datasets
#36 opened 8 months ago by KennethEnevoldsen

Add a section on dataset quality to new and existing datasets
#34 opened 8 months ago by KennethEnevoldsen

Allow for audio and image content
#56 opened 5 months ago by KennethEnevoldsen

Update dataset names to prettier versions
#19 opened 9 months ago by KennethEnevoldsen (1 comment)

Add Danish GitHub repositories (~10-500M tokens)
#31 opened 8 months ago by KennethEnevoldsen (2 comments)

Add a test to check for duplicates and remove existing duplicates
#25 opened 9 months ago by KennethEnevoldsen (2 comments)
