News

Never mind why the code is like this and why I need it to be this way, but it's just the way it is. I currently have a dataset with data from a database. In the C# code I transform this data ...
Language models are powerful, but their training data is largely secret. AI2 aims to change this with a new dataset that's free and open.
Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools ...
Open Molecules 2025, an unprecedented dataset of molecular simulations, has been released to the scientific community, paving the way for the development of machine learning tools that can ...
Dataset Search allows searchers to find datasets on many topics across “environmental and social sciences, as well as data from other disciplines including government data and data provided by ...
LAION-5B, a dataset used by Stable Diffusion, contained thousands of links to child sexual abuse material that could influence AI-image generation.