Datasets, also called "data sets," are structured groups of raw data, statistics, and information compiled during and after a research study. They are often presented in spreadsheets or charts. While there is movement toward open data, not all agencies are there yet. Many governmental agencies and non-profit organizations around the world offer their data freely, while most for-profit companies charge a fee for access.
Depending on the topic you're researching, there are many different sources for datasets. For example, if you are looking for information on the United States population and demographics, the U.S. Census Bureau would be a great starting place. For data on public opinion on modern social issues, such as politics, the media, and technology, you might start by searching the Pew Research Center. Looking for engineering and science data? Try Google Dataset Search, which searches thousands of data repositories around the world and locates the metadata of millions of datasets where they are hosted.
ApolloScape - Includes datasets covering scene parsing, car instance, lane segmentation, detection/tracking, trajectory and more.
Audi Autonomous Driving Dataset - 2.3 TB of data including 2D semantic segmentation, 3D point clouds, 3D bounding boxes, and vehicle bus data.
Argoverse - Two public datasets supported by highly detailed maps to test, experiment, and teach self-driving vehicles how to understand the world around them.
Berkeley DeepDrive - diverse dataset for autonomous driving from UC Berkeley. Also called BDD100K.
Cityscapes Dataset - semantic, instance-wise dense pixel annotations of 30 classes.
Comma2k19 - a dataset of over 33 hours of commuting on California's Highway 280.
Google Landmark Dataset V2 - 5 million images depicting human-made and natural landmarks spanning 200 thousand classes.
Kitti-360 - a large-scale dataset with 3D and 2D annotations.
Leddar PixSet - full-waveform flash LiDAR dataset for autonomous vehicle R&D.
Level 5 - large collection of 3D annotations, LiDAR point clouds, traffic agent movement, and semantic map annotations.
nuScenes - large-scale public dataset for autonomous driving using the full sensor suite of a real self-driving car.
Oxford Radar Robotcar Dataset - this dataset captures many different combinations of weather, traffic and pedestrians, along with construction and roadwork.
PandaSet - open-source AV dataset combining Hesai’s best-in-class LiDAR sensors with Scale AI’s high-quality data annotation.
Udacity Self Driving Car Dataset - this dataset contains 97,942 labels across 11 classes and 15,000 images.
Waymo Open Dataset - motion dataset comprising object trajectories and corresponding 3D maps for 103,354 segments.
Data.gov - 321,000+ datasets from U.S. governmental agencies.
Google Dataset Search - millions of datasets on a wide variety of topics.
Kaggle - open data machine learning and scientific repository of 17,000+ datasets.
Kettering University Common Datasets - contains enrollment data, graduation rates, expenses, and much more.
Statista Statistics Portal - datasets on a wide variety of sciences and social sciences.
CERN Open Data - over 2 petabytes of particle physics open data.
EarthChem Library - open chemistry and earth science datasets.
EarthData - NASA's free and open Earth science data is interactive, interoperable, and accessible for research and societal benefit both today and tomorrow.
Figshare - data repository where research outputs are available in a citable, shareable and discoverable manner.
Global Health Observatory - the WHO's gateway to health-related statistics for more than 1000 indicators for its 194 Member States.
Harvard Dataverse - topics include math, science, engineering, business, social sciences and more.
Mendeley Data - over 29 million searchable datasets.
OSF - find projects, data, materials, and collaborators on OSF that might be helpful to your own research.
Statista Statistics Portal - datasets on a wide variety of sciences and social sciences.
UCI Machine Learning Repository - collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms.
FBI Crime Data Explorer - datasets covering topics such as hate crime statistics, human trafficking, assaults on officers, arrest data, and crime by territory.
ICPSR - world's largest social and behavioral science data archive. 250,000 data files available.
Mendeley Data - over 29 million searchable datasets.
Open Data Flint - Flint-based data gathered from academic institutions, local organizations, and federal agencies to encourage a healthier and more informed community.
Data.Gov.UK - over 17,000 datasets from the government of the United Kingdom.
DataHub - thousands of datasets from financial market data and population growth to cryptocurrency prices.
International Monetary Fund - datasets on finance, economic outlook, trade, consumer price indices, and more.
Open Data Canada - vast array of subjects, including science, health, technology, labor, and transport.
UN Data - data from around the world, including social, economic, trade, education, and health indicators.
World Bank Open Data - financial and population data for countries around the world.