Library Resources Guide For Kettering University Online: Datasets

What Are Datasets?

Datasets, also called "data sets," are structured collections of raw data, statistics, and information compiled during and after a research study. They are often presented in spreadsheets or charts. While there is movement toward open data, not all organizations have adopted it yet. Many governmental agencies and non-profit organizations around the world offer their data freely, while most for-profit companies charge a fee for access.

Depending on the topic you're researching, there are many different sources for datasets. For example, if you are looking for information on the United States population and demographics, the U.S. Census Bureau would be a great starting place. For data on public opinion on modern social issues, such as politics, the media, and technology, you might start by searching the Pew Research Center. Looking for engineering and science data? Try Google Dataset Search, which searches thousands of data repositories around the world and locates the metadata of millions of datasets where they are hosted.
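Because many datasets are distributed as spreadsheets or CSV files, it can help to see what working with one looks like. The following is a minimal Python sketch, not a recommended workflow for any particular source. The sample values, the column names (`region`, `year`, `population`), and the helper functions `load_rows` and `mean_by_column` are all hypothetical and invented for illustration; they do not come from any dataset listed in this guide.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data, invented for illustration only.
# A real dataset would normally be downloaded as a .csv file instead.
SAMPLE_CSV = """region,year,population
North,2020,1200
North,2021,1250
South,2020,900
South,2021,950
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_by_column(rows, group_col, value_col):
    """Return the mean of value_col for each distinct value of group_col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return {key: mean(values) for key, values in groups.items()}


rows = load_rows(SAMPLE_CSV)
averages = mean_by_column(rows, "region", "population")
print(averages)
```

To try this with a downloaded file, you could replace `io.StringIO(text)` with a file opened via `open(path, newline="")`, assuming the file is plain CSV with a header row.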

Popular Datasets

  • ApolloScape - Includes datasets covering scene parsing, car instance, lane segmentation, detection/tracking, trajectory and more.

  • Audi Autonomous Driving Dataset - 2.3 TB of data including 2D semantic segmentation, 3D point clouds, 3D bounding boxes, and vehicle bus data.

  • Argoverse - Two public datasets supported by highly detailed maps to test, experiment, and teach self-driving vehicles how to understand the world around them.

  • Berkeley DeepDrive - Diverse dataset for autonomous driving from UC Berkeley. Also called BDD100K.

  • Bosch Small Traffic Lights - An accurate dataset for vision-based traffic light detection.

  • Boxy - Vehicle detection set with two million annotated vehicles for evaluating object detection methods for self-driving cars on freeways.

  • Brain4Cars - Cabin sensing, sensory-fusion driver maneuver anticipation. 

  • Caltech Vision Lab - Datasets including faces, pedestrian data, camera traps, motorcycles, and occluded faces.

  • CamVid (Cambridge-driving Labeled Video Database) - A collection of videos with object class semantic labels, complete with metadata. 

  • Cityscapes Dataset - Semantic, instance-wise dense pixel annotations of 30 classes. 

  • Comma2k19 - A dataset of over 33 hours of commute driving on California's Highway 280.

  • CULane Dataset - CULane is a large-scale dataset for academic research on traffic lane detection.

  • DAVIS Driving Dataset 2017 - Car data such as steering, throttle, and GPS to evaluate the fusion of frame and event data for driving apps.

  • DIPLECS Autonomous Driving Datasets 2015 - Three datasets recording steering information in different cars and environments.

  • DR(eye)VE - Dataset of gaze fixations and their temporal integration providing task-specific saliency maps for attention-tracking in AV.

  • EISATS - Sets of image sequences for comparative performance evaluation of stereo vision, optic flow, motion analysis, or further techniques in computer vision.

  • Elektra - Includes multi-modal stereo, day-night pedestrian sequences, optic flow, and semantic segmentation datasets.

  • Ford/PERL - Collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck in Dearborn, MI.

  • Google Landmark Dataset V2 - 5 million images depicting human-made and natural landmarks spanning 200 thousand classes.

  • HCI Challenging data - Outdoor (weather, day/night, city/country, varied motions and depths) data for research in computer vision.

  • HD1K - An autonomous driving dataset and benchmark for optical flow, including varied weather, lens flare, raindrops, and traffic.

  • JAAD - For studying joint attention. Focus is on pedestrian and driver behaviors at the point of crossing and factors that influence them.

  • KAIST - Multi-spectral dataset that covers a greater range of drivable regions, from urban to residential, for autonomous systems.

  • Kitti-360 - Large-scale dataset with 3D and 2D annotations.

  • Leddar PixSet - Full-Waveform flash LiDAR dataset for autonomous vehicle R&D.

  • Level 5 - Large collection of 3D annotation, lidar point clouds, traffic agent movement, and semantic map annotations.

  • Malaga Stereo and Laser Urban - Urban driving scenarios, with high-resolution stereo images grabbed at a high rate (20fps). 

  • Mapillary - Free account required to access diverse street-level imagery with pixel‑accurate and instance‑specific human annotations.

  • nuScenes - Large-scale public dataset for autonomous driving using the full sensor suite of a real self-driving car. 

  • Oxford Radar Robotcar Dataset - Captures many different combinations of weather, traffic, pedestrians, construction, and roadwork.

  • PandaSet - Open-source AV dataset combining Hesai’s best-in-class LiDAR sensors with Scale AI’s high-quality data annotation.

  • Stanford Track Collection - Contains 14,000 labeled tracks of objects as observed in natural street scenes by a Velodyne S2 LIDAR.

  • TME Motorway - Selection includes variable traffic situations, number of lanes, road curvature, and lighting.

  • Udacity Self Driving Car Dataset - This dataset contains 97,942 labels across 11 classes and 15,000 images.

  • Unsupervised LLAMAS - Lane-marker dataset with annotations and lane approximations using LiDAR mapping.

  • Waymo Open Dataset - motion dataset comprising object trajectories and corresponding 3D maps for 103,354 segments.

  • CERN Open Data - over 2 petabytes of particle physics open data. 

  • EarthChem Library - open chemistry and earth science datasets.

  • EarthData - NASA's free and open Earth science data is interactive, interoperable, and accessible for research and societal benefit both today and tomorrow.

  • Figshare - data repository where research outputs are available in a citable, shareable, and discoverable manner.

  • Global Health Observatory - the WHO's gateway to health-related statistics for more than 1000 indicators for its 194 Member States.

  • Harvard Dataverse - topics include math, science, engineering, business, social sciences, and more.

  • Mendeley Data - over 29 million searchable datasets.

  • OpenFEMA - open datasets from the Federal Emergency Management Agency on a variety of topics.

  • OSF - find projects, data, materials, and collaborators on OSF that might be helpful to your own research.

  • Registry of Open Data - explore the catalog to find open, free, and commercial data sets.

  • Statista Statistics Portal - datasets on a wide variety of sciences and social sciences.

  • UCI Machine Learning Repository - collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms.

  • USGS Mineral Resources Online Spatial Data - interactive maps and downloadable data for regional and global analysis.

  • FBI Crime Data Explorer - datasets covering topics such as hate crime statistics, human trafficking, assaults on officers, arrest data, and crime by territory.

  • GitHub AwesomeData Social Sciences

  • Google Public Data - publicly available data and datasets on a wide array of topics.

  • ICPSR - world's largest social and behavioral science data archive. 250,000 data files available.

  • Open Data Flint - Flint-based data gathered from academic institutions, local organizations, and federal agencies to encourage a healthier and informed community.

  • - Business and economy datasets from the United States.

  • Data.Gov.UK - Search over 17,000 datasets from the government of the United Kingdom.

  • DataHub - Thousands of datasets from financial market data and population growth to cryptocurrency prices.

  • International Monetary Fund - Datasets on finance, economic outlook, trade, consumer price indices, and more.

  • Open Data Canada - Vast array of subjects, including science, health, technology, labor, and transport.

  • UNCTADstat - United Nations Conference on Trade and Development offers analytical groupings, with a unique coverage for countries and products and a particular focus on developing economies. 

  • UN Data - Data from around the world, including social, economic, trade, education, and health indicators.

  • World Bank Open Data - Financial and population data for countries around the world.
