Datasets: Home

Contact The Library

 IN PERSON

  • Librarians are available in person at our service desks on the third and fourth floors of the Learning Commons 

  • In-person hours are:

  • Monday-Thursday 10am - 6:30pm

  • Friday 10am - 5pm

  • Sunday 10am - 6:30pm

 CHAT

  • To chat with KU Librarians, please see the browser sidebar on the library website

 EMAIL

 TEXT

  • KU Library is available by text at 810-255-9009
  • KU Archives is available by text at 810-255-0022

 PHONE

  • For general inquiries and research assistance, please call a Librarian at 810-762-9598
  • For information regarding materials (books/articles), your library account or reserves, please call Access Staff at 810-762-7814 
  • For historical questions, please call the Archivist at 810-762-9690

What Are Datasets?

Datasets, also called "data sets," are structured groups of raw data, statistics, and information compiled during and after a research study. They are often presented in spreadsheets or charts. While there is movement toward open data, not all agencies are there yet. Many governmental agencies and non-profit organizations around the world offer their data freely, while most for-profit companies charge a fee for access.

Depending on the topic you're researching, there are many different sources for datasets. For example, if you are looking for information on the United States population and demographics, the U.S. Census Bureau would be a great starting place. For data on public opinion on modern social issues, such as politics, the media, and technology, you might start by searching the Pew Research Center. Looking for engineering and sciences? Try Google Dataset Search. Google searches thousands of data repositories around the world, locating the metadata of millions of datasets where they are hosted.
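
As a rough illustration of what "structured" means here, the sketch below reads a small spreadsheet-style dataset with Python's built-in `csv` module and totals one column by year. The city names, years, and population figures are made up for this example and do not come from any of the sources on this page. The helper names `load_rows` and `total_by_year` are also just illustrative.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
# A real dataset would usually be downloaded as a .csv file instead.
SAMPLE_CSV = """city,year,population
Springfield,2010,1000
Springfield,2020,1200
Shelbyville,2010,800
Shelbyville,2020,900
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def total_by_year(rows):
    """Sum the population column for each year."""
    totals = {}
    for row in rows:
        year = int(row["year"])
        totals[year] = totals.get(year, 0) + int(row["population"])
    return totals


rows = load_rows(SAMPLE_CSV)
totals = total_by_year(rows)
print(totals)
```

To try this with a downloaded file, you could pass the file's contents to `load_rows` in place of `SAMPLE_CSV`, provided the file has matching column names.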

Popular Datasets

  • ApolloScape - Includes datasets covering scene parsing, car instance, lane segmentation, detection/tracking, trajectory and more.

  • Audi Autonomous Driving Dataset - 2.3 TB of data including 2D semantic segmentation, 3D point clouds, 3D bounding boxes, and vehicle bus data.

  • Argoverse - Two public datasets supported by highly detailed maps to test, experiment, and teach self-driving vehicles how to understand the world around them.

  • Berkeley DeepDrive - Diverse dataset for autonomous driving from UC Berkeley. Also called BDD100K.

  • Bosch Small Traffic Lights - Bosch Small Traffic Lights Dataset, an accurate dataset for vision-based traffic light detection.

  • Boxy - Vehicle detection set with two million annotated vehicles for evaluating object detection methods for self-driving cars on freeways.

  • Brain4Cars - Cabin sensing, sensory-fusion driver maneuver anticipation. 

  • Caltech Vision Lab - Datasets including faces, pedestrian data, camera traps, motorcycles, and occluded faces.

  • CamVid (Cambridge-driving Labeled Video Database) - A collection of videos with object class semantic labels, complete with metadata. 

  • Cityscapes Dataset - Semantic, instance-wise dense pixel annotations of 30 classes. 

  • Comma2k19 - A dataset of over 33 hours of commuting on California's Highway 280.

  • CULane Dataset - CULane is a large-scale dataset for academic research on traffic lane detection.

  • DAVIS Driving Dataset 2017 - Car data such as steering, throttle, and GPS to evaluate the fusion of frame and event data for driving apps.

  • DIPLECS Autonomous Driving Datasets 2015 - Three datasets recording steering information in different cars and environments.

  • DR(eye)VE - Dataset of gaze fixations and their temporal integration providing task-specific saliency maps for attention-tracking in AV.

  • EISATS - Sets of image sequences for comparative performance evaluation of stereo vision, optic flow, motion analysis, or further techniques in computer vision.

  • Elektra - Includes multi-modal stereo, day-night pedestrian sequences, optic flow, and semantic segmentation datasets.

  • Ford/PERL - Collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck in Dearborn, MI.

  • Google Landmark Dataset V2 - 5 million images depicting human-made and natural landmarks spanning 200 thousand classes.

  • HCI Challenging data - Outdoor (weather, day/night, city/country, varied motions and depths) data for research in computer vision.

  • HD1K - An autonomous driving dataset and benchmark for optical flow, including varied weather, lens flare, raindrops, and traffic.

  • JAAD - For studying joint attention. Focus is on pedestrian and driver behaviors at the point of crossing and factors that influence them.

  • KAIST - Multi-spectral dataset that covers a greater range of drivable regions, from urban to residential, for autonomous systems.

  • Kitti-360 - Large-scale dataset with 3D and 2D annotations.

  • Leddar PixSet - Full-Waveform flash LiDAR dataset for autonomous vehicle R&D.

  • Level 5 - Large collection of 3D annotation, lidar point clouds, traffic agent movement, and semantic map annotations.

  • Malaga Stereo and Laser Urban - Urban driving scenarios, with high-resolution stereo images grabbed at a high rate (20fps). 

  • Mapillary - Free account required to access diverse street-level imagery with pixel‑accurate and instance‑specific human annotations.

  • nuScenes - Large-scale public dataset for autonomous driving using the full sensor suite of a real self-driving car. 

  • Oxford Radar Robotcar Dataset - Captures many different combinations of weather, traffic, pedestrians, construction, and roadwork.

  • PandaSet - Open-source AV dataset combining Hesai’s best-in-class LiDAR sensors with Scale AI’s high-quality data annotation.

  • Stanford Track Collection - Contains 14,000 labeled tracks of objects as observed in natural street scenes by a Velodyne S2 LIDAR.

  • TME Motorway - Selection includes variable traffic situations, number of lanes, road curvature, and lighting.

  • Udacity Self Driving Car Dataset - This dataset contains 97,942 labels across 11 classes and 15,000 images.

  • Unsupervised LLAMAS - Lane-marker dataset with annotations and lane approximations using LiDAR mapping.

  • Waymo Open Dataset - Motion dataset comprising object trajectories and corresponding 3D maps for 103,354 segments.

  • IEEE DataPort - this source holds thousands of open datasets on electrical and electronic engineering. Requires a free account to access.
  • CERN Open Data - over 2 petabytes of particle physics open data. 

  • EarthChem Library - open chemistry and earth science datasets.

  • EarthData - NASA's free and open Earth science data is interactive, interoperable, and accessible for research and societal benefit both today and tomorrow.

  • Figshare - data repository where research outputs are available in a citable, shareable and discoverable manner.

  • Global Health Observatory - the WHO's gateway to health-related statistics for more than 1000 indicators for its 194 Member States.

  • Harvard Dataverse - topics include math, science, engineering, business, social sciences and more.

  • Mendeley Data - over 29 million searchable datasets.

  • OSF - find projects, data, materials, and collaborators on OSF that might be helpful to your own research.

  • Statista Statistics Portal - datasets on a wide variety of sciences and social sciences.

  • UCI Machine Learning Repository - collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms.

  • FBI Crime Data Explorer - datasets covering topics such as hate crime statistics, human trafficking, assaults on officers, arrest data, and crime by territory.

  • GitHub AwesomeData Social Sciences

  • ICPSR - world's largest social and behavioral science data archive. 250,000 data files available.

  • Open Data Flint - Flint-based data gathered from academic institutions, local organizations, and federal agencies to encourage a healthier and informed community.

  • Census.gov - Business and economy datasets from the United States.

  • Data.Gov.UK - Search over 17,000 datasets from the government of the United Kingdom.

  • DataHub - Thousands of datasets from financial market data and population growth to cryptocurrency prices.

  • International Monetary Fund - Datasets on finance, economic outlook, trade, consumer price indices, and more.

  • Open Data Canada - Vast array of subjects, including science, health, technology, labor, and transport.

  • UNCTADstat - United Nations Conference on Trade and Development offers analytical groupings, with a unique coverage for countries and products and a particular focus on developing economies. 

  • UN Data - Data from around the world, including social, economic, trade, education, and health indicators.

  • World Bank Open Data - Financial and population data for countries around the world.

Citing Your Sources

Public Services Librarian

Meagan Brown
Contact:
Library: 2-202 AB
810-762-9598
