The term metadata refers to your documentation, because you are providing data about data. Researchers can choose among various metadata standards, often tailored to a particular file format or discipline. One such standard is DDI (the Data Documentation Initiative), designed to document numeric data files. Additional standards are listed on the left of this page.
The following are general guidelines for the aspects of your project and data that you should document, regardless of your discipline. At minimum, store this documentation in a readme.txt file or the equivalent, together with the data.
Title | Name of the dataset or research project that produced it |
Creator | Names and addresses of the organization or people who created the data |
Identifier | Number used to identify the data, even if it is just an internal project reference number |
Subject | Keywords or phrases describing the subject or content of the data |
Funders | Organizations or agencies who funded the research |
Rights | Any known intellectual property rights held for the data |
Access information | Where and how your data can be accessed by other researchers |
Language | Language(s) of the intellectual content of the resource, when applicable |
Dates | Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule |
Location | Where the data relates to a physical location, record information about its spatial coverage |
Methodology | How the data was generated, including equipment or software used, experimental protocol, and anything else one might record in a lab notebook |
Data processing | Along the way, record any information on how the data has been altered or processed |
Sources | Citations to material for data derived from other sources, including details of where the source data is held and how it was accessed |
List of file names | List of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov') |
File Formats | Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data |
File structure | Organization of the data file(s) and the layout of the variables, when applicable |
Variable list | List of variables in the data files, when applicable |
Code lists | Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data') |
Versions | Date/time stamp for each file, with a separate ID for each version |
Checksums | Checksum for each file, used to test whether the file has changed over time |
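As an illustration of the checksum entry above, here is a minimal sketch in Python that computes a checksum for a data file and later compares it against the recorded value. The choice of SHA-256 and the function names `sha256_of_file` and `has_changed` are assumptions made for this example; the guidelines themselves do not prescribe an algorithm or a tool.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    SHA-256 is an assumption for this sketch; any algorithm works
    as long as you record which one you used.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_checksum):
    """Return True if the file no longer matches the recorded checksum."""
    return sha256_of_file(path) != recorded_checksum
```

One way to use this is to record the digest of each file in your readme.txt when the file is finalized, then call `has_changed` with that recorded value whenever the file is copied, migrated, or restored from backup.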
An equally important part of documentation is providing potential users with the information necessary to fully understand and interpret the data. At a minimum, include a file manifest, a short text describing the dataset, and any information that is not adequately represented in the structured metadata. You should also include
Keep in mind that it is much easier to collect this information as the data is created than after the fact. Data repositories and archives (see Publishing Your Data) typically support the submission of supporting materials and documentation. Even if you have no plans to publish or distribute your data now, circumstances may change later. Keeping good records of the data in your project as it evolves will pay dividends by making your work and your research team's work easier over time.
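Because this information is easiest to capture while the project is under way, part of it can be gathered automatically. The following minimal sketch in Python builds a file manifest listing each file's name, extension, size, and last-modified date/time stamp. The function names `build_manifest` and `write_manifest`, the column names, and the tab-separated layout are all assumptions made for this example, not part of any metadata standard.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(data_dir):
    """Return one record per file under `data_dir`, sorted by path."""
    root = Path(data_dir)
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        info = path.stat()
        # Last-modified time, expressed in UTC for unambiguous records.
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        records.append({
            "name": path.relative_to(root).as_posix(),
            "extension": path.suffix,
            "bytes": info.st_size,
            "modified_utc": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
        })
    return records


def write_manifest(records, out_path):
    """Write the records as tab-separated plain text with a header row."""
    columns = ["name", "extension", "bytes", "modified_utc"]
    lines = ["\t".join(columns)]
    for record in records:
        lines.append("\t".join(str(record[c]) for c in columns))
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Writing the manifest outside the data directory, or regenerating it before each release, keeps the manifest file itself from appearing in its own listing.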