Creating a data dictionary

Data dictionaries are useful to include alongside your datasets. They describe the elements and values contained within your data so that users can reuse it. A simple data dictionary can be created quickly and should include a few key pieces of information.

Learning outcomes

  • Understand what a data dictionary is
  • Know the advantages of having a data dictionary
  • Learn how to create a basic data dictionary and what to include

What is a data dictionary?

On a basic level, a data dictionary is metadata that describes the values contained within a dataset, how they were collected and any standards the data conforms to. This includes the data type, format, size, descriptions and how the data is used. It helps users understand how the data is structured and how it relates to other data.

What are the benefits? 

  • Improved data quality - Adds detail and depth to the data, which opens it up to more uses.
  • Reuse of data - Higher-quality data encourages reuse.
  • Consistency in data use - Sticking to documented standards keeps the data consistent.
  • Faster and easier data analysis - The data is structured in a simple manner, so it takes less time to analyse.

Creating a simple data dictionary

A simple data dictionary can be created in a CSV or other spreadsheet file and included alongside your dataset. Below is a set of attributes you should look to include. You can add further attributes to suit your needs, for example how you collected the data or any caveats on its use. We provide a template to get you started at the bottom of the page.

  • Data name - The name of the column as it appears in your data. For example, for a field holding a driver’s licence number you might use "licence_id".
  • Description - A short, human-readable description of the data, e.g. "Driver licence number".
  • Data type - If it’s a number you might list the type as "integer"; if it’s a name you might use "text", and so on.
  • Data format - For a number, use one N per digit, e.g. NNNN for a four-digit number. For a date, use a pattern such as DD/MM/YYYY. You might also list any standards you have followed, for example ISO 8601 for date formats.
  • Field size - The maximum number of characters the field can hold, e.g. 20 for a name.
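
If you would rather generate the file with a script than fill in a spreadsheet by hand, the attributes above map directly onto CSV columns. The following is a minimal sketch in Python using only the standard library. The column headings, the example entries (including the "driver_name" and "issue_date" fields and their sizes) and the helper names are illustrative assumptions, not part of the template.

```python
import csv

# Column headings mirror the attributes listed above (naming is an assumption).
FIELDNAMES = ["data_name", "description", "data_type", "data_format", "field_size"]

# Hypothetical entries for an imagined driver licence dataset.
ENTRIES = [
    {
        "data_name": "licence_id",
        "description": "Driver licence number",
        "data_type": "integer",
        "data_format": "NNNNNNNN",
        "field_size": 8,
    },
    {
        "data_name": "driver_name",
        "description": "Full name of the licence holder",
        "data_type": "text",
        "data_format": "",
        "field_size": 20,
    },
    {
        "data_name": "issue_date",
        "description": "Date the licence was issued",
        "data_type": "date",
        "data_format": "DD/MM/YYYY",
        "field_size": 10,
    },
]


def write_data_dictionary(entries, out):
    """Write the entries as CSV rows to an open text file or buffer."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(entries)


def undocumented_columns(dataset_columns, entries):
    """Return the dataset columns that have no entry in the data dictionary."""
    documented = {entry["data_name"] for entry in entries}
    return [col for col in dataset_columns if col not in documented]
```

To save the dictionary next to your dataset, open a file with open("data_dictionary.csv", "w", newline="", encoding="utf-8") and pass it to write_data_dictionary. The undocumented_columns helper is an optional check that every column in your dataset has been described.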

Data Dictionary Template CSV



Additional Resources

