Introduction to finding data and statistics

This guide provides an overview of the data services available at the library. It will help you understand the research process and will enable you to save time by using library resources more efficiently.

Data concepts

What's the difference between data and statistics?

While these terms are often used interchangeably, there is an important distinction between data and statistics. Data is the raw information from which statistics are derived. In other words, statistics interpret and summarize data.
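The following minimal Python sketch makes the distinction concrete. The list of household incomes is invented for illustration. The raw values are the data, and the mean and median computed from them are statistics.

```python
import statistics

# Hypothetical raw data: annual household incomes (dollars) reported by
# five survey respondents. These values are invented for illustration.
raw_incomes = [42_000, 55_500, 61_200, 38_750, 97_300]

# Statistics summarize and interpret the raw data.
mean_income = statistics.mean(raw_incomes)
median_income = statistics.median(raw_incomes)

print(f"Mean income: {mean_income:,.2f}")
print(f"Median income: {median_income:,.2f}")
```

The two summaries tell slightly different stories about the same raw values. The median (55,500) is lower than the mean (58,950) because one high income pulls the mean upward.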

Data

Data are numeric files created and organized for processing and analysis. Data can be analyzed and interpreted using statistical procedures to answer "why" or "how" questions.

Raw data is the direct result of research, collected as part of a study, observation, or survey. Raw data is usually in a machine-readable format that can be analyzed using software such as Excel, SPSS, SAS, or R. A dataset is typically laid out as a table, with one row per observation (for example, one survey respondent) and one column per variable.
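The small CSV below is an invented survey dataset included only as an illustration. The column names and values are hypothetical and do not come from any real survey. The Python sketch reads the file with the standard `csv` module and reports its shape.

```python
import csv
import io

# Hypothetical raw survey data in CSV form: one row per respondent,
# one column per variable. The values are invented for illustration.
RAW_CSV = """respondent_id,province,age,highest_education
1,ON,34,Bachelor's degree
2,QC,51,College diploma
3,BC,27,Master's degree
4,ON,45,High school diploma
"""

# DictReader uses the first line as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
columns = list(rows[0].keys())

print(f"{len(rows)} observations, {len(columns)} variables: {columns}")
for row in rows:
    # Every value is read as text; convert numeric columns before analysis.
    age = int(row["age"])
    print(row["respondent_id"], row["province"], age, row["highest_education"])
```

Opening the same file in Excel, SPSS, SAS, or R shows the same row-and-column layout.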

Statistics

Statistics are a common way of presenting information. In general, statistics are numerical summaries derived from data, and the term can also refer to the science of collecting, analyzing, and interpreting numerical data.

Aggregate data are higher-level data that have been compiled from smaller units of data. For example, the Census data that you find on the Statistics Canada website have been aggregated to preserve the confidentiality of individual respondents.

Microdata consists of the data directly observed or collected from each unit of observation, such as an individual person or household. The Public Use Microdata File (PUMF) for the Census provides access to the actual survey data from the Census but removes information that could identify individuals.
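The relationship between microdata and aggregate data can be sketched in a few lines of Python. The records below are hypothetical person-level microdata with invented values. Grouping them by province and counting produces an aggregate table, one row per province, which no longer contains any individual respondent's ID.

```python
from collections import defaultdict

# Hypothetical microdata: one record per respondent (invented values).
microdata = [
    {"id": 1, "province": "ON", "employed": True},
    {"id": 2, "province": "ON", "employed": False},
    {"id": 3, "province": "QC", "employed": True},
    {"id": 4, "province": "ON", "employed": True},
    {"id": 5, "province": "QC", "employed": True},
    {"id": 6, "province": "BC", "employed": False},
    {"id": 7, "province": "QC", "employed": False},
]

def aggregate_by_province(records):
    """Summarize person-level records into one row per province."""
    totals = defaultdict(lambda: {"respondents": 0, "employed": 0})
    for record in records:
        row = totals[record["province"]]
        row["respondents"] += 1
        row["employed"] += record["employed"]  # True adds 1, False adds 0
    return dict(sorted(totals.items()))

table = aggregate_by_province(microdata)
for province, row in table.items():
    print(province, row["respondents"], row["employed"])
```

Going from the aggregate table back to the individual records is not possible. This one-way summary is why aggregate tables can be published more widely than microdata.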

A common classification is based upon who collected the data.

Primary data: Data collected by the investigator for a specific purpose.
Example: Data collected by a student for their thesis or research project.

Secondary data: Data collected by someone else for a different purpose and then used by the investigator for their own purpose.
Example: Census data collected by Statistics Canada and used to analyze the impact of education on career choice and earnings.

Spatial data, also known as geospatial data, is a term used to describe any data related to or containing information about a specific location on the Earth’s surface. Spatial data can exist in a variety of formats and can contain more than just location-specific information.

Numeric spatial data is a dataset that includes a geographic component, such as a region identifier. When this dataset is combined with vector files, it can be queried and displayed as a layer on a map in a geographic information system (GIS).
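Combining numeric spatial data with vector files is usually done through an attribute join on a shared geographic identifier. The sketch below shows the idea in plain Python. The region codes, population counts, and polygon coordinates are all hypothetical. A real workflow would normally use GIS software or a library such as GeoPandas rather than hand-written dictionaries.

```python
# Hypothetical numeric data: a table keyed by a geographic identifier.
population = {"R001": 12_500, "R002": 8_300, "R003": 4_100}

# Hypothetical vector features: each region's boundary as a polygon of
# (longitude, latitude) vertices, keyed by the same identifier.
boundaries = {
    "R001": [(-75.70, 45.40), (-75.65, 45.40), (-75.65, 45.45), (-75.70, 45.45)],
    "R002": [(-75.65, 45.40), (-75.60, 45.40), (-75.60, 45.45), (-75.65, 45.45)],
}

def join_on_id(attributes, geometries):
    """Attach each numeric value to the geometry with the matching ID.

    IDs present in only one input are reported instead of silently dropped.
    """
    joined = {
        region_id: {"geometry": geometries[region_id], "population": value}
        for region_id, value in attributes.items()
        if region_id in geometries
    }
    unmatched = sorted(set(attributes) ^ set(geometries))
    return joined, unmatched

layer, unmatched = join_on_id(population, boundaries)
print(sorted(layer))  # regions that can be drawn on the map
print(unmatched)      # IDs with no partner in the other input
```

Reporting unmatched IDs is a deliberate choice. Mismatched or outdated geographic codes are a common reason a region fails to appear on the finished map.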

Vector data has a spatial component: coordinates, such as latitude and longitude, assigned to each feature. Vector files can contain sets of points, lines, or polygons that are referenced in a geographic space.
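Points, lines, and polygons can all be represented as sequences of coordinates. The sketch below stores hypothetical features as plain Python tuples of (longitude, latitude). It then uses a standard ray-casting test to check whether a point falls inside a polygon, a typical query a GIS runs on vector data. The test treats coordinates as flat (planar), a simplification that is only reasonable over small areas.

```python
# Hypothetical vector features as (longitude, latitude) tuples.
library_point = (-75.69, 45.42)  # a point

# A line: an ordered sequence of points (shown for illustration only).
bike_path = [(-75.72, 45.41), (-75.69, 45.42), (-75.66, 45.43)]

# A polygon: a closed ring of vertices (the last vertex joins the first).
campus = [(-75.70, 45.41), (-75.68, 45.41), (-75.68, 45.43), (-75.70, 45.43)]

def point_in_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from the point crosses; an odd count means the point is inside.
    Coordinates are treated as planar (a simplification)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the ray's height y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

print(point_in_polygon(library_point, campus))
print(point_in_polygon((-75.60, 45.42), campus))
```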

Raster data is data presented as a grid of cells (pixels), each holding a value, and is available in formats such as .JPG, .GIF, or similar.
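A raster can be treated as a grid of cells with a known origin and cell size. This lets a map coordinate be converted to a row and column index. The sketch below uses a tiny, hypothetical elevation grid. Its top-left corner and 0.5-degree cell size are invented for illustration. Georeferenced raster files record this kind of origin and cell-size information alongside the pixel values.

```python
import math

# Hypothetical 3 x 4 raster of elevation values (metres); one value per cell.
elevation = [
    [120, 125, 130, 128],
    [118, 122, 127, 131],
    [115, 119, 124, 129],
]

# Assumed georeferencing: coordinates of the grid's top-left corner and the
# width/height of one square cell, in degrees. These numbers are invented.
ORIGIN_X, ORIGIN_Y = -76.0, 46.0
CELL_SIZE = 0.5

def cell_for(x, y):
    """Return the (row, col) of the cell containing map coordinate (x, y)."""
    col = math.floor((x - ORIGIN_X) / CELL_SIZE)
    row = math.floor((ORIGIN_Y - y) / CELL_SIZE)  # rows count down from the top
    if not (0 <= row < len(elevation) and 0 <= col < len(elevation[0])):
        raise ValueError("coordinate falls outside the raster")
    return row, col

row, col = cell_for(-74.75, 45.25)
print(row, col, elevation[row][col])
```

Rows are counted downward from the top edge because image-style rasters store their first row at the top. Latitude, by contrast, increases upward.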