Common elements when importing pollution and meteorological data

Well, there is only one thing that the data have in common: they are available in CSV format. But that is it.

Whether you download from providers such as Metro Vancouver, the City of Montreal, the Ontario Ministry of the Environment and Climate Change (OMECC) or Environment Canada, each of them sets up its data tables in a different fashion. Data from the OMECC are formatted in an especially exotic fashion (hourly data columns!), whereas the other providers basically stick to one or more date/time columns followed by data columns. Hourly data are the norm for Criteria Air Pollutants, but merging in other variables is more challenging; precipitation, for example, is reported as a cumulative daily value.
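To make the "hourly data columns" problem concrete, here is a minimal Python sketch that reshapes such a table into the usual one-row-per-timestamp layout. The column names (`Date`, `H01`, `H02`, ...), the date format and the convention that `H01` is the hour ending at 01:00 are all assumptions for illustration, not the documented layout of any provider, so check the real file before reusing this.

```python
import csv
import io
from datetime import datetime, timedelta

def wide_to_long(csv_text, date_col="Date", n_hours=24):
    """Turn one-row-per-day, one-column-per-hour data into (timestamp, value) rows."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(rec[date_col], "%Y-%m-%d")
        for h in range(1, n_hours + 1):
            raw = rec[f"H{h:02d}"].strip()
            # An empty cell stays None; it is not silently turned into zero.
            value = float(raw) if raw else None
            # Assumption: column H01 holds the hour ending at 01:00.
            rows.append((day + timedelta(hours=h), value))
    return rows

# Made-up three-hour sample, only to show the reshaping.
sample = "Date,H01,H02,H03\n2015-01-01,12,,15\n"
long_rows = wide_to_long(sample, n_hours=3)
```

Once every provider's table is in this long form, the timestamps can serve as the common key for merging.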

So, the first task is to cut away explanatory lines (if any; I’d rather have these, and Environment Canada is pretty thorough here), harmonise the data flags (I do this mostly in vim) and find out what empty cells mean (< LOD? no data?) and which units are used. This depends on the provider, so find and talk to the responsible person; good luck! Then arrange all data in the same fashion and finally merge them into a common data table (in Matlab).
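I do these steps in vim and Matlab, but the same workflow can be sketched in a few lines of Python. The flag strings in `FLAG_MAP`, the column names and the sample data are hypothetical, and the sketch assumes ISO-style timestamps so that sorting them as text gives chronological order.

```python
import csv

# Hypothetical provider flags mapped onto one shared vocabulary.
FLAG_MAP = {"": "missing", "M": "missing", "NoData": "missing", "<LOD": "below_lod"}

def read_clean(csv_text, header_starts_with, time_col, value_col):
    """Skip explanatory lines before the header; return {timestamp: (value, flag)}."""
    lines = csv_text.splitlines()
    start = next(i for i, line in enumerate(lines)
                 if line.startswith(header_starts_with))
    table = {}
    for rec in csv.DictReader(lines[start:]):
        raw = rec[value_col].strip()
        if raw in FLAG_MAP:
            table[rec[time_col]] = (None, FLAG_MAP[raw])
        else:
            table[rec[time_col]] = (float(raw), "ok")
    return table

def merge(tables):
    """Outer-join several {timestamp: (value, flag)} tables on the timestamp."""
    stamps = sorted(set().union(*(t.keys() for t in tables.values())))
    return [
        {"time": s, **{name: t.get(s, (None, "missing")) for name, t in tables.items()}}
        for s in stamps
    ]

# Made-up samples: one file with explanatory lines, one without.
o3 = ("Station: hypothetical\nUnits: ppb\nDateTime,O3\n"
      "2015-01-01 01:00,21.5\n2015-01-01 02:00,<LOD\n")
temp = "DateTime,Temp\n2015-01-01 02:00,-3.2\n2015-01-01 03:00,NoData\n"
merged = merge({
    "O3": read_clean(o3, "DateTime", "DateTime", "O3"),
    "Temp": read_clean(temp, "DateTime", "DateTime", "Temp"),
})
```

Keeping a flag next to every value means a "below LOD" reading and a genuinely missing one remain distinguishable after the merge, which is exactly the information that gets lost when empty cells are taken at face value.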

Bottom line: open data is good, but open data formats and documentation, not so much!

Published by

greg

Atmospheric chemistry researcher and university teacher. Data analysis/chemometrics specialist (PCA, PCR, Cluster analysis, SOM)
