MeteoIODoc 20240411.d3bdb3cb
NetCDF

Format

In order to promote creation, access and sharing of scientific data, the NetCDF (network Common Data Form) format has been created as a machine-independent format: it is an interface for array-oriented data access together with a library implementing that interface. The NetCDF software was developed at the Unidata Program Center in Boulder, Colorado. To graphically explore the content and structure of NetCDF files, you can use the ncBrowse Java software or ncview. You can also run ncdump on a given file to inspect its structure (such as ncdump {my_netcdf_file} | more) and especially the parameter names (this is useful if remapping is needed, see below in the Keywords or in the Renaming section).

The NetCDF format does not impose a specific set of metadata; in order to easily exchange data within a given field, it is therefore a good idea to standardize the metadata. Several such metadata schemas can be used by this plugin (CF-1.6, CROCUS, AMUNDSEN, ERA-INTERIM, ERA5 or WRF, see the NETCDF_SCHEMA keywords below).

Moreover, when writing NetCDF files with MeteoIO, all the generated files will contain as much of the Attribute Conventions Dataset Discovery (ACDD) metadata as can be filled automatically, but some fields must be provided by the user for full compliance. This is detailed in the Editing section below.

If you want to better understand the structure of the NetCDF file format, you are highly encouraged to read about its components.

Compilation

In order to compile this plugin, you need libnetcdf (the C library). On Linux, please select both the library and its development files in your package manager.

Keywords

This plugin uses the following keywords:

  • General keys:
    • NC_EXT: only the files containing this pattern in their filename will be used; [Input] section (default: .nc)
    • NETCDF_SCHEMA_[METEO/GRID/DEM]: the schema to use when reading meteo data, 2D grids or DEMs; it must be specified for each input type (either CF-1.6, CROCUS, AMUNDSEN, ERA-INTERIM, ERA5 or WRF); [Input] and [Output] section (default: CF-1.6)
    • NETCDF_VAR::{MeteoGrids::Parameters} = {netcdf_param_name} : this allows remapping the names found in the NetCDF file to the MeteoIO grid parameters; [Input] section;
    • NETCDF_DIM::{MeteoGrids::Parameters} = {netcdf_dimension_name} : this allows remapping the names found in the NetCDF file to the ncFiles dimensions; [Input] section;
    • NC_DEBUG: print some low level details about the file being read (default: false); [Input] section;
    • NC_ALLOW_MISSING_COORDS: for files containing time series without any STATION dimension, accept files that do not contain the geolocalization of the measurements (in this case, please use a data creator to provide the geolocalization, otherwise expect problems at some point. Default: false); [Input] section;
    • NC_KEEP_FILES_OPEN: keep files open for efficient access (default for input: true, default for output: false). Reading from or writing to many NetCDF files may cause the plugin to exceed the maximum allowed concurrent open files determined by system limits. Also, when multiple modules write to the same output file, file corruption may occur. For those cases, NC_KEEP_FILES_OPEN = FALSE forces the plugin to open only one file at a time for reading (when in [Input] section), or writing (when in [Output] section, default behavior).
  • Gridded data handling:
    • DEMFILE: The filename of the file containing the DEM; [Input] section
    • DEMVAR: The variable name of the DEM within the DEMFILE; [Input] section
    • GRID2DPATH: if this directory contains files, they will be used for reading the input from; [Input] and [Output] section
    • GRID2DFILE: force reading the data from a single file within GRID2DPATH or specify the output file name; [Input] and [Output] section
    • NETCDF_SPLIT_BY_YEAR: create a new file for each year of data, the precise naming being based on GRID2DFILE (default: false); [Output]
    • NETCDF_SPLIT_BY_VAR: create a new file for each variable, the precise naming being based on GRID2DFILE (default: false); [Output]
  • Time series handling:
    • STATION#: if provided, only the given station IDs will be kept in the input data (this is especially useful when reading a file containing multiple stations); [Input]
    • METEOPATH: meteo files directory where to read the meteofiles from; [Input] section. Two modes are available when reading input files:
      • a fixed list of files is provided:
        • METEOFILE#: input filename (in METEOPATH). As many meteofiles as needed may be specified (the extension can be skipped if it is NC_EXT); [Input]
      • METEOPATH is scanned for files having the NC_EXT extension:
        • METEOPATH_RECURSIVE: should the scan for files be recursive (default: false)?; [Input]
    • NC_SINGLE_FILE: when writing timeseries of station data, force all stations to be contained in a single file (default: false); [Output]
    • METEOFILE: when NC_SINGLE_FILE is set, the output file name to use [Output];
    • NC_STRICT_SCHEMA: only write out parameters that are specifically described in the chosen schema (default: false, all parameters in MeteoGrids::Parameters are also written out); [Output]
    • NC_LAX_SCHEMA: write out all provided parameters even if no metadata can be associated with them (default: false); [Output]
    • For some applications, some extra information must be provided for meteorological time series (for example, for Crocus), in the [Output] section:
      • ZREF: the reference height for meteorological measurements;
      • UREF: the reference height for wind measurements;
      • DEFAULT_SLOPE: a default value for the slope when none is available;
      • DEFAULT_AZI: a default value for the azimuth when none is available;
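
As an illustration, several of the keys above could be combined in an [Input] section as follows (the file layout as well as the "airtemp" variable name and "TIME_T" dimension name are made up for this example; adapt them to your own files):

```ini
[Input]
METEO                = NETCDF
METEOPATH            = /data/stations
NC_EXT               = .nc
METEOPATH_RECURSIVE  = TRUE
NETCDF_SCHEMA_METEO  = CF-1.6
NETCDF_VAR::TA       = airtemp   ;map the file's "airtemp" variable to TA
NETCDF_DIM::TIME     = TIME_T    ;map the file's "TIME_T" dimension to time
NC_DEBUG             = FALSE
```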

Some of the ACDD metadata can also be configured, see the ACDD class.

Note
The timezone is assumed to be GMT. When reading a NetCDF file, the stationID and stationName are expected to be provided either by global attributes or through a variable. If nothing is provided, the filename (stripped of its extension) is used.
When providing multiple grid files in one directory, in case of overlapping files (because each file can provide multiple timestamps), the file containing the newest data has priority. This is convenient when using forecast data to automatically use the most short-term forecast.
When using the CROCUS schema, please note that the humidity should be provided as specific humidity, so please use a data creator if you don't already have a QI parameter (see HumidityGenerator). Crocus also requires split precipitation; this can be generated by the PrecSplitting creator (make sure to use parameter names from MeteoGrids::Parameters). Finally, Crocus does not handle missing data, so make sure you define some data generators in case some data might be missing (especially for the precipitation).
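
As noted above, when neither global attributes nor a variable provide the station ID, the filename stripped of its extension is used. This fallback amounts to the following (a minimal sketch; the function name is made up):

```python
import os

def fallback_station_id(path):
    """Return the station ID fallback: the file's base name without its extension."""
    return os.path.splitext(os.path.basename(path))[0]

print(fallback_station_id("/data/meteo_reanalysis/WFJ2.nc"))  # prints "WFJ2"
```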

Example use

Using this plugin to build downscaled time series at virtual stations, with the ECMWF Era Interim data set (see section below):

[Input]
GRID2D = NETCDF
GRID2DPATH = /data/meteo_reanalysis
NETCDF_SCHEMA_GRID = ERA-INTERIM
DEM = NETCDF
DEMFILE = /data/meteo_reanalysis/ECMWF_Europe_20150101-20150701.nc
#The lines below have nothing to do with this plugin
[InputEditing]
Downscaling = true
VSTATION1 = 46.793029 9.821343 ;this is Davos
Virtual_parameters = TA RH PSUM ISWR ILWR P VW DW TSS HS RSWR TSG ;this has to fit the parameter set in the data files

Another example, to extract precipitation from the MeteoSwiss daily precipitation reanalysis, RhiresD

[Input]
DEM = NETCDF
DEMFILE = ./input/ch02_lonlat.nc
GRID2D = NETCDF
GRID2DPATH = /data/meteo_reanalysis
NC_EXT = .nc
NETCDF_VAR::PSUM = RhiresD ;overwrite the PSUM parameter with "RhiresD", for example for MeteoCH reanalysis
#The lines below have nothing to do with this plugin
[InputEditing]
Downscaling = true
VSTATION1 = 46.793029 9.821343 ;this is Davos
Virtual_parameters = PSUM ;this has to fit the parameter set in the data files

MeteoCH RhiresD & similar products

MeteoSwiss provides reanalyses of precipitation and other meteo fields from 1961 to present over Switzerland at different time scales: daily, monthly, yearly, as well as hourly (CPC dataset). The DEMs are also provided, either in lat/lon, Swiss coordinates, rotated lat/lon, etc. These data sets must be requested from MeteoSwiss and are available under a specific license for research.

WRF output files

While WRF can write its outputs in NetCDF, unfortunately it does not follow the CF-1 convention and relies on many idiosyncrasies (see http://www.ncl.ucar.edu/Applications/wrfnetcdf.shtml) that break many applications dealing with NetCDF. If some fields are not read by MeteoIO, please follow the tips given below. Moreover, WRF assumes that latitudes / longitudes are given on an ideal sphere while standard coordinate systems assume an ellipsoid. This may lead to trouble when converting model coordinates to real world coordinates (see http://www.pkrc.net/wrf-lambert.html).

ECMWF Era Interim

The Era Interim data can be downloaded from the ECMWF data server after creating an account and logging in.

For Era Interim, it is recommended to extract data at 00:00 and 12:00 for all steps 3, 6, 9, 12. Then select the following fields: 10 metre U wind component, 10 metre V wind component, 2 metre dewpoint temperature, 2 metre temperature, Forecast albedo, Skin temperature, Snow density, Snow depth, Soil temperature level 1, Surface pressure, Surface solar radiation downwards, Surface thermal radiation downwards, Total precipitation

Here we have included the forecast albedo so the RSWR can be computed from ISWR. You should download the altitude separately (it is in the "invariants" section on the left hand side of the page where you select the fields to download).

Note
The radiation fields are accumulated since the start of the forecast period (00:00 and 12:00 as recommended above), so they must be corrected before using them (see ProcDeAccumulate)!
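
MeteoIO's ProcDeAccumulate filter performs this correction; to illustrate the operation, here is a minimal sketch (the function name is made up) that turns values accumulated since the forecast start (in J/m²) into mean fluxes (in W/m²) over each 3-hour step:

```python
def deaccumulate(accumulated, step_s=3 * 3600):
    """Convert values accumulated since forecast start (J/m^2)
    into mean fluxes (W/m^2) over each output step."""
    fluxes = []
    previous = 0.0
    for value in accumulated:
        fluxes.append((value - previous) / step_s)
        previous = value
    return fluxes

# Accumulated SSRD at steps 3, 6, 9, 12 of one forecast run:
print(deaccumulate([1080000.0, 3240000.0, 5400000.0, 6480000.0]))
# → [100.0, 200.0, 200.0, 100.0]
```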

You should therefore have the following request:

Parameter: 10 metre U wind component, 10 metre V wind component, 2 metre dewpoint temperature, 2 metre temperature, Forecast albedo,
Skin temperature, Snow density, Snow depth, Soil temperature level 1, Surface pressure,
Surface solar radiation downwards, Surface thermal radiation downwards, Total precipitation
Step: 3 to 12 by 3
Type: Forecast
Time: 00:00:00, 12:00:00

With the ECMWF Python Library, the request would be for example (the area is defined as North/West/South/East, see the WEBAPI documentation):

#!/usr/bin/env python
from ecmwfapi import ECMWFDataServer

server = ECMWFDataServer()
server.retrieve({
    "class": "ei",
    "dataset": "interim",
    "date": "2015-01-01/to/2015-01-31",
    "expver": "1",
    "grid": "0.75/0.75",
    "levtype": "sfc",
    "param": "33.128/134.128/139.128/141.128/165.128/166.128/167.128/168.128/169.128/175.128/205.128/228.128/235.128/243.128",
    "step": "3/6/9/12",
    "area": "42.2/-1.5/51.7/15.7",
    "stream": "oper",
    "format": "netcdf",
    "target": "my-era-interim.nc",
    "time": "00/12",
    "type": "fc",
})

Copernicus Era 5

Note
This is a work in progress, stay tuned!

The Era 5 data can be downloaded from the Copernicus Climate Data Store (either from the web interface or using the cdsapi), after registering and creating an api key.

Ideally, download the hourly data and select the following fields: 10 metre U wind component, 10 metre V wind component, 2 metre dewpoint temperature, 2 metre temperature, Near IR albedo for direct radiation, Skin temperature, Snow density, Snow depth, Soil temperature level 1, Surface pressure, Mean surface downward short-wave radiation flux, Mean surface downward long-wave radiation flux, Total precipitation

Here we have included the albedo so the RSWR can be computed from ISWR. You should download the altitude separately; it is available as the 'geopotential' variable. You need an additional Python script that downloads it for a single point in time so it can be read as a DEM.

Note
The reanalysis runs offer the mean fluxes over the last hour as well as accumulated precipitation over the last hour, making it very easy to work with.

With the cdsapi Python Library, you can use the code provided below to request the necessary data. Before using this Python code, you need to write your user ID and API key into the cdsapi configuration file:

  • go to Copernicus.eu and click on your user name to find your credentials (UID and API key);
  • create a new file (in your home directory on Linux) named ".cdsapirc" with the following content:
       url: https://cds.climate.copernicus.eu/api/v2
       key: {UID}:{API key}
       verify: 0
       
#!/usr/bin/env python
import sys

import cdsapi

c = cdsapi.Client()
month = sys.argv[1]
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': [
            '10m_u_component_of_wind', '10m_v_component_of_wind', '2m_dewpoint_temperature',
            '2m_temperature', 'mean_surface_downward_long_wave_radiation_flux', 'mean_surface_downward_short_wave_radiation_flux',
            'near_ir_albedo_for_direct_radiation', 'skin_temperature', 'snow_density',
            'snow_depth', 'soil_temperature_level_1', 'surface_pressure',
            'total_precipitation',
        ],
        'year': '2021',
        'month': [f'{month}'],
        'day': [
            '01', '02', '03', '04', '05', '06', '07', '08',
            '09', '10', '11', '12', '13', '14', '15', '16',
            '17', '18', '19', '20', '21', '22', '23', '24',
            '25', '26', '27', '28', '29', '30', '31',
        ],
        'time': [
            '00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00', '21:00', '22:00', '23:00',
        ],
        'area': [47, 9.5, 46.5, 10],  # North/West/South/East
    },
    f'./era5_{month}.nc')

The time period and spatial extent of the request should be properly defined in the 'year' and 'area' entries, respectively. This API request downloads from the CDS all the aforementioned variables at hourly resolution for each day of the desired year(s). The request limit for the CDS is 120'000 items per single request, which is why heavy requests can sometimes crash after being queued and processed. For this reason, the above script should be launched from the terminal in the following way:

parallel -j 4 ./era5_download.py ::: {01..12}

In this way, each month of the selected year(s) will be downloaded separately within a parallel process that runs, in this example, a maximum of 4 jobs at a time. This should ease each request and reduce the risk of crashing. Some post-processing will be required later to merge all the files into a single .nc file; this can either be left to MeteoIO to do transparently, or be done manually, for example with the xarray library for Python.

External tools and tricks to work with NetCDF

Editing the metadata of a NetCDF file

In order to ensure that a NetCDF file is ACDD compliant, several global attributes must be defined. Most of them have already been populated by MeteoIO but a few could benefit from further editing. You can have a look at what has already been defined by dumping the header content with ncdump:

ncdump -h {my_netcdf_file}

Then, to add your metadata, you can use ncatted (it is usually packaged as part of "nco"). For example, to overwrite the summary global attribute with yours or to append a line to the history global attribute:

ncatted -a summary,global,o,c,"My summary" {my_netcdf_file} #Overwrite the summary field
ncatted -a history,global,a,c,"Edited by me on 2018-06-06\n" {my_netcdf_file} #Append a line to the history field

Saving the day when a file is not standard compliant

Unfortunately, the naming of the parameters and dimensions within the files is not always standard nor consistent. To handle non-standard parameter names, simply run ncdump {my_netcdf_file} | more and use the name mapping facility of this plugin to map them to our internal names (see the plugin keywords). When the dimensions are not standard (for example the time axis being called "TIME_T"), first use the ncrename tool that is part of the NCO utilities to rename both the dimension (-d) and the variable (-v):

ncrename -d TIME_T,time -v TIME_T,time {my_netcdf_file}