This is the changelog for MeteoIO release 2.9.0
Highlights
- A python wrapper has been written, so it is possible to call MeteoIO from Python
- It is now possible to use multiple data sources with multiple plugins simultaneously
- new filters and generators, including a particle filter and a Kalman filter
- support for extended syntax in the config files (environment variables, arithmetic expressions, etc)
- spatial resampling is now quite robust
- performance improvements (more speed, less memory consumption)
- lots of improvements to the NetCDF and CSV plugins
Details
Plugins
- Support for a long awaited feature: reading meteorological timeseries using multiple plugins simultaneously, for example reading data with both the SMET plugin AND the IMIS plugin
- A possible bug / weakness has been fixed: when the same plugin was defined for the inputs and outputs, the exact same object was used, which could result in bad behavior (for complex plugins such as NetCDF). Now one object is constructed for the inputs and another for the outputs (leading to more robustness).
- Making plugin names case insensitive in the ini file (less confusing for the end user)
- NetCDF
- There was a bug when writing data in the CROCUS NetCDF schema without requesting all stations in one single file (each file would then contain a blend of metadata from all the stations but only the data for one station)
- When writing multiple stations at once but into different files (in the CROCUS schema), there was a bug (a wrong index was relied upon)
- in NetCDFIO, there was a bug when writing multiple stations into a single file if they didn't share a common time base; now a common time base is computed so it works as expected
- Now multiple stations written into a single NetCDF file have both the geospatial_lat/lon min and max but also geospatial_bounds as MultiPoint
- When writing multiple timeseries into a single NetCDF file, there could be a segfault (if a station stopped providing data earlier than the others)
- In order to properly handle input NetCDF files containing multiple stations, the syntax had to be changed. Now METEOFILE# provides the list of input files and STATION# is an optional parameter used to restrict the reading to the provided stations (so a file containing 10 stations can be used to force a model on a single station among these 10). It is also now possible to scan for all files with a given extension within METEOPATH (and also to do it recursively). A small bug has been fixed in the parsing of station IDs: the whole data buffer was written to each station ID so it contained the real station ID padded with NULL characters, making simple string comparisons fail.
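With the new syntax, such a setup could look like this (a minimal sketch: the METEOFILE# and STATION# keys come from this change, the file and station names are purely illustrative):

```ini
[Input]
METEO      = NETCDF
METEOPATH  = ./input
METEOFILE1 = forcing_all_stations.nc   ; file containing multiple stations
STATION1   = WFJ2                      ; optional: restrict reading to this station
```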
- When creating meteo timeseries from a large number of stations, the NetCDF "title" attribute should be nicer (limiting the string length instead of just the number of stations when deciding to switch to a generic title instead of the list of station ids).
- Two configuration keys have been added to NetCDF: NC_STRICT_SCHEMA and NC_LAX_SCHEMA (one only writes out parameters defined in the schema while the other writes out every possible parameter, even when there is no metadata at all; the default is in between)
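For example (a sketch; only the two key names come from this change, the section placement is assumed):

```ini
[Output]
METEO = NETCDF
NC_STRICT_SCHEMA = TRUE   ; only write out parameters defined in the schema
; or, at the other extreme:
;NC_LAX_SCHEMA = TRUE     ; write out every parameter, even without any metadata
```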
- A bug has been fixed in ncFiles::getParameterIndex: the wrong map was filled with out-of-schema variable information.
- Fixed a wrong comparison, when checking that all the stations to be written in a NetCDF file have all the parameters
- In NetCDF, the UREF / ZREF and DEFAULT_SLOPE / DEFAULT_AZI parameters are now read from the [Output] section. The stations' names are now truncated if they are too long.
- Adding a missing CF attribute, reverted time units to "min" for CROCUS
- When reading stations' timeseries in NetCDF, the nodata value and the units were not accounted for...
- When reading meteo timeseries from NetCDF, the projection parameters were not used, leading to incomplete Coords objects
- Improvements to the generated NetCDF files: the variables used as "axis" receive an attribute that helps to understand their role (for example, cdo can now reliably understand the structure of meteo timeseries).
- support for non-standard parameters in NetCDF (ie parameters that are not in MeteoGrids::Parameters)
- Added the runoff and OLWR parameters to the CF NetCDF schemas
- In NetCDFIO, extra fallbacks for "time" dimension names
- it was not possible to write time-independent variables
- Small changes to properly handle writing out special grids (ie of the form "var@date")
- Adding a function to add attributes to non-standard variables for NetCDF output, providing some examples for Alpine3D-defined fields
- Correcting units for SWE in NetCDF output
- The WRF netcdf schema has been significantly extended
- The ECMWF Netcdf schema has been split into ERA_INTERIM and ERA5 to account for some small differences.
- Better documentation for the ERA5 downloads
- Some NetCDF files (ProSnow's Adamont files) were using numerical station IDs. This is now supported.
- Added 'cache_grids_out' in netCDF module to speed up writing (no need to read all the dates at each writing), implemented only for grids, not for meteo files.
- Improving efficiency of NetCDF plugin by keeping files open in between reads and writes. This behaviour can be switched off by NC_KEEP_FILES_OPEN (to prevent exceedance of the system determined concurrent open files limit).
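This behavior could be switched off as follows (a sketch; NC_KEEP_FILES_OPEN is the key introduced here, the section placement and value are assumptions):

```ini
[Input]
GRID2D = NETCDF
NC_KEEP_FILES_OPEN = FALSE   ; close files after each access, to stay below the system's open-files limit
```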
- Adding functionality that both parameters and time periods can be spread over multiple input NetCDF files, and the plugin will identify the right file to use.
- All the ACDD keys are now prefixed by ACDD_
- In ACDD, the "program" key has been added
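With the new prefix, ACDD metadata keys could look like this (a sketch: only the ACDD_ prefix and the "program" key come from these entries; the other key names are derived from standard ACDD attribute names and are assumptions here):

```ini
[Output]
ACDD_PROGRAM = SnowCover_2020   ; the new "program" key
ACDD_CREATOR = Jane Doe         ; assumed: ACDD creator attribute with the new prefix
ACDD_SUMMARY = Meteo forcing for the test stations
```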
- Fixed a NetCDF bug: nodata values must be preserved, even when performing units corrections!
- Bug fix for NetCDF plugin. The top row was not written out due to a wrongly constructed for-loop.
- CSV
- In a CSV file, when providing a specification string to extract metadata from either the header or the file name, it is now possible to append multiple values for the ID or NAME metadata (simply by providing these fields multiple times)
- It is now possible to parse decimal seconds, thus supporting sub-second resolution timestamps
- The handling of datetime_spec, date_spec and time_spec was leading to potential conflicts and did not take into account the priority of the station-specific definitions over the global ones
- Some CSV tweaks: 1) allow Cartesian coordinates 2) allow different delimiter for header lines 3) possibility to set which column gets renamed by special param parsing 4) cleanup of parsed field names to make sure it goes through the SMET plugin
- A new option has been added to CSV: "dequote" to remove all quote from each line before parsing. This is useful for spreadsheets outputs that might (or not) enclose all fields in quotes
- It is now possible to read and parse CSV files containing multiple stations in one file, each line containing a station ID (declared as ID field). Which ID to keep is given either by the CSV_ID key or by the CSV#_FILTER_ID configuration key. A file can contain different line lengths for each ID (and the plugin then has to be called several times to provide the headers for all the different IDs).
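A possible configuration for such a file (a sketch: CSV#_FILTER_ID is named in this entry, the file and station names are illustrative and the STATION# key usage is assumed):

```ini
[Input]
METEO          = CSV
METEOPATH      = ./input
STATION1       = all_stations.csv
CSV1_FILTER_ID = DAV3   ; only keep the lines belonging to this station ID
```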
- It is now possible in the CSV plugin to read date / time components, including when using a julian day number (number of days since the beginning of the year). Mixing date/time strings and components is almost ready, but not fully yet...
- Better conversions for some units, added more recognized units and unknown units only emit a warning
- It is now possible to provide a units header (ie a space delimited list of units) in the ini file for the CSV plugin
- The handling of CSV files in descending order was not that great: when the file time span was bigger than the buffer, the earlier data would never be returned. This has been fixed by identifying the order just after reading the headers (by parsing the first 10 lines of data after the headers). In the case of descending order, no file indexing is used (so very large files might be very slow to process).
- There was a logical failure in the pre-reading of data in CsvIO in order to figure out if the file is in increasing or decreasing order: the timestamp was read before the parsing of the headers had been done, so in effect it could not work for files that don't have their timestamps in the first column. Now, as soon as the headers have been read, the fields are parsed so the timestamps can properly be parsed in the subsequent data lines.
- if CSV_COLUMNS_HEADERS is > CSV_NR_HEADERS, it now gets reset to IOUtils::npos (since it does not make any sense)
- if the coordinates of the station are provided in the headers or in the file name, the POSITION key is not mandatory anymore
- Imis
- The IMIS plugin was relying on Coordsys to compute the lat/lon for stations read from the database. This was wrong since the database is hard-coded to CH1903. This is now also hard-coded in the plugin (so the COORDSYS key can be used for other input parameters).
- In the IMIS plugin, the ANETZ stations providing precipitation at xx:00 were not properly handled; this has been corrected.
- It is possible to use the IMIS plugin without any tnsnames.ora configuration file. This has been documented in the plugin documentation (this makes the Oracle client installation much easier: the includes and the libs are now enough and can be copied to a standard location).
- Smet
- Proper handling of nodata in header fields
- A new key has been added to SMET headers: column_separator. If the column separator is different from white spaces, it will contain the character that is used. This is needed by some database import software but makes the generated files non-conformant.
- It is now possible to output smet files where the headers are all commented out (for easy import into DBs). This makes the smet files non-conformant (and they can currently not be re-read), but this was requested by some users (and avoids having to develop a fully new plugin for this)
- It is now possible to write ACDD search and discovery metadata in the headers of smet files
- the parameters' long names and units now rely more on MeteoGrids (to avoid redundancies) and can support several versions of any given parameter (for example TA, TA_1, TA_2). The version number MUST be delimited by '_' and come last (for example, it MUST be VW_MAX_2)
- the range for RH was wrong (0-100 when it has always been 0-1)
- The plot colors that are not attributed now receive "-" as color instead of a grey value; the plotting client can then choose how to handle such undefined values itself
- Oshd
- More ways to recognize the "°C" units in oshd
- Improving an error message and enforcing more checks on the units in oshd
- Fixed more units issues for oshd. This is just a workaround; as I understand the problem, recent versions use multibyte encoding and this is not supported by MatIO...
- Arc
- making the code consistent with documentation (default ext is .asc)
- Added a check on cellsize to catch lat/lon grids provided as ARC (which expects cartesian grids)
- Other plugins
- DBO: Removing an unused time zone parameter (DBO is always GMT)
- Introducing a simple output-only plugin for WISKI databases
- First version of a plugin to read GOES data streams (data encoded to be transmitted through the GOES satellites). It is still in testing...
- A plugin has been written to read Argos transmitted data. It is still in testing...
- Finally submitting the PMOD plugin so it does not get lost...
- Removing the BORMA and GSN plugins, since the former has not been used in more than 10 years and the latter will not be usable anymore (the last GSN server got shut down a month ago)
Documentation and usability
- Small documentation layout bug in Coords
- Updated a link in the documentation for the official wgs84 - CH1903 coordinates conversion
- Updating old plugin developer guide
- The spatial resampling examples were missing the [Input] section keys, making it confusing for some users
- Updated the spatial interpolations example code snippet (it was referring to some algorithms that had been renamed a long time ago)
- some code examples were using an invalid Date() call
- Changing the default merge strategy, to make it less surprising (hopefully)
- Improved error message when timestamps errors are found (now tells the user that there are time filters for this)
- Improved documentation for the ALSscale spatial interpolation
- Better documentation for the AllSky ILWR generator (including plots), so one can understand what to expect from the parametrizations
- Renaming the traditional Makefile in the examples in order to make it clearer it is only an example
- More helpful error messages when setting both lat/lon and x/y in UTM and the checks don't validate (ie now checking specifically if the UTM zone is wrong and advising the user about it if this is the case)
- Now providing the [Generators] section in the data generators examples.
- What the plugins provide has now been split between input and output capabilities.
- Small documentation improvement for the raw data editing
- There is now one application that is built alongside MeteoIO: meteoio_timeseries. This is a user-friendlier version of data_converter
Filters
- Time filters
- The Time and Suppr filters could not be used with the exclude / only options, this is now fixed.
- The ability to delete contiguous periods of data has been added to the SUPPR filters. So it is now possible for example to delete up to 50% of the data by chunks of 24 hours.
- The argument parsing of the TimeSuppr filter has been redone: there is now a "type" argument to specify how the filter should operate
- more consistent formatting for time filtering error messages
- A new operation mode has been implemented in the TimeSuppr time filter: CLEANUP, that removes invalid or duplicated timestamps (but a warning is printed as this is still a worrying issue for the dataset quality).
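Such a cleanup pass could be configured like this (a sketch following MeteoIO's usual filter argument syntax; the exact spelling is assumed rather than quoted from the documentation):

```ini
[Filters]
TIME::FILTER1    = SUPPR
TIME::ARG1::TYPE = CLEANUP   ; remove invalid or duplicated timestamps
```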
- A new TimeFilter has been implemented to sort the timestamps in increasing order
- Fixing UnDST that would break when a) multiple years of corrections are in the corrections file and b) data is requested from a period after the first pair of corrections would apply. I.e. we now find the correction that is relevant for the first data point.
- DST is starting to be as annoying as usual... fixed the other extreme, where all data lies before the first correction.
- implementation of a timeLoop filter that repeats a given period of data over and over, filling the time period requested by MeteoIO's caller. This is meant to spin up model runs.
- New filters
- Added a particle filter
- Added a Kalman filter
- New easy filter: conditional min/max, e.g. IF HS > 0.3 THEN MAX(TSS) -> 0.
- FilterMaths to evaluate arithmetic expressions with available meteo data
- A filter (RHWATERTOICE) to correct relative humidity measured over water for the full temperature range: it is converted to relative humidity over ice when T is below the triple point and kept over water otherwise. This is useful, for example, for weather stations that report relative humidity w.r.t. water for the full temperature range.
- A filter to transform wind direction or wind speed components from WGS84 to a PROJ4 coordinate system.
- Filter improvements
- Enabling the SOFT keyword for the PotentialSW filter, allowing comparison with the ground value instead of the top of atmosphere value (so the filter can intuitively match the curve you get when generating ISWR), and including P in the calculation if available.
- The PotentialSW filter can now rudimentarily handle measurements aggregated by the data logger
- In the FilterPotentialSW, the averaging period units have been set to seconds
- Added some very basic min/max rules to the TA and RH criteria in FilterPotentialSW to avoid arithmetic exceptions.
- The despiking filter would sometimes add spikes through the cubic fit. Now you can set the interpolation degree and also disable it (and maybe resample later via [INTERPOLATIONS1D])
- adding some flexibility to NO_CHANGE by enabling the user to set the maximum allowed variance
- Added an option to write out the quantiles in ProcQuantileMapping (so it is possible to run once to get the quantiles, compute the factors and then run with the correction factors)
- Making the corrections file of Quantile Mapping more tolerant: if the 0 or 1 values are not provided, they are generated
- A bug has been found in the deGrass filter: the TSS offset was used in K when it should have been a relative value... A new option has been introduced, to provide the said offset (because the automatic method to compute it might not always work fine)
- two new methods added to ProcessingBlock: one to query if a given filter excludes some stations and another that will be used by the time filters to actually modify the start and end times.
Packaging
- When compiling the examples, the search paths were not correctly set for a compiled but not installed MeteoIO
- The FindMeteoIO macro can now also be used from the examples (the in-tree version has priority)
- The search priorities in FindMeteoIO were not ideal, this should now be much better.
- it is now possible to compile MeteoIO and the provided examples with VC++
- Edited cpack's dependencies to work properly on Centos
- Added the possibility to add custom compiler flags (this could be important for some packages)
- A small change that could make packaging much easier: now the directory where the binary is (ie ./) is added to the RPATH, so on Linux and macOS copying the necessary libraries next to the binary should be enough
Handling of the configuration file
- Adding the possibility to get the list of all sections found in a config file
- Added the possibility to check if a given section exists within a Config object
- A method to move all configuration keys from one section to another one has been implemented
- It is now possible to resolve environment variables in ini files as well as refer to other keys (either as ${key} in the same section or as ${section::key} in another section)
- It is now possible to evaluate arithmetic expressions (based on the tinyexpr library) in values (the evaluation is performed when parsing the key), using the ${{expression}} syntax.
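Together with the key references of the previous point, this enables constructs such as the following (a sketch: the ${key}, ${section::key} and ${{expression}} syntaxes come from these entries, the keys and values themselves are illustrative):

```ini
[Input]
METEOPATH = ./input
STATION1  = ${METEOPATH}/wfj.smet        ; reference to a key in the same section

[Output]
METEOPATH = ${Input::METEOPATH}/../out   ; reference to a key in another section
TIME_ZONE = ${{2-1}}                     ; arithmetic expression, evaluated at parsing time
```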
- The check for circular dependencies (in includes) has been redone, it is more logical and should be very reliable
- In Config, keys that contain spaces are now rejected.
- When looking for key matches anywhere in the Config, the section part was not handled properly, leading to mistakes if a section name was contained in a key name (example: keys containing "generators" in the [input] section being wrongly identified as matches for keys in the [Generators] section). This has been fixed (the match with the section name has to be at the beginning of the string).
- A method to read Date() objects from Config has been implemented.
Matrices
- A few matrix class enhancements to retrieve and set rows and cols by index, get the diagonal, the max coefficient with index, and resize with a data vector to init
- Offering an alternative, completely independent solver for linear systems via Gauss elimination with partial pivoting. Matrix inversion is about 2.4 times slower than the Doolittle version, but it can handle rogue zeros on the diagonal.
- Singular value decomposition has been implemented, albeit the simplest version, and therefore probably still not enough to tackle matrices that don't behave
- Two matrix helpers that will be useful in the future (Householder): L2 norm and submatrix extraction. toString() can now take a precision and can output the raw matrix (e.g. to write to a file like in the particle filter).
Spatial interpolations and resampling
- Spatial resampling
- When using a "special" mode of operation (such as virtual stations), it might be necessary to tweak the processing (for example, changing the precipitation re-accumulation period to match VSTATION_SAMPLING_RATE). This means that the Meteo1DInterpolator needs to know if it is running in such a special mode and, in this case, which rank it has (is it the first pass or the second one?). This is now done by passing this information around. Moreover, the VSTATION_SAMPLING_RATE key is now mandatory (since there is no reason to justify a 1h default).
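A virtual stations setup must therefore now state its sampling rate explicitly (a sketch: VSTATION_SAMPLING_RATE is the now-mandatory key from this entry, while the VSTATIONS / VSTATION# keys, the position syntax and the unit of the sampling rate are assumptions here):

```ini
[Input]
VSTATIONS = TRUE                   ; assumed: enable the virtual stations mode
VSTATION1 = latlon 46.8 9.8 1500   ; assumed position syntax, for illustration
VSTATION_SAMPLING_RATE = 60        ; now mandatory, no more 1h default (unit assumed)
```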
- When using virtual stations, the buffering was not always properly handled, leading to periods of nodata when data was actually available
- Finally, supporting reading lat/lon grids and extracting data directly as lat/lon! When a lat/lon grid is read, it is kept so until GridsManager calls for a cartesian reprojection (currently, still relying on the same very primitive computation). If the GridsManager has been requested to extract a point from the grids, the reprojection will not occur so the accuracy should be much improved for large or high latitude grids. So far, only the Netcdf plugin relies on it.
- When extracting timeseries from grids and adjusting the coordinates of the virtual stations to match the grid coordinates, the adjusted coordinates could get grossly wrong if extracting the data from a lat/lon grid (because the calculated cellsize could be grossly wrong, such as close to the poles).
- There was a problem when rebuffering in GRID_SMART mode
- The rounding direction in gridify() had been changed to "nearest". This has been reverted since this seems less than ideal and might create surprising behavior (a point just below the top right corner of the grid could be rounded to the last cell +1, therefore being considered as outside the grid).
- Spatial interpolations
- There was a bug leading to possible exceptions when spatially interpolating non-standard variables that would not always be present for all the stations (for example, mixed snowpack smet results containing HN24 with forcing files not having this parameter)
- The IDW_Slopes spatial interpolation needs to cache the data and was failing to clear it before the next run, leading to accumulating more and more data (which is bad) and mixing data between different calls (which is even worse). This has been fixed and the code has been largely rewritten.
- The argument parsing of the WINSTRAL spatial interpolation has been slightly changed: there is now a TYPE argument that is either AUTO, FIXED or REF_STATION. This selects the mode of operation and allows better parameter checks.
- Adding the option to provide time constant grids using the USER interpolation algorithm (thanks to Eric Keenan).
- Allowing the ListonWind algorithm to be used for other wind speed parameters (VW_DRIFT, VW_MAX) in addition to VW.
Temporal resampling and generators
- The ILWR parametrization by Carmona et al (2014) has been implemented
- It is now possible to run the WindComponent generator over long timeseries where the U,V components are not always present
- A new RadiationComponents data generator has been implemented to generate the global radiation from the direct and diffuse components
- Fixed a bug in DailyAverageResampling: the range parameter should not be mandatory
- Fixed an annoying behavior: the Accumulate resampling would return 0 when accumulating before / after the period of raw data... It now returns IOUtils::nodata
API improvements
- A new constructor has been written for Date() that supports providing seconds as double
- A new method "isNodata" has been implemented in MeteoData, so now data_converter only provides a data line for a given station when the said station actually has some data.
- When IOUtils::readLineToVec() finds a line ending with a user-provided delimiter, it will return an empty string for this position (so the vector will contain the correct number of elements).
- Restructuring of the GridsManager, hoping for more clarity (code cleanup, renaming of methods, writing of documentation)
- new call to Date to set the date by julian day number
- New options for MeteoData::toString() have been created (so it is possible to get an output better suited to vectors of MeteoData).
- Introducing a universal function to convert wind speed components to wind direction, and vice versa, with additional corrections for wind component calculations.
Performance improvements
- Full rewrite of getNearestValidPts in ResamplingAlgorithms: when a gap is identified, this information is kept so looking for a gap next time will be faster (only the last gap is kept in memory). This leads to ~10% speedup in usual cases (hourly or half-hourly values, window_size ~ a few days) but will lead to major improvements for high resolution data (sub-second resolution with window_size of a few days should be many times faster).
- Trying to reduce memory consumption (issue 780) as well as memory re-allocations (ie. growing a vector because it was not sized at start). Extracting and converting one year of half-hourly data now takes 25 to 30% less memory. This has been achieved by using vector::resize or vector::reserve() whenever possible as well as std::swap() to prevent copying the whole dataset when filtering vecMeteo (in TimeSeriesManager and MeteoProcessor).
- In order to prevent using the cached data from one station for another one, a hash is now built in TimeSeriesManager in a way that should be quite safe (ie unique for each station). This has slightly changed the interface to the resampling algorithms (hence the large number of changed files) and it is also now used instead of the ad-hoc hashes (badly) built locally by some other resampling algorithms (for example for the solar radiation).
Timeseries and Grids and Coordinates handling
- Timeseries
- fixed a major bug in MeteoBuffer: when requesting the data start / end, the first station was always ignored!
- live data QA tracking on a per-parameter basis (so some plugins can output an indicator for each point that has been filtered/interpolated/generated). The DATA_QA_LOGS flag only controls log file output.
- When renaming a parameter (such as TA::MOVE = ...), a merge is now performed: if the original parameter had a value, it is preserved; otherwise the first new parameter that has a value has priority
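For example (a sketch: the TA::MOVE syntax appears in this entry; the section placement and the renamed source parameters are illustrative):

```ini
[Input]
TA::MOVE = TA_HYGRO TA_VENT   ; if TA already has a value it is kept, otherwise the first source with a value wins
```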
- Some merge modes could duplicate the last timestamp of a vector if the same timestamp was found in the "from" vector
- Fixed a small bug in MeteoData::mergeTimeSeries. In some cases, one data point was not merged and lost...
- The handling of PINT in SMET was a big hack: if such a parameter was found, it was directly converted into PSUM within the plugin, leading to surprising properties for the user. In order to be more predictable, it is not done anymore, PINT is treated like any other non-standard parameter. Then the Aggregate filter can now sum a parameter over the last time step, so it can convert PINT into PSUM (depending on user feedback, we will see if it remains this way or if this should be done in a data generator / creator).
- Fixed a potential source of bugs in IOUtils::count()
- Grids
- The serialization operators for the Grid2DObject class were wrong, they have been corrected (sorry for the inconvenience)
- A bug has been found in DEMObject::calculateHick(): in some cases the normals would be set to nodata because they could not be computed, but the slope was not set to nodata, leaving the dem in an inconsistent state. Moreover, the sanitize() method has been fully redone in order to check the dem even if the slope_failures and/or curvature_failures counters are zero (so if the slopes are manually set, they will still be checked)
- Coordinates
- When calling Coords::setLatLon(), there was no way to force them to nodata. It is now possible.
- The UPS coordinate system was using the wrong epsg code (NATO version with reversed easting/northing) and was also wrong in the Antarctic (latitudes must be positive for the UPS calculation).
- Trying to address issue 774: now when both lat/lon and east/north coordinates are provided, their consistency is checked (it used to be this way a while ago but somehow this behavior ceased, probably because of some refactoring in Coords).
Wrappers and tools
- A Python wrapper has been written, so it is possible to access MeteoIO from Python. It uses Cython (https://cython.org/) and works on Linux, Windows and Mac
- Contributing a light-weight R SMET parser
Other bug fixes
- Random Numbers Generators: A wrong LCG multiplier/increment combination was found by Stefan Kanthak (thanks!) where the cited performance was not valid for increment!=0
- Tiny readability concern, and better header include order in RNG.
- Fixing wet bulb temperature calculation at the cost of a loop