
This is the online end-user manual for the Data Access Made Easy initiative, tailored to the needs of Data Owners (Data Users do not see the processing that happens in the background and only interact with the system through the data portals).

General Workflow

As a Data Owner with raw data files to provide through the system, here is an overview of the workflow for using it.

First, a configuration file (in the INI format) must be created that describes the full data processing. This file is best created with the Inishell GUI, installed locally on your own computer. The following steps are recommended:

  1. Standardize the raw data. This means configuring the data source and parsing (in the Input section) as well as the output format and metadata (in the Output section). Take a subset of your data locally and work in Inishell until you can reliably generate a standardized output file (as SMET for a text-based format or as NetCDF for publication through OpenDAP). As the processing is performed by MeteoIO, more details are given in MeteoIO's documentation; the documentation of the CSV plugin, and especially how to troubleshoot complex CSV parsing, is often particularly relevant.
  2. Apply low-level Data Editing. Data Editing is where you can delete known periods of invalid data (such as a broken sensor, a failing data logger, etc). Many operations are supported, such as applying calibration corrections (offset correction, calibration factor), renaming parameters, swapping sensors… Choose freely among the Data Editing operations or the Data Filters, keeping in mind that there are sometimes several ways to achieve the same result. Please note that these kinds of corrections will be expanded over the life of the station.
  3. Filter the data. Meteorological filters are defined per meteorological parameter to validate, reject or correct data points. Any number of filters may be stacked per parameter; this provides basic QA on your data. In order to monitor the performance of your station, you can toggle a special option, DATA_QA_LOGS, that will report any point that gets filtered. A configuration sketch illustrating these three steps is given right after this list.
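
As an illustration of these three steps, here is a minimal sketch of such an INI file. The station name, paths and thresholds are placeholders, and the exact section and key names depend on your MeteoIO version, so let Inishell generate them for you or check the MeteoIO documentation for the authoritative syntax:

    [General]
    DATA_QA_LOGS = TRUE     ; report every data point that gets filtered

    [Input]
    METEO     = CSV         ; raw data parsed by the CSV plugin
    METEOPATH = ./raw_data  ; placeholder: directory containing the raw files
    STATION1  = WFJ2.csv    ; placeholder: one raw data file

    [InputEditing]
    ; low-level Data Editing operations (deleting bad periods, swapping sensors,
    ; applying calibration corrections...) go here, see the MeteoIO documentation

    [Filters]
    TA::FILTER1   = MIN_MAX ; reject air temperatures outside a plausible range
    TA::ARG1::MIN = 240     ; placeholder thresholds, in Kelvin
    TA::ARG1::MAX = 320

    [Output]
    METEO     = SMET        ; standardized, text-based output format
    METEOPATH = ./output    ; placeholder: where the standardized files are written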

Web service

Setting up Automatic updates

In order to automatically generate up-to-date datasets, define cron jobs to run the data processing at regular intervals and to push new data through an SSH connection to the system. Both sftp and scp are supported, although scp is required for non-interactive usage.

Go through the following steps (click to get more details):

  1. Choose the computer that will serve your data. Select a computer that has access both to your raw data (coming out of your data acquisition system) and to the Internet. It must also have ssh installed (it is usually installed by default on Linux and MacOS, as well as on Windows 10 build 1809 and higher);
  2. Generate an ssh key. If you don't already have one, you need to generate one: in a terminal window, under the username that will push the data, type ssh-keygen↵ and answer all subsequent questions with ↵ in order to keep the default values (the corresponding commands are summarized in a sketch after this list);
  3. Provide your public ssh key to the web platform
    • select the dataset you want to push data to;
    • select the Source Data tab;
    • click on the SFTP access and then select the add new key button;
    • provide a title that will help you later identify the key you are about to add;
    • copy the content (as text) of your public key file into the Public key box. On Linux, your public key is in /home/{your username}/.ssh/id_rsa.pub. On MacOS, it is in /Users/{your username}/.ssh/id_rsa.pub. On Windows, it is in C:\Users\{your username}\.ssh\id_rsa.pub;
  4. Try manually pushing some data
    • from a terminal window, go to where your raw data is located;
    • from the dialog box where you have provided your key, copy your SSH username (provided as a long hash);
    • type the following command: scp -P 2200 {path to your data file} {your username from above}@{web service url}:/{destination path}↵. The destination path is understood relative to your dataset root: / would copy your data to the highest level of your dataset file structure while /input would copy your data into an input subdirectory;
  5. Automate the data push

    • this relies on the task scheduling capabilities of your operating system;
    • Using Cron:
      • if necessary, set your default text editor: export EDITOR={name of your editor} (for example, your editor might be joe, nano, vi…);
      • create a new cron job: crontab -e;
      • if you want the job outputs to be emailed to you, add/edit a line at the top of your crontab such as MAILTO="myself@mydomain.eu" with your email address. If you prefer not to receive any emails, add/edit a line at the top of your crontab such as MAILTO="" (a complete crontab sketch is given after this list);
      • the crontab file contains one command per line, starting with when to execute the command (patterns for the minutes, hours, day of month, month, day of week) followed by the command itself. Please note that it does not necessarily execute in the same shell as your login shell!
        • you can use a cron schedule expression generator to configure when the data push will run;
        • the command that follows the schedule is the following: scp -P 2200 {path to your data file} {your username from above}@{web service url}:/{destination path}
        • as an example, sending the file WFJ2.csv from the /mnt/data/weissfluhjoch/ directory once an hour would look like: 0 * * * * scp -P 2200 /mnt/data/weissfluhjoch/WFJ2.csv 012a4180-d4a1-7c05@dame-server.slf.ch:/input
    • Using Task Scheduler:
      • from the start menu, type / search for the Task Scheduler application;
      • in the top menu, select Action then New Folder…;
      • give a meaningful name for the new folder (this is to keep things organized on your side);
      • select the created folder, then in the top menu, select Action then Create Basic Task…;
      • give your task a meaningful name, then next. Select the repetition interval you want, then next;
      • select the start time for your task, then next;
      • select Start a program as action you want to perform, then next;
      • as program/script, write scp. As arguments, write -P 2200 {path to your data file} {your username from above}@{web service url}:/{destination path}. As Start in, copy the path where your data is located, then next. Validate the last screen with Finish. If you want higher refresh rates than daily, you can now double-click on your task, move to the Triggers tab, double-click on your trigger and fine-tune the repetition intervals in the Advanced settings part of the window. A command-line sketch using schtasks is given after this list as an alternative to the GUI walkthrough.
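
To summarize steps 2 to 4, here is a minimal shell sketch of the key generation and of a manual test push. The local path, file name and username are the placeholders already used in the example above; replace them with your own values, and note that your public key file may have a different name (for example id_ed25519.pub) depending on the key type you generated:

    # generate a key pair, keeping the default values (press Enter at every prompt)
    ssh-keygen

    # display the public key so it can be copy-pasted into the Public key box
    cat ~/.ssh/id_rsa.pub

    # manually push one raw data file into the input subdirectory of the dataset
    scp -P 2200 /mnt/data/weissfluhjoch/WFJ2.csv 012a4180-d4a1-7c05@dame-server.slf.ch:/input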
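
For the cron variant of step 5, the resulting crontab could look like the following sketch, which simply combines the MAILTO line and the hourly push from the example above (adapt the schedule, path and username to your own setup):

    MAILTO=""
    # minute hour day-of-month month day-of-week  command
    0 * * * * scp -P 2200 /mnt/data/weissfluhjoch/WFJ2.csv 012a4180-d4a1-7c05@dame-server.slf.ch:/input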
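
On Windows, the same scheduled push can also be created from the command line instead of clicking through the Task Scheduler GUI. The following is only a hedged sketch using the standard schtasks tool, with the task name, file path and username as placeholders; the GUI walkthrough above remains the reference procedure:

    rem create a task that pushes one file once an hour (run from a cmd window)
    schtasks /Create /TN "DAME data push" /SC HOURLY ^
        /TR "scp -P 2200 C:\data\WFJ2.csv 012a4180-d4a1-7c05@dame-server.slf.ch:/input"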