March 12, 2010, Friday, 70

Metadata:Home

From SwissExperiment

Metadata

About Metadata

Once the data has been acquired, the metadata becomes equally important.


Your data is streaming through GSN into SensorMap, and users at several sites are observing it and using it for different purposes. If your data is public, you may not even know who your users are. You go into the field, repair a broken sensor, replace another and notice that a third is not working because it is frozen. How do you communicate this to the users? If you merely write it in a notebook, they will never know about it.

Without metadata, many anomalies in the data are hard to interpret.


In addition to the basic wiki functionality, the semantic MediaWiki offers an easy-to-use yet powerful technique for annotating wiki pages with semantics. This means that the semantic MediaWiki can store information in a dynamic database structure, which is ideal for storing dynamic and/or static data that is about the sensor but not provided by the sensor itself. In other words, the semantic MediaWiki is an ideal user interface for a metadata database.

The SwissEx metadata management wiki extensions are designed to aid projects in creating a metadata database on the hardware used in their experimental fieldwork as well as on the site itself. Data is entered using a user interface designed to guide the user in entering hardware information and observations. The semantic wiki then automatically stores this metadata in an SQL database so that it may be queried along with the corresponding data.

Storage of this data in a database format means that it may be later queried for annotating the data. For this reason, the database created using the SwissEx extensions may be read by either the SPARQL GSN wrapper or directly by SensorMap to provide information entered through the wiki interface directly where it is needed: alongside the data.


What information is considered as metadata?

In Swiss Experiment, four types of metadata are considered.

  • Experiment centric static information - background information on the experiment, the overall structure and the phenomena which the sensor network is there to measure, which must be stored once.
  • Sensor centric static information - background information such as the type and location of each sensor, which must be stored once.
  • Dynamic sensor information - information on the serviceability of the sensor, i.e. whether it is deployed, stored, broken, etc.
  • Dynamic data quality information - a measure of the quality of the data which requires a continuous variable.


Experiment Centric Static Information

Storing free text within the wiki is ideal for recording the experiment centric static information, i.e. information which is not suited to a database structure and which would not be useful in annotating the data. It gives scientists easy collaborative access to a central store of background information.

Sensor Centric Static Information and Dynamic Sensor Information

Dynamic sensor information can either be entered by the user (e.g. installation times or observations) or, where available, be streamed in real time with the data (e.g. battery status), whereas sensor centric static information is always entered by the user. In SwissEx, user-entered metadata will be entered via the semantic wiki (see below), e.g. the Wannengrat metadata store. This metadata store will be integrated with SensorMap and used to automatically tag the data with the metadata.

Dynamic Data Quality Information

Data quality information requires a measure of quality for every data point, so it is best stored in the same database as the data. The quality value is calculated by algorithms which draw on past knowledge of the data from a particular sensor and may predict limits on the minimum and maximum values that can be recorded. A quality factor of 0-5 is suggested. Algorithms are to be created which will check, for example, the standard deviation of the data to detect whether a wind sensor is frozen; as a first step, however, we will start by defining data ranges. If the data is outside these ranges, the data reliability can automatically be set to 0.
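The two checks described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the valid range for wind speed, the window length and the use of 5 as the "good" value are all hypothetical and are not taken from the SwissEx implementation.

```python
from statistics import pstdev

# Hypothetical valid range for a wind-speed parameter in m/s (an assumption).
WIND_SPEED_RANGE = (0.0, 75.0)


def range_quality(value, valid_range, good=5, bad=0):
    """Return `bad` (0) if the value lies outside the valid range, else `good`."""
    low, high = valid_range
    return good if low <= value <= high else bad


def looks_frozen(values, window=6, min_std=1e-6):
    """Return True if the last `window` values show essentially no variation.

    A frozen wind sensor typically reports a constant value, so the
    standard deviation over a recent window collapses towards zero.
    """
    if len(values) < window:
        return False  # not enough data to judge
    return pstdev(values[-window:]) <= min_std


readings = [3.2, 4.1, 120.0, 2.8, -1.0]
qualities = [range_quality(v, WIND_SPEED_RANGE) for v in readings]
```

Here `qualities` marks the out-of-range readings (120.0 and -1.0) with 0, while `looks_frozen` would flag a run of identical readings. The intermediate values 1-4 of the suggested scale are left unused in this sketch.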


This approach to metadata recording yields two databases: a metadata database and a data database (which also includes metadata). Full integration of the two would allow a single data quality variable as well as annotation of data/plots.

  • Example 1: consider the case where a data quality algorithm highlights data outside a specific standard deviation as anomalous. When a sensor fails and provides a constant output, the data quality algorithm still judges the data to be within the limits. A manual record of when this sensor failed and when it was repaired and reinstalled could be used to override the algorithm-calculated quality values and highlight the data as unusable. Annotation of the data/plots would tell the analyst why this data was unusable.
  • Example 2: consider the case where snowfall or wind periodically corrupt a snow height measurement. Here, the manual record of sensor failures cannot be used to annotate the data as the sensor is working perfectly. The algorithm calculated data quality value could however be used to highlight the anomalous values.
  • Example 3: consider the case where changes in external factors such as temperature corrupt the data with a constant (or slowly varying) offset. Although this is unlikely to be detected by the quality algorithm (unless the data could not otherwise vary in such a way), the semantic MediaWiki could be used to note that during this period the values were subject to an offset, and also to note the reason for the offset, which could be used to annotate the data and/or adjust the data quality variable.
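The idea in Example 1, where a manual record overrides the algorithm-calculated quality, might look like the minimal sketch below. The `Outage` record and the `effective_quality` function are hypothetical names introduced for illustration; the actual integration between the metadata and data databases is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Outage:
    """A manually recorded period in which a sensor's data is unusable."""
    start: datetime
    end: datetime
    reason: str


def effective_quality(timestamp, algorithm_quality, outages):
    """Return (quality, annotation) for one data point.

    A manual outage record covering the timestamp overrides the
    algorithm-calculated quality with 0 and supplies the reason as
    an annotation; otherwise the algorithm value is kept unchanged.
    """
    for outage in outages:
        if outage.start <= timestamp <= outage.end:
            return 0, outage.reason
    return algorithm_quality, None


# Illustrative record only; the dates and reason are invented.
outages = [Outage(datetime(2010, 2, 3), datetime(2010, 2, 5), "sensor frozen")]
```

With this record, a point at noon on 4 February 2010 gets quality 0 and the annotation "sensor frozen", even if the algorithm rated it highly, while points outside the outage keep their algorithm-calculated value.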


Metadata Semantic Wiki pages

Station Management: A station management metadata database is the first such Semantic Wiki database produced by SwissEx. The user interface allows scientists to create stations and sensors, recording the capabilities of each sensor. The user may then create specific instances of each sensor (identified by serial number) and track the movements of the sensor, i.e. record that it has been deployed on a station, removed, repaired or stored. Observations may also be recorded, e.g. reasons that the data may be invalid. When this is used to annotate the data, it will give the analyst better information with which to interpret the data.
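As a rough illustration of the tracking described above, the sketch below models a sensor instance and its movement history in plain Python. It is not the Semantic MediaWiki schema used by SwissEx: the class, field names, status labels, serial number and model name are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Status labels taken from the description above; the exact vocabulary is assumed.
STATUSES = {"deployed", "removed", "repaired", "stored"}


@dataclass
class SensorInstance:
    """One physical sensor, identified by serial number and tied to a model."""
    serial_number: str
    model: str
    history: list = field(default_factory=list)

    def record(self, when, status, station=None, note=""):
        """Append a movement or service event to the sensor's history."""
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.history.append((when, status, station, note))

    def status_at(self, when):
        """Return the most recent (status, station) at or before `when`, or None."""
        past = [event for event in self.history if event[0] <= when]
        if not past:
            return None
        latest = max(past, key=lambda event: event[0])
        return latest[1], latest[2]


# Invented example events for a hypothetical sensor at location Wan1.
sensor = SensorInstance(serial_number="SN-0001", model="hypothetical-anemometer")
sensor.record(datetime(2010, 1, 10), "deployed", station="Wan1")
sensor.record(datetime(2010, 2, 3), "removed", station="Wan1", note="frozen")
sensor.record(datetime(2010, 2, 5), "repaired")
```

Looking up `sensor.status_at(...)` for the timestamp of a data point is the kind of query that would let the data be annotated with the state of the sensor at that time.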


A basic user guide for the station management tool is provided here (work in progress).

Entering metadata using SwissEx metadata forms

  • To create a fieldsite metadata page, click here. After creating a fieldsite page, please inform the administrator so that the page may be whitelisted (made public).
  • To create a measurement location, click here


From this point on, all forms are created with random names, so the links are dynamic: you must create a measurement location and follow the links from there (as pages need to be created from within a namespace).


These further forms allow you to:

  • Create a sensor model
  • Create a sensor instance (a specific serial number) and associate it with a model
  • Create multiple database parameters associated with a sensor instance
  • Register invalid data or observations associated with a database parameter
  • Attach calibration functions to a sensor
  • Attach an experimental method to a database parameter
  • Deploy a sensor


If you wish to provide 'metadata quicklinks' from within your namespace pages, the links are provided here. Alternatively, if you wish to have coherent names for your pages instead of one-click page creation, you can use the following links. The page names should be created according to NAMESPACE:Pagename


An example deployment is available for Wannengrat (currently location Wan1 is implemented). If you do not have access to this, please contact dawes@slf.ch

Changes to the metadata system should be requested through this page

NOTE: If you create any deployment test pages, please delete them afterwards as they will be recorded in our fieldsite database

Further development of metadata tools: A further development of the station management tool is in use in the wiki by the RECORD project. This interface is expanded to provide experimental methods and manual data entry. It currently only accepts data input via wiki-text, but will be developed further to support form entry in the coming months, providing a generalised metadata database of use to all field research.