Metadata:Home
Metadata
Once the data has been acquired, the metadata becomes equally important.
Your data is streaming through GSN into SensorMap, and users at several sites are observing it and using it for different purposes. If your data is public, you may not know who your users are. You go into the field, repair a broken sensor, replace another, and notice that a third sensor is frozen and therefore not working. How do you communicate this to the users? If you merely write it in a notebook, they will never know about it.
Without metadata, many anomalies in the data are hard to interpret.
In addition to the basic wiki functionality, the semantic MediaWiki offers an easy-to-use yet powerful technique for annotating wiki pages with semantics. It can store information in a dynamic database structure that suits dynamic and/or static data which describes the sensor but is not provided by the sensor itself. In other words, the semantic MediaWiki is an ideal user interface for a metadata database.
The SwissEx metadata management wiki extensions are designed to aid projects by creating a metadata database about the hardware used in their experimental fieldwork as well as about the site itself. Data is entered using a user interface designed to guide the user through the process of entering hardware information and observations. The semantic wiki then automatically stores this metadata in an SQL database so that it may be queried along with the corresponding data.
The database created using the SwissEx extensions may be read using SPARQL queries to provide information entered through the wiki interface directly where it is needed: alongside the data.
What information is considered metadata?
In Swiss Experiment, four types of metadata are considered:
- Experiment-centric static information - background information on the experiment, its overall structure and the phenomena which the sensor network is there to measure. This needs to be stored only once.
- Sensor-centric static information - background information such as the type and location of each sensor. This also needs to be stored only once.
- Dynamic sensor information - information on the serviceability of the sensor, i.e. whether it is deployed, stored, broken, etc.
- Dynamic data quality information - a measure of the quality of the data which requires a continuous variable.
Experiment Centric Static Information
Storing free text within the wiki is ideal for documenting experiment-centric static information, i.e. information which is not suited to a database structure and which would not be useful in annotating the data. It gives scientists easy, collaborative access to a central store of background information.
Sensor Centric Static Information and Dynamic Sensor Information
Dynamic sensor information can either be entered by the user (e.g. installation times or observations) or, where available, be streamed in real time with the data (e.g. battery status). Sensor-centric static information is always entered by the user. In SwissEx, user-entered metadata will be recorded via the semantic wiki (see below), e.g. in the Wannengrat metadata store. This metadata store will be integrated with SensorMap and used to tag the data with the metadata automatically.
Dynamic Data Quality Information
Data quality information requires a measure of quality for every data point, so it is best stored in the same database as the data. The quality value is calculated by algorithms which rely on past knowledge of the data from a particular sensor and may predict limits on the minimum and maximum values that can be recorded.
A quality factor of 0-5 is suggested.
Algorithms are to be created which will examine, for example, the standard deviation of the data to detect whether a wind sensor is frozen. As a first step, however, we will start by defining valid data ranges. If a value lies outside its range, its reliability can automatically be set to 0.
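The two checks described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the SwissEx implementation: the function names, the wind speed range, the window size and the standard deviation threshold are all hypothetical, and a value that passes the range check is simply given the top value of the suggested 0-5 scale.

```python
from statistics import pstdev

# Hypothetical plausible range for a wind speed sensor in m/s (assumed values).
WIND_SPEED_RANGE = (0.0, 75.0)

QUALITY_UNRELIABLE = 0  # lowest value on the suggested 0-5 scale
QUALITY_IN_RANGE = 5    # assumed value for readings that pass the range check


def range_quality(values, valid_range):
    """Return one quality value per reading: 0 if outside valid_range, else 5."""
    low, high = valid_range
    return [
        QUALITY_IN_RANGE if low <= v <= high else QUALITY_UNRELIABLE
        for v in values
    ]


def looks_frozen(values, window=6, min_std=0.01):
    """Return True if the last `window` readings barely vary.

    A near-zero standard deviation suggests a stuck (e.g. frozen) sensor.
    The window and threshold are illustrative and would need tuning.
    """
    if len(values) < window:
        return False  # not enough data to judge
    return pstdev(values[-window:]) < min_std


readings = [3.2, -1.0, 80.0, 12.5]
print(range_quality(readings, WIND_SPEED_RANGE))
print(looks_frozen([2.0, 2.0, 2.0, 2.0, 2.0, 2.0]))
```

Note that a standard deviation check alone cannot distinguish a frozen wind sensor from a genuinely calm period, which is one reason the manual records discussed below remain useful.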
This approach to metadata recording produces two databases: a metadata database and a data database (which also includes metadata). Full integration of the two would allow a single data quality variable as well as annotation of data and plots.
- Example 1: Consider the case where a data quality algorithm flags data outside a specific standard deviation as anomalous. When a sensor fails and provides a constant output, the algorithm still rates the data as within the limits. A manual record of when this sensor failed and when it was repaired and reinstalled could be used to override the algorithm-calculated quality values and mark the data as unusable. Annotation of the data/plots would tell the analyst why this data was unusable.
- Example 2: Consider the case where snowfall or wind periodically corrupts a snow height measurement. Here, the manual record of sensor failures cannot be used to annotate the data, as the sensor is working perfectly. The algorithm-calculated data quality value could, however, be used to highlight the anomalous values.
- Example 3: Consider the case where changes in external factors such as temperature corrupt the data with a constant (or slowly varying) offset. Although this is unlikely to be detected by the quality algorithm (unless the data could not otherwise vary in such a way), the semantic MediaWiki could be used to note that during this period the values were subject to an offset and also to note the reason for the offset. This information could be used to annotate the data and/or adjust the data quality variable.
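The override described in Example 1 could look roughly like the sketch below. It assumes that manual records are available as time intervals with a free-text reason; the `Outage` class, the `apply_manual_records` function and the tuple layout are hypothetical names chosen for illustration, not part of the SwissEx tools.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Outage:
    """A manually recorded period in which a sensor's data is unusable."""
    start: datetime
    end: datetime
    reason: str


def apply_manual_records(samples, outages):
    """Override algorithm-calculated quality using manual records.

    samples: list of (timestamp, value, algorithm_quality) tuples.
    Returns a list of (timestamp, value, quality, annotation) tuples, where
    quality is forced to 0 inside any recorded outage and annotation holds
    the recorded reason (or None outside outages).
    """
    result = []
    for timestamp, value, quality in samples:
        annotation = None
        for outage in outages:
            if outage.start <= timestamp <= outage.end:
                quality = 0  # the manual record wins over the algorithm
                annotation = outage.reason
                break
        result.append((timestamp, value, quality, annotation))
    return result


outages = [Outage(datetime(2009, 1, 10), datetime(2009, 1, 12), "sensor frozen")]
samples = [
    (datetime(2009, 1, 9, 12), 3.1, 5),
    (datetime(2009, 1, 11, 12), 0.0, 5),  # constant output passes the algorithm
]
for row in apply_manual_records(samples, outages):
    print(row)
```

The same structure could carry the offset note from Example 3: instead of forcing the quality to 0, the annotation would record the period and cause of the offset, and the quality value could be lowered rather than zeroed.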
Metadata Semantic Wiki pages
Station Management: A station management metadata database is the first such Semantic Wiki database produced by SwissEx. The user interface allows scientists to create stations and sensor models, thereby recording the capabilities of each sensor. The user may then create specific instances of each sensor (identified by serial number) and track the movements of the sensor, i.e. record that it has been deployed on a station, removed, repaired or stored. Observations may also be recorded, e.g. reasons why the data may be invalid. When this information is used to annotate the data, it will give analysts better information for their interpretation of the data.
A basic user guide for the station management tool is provided here (work in progress).
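To make the relationship between sensor models, sensor instances and their movements concrete, here is a minimal Python sketch of the kind of structure the station management pages record. It is an illustration only: the class and field names, the status strings, the serial number and the station name are all hypothetical and do not reflect the actual wiki schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SensorModel:
    """A type of sensor and what it can measure."""
    name: str
    measures: List[str]


@dataclass
class SensorEvent:
    """One entry in a sensor's history."""
    day: date
    status: str  # assumed values: "deployed", "removed", "repaired", "stored"
    station: Optional[str] = None
    note: str = ""


@dataclass
class SensorInstance:
    """A specific physical sensor, identified by serial number."""
    serial_number: str
    model: SensorModel
    history: List[SensorEvent] = field(default_factory=list)

    def record(self, event):
        """Add an event and keep the history in chronological order."""
        self.history.append(event)
        self.history.sort(key=lambda e: e.day)

    def status_on(self, day):
        """Return the most recent status on or before `day`, or None."""
        current = None
        for event in self.history:
            if event.day <= day:
                current = event.status
        return current


model = SensorModel("hypothetical anemometer", ["wind speed"])
sensor = SensorInstance("SN-001", model)
sensor.record(SensorEvent(date(2008, 11, 1), "deployed", station="WAN1"))
sensor.record(SensorEvent(date(2009, 2, 1), "removed", note="found frozen"))
print(sensor.status_on(date(2008, 12, 1)))
```

Keeping the history as dated events, rather than a single current status, is what allows a data point from any past date to be annotated with the state of the sensor at that time.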
Entering metadata using SwissEx metadata forms
- To create a fieldsite metadata page, click here. After creating a fieldsite page, please inform the administrator so that the page may be whitelisted (made public).
- To create a measurement location, click here.
From this point on, all forms are created with random names, so the links are dynamic: you must create a measurement location and follow the links from there, because pages need to be created from within a namespace.
These further forms allow you to:
- Create a sensor model
- Create a sensor instance (a specific serial number) and associate it with a model
- Create multiple database parameters associated with a sensor instance
- Register invalid data or observations associated with a database parameter
- Attach calibration functions to a sensor
- Attach an experimental method to a database parameter
- Deploy a sensor
If you wish to provide 'metadata quicklinks' from within your namespace pages, the links are provided here.
An example deployment is available for Wannengrat. If you do not have access to this, please contact Nicholas Dawes (wiki homepage).
Changes to the metadata system should be requested through this page.
NOTE: If you create any deployment test pages, please delete them afterwards, as they will otherwise be recorded in our fieldsite database.
Further development of metadata tools: A further development of the station management tool is in use in the wiki by the RECORD project. This interface has been expanded to cover experimental methods and manual data entry. It currently accepts input only as wiki-text, but will be developed further in the coming months to support form entry, providing a generalised metadata database of use to all field research.