March 11, 2010, Thursday, 69

SwissEx:Infrastructure

From SwissExperiment


The SwissEx Infrastructure


What benefits does the SwissEx infrastructure bring to the scientist?

The SwissEx infrastructure is designed to allow scientists to publish data (for their own use, for public use, or both), to publish results for collaborative work or public information, and to keep track of the metadata for their measurements.

What are the tools that provide this?

The flow chart below shows the steps and choices in the SwissEx infrastructure.

Image:Structure.jpg


Metadata

The SwissEx wiki (the user-editable website that you are looking at) is the main point for metadata entry. Projects can create their own sites within the SwissEx site and use them for project organisation and as a collaborative tool for discussing work. These pages can be open or password protected within the team. The SVN repository that runs behind the wiki makes document storage and collaborative editing possible for widely distributed organisations. These pages are all free text.

This wiki runs Semantic MediaWiki software. This means that it is possible to store information in a structured way using the SQL database behind the wiki. To do this, projects have to define templates for the data that they wish to record (more about semantic entry is provided here). For the majority of metadata recording tasks, however, the SwissEx team have predefined templates and provided a form-based data entry system. One outcome of this is that the SwissEx wiki contains a database of field sites, and drilling down within it provides all of the metadata on the sensors and data.

This method of metadata entry provides a web based, distributed interface. Once this metadata has been entered, it can be read by both GSN and SensorMap to provide metadata alongside the data.


Data

Many different types of data are collected in the field, and the SwissEx team have tried to provide one or more solutions for acquiring each type.

Data measured and entered by hand

There are two choices for entering this type of data:

  • Entry into a CSV file: if the data is unlikely to be changed at a later point, the easiest way to put the data into the system is to write it to a CSV file and save it to a file system where GSN has access. Using the CSV file wrapper, GSN can continually check the file for new data. When you append the next data set to the file, GSN will automatically update the database.
  • Entry into the wiki: if you think that you are going to be adjusting your data values at a later point, the best way to acquire data would be to set up a template in the wiki and use the Excel macro developed by the SwissEx team to upload the data to the wiki. In this way, distributed access to the data is available and the data is editable. Using the GSN SPARQL wrapper, this data can be synchronised into the GSN database.
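
The CSV option above relies on a file that only grows, so a reader can remember how far it has read and pick up just the appended rows on each check. The following is a minimal Python sketch of that idea, not the GSN CSV wrapper itself: the function name `read_new_rows` and the byte-offset bookkeeping are assumptions made for illustration, and the sketch assumes UTF-8 files with no line breaks inside quoted fields.

```python
import csv
import io


def read_new_rows(path, offset=0):
    """Return (rows, new_offset): complete CSV rows appended after byte `offset`.

    Hypothetical helper for illustration; GSN's own wrapper may work differently.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Consume only complete lines; a half-written row waits for the next poll.
    end = chunk.rfind(b"\n") + 1
    text = chunk[:end].decode("utf-8")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    return rows, offset + end
```

A caller would store the returned offset between checks and pass it back in, so that each poll returns only the rows appended since the previous one.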

Logger data collected by hand

This uses the same principle as the CSV file entry above: once the data collected from your loggers has been written to a CSV file on the file system, GSN can acquire it using the CSV wrapper.

Logger data acquired by 3rd party technology

Loggers such as Campbell loggers, which acquire data and send it back over wireless networks to proprietary software, generally write the data to CSV files. Using the same principle as above, these files can be continually monitored by GSN for new data. When data is appended to the file, GSN will update the database.

Streaming data from the sensor

GSN has a number of wrappers for acquiring data directly from the sensor or sensor network, e.g. for cameras, GPS sensors, SensorScope stations, DTS sensors, etc. Using these wrappers, GSN acquires the data without it first having to be written to disk.

Non time-series data

Non time-series data requires careful consideration of the best way to store it. Please contact the SwissEx infrastructure team to discuss your requirements.


GSN

GSN is sensor middleware. It acquires data from the variety of sources described above, stores it in a structured database, and provides a web interface for querying, downloading and displaying data, as well as web services that let 3rd party software query the data and receive the result. These services make a variety of visual interfaces to the SwissEx data possible.

GSN is also becoming more and more capable of advanced processing of the data. An R interface will soon be available behind GSN, allowing GSN to run R scripts and return the result as a plot or as data to the database. The SPARQL wrapper will also eventually enable GSN to take data from various streams and 'cut and paste' them into new streams according to the metadata. This means that when sensors and loggers are exchanged, GSN can keep writing the data to the correct location.
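
The 'cut and paste' idea can be illustrated with a small Python sketch. It is not GSN code and does not use the SPARQL wrapper: the `Deployment` and `Reading` records and the `stitch_by_location` function are hypothetical names, and the sketch assumes the metadata records which sensor was installed at which location over which period.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple, Optional


class Deployment(NamedTuple):
    """Hypothetical metadata record: where a sensor was installed, and when."""
    sensor_id: str
    location: str
    start: datetime
    end: Optional[datetime]  # None means the sensor is still deployed


class Reading(NamedTuple):
    """Hypothetical raw reading as it arrives from one physical sensor."""
    sensor_id: str
    time: datetime
    value: float


def stitch_by_location(readings, deployments):
    """Group readings by the location their sensor occupied at measurement time."""
    streams = defaultdict(list)
    for reading in readings:
        for dep in deployments:
            active = dep.start <= reading.time and (
                dep.end is None or reading.time < dep.end
            )
            if dep.sensor_id == reading.sensor_id and active:
                streams[dep.location].append((reading.time, reading.value))
                break
    # Sort each per-location stream chronologically.
    return {loc: sorted(points) for loc, points in streams.items()}
```

With records like these, replacing a sensor at a site only needs a new deployment entry in the metadata; readings from the old and the new device end up in the same per-location stream.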


Visual Interfaces

GSN

The GSN web interface provides a web-based query interface which sits directly on top of each of the distributed databases. This provides a basic query interface for the data in that database. The latest data is shown in a list-based view, and historical data can be queried through drop-down lists. Basic metadata is used to provide a map of the sensors.

SensorMap

Whereas GSN has a limited ability to communicate between instances, SensorMap provides the glue to bring together all of the data from the separate instances into a single web interface.

Microsoft SensorMap can be considered the central hub of SwissEx data access. GSN virtual sensors can be registered into SensorMap either through a very simple form system or directly by GSN. This registration system means that the data from all GSN instances at the various organisations is made available in a single interface and can be queried, downloaded and visualised as if it were a single database. SensorMap uses a map-based interface, where clicking on a sensor provides a list of the latest data and drilling down further into this list provides a plot of the data. Registering the sensors into user groups means that tight user access rights are also maintained.

Microsoft Research have developed a user interface for sensor data retrieval, tailored specifically to the requirements of SwissEx. The interface is controlled mainly through a visual map interface incorporating a relief model and aerial or satellite photographs. The positions of sensors and/or model results will be overlaid onto this. Microsoft SensorMap has been integrated with the GSN web interfaces to allow the user to query a sensor and retrieve real-time sensor information through interaction with the map.

Wiki

An extension exists within the Wiki to show the latest data from a GSN virtual sensor and allow the user to download data. A second extension has also been developed which will allow this data to be plotted in an interactive plot inside the wiki.


This has been a basic overview of the technologies used within the SwissEx infrastructure. For more information on the various components, please click on the links provided.


SwissEx Infrastructure Example Use

A real example of how this infrastructure can be used is found in the BigLink project. The runoff measurement instrument consists of a collection receptacle which must be emptied by the field engineer once it exceeds a certain level.

Before SwissEx, the liquid volume data was automatically returned to a ‘Loggernet’ instance, which wrote the data to a CSV file. The field engineers did not have access to this server and had to call the project manager regularly to check on the level of liquid in the bottle.

With SwissEx in place, the CSV file will be read by GSN and stored in the GSN database. This means that the CSV file may be overwritten on each update. Through either GSN or the SwissEx wiki, the field engineers may view the data online. An algorithm will be included in the virtual sensor to check whether the data has been consistently above a threshold (to avoid false alarms), or possibly to check that the runoff has increased consistently until it exceeds the threshold. GSN will then send an SMS/text message to the field engineer to say that the runoff measurement equipment requires attention. The field engineers can then check the calendar in the wiki to find out whether anyone is going to the site in the near future and ask them to empty the receptacle. When the receptacle is emptied, an observation is entered into the metadata entry pages of the wiki recording the date and time at which the work was carried out.
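
The two alarm conditions mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only, not the algorithm deployed in the virtual sensor: the function names, the `min_consecutive` parameter and its default of three readings are assumptions, and sending the SMS is left out.

```python
def sustained_exceedance(levels, threshold, min_consecutive=3):
    """True if the latest `min_consecutive` readings all exceed `threshold`."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    recent = levels[-min_consecutive:]
    return len(recent) == min_consecutive and all(v > threshold for v in recent)


def rising_to_threshold(levels, threshold, min_consecutive=3):
    """True if the latest readings rise strictly and the newest exceeds `threshold`."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    recent = levels[-min_consecutive:]
    if len(recent) < min_consecutive:
        return False
    rising = all(a < b for a, b in zip(recent, recent[1:]))
    return rising and recent[-1] > threshold
```

Requiring several consecutive readings means that a single spike, for example from a disturbed sensor, does not trigger a message on its own.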

Through GSN and SensorMap, the analyst will be able to plot the runoff data against the meteorological station data (surface temperature, precipitation, air temperature) and may, for example, perform correlations to find the lag between the meteorological parameters and the increase in runoff volume. The data will be tagged according to its quality, and further tags will tell the analyst that the receptacle was emptied at a certain point, which explains the discontinuity in the data. This removes the need for the analyst to go back to the field engineer’s logbooks and work it out by hand. When modelling, the model can be set to ignore any values which have a bad reliability tag, e.g. large jumps in values or very constant values which may indicate that the instrument is stuck. This saves the manual editing of data prior to entry into the model.
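
As an illustration of reliability tagging, the Python sketch below marks large jumps and long runs of identical values so that a model can skip them. It is a simplified example under stated assumptions rather than the SwissEx tagging scheme: the tag names, the `max_jump` and `stuck_run` parameters, and the use of exact equality to detect a stuck instrument are all hypothetical choices.

```python
def tag_quality(values, max_jump, stuck_run=5):
    """Return a list of tags ('ok', 'jump', 'stuck') parallel to `values`."""
    tags = ["ok"] * len(values)
    # Flag any reading that differs from its predecessor by more than max_jump.
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > max_jump:
            tags[i] = "jump"
    # Flag runs of at least stuck_run identical readings.
    run_start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[run_start]:
            if i - run_start >= stuck_run:
                for j in range(run_start, i):
                    tags[j] = "stuck"
            run_start = i
    return tags


def usable(values, tags):
    """Keep only the values whose tag is 'ok'."""
    return [v for v, t in zip(values, tags) if t == "ok"]
```

Note that this simple difference test also flags the reading that follows a spike, and a sudden drop after the receptacle is emptied would be flagged as a jump too; in practice the emptying observation recorded in the wiki could be used to tell these cases apart.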