Version 16 (modified by sommeria, 2 years ago)

SoftWare / ProjectMeta - Meta project for open data management

Aim

Project-Meta is a software tool that helps you manage your open data using the OPeNDAP protocol. The initiative is supported by the European Commission as part of the Hydralab+ project of the Horizon 2020 programme. This programme requires research data to be open access, that is, available online free of charge to the end user and reusable. Furthermore, access must include the right to copy, distribute, search, link, crawl and mine the data. In addition to these general requirements, we aim to achieve the following goals:

  1. Allow the end user to scan and visualise the data without downloading.
  2. Integrate the process in the data analysis procedure, with minimal additional work.

The attached document OpenDAP_GM.pdf describes the wider motivation of the project.

OPeNDAP

OPeNDAP (Open-source Project for a Network Data Access Protocol) is a data access protocol that includes standards for encapsulating structured data, annotating the data with attributes and adding semantics that describe the data. OPeNDAP is widely used by governmental agencies such as NASA and NOAA to serve satellite, weather and other observed earth science data.

The protocol is based on HTTP, so data can be browsed with an ordinary web browser. Additional data visualization functionality is provided by graphics programs (such as Matlab, GrADS, Ferret or ncBrowse). Compared to ordinary file transfer protocols (e.g. FTP), a major advantage of OPeNDAP is the ability to retrieve subsets of files, so it is possible to work remotely without downloading whole data files. Although any file format can be used, data are often in HDF or NetCDF format. The older NetCDF format is limited to arrays of numbers, while HDF allows a wider range of data structures (and contains NetCDF as a particular case). We chose the NetCDF format, which is sufficient for most experimental data and can be read more easily by a variety of software.
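To illustrate subset retrieval, here is a minimal sketch of how a request for part of a variable can be written as a URL. It assumes a DAP2-style server, where a constraint of the form var[start:stride:stop] is appended to the dataset URL, and it assumes the server offers the .ascii response suffix (common servers do, but this is not guaranteed everywhere). The host, file and variable names are hypothetical and do not refer to an actual Project-Meta repository.

```shell
# Build a DAP2-style request URL for a slice var[start:stride:stop].
# $1 = dataset URL, $2 = variable name, $3 = start, $4 = stride, $5 = stop
# (indices are zero-based and the stop index is inclusive).
dap_subset_url() {
    printf '%s.ascii?%s[%s:%s:%s]\n' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical dataset and variable, for illustration only.
url=$(dap_subset_url "http://example.org/opendap/run1/velocity.nc" "u" 0 1 9)
echo "$url"
# prints http://example.org/opendap/run1/velocity.nc.ascii?u[0:1:9]

# Against a real server, the first ten values of u could then be fetched with
# (-g stops curl from interpreting the square brackets itself):
# curl -g "$url"
```

Only the requested slice travels over the network, which is what makes remote work on large files practical.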

Description

The OPeNDAP repository is created by the script project-meta, which relies on UNIX commands and on programs written in Perl and C++.

The first step is to make a list of the data folders to display. This list must be written in a text file, together with some information about the authors of the work and about related publications. This text file must be placed in your current folder under the name PROJECT-META.yml. Its structure must follow the simple rules of the YAML format. An example can be found in the Project-Meta repository or online as PROJECT-META.sample.yml.

The built-in help and the manual page give more details:

project-meta help
man project-meta

PROJECT-META.yml meta file

This file is at the core of the procedure. A first task is to list the data folders to publish. For that purpose, a good practice is to organise the data and the analysis procedures so that the final data to publish are contained in folders named with specific extensions. Search tools can then be used to list all the selected folders. For instance, the following UNIX command lists all the folders whose name matches '*.mproj*' and appends them to the file PROJECT-META.yml (creating the file if it does not exist yet):

find . -name '*.mproj*' -a -type d | sed 's/^/    - /;' >> PROJECT-META.yml

The find command searches recursively under the current folder (.) and keeps only the folders with the right extension. The sed command adds four spaces and a dash at the beginning of each line in order to respect the YAML format.
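The effect of this pipeline can be checked on a throwaway folder tree before running it on real data. The sketch below wraps the same find and sed commands in a small helper; the folder names are made up for the illustration, and a sort step is added only to make the output order predictable.

```shell
# Build a temporary tree with hypothetical folder names.
demo=$(mktemp -d)
mkdir -p "$demo/run1/piv.mproj" "$demo/run2/stats.mproj.v2" "$demo/run2/raw"
touch "$demo/run1/notes.mproj.txt"   # a regular file: excluded by -type d

# Same find/sed pipeline as above, run inside folder $1.
list_mproj() {
    (cd "$1" && find . -name '*.mproj*' -a -type d | sort | sed 's/^/    - /')
}

list_mproj "$demo"
# prints:
#     - ./run1/piv.mproj
#     - ./run2/stats.mproj.v2

# Remove the temporary tree when done: rm -rf "$demo"
```

Because the real command appends with >>, running it twice adds the same folders twice, so it is worth inspecting PROJECT-META.yml after each run.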

The YAML file has to be complemented by general information about the authors and the related publications, following the template PROJECT-META.sample.yml.

Debian package

Debian is a GNU/Linux distribution. A Debian package (which probably also works on Ubuntu) for the amd64 architecture can be downloaded from: http://servforge.legi.grenoble-inp.fr/pub/soft-trokata/project-meta/download.

You can then install it with

sudo dpkg -i project-meta_*_amd64.deb

(just replace * with the version you have downloaded).
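If several versions have been downloaded into the same folder, the * pattern matches all of them. As a precaution, the following sketch (a hypothetical helper, not part of Project-Meta) picks the package file only when exactly one matches. It assumes a POSIX shell such as sh or bash.

```shell
# Print the single project-meta package found in folder $1 (default: current
# folder); fail if there is none or more than one.
pick_deb() {
    dir=${1:-.}
    set -- "$dir"/project-meta_*_amd64.deb
    # An unmatched pattern is left as a literal string, hence the -e test.
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one project-meta_*_amd64.deb in $dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Usage:
# deb=$(pick_deb "$HOME/Downloads") && sudo dpkg -i "$deb"
```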

Software repository

All code is released under free licenses. The bash scripts are under GPL version 3 or later (http://www.gnu.org/licenses/gpl.html), the C++ sources are under GPL version 2 or later, and the Perl scripts are under the same license as Perl itself, i.e. the dual GPL and Artistic License (http://dev.perl.org/licenses/artistic.html).

All sources are available on the LEGI forge: http://servforge.legi.grenoble-inp.fr/svn/soft-trokata/trunk/project-meta

The sources are managed with Subversion (http://subversion.tigris.org/). It is very easy to stay synchronized with these sources:

  • initial checkout
    svn checkout http://servforge.legi.grenoble-inp.fr/svn/soft-trokata/trunk/project-meta soft-project-meta
    
  • subsequent updates
    svn update
    

Write access to the forge can be granted on a reasoned request to Gabriel Moreau. To limit administration time and for security reasons, the forge is not writable without permission. For reasons of web decentralization, autonomy and independence from the prevailing (and North American) centralism, we use our own forge...

You can propose a patch to a particular file by email, generating it with the diff command. Note that svn diff produces the unified format (-u) by default. Two examples:

diff -u project-meta.org project-meta.new > project-meta.patch
svn diff project-meta > project-meta.patch

We then apply the patch (after reading it, and reading it again) with the command

patch -p0 < project-meta.patch
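The whole round trip can be tried safely in a temporary folder. The sketch below reuses the file names from the diff example above with made-up contents, and assumes a patch implementation that supports --dry-run (GNU patch does) to check the patch before applying it.

```shell
work=$(mktemp -d)
(
    cd "$work" || exit 1
    # Two versions of a file, with made-up contents.
    printf 'first line\nsecond line\n' > project-meta.org
    printf 'first line\nsecond line, fixed\n' > project-meta.new

    # diff exits with status 1 when the files differ, which is expected here.
    diff -u project-meta.org project-meta.new > project-meta.patch || true

    # Check that the patch applies without touching any file, then apply it.
    patch -p0 --dry-run < project-meta.patch
    patch -p0 < project-meta.patch

    # project-meta.org now has the same content as project-meta.new.
    cmp project-meta.org project-meta.new && echo "patch applied cleanly"
)
```

The -p0 option tells patch to use the file names in the patch header exactly as written, so the command must be run from the folder where the diff was made.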
