A. Valente et al.: A compilation of global bio-optical in situ data
Earth Syst. Sci. Data, 8, 235-252, 2016
www.earth-syst-sci-data.net/8/235/2016/
27 NASA Goddard Space Flight Center, Wallops Flight Facility, Wallops Island, VA, USA
28 Institute for Marine Remote Sensing/ImaRS, College of Marine Science, University of South Florida,
St. Petersburg, FL, USA
29 Fisheries and Ecosystem Advisory Services, Marine Institute, Rinville, Oranmore, Galway, Ireland
30 NOAA/NESDIS/STAR/SOCD, College Park, MD, USA
31 Ocean Biogeochemistry and Ecosystems, National Oceanography Centre, Waterfront Campus,
Southampton, UK
32 IFREMER Centre de Brest, Plouzane, France
33 Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
34 Harbor Branch Oceanographic Institute, Fort Pierce, FL, USA
35 Physics Department, University of Miami, Coral Gables, FL, USA
36 Physical Oceanography, Marine Optics & Remote Sensing, Royal Netherlands Institute for Sea Research,
Texel, Netherlands
Correspondence to: André Valente (adovalente@fc.ul.pt)
Received: 11 November 2015 - Published in Earth Syst. Sci. Data Discuss.: 19 January 2016
Revised: 18 May 2016 - Accepted: 19 May 2016 - Published: 3 June 2016
Abstract. A compiled set of in situ data is important to evaluate the quality of ocean-colour satellite-data
records. Here we describe the data compiled for the validation of the ocean-colour products from the ESA Ocean
Colour Climate Change Initiative (OC-CCI). The data were acquired from several sources (MOBY, BOUSSOLE,
AERONET-OC, SeaBASS, NOMAD, MERMAID, AMT, ICES, HOT, GeP&CO), span between 1997 and 2012,
and have a global distribution. Observations of the following variables were compiled: spectral remote-sensing
reflectances, concentrations of chlorophyll a, spectral inherent optical properties and spectral diffuse attenuation
coefficients. The data were from multi-project archives acquired via the open internet services or from individual
projects, acquired directly from data providers. Methodologies were implemented for homogenisation, quality
control and merging of all data. No changes were made to the original data, other than averaging of observations
that were close in time and space, elimination of some points after quality control and conversion to a standard
format. The final result is a merged table designed for validation of satellite-derived ocean-colour products and
available in text format. Metadata of each in situ measurement (original source, cruise or experiment, principal
investigator) were preserved throughout the work and made available in the final table. Using all the data in a
validation exercise increases the number of matchups and enhances the representativeness of different marine
regimes. By making available the metadata, it is also possible to analyse each set of data separately. The compiled
data are available at doi: 10.1594/PANGAEA.854832 (Valente et al., 2015).
1 Introduction
Currently, there are several bio-optical in situ datasets worldwide suitable for validation of
ocean-colour satellite data. While some are managed by the data producers, others are in
international repositories with contributions from multiple scientists. Many have rigid quality
controls and are built specifically for ocean-colour validation. The use of only one of these
datasets would limit the number of data in validation exercises. It would therefore be useful to
acquire and merge all these datasets into a single unified dataset to maximise the number of
matchups available for validation and their distribution in time and space and consequently reduce
the uncertainties in the validation exercise. However, merging several datasets together can be a
complicated task. First, it is necessary to acquire and harmonise all datasets into a single
standard format. Second, during the merging, the duplicates between datasets have to be identified
and removed. Third, the metadata should be propagated throughout the process and made available in
the final merged product. Ideally, the compiled dataset would be made available as a simple text
table, to facilitate ease of access and manipulation. In this work such unification of multiple
datasets is presented. This was done for the validation of the ocean-colour products from the ESA
Ocean Colour Climate Change Initiative (OC-CCI), but with the intent to serve the broad user
community as well.
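The three merging steps listed above (harmonisation, duplicate removal, metadata propagation) can be illustrated with a minimal sketch. It is not the procedure used for this compilation: the field names, the two toy archives, the 5 min and 0.01 degree duplicate thresholds and the rule that earlier archives take priority are all assumptions chosen only for illustration.

```python
import csv
import io
from datetime import datetime, timedelta

# Common schema of the merged table (hypothetical column names).
COLUMNS = ["time", "lat", "lon", "chl", "source", "cruise", "pi"]


def harmonise(record, source, mapping):
    """Step 1: rename source-specific fields to the common schema."""
    out = {common: record[original] for common, original in mapping.items()}
    out["time"] = datetime.fromisoformat(out["time"])
    out["source"] = source  # Step 3: keep the origin as metadata
    return out


def is_duplicate(a, b, max_dt=timedelta(minutes=5), max_deg=0.01):
    """Two records are treated as duplicates if close in time and space.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return (abs(a["time"] - b["time"]) <= max_dt
            and abs(a["lat"] - b["lat"]) <= max_deg
            and abs(a["lon"] - b["lon"]) <= max_deg)


def merge(datasets):
    """Step 2: keep a record only if it matches nothing already kept.

    Datasets listed first take priority (an assumed rule).
    """
    merged = []
    for records in datasets:
        for rec in records:
            if not any(is_duplicate(rec, kept) for kept in merged):
                merged.append(rec)
    return merged


def to_text_table(records):
    """Write the merged records as a tab-separated text table."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow({**rec, "time": rec["time"].isoformat()})
    return buf.getvalue()


# Two toy archives with different field names (invented values).
archive_a = [
    {"datetime": "2005-06-01T10:00:00", "latitude": 43.37, "longitude": 7.90,
     "chla": 0.21, "cruise": "A-01", "investigator": "PI one"},
    {"datetime": "2005-06-02T10:00:00", "latitude": 43.37, "longitude": 7.90,
     "chla": 0.25, "cruise": "A-01", "investigator": "PI one"},
]
archive_b = [
    # Same station two minutes later: flagged as a duplicate of archive_a[0].
    {"t": "2005-06-01T10:02:00", "y": 43.37, "x": 7.90,
     "chl_a": 0.21, "exp": "A-01", "pi": "PI one"},
    {"t": "2008-01-15T12:00:00", "y": -20.0, "x": 150.0,
     "chl_a": 0.05, "exp": "B-07", "pi": "PI two"},
]

map_a = {"time": "datetime", "lat": "latitude", "lon": "longitude",
         "chl": "chla", "cruise": "cruise", "pi": "investigator"}
map_b = {"time": "t", "lat": "y", "lon": "x",
         "chl": "chl_a", "cruise": "exp", "pi": "pi"}

merged = merge([
    [harmonise(r, "archive_a", map_a) for r in archive_a],
    [harmonise(r, "archive_b", map_b) for r in archive_b],
])
table = to_text_table(merged)
print(table)
```

Because every harmonised record carries its source, cruise and investigator, each contributing dataset can still be selected separately from the merged table, for example by filtering on the `source` column.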
A merged dataset is not without drawbacks: it is likely to be large and so not always easy to
manipulate; because the merging is done on pre-existing, processed databases, one does not have
full command of the whole processing chain; and the dataset would be a compilation of observations
collected by several investigators using different instruments, sampling methods and protocols,
which might even