A. Valente et al.: A compilation of global bio-optical in situ data
Earth Syst. Sci. Data, 8, 235-252, 2016
www.earth-syst-sci-data.net/8/235/2016/
quality control step was to remove days where the standard
deviation was more than half of the daily average. This was
meant to identify days with high variability. Very few days
(N = 2) were removed with this test. These quality control
criteria were applied per wavelength, which resulted in some
observations with an incomplete spectrum.
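The variability test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the (day, wavelength) keying and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def filter_variable_days(daily_obs, max_ratio=0.5):
    """Return daily averages for the (day, wavelength) groups that pass the test.

    daily_obs maps a (day, wavelength_nm) key to the list of measurements
    taken on that day at that wavelength. A group is removed when its
    standard deviation exceeds max_ratio times its daily average.
    """
    kept = {}
    for key, values in daily_obs.items():
        if not values:
            continue  # nothing measured for this day and wavelength
        avg = mean(values)
        # Assumption: population standard deviation (not stated in the text).
        sd = pstdev(values)
        if sd <= max_ratio * avg:
            kept[key] = avg
    return kept


# Illustrative values only, not real measurements.
obs = {
    ("2005-06-01", 443): [1.0, 1.1, 0.9],  # low variability, kept
    ("2005-06-01", 670): [0.1, 0.9, 0.2],  # high variability, removed
}
kept = filter_variable_days(obs)
```

Because the test runs independently for each wavelength, the example day keeps its 443 nm value but loses its 670 nm value, which is how an incomplete spectrum arises.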
2.2.3 AErosol RObotic NETwork-Ocean Color (AERONET-OC)
The AErosol RObotic NETwork-Ocean Color (AERONET-OC) is a
component of AERONET, including sites where sun photometers
operate with a modified measurement protocol leading to the
determination of the fully normalised water-leaving radiance
(Zibordi et al., 2006, 2009). The result of collaboration
between the Joint Research Centre (JRC) and NASA, this
component has been specifically developed for the validation
of ocean-colour radiometric products. The strength of
AERONET-OC is “the production of standardised measurements
that are performed at different sites with identical
measuring systems and protocols, calibrated using a single
reference source and method, and processed with the same
codes” (Zibordi et al., 2006, 2009). All
high-quality data (level 2) were acquired from the project
website for 11 sites: Abu_Al_Bukhoosh (~25° N, ~53° E),
COVE_SEAPRISM (~36° N, ~75° W), Gloria (~44° N, ~29° E),
Gustav_Dalen_Tower (~58° N, ~17° E), Helsinki_Lighthouse
(~59° N, ~24° E), LISCO (~40° N, ~73° W), Lucinda (~18° S,
~146° E), MVCO (~41° N, ~70° W), Palgrunden (~58° N,
~13° E), Venice (~45° N, ~12° E) and WaveCIS_Site_CSI_6
(~28° N, ~90° W). The compiled variable was rrs.
Remote-sensing reflectance was computed from the original
fully normalised water-leaving radiance (see Sect. 2.2.2 for
definition). The solar irradiance (F0), which is not part of
the AERONET-OC data, was computed from the Thuillier (2003)
solar spectrum irradiance by averaging F0 over a
wavelength-centred 10 nm window. Data were compiled for the
exact wavelengths of each record, which can change over time
for a given site depending on the specific instrument deployed.
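The conversion described above can be sketched as follows, assuming the usual relation rrs = nLw / F0 (taken here to be the definition referred to in Sect. 2.2.2) and a simple unweighted mean of the solar spectrum samples falling within the 10 nm window. The function names and the toy spectrum are hypothetical. The real Thuillier spectrum would need to be loaded from its own source, and the averaging scheme actually used may differ.

```python
def band_averaged_f0(solar_spectrum, centre_nm, width_nm=10.0):
    """Average F0 over a window of width_nm centred on centre_nm.

    solar_spectrum is a list of (wavelength_nm, irradiance) pairs.
    Assumption: an unweighted mean of the samples inside the window.
    """
    half = width_nm / 2.0
    in_window = [f0 for wl, f0 in solar_spectrum if abs(wl - centre_nm) <= half]
    if not in_window:
        raise ValueError(f"no solar spectrum samples near {centre_nm} nm")
    return sum(in_window) / len(in_window)


def rrs_from_nlw(nlw, f0):
    """Remote-sensing reflectance (sr^-1) from normalised water-leaving radiance.

    nlw and f0 must share the same radiometric units so that the ratio
    has units of sr^-1.
    """
    return nlw / f0


# Toy spectrum at 1 nm steps (illustrative values, not the Thuillier data).
toy_spectrum = [(435 + i, 180.0 + i) for i in range(17)]
f0_443 = band_averaged_f0(toy_spectrum, centre_nm=443)
rrs_443 = rrs_from_nlw(0.94, f0_443)
```

Since the band centres can change between instruments at a given site, F0 would be recomputed for the exact wavelength of each record rather than looked up from a fixed band table.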
2.2.4 SeaWiFS Bio-optical Archive and Storage System (SeaBASS)
The SeaWiFS Bio-optical Archive and Storage System
(SeaBASS) is one of the largest archives of in situ marine
bio-optical data (Werdell et al., 2003). It is maintained by
NASA’s Ocean Biology Processing Group (OBPG) and includes
measurements of optical properties, phytoplankton
pigment concentrations, and other related oceanographic and
atmospheric data. The SeaBASS database consists of in situ
data from multiple contributors, collected using a variety of
measurement instruments with consistent, community-vetted
protocols, from several marine platforms such as fixed buoys,
hand-held radiometers and profiling instruments. Quality
control of the received data includes a rigorous series of
protocols that range from file format verification to the
inspection of the geophysical data values (Werdell et al.,
2003). Radiometric data were acquired through the Validation
search tool, which provided in situ data with matchups for
particular ocean-colour sensors (Bailey and Werdell, 2006).
The search criteria were set to impose minimal flag
conditions on the satellite data, in order to retrieve a
greater number of matchups and therefore of in situ data.
Phytoplankton pigment data were acquired through the Pigment
search tool, which provides pigment data directly from the
archives. As stated on the SeaBASS website (see Pigment tab
at http://seabass.gsfc.nasa.gov/seabasscgi/search.cgi), the
Pigment search tool was originally designed to return only
in vitro fluorometric measurements, which is consistent with
our approach, but over time chlorophyll a measurements made
using other methods (e.g. in situ fluorometry) were included
in the retrieved pigment data. In the pigment data used in
this work, a large number of in situ fluorometric
measurements from continuous underway instruments were
identified and discarded. These data were first
identified from cruises with more than 50 observations per
day and then re-checked on the SeaBASS website to confirm
whether indeed they were continuous underway measurements.
A total of 148 015 such measurements were identified
and discarded. Given the large volume of this group of data,
it is possible that some chlorophyll a observations from in
situ methods escaped scrutiny and made it into
the final merged dataset. In the future, SeaBASS plans
to add ancillary information to the extractions, which will
enable users to distinguish the different types of chlorophyll
measurements. The compiled variables from SeaBASS data
were: rrs, chla_hplc, chla_fluor, aph, adg, bbp and kd. No
conversion was necessary since all variables were acquired
in the desired format.
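The first screening step for continuous underway measurements, described above, can be sketched as follows. This is an illustration under assumptions: the record fields "cruise" and "date" are hypothetical names, not the SeaBASS file format. The function only produces candidates, since in the procedure described each flagged cruise was then re-checked on the SeaBASS website before being discarded.

```python
from collections import Counter


def flag_dense_cruise_days(records, max_per_day=50):
    """Return the (cruise, date) pairs with more than max_per_day observations.

    records is an iterable of dicts with hypothetical keys "cruise" and
    "date". Flagged pairs are candidates for continuous underway data
    and still require manual confirmation.
    """
    counts = Counter((r["cruise"], r["date"]) for r in records)
    return {key for key, n in counts.items() if n > max_per_day}


# Illustrative records: cruise A has 51 observations on one day, cruise B has 50.
records = (
    [{"cruise": "A", "date": "2001-01-01"}] * 51
    + [{"cruise": "B", "date": "2001-01-01"}] * 50
)
candidates = flag_dense_cruise_days(records)
```

Here only cruise A is flagged, because the threshold is strictly more than 50 observations per day.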
2.2.5 NASA bio-Optical Marine Algorithm Data set (NOMAD)
The NASA bio-Optical Marine Algorithm Data set (NOMAD) is a
publicly available dataset compiled by the NASA OBPG at the
Goddard Space Flight Center. It is a high-quality global
dataset of coincident radiometric and phytoplankton pigment
observations for use in ocean-colour algorithm development
and satellite-data product-validation activities (Werdell
and Bailey, 2005). The source of the bio-optical data is the
SeaBASS archive; therefore, many dependencies exist between
these two datasets, which were addressed during the merging.
The current version (Version 2.0 ALPHA, 2008) includes data
from 1991 to 2007 and an additional set of observations of
inherent optical properties. The current version was used in
this work, but with an additional set
of columns of remote-sensing reflectance corrected for the
bidirectional effects (Morel and Gentili, 1996; Morel et al.,
2002). This additional set of columns was provided directly