Automatic detection of boulders by neural networks
tions for mapping geogenic reefs (Heinicke et al., in press), used to characterise geogenic reefs over larger areas. The agreement between the human experts is calculated using the F1 score of the resulting confusion matrix. An F1 score of 1.0 indicates perfect agreement, while the lowest value is 0, when either precision or recall is 0. The F1 score is calculated from the confusion matrix by F1 = 2 × (precision × recall) / (precision + recall). Values for each class (no boulders, one to five boulders and more than five boulders) were averaged.
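The per-class F1 computation and averaging described above can be sketched as follows. This is a minimal illustration, not the authors' code; the confusion-matrix orientation (rows as true classes, columns as predicted classes) and the example matrix values are assumptions.

```python
import numpy as np

def macro_f1(confusion: np.ndarray) -> float:
    """Macro-averaged F1 for a square confusion matrix.

    Rows are taken as true classes, columns as predicted classes
    (an assumption; the text does not state the orientation).
    """
    f1_scores = []
    for c in range(confusion.shape[0]):
        tp = confusion[c, c]
        fp = confusion[:, c].sum() - tp   # predicted class c, true class differs
        fn = confusion[c, :].sum() - tp   # true class c, predicted otherwise
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        if precision + recall == 0:
            f1_scores.append(0.0)         # lowest possible F1
        else:
            f1_scores.append(2 * precision * recall / (precision + recall))
    return float(np.mean(f1_scores))

# Hypothetical counts for the three classes
# (no boulders, one to five boulders, more than five boulders):
cm = np.array([[50, 3, 0],
               [4, 20, 2],
               [1, 3, 10]])
print(round(macro_f1(cm), 3))  # → 0.821
```

A diagonal matrix (perfect agreement between the two experts) would yield the maximum score of 1.0.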
2.4 Automatic boulder count 
2.4.1 Neural network 
Artificial neural networks are composed of series of interconnected layers of artificial neurons. In a trained neural network, input signals are transformed by the weights at each connection, until the last layer of the network reports the result of the computation. Convolutional neural networks (CNNs) are a subset of neural networks and were developed for image classification with overwhelming success. While the architecture of CNNs varies, all include a series of convolutional layers that operate by convolving a small part, often 3 × 3 pixels, of the underlying image (or the output of an earlier layer in the network) with weights initialised at random. This assumes that pixels in close vicinity are more likely to form patterns significant for the image context than pixels at greater distance. The weights are adjusted during model training with annotated images to minimise a loss function. Loss functions compare the predictions of the neural network to the annotations. To allow CNNs to learn non-linear features, activation functions change the output of layers in the network, while regular downsampling of the image size allows the network to learn features at larger scales.
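The building blocks named above — a 3 × 3 convolution with random initial weights, a non-linear activation and a downsampling step — can be illustrated with a minimal NumPy sketch. This is a didactic toy, not part of any CNN framework; the tile size and random values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(image: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Valid convolution of a single-channel image with one 3x3 kernel."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * weights)
    return out

def relu(x):
    """Activation function: lets the network learn non-linear features."""
    return np.maximum(x, 0.0)

def downsample2x(x):
    """2x2 max pooling: later layers then see larger-scale features."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = rng.random((64, 64))          # stand-in for one small image tile
kernel = rng.standard_normal((3, 3))  # weights initialised at random, adjusted in training
features = downsample2x(relu(conv3x3(image, kernel)))
print(features.shape)                 # (31, 31)
```

In a real CNN, training adjusts the kernel weights by gradient descent on the loss function rather than leaving them at their random initial values.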
The automated boulder count was done using the YOLO (You Only Look Once) framework, developed by Joseph Redmon (Redmon et al. 2015), with the current implementation available under a permissive license on GitHub (https://github.com/AlexeyAB/darknet). Lary et al. (2016) and Schmidhuber (2015) give a detailed description of convolutional neural networks and their application for image interpretation.
The YOLO network was developed for object detection. To identify and locate different objects in images is more complicated than the classification of entire images and requires a different network architecture. YOLO is a one-stage detector, meaning it analyses images in one pass (hence the abbreviation, You Only Look Once) while keeping high accuracy. One-stage detectors are a faster approach compared to other object detection frameworks that rely on multiple stages for object detection in images. The YOLO architecture is described by Bochkovskiy et al. (2020). In principle, it
uses a series of different convolutional layers (the backbone and neck) to extract object features and divide the input image into grids at three different resolutions. For each grid cell at each resolution, it predicts the probability that the cell includes a learned object within anchor boxes of predefined size. These probabilities and the corresponding bounding box coordinates are the output of the trained model. YOLO networks are available in different configurations of the backbone, of which we here utilise the standard configuration of YOLO version 4.
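How per-cell probabilities and anchor boxes become bounding boxes can be sketched in simplified form. This is a single-resolution illustration of the idea only, with hypothetical function names, anchor sizes and thresholds; a real YOLO head predicts at three resolutions and includes class scores and sigmoid/exponential transforms.

```python
import numpy as np

def decode_grid(pred, anchors, grid_size, image_size, conf_threshold=0.5):
    """Turn per-cell predictions into boxes (simplified sketch).

    pred has shape (grid, grid, n_anchors, 5), holding per cell and anchor:
    (objectness, x_offset, y_offset, width_scale, height_scale).
    """
    cell = image_size / grid_size
    boxes = []
    for gy in range(grid_size):
        for gx in range(grid_size):
            for a, (aw, ah) in enumerate(anchors):
                obj, dx, dy, sw, sh = pred[gy, gx, a]
                if obj < conf_threshold:
                    continue                   # cell holds no learned object
                cx = (gx + dx) * cell          # box centre in pixels
                cy = (gy + dy) * cell
                w, h = aw * sw, ah * sh        # anchor box scaled by the prediction
                boxes.append((obj, cx - w / 2, cy - h / 2, w, h))
    return boxes

# One confident detection in cell (2, 3) of an 8x8 grid over a 512 px image:
pred = np.zeros((8, 8, 2, 5))
pred[3, 2, 0] = [0.9, 0.5, 0.5, 1.0, 1.0]
anchors = [(16, 16), (32, 32)]  # hypothetical anchor sizes in pixels
print(decode_grid(pred, anchors, 8, 512))
```

For a boulder model with a single class, every surviving box is a boulder candidate; overlapping candidates are typically merged by non-maximum suppression.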
2.4.2 Model training and application 
To create the training data sets, a human interpreter identified bounding boxes of boulders in training areas in QGIS 3.16. Boulders were required to have a shadow. The boulders were exported as an SQLite database. The training database for the SSS model includes 13,847 boulder instances. A model was trained on a data set with an emphasis on small boulders comprising only a few pixels. This data set comprises 4,070 entries. The MBES training database was only started with the investigation site reported here (Fig. 2). It is not possible to use the same training data sets for MBES and SSS models, since the position accuracy of the side-scan sonar is not good enough to co-locate features of only a few pixels in size. Therefore, the MBES training data set comprises 2,654 instances of boulders (Fig. 2), with typical sizes of 3 × 3 to 3 × 15 pixels including shadows. The training mosaics were cut into small georeferenced images of 64 × 64 pixels (corresponding to approximately 16 m × 16 m in this study), overlapping by six pixels to minimise the number of training boulders that are cut by image boundaries. In the following, the pixel coordinates of the annotated examples were calculated and used as an input for training. Besides the annotated boulder examples, 182 examples of empty images (defined as containing no boulders) were selected for the MBES data set and 2,349 examples of empty images for the SSS data set.
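The tiling of a mosaic into overlapping 64 × 64 pixel windows can be sketched as below. This is an illustrative assumption of how such a cut could be implemented, not the authors' code; handling of ragged mosaic edges and the link to the georeference are simplified.

```python
import numpy as np

def tile_mosaic(mosaic: np.ndarray, tile: int = 64, overlap: int = 6):
    """Cut a 2-D mosaic into tile x tile windows whose edges overlap.

    Returns (row_offset, col_offset, window) triples; the pixel offsets
    would be combined with the mosaic's georeference so that every tile
    stays georeferenced. Any remainder narrower than one tile at the far
    edges is dropped in this simplified sketch.
    """
    step = tile - overlap
    tiles = []
    h, w = mosaic.shape
    for r in range(0, h - tile + 1, step):
        for c in range(0, w - tile + 1, step):
            tiles.append((r, c, mosaic[r:r + tile, c:c + tile]))
    return tiles

mosaic = np.zeros((256, 256))  # stand-in for one training mosaic
tiles = tile_mosaic(mosaic)
print(len(tiles))              # 16 tiles from a 256 x 256 mosaic
```

Annotated boulder boxes would then be shifted by each tile's (row, column) offset to obtain the pixel coordinates used as training input; the six-pixel overlap means a boulder cut by one tile boundary usually appears whole in the neighbouring tile.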
For training, we used the YOLO network version 4, in contrast to earlier case studies that used the RetinaNet framework (Lin et al. 2017). We adhered to suggestions published on the project's GitHub page and changed the default configuration of the YOLO network. Therefore, the maximum number of training batches was reduced to 6,000 for MBES models and 24,000 for SSS models, the number of classes reduced to one, and the filter number of the convolutional layers before the object detection layers reduced to 18. Images were magnified to 512 × 512 pixels before training. Random variations in hue, exposure and saturation applied to the image were reduced from their standard settings to 0.1. The size of the input image was changed by 40 % every ten batches at random, and the size and aspect