Chapter 1 Introduction

Our objectives in writing this guideline are to:

  1. Describe the steps needed to set up and process camera trap data using popular artificial intelligence (AI) platforms, including Wildlife Insights, MegaDetector, MLWIC2, and Conservation AI.

  2. Demonstrate common workflows for analyzing camera trap data using these platforms via a case study in which we process data collected by the lead author. The aim of the case study project is to develop a joint species distribution model integrating data from camera traps and acoustic sensors to understand interactions between wildlife species in multi-functional landscapes in Colombia.

Each chapter covers a different AI platform, and we provide appropriate links to instruction manuals and other resources for researchers looking for additional documentation. We describe the steps required to set up the platforms, upload pictures (e.g., required folder structure), and include and format metadata (e.g., geographical coordinates of locations, deployment dates, and other deployment information such as camera height, use of bait, etc.). We then provide guidance on how to use the artificial intelligence platforms for object detection (e.g., to separate blanks from non-blanks) and species classification.
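To make the metadata requirements more concrete, the sketch below shows one way deployment information might be organized in R before upload. The column names, coordinates, and values are hypothetical; each platform has its own required template, and the later chapters describe the exact formats.

```r
# A minimal sketch of a deployment metadata table; column names, locations,
# and values are hypothetical and should match the target platform's template.
deployments <- data.frame(
  deployment_id   = c("CT01_2022", "CT02_2022"),
  latitude        = c(4.60, 4.61),     # decimal degrees (hypothetical)
  longitude       = c(-74.08, -74.07), # decimal degrees (hypothetical)
  start_date      = as.Date(c("2022-01-15", "2022-01-16")),
  end_date        = as.Date(c("2022-03-15", "2022-03-17")),
  camera_height_m = c(0.5, 0.5),       # camera height above ground (m)
  bait_used       = c(FALSE, FALSE)    # whether bait was used
)
```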

Importantly, we also demonstrate methods for evaluating the performance of AI platforms. Before an AI platform can be evaluated, users will need to manually label a subset of images that can then be compared with the AI output. This labeling can be done using a variety of available software (e.g., Scotson et al. 2017), but the resulting data should include, at minimum, the 1) image filename, 2) camera location, and 3) species name. The first two variables (i.e., filename and location) are needed to match records from the human-labeled and AI-labeled data sets (hereafter human and computer vision, respectively), and the third variable allows one to compare human and AI-generated labels.
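As a rough illustration, the sketch below builds small toy versions of the human and computer vision label sets and matches them on filename and location using dplyr. All filenames, locations, species, and the `confidence` column are hypothetical; in practice these data frames would be read in from your labeling software and the AI platform's output.

```r
library(dplyr)

# Toy human- and AI-labeled data sets (hypothetical values).
human_labels <- data.frame(
  filename = c("IMG_0001.JPG", "IMG_0002.JPG", "IMG_0003.JPG"),
  location = c("CT01", "CT01", "CT02"),
  species  = c("Cuniculus paca", "blank", "Dasyprocta punctata")
)

ai_labels <- data.frame(
  filename   = c("IMG_0001.JPG", "IMG_0002.JPG", "IMG_0003.JPG"),
  location   = c("CT01", "CT01", "CT02"),
  species    = c("Cuniculus paca", "Dasyprocta punctata", "Dasyprocta punctata"),
  confidence = c(0.97, 0.42, 0.88)
)

# Match human and computer vision records on filename and location so each
# human label is paired with the corresponding AI prediction.
matched <- inner_join(human_labels, ai_labels,
                      by = c("filename", "location"),
                      suffix = c("_human", "_ai"))
# Resulting columns: filename, location, species_human, species_ai, confidence
```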

Having a subset of labeled images will allow you to assess how a particular AI model performs on your data set and to determine appropriate uses given that performance. We provide annotated R code and examples demonstrating how to compute model performance metrics from the categories of correct and incorrect predictions described in Table 1.1.

Table 1.1: Notation and categories of classifications used to estimate model performance metrics.

| Notation | Description |
|----------|-------------|
| TP - True Positives | Number of observations where the species was correctly identified as being present in the photo. |
| TN - True Negatives | Number of observations where the species was correctly identified as being absent in the photo. |
| FP - False Positives | Number of observations where the species was absent, but the AI classified the species as being present. |
| FN - False Negatives | Number of observations where the species was present, but the AI classified the species as being absent. |
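Continuing the sketch above, the Table 1.1 categories can be tallied for a single focal species by comparing the matched human and AI labels. The focal species here is simply the one used in the toy data.

```r
# Tally the Table 1.1 categories for one focal species, using the 'matched'
# data frame from the sketch above (species_human vs. species_ai columns).
focal <- "Dasyprocta punctata"  # hypothetical focal species

present_human <- matched$species_human == focal
present_ai    <- matched$species_ai == focal

TP <- sum(present_human & present_ai)    # true positives
TN <- sum(!present_human & !present_ai)  # true negatives
FP <- sum(!present_human & present_ai)   # false positives
FN <- sum(present_human & !present_ai)   # false negatives
```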

Performance metrics include model accuracy, precision, recall and F1 score (Table 1.2; Sokolova and Lapalme 2009). To describe these metrics, we will refer to AI classifications as “predictions” and human vision classifications as “true classifications”. Accuracy is the proportion of correct AI predictions in the data set (Kuhn and Vaughan 2021), precision is the probability that the species is present given it is predicted to be present, and recall is the probability a species is predicted to be present given it is truly present; F1 score is a weighted average of precision and recall (Table 1.2). When inspecting model performance, it can be useful to calculate these metrics separately for each species.


Table 1.2: Metrics used to assess model performance.

| Metric | Equation | Interpretation |
|--------|----------|----------------|
| Accuracy | (TP + TN) / (TP + FP + TN + FN) | Proportion of correct predictions in a data set. |
| Precision | TP / (TP + FP) | Probability the species is correctly classified as present given that the AI system classified it as present. |
| Recall | TP / (TP + FN) | Probability the species is correctly classified as present given that the species truly is present. |
| F1 Score | 2 * precision * recall / (precision + recall) | Weighted average of precision and recall. |
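Using the counts from the previous sketch, the Table 1.2 equations translate directly into R. The yardstick package (Kuhn and Vaughan 2021) provides equivalent functions (e.g., accuracy(), precision(), recall(), f_meas()) if you prefer not to compute them by hand.

```r
# Direct implementation of the Table 1.2 equations from the TP/TN/FP/FN
# counts computed above for the focal species.
accuracy  <- (TP + TN) / (TP + FP + TN + FN)
precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)
f1        <- 2 * precision * recall / (precision + recall)

round(c(accuracy = accuracy, precision = precision, recall = recall, F1 = f1), 3)
```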

AI platforms typically assign a confidence level to each classification, with higher values indicating more certain classifications. These confidence levels can be used to post-process the data in a way that trades off precision and recall. For example, one can choose to accept only classifications made with a high level of confidence. Doing so will typically reduce the number of false positives, leading to higher precision (i.e., users can be more confident that the species is truly present when the AI returns a species classification). The number of true positives, and thus recall, may also be reduced, but hopefully to a lesser extent.
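A minimal sketch of that trade-off, continuing the toy example and assuming the AI output carries a confidence score: classifications below a chosen cutoff are treated as "no detection" before precision and recall are recomputed.

```r
# Post-process by confidence: only accept the AI's classification of the
# focal species when its confidence meets the threshold (example cutoff).
threshold <- 0.80

present_human <- matched$species_human == focal
present_ai_hc <- matched$species_ai == focal & matched$confidence >= threshold

TP <- sum(present_human & present_ai_hc)
FP <- sum(!present_human & present_ai_hc)  # typically fewer false positives
FN <- sum(present_human & !present_ai_hc)  # some true detections may be lost

precision_hc <- TP / (TP + FP)  # tends to increase with the threshold
recall_hc    <- TP / (TP + FN)  # may decrease with the threshold
```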

References

Kuhn, Max, and Davis Vaughan. 2021. Yardstick: Tidy Characterizations of Model Performance. https://CRAN.R-project.org/package=yardstick.

Scotson, Lorraine, Lisa R Johnston, Fabiola Iannarilli, Oliver R Wearn, Jayasilan Mohd-Azlan, Wai Ming Wong, Thomas NE Gray, et al. 2017. “Best Practices and Software for the Management and Sharing of Camera Trap Data for Small and Large Scales Studies.” Remote Sensing in Ecology and Conservation 3 (3): 158–72.

Sokolova, Marina, and Guy Lapalme. 2009. “A Systematic Analysis of Performance Measures for Classification Tasks.” Information Processing & Management 45 (4): 427–37. https://doi.org/10.1016/j.ipm.2009.03.002.