GERMS is a dataset for active object recognition. It consists of recordings of 1365 give-and-take trials between different people and a humanoid robot. In each give-and-take trial, an object is handed to the robot, which receives the object and actively examines it. For each trial, the following information is stored: frames from the robot's camera and the positions of the robot's servo motors. The statistics of the dataset are shown in the table below.


Collection   Number of tracks   Frames in each track   Annotated frames in each track
Train        816                265±7                  157±12
Test         549                265±7                  145±19
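
As a quick sanity check on these statistics, the short Python sketch below recomputes the total number of tracks and derives rough per-split frame counts from the mean values in the table. The names `SPLITS`, `total_tracks`, `approx_frames` and `annotated_fraction` are illustrative and not part of the dataset or any accompanying tooling. The frame counts are only approximations, because they use the mean frames per track and ignore the reported spread.

```python
# Split statistics copied from the table: (tracks, mean frames per track,
# mean annotated frames per track). The standard deviations are ignored here.
SPLITS = {
    "Train": (816, 265, 157),
    "Test": (549, 265, 145),
}


def total_tracks(splits):
    """Total number of give-and-take trials across all splits."""
    return sum(tracks for tracks, _, _ in splits.values())


def approx_frames(splits):
    """Approximate number of frames per split (tracks x mean frames per track)."""
    return {name: tracks * frames for name, (tracks, frames, _) in splits.items()}


def annotated_fraction(splits):
    """Approximate fraction of frames in a track that carry an annotation."""
    return {name: annotated / frames for name, (_, frames, annotated) in splits.items()}


print(total_tracks(SPLITS))        # 1365, matching the number of trials stated above
print(approx_frames(SPLITS))
print(annotated_fraction(SPLITS))
```

The train and test track counts add up to the 1365 trials reported for the dataset, and roughly 55 to 60 percent of the frames in a track are annotated.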


A sample give-and-take trial is shown in this video:







The object set used for GERMS data collection consists of 136 stuffed toys of different microorganisms. The toys are divided into 7 smaller categories, formed by semantic division of the toy microbes. Dividing the objects into smaller categories provides benchmarks with different degrees of difficulty. Here is a collage of all objects used in the GERMS dataset.



Collage of objects

Table of Germs




The train set is annotated manually, and the annotations include the bounding boxes of the objects in the images. Train set annotations are available along with the dataset. For examples of annotations, see Benchmarks.



We define 16 benchmarks on the GERMS dataset, 2 for each of the 8 categories of objects. For the first benchmark, methods are allowed to use human-annotated object bounding boxes for both train and test sets. Methods should report the accuracy of label prediction for the objects in each category as a function of the number of observed views. Here are the baseline results for the first 8 benchmarks:



Label prediction accuracy using human-annotated bounding boxes for train and test sets
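
The reporting format used in these benchmarks, accuracy as a function of the number of observed views, can be illustrated with a minimal Python sketch. It assumes that a classifier has already produced a class-probability vector for every view of a track, and it fuses the views by multiplying their probabilities, which treats the views as conditionally independent. That fusion rule is an assumption made for illustration only and is not necessarily the method behind the baseline results. The functions `fuse_beliefs` and `accuracy_vs_views` and the toy data are hypothetical.

```python
import math


def fuse_beliefs(view_probs):
    """Return the normalized belief over classes after each successive view.

    Assumption: views are conditionally independent given the object label,
    so per-view probabilities are combined by summing their logarithms.
    """
    log_belief = [0.0] * len(view_probs[0])
    beliefs = []
    for probs in view_probs:
        # Clamp to avoid log(0) when a view assigns zero probability to a class.
        log_belief = [lb + math.log(max(p, 1e-12)) for lb, p in zip(log_belief, probs)]
        peak = max(log_belief)
        unnormalized = [math.exp(lb - peak) for lb in log_belief]
        total = sum(unnormalized)
        beliefs.append([u / total for u in unnormalized])
    return beliefs


def accuracy_vs_views(tracks, labels, max_views):
    """accuracy[k - 1] is the fraction of tracks labelled correctly after k views.

    Every track must contain at least max_views views.
    """
    correct = [0] * max_views
    for view_probs, label in zip(tracks, labels):
        if len(view_probs) < max_views:
            raise ValueError("each track needs at least max_views views")
        for k, belief in enumerate(fuse_beliefs(view_probs[:max_views])):
            predicted = max(range(len(belief)), key=belief.__getitem__)
            correct[k] += int(predicted == label)
    return [c / len(tracks) for c in correct]


# Toy data: two tracks, two views each, two classes.
toy_tracks = [
    [[0.6, 0.4], [0.7, 0.3]],  # true label 0: correct after one and two views
    [[0.6, 0.4], [0.2, 0.8]],  # true label 1: wrong after one view, correct after two
]
toy_labels = [0, 1]
print(accuracy_vs_views(toy_tracks, toy_labels, max_views=2))  # [0.5, 1.0]
```

In the toy example the second track is misclassified from its first view alone but recovered once the second view is fused in, which is the behavior an accuracy-versus-views curve is meant to expose.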


For the second benchmark, methods are allowed to use human annotations only for the train data. These benchmarks are more challenging because the background is complex. Here are the baseline results for these benchmarks:



Label prediction accuracy using human annotations only in train set


Here is a video of active object recognition. The left image is the original video stream, the middle image is the human-annotated version, and the right image is the result of automatic annotation. Above each image, a histogram shows the degree of belief in the identity of the object in the image. The correct object is shown with a purple background.




How to obtain



The size of the dataset is more than 2.5 TB. If you are interested in obtaining the dataset, please send an email to: mmalmir@eng.ucsd.edu.






Please cite the dataset using the following:


Malmir M, Sikka K, Forster D, Movellan JR, Cottrell G. Deep Q-learning for Active Recognition of GERMS: Baseline performance on a standardized dataset for active learning. In BMVC 2015 (pp. 161-1).





This work was supported by NSF grant IIS-0808767.