Distributed Collection of Eye Movement Data in Programming – Dataset

The EMIP dataset contains eye movement data during program comprehension.

The dataset consists of:

  1. data_anonymized folder with plain txt files with raw eye-tracking data
  2. emip_metadata.csv file with metadata
  3. pics folder with stimuli and XML descriptions of areas of interest

Apparatus:

The stimuli were presented to participants using a laptop PC. The screen resolution was 1920×1080 px. The data was recorded with an SMI RED250 mobile eye tracker (sampling rate 250 Hz). The experiment was built with SMI Experimental Suite Scientific Premium.

Participants:

The experiment involved 266 participants: 51 females (age M: 28.16, SD: 11.92) and 215 males (age M: 26.66, SD: 9.1). All subjects filled out a questionnaire with the following items:

  • Age
  • Gender (female | male | other)
  • Mother tongue
  • English level (none | low | medium | high)
  • Overall programming expertise (none | low | medium | high)
  • Expertise in Java / Python / Scala (none | low | medium | high)
  • How long have you been programming (years)
  • How long have you been programming in Java / Python / Scala (years)
  • How often do you use any programming language other than Java / Python / Scala
    (not at all | less than 1 hour / month | less than 1 hour / week | less than 1 hour / day | more than 1 hour / day)
  • How often do you program in Java / Python / Scala
    (not at all | less than 1 hour / month | less than 1 hour / week | less than 1 hour / day | more than 1 hour / day)
  • Which programming languages do you know (Language – Level of expertise: low | medium | high)
  • Are you currently wearing glasses or contact lenses (no | glasses | contact lenses)
  • Are you currently wearing mascara or other eye-makeup (yes | no)

The answers are compiled in emip_metadata.csv.
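A minimal sketch of how the questionnaire answers might be tallied from the metadata file. The column name "gender" and the demo rows are hypothetical; check the header row of emip_metadata.csv for the actual column names before use.

```python
import csv
from collections import Counter
from typing import Iterable


def tally_column(csv_lines: Iterable[str], column: str) -> Counter:
    """Count how often each answer occurs in one column of a CSV source."""
    reader = csv.DictReader(csv_lines)
    # Normalize case and whitespace so "Male" and "male " count together.
    return Counter((row.get(column) or "").strip().lower() for row in reader)


# Synthetic rows with a hypothetical "gender" column, not real dataset content.
demo = ["id,gender", "1,Male", "2,female", "3,male"]
counts = tally_column(demo, "gender")
```

With the real file, an open file handle can be passed instead of the list, for example the object returned by open("emip_metadata.csv", newline="").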

Dataset:

For each participant, the dataset contains a txt file with raw data. The file header provides information such as the version of the IDF Converter used, the number of samples, the coordinates of the calibration points, and the head distance. All header lines start with “#”.
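Because every header line starts with “#”, the header can be separated from the data with a simple prefix check. The following is a sketch only; the function name split_header is an illustration and not part of the dataset tooling.

```python
from typing import Iterable, List, Tuple


def split_header(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate '#'-prefixed header lines from the remaining data lines."""
    header: List[str] = []
    body: List[str] = []
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped.startswith("#"):
            # Drop the leading '#' characters and surrounding whitespace.
            header.append(stripped.lstrip("#").strip())
        elif stripped.strip():
            # Keep non-empty data lines unchanged.
            body.append(stripped)
    return header, body
```

The function accepts any iterable of lines, so an open file handle for one of the txt files in data_anonymized can be passed directly.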

The data columns are separated by a regular comma (“,”). They include, among others:

  • Time: timestamp of the sample
  • Type: Sample (SMP) | Message (MSG)
  • L Raw X [px]: horizontal pupil position – left eye
  • L Raw Y [px]: vertical pupil position – left eye
  • R Raw X [px]: horizontal pupil position – right eye
  • R Raw Y [px]: vertical pupil position – right eye
  • L Dia X [px]: pupil diameter on the X axis in pixels – left eye
  • L Dia Y [px]: pupil diameter on the Y axis in pixels – left eye
  • L Pupil Diameter [mm]: pupil diameter in mm – left eye
  • R Dia X [px]: pupil diameter X axis in pixels – right eye
  • R Dia Y [px]: pupil diameter Y axis in pixels – right eye
  • R Pupil Diameter [mm]: pupil diameter in mm – right eye
  • L CR1 X [px]: horizontal corneal reflex position. One or two CRs can be present – left eye
  • L CR1 Y [px]: vertical corneal reflex position. One or two CRs can be present – left eye
  • L CR2 X [px]: horizontal corneal reflex position. One or two CRs can be present – left eye
  • L CR2 Y [px]: vertical corneal reflex position. One or two CRs can be present – left eye
  • R CR1 X [px]: horizontal corneal reflex position. One or two CRs can be present – right eye
  • R CR1 Y [px]: vertical corneal reflex position. One or two CRs can be present – right eye
  • R CR2 X [px]: horizontal corneal reflex position. One or two CRs can be present – right eye
  • R CR2 Y [px]: vertical corneal reflex position. One or two CRs can be present – right eye
  • L POR X [px]: point of regard in the X axis in pixels – left eye
  • L POR Y [px]: point of regard in the Y axis in pixels – left eye
  • R POR X [px]: point of regard in the X axis in pixels – right eye
  • R POR Y [px]: point of regard in the Y axis in pixels – right eye
  • Timing: Quality values
  • L Validity: validity – left eye
  • R Validity: validity – right eye
  • Pupil Confidence
  • L Plane: plane number – left eye
  • R Plane: plane number – right eye
  • L EPOS X: left pupil position from the perspective of the subjective camera, in the X axis in pixels
  • L EPOS Y: left pupil position from the perspective of the subjective camera, in the Y axis in pixels
  • L EPOS Z: left pupil position from the perspective of the subjective camera, in the Z axis in pixels
  • R EPOS X: right pupil position from the perspective of the subjective camera, in the X axis in pixels
  • R EPOS Y: right pupil position from the perspective of the subjective camera, in the Y axis in pixels
  • R EPOS Z: right pupil position from the perspective of the subjective camera, in the Z axis in pixels
  • L GVEC X: left eye gaze vector, from the perspective of the left eye camera, X axis component
  • L GVEC Y: left eye gaze vector, from the perspective of the left eye camera, Y axis component
  • L GVEC Z: left eye gaze vector, from the perspective of the left eye camera, Z axis component
  • R GVEC X: right eye gaze vector, from the perspective of the right eye camera, X axis component
  • R GVEC Y: right eye gaze vector, from the perspective of the right eye camera, Y axis component
  • R GVEC Z: right eye gaze vector, from the perspective of the right eye camera, Z axis component
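The column layout above can be read with Python's csv module. This sketch assumes the separator and column names are exactly as listed here; the delimiter is a parameter in case a file uses a different one. The demo text is synthetic and the helper names read_rows and left_gaze are illustrations only.

```python
import csv
from typing import Dict, List, Tuple

Row = Dict[str, str]


def read_rows(text: str, delimiter: str = ",") -> Tuple[List[Row], List[Row]]:
    """Return (samples, messages) parsed from the contents of one raw file."""
    # Skip the '#'-prefixed header and blank lines; the first remaining
    # line is expected to hold the column names.
    data_lines = [ln for ln in text.splitlines()
                  if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter=delimiter)
    samples: List[Row] = []
    messages: List[Row] = []
    for row in reader:
        if row.get("Type") == "SMP":
            samples.append(row)
        elif row.get("Type") == "MSG":
            messages.append(row)
    return samples, messages


def left_gaze(samples: List[Row]) -> List[Tuple[float, float, float]]:
    """Return (time, x, y) of the left-eye point of regard for each sample."""
    return [(float(r["Time"]), float(r["L POR X [px]"]), float(r["L POR Y [px]"]))
            for r in samples]


# Synthetic demo text, not real dataset content.
demo = (
    "## header line\n"
    "Time,Type,L POR X [px],L POR Y [px]\n"
    "10,SMP,100.5,200.0\n"
    "12,MSG,# Message: example\n"
    "14,SMP,110.0,210.0\n"
)
demo_samples, demo_messages = read_rows(demo)
demo_gaze = left_gaze(demo_samples)
```

Message rows usually carry fewer fields than sample rows, which is why they are kept in a separate list rather than converted to numbers.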

Each trial includes messages that name the stimulus screenshot, which can be found in the pics folder, and that give the coordinates of mouse clicks.
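One way to use these messages is to assign each sample to the most recent stimulus message that precedes it. The sketch below is deliberately independent of the exact message text, whose format is not described here: it assumes the caller has already extracted (timestamp, stimulus name) pairs. The function name label_samples and the file names in the demo are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def label_samples(sample_times: Sequence[float],
                  events: Sequence[Tuple[float, str]]) -> List[Optional[str]]:
    """Label each sample with the latest event at or before its timestamp."""
    ordered = sorted(events)
    event_times = [t for t, _ in ordered]
    labels: List[Optional[str]] = []
    for t in sample_times:
        # Number of events whose timestamp is <= t.
        i = bisect_right(event_times, t)
        labels.append(ordered[i - 1][1] if i > 0 else None)
    return labels


# Synthetic timestamps and hypothetical stimulus names.
demo_labels = label_samples([5, 10, 15, 25], [(10, "a.jpg"), (20, "b.jpg")])
```

Samples recorded before the first message receive None, so they can be filtered out before any per-stimulus analysis.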

All data is anonymous. 15% of the data is not published.

The distributed dataset can be used under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).