The EMIP dataset

Distributed Eye-Movement Data Collection – The Dataset.

Eye-movement data were collected during source-code comprehension. Ten labs in eight countries on four continents cooperated to collect the data.

 


The data set consists of:

  1. a data_anonymized folder with plain-text files containing raw eye-tracking data;
  2. an emip_metadata.csv file with metadata;
  3. a pics folder with the stimuli and XML descriptions of the areas of interest.

Apparatus:

The stimuli were presented to the subjects on a desktop PC with a screen resolution of 1920×1080 px. Oculomotor activity was recorded with a mobile eye tracker, the SMI RED250mobile (sampling rate: 250 Hz). The experiment was built in the Experiment Suite Scientific Premium environment.

Participants:

The experiment involved 266 participants: 51 females (age M: 28.16, SD: 11.92) and 215 males (age M: 26.66, SD: 9.1). All subjects filled in a background form with the following fields:

  • Age (M: 26.86; SD: 9.822)
  • Gender (female | male)
  • Mother tongue
  • Proficiency in English (medium | high | low), correspondingly 91 | 134 | 4
  • Overall programming expertise (medium | high | low), correspondingly 126 | 42 | 50
  • How long the participant has been programming (M: 5.994; SD: 8.11)
  • How often the participant programs in languages other than the experiment language (Not at all | Less than 1 hour/month | Less than 1 hour/week | Less than 1 hour/day)
  • Other programming languages the participant is proficient in, including level of expertise (M: 1.729; SD: 1.59)
  • Visual aids (contact lenses | glasses | no)
  • Eye make-up

Participants also answered three questions related to their proficiency in the selected programming language (Java | Python | Scala):

  • Programming expertise in the experiment language (Java | Python | Scala)
  • How long the participant has been programming in the experiment language
  • How often the participant programs in the experiment language (Not at all | Less than 1 hour/month | Less than 1 hour/week | Less than 1 hour/day)

The answers are collated in the emip_metadata.csv file.
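The metadata can be loaded with Python's standard csv module. The following is a minimal sketch, not official tooling: only the sid_anon column name is documented above; the other column names and the two sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical miniature stand-in for emip_metadata.csv; in practice,
# replace the StringIO with open("emip_metadata.csv").
sample = io.StringIO(
    "sid_anon,age,gender\n"
    "3,25,male\n"
    "7,31,female\n"
)

rows = list(csv.DictReader(sample))

# Index the metadata by the documented sid_anon key (values are strings).
by_sid = {row["sid_anon"]: row for row in rows}
print(by_sid["3"]["age"])  # → 25
```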

Procedure:

At the beginning of the experiment, participants saw the calibration instructions on the screen: “Next, the eye tracker will calibrate. Please follow the points with your eyes.” After the calibration procedure, the first task instructions were presented: “Please read and comprehend the following source code. When you are done, press space.” Then the source code in Java, Scala, or Python was displayed. The source code was presented in black and white, without syntax highlighting or other text decoration. An example of a stimulus screen is shown in 2b2f533c65d96a81057ce17643b5c544_1920x1080.jpg.

After comprehending the source code, participants answered a question about the program’s outcome. The answer choices were presented as a list on the screen.

 

The second task had the following instructions: “Please read and comprehend the following source code. When you are done, press space. Then you will be given a MULTIPLE CHOICE question about the code.” An example of the screen with the second stimulus is shown in 8a3893d3fab072a0a78db1788cc3afdc_1920x1080.jpg.

Half of the participants in each lab session started with Question 1 and the other half started with Question 2, to prevent sequence effects.

The Java source codes for the tasks:

Source Code 1

public class Rectangle {
    private int x1, y1, x2, y2;

    public Rectangle(int x1, int y1, int x2, int y2) {
        this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
    }

    public int width() { return this.x2 - this.x1; }

    public int height() { return this.y2 - this.y1; }

    public double area() { return this.width() * this.height(); }

    public static void main(String[] args) {
        Rectangle rect1 = new Rectangle(0, 0, 10, 10);
        System.out.println(rect1.area());
        Rectangle rect2 = new Rectangle(5, 5, 10, 10);
        System.out.println(rect2.area());
    }
}

Comprehension Question 1

The program

  • … computes the area of rectangles by multiplying their width (x1-x2) and height (y1-y2).
  • … computes the area of rectangles by multiplying their width (x2-x1) and height (y2-y1).
  • … computes the area of rectangles by multiplying their width (x1-y1) and height (x2-y2).
  • I’m not sure.

Source Code 2

public class Vehicle {
    String producer, type;
    int topSpeed, currentSpeed;

    public Vehicle(String p, String t, int tp) {
        this.producer = p; this.type = t; this.topSpeed = tp; this.currentSpeed = 0;
    }

    public int accelerate(int kmh) {
        if ((this.currentSpeed + kmh) > this.topSpeed) {
            this.currentSpeed = this.topSpeed;
        } else {
            this.currentSpeed = this.currentSpeed + kmh;
        }
        return this.currentSpeed;
    }

    public static void main(String[] args) {
        Vehicle v = new Vehicle("Audi", "A6", 200);
        v.accelerate(10);
    }
}

Comprehension Question 2

The program

  • … defines a vehicle by a producer, that has a type and can reduce its speed.
  • … defines a vehicle by a producer, that has a type and can accelerate its speed.
  • … defines a vehicle by a producer, that has a type and can accelerate and reduce the speed.
  • … I’m not sure.

Dataset

We provide the raw data as collected. For each participant, we provide a TXT file with the following header descriptors:

## [iView]
## Converted from: ###
## Date: ###
## Version: IDF Converter 3.0.20
## IDF Version: 9
## Sample Rate: 250
## Separator Type: Unknown
## Trial Count: 1
## Uses Plane File: False
## Number of Samples: 53233
## Reversed: none
## [Run]
## Subject: 1
## Description:
## [Calibration]
## Calibration Area: 1920 1080
## Calibration Point 0: Position(960;540)
## Calibration Point 1: Position(480;54)
## Calibration Point 2: Position(1824;270)
## Calibration Point 3: Position(1440;1026)
## Calibration Point 4: Position(96;810)
## [Geometry]
## Stimulus Dimension [mm]: 344 194
## Head Distance [mm]: 700
## [Hardware Setup]
## System ID: ###
## Operating System: 6.2.9200
## iView X Version: 4.2.0.0
## [Filter Settings]
## Heuristic: False
## Heuristic Stage: 0
## Bilateral: True
## Gaze Cursor Filter: False
## Saccade Length [px]: 0
## Filter Depth [ms]: 0

The header is followed by comma-separated values (separator ",", comment character "#", with a header row). This is the raw eye-measurement time series as exported using the BeGaze software. It has the following fields:

 

Time : timestamp of the sample, in microseconds (number)
Type : type of the entry (SMP for samples, MSG for messages)
Trial : trial of the eye-tracking recording (number, 1)
L Raw X [px] : left-eye horizontal pupil position
L Raw Y [px] : left-eye vertical pupil position
R Raw X [px] : right-eye horizontal pupil position
R Raw Y [px] : right-eye vertical pupil position
L Dia X [px] : left-eye pupil diameter, X axis, in pixels (number)
L Dia Y [px] : left-eye pupil diameter, Y axis, in pixels (number)
L Pupil Diameter [mm] : left-eye pupil diameter in mm (number)
R Dia X [px] : right-eye pupil diameter, X axis, in pixels (number)
R Dia Y [px] : right-eye pupil diameter, Y axis, in pixels (number)
R Pupil Diameter [mm] : right-eye pupil diameter in mm (number)
L CR1 X [px] : left-eye horizontal corneal-reflex position (one or two CRs can be present)
L CR1 Y [px] : left-eye vertical corneal-reflex position (one or two CRs can be present)
L CR2 X [px] : left-eye horizontal corneal-reflex position (one or two CRs can be present)
L CR2 Y [px] : left-eye vertical corneal-reflex position (one or two CRs can be present)
R CR1 X [px] : right-eye horizontal corneal-reflex position (one or two CRs can be present)
R CR1 Y [px] : right-eye vertical corneal-reflex position (one or two CRs can be present)
R CR2 X [px] : right-eye horizontal corneal-reflex position (one or two CRs can be present)
R CR2 Y [px] : right-eye vertical corneal-reflex position (one or two CRs can be present)
L POR X [px] : left-eye point of regard, X axis, in pixels (number)
L POR Y [px] : left-eye point of regard, Y axis, in pixels (number)
R POR X [px] : right-eye point of regard, X axis, in pixels (number)
R POR Y [px] : right-eye point of regard, Y axis, in pixels (number)
Timing : timing quality values
L Validity : left-eye validity
R Validity : right-eye validity
Pupil Confidence : pupil confidence
L Plane : left-eye plane number
R Plane : right-eye plane number
L EPOS X : left pupil position from the perspective of the subjective camera, X axis, in pixels (number)
L EPOS Y : left pupil position from the perspective of the subjective camera, Y axis, in pixels (number)
L EPOS Z : left pupil position from the perspective of the subjective camera, Z axis, in pixels (number)
R EPOS X : right pupil position from the perspective of the subjective camera, X axis, in pixels (number)
R EPOS Y : right pupil position from the perspective of the subjective camera, Y axis, in pixels (number)
R EPOS Z : right pupil position from the perspective of the subjective camera, Z axis, in pixels (number)
L GVEC X : left-eye gaze vector, from the perspective of the left-eye camera, X-axis component (number)
L GVEC Y : left-eye gaze vector, from the perspective of the left-eye camera, Y-axis component (number)
L GVEC Z : left-eye gaze vector, from the perspective of the left-eye camera, Z-axis component (number)
R GVEC X : right-eye gaze vector, from the perspective of the right-eye camera, X-axis component (number)
R GVEC Y : right-eye gaze vector, from the perspective of the right-eye camera, Y-axis component (number)
R GVEC Z : right-eye gaze vector, from the perspective of the right-eye camera, Z-axis component (number)
Frame : empty field
Aux1 : empty field

 

The file with raw eye-tracking data contains the data from every screen, including questions and instructions. Trials are separated by messages starting with MSG. For example, 945553445   MSG   1          # Message: b0c9d887c1cd28c005d204cf287878e2_1920x1080.jpg means that at this moment the corresponding picture was shown on the screen. The pictures with source codes are located in the pics folder of the data set. To match an eye-tracking file with its metadata, match the file name against the sid_anon column of emip_metadata.csv. For example, the file “3_gaze.txt” corresponds to the row with key 3 in the sid_anon column of emip_metadata.csv.
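Splitting a raw gaze file into per-stimulus trials can be sketched in Python. This is a hedged sketch, not official tooling: it assumes header lines start with ##, that sample rows are comma-separated with Time first and Type second, and that stimulus changes are MSG rows containing "# Message: <image>.jpg". The miniature file below is invented for illustration.

```python
import io

# Hypothetical miniature *_gaze.txt; replace the StringIO with
# open("3_gaze.txt") for a real file.
raw = io.StringIO(
    "## [iView]\n"
    "## Sample Rate: 250\n"
    "Time,Type,Trial,L POR X [px],L POR Y [px]\n"
    "945553445,MSG,1,# Message: stimulus_a.jpg\n"
    "945557445,SMP,1,960.1,540.2\n"
    "945561445,SMP,1,961.0,540.9\n"
    "945565445,MSG,1,# Message: stimulus_b.jpg\n"
    "945569445,SMP,1,400.0,300.0\n"
)

trials = {}     # stimulus image name -> list of sample rows
current = None  # stimulus currently on screen
for line in raw:
    line = line.strip()
    if not line or line.startswith("##"):
        continue  # skip the "## ..." header descriptors
    fields = line.split(",")
    if "MSG" in fields:
        # A MSG row: the stimulus file name follows "# Message:".
        current = line.split("# Message:")[1].strip()
        trials[current] = []
    elif fields[1] == "SMP" and current is not None:
        trials[current].append(fields)

print(len(trials["stimulus_a.jpg"]))  # → 2
```

Real files use the full field set documented above, so the column indices of the POR values differ; the trial-splitting logic on the MSG rows is the part this sketch illustrates.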

We automatically created XML description files with the structure of the Areas Of Interest (AOIs). AOIs were created at four levels of abstraction: character, word, sentence, and paragraph. To map a stimulus picture and its XML AOI file to the metadata, use the corresponding columns in emip_metadata.csv: 1stTaskCode, 1stTaskFile, 2ndTaskCode, 2ndTaskFile.
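Looking up a participant’s stimulus picture and AOI file via these columns can be sketched as follows. The column names come from the description above; the row contents, the task code, and the assumption that the picture and its XML description share a base name inside the pics folder are illustrative, not guaranteed by the data set.

```python
import csv
import io

# Hypothetical one-row stand-in for emip_metadata.csv; the task code "J1"
# and the pairing of .jpg/.xml names are assumptions for illustration.
sample = io.StringIO(
    "sid_anon,1stTaskCode,1stTaskFile,2ndTaskCode,2ndTaskFile\n"
    "3,J1,2b2f533c65d96a81057ce17643b5c544_1920x1080,"
    "J2,8a3893d3fab072a0a78db1788cc3afdc_1920x1080\n"
)

meta = {r["sid_anon"]: r for r in csv.DictReader(sample)}

row = meta["3"]
picture = "pics/" + row["1stTaskFile"] + ".jpg"  # stimulus screenshot
aoi_xml = "pics/" + row["1stTaskFile"] + ".xml"  # AOI description (assumed name)
print(picture)
```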

All data are anonymized. 15% of the data are not published.

Participating Labs

  • The Physical Structure of Perception and Computation Group, University of Genova, Italy
  • Software Engineering Research and Empirical Studies Lab, Youngstown State University, USA
  • Faculty of Informatics and Information Technologies, Slovak University of Technology, Slovakia
  • School of Mathematics and Computer Science of the Netanya Academic College, Netanya, Israel
  • Centre for Human Centred Technology Design, University of Technology Sydney, Australia
  • Department of Computer Science, Aalto University, Finland
  • Department of Computer Science, University of Helsinki, Finland
  • Information & Computer Sciences, University of Hawai’i at Manoa, USA
  • Neuroinformatics, Bielefeld University, Germany
  • Vision and Eye Research Unit, Anglia Ruskin University, Great Britain

The distributed data set can be used under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).