Example FR1: Custom file reader loading particle properties and the simulation cell

In this example we will implement a reader component for a custom file format using OVITO’s custom file reader API.

A few key assumptions and provisions before we start writing a file reader for our custom file format:

  • The file format is easily recognizable from the characteristic file suffix *.ExampleFormat.

  • We will use the Timestep tag as the delimiter separating successive trajectory frames in the file.

  • The dimensions of the simulation cell may change in every frame. We assume the simulations always use 3D periodic boundary conditions.

  • The total number of atoms and their types may change during a simulation. Notice how they do in the third trajectory frame below.

  • Each atom line starts with the particle type of the atom, encoded as its chemical symbol.

  • Particle positions are given in reduced cell coordinates (preceded by the line Reduced coordinates).

Timestep 0
7.865954 0.000000 0.000000
0.000000 7.865954 0.000000
0.000000 0.000000 6.973035
Atoms 18
Reduced coordinates
Na 0.09306390 0.30261238 0.46908935
Na 0.90693610 0.69738762 0.46908938
Na 0.30261235 0.90693613 0.53091062
Na 0.69738763 0.09306392 0.53091062
Na 0.40693612 0.80261241 0.03091062
Na 0.59306393 0.19738766 0.03091059
Na 0.19738763 0.40693613 0.96908938
Na 0.80261236 0.59306396 0.96908938
S 0.24180739 0.09397526 0.19064404
S 0.75819276 0.90602458 0.19064394
S 0.09397538 0.75819274 0.80935606
S 0.90602453 0.24180732 0.80935606
S 0.25819273 0.59397545 0.30935596
S 0.74180726 0.40602446 0.30935606
S 0.40602443 0.25819271 0.69064391
S 0.59397545 0.74180729 0.69064404
Sn 0.00000000 0.00000000 0.00000000
Sn 0.50000000 0.50000000 0.49999997
Timestep 100
7.870939 0.000000 0.000000
0.000000 7.870939 0.000000
0.000000 0.000000 6.977331
Atoms 18
Reduced coordinates
Na 0.09309428 0.30260112 0.46914038
Na 0.90690570 0.69739888 0.46914042
Na 0.30260109 0.90690574 0.53085958
Na 0.69739888 0.09309430 0.53085958
Na 0.40690573 0.80260115 0.03085957
Na 0.59309430 0.19739893 0.03085954
Na 0.19739889 0.40690574 0.96914042
Na 0.80260110 0.59309434 0.96914043
S 0.24179056 0.09398270 0.19066976
S 0.75820959 0.90601718 0.19066973
S 0.09398279 0.75820959 0.80933034
S 0.90601713 0.24179048 0.80933037
S 0.25820958 0.59398272 0.30933030
S 0.74179038 0.40601712 0.30933031
S 0.40601707 0.25820957 0.69066957
S 0.59398288 0.74179044 0.69066971
Sn 0.00000002 0.00000002 -0.00000002
Sn 0.49999998 0.50000000 0.49999997
Timestep 200
7.878383 0.000000 0.000000
0.000000 7.878383 0.000000
0.000000 0.000000 6.982911
Atoms 19
Reduced coordinates
Na 0.09309963 0.30259914 0.46914936
Na 0.90690035 0.69740086 0.46914940
Na 0.30259910 0.90690039 0.53085060
Na 0.69740086 0.09309965 0.53085059
Na 0.40690038 0.80259917 0.03085059
Na 0.59309965 0.19740091 0.03085055
Na 0.19740087 0.40690039 0.96914940
Na 0.80259911 0.59309969 0.96914941
S 0.24178760 0.09398401 0.19067428
S 0.75821255 0.90601588 0.19067427
S 0.09398409 0.75821256 0.80932582
S 0.90601583 0.24178752 0.80932585
S 0.25821254 0.59398400 0.30932578
S 0.74178741 0.40601583 0.30932578
S 0.40601578 0.25821253 0.69067409
S 0.59398418 0.74178748 0.69067423
Sn 0.00000002 0.00000002 -0.00000002
Sn 0.49999997 0.50000000 0.49999997
Fe 0.32000002 0.41690039 0.42785059
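
To make the layout assumptions above concrete, here is a minimal plain-Python sketch that parses a single frame from a list of text lines. It does not use OVITO at all; the function name parse_frame and the abbreviated two-atom frame are illustrative assumptions, not part of the file reader developed below.

```python
def parse_frame(lines):
    """Parse one frame given as a list of text lines.

    Returns (timestep, cell_vectors, atoms), where atoms is a list of
    (chemical_symbol, [x, y, z]) tuples in reduced coordinates.
    """
    # Line 0: "Timestep N"
    timestep = int(lines[0].split()[1])
    # Lines 1-3: one cell vector per line
    cell_vectors = [[float(x) for x in lines[i].split()] for i in (1, 2, 3)]
    # Line 4: "Atoms N"
    atom_count = int(lines[4].split()[1])
    # Line 5: marker announcing reduced coordinates
    if lines[5].strip() != "Reduced coordinates":
        raise ValueError("expected 'Reduced coordinates' line")
    atoms = []
    for line in lines[6:6 + atom_count]:
        symbol, *coords = line.split()
        atoms.append((symbol, [float(c) for c in coords]))
    return timestep, cell_vectors, atoms

# Abbreviated frame (two atoms only) in the format shown above.
frame_text = """Timestep 0
7.865954 0.000000 0.000000
0.000000 7.865954 0.000000
0.000000 0.000000 6.973035
Atoms 2
Reduced coordinates
Sn 0.00000000 0.00000000 0.00000000
Sn 0.50000000 0.50000000 0.49999997
"""

timestep, cell_vectors, atoms = parse_frame(frame_text.splitlines())
print(timestep, len(atoms), atoms[1][0])  # 0 2 Sn
```

The reader implemented in the following sections follows the same order of steps, but stores the results in OVITO data objects instead of plain lists.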

The detect method

For simplicity, we will identify this file type by its name: if the file extension is .ExampleFormat, the file reader will attempt to parse it. Extracting and validating the file extension is done in line 10:

 1  from ovito.data import DataCollection
 2  from ovito.io import FileReaderInterface
 3  from typing import Callable, Any
 4  import os
 5
 6  class ExampleFileFormatReader(FileReaderInterface):
 7
 8      @staticmethod
 9      def detect(filename: str):
10          return os.path.splitext(filename)[1] == ".ExampleFormat"
11
12      def scan(self, filename: str, register_frame: Callable[..., None]):
13          ...
14
15      def parse(self, data: DataCollection, filename: str, frame_info: Any, **kwargs: Any):
16          ...
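
The extension check can be tried out on its own, without OVITO. The following sketch repeats the comparison from detect in a free function; the name has_example_format_suffix is a hypothetical helper introduced only for this illustration.

```python
import os

def has_example_format_suffix(filename: str) -> bool:
    """Same comparison as in detect(): only the file extension is inspected."""
    return os.path.splitext(filename)[1] == ".ExampleFormat"

print(has_example_format_suffix("trajectory.ExampleFormat"))     # True
print(has_example_format_suffix("trajectory.exampleformat"))     # False
print(has_example_format_suffix("trajectory.ExampleFormat.gz"))  # False
```

Note that the comparison is case-sensitive and that only the last suffix counts, so a compressed file ending in .ExampleFormat.gz would not be recognized by this simple test.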

The scan method

Frames are delimited by the phrase Timestep N, where N is the simulation timestep at that frame. We will go through the file and record the line number of each occurrence of this tag. The tag line will also be used as a human-readable label for each frame.

Within the scan method, the whole file is read in a for loop (lines 13-14). Lines 15-16 process each frame: each time the Timestep tag is encountered, the current line number and the frame label are registered with OVITO using the register_frame callback. The line number will be used to locate a specific frame, while the string is used as a human-readable label in the GUI.

 1  from ovito.data import DataCollection
 2  from ovito.io import FileReaderInterface
 3  from typing import Callable, Any
 4  import os
 5
 6  class ExampleFileFormatReader(FileReaderInterface):
 7
 8      @staticmethod
 9      def detect(filename: str):
10          return os.path.splitext(filename)[1] == ".ExampleFormat"
11
12      def scan(self, filename: str, register_frame: Callable[..., None]):
13          with open(filename, "r") as f:
14              for line_number, line in enumerate(f):
15                  if line.startswith("Timestep "):
16                      register_frame(frame_info=line_number, label=line.strip())
17
18      def parse(self, data: DataCollection, filename: str, frame_info: Any, **kwargs: Any):
19          ...
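
To see which values the scanning loop hands to register_frame, the loop can be run outside of OVITO with a stand-in callback. This is only a sketch: scan_frames, collect, and the two single-atom frames are hypothetical and made up for the illustration.

```python
import os
import tempfile

def scan_frames(filename, register_frame):
    """Same loop as in scan(), written as a free function for experimentation."""
    with open(filename, "r") as f:
        for line_number, line in enumerate(f):
            if line.startswith("Timestep "):
                register_frame(frame_info=line_number, label=line.strip())

# Two tiny single-atom frames in the example format (made-up data).
sample = (
    "Timestep 0\n"
    "1.0 0.0 0.0\n0.0 1.0 0.0\n0.0 0.0 1.0\n"
    "Atoms 1\nReduced coordinates\nNa 0.5 0.5 0.5\n"
    "Timestep 100\n"
    "1.0 0.0 0.0\n0.0 1.0 0.0\n0.0 0.0 1.0\n"
    "Atoms 1\nReduced coordinates\nNa 0.5 0.5 0.5\n"
)

frames = []  # collects (frame_info, label) pairs

def collect(frame_info, label):
    """Stand-in for the callback that OVITO would normally provide."""
    frames.append((frame_info, label))

with tempfile.NamedTemporaryFile("w", suffix=".ExampleFormat", delete=False) as tmp:
    tmp.write(sample)
try:
    scan_frames(tmp.name, collect)
finally:
    os.remove(tmp.name)

print(frames)  # [(0, 'Timestep 0'), (7, 'Timestep 100')]
```

The line numbers are zero-based because enumerate starts counting at 0, which matches the number of lines that have to be skipped later to reach the frame.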

The parse method

The parse method has several tasks. The goal is to set up the SimulationCell, the particle positions, and the ParticleType of each particle. The current timestep will also be stored as a global attribute for easier access inside the OVITO application window.

First, we skip through the file until we reach the requested frame (lines 21-23). The starting line of the frame can be obtained from the frame_info object set in the scan method.

The next line in the file now contains the Timestep information which we would like to save as an OVITO attribute. The information is read in line 26 and stored in the attributes dictionary in line 27. This dictionary can be filled with custom fields which will all be accessible to the user from within OVITO.

In line 30, an “empty” SimulationCell with all cell vectors set to 0 is created and its periodic boundary conditions are set. The cell vectors are subsequently read from the input file and placed in the simulation cell matrix (code lines 32-33).

The next line after the simulation cell block contains the number of atoms stored in the file for the current frame. It is read in line 36. This information is used to create the Particles PropertyContainer (line 38). Once this container is constructed, it can be filled with particle properties. In this case, we want to read the particle type and the position from the file. These properties are defined in lines 40 and 42, respectively.

After skipping the Reduced coordinates line, we read the next N lines from the file, where N is the number of particles (lines 48-53). Each line contains the data for one particle. The first column contains the chemical symbol, which can be automatically converted into an OVITO particle type using the expression type_property.add_type_name(tokens[0], particles).id, where tokens[0] is the chemical symbol. After parsing, all Na atoms will have the numeric type 1, all S atoms will have type 2, and so on. The result is stored in the type_property array at the correct index. Lastly, the position of each particle is read from the file and converted into Cartesian coordinates by multiplication with the simulation cell matrix. The resulting position vector is stored in the position_property array.

 1  from ovito.data import DataCollection
 2  from ovito.io import FileReaderInterface
 3  from typing import Callable, Any
 4  import os
 5
 6  class ExampleFileFormatReader(FileReaderInterface):
 7
 8      @staticmethod
 9      def detect(filename: str):
10          return os.path.splitext(filename)[1] == ".ExampleFormat"
11
12      def scan(self, filename: str, register_frame: Callable[..., None]):
13          with open(filename, "r") as f:
14              for line_number, line in enumerate(f):
15                  if line.startswith("Timestep "):
16                      register_frame(frame_info=line_number, label=line.strip())
17
18      def parse(self, data: DataCollection, filename: str, frame_info: int, **kwargs: Any):
19          with open(filename, "r") as f:
20              # Seek to the beginning of the requested trajectory frame
21              starting_line_number = frame_info
22              for _ in range(starting_line_number):
23                  f.readline()
24
25              # Parse the timestep number as a global attribute
26              timestep = int(f.readline().split()[1])
27              data.attributes["Timestep"] = timestep
28
29              # Create 3D periodic cell
30              cell = data.create_cell([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], pbc=[True, True, True])
31              # Parse cell vectors
32              for i in range(3):
33                  cell[:, i] = [float(c) for c in f.readline().split()]
34
35              # Parse particle count
36              particle_count = int(f.readline().split()[1])
37              # Create new particles container object
38              particles = data.create_particles(count=particle_count)
39              # Create particles type property
40              type_property = particles.create_property("Particle Type")
41              # Create position property
42              position_property = particles.create_property("Position")
43
44              # Skip "Reduced coordinates" line
45              f.readline()
46
47              # Parse particle types and coordinates
48              for i in range(particle_count):
49                  tokens = f.readline().split()
50                  # Register named particle type and get its numeric ID assigned by OVITO
51                  type_property[i] = type_property.add_type_name(tokens[0], particles).id
52                  # Convert reduced to cartesian particle coordinates
53                  position_property[i] = cell @ ([float(c) for c in tokens[1:]] + [1.0])
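
The coordinate conversion in line 53 multiplies the 3x4 cell matrix by the reduced coordinates extended with a trailing 1.0, so that the fourth matrix column (the cell origin) is added to the result. The following pure-Python sketch spells out that product for one atom of the first frame; the helper reduced_to_cartesian is a hypothetical name used only here, and the zero origin column reflects the all-zero matrix the cell is created with above.

```python
def reduced_to_cartesian(cell, reduced):
    """Apply a 3x4 affine cell matrix to a reduced coordinate triple.

    The first three columns of `cell` are the cell vectors and the fourth
    column is the cell origin. Appending 1.0 to the reduced coordinates
    makes the matrix product add the origin automatically.
    """
    homogeneous = list(reduced) + [1.0]
    return [sum(row[k] * homogeneous[k] for k in range(4)) for row in cell]

# Cell matrix of the first frame (Timestep 0): cell vectors as columns,
# origin (fourth column) left at zero.
cell = [
    [7.865954, 0.0, 0.0, 0.0],
    [0.0, 7.865954, 0.0, 0.0],
    [0.0, 0.0, 6.973035, 0.0],
]

# Second Sn atom of the first frame, given in reduced coordinates.
position = reduced_to_cartesian(cell, [0.5, 0.5, 0.49999997])
print(position)
```

For this atom, which sits almost exactly at the cell center, the result is approximately (3.933, 3.933, 3.487), i.e. half of each cell edge length.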

A note on performance

This custom file reader exhibits two potential performance issues:

  1. For long trajectories, seeking to a requested frame by reading all lines preceding that frame will become slower and slower for later frames.

  2. For simulations containing many atoms, performing many property assignments such as type_property[i] = ... can become a bottleneck.

These issues will be addressed in the next example, Example FR2: Optimizing the performance of Example FR1.
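
As a rough illustration of the first issue, one possible remedy is to register the byte offset of each frame instead of its line number, so that a later read can jump directly to the frame with seek. The sketch below is an assumption about how this could be done in plain Python and is not necessarily the approach taken in Example FR2; scan_byte_offsets, read_frame_label, and the stub frames (which omit the cell and atom lines) are hypothetical.

```python
import os
import tempfile

def scan_byte_offsets(filename, register_frame):
    """Register the byte offset of every 'Timestep' line."""
    # Binary mode keeps f.tell() well defined while reading line by line.
    with open(filename, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if line.startswith(b"Timestep "):
                register_frame(frame_info=offset, label=line.decode().strip())

def read_frame_label(filename, offset):
    """Jump straight to a frame without reading the preceding lines."""
    with open(filename, "rb") as f:
        f.seek(offset)
        return f.readline().decode().strip()

# Stub file with two frame headers (cell and atom lines omitted for brevity).
sample = (
    "Timestep 0\nAtoms 0\nReduced coordinates\n"
    "Timestep 100\nAtoms 0\nReduced coordinates\n"
)

offsets = []
with tempfile.NamedTemporaryFile(
    "w", suffix=".ExampleFormat", delete=False, newline="\n"
) as tmp:
    tmp.write(sample)
try:
    scan_byte_offsets(tmp.name, lambda frame_info, label: offsets.append(frame_info))
    labels = [read_frame_label(tmp.name, offset) for offset in offsets]
finally:
    os.remove(tmp.name)

print(labels)  # ['Timestep 0', 'Timestep 100']
```

With this scheme, the cost of locating a frame no longer grows with its position in the file, because seek does not need to read the preceding frames.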