Manipulating data

This chapter discusses how the contents of a DataCollection can be modified or amended from Python code, e.g., how you can manipulate the values of particle properties or give particles additional properties. Please make sure to read the preceding section first, which describes how the data structures in a DataCollection container are organized.

Data ownership

Announcing object modification

Assigning new particle properties

Assigning new bond properties

Modifying property values

Adding global attributes

Modifying the simulation cell

In-place application of modifiers

Data ownership

OVITO’s data pipeline system tries to avoid unnecessary and expensive data copies as far as possible. In order to achieve this goal, data objects follow a shared ownership model. That means they may be part of more than one DataCollection at the same time. For instance, when you duplicate an existing DataCollection using its clone() method, you will obtain a second data collection containing the exact same data objects as the original collection:

data2 = data.clone()
assert data2.particles is data.particles

The assertion statement above shows that the original data collection and the cloned collection both reference one and the same Particles object in memory. In other words, the Particles data object has not been copied; it has merely been inserted into the second data collection as well. It is now owned simultaneously by both data collections.

This type of “shallow” copying is used by OVITO’s data pipeline system in many situations. For instance, the FileSource of a pipeline always keeps the original DataCollection that was loaded from the input file in memory. (You can access it through the FileSource.data field.) This unmodified master copy of the input data enables quick re-evaluations of the data pipeline in case a modifier is being changed, without needing to reload the input data again from disk. Modifiers in the pipeline always operate on a copy of the original DataCollection, and it is the final copy that the Pipeline.compute() method returns to the caller. However, it would be inefficient for the system to fully copy the master DataCollection, including all its contents, because modifiers typically change just a small portion of a dataset at a time, or they just amend the dataset with some newly computed information without altering any of the existing data objects at all.

This is where so-called shallow object copies come into play. It is okay to share data objects between more than one data collection as long as these data objects are not modified. In case you intend to modify a particular data object, for example, when implementing a user-defined modifier function, you first have to make sure that the data object is exclusively owned by just one DataCollection. Otherwise, unwanted side effects would occur because changing the data object in your data collection would also affect the contents of another data collection owned by someone else.

Note

Data objects can be part of more than one data collection (or another type of container) at a time. In that case, only read access to the shared data object is allowed, because modifying it would cause unexpected side effects in the other containers.
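The difference between sharing and copying can be illustrated with plain Python objects. The following is only a minimal sketch using the standard copy module: the classes ToyCollection and ToyParticles are hypothetical stand-ins invented for this illustration and are not part of the ovito module, nor do they reflect how OVITO implements its data objects internally.

```python
import copy

# Hypothetical stand-ins for a data collection and one of its data objects.
# They are NOT part of the ovito module.
class ToyParticles:
    def __init__(self, positions):
        self.positions = positions

class ToyCollection:
    def __init__(self, particles):
        self.particles = particles

original = ToyCollection(ToyParticles([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))

# A shallow copy duplicates only the container; the contained object is shared.
shallow = copy.copy(original)
shares_object = shallow.particles is original.particles

# Modifying the shared object through one container is visible through the
# other one too. This is the kind of side effect the ownership rules prevent.
shallow.particles.positions[0] = (5.0, 5.0, 5.0)
side_effect_visible = original.particles.positions[0] == (5.0, 5.0, 5.0)

# A deep copy duplicates the contained object as well, at a higher cost.
deep = copy.deepcopy(original)
deep_is_independent = deep.particles is not original.particles
```

The shallow copy is cheap but leaves both containers pointing at the same object, which is why a shared object should only be read and not modified.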

Announcing object modification

OVITO’s Python programming interface has safeguards in place that prevent you from accidentally modifying data objects that are shared with some other part of the program. Consider the following attempt to change the boundary condition flags of a SimulationCell object in a data collection returned by the pipeline system:

>>> data = pipeline.compute()
>>> data.cell.pbc = (True, True, False)
RuntimeError: You tried to modify a SimulationCell object that is currently shared by
multiple owners. Please explicitly request a mutable version of the data object by
using the '_' notation.

The attempt to assign a new value to the cell’s pbc field raised an error, because OVITO detects that the SimulationCell object is not only part of the data collection returned by Pipeline.compute() but also of another internal data collection owned by the pipeline. Thus, modifying the simulation cell object is not valid because it would cause side effects on the internal state of OVITO.

The solution is to first make a unique copy of the SimulationCell object before modifying it. The programming interface provides a handy shortcut notation for this:

>>> data.cell_.pbc = (True, True, False)

The underscore suffix appended to the cell field signals to the DataCollection that you intend to modify the simulation cell object. Behind the scenes, the data collection will check whether the cell object is shared by multiple owners. If so, it will create an actual copy of the data object and replace the original reference in this data collection with the copy. This ensures that the returned SimulationCell object is exclusively owned by your data collection, making it safe for you to modify it.

Note

Before modifying a data object that is referenced by multiple owners (i.e., that is part of more than one parent container), the child object must be replaced with an exclusively owned copy of the original, which is safe to modify. OVITO’s Python interface provides the underscore notation to perform this copy-and-replacement step if needed. You can consider the underscore suffix as a means to signal your intention to modify the object.
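To make the copy-and-replace step more concrete, here is a toy model of the idea in plain Python. It is a sketch under simplifying assumptions and not OVITO's actual implementation: the classes ToyObject and ToyCollection, the owners counter, and the add() method are all hypothetical names made up for this illustration. Unlike the real interface, the toy model also does not raise an error when a shared object is modified without the underscore.

```python
import copy

class ToyObject:
    """Hypothetical data object that counts how many containers reference it."""
    def __init__(self, **fields):
        self.owners = 0
        self.__dict__.update(fields)

class ToyCollection:
    """Hypothetical container; a stand-in for illustration only."""
    def __init__(self):
        self._objects = {}

    def add(self, name, obj):
        obj.owners += 1
        self._objects[name] = obj

    def clone(self):
        # Shallow clone: the new collection references the same objects.
        other = ToyCollection()
        for name, obj in self._objects.items():
            other.add(name, obj)
        return other

    def __getattr__(self, name):
        # Invoked only for names that are not regular attributes.
        objects = self.__dict__.get("_objects", {})
        mutable = name.endswith("_")
        key = name[:-1] if mutable else name
        if key not in objects:
            raise AttributeError(name)
        obj = objects[key]
        if mutable and obj.owners > 1:
            # Shared with another owner: swap in a private copy first.
            obj.owners -= 1
            obj = copy.copy(obj)
            obj.owners = 1
            objects[key] = obj
        return obj

data = ToyCollection()
data.add("cell", ToyObject(pbc=(True, True, True)))
data2 = data.clone()

shared_before = data2.cell is data.cell   # both collections hold the same object
data2.cell_.pbc = (True, True, False)     # copies the object, then modifies the copy
shared_after = data2.cell is data.cell    # data2 now holds its own private copy
```

In this model, the copy is made only on the first underscore access. Once the object is exclusively owned, further accesses through cell_ return the same object without copying it again.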

Assigning new particle properties

New per-particle properties are assigned to a particle system using the PropertyContainer.create_property() method. In addition to the name of the property, this method accepts an optional NumPy array as argument for initializing the per-particle values of the new property:

import numpy

# Generate one random RGB color per particle and store it as the 'Color' property:
color_values = numpy.random.random_sample(size=(data.particles.count, 3))
data.particles_.create_property('Color', data=color_values)

Note that we used the underscore version of the DataCollection.particles_ field here in order to request a modifiable version of the Particles container object. This is necessary because create_property() adds a new Property object to the particles container object, which may be implicitly shared by multiple data collections (see previous section).

If a particle property with the given name already exists in the Particles container, its contents will be overwritten with the per-particle values provided in the NumPy array. For details, please see the documentation of the create_property() method.

Assigning new bond properties

As in the case of particles, new properties can also be assigned to bonds using the create_property() method. Bond properties are stored in the Bonds container object, which is owned by the parent Particles object. Thus, adding a new bond property represents a change to the nested Bonds object as well as to the parent Particles object. Therefore, we need to request modifiable versions of both objects in this case (particles_.bonds_):

color_values = numpy.random.random_sample(size=(data.particles.bonds.count, 3))
data.particles_.bonds_.create_property('Color', data=color_values)

Modifying property values

Manipulating the values of existing particle or bond properties is as easy as changing values in a NumPy array; just make sure you use the underscore notation introduced above to make the property array modifiable:

# Move the first particle to new XYZ coordinates:
data.particles_.positions_[0] = (10.0, 8.0, 4.5)

# Displace the first particle by 2 length units along the x-axis:
data.particles_.positions_[0] += (2.0, 0, 0)

# Displace all particles by 1 length unit along the z-axis:
data.particles_.positions_[...] += (0, 0, 1.0)

You can also make use of NumPy’s advanced indexing capabilities to modify just a subset of the elements:

# Displace only selected particles, i.e. those whose 'Selection' property is non-zero:
data.particles_.positions_[data.particles.selection != 0] += (0, 0, 1)

# Alternatively, modify just the z-component of each position vector:
data.particles_.positions_[data.particles.selection != 0, 2] += 1

For more information, please see also the documentation of the Property class.
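The indexing expressions above follow standard NumPy semantics, so they can be tried out on ordinary arrays first. The sketch below uses two stand-in arrays, positions and selection, which are assumed to mimic the shapes of the 'Position' and 'Selection' properties. No OVITO objects are involved.

```python
import numpy as np

# Stand-in arrays shaped like the 'Position' and 'Selection' properties.
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [2.0, 0.0, 0.0]])
selection = np.array([1, 0, 1])

# Boolean mask with one entry per particle.
mask = selection != 0

# Add a vector to every selected row; (0, 0, 1) is broadcast across the rows.
positions[mask] += (0, 0, 1)

# Address only the z-column of the selected rows.
positions[mask, 2] += 1

selected_z = positions[mask, 2].tolist()
unselected_z = positions[~mask, 2].tolist()
```

Both statements modify the array in place, and rows whose mask entry is False are left untouched.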

Adding global attributes

Global attributes are primitive values (numeric values/text strings) associated with a DataCollection as a whole, for instance, the simulation time of the current snapshot or the name of the input file from which the dataset was originally loaded. Furthermore, some analysis modifiers output their computation results as new global attributes and add them to the DataCollection. Global attributes are stored within the attributes Python dictionary of the DataCollection. Thus, adding, removing, or changing attributes is as simple as modifying a Python dictionary, e.g.:

data.attributes['dislocation_density'] = (
    data.attributes['DislocationAnalysis.total_line_length'] / data.cell.volume
)

This code adds the new attribute dislocation_density to the DataCollection, whose value is calculated from the ratio of the total dislocation line length in a crystal (which has previously been computed by a DislocationAnalysisModifier in this example) and the simulation cell’s volume.

Note

When modifying the attributes dictionary, the underscore notation is not needed, because the attributes dictionary is not an object that is ever shared between more than one data collection.

Modifying the simulation cell

The following code demonstrates how to set the three vectors spanning the SimulationCell by modifying the individual columns of the 3x4 cell matrix:

data.cell_[:,0] = (3.15, 0, 0)  # Cell vector 'a'
data.cell_[:,1] = (0, 3.15, 0)  # Cell vector 'b'
data.cell_[:,2] = (0, 0, 3.30)  # Cell vector 'c'
data.cell_[:,3] = (0, 0, 0)     # Cell origin

Note that this will only change the dimensions of the simulation cell without affecting the contents of the cell, e.g., the particle coordinates. If you want to rescale the cell together with the contents, consider using AffineTransformationModifier.
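For illustration, the arithmetic behind rescaling the contents together with the cell can be written down with plain NumPy arrays. This is a sketch of the underlying math only, assuming the 3x4 layout described above (three cell vectors plus origin as columns). It does not show how AffineTransformationModifier is implemented, and the array names are chosen just for this example.

```python
import numpy as np

# 3x4 cell matrices: columns 0-2 are the cell vectors, column 3 is the origin.
old_cell = np.array([[4.0, 0.0, 0.0, 0.0],
                     [0.0, 4.0, 0.0, 0.0],
                     [0.0, 0.0, 4.0, 0.0]])
new_cell = old_cell.copy()
new_cell[:, 2] = (0.0, 0.0, 8.0)   # double the length of cell vector 'c'

positions = np.array([[1.0, 2.0, 1.0],
                      [3.0, 0.0, 2.0]])

# Convert Cartesian positions to reduced (fractional) coordinates of the old cell.
reduced = np.linalg.solve(old_cell[:, :3], (positions - old_cell[:, 3]).T).T

# Map the reduced coordinates back to Cartesian coordinates in the new cell.
rescaled = reduced @ new_cell[:, :3].T + new_cell[:, 3]
```

Because the reduced coordinates are kept fixed, each particle stays at the same relative location inside the cell while its Cartesian z-coordinate is stretched along with cell vector 'c'.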

In-place application of modifiers

The built-in modifiers from the ovito.modifiers module are normally employed within a Pipeline, which produces a new DataCollection upon calling its compute() method. But it is also possible to directly apply a modifier to a trajectory snapshot and let it modify the data in-place. This is done by invoking the apply() method on a DataCollection:

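from ovito.modifiers import ExpressionSelectionModifier, DeleteSelectedModifier

# Select all particles below the energy threshold, then delete them:
threshold = -3.6
data.apply(ExpressionSelectionModifier(expression=f"PotentialEnergy < {threshold}"))
data.apply(DeleteSelectedModifier())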

In this example, the two ad-hoc modifiers act on the data stored in the collection. This type of direct modifier application to an existing data collection can, in some situations, be easier than building and evaluating an entire pipeline containing the modifier(s). However, it only works for one trajectory snapshot at a time.

A typical use case for the DataCollection.apply() method is in Python modifier functions, where it allows you to invoke one of OVITO's built-in modifiers as a sub-operation.