
Parallel processing traj files

Hello,

I am interested in pre-processing a massive trajectory file. How do I leverage parallel computing power using OVITO?

My current script is serial, as follows:

from ovito.io import import_file
from ase.calculators.singlepoint import SinglePointCalculator

def read_lammps_dump(fdump, nequil=None, iframes=None):
  pl = import_file(fdump)  # pipeline; scans the whole dump file once
  nframe = pl.source.num_frames
  traj = []
  if iframes is None:
    iframes = range(nframe)
  if nequil is not None:
    iframes = iframes[nequil:]  # skip equilibration frames
  for iframe in iframes:
    dc = pl.compute(iframe)  # data collection for this frame
    atoms = dc.to_ase_atoms()
    atoms.info['Timestep'] = dc.attributes['Timestep']
    # keep only atomic numbers and positions in the ASE arrays
    keys_to_delete = [key for key in atoms.arrays.keys()
                      if key not in ['numbers', 'positions']]
    for key in keys_to_delete:
      del atoms.arrays[key]
    # any results to add? (get_particle_results is my own helper, defined elsewhere)
    results = get_particle_results(dc)
    if 'Charge' in results:  # add charges
      atoms.set_initial_charges(results['Charge'])
    add_results = {}
    lmp_names = ['Force', 'Dipole Orientation']
    ase_names = ['forces', 'dipole']
    for lmp_name, ase_name in zip(lmp_names, ase_names):
      if lmp_name in results:
        add_results[ase_name] = results[lmp_name]
    traj.append(atoms)
    if len(add_results) > 0:
      calc = SinglePointCalculator(atoms)
      calc.results.update(add_results)
      atoms.set_calculator(calc)  # atoms.calc = calc in newer ASE versions
  return traj

To read frames from the trajectory, is there a way I can split the for loop across different cores so that they all run in parallel, with only small modifications to the existing custom code? (I have pasted just the reading part of my code.)

My trajectory file is about 0.5 TB.

 

Abhi

 

Hi Abhi,

This is not an easy question. I have only little personal experience in this area, and I am still thinking about possible advice I could give you in this situation.

Perhaps the most important question I have for you is whether your input trajectory is stored as one big dump file or as a series of smaller files, one per frame. Loading a big trajectory dump file using import_file(...) incurs an extra cost, because OVITO has to scan the entire file from beginning to end once to determine the number of frames and their byte offsets.
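If you want to see how much that scanning step costs for your file, a quick timing check like this would tell you (just a sketch; 'big_trajectory.dump' stands in for your actual file name):

import time
from ovito.io import import_file

t0 = time.time()
pl = import_file('big_trajectory.dump')  # this triggers the scan of the whole dump file
print('scan took %.1f s, found %d frames' % (time.time() - t0, pl.source.num_frames))

t0 = time.time()
pl.compute(0)  # loading a single frame afterwards is much cheaper
print('first frame took %.1f s' % (time.time() - t0))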

Maybe you can answer that question first and then we can start discussing possible parallelization strategies.

-Alex

Alex,

 

Yeah, it's a single big file.

Alex,

Do you have anything to add, or can we start the discussion? Would it be possible to implement a batch_size option if the user provides the number of frames in the dump file as input? Then the OVITO reader would not have to scan over all snapshots in the dump, but could just read {batch_size} snapshots.

 

Abhi

Abhi,

The import_file() function will always scan the entire trajectory file; I do not really see a good way around that. This makes attempts to split the processing into several partial ranges problematic, because each independent job would have to make its own call to import_file(), which would probably waste time.
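That said, if you want to try it anyway with only small changes to your current script, here is the kind of thing I have in mind: a rough, untested sketch using Python's multiprocessing, where each worker opens its own pipeline on the big file and processes a contiguous sub-range of frames. The function names, the 'spawn' start method, and the chunking are just my suggestions, and each worker still pays the full scanning cost:

import multiprocessing as mp

def process_frame_range(args):
  # each worker builds its own pipeline and therefore repeats the import_file() scan
  from ovito.io import import_file  # import OVITO inside the worker process
  fdump, start, stop = args
  pl = import_file(fdump)
  chunk = []
  for iframe in range(start, stop):
    dc = pl.compute(iframe)
    atoms = dc.to_ase_atoms()  # same conversion as in your serial read_lammps_dump()
    atoms.info['Timestep'] = dc.attributes['Timestep']
    chunk.append(atoms)  # do the rest of your per-frame post-processing here
  return chunk

def read_lammps_dump_parallel(fdump, nframe, nworkers=4):
  # split [0, nframe) into contiguous sub-ranges, one per worker
  bounds = [round(i * nframe / nworkers) for i in range(nworkers + 1)]
  tasks = [(fdump, bounds[i], bounds[i + 1]) for i in range(nworkers)]
  ctx = mp.get_context('spawn')  # 'spawn' tends to be safer than 'fork' with libraries holding internal state
  with ctx.Pool(nworkers) as pool:
    chunks = pool.map(process_frame_range, tasks)
  return [atoms for chunk in chunks for atoms in chunk]

if __name__ == '__main__':
  traj = read_lammps_dump_parallel('big_trajectory.dump', nframe=10000, nworkers=8)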

Things would be easier if your trajectory weren't one big file. When every simulation frame is stored in a separate dump file, OVITO can perform random access to the frames without scanning the entire trajectory first. In such a scenario, one can simply launch multiple instances of the Python script, each processing just a sub-range of the entire trajectory.
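Concretely, each instance could look roughly like this (untested sketch; I am assuming per-frame files matching a wildcard pattern and that the frame sub-range is passed on the command line):

import sys
from ovito.io import import_file

# usage: python process_range.py 'frame.*.dump' <start> <stop>
pattern, start, stop = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])

pl = import_file(pattern)  # wildcard pattern: one file per frame, no full-trajectory scan
stop = min(stop, pl.source.num_frames)
for iframe in range(start, stop):
  dc = pl.compute(iframe)
  # ... insert the per-frame processing from read_lammps_dump() here ...
  print(iframe, dc.attributes['Timestep'])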

I don't know if this is viable (your file is quite large), but perhaps you can split the trajectory file into partial files. I believe there are simple utilities out there that can split LAMMPS dump files quite efficiently.
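If you don't find a ready-made tool, the splitting itself does not need much code. A rough, untested sketch that simply starts a new output file whenever an 'ITEM: TIMESTEP' line appears (the output naming scheme is just an example):

def split_lammps_dump(fdump, prefix='frame'):
  # write every snapshot of a text-format LAMMPS dump file into its own file
  out = None
  iframe = 0
  with open(fdump) as fin:
    for line in fin:
      if line.startswith('ITEM: TIMESTEP'):
        if out is not None:
          out.close()
        out = open('%s.%06d.dump' % (prefix, iframe), 'w')
        iframe += 1
      if out is not None:
        out.write(line)
  if out is not None:
    out.close()
  return iframe  # number of snapshots written

# example: split_lammps_dump('big_trajectory.dump') -> frame.000000.dump, frame.000001.dump, ...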

As the developer of OVITO, I can try to address this limitation in a future version of the software. For example, the import_file() function could cache the byte offsets of the discovered trajectory frames in some sort of index file. Then a single scanning pass would be sufficient, and subsequent calls to import_file() with the same trajectory file would be much faster.

-Alex

Okay, what if I provide a single snapshot instead of the single big dump file?

Using the attached code, how would the code from the very first post need to change?

Abhi


Abhi,

Is your idea to map the entire trajectory file to memory and then pass a sub-range of the memory buffer to OVITO? I'm afraid this approach won't work. The current implementation of the import_file() function can only deal with files stored in the filesystem. It does not accept in-memory buffers as data sources.

I am not a Unix/Linux expert, but perhaps it is possible to write the frame data to something that looks like a regular file to OVITO, since import_file() expects a filesystem path. I am thinking of something like a RAM disk, which involves only memory transfers and no actual I/O to the hard drive.
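Something along these lines, for example (untested sketch; it assumes a Linux system where /dev/shm is a tmpfs mount and that frame_text already contains the text of a single snapshot):

import os
import tempfile
from ovito.io import import_file

def import_frame_from_memory(frame_text):
  # write the snapshot text to a tmpfs-backed file and let OVITO read it from there
  with tempfile.NamedTemporaryFile('w', suffix='.dump', dir='/dev/shm', delete=False) as f:
    f.write(frame_text)
    path = f.name
  try:
    pl = import_file(path)
    return pl.compute(0)  # the temporary file holds a single frame
  finally:
    os.remove(path)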

Furthermore, would you mind explaining what you are ultimately after? What is the big picture? I'm not sure I fully understand the goal of the script you posted initially. Are you using OVITO's import_file() function just to parse the LAMMPS trajectory file and then hand the atomic dataset over to ASE? Why is OVITO needed for this in the first place?

Thanks.
