Nov 13 2017
 

The CRYSTALS v6720 installer is now available.

This update uses a new 64-bit Fortran compiler, fixes a number of bugs and eliminates a few mysterious crashes. Please report problems. The version number now refers to a specific snapshot of the source code enabling us to better identify and fix bugs.

Key changes between v14.6237 and 14.6720
64-bit compiler, bundled MKL math libraries – improved stability. Updated icon.
Improved SHELX import for atoms on special positions, SADI, DFIX and MPLA restraints, long form factors.
Detect NaN (is_nan) values in plot data.
New set of displacement parameter restraints, including TLS.
Better #CHECK output for restraints – including individual leverages.
Fix misreport of symmetry operators in CIF for C2/m.
Dynamic threshold of Q peak height in Fourier peak viewer script.
Allow paths containing single quotes.
Add Greek symbols to text and graphs for more concise display.
Leverages calculation via #sfls, refine punch=leverages, end
Changes related to getting consistent results for Thmax in the CIF.
Fixed listbox operation: clicking in a list now generates appropriate feedback for scripts.
Least squares matrix condition number output now included for problem refinement diagnosis.
Avoid almost infinite loop looking for bonds if atom coords have gone bad. Side effect – bonds outside +/- 50 unit cells of the origin won’t be found.
New choice of matrix invertors for SFLS. Default can be set in L23.
Fix default hkl format when re-reading ‘from-cif.hkl’ files (3F4.0,2F10.0).
Add a few more instrument/diffractometer types (synchrotron, neutron sources, etc.)
Modify platanom calculation: (i) use more decimal places for improved precision; (ii) use c and d terms to correct data with all anomalous scattering effects removed, as detailed in the final version of the ‘HUG and SQUEEZE’ manuscript.
Compute dispersion correction terms and photon interaction cross sections for data collected at non-standard wavelengths (i.e. other than Mo or Cu). Alter xinlist3 to streamline the experience.
Undo dialog: fix situation where current L5 is marked as error list.
Check for isotropic / unknown atoms before attempting to SPLIT in xsplits.ssc
Make L11 depend on L28: if the L28 filters change, the VCV becomes invalid and re-refinement is required. This avoids publishing a CIF after omitting reflections without running extra cycles.
Fix storage of converge checkbox state so it isn’t overridden when called from xwrite5.
Edlist3: revamp user interface to simplify. Added get/set mu values from L29 for each element.
Add Helium to properties file.
LIST 6 version incremented to 124. NEW DSC files will not be backwards compatible with older versions of CRYSTALS.
Better treatment of twins in Fourier maps.
Fix output of RIDE constraints (better refinement stability).
Reduce decimal places for angles with no esd (e.g. fixed by symmetry constraint, or not refined) from 3 to 2 in CIF output. Avoids reporting e.g. 179.995 degrees for exactly 180 degree angles.
Add internal vs external variance plot.
Fix switching on of extinction from the Guide’s ‘Add extinction parameter’ dialog.

 

Oct 23 2017
 

Acta Crystallographica, 2017, C73, 845–853. [ doi:10.1107/S2053229617013304 ]

Using an approximate correction to the X-ray scattering from disordered, resonantly scattering regions of crystal structures we have developed and tested a procedure (HUG) to recover the absolute structure using conventional Flack x refinement or other post-refinement determination methods.

Apr 08 2016
 

The 2016 British Crystallographic Association Spring Meeting took place at the University of Nottingham from 4th – 7th April. Contributions from Chem. Cryst. staff and students were:

Jerome G. P. Wicker, Bill I. F. David & Richard I. Cooper
When will it Crystallise? (Talk in session: From Amorphous to Crystal)

Jo Baker & Richard I. Cooper
Making and Measuring Photoswitchable Materials (Talk in session: Young Crystallographers’ Satellite)

Pascal Parois, Karim J. Sutton & Richard I. Cooper
On the application of leverage analysis to parameter precision using area detector strategies (Poster)

Oliver Robshaw & Richard I. Cooper
The role of molecular similarity in crystal structure packing (Poster)

Katie McInally & Richard I. Cooper
Linking crystallization prediction, theory and experiment using solubility curve determination (Poster)

Richard I. Cooper, Pascal Parois & David J. Watkin
Non-routine single crystal structure analyses using CRYSTALS (Poster)

Alex Mercer & Richard I. Cooper
Fitting Disordered Crystal Structures by Simulated Annealing of an Ensemble Model (Poster)

 

May 15 2013
 

Pascal is a senior post-doctoral researcher working on refinement and analysis of diffraction from very short-lived excited-state chemical species. He obtained a PhD with Dr Mark Murrie at Glasgow University studying the effects of pressure on single-molecule magnets, and has since held posts at Utrecht University and the University of Nancy working on software development and time-resolved diffraction. Pascal maintains a personal blog on chemical and crystallographic software matters, and in his spare time enjoys hiking and genealogy.

May 15 2013
 

Crystallographic structure refinement can involve hundreds of millions of calculations for a single iteration; careful optimisation plays an important role in determining how efficiently the software makes use of the available CPU power. The following freely available tools help identify bottlenecks in software implementations, and allow testing of potentially faster algorithms and compiler options. On recent CPUs, a carefully optimised algorithm can easily be ten times faster than a naïve implementation. Not only does this save time, but it also enables the use of larger data sets and more complicated models to tackle ever more challenging problems. A time-critical portion of code from the crystallographic refinement package CRYSTALS is analysed here.

All the software tools discussed are free and open source, running on the Linux operating system. Some of them are not available on Windows.

Profiling

The first optimisation step is profiling the execution of the existing code and algorithms. This will reveal exactly how much time is spent in functions, lines of code, and even assembly instructions. Two approaches are common:

  1. Emulating a CPU in software and then counting every instruction executed. This is the method used in valgrind. It can also look for memory leaks. Because the CPU is emulated, it can take up to 200 times longer than normal code execution.
  2. Exploiting hardware counters inside the CPU. These counters can be sampled at fixed intervals and recorded. While there is almost no performance penalty compared to normal execution, the results can be slightly inaccurate. The perf tool from the Linux kernel exploits these counters.

KCacheGrind: the coloured regions correspond to different functions in the software, while the area of each region corresponds to the time spent in that function during execution.

This example uses the least-squares refinement routine (\sfls) in the crystallographic analysis package CRYSTALS. A decent-sized data set (http://dx.doi.org/10.1107/S1600536813007757) from the journal Acta Crystallographica Section E has been used. The command-line version of CRYSTALS (COMPCODE=LIN) was compiled on Linux using the open-source compiler gfortran. The executable was then profiled using valgrind and the profiling data were analysed with kcachegrind. The output includes a graphical map (shown above) in which coloured regions correspond to different functions in the software and the area of each region corresponds to the time spent in that function during execution.

 

The same data has been analysed using the software perf with the following result:

89.94% crystals crystals      [.] adlhsblock_
 3.71% crystals crystals      [.] xchols_
 2.90% crystals crystals      [.] xsflsx_
 0.89% crystals libm-2.17.so  [.] __expf_finite
 0.44% crystals crystals      [.] xzerof_

In both cases, the profile analysis reveals that about 90% of the time is spent in the adlhsblock function, which is just 21 lines long including declarations. The body of the function is shown below (accumula.F, revision 1.8).

I = 1
do ROW=1, BLOCKdimension  ! Loop over all the rows of the block
    CONST = DERIVS(ROW)   ! Get the constant term for this row
    do COLUMN = ROW, BLOCKdimension
        MATBLOCK(I) = MATBLOCK(I) + CONST*DERIVS(COLUMN) ! Accumulate the next term
        I = I + 1         ! Move to the next position in the packed matrix
    end do
end do

Instruction level analysis

The adlhsblock function forms the normal matrix from the design matrix; mathematically it computes the matrix product ZᵀZ. To save memory, the design matrix is not stored completely and the calculation is done reflection by reflection, multiplying and accumulating the outer product of one row of Z. Furthermore, only the upper triangle of the normal matrix is stored, which reduces the number of operations but makes for convoluted row/column indexing of the elements.
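
In matrix terms (a sketch of the calculation described above, writing zᵣ for the row of derivatives contributed by reflection r and n for the block dimension; the packed-index formula is inferred from the revised loop shown later in this post):

\[
N \;=\; Z^{\mathsf T} Z \;=\; \sum_{r} z_r\, z_r^{\mathsf T},
\qquad
I(p,q) \;=\; \frac{(p-1)\,(2n-p+2)}{2} + (q-p+1), \quad q \ge p ,
\]

where I(p,q) is the position of element (p, q) of the upper triangle in the packed, row-by-row storage of the normal matrix.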

Further investigation of the code profiling within the function indicates that the bottleneck is on the line:

MATBLOCK(I) = MATBLOCK(I) + CONST*DERIVS(COLUMN)

The annotated assembly also reveals the use of scalar instructions (vmulss and vaddss operate on one single-precision value at a time).

 0.09 │70:  vmovss (%rsi),%xmm0
21.21 │     add $0x4,%rsi
      │         MATBLOCK(I) = MATBLOCK(I) + CONST*DERIVS(COLUMN)
 0.05 │78:  vmulss %xmm0,%xmm1,%xmm0
12.24 │     movslq %ecx,%rcx 
 0.12 │     lea -0x4(%rdx,%rcx,4),%rcx
20.42 │     vaddss (%rcx),%xmm0,%xmm0
13.35 │     vmovss %xmm0,(%rcx)
      │         I = I + 1
20.91 │     mov %eax,%ecx
 0.06 │     add $0x1,%eax

Modern CPUs include two kinds of processing unit: scalar units (which process one input at a time) and vector units (which can process multiple inputs at the same time with the same operation). The latter instructions are called SIMD. The performance of SIMD can be outstanding compared to scalar instructions: on a Sandy Bridge Intel processor the vector instructions can operate on up to eight single-precision numbers at the same time. Compilers can automatically use these instructions based on patterns in the source code (see http://gcc.gnu.org/projects/tree-ssa/vectorization.html). However, it is advisable to always check whether a loop has been vectorised as expected, as some restrictions may apply (see http://blog.debroglie.net/2013/04/14/autovectorization-not-vectorized-not-suitable-for-gather/). The use of scalar instructions here is symptomatic of suboptimal optimisation.
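
As an illustration only (not CRYSTALS code), here is a self-contained update written in Fortran array syntax. Element-wise, dependency-free operations of this form are the kind that gfortran will typically auto-vectorise at -O3, or at -O2 together with -ftree-vectorize; -fopt-info-vec (or the older -ftree-vectorizer-verbose) reports whether vectorisation actually happened.

program vector_sketch
  implicit none
  integer, parameter :: n = 100000
  real, allocatable  :: a(:), b(:)
  real :: c

  allocate(a(n), b(n))
  call random_number(a)
  call random_number(b)
  c = 2.5

  ! Element-wise update: every element is independent, so the compiler
  ! is free to process several of them per SIMD instruction.
  a = a + c*b

  print *, sum(a)
end program vector_sketch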

Optimisation and analysis

The standard optimisation-level compiler switch for CRYSTALS in the Linux makefile is ‘-O2’, which does not include autovectorisation (autovectorisation is enabled at ‘-O3’). CRYSTALS was therefore compiled with autovectorisation enabled (-ftree-vectorize -msse2), but this did not give any speed-up. Adding the flag -ftree-vectorizer-verbose and checking the output during compilation confirmed that no loop had been vectorised. To improve the situation the inner loop was removed and replaced with array operations, and the recursive dependency on the index was removed.

do ROW = 1, BLOCKdimension
   I = ((ROW-1)*(2*BLOCKdimension-ROW+2))/2 + 1   ! Start of row ROW in the packed upper triangle
   J = I + BLOCKdimension - ROW                   ! End of row ROW
   MATBLOCK(I:J) = MATBLOCK(I:J) + DERIVS(ROW)*DERIVS(ROW:BLOCKdimension)
end do

The new version was compared to the original at different levels of optimisation:

Compilation flags               Original code (wall-clock time, s)   New code (wall-clock time, s)
-O2                             16                                   12
-O2 -ftree-vectorize -msse2     16                                   6.7
-O2 -ftree-vectorize -mavx      16                                   5.0

The improvement without vectorisation (16 s to 12 s) is perhaps surprising: because each iteration of the loop is independent, the scheduler has greater flexibility to reorder instructions for better efficiency. When sse2 or avx instructions are used the new version is much faster still, and the doubled width of the avx vectors compared to sse is also visible in the timings.
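
For reference, the overall speedups implied by these wall-clock times (simple ratios taken from the table above):

\[
\frac{16}{12} \approx 1.3\times \ (\text{-O2 only}), \qquad
\frac{16}{6.7} \approx 2.4\times \ (\text{sse2}), \qquad
\frac{16}{5.0} = 3.2\times \ (\text{avx}),
\]

consistent with the roughly threefold speed-up quoted in the conclusion below.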

The new code was profiled using perf and compared to the original. The bottleneck remains in the adlhsblock function, but the assembly output confirms the use of vectorised avx instructions (vmulps and vaddps, for example).

85.75% crystals crystals     [.] adlhsblock_
 5.40% crystals crystals     [.] xchols_
 4.40% crystals crystals     [.] xsflsx_
 1.30% crystals libm-2.17.so [.] __expf_finite
 0.61% crystals crystals     [.] xzerof_
      |         MATBLOCK(i:j) = MATBLOCK(i:j)+DERIVS(ROW)*derivs(row:BLOCKdimension)
 2.35 |15a: vmovup (%r11,%rcx,1),%xmm1
 8.42 |     add $0x1,%r8
 4.73 |     vinser $0x1,0x10(%r11,%rcx,1),%ymm1,%ymm1
 7.91 |     vmulps %ymm2,%ymm1,%ymm1
 8.82 |     vaddps (%r14,%rcx,1),%ymm1,%ymm1
41.82 |     vmovap %ymm1,(%r14,%rcx,1)
12.10 |     add $0x20,%rcx 
 2.62 |     cmp %r8,%r13
      |   ? ja 15a

 

Using code profiling to identify a bottleneck, followed by optimisation of the algorithm and an appropriate choice of compiler switches, results in least-squares refinement that is up to three times faster.

Apr 30 2012
 

J. Appl. Cryst. (2012), 45, 417-429. [ doi:10.1107/S0021889812015191 ]

Leverages measure the influence that observations (intensity data and restraints) have on the fit obtained in crystal structure refinement. Further analysis enables the influence that observations have on specific parameters to be measured. The results of leverage analyses are discussed in the context of the amino acid alanine and an incomplete high-pressure data set of the complex bis(salicylaldoximato)copper(II). Leverage analysis can reveal situations where weak data are influential and allows an assessment of the influence of restraints. Analysis of the high-pressure refinement of the copper complex shows that the influence of the highest-leverage intensity observations increases when completeness is reduced, but low leverages stay low. The influence of restraints, notably those applying the Hirshfeld rigid-bond criterion, also increases dramatically. In alanine the precision of the Flack parameter is determined by medium-resolution data with moderate intensities. The results of a leverage analysis can be incorporated into a weighting scheme designed to optimize the precision of a selected parameter. This was applied to absolute structure refinement of light-atom crystal structures. The standard uncertainty of the Flack parameter could be reduced to around 0.1 even for a hydrocarbon.
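
For readers unfamiliar with the term, a standard weighted least-squares definition of leverage (a general textbook form, not quoted from the paper):

\[
\hat{y} \;=\; H\,y, \qquad H \;=\; Z\,(Z^{\mathsf T} W Z)^{-1} Z^{\mathsf T} W, \qquad h_i \;=\; H_{ii},
\]

where Z is the design matrix of derivatives, W the weight matrix, and the leverage h_i measures how strongly observation i determines its own fitted value.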

Electronic reprints

Publisher’s copy

Sep 28 2011
 

David is exploring the different ways of describing ring puckering and conformation. By using data in the CSD and avoiding the lab as much as possible, he hopes to create an amalgamated approach to defining ring puckering and conformation that could be implemented in refinement software. If he isn’t hiding in the CRL basement he’s most likely to be found doing some form of college sport.

Sep 13 2011
 

J. Appl. Cryst. (2011), 44, 1017-1022. [ doi:10.1107/S0021889811034066 ]

A summary of the features for investigating absolute structure available in the crystallographic refinement program CRYSTALS is presented, together with the results of analyses of 150 light-atom structures collected with molybdenum radiation carried out with these tools. The results confirm that the Flack and Hooft parameters are strongly indicative, even when the standard uncertainties are large compared to the thresholds recommended by Flack & Bernardinelli [J. Appl. Cryst. (2000), 33, 1143–1148].

Electronic reprints

  • Oxford University Research Archive [direct pdf]

Publisher’s copy