.. _building-ookami:

Ookami (Stony Brook)
====================

The `Ookami cluster `__ is located at Stony Brook University.

Introduction
------------

If you are new to this system, **please see the following resources**:

* `Ookami documentation `__
* Batch system: `Slurm `__ (see `available queues `__)
* `Filesystem locations `__:

  * ``/lustre/home/`` (30 GByte, backed up)
  * ``/lustre/scratch/`` (14 day purge)
  * ``/lustre/projects/*`` (1 TByte default, up to 8 TByte possible, shared within our group/project, backed up; prefer this location)

We use Ookami as a development cluster for `A64FX `__.
The cluster also provides a few extra nodes, e.g. two ``Thunder X2`` (ARM) nodes.

Installation
------------

Use the following commands to download the WarpX source code and switch to the correct branch:

.. code-block:: bash

   git clone https://github.com/BLAST-WarpX/warpx.git $HOME/src/warpx

We use the following modules and environments on the system (``$HOME/warpx_gcc10.profile``).

.. literalinclude:: ../../../../Tools/machines/ookami-sbu/ookami_warpx.profile.example
   :language: bash
   :caption: You can copy this file from ``Tools/machines/ookami-sbu/ookami_warpx.profile.example``.

We recommend storing the above lines in a file, such as ``$HOME/warpx_gcc10.profile``, and loading it into your shell after each login:

.. code-block:: bash

   source $HOME/warpx_gcc10.profile

Then, ``cd`` into the directory ``$HOME/src/warpx`` and use the following commands to compile:

.. code-block:: bash

   cd $HOME/src/warpx
   rm -rf build

   cmake -S . -B build -DWarpX_COMPUTE=OMP -DWarpX_DIMS="1;2;3"
   cmake --build build -j 10

   # or (currently better performance)
   cmake -S . -B build -DWarpX_COMPUTE=NOACC -DWarpX_DIMS="1;2;3"
   cmake --build build -j 10

The general :ref:`cmake compile-time options ` apply as usual.

**That's it!** A 3D WarpX executable is now in ``build/bin/`` and :ref:`can be run ` with a :ref:`3D example inputs file `.
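As a quick sanity check of the build, the executable can be started on a single process for a few steps. This is a minimal sketch: the binary name and ``inputs_3d`` file name are placeholders (the exact binary name depends on your build options, and any 3D example inputs file from the repository works), and the ``max_step=10`` command-line override follows the usual AMReX-style parameter syntax.

.. code-block:: bash

   # smoke test: run a few steps on one process
   # (binary name and inputs file are placeholders; adjust to your build)
   cd $HOME/src/warpx
   ./build/bin/warpx.3d inputs_3d max_step=10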
Most people execute the binary directly or copy it out to a location in ``/lustre/scratch/``.

.. _running-cpp-ookami:

Running
-------

For running on 48 cores of a single node:

.. code-block:: bash

   srun -p short -N 1 -n 48 --pty bash

   OMP_NUM_THREADS=1 mpiexec -n 48 --map-by ppr:12:numa:pe=1 --report-bindings ./warpx inputs

   # alternatively, using 4 MPI ranks with 12 threads each on a single node:
   OMP_NUM_THREADS=12 mpiexec -n 4 --map-by ppr:4:numa:pe=12 --report-bindings ./warpx inputs

The Ookami HPE Apollo 80 system has 174 A64FX compute nodes, each with 32 GB of high-bandwidth memory.

Additional Compilers
--------------------

This section is just a note for developers.
We compiled with the Fujitsu Compiler (Clang) with the following build string:

.. code-block:: bash

   cmake -S . -B build \
     -DCMAKE_C_COMPILER=$(which mpifcc) \
     -DCMAKE_C_COMPILER_ID="Clang" \
     -DCMAKE_C_COMPILER_VERSION=12.0 \
     -DCMAKE_C_STANDARD_COMPUTED_DEFAULT="11" \
     -DCMAKE_CXX_COMPILER=$(which mpiFCC) \
     -DCMAKE_CXX_COMPILER_ID="Clang" \
     -DCMAKE_CXX_COMPILER_VERSION=12.0 \
     -DCMAKE_CXX_STANDARD_COMPUTED_DEFAULT="14" \
     -DCMAKE_CXX_FLAGS="-Nclang" \
     -DAMReX_DIFFERENT_COMPILER=ON \
     -DAMReX_MPI_THREAD_MULTIPLE=FALSE \
     -DWarpX_COMPUTE=OMP
   cmake --build build -j 10

Note that the best performance for A64FX is currently achieved with the GCC or ARM compilers.
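For non-interactive jobs, the interactive ``srun``/``mpiexec`` commands from the Running section above can be wrapped in a Slurm batch script. The following is a minimal sketch, not an official site template: the partition (``short``), job name, time limit, and binary/input paths are assumptions to adjust for your run.

.. code-block:: bash

   #!/usr/bin/env bash
   # Minimal Slurm batch sketch for one A64FX node (48 cores).
   # Partition, time limit, and paths below are placeholders.
   #SBATCH -J warpx
   #SBATCH -p short
   #SBATCH -N 1
   #SBATCH -n 48
   #SBATCH -t 01:00:00

   # one OpenMP thread per MPI rank, as in the interactive example above
   export OMP_NUM_THREADS=1
   mpiexec -n 48 --map-by ppr:12:numa:pe=1 --report-bindings ./warpx inputs

Submit it with ``sbatch`` and monitor the queue with ``squeue -u $USER``.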