.. role:: ref(emphasis)

.. _futhark-cuda(1):

==============
futhark-cuda
==============

SYNOPSIS
========

futhark cuda [options...] <program.fut>

DESCRIPTION
===========


``futhark cuda`` translates a Futhark program to C code invoking CUDA
kernels, and either compiles that C code with a C compiler to an
executable binary program, or produces a ``.h`` and ``.c`` file that
can be linked with other code. The standard Futhark optimisation
pipeline is used.

``futhark cuda`` uses ``-lcuda -lcudart -lnvrtc`` to link.  If using
``--library``, you will need to do the same when linking the final
binary.
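
As a rough sketch of the library workflow, assume a Futhark source file
``prog.fut`` and a hand-written ``main.c`` that includes the generated
``prog.h``.  Both file names are hypothetical.  The guard only skips
the commands where the compiler or the input files are missing::

  # Link flags listed above; they are needed again for the final binary.
  CUDA_LIBS="-lcuda -lcudart -lnvrtc"

  # prog.fut and main.c are hypothetical file names.
  if command -v futhark >/dev/null 2>&1 && [ -f prog.fut ] && [ -f main.c ]; then
    futhark cuda --library prog.fut    # writes prog.h and prog.c
    cc -c prog.c -o prog.o
    # $CUDA_LIBS is deliberately unquoted so that it splits into three flags.
    cc main.c prog.o -o main $CUDA_LIBS
  fi

Depending on the platform, further libraries such as ``-lm`` may also
be needed at link time; that is an assumption, not something this page
specifies.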

The generated CUDA code can be called from multiple CPU threads, as it
brackets every API operation with ``cuCtxPushCurrent()`` and
``cuCtxPopCurrent()``.

OPTIONS
=======

Accepts the same options as :ref:`futhark-c(1)`.

ENVIRONMENT VARIABLES
=====================

``CC``

  The C compiler used to compile the program.  Defaults to ``cc`` if
  unset.

``CFLAGS``

  Space-separated list of options passed to the C compiler.  Defaults
  to ``-O -std=c99`` if unset.
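
For example, both variables can be exported before compiling in order
to use a different compiler and optimisation level.  This is only a
sketch: ``clang``, ``-O3`` and ``prog.fut`` are illustrative choices,
not defaults or recommendations::

  # Select the C compiler and its flags for subsequent invocations.
  # Keeping -std=c99 mirrors the default flags.
  export CC=clang
  export CFLAGS="-O3 -std=c99"

  # prog.fut is a hypothetical file name.
  if command -v futhark >/dev/null 2>&1 && [ -f prog.fut ]; then
    futhark cuda prog.fut
  fi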

EXECUTABLE OPTIONS
==================

Generated executables accept the same options as those generated by
:ref:`futhark-c(1)`. The ``-t`` option behaves as with
:ref:`futhark-opencl(1)`.

The following additional options are accepted.

-h, --help

  Print help text to standard output and exit.

--default-thread-block-size=INT

  The default size of thread blocks that are launched.  Capped to the
  hardware limit if necessary.

--default-num-thread-blocks=INT

  The default number of thread blocks that are launched.

--default-threshold=INT

  The default parallelism threshold used for comparisons when
  selecting between code versions generated by incremental flattening.
  Intuitively, the amount of parallelism needed to saturate the GPU.

--default-tile-size=INT

  The default tile size used when performing two-dimensional tiling
  (the thread block size will be the square of the tile size).

--dump-cuda=FILE

  Don't run the program, but instead dump the embedded CUDA kernels to
  the indicated file.  Useful if you want to see what is actually
  being executed.

--dump-ptx=FILE

  Don't run the program, but instead dump the PTX-compiled version of
  the embedded kernels to the indicated file.

--load-cuda=FILE

  Instead of using the embedded CUDA kernels, load them from the
  indicated file.

--load-ptx=FILE

  Load PTX code from the indicated file.

--nvrtc-option=OPT

  Add an additional build option to the string passed to NVRTC.  Refer
  to the CUDA documentation for which options are supported.  Be
  careful: some options can easily produce invalid results.
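
The dump and load options can be combined when inspecting or
hand-editing kernels.  A minimal sketch, assuming a compiled executable
``./prog`` and an input file ``input.data`` (both hypothetical names,
as is ``kernels.cu``)::

  KERNELS=kernels.cu    # hypothetical file name for the dumped kernels

  if [ -x ./prog ] && [ -f input.data ]; then
    # Write the embedded CUDA kernels to a file; the program is not run.
    ./prog --dump-cuda="$KERNELS"

    # After inspecting or editing the file, run with the kernels loaded
    # from it instead of the embedded ones.
    ./prog --load-cuda="$KERNELS" < input.data

    # Run with a non-default thread block size (256 is illustrative).
    ./prog --default-thread-block-size=256 < input.data
  fi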

ENVIRONMENT
===========

If run without ``--library``, ``futhark cuda`` will invoke a C
compiler to compile the generated C program into a binary.  This only
works if the C compiler can find the necessary CUDA libraries.  On
most systems, CUDA is installed in ``/usr/local/cuda``, which is
usually not part of the default compiler search path. You may need to
set the following environment variables before running ``futhark
cuda``::

  LIBRARY_PATH=/usr/local/cuda/lib64
  LD_LIBRARY_PATH=/usr/local/cuda/lib64
  CPATH=/usr/local/cuda/include
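
The plain assignments above replace any existing values.  One way to
export them from a POSIX shell while keeping entries that are already
set is sketched here, assuming the default ``/usr/local/cuda`` prefix.
``CUDA_PREFIX`` is a helper variable local to this example, not
something Futhark reads::

  CUDA_PREFIX=/usr/local/cuda

  # ${VAR:+:$VAR} appends the old value, preceded by a colon, only if
  # VAR was already set and non-empty.
  export LIBRARY_PATH="$CUDA_PREFIX/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
  export LD_LIBRARY_PATH="$CUDA_PREFIX/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export CPATH="$CUDA_PREFIX/include${CPATH:+:$CPATH}"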

At runtime the generated program must be able to find the CUDA
installation directory, which is normally located at
``/usr/local/cuda``.  If you have CUDA installed elsewhere, set any of
the ``CUDA_HOME``, ``CUDA_ROOT``, or ``CUDA_PATH`` environment
variables to the proper directory.
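
For instance, with CUDA installed under a non-standard prefix, one of
these variables can be exported before running the program.  The path
``/opt/cuda-12.4`` and the names ``./prog`` and ``input.data`` are
hypothetical::

  export CUDA_HOME=/opt/cuda-12.4    # hypothetical install location

  if [ -x ./prog ] && [ -f input.data ]; then
    ./prog < input.data
  fi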

SEE ALSO
========

:ref:`futhark-opencl(1)`
