rocBLAS User Guide
Contents:
- 1. Getting Started Guide
- 2. Installation and Building for Linux
- 3. Installation and Building for Windows
- 4. API Reference Guide
- 5. Using rocBLAS API
- 5.1. rocBLAS Datatypes
- 5.2. rocBLAS Enumeration
- 5.2.1. rocblas_operation
- 5.2.2. rocblas_fill
- 5.2.3. rocblas_diagonal
- 5.2.4. rocblas_side
- 5.2.5. rocblas_status
- rocblas_status
- rocblas_status::rocblas_status_success
- rocblas_status::rocblas_status_invalid_handle
- rocblas_status::rocblas_status_not_implemented
- rocblas_status::rocblas_status_invalid_pointer
- rocblas_status::rocblas_status_invalid_size
- rocblas_status::rocblas_status_memory_error
- rocblas_status::rocblas_status_internal_error
- rocblas_status::rocblas_status_perf_degraded
- rocblas_status::rocblas_status_size_query_mismatch
- rocblas_status::rocblas_status_size_increased
- rocblas_status::rocblas_status_size_unchanged
- rocblas_status::rocblas_status_invalid_value
- rocblas_status::rocblas_status_continue
- rocblas_status::rocblas_status_check_numerics_fail
- 5.2.6. rocblas_datatype
- rocblas_datatype
- rocblas_datatype::rocblas_datatype_f16_r
- rocblas_datatype::rocblas_datatype_f32_r
- rocblas_datatype::rocblas_datatype_f64_r
- rocblas_datatype::rocblas_datatype_f16_c
- rocblas_datatype::rocblas_datatype_f32_c
- rocblas_datatype::rocblas_datatype_f64_c
- rocblas_datatype::rocblas_datatype_i8_r
- rocblas_datatype::rocblas_datatype_u8_r
- rocblas_datatype::rocblas_datatype_i32_r
- rocblas_datatype::rocblas_datatype_u32_r
- rocblas_datatype::rocblas_datatype_i8_c
- rocblas_datatype::rocblas_datatype_u8_c
- rocblas_datatype::rocblas_datatype_i32_c
- rocblas_datatype::rocblas_datatype_u32_c
- rocblas_datatype::rocblas_datatype_bf16_r
- rocblas_datatype::rocblas_datatype_bf16_c
- rocblas_datatype::rocblas_datatype_invalid
- 5.2.7. rocblas_pointer_mode
- 5.2.8. rocblas_atomics_mode
- 5.2.9. rocblas_layer_mode
- 5.2.10. rocblas_gemm_algo
- 5.2.11. rocblas_gemm_flags
- 5.3. rocBLAS Helper functions
- 5.3.1. Auxiliary Functions
- rocblas_create_handle()
- rocblas_destroy_handle()
- rocblas_set_stream()
- rocblas_get_stream()
- rocblas_set_pointer_mode()
- rocblas_get_pointer_mode()
- rocblas_set_atomics_mode()
- rocblas_get_atomics_mode()
- rocblas_query_int8_layout_flag()
- rocblas_pointer_to_mode()
- rocblas_set_vector()
- rocblas_get_vector()
- rocblas_set_matrix()
- rocblas_get_matrix()
- rocblas_set_vector_async()
- rocblas_set_matrix_async()
- rocblas_get_matrix_async()
- rocblas_initialize()
- rocblas_status_to_string()
- 5.3.2. Device Memory Allocation Functions
- 5.3.3. Build Information Functions
- 5.4. rocBLAS Level-1 functions
- 5.4.1. rocblas_iXamax + batched, strided_batched
- 5.4.2. rocblas_iXamin + batched, strided_batched
- 5.4.3. rocblas_Xasum + batched, strided_batched
- 5.4.4. rocblas_Xaxpy + batched, strided_batched
- rocblas_saxpy()
- rocblas_daxpy()
- rocblas_haxpy()
- rocblas_caxpy()
- rocblas_zaxpy()
- rocblas_saxpy_batched()
- rocblas_daxpy_batched()
- rocblas_haxpy_batched()
- rocblas_caxpy_batched()
- rocblas_zaxpy_batched()
- rocblas_saxpy_strided_batched()
- rocblas_daxpy_strided_batched()
- rocblas_haxpy_strided_batched()
- rocblas_caxpy_strided_batched()
- rocblas_zaxpy_strided_batched()
- 5.4.5. rocblas_Xcopy + batched, strided_batched
- 5.4.6. rocblas_Xdot + batched, strided_batched
- rocblas_sdot()
- rocblas_ddot()
- rocblas_hdot()
- rocblas_bfdot()
- rocblas_cdotu()
- rocblas_cdotc()
- rocblas_zdotu()
- rocblas_zdotc()
- rocblas_sdot_batched()
- rocblas_ddot_batched()
- rocblas_hdot_batched()
- rocblas_bfdot_batched()
- rocblas_cdotu_batched()
- rocblas_cdotc_batched()
- rocblas_zdotu_batched()
- rocblas_zdotc_batched()
- rocblas_sdot_strided_batched()
- rocblas_ddot_strided_batched()
- rocblas_hdot_strided_batched()
- rocblas_bfdot_strided_batched()
- rocblas_cdotu_strided_batched()
- rocblas_cdotc_strided_batched()
- rocblas_zdotu_strided_batched()
- rocblas_zdotc_strided_batched()
- 5.4.7. rocblas_Xnrm2 + batched, strided_batched
- 5.4.8. rocblas_Xrot + batched, strided_batched
- rocblas_srot()
- rocblas_drot()
- rocblas_crot()
- rocblas_csrot()
- rocblas_zrot()
- rocblas_zdrot()
- rocblas_srot_batched()
- rocblas_drot_batched()
- rocblas_crot_batched()
- rocblas_csrot_batched()
- rocblas_zrot_batched()
- rocblas_zdrot_batched()
- rocblas_srot_strided_batched()
- rocblas_drot_strided_batched()
- rocblas_crot_strided_batched()
- rocblas_csrot_strided_batched()
- rocblas_zrot_strided_batched()
- rocblas_zdrot_strided_batched()
- 5.4.9. rocblas_Xrotg + batched, strided_batched
- 5.4.10. rocblas_Xrotm + batched, strided_batched
- 5.4.11. rocblas_Xrotmg + batched, strided_batched
- 5.4.12. rocblas_Xscal + batched, strided_batched
- rocblas_sscal()
- rocblas_dscal()
- rocblas_cscal()
- rocblas_zscal()
- rocblas_csscal()
- rocblas_zdscal()
- rocblas_sscal_batched()
- rocblas_dscal_batched()
- rocblas_cscal_batched()
- rocblas_zscal_batched()
- rocblas_csscal_batched()
- rocblas_zdscal_batched()
- rocblas_sscal_strided_batched()
- rocblas_dscal_strided_batched()
- rocblas_cscal_strided_batched()
- rocblas_zscal_strided_batched()
- rocblas_csscal_strided_batched()
- rocblas_zdscal_strided_batched()
- 5.4.13. rocblas_Xswap + batched, strided_batched
- 5.5. rocBLAS Level-2 functions
- 5.5.1. rocblas_Xgbmv + batched, strided_batched
- 5.5.2. rocblas_Xgemv + batched, strided_batched
- 5.5.3. rocblas_Xger + batched, strided_batched
- rocblas_sger()
- rocblas_dger()
- rocblas_cgeru()
- rocblas_zgeru()
- rocblas_cgerc()
- rocblas_zgerc()
- rocblas_sger_batched()
- rocblas_dger_batched()
- rocblas_cgeru_batched()
- rocblas_zgeru_batched()
- rocblas_cgerc_batched()
- rocblas_zgerc_batched()
- rocblas_sger_strided_batched()
- rocblas_dger_strided_batched()
- rocblas_cgeru_strided_batched()
- rocblas_zgeru_strided_batched()
- rocblas_cgerc_strided_batched()
- rocblas_zgerc_strided_batched()
- 5.5.4. rocblas_Xsbmv + batched, strided_batched
- 5.5.5. rocblas_Xspmv + batched, strided_batched
- 5.5.6. rocblas_Xspr + batched, strided_batched
- 5.5.7. rocblas_Xspr2 + batched, strided_batched
- 5.5.8. rocblas_Xsymv + batched, strided_batched
- 5.5.9. rocblas_Xsyr + batched, strided_batched
- 5.5.10. rocblas_Xsyr2 + batched, strided_batched
- 5.5.11. rocblas_Xtbmv + batched, strided_batched
- 5.5.12. rocblas_Xtbsv + batched, strided_batched
- 5.5.13. rocblas_Xtpmv + batched, strided_batched
- 5.5.14. rocblas_Xtpsv + batched, strided_batched
- 5.5.15. rocblas_Xtrmv + batched, strided_batched
- 5.5.16. rocblas_Xtrsv + batched, strided_batched
- 5.5.17. rocblas_Xhemv + batched, strided_batched
- 5.5.18. rocblas_Xhbmv + batched, strided_batched
- 5.5.19. rocblas_Xhpmv + batched, strided_batched
- 5.5.20. rocblas_Xher + batched, strided_batched
- 5.5.21. rocblas_Xher2 + batched, strided_batched
- 5.5.22. rocblas_Xhpr + batched, strided_batched
- 5.5.23. rocblas_Xhpr2 + batched, strided_batched
- 5.6. rocBLAS Level-3 functions
- 5.6.1. rocblas_Xgemm + batched, strided_batched
- rocblas_sgemm()
- rocblas_dgemm()
- rocblas_hgemm()
- rocblas_cgemm()
- rocblas_zgemm()
- rocblas_sgemm_batched()
- rocblas_dgemm_batched()
- rocblas_hgemm_batched()
- rocblas_cgemm_batched()
- rocblas_zgemm_batched()
- rocblas_sgemm_strided_batched()
- rocblas_dgemm_strided_batched()
- rocblas_hgemm_strided_batched()
- rocblas_cgemm_strided_batched()
- rocblas_zgemm_strided_batched()
- 5.6.2. rocblas_Xsymm + batched, strided_batched
- 5.6.3. rocblas_Xsyrk + batched, strided_batched
- 5.6.4. rocblas_Xsyr2k + batched, strided_batched
- 5.6.5. rocblas_Xsyrkx + batched, strided_batched
- 5.6.6. rocblas_Xtrmm + batched, strided_batched
- 5.6.7. rocblas_Xtrsm + batched, strided_batched
- 5.6.8. rocblas_Xhemm + batched, strided_batched
- 5.6.9. rocblas_Xherk + batched, strided_batched
- 5.6.10. rocblas_Xher2k + batched, strided_batched
- 5.6.11. rocblas_Xherkx + batched, strided_batched
- 5.6.12. rocblas_Xtrtri + batched, strided_batched
- 5.7. rocBLAS Extension
- 5.7.1. rocblas_axpy_ex + batched, strided_batched
- 5.7.2. rocblas_dot_ex + batched, strided_batched
- 5.7.3. rocblas_dotc_ex + batched, strided_batched
- 5.7.4. rocblas_nrm2_ex + batched, strided_batched
- 5.7.5. rocblas_rot_ex + batched, strided_batched
- 5.7.6. rocblas_scal_ex + batched, strided_batched
- 5.7.7. rocblas_gemm_ex + batched, strided_batched
- 5.7.8. rocblas_gemm_ext2
- 5.7.9. rocblas_trsm_ex + batched, strided_batched
- 5.7.10. rocblas_Xgeam + batched, strided_batched
- 5.7.11. rocblas_Xdgmm + batched, strided_batched
- 5.8. rocBLAS Beta Features
- 5.9. Graph Support for rocBLAS
- 5.10. Device Memory Allocation in rocBLAS
- 5.10.1. Environment Variable for Preallocating
- 5.10.2. Functions for Manually Setting Memory Size
- 5.10.3. Function for Setting User Owned Workspace
- 5.10.4. Functions for Finding How Much Memory Is Required
- 5.10.5. rocBLAS Function Return Values for Insufficient Device Memory
- 5.10.6. Stream-Ordered Memory Allocation
- 5.11. Logging in rocBLAS
- 6. Programmer’s Guide
- 6.1. Library Source Code Organization
- 6.2. Handle, Stream, and Device Management
- 6.3. Device Memory Allocation
- 6.4. Thread Safe Logging
- 6.5. rocBLAS Numerical Checking
- 6.6. rocBLAS Order of Argument Checking and Logging
- 6.6.1. Legacy BLAS
- 6.6.2. rocBLAS
- 6.6.3. rocBLAS has the Following Differences When Compared To Legacy BLAS
- 6.6.4. To Accommodate the Additions
- 6.6.5. Device Memory Size Queries
- 6.6.6. rocBLAS Control Flow
- 6.6.7. Legacy L1 BLAS “single vector”
- 6.6.8. Legacy L1 BLAS “two vector”
- 6.6.9. Legacy L2 BLAS
- 6.6.10. Legacy L3 BLAS
- 6.7. rocBLAS Benchmarking and Testing
- 7. Contributor’s Guide
- 8. Acknowledgement
- 9. Disclaimer