On writing/optimizing CUDA kernels for a Finite Element library
Slides from a presentation describing how to improve CPU performance through vectorization
How to speed up Mathematica notebooks by calling through to functions in C++ libraries
This document demonstrates how, in some cases, eliminating template parameters can significantly reduce compile time and binary size while retaining performance parity.
Compiler Explorer is a great tool for prototyping and understanding code snippets, and running a local instance can make it even more flexible and responsive.
CMake's ExternalData provides a way for projects to download large data files just-in-time, rather than putting them directly in a git repo or project tarball. This feature doesn't seem to be used very often, so this is an example project showing how to set it up.