Efficient Alternative to scipy.spatial.distance and numpy.inner
SimSIMD leverages SIMD intrinsics, capabilities that only select compilers effectively utilize. This framework supports conventional AVX2 instructions on x86, NEON on Arm, as well as rare AVX-512 FP16 instructions on x86 and Scalable Vector Extensions (SVE) on Arm. Designed specifically for Machine Learning contexts, it's optimized for handling high-dimensional vector embeddings.
- 3-200x faster than NumPy and SciPy distance functions.
- Euclidean (L2), Inner Product, and Cosine (Angular) spatial distances.
- Hamming (~ Manhattan) and Jaccard (~ Tanimoto) binary distances.
- Single-precision f32, half-precision f16, i8, and binary vectors.
- Compatible with GCC and Clang on MacOS and Linux, and MinGW on Windows.
- Compatible with NumPy, PyTorch, TensorFlow, and other tensors.
- Has no dependencies, not even LibC.
- JavaScript API.
- C API.
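For orientation, the metrics listed above can be written as plain-Python reference functions. This is only a sketch of the textbook definitions, not SimSIMD's implementation: the `*_ref` names are hypothetical, and SimSIMD's sign or normalization conventions may differ from what is shown here.

```python
import math

def inner_ref(a, b):
    # Inner (dot) product of two equal-length numeric sequences.
    return sum(x * y for x, y in zip(a, b))

def sqeuclidean_ref(a, b):
    # Squared Euclidean (L2) distance: sum of squared differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_ref(a, b):
    # Cosine distance as 1 - cosine similarity (the SciPy convention).
    return 1.0 - inner_ref(a, b) / math.sqrt(inner_ref(a, a) * inner_ref(b, b))

def hamming_ref(a, b):
    # Number of differing bits between two equal-length bytes objects.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def jaccard_ref(a, b):
    # 1 - |intersection| / |union| over the set bits of two bytes objects.
    intersection = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return 1.0 - intersection / union if union else 0.0

print(sqeuclidean_ref([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(hamming_ref(b"\x0f", b"\xf0"))
```

Such references are mainly useful for checking an optimized kernel's output on small inputs.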
Given 10,000 embeddings from OpenAI Ada API with 1536 dimensions, running on the Apple M2 Pro Arm CPU with NEON support, here's how SimSIMD performs against conventional methods:
Conventional | SimSIMD | f32 improvement | f16 improvement | i8 improvement |
---|---|---|---|---|
scipy.spatial.distance.cosine | cosine | 39 x | 84 x | 196 x |
scipy.spatial.distance.sqeuclidean | sqeuclidean | 8 x | 25 x | 22 x |
numpy.inner | inner | 3 x | 10 x | 18 x |
On the Intel Sapphire Rapids platform, SimSIMD was benchmarked against auto-vectorized code compiled with GCC 12. GCC handles single-precision float and int8_t well. However, it fails on _Float16 arrays, a type that has been part of the C language since 2011.
Metric | GCC 12 f32 | GCC 12 f16 | SimSIMD f16 | f16 improvement |
---|---|---|---|---|
cosine | 3.28 M/s | 336.29 k/s | 6.88 M/s | 20 x |
sqeuclidean | 4.62 M/s | 147.25 k/s | 5.32 M/s | 36 x |
inner | 3.81 M/s | 192.02 k/s | 5.99 M/s | 31 x |
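The f16 improvement column appears to be the ratio of SimSIMD f16 throughput to GCC 12 f16 throughput. A short check of that arithmetic, assuming M/s and k/s mean millions and thousands of operations per second:

```python
# Throughput figures copied from the table: (GCC 12 f16, SimSIMD f16).
results = {
    "cosine": (336.29e3, 6.88e6),
    "sqeuclidean": (147.25e3, 5.32e6),
    "inner": (192.02e3, 5.99e6),
}

# Improvement factor = SimSIMD throughput divided by GCC throughput.
speedups = {name: simd / gcc for name, (gcc, simd) in results.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```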
Technical Insights:
- Uses Arm SVE and x86 AVX-512's masked loads to eliminate tail for-loops.
- Uses AVX-512 FP16 for half-precision operations, which few compilers vectorize.
- Substitutes LibC's sqrt calls with bithacks using Jan Kadlec's constant.
- Avoids slow PyBind11 and SWIG, directly using the CPython C API.
- Avoids slow PyArg_ParseTuple and manually unpacks argument tuples.
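To illustrate the sqrt bithack mentioned in the list, here is a sketch of the commonly cited fast inverse square root variant attributed to Jan Kadlec, transcribed to Python. The magic constant and the two refinement coefficients are the widely published ones; whether SimSIMD uses exactly these values is an assumption, and the function names are hypothetical.

```python
import struct

def fast_rsqrt(x: float) -> float:
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic-constant initial guess for 1/sqrt(x) (constant attributed to Jan Kadlec).
    bits = (0x5F1FFFF9 - (bits >> 1)) & 0xFFFFFFFF
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One modified Newton-style refinement step with tuned coefficients.
    return 0.703952253 * y * (2.38924456 - x * y * y)

def fast_sqrt(x: float) -> float:
    # sqrt(x) = x * (1 / sqrt(x)) for positive x.
    return x * fast_rsqrt(x)

print(fast_rsqrt(2.0), 2.0 ** -0.5)
```

The approximation trades a small relative error (well under one percent for positive normal inputs) for avoiding a library call; Python evaluates the refinement in double precision, whereas C code would typically stay in float.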
Broader Benchmarking Results:
pip install simsimd
import simsimd
import numpy as np
vec1 = np.random.randn(1536).astype(np.float32)
vec2 = np.random.randn(1536).astype(np.float32)
dist = simsimd.cosine(vec1, vec2)
Supported functions include cosine, inner, sqeuclidean, hamming, and jaccard.
batch1 = np.random.randn(100, 1536).astype(np.float32)
batch2 = np.random.randn(100, 1536).astype(np.float32)
dist = simsimd.cosine(batch1, batch2)
If either batch contains more than one vector, the other batch must contain either a single vector or the same number of vectors. A single vector is broadcast against every vector in the other batch.
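The pairing rule can be sketched in plain Python. The helper `paired_rows` is hypothetical and only models which rows get compared; it is not part of SimSIMD.

```python
def paired_rows(batch_a, batch_b):
    """Pair rows one-to-one, broadcasting a single-row batch if needed."""
    n_a, n_b = len(batch_a), len(batch_b)
    if n_a != n_b and 1 not in (n_a, n_b):
        raise ValueError(f"incompatible batch sizes: {n_a} and {n_b}")
    count = max(n_a, n_b)
    return [
        (batch_a[i if n_a > 1 else 0], batch_b[i if n_b > 1 else 0])
        for i in range(count)
    ]

# Three rows against one row: the single row is reused for every pair.
pairs = paired_rows([[1.0], [2.0], [3.0]], [[9.0]])
print(len(pairs))
```

Under this rule the result has one distance per pair, so its length equals the larger of the two batch sizes.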
To calculate distances between all possible pairs of rows across two matrices (akin to scipy.spatial.distance.cdist):
matrix1 = np.random.randn(1000, 1536).astype(np.float32)
matrix2 = np.random.randn(10, 1536).astype(np.float32)
distances = simsimd.cdist(matrix1, matrix2, metric="cosine")
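As a mental model of the all-pairs computation, a plain-Python reference is sketched below. `cdist_ref` and `cosine_ref` are hypothetical names, and the assumption that the output has one row per row of the first matrix and one column per row of the second follows the SciPy cdist convention rather than anything confirmed for SimSIMD.

```python
import math

def cosine_ref(a, b):
    # Cosine distance as 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / norm

def cdist_ref(rows_a, rows_b, metric=cosine_ref):
    # result[i][j] holds the distance between rows_a[i] and rows_b[j].
    return [[metric(a, b) for b in rows_b] for a in rows_a]

small_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
small_b = [[1.0, 0.0], [0.0, 2.0]]
reference = cdist_ref(small_a, small_b)
print(len(reference), len(reference[0]))
```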
By default, computations use a single CPU core. To use all CPU cores on Linux systems, add the threads=0 argument. Alternatively, specify a custom number of threads:
distances = simsimd.cdist(matrix1, matrix2, metric="cosine", threads=0)
To view a list of hardware backends that SimSIMD supports:
print(simsimd.get_capabilities())
Want to use it in Python with USearch?
You can wrap the raw C function pointers of SimSIMD backends into a CompiledMetric and pass it to USearch, similar to how it handles Numba's JIT-compiled code.
from usearch.index import Index, CompiledMetric, MetricKind, MetricSignature
from simsimd import pointer_to_sqeuclidean, pointer_to_cosine, pointer_to_inner
metric = CompiledMetric(
    pointer=pointer_to_cosine("f16"),
    kind=MetricKind.Cos,
    signature=MetricSignature.ArrayArraySize,
)
index = Index(256, metric=metric)
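To make the idea of a raw C function pointer concrete, the sketch below builds one with the standard-library ctypes module. It assumes that MetricSignature.ArrayArraySize corresponds to a C function of the form `float metric(const void *a, const void *b, size_t n)`; that signature, and the names `METRIC_FN` and `_sqeuclidean_f32`, are assumptions made for illustration and do not come from either library's documentation.

```python
import ctypes

# Assumed C signature: float metric(const void *a, const void *b, size_t n)
METRIC_FN = ctypes.CFUNCTYPE(
    ctypes.c_float, ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t
)

def _sqeuclidean_f32(a_ptr, b_ptr, n):
    # Reinterpret the raw addresses as float32 arrays of length n.
    a = ctypes.cast(a_ptr, ctypes.POINTER(ctypes.c_float))
    b = ctypes.cast(b_ptr, ctypes.POINTER(ctypes.c_float))
    return sum((a[i] - b[i]) ** 2 for i in range(n))

# Keep a reference to the callback so it is not garbage-collected.
callback = METRIC_FN(_sqeuclidean_f32)

# Integer address of the C-callable entry point; this is the kind of
# value a "pointer" argument would carry.
address = ctypes.cast(callback, ctypes.c_void_p).value

vec_a = (ctypes.c_float * 3)(1.0, 2.0, 3.0)
vec_b = (ctypes.c_float * 3)(4.0, 5.0, 6.0)
print(callback(ctypes.addressof(vec_a), ctypes.addressof(vec_b), 3))
```

A Python callback like this is far slower than a compiled kernel; it only shows the calling convention that a precompiled function pointer would satisfy.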
After you add simsimd as a dependency and run npm install, you will be able to call SimSIMD functions on various TypedArray variants:
const { sqeuclidean, cosine, inner, hamming, jaccard } = require('simsimd');
const vectorA = new Float32Array([1.0, 2.0, 3.0]);
const vectorB = new Float32Array([4.0, 5.0, 6.0]);
const distance = sqeuclidean(vectorA, vectorB);
console.log('Squared Euclidean Distance:', distance);
If you want to use the _Float16 functionality of SimSIMD, ensure your development environment is compatible with C11. For other functionality, C99 compatibility will suffice.
For integration within a CMake-based project, add the following segment to your CMakeLists.txt:
# FetchContent_Declare requires the FetchContent module
include(FetchContent)
FetchContent_Declare(
    simsimd
    GIT_REPOSITORY https://github.com/ashvardanian/simsimd.git
    GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(simsimd)
include_directories(${simsimd_SOURCE_DIR}/include)
Use the most recent compiler available for your platform so that you benefit from the newest intrinsics.
Should you wish to integrate SimSIMD within USearch, simply compile USearch with the flag USEARCH_USE_SIMSIMD=1. Notably, this is the default setting on the majority of platforms.
Here's a glance at the exciting developments on our horizon:
- Exposing Hamming and Tanimoto bitwise distances to the Python interface.
- Intel AMX backend. Note: Currently, the intrinsics are functional only with Intel's latest compiler.
To rerun the experiments, use the following command:
cmake -DCMAKE_BUILD_TYPE=Release -DSIMSIMD_BUILD_BENCHMARKS=1 -B ./build_release && make -C ./build_release && ./build_release/simsimd_bench
To test with PyTest:
pip install -e . && pytest scripts/test.py -s -x
To benchmark, pass the --n argument for the batch size and --ndim for the number of vector dimensions:
python python/bench.py --n 1000 --ndim 1000000
To test the JavaScript bindings:
npm install && npm test