This documentation is still new and evolving. If you spot any mistakes, unclear explanations, or missing details, please open an issue.
Your feedback helps us improve!
SIMD operators
This page lists all operators available in the exp/simd sub-package. These helpers use AVX (128-bit), AVX2 (256-bit), or AVX-512 (512-bit) SIMD instructions. They require Go 1.26+, GOEXPERIMENT=simd set at build time, and an amd64 target.
SIMD operators are experimental. The API may break in the future.
Install
First, add the sub-package to your project:
go get -u github.com/samber/ro/plugins/exp/simd
ScalarToSIMD
Converts streams of scalar values into SIMD vectors. Each variant buffers a specific number of scalar values (based on the vector size: 2, 4, 8, 16, 32, or 64) and emits them as a single SIMD vector.
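The buffering described above can be modelled in plain Go without any SIMD types. This is only a scalar sketch: `chunkPadded` is a hypothetical helper, not part of the package, and the zero padding of a short final chunk is taken from the example output shown on this page.

```go
package main

import "fmt"

// chunkPadded is a hypothetical helper (not part of the package): it groups
// values into chunks of exactly `lanes` elements and zero-pads a short tail.
func chunkPadded[T any](values []T, lanes int) [][]T {
	var out [][]T
	for start := 0; start < len(values); start += lanes {
		end := start + lanes
		if end > len(values) {
			end = len(values)
		}
		chunk := make([]T, lanes) // zero-initialised, so the padding comes for free
		copy(chunk, values[start:end])
		out = append(out, chunk)
	}
	return out
}

func main() {
	in := make([]int8, 18)
	for i := range in {
		in[i] = int8(i + 1)
	}
	// 18 values with 16 lanes: one full chunk, then one padded chunk.
	for _, c := range chunkPadded(in, 16) {
		fmt.Println(c)
	}
}
```
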
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
"simd/archsimd"
)
obs := ro.Pipe[int8, *archsimd.Int8x16](
ro.Just(
int8(1), int8(2), int8(3), int8(4),
int8(5), int8(6), int8(7), int8(8),
int8(9), int8(10), int8(11), int8(12),
int8(13), int8(14), int8(15), int8(16),
int8(17), int8(18), // partial final vector
),
rosimd.ScalarToInt8x16[int8](),
)
sub := obs.Subscribe(ro.NewObserver[*archsimd.Int8x16](
func(vec *archsimd.Int8x16) {
var buf [16]int8
vec.Store(&buf)
fmt.Printf("Next: %v\n", buf[:])
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
// Next: [17 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- ScalarToInt8x16
- ScalarToInt16x8
- ScalarToInt32x4
- ScalarToInt64x2
- ScalarToUint8x16
- ScalarToUint16x8
- ScalarToUint32x4
- ScalarToUint64x2
- ScalarToFloat32x4
- ScalarToFloat64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- ScalarToInt8x32
- ScalarToInt16x16
- ScalarToInt32x8
- ScalarToInt64x4
- ScalarToUint8x32
- ScalarToUint16x16
- ScalarToUint32x8
- ScalarToUint64x4
- ScalarToFloat32x8
- ScalarToFloat64x4
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- ScalarToInt8x64
- ScalarToInt16x32
- ScalarToInt32x16
- ScalarToInt64x8
- ScalarToUint8x64
- ScalarToUint16x32
- ScalarToUint32x16
- ScalarToUint64x8
- ScalarToFloat32x16
- ScalarToFloat64x8
Prototypes:
func ScalarToInt8x16[T ~int8]()
func ScalarToInt16x8[T ~int16]()
func ScalarToInt32x4[T ~int32]()
func ScalarToInt64x2[T ~int64]()
func ScalarToUint8x16[T ~uint8]()
func ScalarToUint16x8[T ~uint16]()
func ScalarToUint32x4[T ~uint32]()
func ScalarToUint64x2[T ~uint64]()
func ScalarToFloat32x4[T ~float32]()
func ScalarToFloat64x2[T ~float64]()
func ScalarToInt8x32[T ~int8]()
func ScalarToInt16x16[T ~int16]()
func ScalarToInt32x8[T ~int32]()
func ScalarToInt64x4[T ~int64]()
func ScalarToUint8x32[T ~uint8]()
func ScalarToUint16x16[T ~uint16]()
func ScalarToUint32x8[T ~uint32]()
func ScalarToUint64x4[T ~uint64]()
func ScalarToFloat32x8[T ~float32]()
func ScalarToFloat64x4[T ~float64]()
func ScalarToInt8x64[T ~int8]()
func ScalarToInt16x32[T ~int16]()
func ScalarToInt32x16[T ~int32]()
func ScalarToInt64x8[T ~int64]()
func ScalarToUint8x64[T ~uint8]()
func ScalarToUint16x32[T ~uint16]()
func ScalarToUint32x16[T ~uint32]()
func ScalarToUint64x8[T ~uint64]()
func ScalarToFloat32x16[T ~float32]()
func ScalarToFloat64x8[T ~float64]()
SIMDToScalar
Converts SIMD vectors back into streams of scalar values. Each SIMD vector emits multiple scalar values based on its lane count (2, 4, 8, 16, 32, or 64 values per vector).
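The lane count is the vector width divided by the element width, which is where the 2 to 64 range comes from. The sketch below shows that arithmetic and a scalar model of the flattening; `laneCount` and `flatten` are hypothetical names used for illustration, and vectors are represented as plain slices.

```go
package main

import "fmt"

// laneCount returns how many elements of elemBits fit in a vector of
// vectorBits (hypothetical helper, for illustration only).
func laneCount(vectorBits, elemBits int) int {
	return vectorBits / elemBits
}

// flatten emits every lane of every vector, in order, as one scalar slice.
func flatten(vectors [][]float32) []float32 {
	var out []float32
	for _, v := range vectors {
		out = append(out, v...)
	}
	return out
}

func main() {
	fmt.Println(laneCount(128, 32)) // Float32x4 -> 4
	fmt.Println(laneCount(512, 8))  // Int8x64   -> 64
	fmt.Println(flatten([][]float32{{1.5, 2.5, 3.5, 4.5}}))
}
```
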
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(1.5), float32(2.5), float32(3.5), float32(4.5),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.Float32x4ToScalar[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(v float32) {
fmt.Printf("Next: %.1f\n", v)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: 1.5
// Next: 2.5
// Next: 3.5
// Next: 4.5
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- Float32x4ToScalar
- Float64x2ToScalar
- Int8x16ToScalar
- Int16x8ToScalar
- Int32x4ToScalar
- Int64x2ToScalar
- Uint8x16ToScalar
- Uint16x8ToScalar
- Uint32x4ToScalar
- Uint64x2ToScalar
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- Float32x8ToScalar
- Float64x4ToScalar
- Int8x32ToScalar
- Int16x16ToScalar
- Int32x8ToScalar
- Int64x4ToScalar
- Uint8x32ToScalar
- Uint16x16ToScalar
- Uint32x8ToScalar
- Uint64x4ToScalar
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- Float32x16ToScalar
- Float64x8ToScalar
- Int8x64ToScalar
- Int16x32ToScalar
- Int32x16ToScalar
- Int64x8ToScalar
- Uint8x64ToScalar
- Uint16x32ToScalar
- Uint32x16ToScalar
- Uint64x8ToScalar
Prototypes:
func Int8x16ToScalar[T ~int8]()
func Int16x8ToScalar[T ~int16]()
func Int32x4ToScalar[T ~int32]()
func Int64x2ToScalar[T ~int64]()
func Uint8x16ToScalar[T ~uint8]()
func Uint16x8ToScalar[T ~uint16]()
func Uint32x4ToScalar[T ~uint32]()
func Uint64x2ToScalar[T ~uint64]()
func Float32x4ToScalar[T ~float32]()
func Float64x2ToScalar[T ~float64]()
func Int8x32ToScalar[T ~int8]()
func Int16x16ToScalar[T ~int16]()
func Int32x8ToScalar[T ~int32]()
func Int64x4ToScalar[T ~int64]()
func Uint8x32ToScalar[T ~uint8]()
func Uint16x16ToScalar[T ~uint16]()
func Uint32x8ToScalar[T ~uint32]()
func Uint64x4ToScalar[T ~uint64]()
func Float32x8ToScalar[T ~float32]()
func Float64x4ToScalar[T ~float64]()
func Int8x64ToScalar[T ~int8]()
func Int16x32ToScalar[T ~int16]()
func Int32x16ToScalar[T ~int32]()
func Int64x8ToScalar[T ~int64]()
func Uint8x64ToScalar[T ~uint8]()
func Uint16x32ToScalar[T ~uint16]()
func Uint32x16ToScalar[T ~uint32]()
func Uint64x8ToScalar[T ~uint64]()
func Float32x16ToScalar[T ~float32]()
func Float64x8ToScalar[T ~float64]()
Add
Adds a scalar number to all lanes in SIMD vectors using SIMD instructions for parallel computation.
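As a mental model, the operation is a broadcast add: the same number is added to each lane independently. The scalar sketch below uses a hypothetical `addLanes` helper and a `lane` constraint that mirrors the element types listed on this page. It also shows that Go's own integer addition wraps on overflow; whether the SIMD operators wrap in the same way is an assumption and is not stated here.

```go
package main

import "fmt"

// lane lists the element types used by the operators on this page.
type lane interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// addLanes is a hypothetical scalar stand-in: it adds n to every lane.
func addLanes[T lane](vec []T, n T) []T {
	out := make([]T, len(vec))
	for i, v := range vec {
		out[i] = v + n
	}
	return out
}

func main() {
	fmt.Println(addLanes([]float32{10, 20, 30, 40}, 100)) // [110 120 130 140]
	fmt.Println(addLanes([]int8{127, 1}, 1))              // scalar Go wraps: [-128 2]
}
```
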
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(10), float32(20), float32(30), float32(40),
float32(5), float32(10), float32(15), float32(20),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.AddFloat32x4[float32](100),
rosimd.Float32x4ToScalar[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(v float32) {
fmt.Printf("Next: %.1f\n", v)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: 110.0
// Next: 120.0
// Next: 130.0
// Next: 140.0
// Next: 105.0
// Next: 110.0
// Next: 115.0
// Next: 120.0
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- AddFloat32x4
- AddFloat64x2
- AddInt8x16
- AddInt16x8
- AddInt32x4
- AddInt64x2
- AddUint8x16
- AddUint16x8
- AddUint32x4
- AddUint64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- AddFloat32x8
- AddFloat64x4
- AddInt8x32
- AddInt16x16
- AddInt32x8
- AddInt64x4
- AddUint8x32
- AddUint16x16
- AddUint32x8
- AddUint64x4
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- AddFloat32x16
- AddFloat64x8
- AddInt8x64
- AddInt16x32
- AddInt32x16
- AddInt64x8
- AddUint8x64
- AddUint16x32
- AddUint32x16
- AddUint64x8
Prototypes:
func AddInt8x16[T ~int8](number T)
func AddInt16x8[T ~int16](number T)
func AddInt32x4[T ~int32](number T)
func AddInt64x2[T ~int64](number T)
func AddUint8x16[T ~uint8](number T)
func AddUint16x8[T ~uint16](number T)
func AddUint32x4[T ~uint32](number T)
func AddUint64x2[T ~uint64](number T)
func AddFloat32x4[T ~float32](number T)
func AddFloat64x2[T ~float64](number T)
func AddInt8x32[T ~int8](number T)
func AddInt16x16[T ~int16](number T)
func AddInt32x8[T ~int32](number T)
func AddInt64x4[T ~int64](number T)
func AddUint8x32[T ~uint8](number T)
func AddUint16x16[T ~uint16](number T)
func AddUint32x8[T ~uint32](number T)
func AddUint64x4[T ~uint64](number T)
func AddFloat32x8[T ~float32](number T)
func AddFloat64x4[T ~float64](number T)
func AddInt8x64[T ~int8](number T)
func AddInt16x32[T ~int16](number T)
func AddInt32x16[T ~int32](number T)
func AddInt64x8[T ~int64](number T)
func AddUint8x64[T ~uint8](number T)
func AddUint16x32[T ~uint16](number T)
func AddUint32x16[T ~uint32](number T)
func AddUint64x8[T ~uint64](number T)
func AddFloat32x16[T ~float32](number T)
func AddFloat64x8[T ~float64](number T)
Sub
Subtracts a scalar number from all lanes in SIMD vectors using SIMD instructions for parallel computation.
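With unsigned element types, subtraction is where wraparound is easiest to hit. The scalar sketch below uses a hypothetical `subLanes` helper to show the per-lane arithmetic in plain Go, including a `uint8` lane that drops below zero. Go's scalar behavior is shown; that the SIMD operators wrap identically is an assumption.

```go
package main

import "fmt"

// lane lists the element types used by the operators on this page.
type lane interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// subLanes is a hypothetical scalar stand-in: it subtracts n from every lane.
func subLanes[T lane](vec []T, n T) []T {
	out := make([]T, len(vec))
	for i, v := range vec {
		out[i] = v - n
	}
	return out
}

func main() {
	fmt.Println(subLanes([]float32{100, 200, 300, 400}, 50)) // [50 150 250 350]
	fmt.Println(subLanes([]uint8{5, 20}, 10))                // 5-10 wraps to 251: [251 10]
}
```
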
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(100), float32(200), float32(300), float32(400),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.SubFloat32x4[float32](50),
rosimd.Float32x4ToScalar[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(v float32) {
fmt.Printf("Next: %.1f\n", v)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: 50.0
// Next: 150.0
// Next: 250.0
// Next: 350.0
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- SubFloat32x4
- SubFloat64x2
- SubInt8x16
- SubInt16x8
- SubInt32x4
- SubInt64x2
- SubUint8x16
- SubUint16x8
- SubUint32x4
- SubUint64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- SubFloat32x8
- SubFloat64x4
- SubInt8x32
- SubInt16x16
- SubInt32x8
- SubInt64x4
- SubUint8x32
- SubUint16x16
- SubUint32x8
- SubUint64x4
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- SubFloat32x16
- SubFloat64x8
- SubInt8x64
- SubInt16x32
- SubInt32x16
- SubInt64x8
- SubUint8x64
- SubUint16x32
- SubUint32x16
- SubUint64x8
Prototypes:
func SubInt8x16[T ~int8](number T)
func SubInt16x8[T ~int16](number T)
func SubInt32x4[T ~int32](number T)
func SubInt64x2[T ~int64](number T)
func SubUint8x16[T ~uint8](number T)
func SubUint16x8[T ~uint16](number T)
func SubUint32x4[T ~uint32](number T)
func SubUint64x2[T ~uint64](number T)
func SubFloat32x4[T ~float32](number T)
func SubFloat64x2[T ~float64](number T)
func SubInt8x32[T ~int8](number T)
func SubInt16x16[T ~int16](number T)
func SubInt32x8[T ~int32](number T)
func SubInt64x4[T ~int64](number T)
func SubUint8x32[T ~uint8](number T)
func SubUint16x16[T ~uint16](number T)
func SubUint32x8[T ~uint32](number T)
func SubUint64x4[T ~uint64](number T)
func SubFloat32x8[T ~float32](number T)
func SubFloat64x4[T ~float64](number T)
func SubInt8x64[T ~int8](number T)
func SubInt16x32[T ~int16](number T)
func SubInt32x16[T ~int32](number T)
func SubInt64x8[T ~int64](number T)
func SubUint8x64[T ~uint8](number T)
func SubUint16x32[T ~uint16](number T)
func SubUint32x16[T ~uint32](number T)
func SubUint64x8[T ~uint64](number T)
func SubFloat32x16[T ~float32](number T)
func SubFloat64x8[T ~float64](number T)
Clamp
Clamps all lanes in SIMD vectors to a specified range [minValue, maxValue] using SIMD instructions for parallel computation.
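Per lane, clamping means: values below minValue become minValue, values above maxValue become maxValue, everything else passes through. A scalar sketch with a hypothetical `clampLanes` helper follows. What the operators do when minValue is greater than maxValue is not stated here, so the sketch makes no claim about that case.

```go
package main

import "fmt"

// lane lists the element types used by the operators on this page.
type lane interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// clampLanes is a hypothetical scalar stand-in: it limits every lane to
// the closed range [lo, hi].
func clampLanes[T lane](vec []T, lo, hi T) []T {
	out := make([]T, len(vec))
	for i, v := range vec {
		switch {
		case v < lo:
			out[i] = lo
		case v > hi:
			out[i] = hi
		default:
			out[i] = v
		}
	}
	return out
}

func main() {
	fmt.Println(clampLanes([]float32{-200, -100, 0, 100}, -50, 50)) // [-50 -50 0 50]
}
```
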
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(-200), float32(-100), float32(0), float32(100),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.ClampFloat32x4[float32](-50, 50),
rosimd.Float32x4ToScalar[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(v float32) {
fmt.Printf("Next: %.1f\n", v)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: -50.0
// Next: -50.0
// Next: 0.0
// Next: 50.0
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- ClampFloat32x4
- ClampFloat64x2
- ClampInt8x16
- ClampInt16x8
- ClampInt32x4
- ClampInt64x2
- ClampUint8x16
- ClampUint16x8
- ClampUint32x4
- ClampUint64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- ClampFloat32x8
- ClampFloat64x4
- ClampInt8x32
- ClampInt16x16
- ClampInt32x8
- ClampUint8x32
- ClampUint16x16
- ClampUint32x8
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- ClampFloat32x16
- ClampFloat64x8
- ClampInt8x64
- ClampInt16x32
- ClampInt32x16
- ClampInt64x4 (256-bit; requires AVX-512 for int64 min/max)
- ClampInt64x8
- ClampUint8x64
- ClampUint16x32
- ClampUint32x16
- ClampUint64x4 (256-bit; requires AVX-512 for uint64 min/max)
- ClampUint64x8
Prototypes:
func ClampInt8x16[T ~int8](minValue, maxValue T)
func ClampInt16x8[T ~int16](minValue, maxValue T)
func ClampInt32x4[T ~int32](minValue, maxValue T)
func ClampInt64x2[T ~int64](minValue, maxValue T)
func ClampUint8x16[T ~uint8](minValue, maxValue T)
func ClampUint16x8[T ~uint16](minValue, maxValue T)
func ClampUint32x4[T ~uint32](minValue, maxValue T)
func ClampUint64x2[T ~uint64](minValue, maxValue T)
func ClampFloat32x4[T ~float32](minValue, maxValue T)
func ClampFloat64x2[T ~float64](minValue, maxValue T)
func ClampInt8x32[T ~int8](minValue, maxValue T)
func ClampInt16x16[T ~int16](minValue, maxValue T)
func ClampInt32x8[T ~int32](minValue, maxValue T)
func ClampInt64x4[T ~int64](minValue, maxValue T)
func ClampUint8x32[T ~uint8](minValue, maxValue T)
func ClampUint16x16[T ~uint16](minValue, maxValue T)
func ClampUint32x8[T ~uint32](minValue, maxValue T)
func ClampUint64x4[T ~uint64](minValue, maxValue T)
func ClampFloat32x8[T ~float32](minValue, maxValue T)
func ClampFloat64x4[T ~float64](minValue, maxValue T)
func ClampInt8x64[T ~int8](minValue, maxValue T)
func ClampInt16x32[T ~int16](minValue, maxValue T)
func ClampInt32x16[T ~int32](minValue, maxValue T)
func ClampInt64x8[T ~int64](minValue, maxValue T)
func ClampUint8x64[T ~uint8](minValue, maxValue T)
func ClampUint16x32[T ~uint16](minValue, maxValue T)
func ClampUint32x16[T ~uint32](minValue, maxValue T)
func ClampUint64x8[T ~uint64](minValue, maxValue T)
func ClampFloat32x16[T ~float32](minValue, maxValue T)
func ClampFloat64x8[T ~float64](minValue, maxValue T)
Min
Ensures all lanes in SIMD vectors are at least the specified minimum value using SIMD instructions for parallel computation. Values below the minimum are replaced with the minimum.
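Note the direction: as described here, the operator enforces a lower bound, so each lane ends up as the larger of its own value and minValue. The scalar sketch below makes that explicit with a hypothetical `atLeast` helper; it models the documented behavior and is not the package's implementation.

```go
package main

import "fmt"

// lane lists the element types used by the operators on this page.
type lane interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// atLeast is a hypothetical scalar stand-in: lanes below minValue are
// raised to minValue, all other lanes are left unchanged.
func atLeast[T lane](vec []T, minValue T) []T {
	out := make([]T, len(vec))
	for i, v := range vec {
		if v < minValue {
			v = minValue
		}
		out[i] = v
	}
	return out
}

func main() {
	fmt.Println(atLeast([]float32{-200, -100, 0, 100}, -50)) // [-50 -50 0 100]
}
```
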
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(-200), float32(-100), float32(0), float32(100),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.MinFloat32x4[float32](-50),
rosimd.Float32x4ToScalar[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(v float32) {
fmt.Printf("Next: %.1f\n", v)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: -50.0
// Next: -50.0
// Next: 0.0
// Next: 100.0
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- MinFloat32x4
- MinFloat64x2
- MinInt8x16
- MinInt16x8
- MinInt32x4
- MinInt64x2
- MinUint8x16
- MinUint16x8
- MinUint32x4
- MinUint64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- MinFloat32x8
- MinFloat64x4
- MinInt8x32
- MinInt16x16
- MinInt32x8
- MinUint8x32
- MinUint16x16
- MinUint32x8
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- MinFloat32x16
- MinFloat64x8
- MinInt8x64
- MinInt16x32
- MinInt32x16
- MinInt64x4 (256-bit; requires AVX-512 for int64 min)
- MinInt64x8
- MinUint8x64
- MinUint16x32
- MinUint32x16
- MinUint64x4 (256-bit; requires AVX-512 for uint64 min)
- MinUint64x8
Prototypes:
func MinInt8x16[T ~int8](minValue T)
func MinInt16x8[T ~int16](minValue T)
func MinInt32x4[T ~int32](minValue T)
func MinInt64x2[T ~int64](minValue T)
func MinUint8x16[T ~uint8](minValue T)
func MinUint16x8[T ~uint16](minValue T)
func MinUint32x4[T ~uint32](minValue T)
func MinUint64x2[T ~uint64](minValue T)
func MinFloat32x4[T ~float32](minValue T)
func MinFloat64x2[T ~float64](minValue T)
func MinInt8x32[T ~int8](minValue T)
func MinInt16x16[T ~int16](minValue T)
func MinInt32x8[T ~int32](minValue T)
func MinInt64x4[T ~int64](minValue T)
func MinUint8x32[T ~uint8](minValue T)
func MinUint16x16[T ~uint16](minValue T)
func MinUint32x8[T ~uint32](minValue T)
func MinUint64x4[T ~uint64](minValue T)
func MinFloat32x8[T ~float32](minValue T)
func MinFloat64x4[T ~float64](minValue T)
func MinInt8x64[T ~int8](minValue T)
func MinInt16x32[T ~int16](minValue T)
func MinInt32x16[T ~int32](minValue T)
func MinInt64x8[T ~int64](minValue T)
func MinUint8x64[T ~uint8](minValue T)
func MinUint16x32[T ~uint16](minValue T)
func MinUint32x16[T ~uint32](minValue T)
func MinUint64x8[T ~uint64](minValue T)
func MinFloat32x16[T ~float32](minValue T)
func MinFloat64x8[T ~float64](minValue T)
Max
Ensures all lanes in SIMD vectors are at most the specified maximum value using SIMD instructions for parallel computation. Values above the maximum are replaced with the maximum.
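This is the upper-bound counterpart: each lane ends up as the smaller of its own value and maxValue. The scalar sketch below uses hypothetical `atMost` and `atLeast` helpers to show that applying a lower bound and then an upper bound gives the same result as clamping to that range, provided the lower bound does not exceed the upper one.

```go
package main

import "fmt"

// lane lists the element types used by the operators on this page.
type lane interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// atMost is a hypothetical scalar stand-in: lanes above maxValue are
// lowered to maxValue.
func atMost[T lane](vec []T, maxValue T) []T {
	out := make([]T, len(vec))
	for i, v := range vec {
		if v > maxValue {
			v = maxValue
		}
		out[i] = v
	}
	return out
}

// atLeast is the lower-bound counterpart: lanes below minValue are raised.
func atLeast[T lane](vec []T, minValue T) []T {
	out := make([]T, len(vec))
	for i, v := range vec {
		if v < minValue {
			v = minValue
		}
		out[i] = v
	}
	return out
}

func main() {
	in := []float32{-200, -100, 0, 100}
	fmt.Println(atMost(in, 50))               // [-200 -100 0 50]
	fmt.Println(atMost(atLeast(in, -50), 50)) // [-50 -50 0 50], same as clamping to [-50, 50]
}
```
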
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(-200), float32(-100), float32(0), float32(100),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.MaxFloat32x4[float32](50),
rosimd.Float32x4ToScalar[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(v float32) {
fmt.Printf("Next: %.1f\n", v)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: -200.0
// Next: -100.0
// Next: 0.0
// Next: 50.0
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- MaxFloat32x4
- MaxFloat64x2
- MaxInt8x16
- MaxInt16x8
- MaxInt32x4
- MaxInt64x2
- MaxUint8x16
- MaxUint16x8
- MaxUint32x4
- MaxUint64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- MaxFloat32x8
- MaxFloat64x4
- MaxInt8x32
- MaxInt16x16
- MaxInt32x8
- MaxUint8x32
- MaxUint16x16
- MaxUint32x8
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- MaxFloat32x16
- MaxFloat64x8
- MaxInt8x64
- MaxInt16x32
- MaxInt32x16
- MaxInt64x4 (256-bit; requires AVX-512 for int64 max)
- MaxInt64x8
- MaxUint8x64
- MaxUint16x32
- MaxUint32x16
- MaxUint64x4 (256-bit; requires AVX-512 for uint64 max)
- MaxUint64x8
Prototypes:
func MaxInt8x16[T ~int8](maxValue T)
func MaxInt16x8[T ~int16](maxValue T)
func MaxInt32x4[T ~int32](maxValue T)
func MaxInt64x2[T ~int64](maxValue T)
func MaxUint8x16[T ~uint8](maxValue T)
func MaxUint16x8[T ~uint16](maxValue T)
func MaxUint32x4[T ~uint32](maxValue T)
func MaxUint64x2[T ~uint64](maxValue T)
func MaxFloat32x4[T ~float32](maxValue T)
func MaxFloat64x2[T ~float64](maxValue T)
func MaxInt8x32[T ~int8](maxValue T)
func MaxInt16x16[T ~int16](maxValue T)
func MaxInt32x8[T ~int32](maxValue T)
func MaxInt64x4[T ~int64](maxValue T)
func MaxUint8x32[T ~uint8](maxValue T)
func MaxUint16x16[T ~uint16](maxValue T)
func MaxUint32x8[T ~uint32](maxValue T)
func MaxUint64x4[T ~uint64](maxValue T)
func MaxFloat32x8[T ~float32](maxValue T)
func MaxFloat64x4[T ~float64](maxValue T)
func MaxInt8x64[T ~int8](maxValue T)
func MaxInt16x32[T ~int16](maxValue T)
func MaxInt32x16[T ~int32](maxValue T)
func MaxInt64x8[T ~int64](maxValue T)
func MaxUint8x64[T ~uint8](maxValue T)
func MaxUint16x32[T ~uint16](maxValue T)
func MaxUint32x16[T ~uint32](maxValue T)
func MaxUint64x8[T ~uint64](maxValue T)
func MaxFloat32x16[T ~float32](maxValue T)
func MaxFloat64x8[T ~float64](maxValue T)
ReduceSum
Accumulates the sum of all lanes across SIMD vectors and emits a single scalar value when the source completes.
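One common way to structure such a reduction is to keep one running total per lane while vectors arrive, then add the lanes together once at the end. The scalar sketch below follows that shape with a hypothetical `sumVectors` helper over 4-lane arrays. It is an illustration of the idea, not necessarily how the package accumulates internally.

```go
package main

import "fmt"

// sumVectors is a hypothetical scalar stand-in for a two-phase reduction.
func sumVectors(vectors [][4]float32) float32 {
	// Phase 1: lane-wise accumulation, one running total per lane.
	var acc [4]float32
	for _, v := range vectors {
		for i := range acc {
			acc[i] += v[i]
		}
	}
	// Phase 2: horizontal sum of the accumulator, done once at completion.
	var total float32
	for _, laneTotal := range acc {
		total += laneTotal
	}
	return total
}

func main() {
	fmt.Println(sumVectors([][4]float32{
		{10, 20, 30, 40},
		{20, 40, 60, 80},
	})) // 300
}
```
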
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(10), float32(20), float32(30), float32(40),
float32(20), float32(40), float32(60), float32(80),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.ReduceSumFloat32x4[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(sum float32) {
fmt.Printf("Next: %.1f\n", sum)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: 300.0
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- ReduceSumFloat32x4
- ReduceSumFloat64x2
- ReduceSumInt8x16
- ReduceSumInt16x8
- ReduceSumInt32x4
- ReduceSumInt64x2
- ReduceSumUint8x16
- ReduceSumUint16x8
- ReduceSumUint32x4
- ReduceSumUint64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- ReduceSumFloat32x8
- ReduceSumFloat64x4
- ReduceSumInt8x32
- ReduceSumInt16x16
- ReduceSumInt32x8
- ReduceSumInt64x4
- ReduceSumUint8x32
- ReduceSumUint16x16
- ReduceSumUint32x8
- ReduceSumUint64x4
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- ReduceSumFloat32x16
- ReduceSumFloat64x8
- ReduceSumInt8x64
- ReduceSumInt16x32
- ReduceSumInt32x16
- ReduceSumInt64x8
- ReduceSumUint8x64
- ReduceSumUint16x32
- ReduceSumUint32x16
- ReduceSumUint64x8
Prototypes:
func ReduceSumInt8x16[T ~int8]()
func ReduceSumInt16x8[T ~int16]()
func ReduceSumInt32x4[T ~int32]()
func ReduceSumInt64x2[T ~int64]()
func ReduceSumUint8x16[T ~uint8]()
func ReduceSumUint16x8[T ~uint16]()
func ReduceSumUint32x4[T ~uint32]()
func ReduceSumUint64x2[T ~uint64]()
func ReduceSumFloat32x4[T ~float32]()
func ReduceSumFloat64x2[T ~float64]()
func ReduceSumInt8x32[T ~int8]()
func ReduceSumInt16x16[T ~int16]()
func ReduceSumInt32x8[T ~int32]()
func ReduceSumInt64x4[T ~int64]()
func ReduceSumUint8x32[T ~uint8]()
func ReduceSumUint16x16[T ~uint16]()
func ReduceSumUint32x8[T ~uint32]()
func ReduceSumUint64x4[T ~uint64]()
func ReduceSumFloat32x8[T ~float32]()
func ReduceSumFloat64x4[T ~float64]()
func ReduceSumInt8x64[T ~int8]()
func ReduceSumInt16x32[T ~int16]()
func ReduceSumInt32x16[T ~int32]()
func ReduceSumInt64x8[T ~int64]()
func ReduceSumUint8x64[T ~uint8]()
func ReduceSumUint16x32[T ~uint16]()
func ReduceSumUint32x16[T ~uint32]()
func ReduceSumUint64x8[T ~uint64]()
func ReduceSumFloat32x16[T ~float32]()
func ReduceSumFloat64x8[T ~float64]()
ReduceMin
Finds the minimum value across all lanes of SIMD vectors and emits a single scalar value when the source completes.
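When the input length is not a multiple of the lane count, padding lanes deserve attention. The scalar sketch below assumes the final partial vector is zero-padded, as in the ScalarToInt8x16 output shown earlier on this page, and compares a minimum taken over every lane with one restricted to the real values. Whether the package's reduction operators skip padding lanes is not stated here. `minAll` and `minValid` are hypothetical helpers.

```go
package main

import "fmt"

// minAll folds every lane of every vector into one minimum.
// ok is false when there are no vectors.
func minAll(vectors [][4]float32) (m float32, ok bool) {
	for _, v := range vectors {
		for _, x := range v {
			if !ok || x < m {
				m, ok = x, true
			}
		}
	}
	return m, ok
}

// minValid is like minAll but only considers the first n scalar values,
// ignoring any padding lanes after them.
func minValid(vectors [][4]float32, n int) (m float32, ok bool) {
	seen := 0
	for _, v := range vectors {
		for _, x := range v {
			if seen == n {
				return m, ok
			}
			if !ok || x < m {
				m, ok = x, true
			}
			seen++
		}
	}
	return m, ok
}

func main() {
	// Five real values; the last three lanes are assumed zero padding.
	padded := [][4]float32{{10, 20, 30, 40}, {5, 0, 0, 0}}
	fmt.Println(minAll(padded))      // 0 true
	fmt.Println(minValid(padded, 5)) // 5 true
}
```

If padding does take part in the reduction, feeding a whole number of vectors (as the example on this page does with eight values) sidesteps the question.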
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(10), float32(20), float32(30), float32(40),
float32(5), float32(10), float32(15), float32(20),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.ReduceMinFloat32x4[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(min float32) {
fmt.Printf("Next: %.1f\n", min)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: 5.0
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- ReduceMinFloat32x4
- ReduceMinFloat64x2
- ReduceMinInt8x16
- ReduceMinInt16x8
- ReduceMinInt32x4
- ReduceMinInt64x2
- ReduceMinUint8x16
- ReduceMinUint16x8
- ReduceMinUint32x4
- ReduceMinUint64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- ReduceMinFloat32x8
- ReduceMinFloat64x4
- ReduceMinInt8x32
- ReduceMinInt16x16
- ReduceMinInt32x8
- ReduceMinUint8x32
- ReduceMinUint16x16
- ReduceMinUint32x8
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- ReduceMinFloat32x16
- ReduceMinFloat64x8
- ReduceMinInt8x64
- ReduceMinInt16x32
- ReduceMinInt32x16
- ReduceMinInt64x4 (256-bit; requires AVX-512 for int64 min)
- ReduceMinInt64x8
- ReduceMinUint8x64
- ReduceMinUint16x32
- ReduceMinUint32x16
- ReduceMinUint64x4 (256-bit; requires AVX-512 for uint64 min)
- ReduceMinUint64x8
Prototypes:
func ReduceMinInt8x16[T ~int8]()
func ReduceMinInt16x8[T ~int16]()
func ReduceMinInt32x4[T ~int32]()
func ReduceMinInt64x2[T ~int64]()
func ReduceMinUint8x16[T ~uint8]()
func ReduceMinUint16x8[T ~uint16]()
func ReduceMinUint32x4[T ~uint32]()
func ReduceMinUint64x2[T ~uint64]()
func ReduceMinFloat32x4[T ~float32]()
func ReduceMinFloat64x2[T ~float64]()
func ReduceMinInt8x32[T ~int8]()
func ReduceMinInt16x16[T ~int16]()
func ReduceMinInt32x8[T ~int32]()
func ReduceMinInt64x4[T ~int64]()
func ReduceMinUint8x32[T ~uint8]()
func ReduceMinUint16x16[T ~uint16]()
func ReduceMinUint32x8[T ~uint32]()
func ReduceMinUint64x4[T ~uint64]()
func ReduceMinFloat32x8[T ~float32]()
func ReduceMinFloat64x4[T ~float64]()
func ReduceMinInt8x64[T ~int8]()
func ReduceMinInt16x32[T ~int16]()
func ReduceMinInt32x16[T ~int32]()
func ReduceMinInt64x8[T ~int64]()
func ReduceMinUint8x64[T ~uint8]()
func ReduceMinUint16x32[T ~uint16]()
func ReduceMinUint32x16[T ~uint32]()
func ReduceMinUint64x8[T ~uint64]()
func ReduceMinFloat32x16[T ~float32]()
func ReduceMinFloat64x8[T ~float64]()
ReduceMax
Finds the maximum value across all lanes of SIMD vectors and emits a single scalar value when the source completes.
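A scalar model of this reduction is a single fold over every lane that yields one result at the end. The sketch below uses a hypothetical `maxAll` helper and seeds the running maximum from the data rather than from zero, so all-negative inputs still give the right answer. What the operator emits for an empty source is not stated here; the sketch signals that case with ok=false.

```go
package main

import "fmt"

// maxAll is a hypothetical scalar stand-in: it folds every lane of every
// vector into one maximum. ok is false when there are no vectors.
func maxAll(vectors [][4]float32) (m float32, ok bool) {
	for _, v := range vectors {
		for _, x := range v {
			if !ok || x > m {
				m, ok = x, true
			}
		}
	}
	return m, ok
}

func main() {
	fmt.Println(maxAll([][4]float32{{10, 20, 30, 40}, {5, 10, 15, 20}})) // 40 true
	fmt.Println(maxAll([][4]float32{{-4, -3, -2, -1}}))                 // -1 true
	fmt.Println(maxAll(nil))                                            // 0 false
}
```
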
import (
"fmt"
"github.com/samber/ro"
rosimd "github.com/samber/ro/plugins/exp/simd"
)
obs := ro.Pipe[float32, float32](
ro.Just(
float32(10), float32(20), float32(30), float32(40),
float32(5), float32(10), float32(15), float32(20),
),
rosimd.ScalarToFloat32x4[float32](),
rosimd.ReduceMaxFloat32x4[float32](),
)
sub := obs.Subscribe(ro.NewObserver[float32](
func(max float32) {
fmt.Printf("Next: %.1f\n", max)
},
ro.OnError(func(err error) {
fmt.Printf("Error: %v\n", err)
}),
ro.OnComplete(func() {
fmt.Println("Completed")
}),
))
defer sub.Unsubscribe()
// Next: 40.0
// Completed
AVX variants (128-bit vectors)
Available on all x86_64 CPUs with AVX support (basically all modern x86_64 CPUs).
- ReduceMaxFloat32x4
- ReduceMaxFloat64x2
- ReduceMaxInt8x16
- ReduceMaxInt16x8
- ReduceMaxInt32x4
- ReduceMaxInt64x2
- ReduceMaxUint8x16
- ReduceMaxUint16x8
- ReduceMaxUint32x4
- ReduceMaxUint64x2
AVX2 variants (256-bit vectors)
Requires AVX2 CPU support (Intel Haswell [2013]+, AMD Ryzen [2017]+).
- ReduceMaxFloat32x8
- ReduceMaxFloat64x4
- ReduceMaxInt8x32
- ReduceMaxInt16x16
- ReduceMaxInt32x8
- ReduceMaxUint8x32
- ReduceMaxUint16x16
- ReduceMaxUint32x8
AVX-512 variants (512-bit vectors)
Requires AVX-512 CPU support (Intel Skylake-X/Xeon [2017]+, AMD Zen 4 [2022]+).
- ReduceMaxFloat32x16
- ReduceMaxFloat64x8
- ReduceMaxInt8x64
- ReduceMaxInt16x32
- ReduceMaxInt32x16
- ReduceMaxInt64x4 (256-bit; requires AVX-512 for int64 max)
- ReduceMaxInt64x8
- ReduceMaxUint8x64
- ReduceMaxUint16x32
- ReduceMaxUint32x16
- ReduceMaxUint64x4 (256-bit; requires AVX-512 for uint64 max)
- ReduceMaxUint64x8
Prototypes:
func ReduceMaxInt8x16[T ~int8]()
func ReduceMaxInt16x8[T ~int16]()
func ReduceMaxInt32x4[T ~int32]()
func ReduceMaxInt64x2[T ~int64]()
func ReduceMaxUint8x16[T ~uint8]()
func ReduceMaxUint16x8[T ~uint16]()
func ReduceMaxUint32x4[T ~uint32]()
func ReduceMaxUint64x2[T ~uint64]()
func ReduceMaxFloat32x4[T ~float32]()
func ReduceMaxFloat64x2[T ~float64]()
func ReduceMaxInt8x32[T ~int8]()
func ReduceMaxInt16x16[T ~int16]()
func ReduceMaxInt32x8[T ~int32]()
func ReduceMaxInt64x4[T ~int64]()
func ReduceMaxUint8x32[T ~uint8]()
func ReduceMaxUint16x16[T ~uint16]()
func ReduceMaxUint32x8[T ~uint32]()
func ReduceMaxUint64x4[T ~uint64]()
func ReduceMaxFloat32x8[T ~float32]()
func ReduceMaxFloat64x4[T ~float64]()
func ReduceMaxInt8x64[T ~int8]()
func ReduceMaxInt16x32[T ~int16]()
func ReduceMaxInt32x16[T ~int32]()
func ReduceMaxInt64x8[T ~int64]()
func ReduceMaxUint8x64[T ~uint8]()
func ReduceMaxUint16x32[T ~uint16]()
func ReduceMaxUint32x16[T ~uint32]()
func ReduceMaxUint64x8[T ~uint64]()
func ReduceMaxFloat32x16[T ~float32]()
func ReduceMaxFloat64x8[T ~float64]()