High-efficiency floating-point neural network inference operators for mobile, server, and Web
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
The Tensor Algebra SuperOptimizer for Deep Learning
Batch normalization fusion for PyTorch
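As a quick illustration of what such fusion does, here is a minimal sketch of folding a `BatchNorm2d` into the preceding `Conv2d`; the layer shapes and the `fuse_conv_bn` helper are illustrative, not taken from the repository above.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d whose weights absorb the BatchNorm statistics (eval mode)."""
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels,
        kernel_size=conv.kernel_size, stride=conv.stride,
        padding=conv.padding, dilation=conv.dilation,
        groups=conv.groups, bias=True,
    )
    with torch.no_grad():
        # Per-output-channel scale: gamma / sqrt(running_var + eps)
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

# The fused layer matches the original conv + BN pair up to floating-point error.
conv, bn = nn.Conv2d(3, 8, 3, padding=1).eval(), nn.BatchNorm2d(8).eval()
x = torch.randn(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```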
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Optimize the layer structure of Keras models to reduce computation time
A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3.
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
A blog, reading reports, and code examples for AGI/LLM-related knowledge.
Cross-platform modular neural network inference library, small and efficient
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
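For readers unfamiliar with KV caching, the following toy sketch shows the core idea for a single attention head: keys and values of already-generated tokens are kept in a cache, so each decoding step only projects the newest token. The names (`Wq`, `Wk`, `Wv`, `attend`) are illustrative and not taken from the course or from LoRAX.

```python
import torch

torch.manual_seed(0)
d = 16
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def attend(q, K, V):
    # q: (1, d); K, V: (t, d) -> single-query attention over all cached positions
    scores = (q @ K.T) / d ** 0.5
    return torch.softmax(scores, dim=-1) @ V

# Decode step by step, appending each new key/value instead of re-projecting
# the whole prefix at every step -- the point of a KV cache.
K_cache = torch.empty(0, d)
V_cache = torch.empty(0, d)
for step in range(5):
    x = torch.randn(1, d)                    # embedding of the newly generated token
    K_cache = torch.cat([K_cache, x @ Wk])   # O(1) projections per token
    V_cache = torch.cat([V_cache, x @ Wv])
    out = attend(x @ Wq, K_cache, V_cache)   # feeds the next layer / sampling step
```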
Faster YOLOv8 inference: optimize and export YOLOv8 models for faster inference using OpenVINO and NumPy 🔢
[WIP] A template for getting started writing code using GGML
Modified inference engine for quantized convolution using product quantization
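As a rough illustration of product quantization itself (applied here to a dense weight matrix rather than the repository's convolution setup; shapes, sub-space counts, and codebook sizes are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 64)).astype(np.float32)   # (out_features, in_features)
n_subspaces, n_centroids = 8, 16
sub_dim = W.shape[1] // n_subspaces

codebooks, codes = [], []
for s in range(n_subspaces):
    block = W[:, s * sub_dim:(s + 1) * sub_dim]          # all rows, one sub-space
    km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(block)
    codebooks.append(km.cluster_centers_.astype(np.float32))
    codes.append(km.labels_.astype(np.uint8))            # 1 byte per row per sub-space

# "Dequantize" by table lookup; real engines fold the lookup into the GEMM instead.
W_hat = np.concatenate([codebooks[s][codes[s]] for s in range(n_subspaces)], axis=1)
print("reconstruction MSE:", float(np.mean((W - W_hat) ** 2)))
```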
A constrained expectation-maximization algorithm for feasible graph inference.
Your AI catalyst: the one and only inference backend to maximize your model's inference performance
LLM-Rank: a graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the accompanying paper.
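A heavily hedged sketch of the general idea, not the paper's exact formulation: treat neurons as nodes of a directed graph, weight edges by connection magnitude, and keep the components with the highest weighted PageRank scores. The toy MLP and the 20% pruning ratio below are arbitrary choices for illustration.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [32, 64, 64, 32]            # toy MLP, purely illustrative
weights = [rng.standard_normal((layer_sizes[i + 1], layer_sizes[i]))
           for i in range(len(layer_sizes) - 1)]

# Build a neuron-level graph whose edge weights are absolute connection strengths.
G = nx.DiGraph()
for i, W in enumerate(weights):
    for src in range(W.shape[1]):
        for dst in range(W.shape[0]):
            G.add_edge(f"L{i}_n{src}", f"L{i+1}_n{dst}", weight=float(abs(W[dst, src])))

# Weighted PageRank: importance flows along strong connections.
scores = nx.pagerank(G, alpha=0.85, weight="weight")

# Prune: drop the 20% of hidden neurons with the lowest centrality.
hidden = [n for n in G if n.startswith(("L1_", "L2_"))]
keep = sorted(hidden, key=scores.get, reverse=True)[: int(0.8 * len(hidden))]
print(f"kept {len(keep)} of {len(hidden)} hidden neurons")
```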
Batch Partitioning for Multi-PE Inference with TVM (2020)
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
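For context, here is a minimal sketch of the speculative-decoding accept/reject loop, using toy stand-in distributions instead of real draft and target models; every function below is a placeholder, not code from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50

def draft_probs(prefix):
    # Cheap "draft model": a deterministic stand-in distribution per prefix.
    local = np.random.default_rng(hash(tuple(prefix)) % 2**32)
    p = local.random(VOCAB)
    return p / p.sum()

def target_probs(prefix):
    # Expensive "target model": another stand-in distribution.
    local = np.random.default_rng((hash(tuple(prefix)) + 1) % 2**32)
    p = local.random(VOCAB)
    return p / p.sum()

def speculative_step(prefix, k=4):
    """Draft k tokens with the cheap model, then accept/reject against the target."""
    ctx, drafted, q = list(prefix), [], []
    for _ in range(k):
        qp = draft_probs(ctx)
        tok = int(rng.choice(VOCAB, p=qp))
        drafted.append(tok)
        q.append(qp)
        ctx.append(tok)
    accepted = list(prefix)
    for qp, tok in zip(q, drafted):
        pp = target_probs(accepted)
        if rng.random() < min(1.0, pp[tok] / qp[tok]):
            accepted.append(tok)                      # draft token accepted
        else:
            resid = np.maximum(pp - qp, 0.0)          # rejection: resample from the residual
            accepted.append(int(rng.choice(VOCAB, p=resid / resid.sum())))
            return accepted
    # Every draft accepted: take one extra token from the target for free.
    accepted.append(int(rng.choice(VOCAB, p=target_probs(accepted))))
    return accepted

print(speculative_step([1, 2, 3]))
```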