The paper is currently under review.

This repository provides the code for On-device Sora, an open-source implementation of the [MobiSys 2025] paper On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices.

- The code is based on the implementation of Open-Sora: Democratizing Efficient Video Production for All.
On-device Sora applies Linear Proportional Leap (LPL), Temporal Dimension Token Merging (TDTM), and Concurrent Inference with Dynamic Loading (CI-DL) to enable efficient video generation on the iPhone 15 Pro.
Open-Sora, the baseline model of On-device Sora, is an open-source text-to-video (T2V) diffusion model that generates videos from text input.
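To give a feel for one of these techniques, here is a toy sketch of the general idea behind Temporal Dimension Token Merging: merging adjacent tokens along the temporal axis (here by simple averaging with NumPy) halves the token count the attention layers must process. The function name, merge rule, and shapes are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def merge_temporal_tokens(tokens: np.ndarray) -> np.ndarray:
    """Average each adjacent pair of frames along the temporal axis.

    tokens: (T, N, C) array of transformer tokens, where T is the
    temporal length, N the spatial tokens per frame, C the channels.
    Returns a (T // 2, N, C) array, halving the temporal token count.
    """
    T = tokens.shape[0] - tokens.shape[0] % 2  # drop a trailing odd frame
    paired = tokens[:T].reshape(T // 2, 2, *tokens.shape[1:])
    return paired.mean(axis=1)

video_tokens = np.random.rand(16, 64, 128)  # 16 frames, 64 tokens, 128 dims
merged = merge_temporal_tokens(video_tokens)
print(merged.shape)  # (8, 64, 128)
```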
```shell
cd Device_conversion
conda create -n convert python=3.10
conda activate convert
pip install -r requirements/requirements-convert.txt
pip install -v .
```
```shell
# T5 text encoder
cd t5
python3 export-t5.py
```

```shell
# STDiT3 denoiser
cd stdit3
python3 export-stdit3.py
```
When you run `export-vae-spatial.py`, you may encounter the error `Fatal Python error: PyEval_SaveThread`. To work around it, run only one code block for each VAE part at a time and comment out the rest.
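One convenient way to honor this workaround is to gate each conversion block behind a flag, so toggling a single line selects which part runs per process. This is a hypothetical layout for illustration, not the actual contents of `export-vae-spatial.py`:

```python
# Hypothetical sketch: run only one VAE conversion block per process by
# flipping these flags instead of commenting out whole code blocks.
EXPORT_ENCODER = True
EXPORT_DECODER = False

exported = []

def export_encoder():
    # placeholder for the real spatial-encoder conversion
    exported.append("spatial-encoder")

def export_decoder():
    # placeholder for the real spatial-decoder conversion
    exported.append("spatial-decoder")

if EXPORT_ENCODER:
    export_encoder()
if EXPORT_DECODER:
    export_decoder()

print(exported)  # ['spatial-encoder']
```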
```shell
cd vae
# for the VAE's temporal part
python3 export-vae-temporal.py
# for the VAE's spatial part
python3 export-vae-spatial.py
```
- A Mac with Xcode installed
- An Apple account to build and launch the app
- iPhone: iPhone 15 Pro or later
- iOS: version 18 or later
- All MLPackage models (T5, STDiT, VAE)
You can download the converted models from the following link: [Download]
- Open the Xcode project by clicking On-device/On-device-Sora.xcodeproj
- Change the Team (None -> your Apple account) under TARGETS/Signing & Capabilities
- Build and launch the app