Building a few ideas at the intersection of Data Science x AI x Blockchain in public.
This repository is for building on ideas and practicing knowledge related to the intersection of Data Science x AI x Blockchain.
- Blockchain intel
- Blockchain data
- Fraud detection
- On-Chain Analysis
- Liquidity prediction
- Ransom Tracking
Blockchain Intelligence Data Platform
The base architecture design is the Lambda architecture: a batch layer and a speed (streaming) layer feed a serving layer that answers queries.
Later on, if required, we will migrate to Kappa architecture.
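As a rough illustration of the Lambda idea, the sketch below keeps a batch view (recomputed from historical data) and a speed view (updated per incoming event), and merges the two at query time. The function names and the per-address transfer count are hypothetical and chosen only for this example; they are not part of this repository.

```python
from collections import Counter


def build_batch_view(historical_transfers):
    """Batch layer: recompute transfer counts per address from all stored history."""
    return Counter(t["address"] for t in historical_transfers)


def update_speed_view(speed_view, transfer):
    """Speed layer: incrementally count transfers that arrived after the last batch run."""
    speed_view[transfer["address"]] += 1


def query_transfer_count(address, batch_view, speed_view):
    """Serving layer: merge the batch view and the speed view at query time."""
    return batch_view.get(address, 0) + speed_view.get(address, 0)


batch_view = build_batch_view(
    [{"address": "0xabc"}, {"address": "0xabc"}, {"address": "0xdef"}]
)
speed_view = Counter()
update_speed_view(speed_view, {"address": "0xabc"})
```

In a Kappa architecture the batch layer would be dropped and all views would be rebuilt by replaying the stream.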
- JSON-RPC API
- Web3.js/web3.py API
- Etherscan API
- Blocknative API
- Infura API
- Blockchain nodes
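For the JSON-RPC source, a request is a small JSON document. The sketch below builds the payload for the standard Ethereum method `eth_getBlockByNumber` and decodes a hex-encoded quantity from a response. The helper names are hypothetical, and no network call is made; sending the payload to a node or provider endpoint is left out.

```python
import json


def build_get_block_request(block_number, full_transactions=False, request_id=1):
    """Build a JSON-RPC 2.0 payload for eth_getBlockByNumber."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        # Quantities are hex strings; the flag asks for full transaction objects.
        "params": [hex(block_number), full_transactions],
        "id": request_id,
    }


def parse_quantity(hex_value):
    """Decode a hex-encoded quantity such as "0x1b4" into an int."""
    return int(hex_value, 16)


payload = build_get_block_request(16)
body = json.dumps(payload)  # this string would be POSTed to the node
```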
Will write on this later.
Feel free to contribute.
Use Pydantic to perform data validation and ensure data quality of the data model.
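A minimal sketch of what this could look like is shown below. The `Transaction` model, its field names, and the length checks are assumptions made for illustration, not the repository's actual data model; only `BaseModel`, `Field`, and `ValidationError` come from Pydantic.

```python
from pydantic import BaseModel, Field, ValidationError


class Transaction(BaseModel):
    """Hypothetical data model for one on-chain transaction."""

    tx_hash: str = Field(min_length=66, max_length=66)       # "0x" + 64 hex chars
    block_number: int = Field(ge=0)                           # cannot be negative
    from_address: str = Field(min_length=42, max_length=42)  # "0x" + 40 hex chars
    value_wei: int = Field(ge=0)                              # cannot be negative


def validate_records(records):
    """Split raw dicts into validated models and rejected (record, error) pairs."""
    valid, rejected = [], []
    for record in records:
        try:
            valid.append(Transaction(**record))
        except ValidationError as exc:
            rejected.append((record, str(exc)))
    return valid, rejected


good = {
    "tx_hash": "0x" + "a" * 64,
    "block_number": 100,
    "from_address": "0x" + "b" * 40,
    "value_wei": 5,
}
bad = dict(good, block_number=-1)
valid, rejected = validate_records([good, bad])
```

Rejected records can be routed to a separate store for inspection instead of silently entering the platform.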
We will go with Airflow for now.
- Airflow
- Prefect
The choice between Airflow and Prefect ultimately depends on the specific needs of your organization. Airflow can be a great choice for smaller organizations or teams that want a flexible and extensible platform for managing their workflows, while Prefect may be better suited for larger enterprises with complex, distributed workflows that require more advanced features and support.
To handle event-driven streaming data.
Why ELK?
It's a search database and is efficient for looking up data.
How to optimize Elasticsearch (index management)?
- Create separate indexes for each blockchain
- Apply a weekly or daily index rollup policy. This will optimize search
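One way to picture the two points above is a naming scheme with one index family per chain and one index per day or week. The sketch below only computes index names; the `<chain>-transactions-<period>` pattern and the `index_name` helper are assumptions for illustration, and the rollup policy itself would still be configured in Elasticsearch.

```python
from datetime import datetime


def index_name(chain, timestamp, granularity="daily"):
    """Return a per-chain, time-bucketed index name (hypothetical naming scheme)."""
    prefix = f"{chain.lower()}-transactions"
    if granularity == "daily":
        return f"{prefix}-{timestamp:%Y.%m.%d}"
    if granularity == "weekly":
        iso = timestamp.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{prefix}-{iso[0]}.w{iso[1]:02d}"
    raise ValueError(f"unsupported granularity: {granularity}")


ts = datetime(2024, 1, 15)
daily = index_name("Ethereum", ts)
weekly = index_name("Bitcoin", ts, granularity="weekly")
```

With names like these, a query for one chain and a short time range only has to touch a few small indexes.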
Read more on engineering blogs
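Use ULIDs rather than UUIDs. ULIDs start with a timestamp and sort lexicographically by creation time, so new keys land next to each other in the index. This small change can reduce the read and write costs to the DB, reportedly by as much as 50%.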
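To show why ULIDs sort by time, here is a minimal generator following the ULID layout (48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters). It is a sketch for illustration only: `new_ulid` is a hypothetical helper, and it skips the monotonic handling of IDs created within the same millisecond that a real library would provide.

```python
import os
import time

CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None):
    """Return a 26-character ULID string for the given (or current) time."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness           # 128-bit integer
    chars = []
    for _ in range(26):                                  # 5 bits per character
        chars.append(CROCKFORD32[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


earlier = new_ulid(1_700_000_000_000)
later = new_ulid(1_700_000_000_001)
```

Because the timestamp occupies the most significant bits, plain string comparison orders IDs by creation time, which is what keeps index inserts close to sequential.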
Considering the internals of BigQuery, it is a good fit for analytical workloads.
- BigQuery stores data in a columnar format on Colossus, Google's distributed file system.
- It automatically manages re-clustering and partitioning.