I am running a database for log analysis. At the moment I use a MySQL database, and the table for my analysis looks like this (a DDL sketch follows the list):
- UUID
- REQUEST_ID
- REQUEST_TIMESTAMP
- RESPONSE_TIMESTAMP
- RUNTIME
- SERVER_NAME
- EXSIGHT_NAME
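For reference, a minimal DDL sketch of this table; the real column types are not shown in the question, so every type here is an assumption (RUNTIME is assumed to be milliseconds, because the aggregation query below divides it by 1000):

    create table REQUEST_LOGS (
        UUID               char(36)    not null,
        REQUEST_ID         varchar(64) not null,
        REQUEST_TIMESTAMP  datetime    not null,
        RESPONSE_TIMESTAMP datetime    not null,
        RUNTIME            int         not null,  -- assumed unit: milliseconds
        SERVER_NAME        varchar(64) not null,
        EXSIGHT_NAME       varchar(64) not null,
        primary key (UUID)                        -- assumed primary key
    );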
I use this table to create views for each individual entry, for a 5-minute aggregation, and for a daily aggregation. I am inserting about 400,000 entries a day, and at the moment there are about 70 million rows in this table.
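The 5-minute view could look roughly like this sketch (the view name and aliases are assumptions; the real view definitions are not shown):

    create view REQUEST_LOGS_5MIN as
    select
        from_unixtime(floor(unix_timestamp(REQUEST_TIMESTAMP) / 300) * 300) as INTERVAL_START,
        SERVER_NAME,
        EXSIGHT_NAME,
        count(REQUEST_ID) as Anzahl,       -- request count per 5-minute bucket
        avg(RUNTIME) / 1000 as dStime      -- average runtime, assumed ms -> s
    from REQUEST_LOGS
    group by INTERVAL_START, SERVER_NAME, EXSIGHT_NAME;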
My actual problem is that my queries are getting slow: the insert/update queries as well as the aggregation queries.
So I created a second table for the daily aggregation. Once a day, a job runs that aggregates the previous day's data; a second job deletes all entries older than 30 days from the original table. A sketch of both jobs follows.
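This is a sketch of the two jobs as described; the daily table REQUEST_LOGS_DAILY and its columns are assumed names that do not appear above:

    -- job 1: roll up yesterday's data into the daily table
    insert into REQUEST_LOGS_DAILY (DAY_START, SERVER_NAME, EXSIGHT_NAME, Anzahl, dStime)
    select
        date(REQUEST_TIMESTAMP),
        SERVER_NAME,
        EXSIGHT_NAME,
        count(REQUEST_ID),
        avg(RUNTIME) / 1000
    from REQUEST_LOGS
    where REQUEST_TIMESTAMP >= current_date - interval 1 day
      and REQUEST_TIMESTAMP <  current_date
    group by date(REQUEST_TIMESTAMP), SERVER_NAME, EXSIGHT_NAME;

    -- job 2: purge rows older than 30 days, in batches to keep
    -- transactions short (repeat until no rows are affected)
    delete from REQUEST_LOGS
    where REQUEST_TIMESTAMP < current_date - interval 30 day
    limit 10000;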
My question: is this the right approach, or would a different table structure or even a different kind of database (e.g. NoSQL, a graph database, etc.) be better?
[EDIT]
The SELECT for the daily aggregation:
select
    date_format(REQUEST_TIMESTAMP, '%Y-%m-%d 00:00:00') as INTERVAL_START,
    null as INTERVAL_END,
    count(REQUEST_ID) as Anzahl,
    format((sum(RUNTIME) / count(REQUEST_ID)) / 1000, 0) as dStime
from REQUEST_LOGS
where EXSIGHT_NAME like (case when '<EXSIGHT>' = 'alle' then '%' else '<EXSIGHT>' end)
  and SERVER_NAME like '<SERVER>'
  and REQUEST_TIMESTAMP between '<FILTERFROMDATE>' and '<FILTERTODATE>'
group by INTERVAL_START
order by INTERVAL_START desc
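For reference, a composite index matching this query's filters could look like the sketch below. The index name is made up; the idea is that the two name filters come first and the range filter on REQUEST_TIMESTAMP last (this only helps when the EXSIGHT_NAME/SERVER_NAME patterns have no leading wildcard). The indexes that actually exist on the table follow.

    alter table REQUEST_LOGS
        add index idx_server_exsight_ts (SERVER_NAME, EXSIGHT_NAME, REQUEST_TIMESTAMP);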
Indexes:
SHOW CREATE TABLE REQUEST_LOGS;
[output with the index listing omitted] Something in the Pos column of the index list confuses me.
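For context, the "Pos" column is presumably what SHOW INDEX reports as Seq_in_index: the 1-based position of a column inside a (possibly composite) index, not a separate index. A quick way to inspect it:

    SHOW INDEX FROM REQUEST_LOGS;
    -- Seq_in_index = position of the column within its index;
    -- rows sharing the same Key_name belong to one composite index.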