I am running a database for log analysis. At the moment I use a MySQL database, and the table for my analysis looks like this (a DDL sketch follows the list):
- UUID
- REQUEST_ID
- REQUEST_TIMESTAMP
- RESPONSE_TIMESTAMP
- RUNTIME
- SERVER_NAME
- EXSIGHT_NAME
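For reference, a minimal DDL sketch of this table; the real column types are not shown in the question, so every type here is an assumption (RUNTIME is assumed to be milliseconds, because the aggregation query below divides it by 1000):

    create table REQUEST_LOGS (
        UUID               char(36)    not null,
        REQUEST_ID         varchar(64) not null,
        REQUEST_TIMESTAMP  datetime    not null,
        RESPONSE_TIMESTAMP datetime    not null,
        RUNTIME            int         not null,  -- assumed unit: milliseconds
        SERVER_NAME        varchar(64) not null,
        EXSIGHT_NAME       varchar(64) not null,
        primary key (UUID)                        -- assumed primary key
    );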
I use this table to create views for each individual entry, for a 5-minute aggregation, and for a daily aggregation. I am inserting about 400,000 entries a day, and at the moment there are about 70 million rows in this table.
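The 5-minute view could look roughly like this sketch (the view name and aliases are assumptions; the real view definitions are not shown):

    create view REQUEST_LOGS_5MIN as
    select
        from_unixtime(floor(unix_timestamp(REQUEST_TIMESTAMP) / 300) * 300) as INTERVAL_START,
        SERVER_NAME,
        EXSIGHT_NAME,
        count(REQUEST_ID) as Anzahl,       -- request count per 5-minute bucket
        avg(RUNTIME) / 1000 as dStime      -- average runtime, assumed ms -> s
    from REQUEST_LOGS
    group by INTERVAL_START, SERVER_NAME, EXSIGHT_NAME;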
My actual problem is that my queries are getting slow: the insert/update queries as well as the aggregation queries.
So I created a second table for the daily aggregation. Once a day, a job runs that aggregates the previous day's data; a second job deletes all entries older than 30 days from the original table. A sketch of both jobs follows.
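This is a sketch of the two jobs as described; the daily table REQUEST_LOGS_DAILY and its columns are assumed names that do not appear above:

    -- job 1: roll up yesterday's data into the daily table
    insert into REQUEST_LOGS_DAILY (DAY_START, SERVER_NAME, EXSIGHT_NAME, Anzahl, dStime)
    select
        date(REQUEST_TIMESTAMP),
        SERVER_NAME,
        EXSIGHT_NAME,
        count(REQUEST_ID),
        avg(RUNTIME) / 1000
    from REQUEST_LOGS
    where REQUEST_TIMESTAMP >= current_date - interval 1 day
      and REQUEST_TIMESTAMP <  current_date
    group by date(REQUEST_TIMESTAMP), SERVER_NAME, EXSIGHT_NAME;

    -- job 2: purge rows older than 30 days, in batches to keep
    -- transactions short (repeat until no rows are affected)
    delete from REQUEST_LOGS
    where REQUEST_TIMESTAMP < current_date - interval 30 day
    limit 10000;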
My question: is this the right approach, or would a different table structure or even a different kind of database (e.g. NoSQL, a graph database, etc.) be better?
[EDIT]
The SELECT for the daily aggregation:
select
    date_format(REQUEST_TIMESTAMP, '%Y-%m-%d 00:00:00') as INTERVAL_START,
    null as INTERVAL_END,
    count(REQUEST_ID) as Anzahl,
    format((sum(RUNTIME) / count(REQUEST_ID)) / 1000, 0) as dStime
from REQUEST_LOGS
where EXSIGHT_NAME like (case when '<EXSIGHT>' = 'alle' then '%' else '<EXSIGHT>' end)
  and SERVER_NAME like '<SERVER>'
  and REQUEST_TIMESTAMP between '<FILTERFROMDATE>' and '<FILTERTODATE>'
group by INTERVAL_START
order by INTERVAL_START desc
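For reference, a composite index matching this query's filters could look like the sketch below. The index name is made up; the idea is that the two name filters come first and the range filter on REQUEST_TIMESTAMP last (this only helps when the EXSIGHT_NAME/SERVER_NAME patterns have no leading wildcard). The indexes that actually exist on the table follow.

    alter table REQUEST_LOGS
        add index idx_server_exsight_ts (SERVER_NAME, EXSIGHT_NAME, REQUEST_TIMESTAMP);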
Indexes:
SHOW CREATE TABLE REQUEST_LOGS;
[output with the index listing omitted] Something in the Pos column of the index list confuses me.
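For context, the "Pos" column is presumably what SHOW INDEX reports as Seq_in_index: the 1-based position of a column inside a (possibly composite) index, not a separate index. A quick way to inspect it:

    SHOW INDEX FROM REQUEST_LOGS;
    -- Seq_in_index = position of the column within its index;
    -- rows sharing the same Key_name belong to one composite index.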