Questions tagged [data-warehouse]
A database system optimised for reporting, particularly in aggregate. Often, but not always implemented using a star schema.
432 questions
0
votes
1
answer
23
views
Must fact tables be designed for aggregation?
I have been reading up on Microsoft SQL Server's columnstore indexes recently and I have become concerned that my experience does not match that of the industry. Columnstore indexes are designed to ...
0
votes
0
answers
20
views
Figuring out dimensional modeling requirements for OLAP in a database containing many pre-aggregated fact tables
I have the situation where we have an application that uses a PostgreSQL database for OLAP queries for reporting (less DWH/exploratory). I am trying to find an optimal database design for this ...
0
votes
1
answer
38
views
Transactional replication Snapshot agent not starting
I am trying to consider all the options before setting up transactional replication on my 2008R2 server to 2012 distribution server to 2012 Subscriber server. All the three servers are on FULL ...
0
votes
2
answers
52
views
Can you have a minimal transactions data warehouse?
Good Morning,
I have a client doing scientific research. The current database (Azure postgres) design is 3NF but very few transactions are happening. There are several "pipelines" that add ...
0
votes
1
answer
34
views
ideas to reduce association tables while building a data warehouse
I'm trying to create a data warehouse for my BI project. Initially I have a raw database that contains 100 data tables, most of them are association tables. My goal is to create normalized dimensional ...
0
votes
2
answers
22
views
One attributes to several statues with the date of each statue in my fact table data warehouse
I hope you are well. I have this problem and I am a beginner in data warehousing. I hope to find a solution to it.
I have an OLTP database from the GLPI system. We are only interested in ticket ...
1
vote
1
answer
76
views
What does psql do on connect (that could be taking 1 to 10 minutes)?
In some cases psql is taking ~1 to 10 minutes to connect to PostgreSQL, with no queries that it seems to be running while doing so (according to pg_activity that I think queries pg_stat_activity)
I'm ...
0
votes
0
answers
77
views
Best practice for applying complex business logic during transformation in ETL from Aurora db to Redshift data warehouse
I'll be pushing data from an Amazon Aurora db to a Redshift data warehouse. The painful part is that the transformation portion of the ETL workflow relies heavily on business logic, that lives within ...
2
votes
2
answers
55
views
SQL Server DWH architecture question
We have SQL Server DWH server. It has dwh_raw, dwh_prod and so on databases with client data, business operations.
Now there is a task to allocate new data - on employees, their salaries and so on.
Is ...
-1
votes
1
answer
39
views
Issue with modeling Facts table - don't know what do with the fact that one of the dimensions may happen many times
I am doing an model for warehouse whose topic are car crashes.
I've a problem I don't know how to solve, that is - my fact is a Crash. I don't know how to think in terms of drivers. Issue is that ...
0
votes
2
answers
122
views
Growing SQL Server Data Warehouse - How to reorganize efficiently?
we've been using SQL Server for quite a while now in our company to host all application data. We use a typical set-up with data being loaded from various data sources into a Staging Area, which ...
0
votes
1
answer
156
views
How to compress data in SQL Server? The data warehouse raw data is times bigger than the MySQL OLTP database for the same tables
For reporting purposes, we have a DWH (data warehouse) doing ETLs (extract-transform-load) to retrieve data from selected tables in a production OLTP (on-line transaction processing) database.
The ETL ...
0
votes
1
answer
817
views
Detecting deleted rows in mysql
I have a Spring Boot webapp with a MySQL database. I'm trying to replicate this data into a MSSQL for running reports against.
All rows are audited with created_at and modified_at datetimes so i can ...
1
vote
2
answers
127
views
Best way to store millions of measurements with hundreds of columns
I work with infrared spectrometers (several hundred of them) to analyze chemical compounds, and I am looking for an efficient solution to store the measured data so that I can process it later on.
...
1
vote
1
answer
808
views
Optimal join for joining facts with scd-type-2 dimension for aggregation/reporting
I have a fact table and an scd-type-2 dimension table. I want to produce sales report by region and year.
I have working solution with a query that joins them for reporting purposes. When I run the ...
0
votes
1
answer
89
views
Postgres JSONs transformation
I would very appreciate your help here, an important question about architecture modification.
We use the ETL process to fetch data from external services (for example, Github).
Extract the data (e....
2
votes
1
answer
4k
views
Is it acceptable to store global date times in UTC without offset for timezone?
I'm looking at a Data Warehouse which collections data from systems across the world. The Data Warehouse uses UTC for time but only stores the UTC offset and not the timezone. I have noticed that ...
0
votes
0
answers
236
views
Optimising a recursive SQL query that processes several million records in BQ
I need help optimizing a recursive SQL query in BQ.
I have hierarchical data stored in a table as parent-child relationships, i.e
will be stored as
parent_item_id
child_item_id
1
1
1
2
1
3
3
4
...
2
votes
1
answer
199
views
Securing Functional/Service ID for MSSQL with Windows Authentication
I'm currently assessing a setup of Data warehouse implementation in our environment, based on the requirements:
A functional/service ID to be setup to access multiple DB servers of other applications ...
1
vote
0
answers
438
views
"Snapshot" of Historical Data for Analysis Purposes
I pull monthly aggregate data from a SQL database. I store the aggregated data in a separate table in the same database (so I can quickly grab numbers from six months ago, for instance).
My issue is ...
0
votes
0
answers
750
views
ETL staging area incremental load design considerations
We are currently working on a design for our new data warehouse. In our current data warehouse we have a 'persistent staging area' or PSA. We load new records incrementally into a staging table and ...
1
vote
1
answer
45
views
Detect table updates on OracleDB
Say we have a table in oracledb called fruits:
id fruit
1 banana
2 mango
3 melon
Imagine this table is being overwitten completely, say every Friday for example, the column id is the primary key.
...
0
votes
0
answers
32
views
How does one design a database to be able to view state of data at any given place in time?
Are there accepted best practices/database features to solve the general problem of being able to view/query the state of a system at any given point in time? To describe the problem, using the ...
0
votes
0
answers
82
views
Generate unique ids in BigQuery data import
I am using Zapier to import woocommerce orders into BigQuery and build a datawarehouse with paid orders from multiple sites. However, multiple sites generate an ID per site, not a unique id globally ...
1
vote
0
answers
180
views
Data warehouse: solving many-to-many relationship with duplicates
I'm learning dimensional modeling and have a question.
I understand those basic examples such as a FactSale has DimItem, DimLocation.
But when it comes to complicated cases with many-to-many ...
0
votes
1
answer
75
views
SQL All Procs have WITH RECOMPLILE so how to maintain a bloated plan cache
I'm working a star schema data warehouse created in Azure SQL database where the last developer included WITH RECOMPILE on all the Procs.
This I believe was because the ETL only executes these Procs ...
2
votes
2
answers
495
views
Optimal Indexing Strategy for Datawarehouse and Data Lake Updates
We have a sql server database we use as a data lake and a datawarehouse. Each table in the database has some standardized definition as we are at 600 or so tables now, so maintenance needs to be ...
0
votes
1
answer
681
views
Boolean flag in fact table
Imagine we have received the results of a health survey on daily consumption habits of 3 different items, like the following:
Id
Date
Age
Country
CigarettesPerDay
CoffeesPerDay
BeersPerDay
1
2021-12-...
2
votes
1
answer
783
views
Data lake or data warehouse first?
I have been confused whether to create a data lake or a data warehouse and hope some experienced real-world professional can give me some enlightenment.
I will like to store, visualise and perform ...
0
votes
1
answer
616
views
Relations between fact tables in a data warehouse
I'm designing a data warehouse and I keep reading that there should be no relations between fact tables.
I'm sure everyone here is well aware that businesses are full of facts which are inherently ...
0
votes
1
answer
259
views
Fact table linking to natual key rather than row key in a slowly changing dimension table?
I’m designing a data warehouse and have a Slowly Changing Dimension Type 2 table something like this:
UserKey
UserID
Group
EffectiveDate
ExpiryDate
IsCurrent
1
1001
GR1
2021-01-01
9999-12-31
Y
2
...
3
votes
3
answers
7k
views
OBT (One Big Table) vs Star Schema for Data Warehousing
I am trying to prepare myself for some interviews at a FAANG, while coming from a much smaller company. Part of the interview process involves data modeling and ETL design. Kimball's "The Data ...
1
vote
1
answer
1k
views
Table design to store Refund Transaction information
I have a transaction table that contains purchases and refunds. I'd like to create a linkage between purchase and refunds in the same table (I decided not to use a separate table). I'm not sure which ...
0
votes
1
answer
1k
views
How to handle incoming time-series data in a data warehouse?
I am new to data warehousing (having learnt this in my school days) and is looking to do a data warehouse as a side project. Below is a simple data warehouse design I came up with:
The data warehouse ...
1
vote
2
answers
164
views
From what volumes of data do data ingestion tools like apache nifi, flume, storm or tools like logstash become relevant?
I'm working in a company that has two legacy data warehouses, which have evolved to unmaintainable monoliths throughout the time. Therefore, they are in dire need of re-form.
I'm investigating a ...
1
vote
2
answers
453
views
What is the disadvantage of not creating surrogate key in DW?
I want to create a data warehouse from 1 OLTP database. The tables in the OLTP db have got integer primary keys. So they are the business keys.
The tables are: Client, customer, products and sales. ...
1
vote
0
answers
190
views
Star Schema design considerations
We receive patient data that includes case level info (hospital id, patient id, length of stay, surgery date) as well as surveys/educational material that the patient completed as well as their ...
0
votes
0
answers
149
views
Key attributes in junk dimension table
I have a fact tables with a composite primary key of 10 key columns.
6 out of 10 of the key columns are short, and with low cardinality, and also "mostly" unrelated to each other.
I am ...
0
votes
2
answers
67
views
how do modern data warehouses tackle frequent small writes? esp. when streaming data is one of the sources? [closed]
So for many days, I had a question in mind.
How do modern data warehouses tackle frequent small writes? esp. when streaming data is one of the sources?
e.g. Kafka/Kinesis => DW(Snowflake, Teradata, ...
0
votes
1
answer
330
views
Setting up database for date dimension table [closed]
I was working on creating database for warehouse purpose. I have many dimension table and one fact table. I have created separate create_date dimension table but It is not working as I want it to. ...
1
vote
1
answer
494
views
How to handle NULL Dates in an "Accumulating fact table"
One type of fact table in Dimensional modeling is Accumulating Snapshot fact Table. if you think you need to review the meaning and definition of this type , you can take a look at this article :
...
1
vote
1
answer
2k
views
Connecting dimension tables in a multi-fact star schema database warehouse?
I am new to data warehouse modelling so please bear with me.
In database warehouse modelling, the star schema is typically a fact table with multiple dimensions connected directly to it. However, what ...
1
vote
0
answers
34
views
History data in CUBE
Not sure if this is the right place to ask.
I am working on an ETL procedure that gathers the data about medication consumption and I have something like this:
Dim_Patient, the dimension for all the ...
0
votes
2
answers
479
views
Storing Live Timeseries Data in SQL Server Azure Database
I have an SQL Server Azure Database and am trying to store time-series data that I get from multiple sensors.
Data Source:- 5-minute data is obtained via an API.
Current table structure: -
Timestamp |...
1
vote
1
answer
953
views
Is "ETL vs ELT" just "tables vs views"?
TL;DR; Is the difference between ETL and ELT just whether or not the transformed data is a concretion vs abstraction before being loaded into the warehouse?
So forgive me the broadness of the subject ...
1
vote
1
answer
942
views
MariaDB Column Store Suitability for Data Warehouse
We want to build a data warehouse to distribute valid information across the organization (reporting) and get some insights into our data (analytics). After some research we found MariaDB using Column ...
1
vote
2
answers
853
views
text unique to the grain should be stored in fact table or dimension table?
I am modeling a star schema for user reviews using the yelp data set.
Each user review has a business dimension key, user dimension key and a bunch of data associated with the review. All of the ...
0
votes
1
answer
62
views
Data modeling for Warehouse
I am learning about database warehouses and data modeling. I came across this blog post. In the section, "DIFFERENT MODELS HAVE DIFFERENT PURPOSES" the author mentions that depending upon ...
0
votes
1
answer
2k
views
SSAS error deploying project: Ensure that 'SQL Browser' service is running
After successfully building my project, I got this error when deploying it:
The project could not be deployed to the '\SQLEXPRESS' server because of the following connectivity problems :
A connection ...
0
votes
0
answers
198
views
Schema and/or reporting SQL for customer acquisition cost?
This a design question around creating fact/dimension tables to query customer acquisition cost and how to properly model the tables and query them.
Business Rule
We spend some money per day per ...