Skip to main content

Questions tagged [data-warehouse]

A database system optimised for reporting, particularly in aggregate. Often, but not always implemented using a star schema.

Filter by
Sorted by
Tagged with
0 votes
1 answer
23 views

Must fact tables be designed for aggregation?

I have been reading up on Microsoft SQL Server's columnstore indexes recently and I have become concerned that my experience does not match that of the industry. Columnstore indexes are designed to ...
J. Mini's user avatar
  • 799
0 votes
0 answers
20 views

Figuring out dimensional modeling requirements for OLAP in a database containing many pre-aggregated fact tables

I have the situation where we have an application that uses a PostgreSQL database for OLAP queries for reporting (less DWH/exploratory). I am trying to find an optimal database design for this ...
user991710's user avatar
0 votes
1 answer
38 views

Transactional replication Snapshot agent not starting

I am trying to consider all the options before setting up transactional replication on my 2008R2 server to 2012 distribution server to 2012 Subscriber server. All the three servers are on FULL ...
user25935915's user avatar
0 votes
2 answers
52 views

Can you have a minimal transactions data warehouse?

Good Morning, I have a client doing scientific research. The current database (Azure postgres) design is 3NF but very few transactions are happening. There are several "pipelines" that add ...
GPGVM's user avatar
  • 103
0 votes
1 answer
34 views

ideas to reduce association tables while building a data warehouse

I'm trying to create a data warehouse for my BI project. Initially I have a raw database that contains 100 data tables, most of them are association tables. My goal is to create normalized dimensional ...
Jalel MED's user avatar
0 votes
2 answers
22 views

One attributes to several statues with the date of each statue in my fact table data warehouse

I hope you are well. I have this problem and I am a beginner in data warehousing. I hope to find a solution to it. I have an OLTP database from the GLPI system. We are only interested in ticket ...
anfel andel's user avatar
1 vote
1 answer
76 views

What does psql do on connect (that could be taking 1 to 10 minutes)?

In some cases psql is taking ~1 to 10 minutes to connect to PostgreSQL, with no queries that it seems to be running while doing so (according to pg_activity that I think queries pg_stat_activity) I'm ...
Michal Charemza's user avatar
0 votes
0 answers
77 views

Best practice for applying complex business logic during transformation in ETL from Aurora db to Redshift data warehouse

I'll be pushing data from an Amazon Aurora db to a Redshift data warehouse. The painful part is that the transformation portion of the ETL workflow relies heavily on business logic, that lives within ...
mstrom's user avatar
  • 143
2 votes
2 answers
55 views

SQL Server DWH architecture question

We have SQL Server DWH server. It has dwh_raw, dwh_prod and so on databases with client data, business operations. Now there is a task to allocate new data - on employees, their salaries and so on. Is ...
George K's user avatar
  • 2,306
-1 votes
1 answer
39 views

Issue with modeling Facts table - don't know what do with the fact that one of the dimensions may happen many times

I am doing an model for warehouse whose topic are car crashes. I've a problem I don't know how to solve, that is - my fact is a Crash. I don't know how to think in terms of drivers. Issue is that ...
LaRevacholiere's user avatar
0 votes
2 answers
122 views

Growing SQL Server Data Warehouse - How to reorganize efficiently?

we've been using SQL Server for quite a while now in our company to host all application data. We use a typical set-up with data being loaded from various data sources into a Staging Area, which ...
Benny's user avatar
  • 3
0 votes
1 answer
156 views

How to compress data in SQL Server? The data warehouse raw data is times bigger than the MySQL OLTP database for the same tables

For reporting purposes, we have a DWH (data warehouse) doing ETLs (extract-transform-load) to retrieve data from selected tables in a production OLTP (on-line transaction processing) database. The ETL ...
James's user avatar
  • 149
0 votes
1 answer
817 views

Detecting deleted rows in mysql

I have a Spring Boot webapp with a MySQL database. I'm trying to replicate this data into a MSSQL for running reports against. All rows are audited with created_at and modified_at datetimes so i can ...
jared's user avatar
  • 123
1 vote
2 answers
127 views

Best way to store millions of measurements with hundreds of columns

I work with infrared spectrometers (several hundred of them) to analyze chemical compounds, and I am looking for an efficient solution to store the measured data so that I can process it later on. ...
Xav59130's user avatar
1 vote
1 answer
808 views

Optimal join for joining facts with scd-type-2 dimension for aggregation/reporting

I have a fact table and an scd-type-2 dimension table. I want to produce sales report by region and year. I have working solution with a query that joins them for reporting purposes. When I run the ...
Kashyap's user avatar
  • 149
0 votes
1 answer
89 views

Postgres JSONs transformation

I would very appreciate your help here, an important question about architecture modification. We use the ETL process to fetch data from external services (for example, Github). Extract the data (e....
Ewen Field's user avatar
2 votes
1 answer
4k views

Is it acceptable to store global date times in UTC without offset for timezone?

I'm looking at a Data Warehouse which collections data from systems across the world. The Data Warehouse uses UTC for time but only stores the UTC offset and not the timezone. I have noticed that ...
user3165854's user avatar
0 votes
0 answers
236 views

Optimising a recursive SQL query that processes several million records in BQ

I need help optimizing a recursive SQL query in BQ. I have hierarchical data stored in a table as parent-child relationships, i.e will be stored as parent_item_id child_item_id 1 1 1 2 1 3 3 4 ...
Roman Dryndik's user avatar
2 votes
1 answer
199 views

Securing Functional/Service ID for MSSQL with Windows Authentication

I'm currently assessing a setup of Data warehouse implementation in our environment, based on the requirements: A functional/service ID to be setup to access multiple DB servers of other applications ...
nlks's user avatar
  • 21
1 vote
0 answers
438 views

"Snapshot" of Historical Data for Analysis Purposes

I pull monthly aggregate data from a SQL database. I store the aggregated data in a separate table in the same database (so I can quickly grab numbers from six months ago, for instance). My issue is ...
baseball_is_life's user avatar
0 votes
0 answers
750 views

ETL staging area incremental load design considerations

We are currently working on a design for our new data warehouse. In our current data warehouse we have a 'persistent staging area' or PSA. We load new records incrementally into a staging table and ...
Niels Broertjes's user avatar
1 vote
1 answer
45 views

Detect table updates on OracleDB

Say we have a table in oracledb called fruits: id fruit 1 banana 2 mango 3 melon Imagine this table is being overwitten completely, say every Friday for example, the column id is the primary key. ...
moth's user avatar
  • 113
0 votes
0 answers
32 views

How does one design a database to be able to view state of data at any given place in time?

Are there accepted best practices/database features to solve the general problem of being able to view/query the state of a system at any given point in time? To describe the problem, using the ...
user975917's user avatar
0 votes
0 answers
82 views

Generate unique ids in BigQuery data import

I am using Zapier to import woocommerce orders into BigQuery and build a datawarehouse with paid orders from multiple sites. However, multiple sites generate an ID per site, not a unique id globally ...
tokyographer's user avatar
1 vote
0 answers
180 views

Data warehouse: solving many-to-many relationship with duplicates

I'm learning dimensional modeling and have a question. I understand those basic examples such as a FactSale has DimItem, DimLocation. But when it comes to complicated cases with many-to-many ...
Arxz's user avatar
  • 11
0 votes
1 answer
75 views

SQL All Procs have WITH RECOMPLILE so how to maintain a bloated plan cache

I'm working a star schema data warehouse created in Azure SQL database where the last developer included WITH RECOMPILE on all the Procs. This I believe was because the ETL only executes these Procs ...
Geezer's user avatar
  • 513
2 votes
2 answers
495 views

Optimal Indexing Strategy for Datawarehouse and Data Lake Updates

We have a sql server database we use as a data lake and a datawarehouse. Each table in the database has some standardized definition as we are at 600 or so tables now, so maintenance needs to be ...
Antyan's user avatar
  • 158
0 votes
1 answer
681 views

Boolean flag in fact table

Imagine we have received the results of a health survey on daily consumption habits of 3 different items, like the following: Id Date Age Country CigarettesPerDay CoffeesPerDay BeersPerDay 1 2021-12-...
Mads's user avatar
  • 31
2 votes
1 answer
783 views

Data lake or data warehouse first?

I have been confused whether to create a data lake or a data warehouse and hope some experienced real-world professional can give me some enlightenment. I will like to store, visualise and perform ...
SunnyBoiz's user avatar
  • 153
0 votes
1 answer
616 views

Relations between fact tables in a data warehouse

I'm designing a data warehouse and I keep reading that there should be no relations between fact tables. I'm sure everyone here is well aware that businesses are full of facts which are inherently ...
stovroz's user avatar
  • 631
0 votes
1 answer
259 views

Fact table linking to natual key rather than row key in a slowly changing dimension table?

I’m designing a data warehouse and have a Slowly Changing Dimension Type 2 table something like this: UserKey UserID Group EffectiveDate ExpiryDate IsCurrent 1 1001 GR1 2021-01-01 9999-12-31 Y 2 ...
stovroz's user avatar
  • 631
3 votes
3 answers
7k views

OBT (One Big Table) vs Star Schema for Data Warehousing

I am trying to prepare myself for some interviews at a FAANG, while coming from a much smaller company. Part of the interview process involves data modeling and ETL design. Kimball's "The Data ...
ohbrobig's user avatar
  • 131
1 vote
1 answer
1k views

Table design to store Refund Transaction information

I have a transaction table that contains purchases and refunds. I'd like to create a linkage between purchase and refunds in the same table (I decided not to use a separate table). I'm not sure which ...
Gabe's user avatar
  • 1,336
0 votes
1 answer
1k views

How to handle incoming time-series data in a data warehouse?

I am new to data warehousing (having learnt this in my school days) and is looking to do a data warehouse as a side project. Below is a simple data warehouse design I came up with: The data warehouse ...
user3118602's user avatar
1 vote
2 answers
164 views

From what volumes of data do data ingestion tools like apache nifi, flume, storm or tools like logstash become relevant?

I'm working in a company that has two legacy data warehouses, which have evolved to unmaintainable monoliths throughout the time. Therefore, they are in dire need of re-form. I'm investigating a ...
Psychotechnopath's user avatar
1 vote
2 answers
453 views

What is the disadvantage of not creating surrogate key in DW?

I want to create a data warehouse from 1 OLTP database. The tables in the OLTP db have got integer primary keys. So they are the business keys. The tables are: Client, customer, products and sales. ...
variable's user avatar
  • 3,489
1 vote
0 answers
190 views

Star Schema design considerations

We receive patient data that includes case level info (hospital id, patient id, length of stay, surgery date) as well as surveys/educational material that the patient completed as well as their ...
CandleWax's user avatar
0 votes
0 answers
149 views

Key attributes in junk dimension table

I have a fact tables with a composite primary key of 10 key columns. 6 out of 10 of the key columns are short, and with low cardinality, and also "mostly" unrelated to each other. I am ...
Avi's user avatar
  • 265
0 votes
2 answers
67 views

how do modern data warehouses tackle frequent small writes? esp. when streaming data is one of the sources? [closed]

So for many days, I had a question in mind. How do modern data warehouses tackle frequent small writes? esp. when streaming data is one of the sources? e.g. Kafka/Kinesis => DW(Snowflake, Teradata, ...
Libertarian's user avatar
0 votes
1 answer
330 views

Setting up database for date dimension table [closed]

I was working on creating database for warehouse purpose. I have many dimension table and one fact table. I have created separate create_date dimension table but It is not working as I want it to. ...
Kamal Panhwar's user avatar
1 vote
1 answer
494 views

How to handle NULL Dates in an "Accumulating fact table"

One type of fact table in Dimensional modeling is Accumulating Snapshot fact Table. if you think you need to review the meaning and definition of this type , you can take a look at this article : ...
Pantea's user avatar
  • 1,498
1 vote
1 answer
2k views

Connecting dimension tables in a multi-fact star schema database warehouse?

I am new to data warehouse modelling so please bear with me. In database warehouse modelling, the star schema is typically a fact table with multiple dimensions connected directly to it. However, what ...
user3118602's user avatar
1 vote
0 answers
34 views

History data in CUBE

Not sure if this is the right place to ask. I am working on an ETL procedure that gathers the data about medication consumption and I have something like this: Dim_Patient, the dimension for all the ...
Jovan Perovic's user avatar
0 votes
2 answers
479 views

Storing Live Timeseries Data in SQL Server Azure Database

I have an SQL Server Azure Database and am trying to store time-series data that I get from multiple sensors. Data Source:- 5-minute data is obtained via an API. Current table structure: - Timestamp |...
Spaceguy152's user avatar
1 vote
1 answer
953 views

Is "ETL vs ELT" just "tables vs views"?

TL;DR; Is the difference between ETL and ELT just whether or not the transformed data is a concretion vs abstraction before being loaded into the warehouse? So forgive me the broadness of the subject ...
e_i_pi's user avatar
  • 217
1 vote
1 answer
942 views

MariaDB Column Store Suitability for Data Warehouse

We want to build a data warehouse to distribute valid information across the organization (reporting) and get some insights into our data (analytics). After some research we found MariaDB using Column ...
CFM's user avatar
  • 13
1 vote
2 answers
853 views

text unique to the grain should be stored in fact table or dimension table?

I am modeling a star schema for user reviews using the yelp data set. Each user review has a business dimension key, user dimension key and a bunch of data associated with the review. All of the ...
bfmcneill's user avatar
0 votes
1 answer
62 views

Data modeling for Warehouse

I am learning about database warehouses and data modeling. I came across this blog post. In the section, "DIFFERENT MODELS HAVE DIFFERENT PURPOSES" the author mentions that depending upon ...
adam's user avatar
  • 1
0 votes
1 answer
2k views

SSAS error deploying project: Ensure that 'SQL Browser' service is running

After successfully building my project, I got this error when deploying it: The project could not be deployed to the '\SQLEXPRESS' server because of the following connectivity problems : A connection ...
dbybanez's user avatar
0 votes
0 answers
198 views

Schema and/or reporting SQL for customer acquisition cost?

This a design question around creating fact/dimension tables to query customer acquisition cost and how to properly model the tables and query them. Business Rule We spend some money per day per ...
Dave Copeland's user avatar

1
2 3 4 5
9