0 votes
0 answers
60 views

ImportError: cannot import name 'pkg_resources' from 'ydata_profiling'

I am new to Streamlit and I am trying to use the ydata_profiling library for AutoML, but I keep getting the error ImportError: cannot import name 'pkg_resources' from 'ydata_profiling'. I tried ...
Harshad Nawghare's user avatar
0 votes
0 answers
46 views

Python checkpoint module for estimation of remaining script time

I run background jobs for different users in our service. Subtasks and runtime differ based on the size and setting of the user account. Subtasks within the job are something like: downloading ...
ddofborg's user avatar
  • 2,173
0 votes
0 answers
158 views

Why does to_notebook_iframe (ydata-profiling) not render the report in a SageMaker notebook?

I am having trouble showing the ydata-profiling report in the notebook using SageMaker Studio. Everything looks fine when creating the report, but the rendered report does not show up at the end and the ...
Yuri Santos's user avatar
0 votes
0 answers
44 views

How to identify all possible differences in duplicate data from two different datasets and calculate frequency?

I have two datasets each containing Name, First Name, Street, House Number, Postal Code and City. I have noticed these datasets contain multiple cases of duplicates. For instance, in one dataset the ...
Lucia's user avatar
  • 1
0 votes
1 answer
532 views

AttributeError when attempting to generate a report with ydata-profiling in Python

I am attempting to generate a data profiling report using the ydata-profiling library in Python. Upon executing the following code: import ydata_profiling profile = ydata_profiling.ProfileReport(...
Santiago Aguilar's user avatar
0 votes
2 answers
3k views

Saving and Reloading a ydata-profiling / pandas-profiling ProfileReport object for later use

I am using the ydata-profiling library to generate profile reports of my pandas DataFrame. I would like to save the entire ProfileReport object, so I can load it later without having to regenerate the ...
Ananth Babu's user avatar
4 votes
2 answers
4k views

Data Profiling using Pyspark

I'm trying to create a PySpark function that takes a DataFrame as input and returns a data-profile report. I already used the describe and summary functions, which give results like min, max, count etc. ...
Chirag Kaushik's user avatar
0 votes
1 answer
162 views

How to customize alerts + other metrics in pandas_profiling / y_data_profiling alerts

pandas_profiling, or as it is now called, y_data_profiling provides a detailed breakdown of data quality. How can we customize alerts + other metrics included in their default report? I see options to ...
wantering_otter's user avatar
1 vote
1 answer
886 views

Is it possible in Snowflake to write a query that lists the columns that have all null values?

In Snowsight within Snowflake, you can profile tables and see the % of null values in the UI, but is there an easy way to query for this data or export it from the UI? I just need to create a new ...
0004's user avatar
  • 1,250
1 vote
1 answer
2k views

Databricks : Export data profiling report

Databricks can create a data profiling report after using display(dataframe_name). I have created a data profiling report using Azure Databricks but I do not know how to export it. Can you ...
venus's user avatar
  • 1,258
2 votes
1 answer
382 views

Using PyDeequ on Jupyter Notebook and having this "An error occurred while calling o70.run."

I'm trying to use PyDeequ on Jupyter Notebook. When I try to use ConstraintSuggestionRunner, it shows this error: Py4JJavaError: An error occurred while calling o70.run. : java.lang.NoSuchMethodError: '...
LuisRicardo's user avatar
0 votes
2 answers
327 views

Detecting similar columns across multiple files based on statistical profile

I'm attempting to clean up a set of old files that contain sensor data measurements. Many of the files don't have headers, and the format (column ordering, etc.) is inconsistent. I'm thinking the ...
Ryan Gross's user avatar
  • 6,515
0 votes
0 answers
162 views

How can I connect a local Delta Lake with Talend for data profiling purposes?

As I am new to Talend, I am trying to connect my local Delta Lake with Talend to do some data profiling on it.
khÜs h's user avatar
  • 61
1 vote
1 answer
55 views

Not able to perform operations on resulting dataframe after "join" operation in PySpark

df=spark.read.csv('data.csv',header=True,inferSchema=True) rule_df=spark.read.csv('job_rules.csv',header=True) query_df=spark.read.csv('rules.csv',header=True) join_df=rule_df.join(query_df,rule_df....
Aishani Singh's user avatar
0 votes
0 answers
631 views

How to create multiple pandas profiling reports for multiple csv files in a directory? The report name should match the file name

I tried this, import glob import os import pandas as pd import pandas_profiling from pandas_profiling import ProfileReport files = glob.glob("D:\home_health_services_current_data\*.csv") df ...
Mike's user avatar
  • 1
1 vote
0 answers
124 views

SSIS Data Profiling Task - Not showing all in Data Profile Outputs

I chose the following requests for the Data Profiling Task in SSDT 2017, but it's only showing NullRationReq in the output and NOT the other requests. I tried a few times, and when I checked the profiler output XML - in ...
BPen's user avatar
  • 43
0 votes
1 answer
696 views

Data profiling of columns for big table (SQL Server)

I have a table with over 40 million records. I need to perform data profiling, including Nulls count, Distinct Values, Zeros and Blanks, %Numeric, %Date, Needs to be Trimmed, etc. The examples that I was ...
Yana's user avatar
  • 975
0 votes
2 answers
212 views

Validation for columns works very slowly (SQL Server)

I want to perform data profiling on the columns of a table. In this particular case - what percentage of data is date/integer/numeric/bit. The query that I am using: SELECT CAST(SUM(CASE WHEN ...
Yana's user avatar
  • 975
0 votes
1 answer
441 views

When I execute the pandas-profiling package it won't return min, max and mean values

When I profile the following data using pandas-profiling==2.8.0 it won't return min, max and mean values. CSV data a,b,c 12,2.5,0 12,4.7,5 33,5,4 44,44.21,67 Python code import json import ...
ArunKumar's user avatar
  • 229
0 votes
1 answer
1k views

Db2 tables - finding all blank columns in a table that has 100+ columns

I have a table with 78 columns and 100k rows. Is there a way to find all the blank columns in the table without querying on each column to find their counts? Running a not null query is time consuming ...
Vinney_143's user avatar
1 vote
0 answers
409 views

pandas-profiling "Duplicate rows" section is not showing up in the HTML Report

I am using pandas-profiling=2.8.0 and I have generated an HTML report in which 2 duplicates are shown in the Overview Section, as seen below. But the "Duplicate rows" option/section is ...
PraveenS's user avatar
  • 135
0 votes
3 answers
3k views

Data profiling on a BigQuery table covering min, max, unique, null count statistics

I am looking for a solution to perform data profiling on a BigQuery table covering the statistics below for each column in the table. Some of the columns are ARRAY and STRUCT as given below. I tried multiple ...
Mallik Tiru's user avatar
1 vote
1 answer
774 views

Why do I get an IndexError while trying to get a data profiling report?

I recently started using Python. I am trying to get a report using pandas_profiling, but I am running into an IndexError. Can someone please explain how I can debug this? Data has like 30 variables ...
Greeshma A Shivaramu's user avatar
2 votes
2 answers
951 views

How to detect and convert units of column values without using a Python loop?

As far as I know, Python loops are slow, so pandas built-in functions are preferred. In my problem, one column has different currencies, and I need to convert them to dollars. How can I ...
Kiran's user avatar
  • 2,397
-2 votes
1 answer
65 views

DB2: Need to get the list of columns and distinct value counts for a given DB2 table

For data profiling purposes, I just need to get an idea of whether a column in a given table has values populated or not. For that, I need to get the list of columns and distinct value counts for a given ...
Edayadulla M's user avatar
2 votes
2 answers
839 views

How to loop through all tables and fields in each table to get percentage of missing values

Using SSIS, I am trying to obtain a table with the percentage of missing values of every field in every table of a SQL Server database. Ideally I would like to create a new table in another ...
fmarm's user avatar
  • 4,284
1 vote
0 answers
412 views

Error when running Data Profiling Task with Azure SQL Server data

When running a Data Profiling Task in SSIS with data from an Azure SQL Server, I receive the following error message: System.Data.SqlClient.SqlException (0x80131904): USE statement is not supported ...
Michelle Turner's user avatar
1 vote
1 answer
189 views

Profiling the empty string in SSIS Data Profiling

I've just started using the Data Profiling Task in SSIS to profile some data on our databases. I've found the option for profiling the column null ratios ("Column Null Ratio Profiles") but I'm ...
t_warsop's user avatar
  • 1,260
0 votes
2 answers
447 views

Find Multi-Column Primary key

I have about 30 tables from an old ERP which have multi-column primary keys. Unfortunately I don't know what those keys are. I've used the SSIS profiling task to determine primary key candidates for ...
Jeremiah's user avatar
10 votes
2 answers
1k views

Data profiling Task - custom Profile Request

Is there any option to create a custom Profile Request for SSIS Data Profiling Task? At the moment there are 5 standard profile requests under SSIS Data Profiling task: Column Null Ratio Profile ...
Barsham's user avatar
  • 769
0 votes
1 answer
38 views

XSLT: Copy two files into one common structure

I am trying to merge the results of the SSIS Data Profiler Task for several tables into one XML file, so the results can be inspected in a single file inside "Data Profiler Viewer". The whole problem shrinks to the ...
Christian4145's user avatar
1 vote
3 answers
2k views

Data profiling in Power BI

I want to profile every single data table I have in my Power BI report. By data profile I mean something like this: Are there ways to make a data profile view in Power BI? DAX measure or calculated ...
Reza Azimi's user avatar
0 votes
1 answer
2k views

Generate PostgreSQL stats / data profiling [closed]

I would like to automate data profiling on PostgreSQL with a free tool that inspects data content through a column profile or percentage distribution of values, like max, min, avg.
rachid's user avatar
  • 2,476
3 votes
3 answers
978 views

Measuring peak disk use of a process

I am trying to benchmark a tool I'm developing in terms of time, memory, and disk use. I know /usr/bin/time gives me basically what I want for the first two, but for disk use I came to the conclusion ...
roro's user avatar
  • 177
0 votes
1 answer
122 views

Extract pattern from dataset

I have a table with several columns filled with data from different parameters. As some of the rows might share the same column values, I'd like to extract the most frequently repeated values for each column so ...
Bruno Fernandes's user avatar
0 votes
2 answers
2k views

Data Profiling on a File through SSIS

I'm new to SSIS development. I need some guidance from experts on SSIS. The following is the list of questions: We have txt or dat files with sizes from 1 GB to 25 GB with tab ...
user145610's user avatar
  • 3,025
13 votes
5 answers
4k views

Cannot start Concurrency Visualizer in Visual Studio 2012. Got error "Unable to start the ETW collection"

When I tried to profile a WPF application with Concurrency Visualizer (tried both launch and attach to process), I got the following error popup - "Unable to start the ETW collection" ETW clearly ...
user2415364's user avatar
-1 votes
5 answers
684 views

Tool for table_schema and table_name relationship

Do you know any tools for profiling, to see the structure and relationships of each table inside the DB? It looks like this one: see the screenshot below. For a bigger resolution, please click here. ...
Database Admin's user avatar
4 votes
1 answer
3k views

MySQL capacity planning

In my production environment, I have a single instance of MySQL server running on 16 GB of memory that handles up to 20,000 queries an hour. The size of one of my tables is growing at the rate of 2 ...
Dennis Y.'s user avatar
  • 135
0 votes
1 answer
542 views

Suggestion on Customer Profiling System: Books, Articles, etc

I'm going to work on a Customer Profiling project (similar but not identical to Google Analytics) for our own E-Commerce website using C#. I'm pretty new to this kind of project, and the Customer Profiling ...
Mouhong Lin's user avatar
  • 4,509