1,330 questions
0
votes
1
answer
59
views
Not getting decimals when extracting values [duplicate]
So I am practicing data wrangling and I have encountered an issue.
food['GPA'].unique()
And the output is
array(['2.4', '3.654', '3.3', '3.2', '3.5', '2.25', '3.8', '3.904', '3.4',
'3.6', '3.1'...
0
votes
1
answer
22
views
pivot_longer() with parallel (unlinked) sets of columns [duplicate]
I'm trying to use pivot_longer() to rearrange a dataset I was given, which looks like the result of a database join operation. Here's an example of what it looks like:
dat <- tibble('Plant_Name'=c('...
1
vote
2
answers
42
views
(ERROR) Select one object and all float & int in pandas groupby
I have this dataframe.
import pandas as pd
x = {
"year": ["2012", "2012", "2013", "2014", "2012", "2014", "2013", ...
1
vote
2
answers
49
views
Regex to extract a part of URL using stringr r package
I have the following URLS:
www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box
www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box
www.google.com?...
0
votes
4
answers
90
views
Fill in column based on condition with another column in R [closed]
I have the following input table:
input <- structure(
list(individual = c(1, 2, 3, 4),
age = c(20, 34, 29, 30),
earnings_2020 = c(0, 0, 1, 0),...
4
votes
3
answers
93
views
Having trouble with which.min inside dplyr pipe
I have some trouble with the which.min function inside a dplyr pipe.
I have a cumbersome solution (*) and I'm looking for a more compact and elegant way to do this.
reproducible example
library(dplyr)
...
2
votes
2
answers
66
views
Is there an R function that detects a specific string and replaces it by the value of another observation based on a number within the string?
So, I am using constituency data of the German Election 1994 and some observations contain strings that indicate that the value is given in a different row (based on the scheme "siehe Wkr xxx...
0
votes
1
answer
31
views
Advanced pivot_longer transformation sequentially on a group of columns
I'm a little perplexed about the exact way to proceed with this wrangling procedure.
I have a dataset which consists of raters assessing lung sounds (S1,...,S40). For each sound the assessed ...
3
votes
1
answer
80
views
Behavior of %>% when piping values to functions containing pipes
The below examples demonstrate that passing an object to deparse() and substitute() produces different output depending on whether the object is passed to the function with %>% and whether the ...
0
votes
2
answers
57
views
Reformatting pdf text into dataframe to remove extra information [closed]
I am trying to load the text from a pdf into R for text analysis. The pdf is formatted so that the text has columns for extra information. Please see the screenshot below.
I'd like to load the main ...
0
votes
1
answer
94
views
How to Rearrange Values in Each Row to Avoid Duplicates Across Columns in R?
Question
I have a data frame in R where each row contains multiple columns with categorical values. My goal is to rearrange the values within each row so that no value is repeated across columns in ...
1
vote
1
answer
50
views
restrict to those with data at specific age ranges in R
I have the following long-format data frame with columns id, age, and BMI. I have restricted the dataset such that only people (id) with at least 3 repeated measurements between age 2 weeks and 10 ...
-3
votes
1
answer
46
views
More elegant solution for conditional filtering? [closed]
The code below works perfectly fine and outputs the data of interest. However, I am wondering if there is a better solution or a different way to think about the logic.
Essentially, I need to filter for the ...
0
votes
1
answer
36
views
Can't Open .xlsx Document
I tried to download a .xlsx file from my course. But when I opened the .xlsx file, it turned into something like this.
UEsDBBQABgAIAAAAIQBBN4LPbgEAAAQFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC
...
0
votes
1
answer
55
views
How to Add New Column in Dictionary?
Based on the data below, I want to calculate the BMI Index for each row and the average for the total row. The BMI Index formula is 'berat' / 'tinggi'.
data = [{'nama': '...
0
votes
1
answer
42
views
How to lengthen data in one column separated by semicolons, and repeat elements from the other column?
I have received a dataset in a .csv table. The first three lines of the table look like this:
Species,Methods
Chlamydomonas pisiformis; Stichococcus bacillaris; Stichococcus subtilis; Pleurococcus ...
0
votes
0
answers
34
views
for deep learning: save each sample individually or keep blocks? data doesn't fit in memory
I am training a classifier. My data comes from multiple datasets, each dataset contains multiple subjects, each subject has performed multiple trials. Currently my data structure on disk looks like ...
1
vote
1
answer
56
views
Creating a large number of columns in R tidyverse based on a comparison with a specific column
I have a dataset in R tidyverse and I want to create 192 columns based on comparison with the sp column, just like the mp_comp_1 column. How can I do this for 192 columns in tidyverse?
library(...
1
vote
2
answers
73
views
Pattern matching in a dataframe
I am having some trouble conducting pattern matching within a data frame. I am working with the grepl function in R.
I have a data frame of 5 local districts in two years (2001 and 2002). I want to check ...
2
votes
3
answers
94
views
Complete and fill missing rows with groups of uneven length
I have a dataframe of county executives and the year they were inaugurated. I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004.
I would like to expand ...
-1
votes
3
answers
81
views
Remove duplicate rows, keep first row [duplicate]
I am working with a dataframe on county executives. I want to run a panel study where the unit of analysis is the county-year.
The problem is that sometimes two or more county executives serve during ...
-1
votes
1
answer
36
views
Fill in missing rows
I have a data frame of county executives and the year they were inaugurated.
I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004.
I would like to expand ...
1
vote
2
answers
55
views
dataframe breakdown by year
I have a dataset on county executives and their year of inauguration. I need to break down which year each executive was inaugurated.
The problem is that the notation under the "year" variable ...
1
vote
3
answers
133
views
Add values across dataframe columns
I have a dataframe where missingness is indicated by "Z" (there may also be some "z" and NA entries present in the data), and values are entered as characters ("0", "...
1
vote
3
answers
45
views
Drop columns that are replicated in a data frame
I have a large data frame with repeated variables. This is just a sample of my data to illustrate the question:
df <- data.frame(
ID = rep(1:4, each = 1),
CMW = rep(c(10, 20, 30, 30), each = 1),...
-1
votes
1
answer
44
views
I need some help creating a loop/automatic way of cleaning my data [duplicate]
I'm quite new to programming languages and I am starting with R in my research predicting dengue disease cases with climatic data.
I'm still cleaning my data to work with and this particular one has ...
0
votes
1
answer
45
views
Add Column to R Data Frame from Another Data Frame with Matching Index Column, Only When Values are in A Certain Range
I am trying to add a column to a data frame (df1) from another data frame (df2), but only when the "depth range" from df1 lies within the "depth range" from df2. I'll explain below ...
0
votes
1
answer
53
views
SQL data wrangling help using the Having statement
The below code (Databricks SQL) produces the table following it. I am trying to adjust this code so that the output only includes zip5 records that have only 1 (or less) of each facility_type ...
1
vote
1
answer
43
views
Join tables based on a range instead of exact match [duplicate]
I have two datasets as the ones described below:
dfA <- tibble(
name = c("John", "Michael", "Brian", "Thomas", "Peter"),
expected = c(128.34, ...
0
votes
0
answers
14
views
How to transform nested data from long format to wide format without using nested structure?
I have a big dataset and have data in long format ('longdf') with one column for subjectnr., one for illness (e.g., rows are epilepsy, ms, diabetes etc.) and other columns for the variables (...
0
votes
0
answers
49
views
How can I load data in Rstudio but making it accessible in other computers when opening the file?
I'm working on an assignment and we were asked to load the data and make the file run without errors when opening from the teacher's computer. He said: "When writing your code, keep the data ...
0
votes
2
answers
73
views
R: Alternatives/approaches to read_html() + html_text() that also work on strings without HTML/XML tags
In this solution to removing HTML tags from a string, the string is passed to rvest::read_html() to create an html_document object and then the object is passed to rvest::html_text() to return "...
0
votes
1
answer
30
views
Why does the order of functions within summarise() affect its output?
When I use two functions within dplyr::summarise(), the ordering of the functions affects the output. While this post shows this can happen when the first function affects the columns the second ...
0
votes
1
answer
74
views
R flag a change in column value
I have the following dataset with 20 million rows. It's data on companies and users by month.
I have created first_app_company, which flags the first appearance of a company in the dataset. The code is as ...
0
votes
0
answers
16
views
Creating interaction terms after splitting dataset into train and test sets
I have the following preprocessing stage before applying linear regression. Would there be a way to add interaction terms 'Numeric_Var1Categorical_Var1' and 'Numeric_Var2Categorical_Var1' after ...
1
vote
1
answer
62
views
converting a dictionary of nested lists into a row in a data frame
I have some data (shown below), which is a dictionary of nested lists of dictionaries.
I want to make the whole dictionary into one row. A very wide row.
At present I can get my desired result. It is ...
0
votes
3
answers
77
views
Recoding Multiple Likert-scale Columns at Once
I usually do this the hard way, but I'm sure one of you coding experts has something less tedious.
Using this data set below:
#Example Dataset
Q1 <- c("Agree", "Disagree", ...
1
vote
1
answer
70
views
How to compute which values and how many values of a given variable satisfy a condition for another variable?
I have a dataframe of the form
# Minimum example
> data.frame(variable = c("A", "B", "C", "A", "B", "C"),
+ quantity1 = c(2,4,...
2
votes
1
answer
47
views
How to do groupby on multi index dataframe based on condition
I have a multi index dataframe, and I want to combine rows based on certain conditions and I want to combine rows per index.
import pandas as pd
# data
data = {
'date': ['01/01/17', '02/01/17', '...
0
votes
1
answer
20
views
I'm looking to find a simple way to replace levels of columns by columns with yes/no
I'm looking to find a simple way to do something like the following; from a data frame with the following variables.
ID symptom_1 symptom_2 symptom_3
1 ...
4
votes
1
answer
203
views
Python - Rolling Indexing in Polars library?
I'd like to ask around if anyone knows how to do rolling indexing in polars?
I have personally tried a few solutions which did not work for me (I'll show them below):
What I'd like to do: Indexing the ...
0
votes
1
answer
33
views
IBNR development factor calculation in data.table
Given the following table:
IBNR <- data.table(IncurredYear=c(2020,2020,2020,2020,2021,2021,2021,2022,2022,2023),
DevYear=c(0,1,2,3,0,1,2,0,1,0),
Amount=c(100,80,...
0
votes
0
answers
206
views
Data-wrangler extension doesn't display complete data tables
I'm using Jupyter Notebook.
I need to open the csv file in a pandas dataframe with the extension, but it doesn't work; it produces this error:
"Could not get current stack focus. This likely means that ...
0
votes
1
answer
59
views
Dynamically calculate rolling average conditional upon NA values
I have data that looks something like this.
df <- data.frame(
Week = seq(1:10),
BA.1 = c(.55, .52, .45, .39, .25, .10, 0, NA, NA, NA),
JN.1 = c(0, 0, 0.1, 0.3, 0.56, 0.71, 0.79, NA, NA, NA),
...
2
votes
3
answers
90
views
Lag by n rows over specific columns while extending the length of dataframe
I am trying to shift down (lag) specific columns in a data frame by n rows (e.g. 2 rows). I have only found posts on lagging over specific columns by 1 row. Here is some mock data.
df <- data.frame(...
2
votes
1
answer
46
views
change structure of a csv file in R
I am currently working on wrangling data from a csv file. I have a feature "track" that has several values for the same instance, like a list of data points. I want to create a loop to split ...
0
votes
1
answer
72
views
Creating a RAG datatable in R
Based on the solution here, Add colours to datatable
I am trying to produce a RAG table and want to assign four colours.
library(tidyverse)
library(DT)
rag_df <- data.frame(Vulnerabilities = c(...
0
votes
1
answer
47
views
Python: merge two subsequent rows by pattern
I am working with data files of a single column containing strings, but sometimes the content of one row carries over to the next one, like so:
...
"This is a str-
-ing"
...
This ...
2
votes
1
answer
45
views
Create grouped indicator of observations with the same value in R
I have a bit of a specific question.
I have a dataset that looks like this in R:
name person_id year municipality_id
<chr> <dbl> <dbl> <dbl>
1 Brown ...
2
votes
0
answers
2k
views
Polars: How to prevent: polars.exceptions.InvalidOperationError: `min` operation not supported for dtype `null`
Hi I have the following data frame with only one row, that can contain an empty list (with only one value null).
When I try to get the min over the list, and the one row dataframe does contain only an ...