Newest 'data-wrangling' Questions

0 votes

1 answer

59 views

Not getting decimals when extracting values [duplicate]

So I am practicing data wrangling and I have encountered an issue. food['GPA'].unique() And the output is array(['2.4', '3.654', '3.3', '3.2', '3.5', '2.25', '3.8', '3.904', '3.4', '3.6', '3.1'...

Sjaikisan

3

asked 2 days ago

0 votes

1 answer

22 views

pivot_longer() with parallel (unlinked) sets of columns [duplicate]

I'm trying to use pivot_longer() to rearrange a dataset I was given, which looks like the result of a database join operation. Here's an example of what it looks like: dat <- tibble('Plant_Name'=c('...

S. Robinson

229

asked Nov 22 at 20:28

1 vote

2 answers

42 views

(ERROR) Select one object and all float & int in pandas groupby

I have this dataframe. import pandas as pd x = { "year": ["2012", "2012", "2013", "2014", "2012", "2014", "2013", ...

lokalhangatt

65

asked Nov 16 at 16:02

1 vote

2 answers

49 views

Regex to extract a part of URL using stringr r package

I have the following URLS: www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box www.google.com?...

paolotroia

29

asked Nov 10 at 8:29
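
For the URL question above, here is a minimal sketch. It assumes the part to extract is the value of the utm_source parameter, which the truncated excerpt does not confirm. It also swaps the stringr regex the asker wants for Python's standard-library query-string parser, so it illustrates the idea rather than answering in R. The helper name utm_param is hypothetical.

```python
from urllib.parse import parse_qs


def utm_param(url, name="utm_source"):
    """Return the first value of query parameter `name`, or None if absent.

    Assumption: everything after the first '?' is the query string,
    as in the example URLs from the question.
    """
    _, _, query = url.partition("?")
    values = parse_qs(query).get(name)
    return values[0] if values else None


# The two complete URLs shown in the question excerpt.
urls = [
    "www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box",
    "www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box",
]
sources = [utm_param(u) for u in urls]
print(sources)
```

Parsing the query string rather than matching with a regex avoids depending on the order of the parameters.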

0 votes

4 answers

90 views

Fill in column based on condition with another column in R [closed]

I have the following input table: input <- structure( list(individual = c(1, 2, 3, 4), age = c(20, 34, 29, 30), earnings_2020 = c(0, 0, 1, 0),...

Chloe

1

asked Nov 8 at 8:20

4 votes

3 answers

93 views

Having trouble with which.min inside dplyr pipe

I'm having trouble with the which.min function inside a dplyr pipe. I have a cumbersome solution (*) and I'm looking for a more compact and elegant way to do this. Reproducible example: library(dplyr) ...

Wael

1,778

asked Nov 7 at 12:52

2 votes

2 answers

66 views

Is there a R function that detects a specific string and replaces it by the value of another observation based on a number within the string?

So, I am using constituency data of the German Election 1994 and some observations contain strings that indicate that the value is given in a different row (based on the scheme "siehe Wkr xxx...

Paul-Markus Rudolf

23

asked Oct 29 at 11:36

0 votes

1 answer

31 views

Advanced pivot_longer transformation sequentially on a group of columns

I'm a little perplexed about the exact way to proceed with this wrangling procedure. I have a dataset which consists of raters who are assessing lung sounds (S1,...,S40). For each sound the assessed ...

Buczinski

45

asked Oct 9 at 19:03

3 votes

1 answer

80 views

Behavior of %>% when piping values to functions containing pipes

The below examples demonstrate that passing an object to deparse() and substitute() produces different output depending on whether the object is passed to the function with %>% and whether the ...

socialscientist

4,219

asked Sep 26 at 22:02

0 votes

2 answers

57 views

Reformatting pdf text into dataframe to remove extra information [closed]

I am trying to load the text from a pdf into R for text analysis. The pdf is formatted so that the text has columns for extra information. Please see the screen shot below. I'd like to load the main ...

Ashley Wu

1

asked Sep 24 at 15:14

0 votes

1 answer

94 views

How to Rearrange Values in Each Row to Avoid Duplicates Across Columns in R?

Question I have a data frame in R where each row contains multiple columns with categorical values. My goal is to rearrange the values within each row so that no value is repeated across columns in ...

Ruam Pimentel

1,329

asked Sep 9 at 22:34

1 vote

1 answer

50 views

restrict to those with data at specific age ranges in R

I have the following long format data frame with columns, id, age, and BMI. I have restricted the dataset such that only people (id) with at least 3 repeated measurements between age 2 weeks and 10 ...

aelhak

415

asked Aug 14 at 13:45

-3 votes

1 answer

46 views

Can't Open .xlsx Document

I tried to download a .xlsx file from my course. But when I opened the .xlsx file, it turned into something like this. UEsDBBQABgAIAAAAIQBBN4LPbgEAAAQFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC ...

lokalhangatt

65

asked Aug 6 at 23:44

0 votes

1 answer

55 views

How to Add New Column in Dictionary?

Based on the data below, I want to calculate the BMI Index for each row and the average for the total row. The BMI Index formula is 'berat' / 'tinggi'. data = [{'nama': '...

lokalhangatt

65

asked Aug 5 at 22:44
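
For the dictionary question above, here is a minimal sketch of adding a computed key to each record and averaging it. The records are hypothetical because the original data is truncated in the excerpt, and the helper names add_bmi and average_bmi are invented for illustration. The formula follows the question as written ('berat' / 'tinggi'); the conventional BMI formula divides weight by height squared.

```python
# Hypothetical records standing in for the truncated data in the question.
data = [
    {"nama": "A", "berat": 60, "tinggi": 1.5},
    {"nama": "B", "berat": 90, "tinggi": 2.0},
]


def add_bmi(records):
    """Return new dicts with a 'bmi' key, computed as berat / tinggi
    (the formula stated in the question)."""
    return [{**row, "bmi": row["berat"] / row["tinggi"]} for row in records]


def average_bmi(records):
    """Mean of the 'bmi' values across all records."""
    return sum(row["bmi"] for row in records) / len(records)


with_bmi = add_bmi(data)
mean_bmi = average_bmi(with_bmi)
print(with_bmi, mean_bmi)
```

Building new dictionaries with {**row, ...} leaves the input list unchanged, which makes the step safe to rerun.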

0 votes

1 answer

42 views

How to lengthen data in one column separated by semicolons, and repeat elements from the other column?

I have received a dataset in a .csv table. The first three lines of the table look like this: Species,Methods Chlamydomonas pisiformis; Stichococcus bacillaris; Stichococcus subtilis; Pleurococcus ...

Ginko-Mitten

390

asked Aug 5 at 22:39

0 votes

0 answers

34 views

for deep learning: save each sample individually or keep blocks? data doesnt fit memory

I am training a classifier. My data comes from multiple datasets, each dataset contains multiple subjects, each subject has performed multiple trials. Currently my data structure on disk looks like ...

Samuel

57

asked Aug 2 at 18:52

1 vote

1 answer

56 views

Creating a large number of columns in R tidyverse based on a comparison with a specific column

I have a dataset in R tidyverse and I want to create 192 columns based on comparison with the sp column, just like the mp_comp_1 column. How can I do this for 192 columns in tidyverse? library(...

Hamideh

697

asked Jul 25 at 1:33

1 vote

2 answers

73 views

Pattern matching in a dataframe

I am having some trouble conducting pattern matching within a data frame. I am working with grepl function in R. I have a data frame of 5 local districts in two years (2001 and 2002). I want to check ...

YouLocalRUser

401

asked Jul 24 at 23:32

2 votes

3 answers

94 views

Complete and fill missing rows with groups of uneven length

I have a dataframe of county executives and the year they were inaugurated. I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004. I would like to expand ...

YouLocalRUser

401

asked Jul 16 at 22:36

-1 votes

3 answers

81 views

Remove duplicate rows, keep first row [duplicate]

I am working with a dataframe on county executives. I want to run a panel study where the unit of analysis is the county-year. The problem is that sometimes two or more county executives serve during ...

YouLocalRUser

401

asked Jul 16 at 17:11

-1 votes

1 answer

36 views

Fill in missing rows

I have a data frame of county executives and the year they were inaugurated. I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004. I would like to expand ...

YouLocalRUser

401

asked Jul 16 at 17:00

1 vote

2 answers

55 views

dataframe breakdown by year

I have a dataset on county executives and their year of inauguration. I need to break down which year each executive was inaugurated. The problem is that the notation under the "year" variable ...

YouLocalRUser

401

asked Jul 15 at 21:55

1 vote

3 answers

133 views

Add values across dataframe columns

I have a dataframe where missingness is indicated by "Z" (there may also be some "z" and NA entries present in the data), and values are entered as characters ("0", "...

jbmchls

13

asked Jul 9 at 22:06

1 vote

3 answers

45 views

Drop columns that are replicated in a data frame

I have a large data frame with repeated variables. This is just a sample of my data to illustrate the question: df <- data.frame( ID = rep(1:4, each = 1), CMW = rep(c(10, 20, 30, 30), each = 1),...

Raquel Feltrin

147

asked Jun 27 at 19:21

-1 votes

1 answer

44 views

I need some help creating a loop/automatic way of cleaning my data [duplicate]

I'm quite new to programming languages and I am starting with R in my research predicting dengue disease cases with climatic data. I'm still cleaning my data to work with and this particular one has ...

André Ferrari

3

asked Jun 24 at 23:04

0 votes

1 answer

45 views

Add Column to R Data Frame from Another Data Frame with Matching Index Column, Only When Values are in A Certain Range

I am trying to add a column to a data frame (df1) from another data frame (df2), but only when the "depth range" from df1 lies within the "depth range" from df2. I'll explain below ...

Chris Wheeler

1

asked Jun 17 at 21:06

0 votes

1 answer

53 views

SQL data wrangling help using the Having statement

The below code (Databricks SQL) produces the table following it. I am trying to adjust this code so that the output only includes zip5 records that have only 1 (or less) of each facility_type ...

Dr.Data

181

asked Jun 6 at 22:13

1 vote

1 answer

43 views

Join tables based on a range instead of exact match [duplicate]

I have two datasets as the ones described below: dfA <- tibble( name = c("John", "Michael", "Brian", "Thomas", "Peter"), expected = c(128.34, ...

jpm92

152

asked May 29 at 7:40

0 votes

0 answers

14 views

How to transform nested data from long format to wide format without using nested structure?

I have a big dataset and have data in long format ('longdf') with one column for subjectnr., one for illness (e.g., rows are epilepsy, ms, diabetes etc.) and other columns for the variables (...

Lea

1

asked May 23 at 16:33

0 votes

0 answers

49 views

How can I load data in Rstudio but making it accessible in other computers when opening the file?

I'm working on an assignment and we were asked to load the data and make the file run without errors when opening from the teacher's computer. He said: "When writing your code, keep the data ...

Ashraf Taha

1

asked May 22 at 3:38

0 votes

2 answers

73 views

R: Alternatives/approaches to read_html() + html_text() that also work on strings without HTML/XML tags

In this solution to removing HTML tags from a string, the string is passed to rvest::read_html() to create an html_document object and then the object is passed to rvest::html_text() to return "...

socialscientist

4,219

asked May 16 at 20:47

0 votes

1 answer

30 views

Why does the order of functions within summarise() affect its output?

When I use two functions within dplyr::summarise(), the ordering of the functions affects the output. While this post shows this can happen when the first function affects the columns the second ...

socialscientist

4,219

asked May 15 at 22:45

0 votes

1 answer

74 views

R flag a change in column value

I have the following dataset with 20 million rows. It's data on companies and users by month. I have created first_app_company, which flags the first appearance of a company in the dataset. The code is as ...

susznik

31

asked May 15 at 12:01

0 votes

0 answers

16 views

Creating interaction terms after splitting dataset into train and test sets

I have the following preprocessing stage before applying linear regression. Would there be a way to add interaction terms 'Numeric_Var1Categorical_Var1' and 'Numeric_Var2Categorical_Var1' after ...

J.K.

371

asked May 12 at 19:51

1 vote

1 answer

62 views

converting a dictionary of nested lists into a row in a data frame

I have some data (shown below), which is a dictionary of nested lists of dictionaries. I want to make the whole dictionary into one row. A very wide row. At present I can get my desired result. It is ...

Lavacave

81

asked May 8 at 0:24

0 votes

3 answers

77 views

Recoding Multiple Likert-scale Columns at Once

I usually do this the hard way, but I'm sure one of you coding experts has something less tedious. Using this data set below: #Example Dataset Q1 <- c("Agree", "Disagree", ...

HelplessStatistician

15

asked May 2 at 19:01

1 vote

1 answer

70 views

How to compute which values and how many values of a given variable satisfy a condition for another variable?

I have a dataframe of the form # Minimum example > data.frame(variable = c("A", "B", "C", "A", "B", "C"), + quantity1 = c(2,4,...

Carlos

39

asked Apr 30 at 8:53

2 votes

1 answer

47 views

How to do groupby on multi index dataframe based on condition

I have a multi index dataframe, and I want to combine rows based on certain conditions and I want to combine rows per index. import pandas as pd # data data = { 'date': ['01/01/17', '02/01/17', '...

Ezio

458

asked Apr 30 at 7:20

0 votes

1 answer

20 views

I'm looking to find a simple way to replace levels of columns by columns with yes/no

I'm looking to find a simple way to do something like the following; from a data frame with the following variables. ID symptom_1 symptom_2 symptom_3 1 ...

Ana Paulo

23

asked Apr 29 at 10:36

4 votes

1 answer

203 views

Python - Rolling Indexing in Polars library?

I'd like to ask around if anyone knows how to do rolling indexing in polars? I have personally tried a few solutions which did not work for me (I'll show them below): What I'd like to do: Indexing the ...

user24758287

191

asked Apr 29 at 2:44

0 votes

1 answer

33 views

IBNR development factor calculation in data.table

Given the following table: IBNR <- data.table(IncurredYear=c(2020,2020,2020,2020,2021,2021,2021,2022,2022,2023), DevYear=c(0,1,2,3,0,1,2,0,1,0), Amount=c(100,80,...

highbury

159

asked Apr 26 at 13:51

0 votes

0 answers

206 views

Data-wrangler extension doesn't display complete data tables

I'm using Jupyter Notebook. I need to open the CSV file in a pandas DataFrame with the extension, but it doesn't work; it produces this error: "Could not get current stack focus. This likely means that ...

Mendo

1

asked Apr 24 at 15:54

0 votes

1 answer

59 views

Dynamically calculate rolling average conditional upon NA values

I have data that looks something like this. df <- data.frame( Week = seq(1:10), BA.1 = c(.55, .52, .45, .39, .25, .10, 0, NA, NA, NA), JN.1 = c(0, 0, 0.1, 0.3, 0.56, 0.71, 0.79, NA, NA, NA), ...

shollaback

129

asked Apr 23 at 16:10

2 votes

3 answers

90 views

Lag by n rows over specific columns while extending the length of dataframe

I am trying to shift down (lag) specific columns in a data frame by n rows (e.g. 2 rows). I have only found posts on lagging over specific columns by 1 row. Here is some mock data. df <- data.frame(...

shollaback

129

asked Apr 18 at 15:23

2 votes

1 answer

46 views

change structure of a csv file in R

I am currently wrangling data from a CSV file. I have a feature "track" that has several values for the same instance, like a list of data points. I want to create a loop to split ...

sikimimi

49

asked Apr 11 at 9:44

0 votes

1 answer

72 views

Creating a RAG datatable in R

Based on the solution here, Add colours to datatable I am trying to produce a RAG table and want to assign four colours. library(tidyverse) library(DT) rag_df <- data.frame(Vulnerabilities = c(...

Sam

550

asked Apr 11 at 9:28

0 votes

1 answer

47 views

Python: merge two subsequent rows by pattern

I am working with data files of a single column containing strings, but sometimes the content of one row carries over to the next one, like so: ... "This is a str- -ing" ... This ...

eazyezy

163

asked Apr 9 at 17:48
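
For the row-merging question above, here is a minimal sketch. It assumes that a broken row ends with "-" and its continuation starts with "-", which is how the flattened example "This is a str- -ing" reads; the full question is truncated, so the real pattern may differ. The function name merge_hyphenated is hypothetical.

```python
def merge_hyphenated(rows):
    """Join a row ending in '-' with a following row starting with '-'.

    Assumption: both hyphens are split markers and are dropped on merge.
    """
    merged = []
    for row in rows:
        if merged and merged[-1].endswith("-") and row.startswith("-"):
            # Drop the trailing and leading hyphens and glue the halves together.
            merged[-1] = merged[-1][:-1] + row[1:]
        else:
            merged.append(row)
    return merged


rows = ["first line", "This is a str-", "-ing", "last line"]
result = merge_hyphenated(rows)
print(result)
```

Because each row is compared with the last merged row, a string split across three or more rows is also rejoined.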

2 votes

1 answer

45 views

Create grouped indicator of observations with the same value in R

I have a bit of a specific question. I have a dataset that looks like this in R: name person_id year municipality_id <chr> <dbl> <dbl> <dbl> 1 Brown ...

AntVal

645

asked Apr 7 at 20:09

2 votes

0 answers

2k views

Polars: How to prevent: polars.exceptions.InvalidOperationError: `min` operation not supported for dtype `null`

Hi I have the following data frame with only one row, that can contain an empty list (with only one value null). When I try to get the min over the list, and the one row dataframe does contain only an ...

Björn

1,822

asked Apr 5 at 23:54
