0 votes
1 answer
59 views

Not getting decimals when extracting values [duplicate]

So I am practicing data wrangling and I have encountered an issue. food['GPA'].unique() And the output is array(['2.4', '3.654', '3.3', '3.2', '3.5', '2.25', '3.8', '3.904', '3.4', '3.6', '3.1'...
Sjaikisan
0 votes
1 answer
22 views

pivot_longer() with parallel (unlinked) sets of columns [duplicate]

I'm trying to use pivot_longer() to rearrange a dataset I was given, which looks like the result of a database join operation. Here's an example of what it looks like: dat <- tibble('Plant_Name'=c('...
S. Robinson
1 vote
2 answers
42 views

(ERROR) Select one object and all float & int in pandas groupby

I have this dataframe. import pandas as pd x = { "year": ["2012", "2012", "2013", "2014", "2012", "2014", "2013", ...
lokalhangatt
1 vote
2 answers
49 views

Regex to extract a part of URL using stringr r package

I have the following URLS: www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box www.google.com?...
paolotroia
0 votes
4 answers
90 views

Fill in column based on condition with another column in R [closed]

I have the following input table: input <- structure( list(individual = c(1, 2, 3, 4), age = c(20, 34, 29, 30), earnings_2020 = c(0, 0, 1, 0),...
Chloe
  • 1
4 votes
3 answers
93 views

Having trouble with which.min inside dplyr pipe

I have some trouble with the which.min function inside a dplyr pipe. I have a cumbersome solution (*) and I'm looking for a more compact and elegant way to do this. Reproducible example: library(dplyr) ...
Wael
  • 1,778
2 votes
2 answers
66 views

Is there an R function that detects a specific string and replaces it by the value of another observation based on a number within the string?

So, I am using constituency data of the German Election 1994 and some observations contain strings that indicate that the value is given in a different row (based on the scheme "siehe Wkr xxx...
Paul-Markus Rudolf
0 votes
1 answer
31 views

Advanced pivot_longer transformation sequentially on a group of columns

I'm a little perplexed about the exact way to proceed with this wrangling procedure. I have a dataset which consists of raters that are assessing lung sounds (S1,...,S40). For each sound the assessed ...
Buczinski
3 votes
1 answer
80 views

Behavior of %>% when piping values to functions containing pipes

The below examples demonstrate that passing an object to deparse() and substitute() produces different output depending on whether the object is passed to the function with %>% and whether the ...
socialscientist
0 votes
2 answers
57 views

Reformatting pdf text into dataframe to remove extra information [closed]

I am trying to load the text from a pdf into R for text analysis. The pdf is formatted so that the text has columns for extra information. Please see the screenshot below. I'd like to load the main ...
Ashley Wu
0 votes
1 answer
94 views

How to Rearrange Values in Each Row to Avoid Duplicates Across Columns in R?

Question I have a data frame in R where each row contains multiple columns with categorical values. My goal is to rearrange the values within each row so that no value is repeated across columns in ...
Ruam Pimentel
1 vote
1 answer
50 views

restrict to those with data at specific age ranges in R

I have the following long format data frame with columns, id, age, and BMI. I have restricted the dataset such that only people (id) with at least 3 repeated measurements between age 2 weeks and 10 ...
aelhak
  • 415
-3 votes
1 answer
46 views

More elegant solution for conditional filtering? [closed]

The code below works perfectly fine and outputs the data of interest. However, I am wondering if there is a better solution or a different way to think about the logic. Essentially, I need to filter for the ...
Eizy
  • 351
0 votes
1 answer
36 views

Can't Open .xlsx Document

I tried to download a .xlsx file from my course. But when I opened the .xlsx file, it turned into something like this. UEsDBBQABgAIAAAAIQBBN4LPbgEAAAQFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC ...
lokalhangatt
0 votes
1 answer
55 views

How to Add New Column in Dictionary?

Based on the data below, I want to calculate the BMI Index for each row and the average for the total row. The BMI Index formula is 'berat' / 'tinggi'. data = [{'nama': '...
lokalhangatt
0 votes
1 answer
42 views

How to lengthen data in one column separated by semicolons, and repeat elements from the other column?

I have received a dataset in a .csv table. The first three lines of the table look like this: Species,Methods Chlamydomonas pisiformis; Stichococcus bacillaris; Stichococcus subtilis; Pleurococcus ...
Ginko-Mitten
0 votes
0 answers
34 views

For deep learning: save each sample individually or keep blocks? Data doesn't fit in memory

I am training a classifier. My data comes from multiple datasets, each dataset contains multiple subjects, each subject has performed multiple trials. Currently my data structure on disk looks like ...
Samuel
  • 57
1 vote
1 answer
56 views

Creating a large number of columns in R tidyverse based on a comparison with a specific column

I have a dataset in R tidyverse and I want to create 192 columns based on comparison with the sp column, just like the mp_comp_1 column. How can I do this for 192 columns in tidyverse? library(...
Hamideh
  • 697
1 vote
2 answers
73 views

Pattern matching in a dataframe

I am having some trouble conducting pattern matching within a data frame. I am working with the grepl function in R. I have a data frame of 5 local districts in two years (2001 and 2002). I want to check ...
YouLocalRUser
2 votes
3 answers
94 views

Complete and fill missing rows with groups of uneven length

I have a dataframe of county executives and the year they were inaugurated. I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004. I would like to expand ...
YouLocalRUser
-1 votes
3 answers
81 views

Remove duplicate rows, keep first row [duplicate]

I am working with a dataframe on county executives. I want to run a panel study where the unit of analysis is the county-year. The problem is that sometimes two or more county executives serve during ...
YouLocalRUser
-1 votes
1 answer
36 views

Fill in missing rows

I have a data frame of county executives and the year they were inaugurated. I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004. I would like to expand ...
YouLocalRUser
1 vote
2 answers
55 views

dataframe breakdown by year

I have a dataset on county executives and their year of inauguration. I need to break down which year each executive was inaugurated. The problem is that the notation under the "year" variable ...
YouLocalRUser
1 vote
3 answers
133 views

Add values across dataframe columns

I have a dataframe where missingness is indicated by "Z" (there may also be some "z" and NA entries present in the data), and values are entered as characters ("0", "...
jbmchls
  • 13
1 vote
3 answers
45 views

Drop columns that are replicated in a data frame

I have a large data frame with repeated variables. This is just a sample of my data to illustrate the question: df <- data.frame( ID = rep(1:4, each = 1), CMW = rep(c(10, 20, 30, 30), each = 1),...
Raquel Feltrin
-1 votes
1 answer
44 views

I need some help creating a loop/automatic way of cleaning my data [duplicate]

I'm quite new to programming languages and I am starting with R in my research predicting dengue disease cases with climatic data. I'm still cleaning my data to work with and this particular one has ...
André Ferrari
0 votes
1 answer
45 views

Add Column to R Data Frame from Another Data Frame with Matching Index Column, Only When Values are in A Certain Range

I am trying to add a column to a data frame (df1) from another data frame (df2), but only when the "depth range" from df1 lies within the "depth range" from df2. I'll explain below ...
Chris Wheeler
0 votes
1 answer
53 views

SQL data wrangling help using the Having statement

The below code (Databricks SQL) produces the table following it. I am trying to adjust this code so that the output only includes zip5 records that have only 1 (or less) of each facility_type ...
Dr.Data
  • 181
1 vote
1 answer
43 views

Join tables based on a range instead of exact match [duplicate]

I have two datasets as the ones described below: dfA <- tibble( name = c("John", "Michael", "Brian", "Thomas", "Peter"), expected = c(128.34, ...
jpm92
  • 152
0 votes
0 answers
14 views

How to transform nested data from long format to wide format without using nested structure?

I have a big dataset and have data in long format ('longdf') with one column for subjectnr., one for illness (e.g., rows are epilepsy, ms, diabetes etc.) and other columns for the variables (...
Lea
  • 1
0 votes
0 answers
49 views

How can I load data in RStudio while making it accessible on other computers when opening the file?

I'm working on an assignment and we were asked to load the data and make the file run without errors when opening from the teacher's computer. He said: "When writing your code, keep the data ...
Ashraf Taha
0 votes
2 answers
73 views

R: Alternatives/approaches to read_html() + html_text() that also work on strings without HTML/XML tags

In this solution to removing HTML tags from a string, the string is passed to rvest::read_html() to create an html_document object and then the object is passed to rvest::html_text() to return "...
socialscientist
0 votes
1 answer
30 views

Why does the order of functions within summarise() affect its output?

When I use two functions within dplyr::summarise(), the ordering of the functions affects the output. While this post shows this can happen when the first function affects the columns the second ...
socialscientist
0 votes
1 answer
74 views

R flag a change in column value

I have the following dataset with 20 million rows. It's data on companies and users by month. I have created first_app_company, which flags the first appearance of a company in the dataset. The code is as ...
susznik
  • 31
0 votes
0 answers
16 views

Creating interaction terms after splitting dataset into train and test sets

I have the following preprocessing stage before applying linear regression. Would there be a way to add interaction terms 'Numeric_Var1Categorical_Var1' and 'Numeric_Var2Categorical_Var1' after ...
J.K.
  • 371
1 vote
1 answer
62 views

converting a dictionary of nested lists into a row in a data frame

I have some data (shown below), which is a dictionary of nested lists of dictionaries. I want to make the whole dictionary into one row. A very wide row. At present I can get my desired result. It is ...
Lavacave
0 votes
3 answers
77 views

Recoding Multiple Likert-scale Columns at Once

I usually do this the hard way, but I'm sure one of you coding experts has something less tedious. Using this data set below: #Example Dataset Q1 <- c("Agree", "Disagree", ...
HelplessStatistician
1 vote
1 answer
70 views

How to compute which values and how many values of a given variable satisfy a condition for another variable?

I have a dataframe of the form # Minimum example > data.frame(variable = c("A", "B", "C", "A", "B", "C"), + quantity1 = c(2,4,...
Carlos
  • 39
2 votes
1 answer
47 views

How to do groupby on multi index dataframe based on condition

I have a multi index dataframe, and I want to combine rows based on certain conditions and I want to combine rows per index. import pandas as pd # data data = { 'date': ['01/01/17', '02/01/17', '...
Ezio
  • 458
0 votes
1 answer
20 views

I'm looking to find a simple way to replace levels of columns by columns with yes/no

I'm looking to find a simple way to do something like the following; from a data frame with the following variables. ID symptom_1 symptom_2 symptom_3 1 ...
Ana Paulo
4 votes
1 answer
203 views

Python - Rolling Indexing in Polars library?

I'd like to ask around if anyone knows how to do rolling indexing in polars? I have personally tried a few solutions which did not work for me (I'll show them below): What I'd like to do: Indexing the ...
user24758287
0 votes
1 answer
33 views

IBNR development factor calculation in data.table

Given the following table: IBNR <- data.table(IncurredYear=c(2020,2020,2020,2020,2021,2021,2021,2022,2022,2023), DevYear=c(0,1,2,3,0,1,2,0,1,0), Amount=c(100,80,...
highbury
  • 159
0 votes
0 answers
206 views

Data-wrangler extension doesn't display complete data tables

I'm using Jupyter Notebook. I need to open the CSV file in a pandas dataframe with the extension, but it doesn't work; it produces this error: "Could not get current stack focus. This likely means that ...
Mendo
  • 1
0 votes
1 answer
59 views

Dynamically calculate rolling average conditional upon NA values

I have data that looks something like this. df <- data.frame( Week = seq(1:10), BA.1 = c(.55, .52, .45, .39, .25, .10, 0, NA, NA, NA), JN.1 = c(0, 0, 0.1, 0.3, 0.56, 0.71, 0.79, NA, NA, NA), ...
shollaback
2 votes
3 answers
90 views

Lag by n rows over specific columns while extending the length of dataframe

I am trying to shift down (lag) specific columns in a data frame by n rows (e.g. 2 rows). I have only found posts on lagging over specific columns by 1 row. Here is some mock data. df <- data.frame(...
shollaback
2 votes
1 answer
46 views

change structure of a csv file in R

I am currently wrangling data from a csv file. I have a feature "track" that has several values for the same instance, like a list of data points. I want to create a loop to split ...
sikimimi
0 votes
1 answer
72 views

Creating a RAG datatable in R

Based on the solution here, Add colours to datatable I am trying to produce a RAG table and want to assign four colours. library(tidyverse) library(DT) rag_df <- data.frame(Vulnerabilities = c(...
Sam
  • 550
0 votes
1 answer
47 views

Python: merge two subsequent rows by pattern

I am working with data files of a single column containing strings, but sometimes the content of one row carries over to the next one, like so: ... "This is a str- -ing" ... This ...
eazyezy
  • 163
2 votes
1 answer
45 views

Create grouped indicator of observations with the same value in R

I have a bit of a specific question. I have a dataset that looks like this in R: name person_id year municipality_id <chr> <dbl> <dbl> <dbl> 1 Brown ...
AntVal
  • 645
2 votes
0 answers
2k views

Polars: How to prevent: polars.exceptions.InvalidOperationError: `min` operation not supported for dtype `null`

Hi I have the following data frame with only one row, that can contain an empty list (with only one value null). When I try to get the min over the list, and the one row dataframe does contain only an ...
Björn
  • 1,822
