I need to read a CSV file, fill the empty/null values in the columns Phone, Email, Address, City, State and Zip based on Relationship and LastName, and write the result to a new CSV file. Example: if a person's Relationship is "Employer", their dependents share the same LastName, and the dependents have no Phone, Email, Address, City, State or Zip, I should fill those null values with the Employer's Phone, Email, Address, City, State and Zip.

I am facing two main concerns:

1) LastName can be the same for different people/families (e.g., rows 5-10, the Nielsens), so the loop should break and start again whenever Relationship changes to "Employer".
2) In some cases a Dependent's LastName is not the same as the Employer's LastName, but the null values should still be filled if the row falls between rows with the same last name (e.g., row 8, kathy storm).

I can't use pandas because the rest of the code/programs are purely based on Python 2.7.

The input table looks like this, with empty cells (CSV file):

| FirstName | LastName | DoB | Relationship | Phone | Email | Address | City | State | Zip |
|---|---|---|---|---|---|---|---|---|---|
| Hannah | Kahnw | 9/12/1972 | Employer | 1457871452 | [email protected] | Han Ave | hannas | UT | 563425 |
| Michel | Kahnw | 2/9/1993 | Dependent | | | | | | |
| Jonaas | Kahnw | 2/22/1997 | Dependent | | | | | | |
| Mikkel | Nielsen | 1/25/1976 | Employer | 4509213887 | [email protected] | 887 Street | neil | NY | 72356 |
| Magnus | Nielsen | 9/20/1990 | Dependent | | | | | | |
| Ulrich | Nielsen | 9/12/1983 | Employer | 7901234516 | [email protected] | Ulric Build | mavric | KS | 421256 |
| kathari | Nielsen | 10/2/2003 | Dependent | | | | | | |
| kathy | storm | 12/12/1999 | Dependent | | | | | | |
| kiiten | Nielsen | 6/21/1999 | Dependent | | | | | | |
| Elisab | Doppler | 2/22/1987 | Employer | 5439001211 | [email protected] | Elisa apart | Elis | AR | 758475 |
| Peterp | Doppler | 1/25/1977 | Employer | 6847523758 | [email protected] | park Ave | Pete | PT | 415253 |
| bartos | Doppler | 9/21/1990 | Dependent | | | | | | |

The output should look like this, with each Dependent's empty cells filled from the Employer row above it:

| FirstName | LastName | DoB | Relationship | Phone | Email | Address | City | State | Zip |
|---|---|---|---|---|---|---|---|---|---|
| Hannah | Kahnw | 9/12/1972 | Employer | 1457871452 | [email protected] | Han Ave | hannas | UT | 563425 |
| Michel | Kahnw | 2/9/1993 | Dependent | 1457871452 | [email protected] | Han Ave | hannas | UT | 563425 |
| Jonaas | Kahnw | 2/22/1997 | Dependent | 1457871452 | [email protected] | Han Ave | hannas | UT | 563425 |
| Mikkel | Nielsen | 1/25/1976 | Employer | 4509213887 | [email protected] | 887 Street | neil | NY | 72356 |
| Magnus | Nielsen | 9/20/1990 | Dependent | 4509213887 | [email protected] | 887 Street | neil | NY | 72356 |
| Ulrich | Nielsen | 9/12/1983 | Employer | 7901234516 | [email protected] | Ulric Build | mavric | KS | 421256 |
| kathari | Nielsen | 10/2/2003 | Dependent | 7901234516 | [email protected] | Ulric Build | mavric | KS | 421256 |
| kathy | storm | 12/12/1999 | Dependent | 7901234516 | [email protected] | Ulric Build | mavric | KS | 421256 |
| kiiten | Nielsen | 6/21/1999 | Dependent | 7901234516 | [email protected] | Ulric Build | mavric | KS | 421256 |
| Elisab | Doppler | 2/22/1987 | Employer | 5439001211 | [email protected] | Elisa apart | Elis | AR | 758475 |
| Peterp | Doppler | 1/25/1977 | Employer | 6847523758 | [email protected] | park Ave | Pete | PT | 415253 |
| bartos | Doppler | 9/21/1990 | Dependent | 6847523758 | [email protected] | park Ave | Pete | PT | 415253 |

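The rule described above (copy the contact fields from the most recent Employer row, whatever the last name) can be sketched without pandas. This is only a minimal sketch under stated assumptions: `fill_from_employer` and `CONTACT_FIELDS` are names made up for illustration, rows are assumed to be dicts keyed by the header names from the tables, every Dependent is assumed to belong to the nearest Employer row above it, and the e-mail address in the sample data is a placeholder.

```python
CONTACT_FIELDS = ['Phone', 'Email', 'Address', 'City', 'State', 'Zip']


def fill_from_employer(rows):
    """Return copies of rows with empty contact cells filled from the
    closest preceding row whose Relationship is 'Employer'."""
    filled = []
    employer = None  # most recent Employer row seen so far
    for row in rows:
        row = dict(row)  # copy, so the caller's data is left untouched
        if row['Relationship'].strip().lower() == 'employer':
            employer = row  # a new family starts here
        elif employer is not None:
            for field in CONTACT_FIELDS:
                if not row.get(field):  # empty string or missing key
                    row[field] = employer[field]
        filled.append(row)
    return filled


# Sample rows based on the question's table (Email is a placeholder).
rows = [
    {'FirstName': 'Ulrich', 'LastName': 'Nielsen', 'Relationship': 'Employer',
     'Phone': '7901234516', 'Email': 'ulrich@example.com',
     'Address': 'Ulric Build', 'City': 'mavric', 'State': 'KS', 'Zip': '421256'},
    {'FirstName': 'kathy', 'LastName': 'storm', 'Relationship': 'Dependent',
     'Phone': '', 'Email': '', 'Address': '', 'City': '', 'State': '', 'Zip': ''},
]
result = fill_from_employer(rows)
print(result[1]['Phone'])  # 7901234516
```

Because the lookup is by position rather than by LastName, a repeated last name in another family and a dependent with a different last name are both handled the same way. A Dependent that appears before any Employer row is left unchanged.
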
import csv
from collections import namedtuple


def get_info(file_path):
    # Read data from file and convert it to a list of namedtuples; also build a
    # dictionary used to fill in missing information from other records.
    with open(file_path, 'rb') as fin:
        csv_reader = csv.reader(fin, skipinitialspace=True)
        header = next(csv_reader)
        Record = namedtuple('Record', header)
        addr_dict = {}
        data = [header]
        for rec in (Record._make(row) for row in csv_reader):
            if rec.Email or rec.Phone or rec.Address or rec.City or rec.State or rec.Zip:
                addr_dict.setdefault(rec.LastName, []).append(rec)  # Remember it.
            data.append(rec)  # Keep every record, complete or not.

    # Try to fill in missing data from any other records with the same LastName.
    for i, row in enumerate(data[1:], 1):
        if not (row.Phone and row.Email and row.Address and row.City and row.State and row.Zip):  # Info missing?
            # Try to copy it from others with the same LastName.
            updated = False
            for other in addr_dict.get(row.LastName, []):
                if not row.Phone and other.Phone:
                    row = row._replace(Phone=other.Phone)
                    updated = True
                if not row.Email and other.Email:
                    row = row._replace(Email=other.Email)
                    updated = True
                if not row.Address and other.Address:
                    row = row._replace(Address=other.Address)
                    updated = True
                if not row.City and other.City:
                    row = row._replace(City=other.City)
                    updated = True
                if not row.State and other.State:
                    row = row._replace(State=other.State)
                    updated = True
                if not row.Zip and other.Zip:
                    row = row._replace(Zip=other.Zip)
                    updated = True
                if row.Phone and row.Email and row.Address and row.City and row.State and row.Zip:  # Info now filled in?
                    break
            if updated:
                data[i] = row
    return data


INPUT_FILE = 'null_cols.csv'
OUTPUT_FILE = 'fill_cols.csv'

data = get_info(INPUT_FILE)
with open(OUTPUT_FILE, 'wb') as fout:
    writer = csv.DictWriter(fout, data[0])  # First elem has column names.
    writer.writeheader()
    for row in data[1:]:
        writer.writerow(row._asdict())
# (I got this code from an earlier question I asked on Stack Overflow. The script doesn't include the relationship type logic, and it doesn't handle the duplicate LastName issue.)
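
For the file handling, one option is to stream rows from the reader to the writer while remembering the last Employer row, so that both the relationship logic and the duplicate LastName issue are settled by row order instead of a LastName lookup. A hedged sketch, not a tested solution: `fill_csv` and `FILL_COLUMNS` are hypothetical names, the header names are assumed to match the tables in the question, and the function takes already opened file objects so the caller picks the open mode (`'rb'`/`'wb'` on Python 2.7, text mode with `newline=''` on Python 3).

```python
import csv

FILL_COLUMNS = ('Phone', 'Email', 'Address', 'City', 'State', 'Zip')


def fill_csv(fin, fout):
    """Copy CSV rows from fin to fout, filling empty contact cells of
    each Dependent from the nearest Employer row above it."""
    reader = csv.reader(fin, skipinitialspace=True)
    writer = csv.writer(fout)
    header = next(reader)
    writer.writerow(header)
    rel = header.index('Relationship')
    targets = [header.index(name) for name in FILL_COLUMNS]
    employer = None  # last Employer row seen so far
    for row in reader:
        if not row:
            continue  # skip blank lines
        row += [''] * (len(header) - len(row))  # pad rows with missing cells
        if row[rel].strip().lower() == 'employer':
            employer = row  # a new family starts here
        elif employer is not None:
            for i in targets:
                if not row[i].strip():
                    row[i] = employer[i]
        writer.writerow(row)


# Usage on Python 2.7 (file names taken from the script above):
# with open('null_cols.csv', 'rb') as fin, open('fill_cols.csv', 'wb') as fout:
#     fill_csv(fin, fout)
```

Only one Employer row is held in memory at a time, and values that a Dependent already has are never overwritten.
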
Thanks for the help !!