Python: How to compare two lists and if it is present, get all the details about them

Time:01-30

I have a TSV file of 1,000 rows and tens of columns, exported from Google Spreadsheet, in which each row starts with a person's name and username. I need to extract the full rows for only 400 of those 1,000 people, identified by their usernames, and split them randomly into groups of 40.

What I did was handpick 400 usernames from the 1,000-row TSV. I then needed to put those 400 into 10 separate groups at random, so I shuffled the list with an online list shuffler.

Below is the Python script I wrote to split the 400 people into groups of 40.

person_list = []
max_person_in_a_class = 40

# Read every username (one per line), skipping blank lines
with open("randomized_participants.txt", "r") as group_input:
    for line in group_input:
        line = line.strip("\n")
        if line:
            person_list.append(line)

i = 1  # position of the person within the current group
k = 1  # current group number
print("\nGroup {}".format(k))
for person in person_list:
    if i > max_person_in_a_class:
        # The current group is full, so start the next one
        k += 1
        i = 1
        print("\nGroup {}".format(k))
    print("{} {}".format(i, person))
    i += 1

I successfully got those 400 into 10 different groups.
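
As an aside, the shuffle and the split could also be done in one script instead of with an online shuffler. The following is only a minimal sketch: `make_groups`, its `seed` parameter, and the `user0`...`user399` names are hypothetical and stand in for the real handpicked usernames.

```python
import random


def make_groups(usernames, group_size=40, seed=None):
    """Shuffle usernames and split them into consecutive groups of group_size."""
    rng = random.Random(seed)   # passing a seed makes the shuffle repeatable
    shuffled = list(usernames)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size]
            for i in range(0, len(shuffled), group_size)]


# Hypothetical stand-in for the 400 handpicked usernames
usernames = [f"user{n}" for n in range(400)]
groups = make_groups(usernames, group_size=40, seed=1)

for number, group in enumerate(groups, start=1):
    print(f"Group {number}: {len(group)} people")
```

If the number of usernames is not a multiple of the group size, the last group is simply shorter.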

Now, similar to a search function, I need to get the details of each person from the TSV using their username.

For example, the TSV file contains Name, Username, Age, Phone Number, Email, Address, Salaries and Quiz Marks, while the handpicked file contains only Username.

I want to match each handpicked Username against the TSV's Username column and get the whole row: Name, Username, Phone Number, Email, etc.

For instance, I want to take 'John' and 'Mary' from the handpicked file and automate fetching their details, such as Email, Address and Salaries, from the TSV file.

with open('shortlisted.tsv', 'r') as tsv:
    for person_details in tsv:
        person_details = person_details.strip("\n")
        for person in person_list:
            if person in person_details:
                print(person)
            else:
                print("No match found")

[Screenshot: example of a line from randomized_participants.txt and shortlisted.tsv]

Hopefully someone can point out what I'm missing, tell me the term for this kind of problem, or suggest an alternative approach.

Thank you in advance!

Answer:

Hopefully the below is adaptable enough to solve your problem.

# Using dict to store person data for flexibility
person_data = {}

# Simplified getting people into groups
N_PER_GROUP = 40
groups = []

with open('randomized_participants.txt', 'r') as f:
    group = []
    for (i, line) in enumerate(f):
        person = line.strip()
        if i and not i % N_PER_GROUP:
            groups.append(group)
            group = []
        group.append(person)
        person_data[person] = []
    groups.append(group)

# Printing groups for debugging
for (i, group) in enumerate(groups):
    print(f'Group {i + 1:>2}')
    for person in group:
        print(f'\t{person}')
    print()

# Collating each person's details
with open('shortlisted.tsv', 'r') as tsv:
    for line in tsv:
        # Drop the trailing newline so it does not end up in the last field
        details = line.rstrip('\n').split('\t')
        if len(details) <= 3:
            continue  # skip blank or short lines
        username = details[3]  # 3 b/c it was column D in your screenshot
        if username in person_data:
            person_data[username] = details

# Printing person data for debugging
for (person, data) in person_data.items():
    print(f'{person}: {data}')

Note that you could eliminate person_data and look each username up in the groups directly, but then checking every TSV line against the lists costs O(n·m) rather than O(n), where n is the number of TSV rows and m is the number of handpicked usernames (unless you switched to sets instead of lists). For this application, the dictionary's memory usage is not a problem.
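
If the TSV has a header row, the same lookup can be written against column names rather than a column index. This is only a sketch under that assumption: `collect_details` is a hypothetical helper, the header text `Username` is assumed to match the real file, and the sample rows are made up. It compares whole usernames through a set, so 'John' does not also match 'Johnny' the way a substring test on the line would.

```python
import csv
import io


def collect_details(tsv_file, wanted, username_column="Username"):
    """Return {username: row_dict} for each TSV row whose username is in wanted."""
    wanted = set(wanted)  # set membership is O(1) on average
    reader = csv.DictReader(tsv_file, delimiter="\t")
    found = {}
    for row in reader:
        # A short row yields None for missing columns, hence the `or ""`
        username = (row.get(username_column) or "").strip()
        if username in wanted:
            found[username] = row
    return found


# Made-up sample data standing in for shortlisted.tsv
sample = io.StringIO(
    "Name\tUsername\tEmail\n"
    "John Doe\tJohn\tjohn@example.com\n"
    "Johnny Roe\tJohnny\tjohnny@example.com\n"
    "Mary Major\tMary\tmary@example.com\n"
)
handpicked = ["John", "Mary", "Zed"]
details = collect_details(sample, handpicked)
missing = set(handpicked) - set(details)  # usernames with no row in the TSV

print(details["John"]["Email"])
print(sorted(missing))
```

With a real file, pass `open('shortlisted.tsv', newline='')` in place of the `io.StringIO` object; reporting the `missing` set once at the end avoids printing "No match found" for every non-matching comparison.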
