Thursday, May 1, 2014

python - Check for partial match in 1 list with partial match in another list - Possible with list comprehension? - Stack Overflow


Somewhat of a Python/programming newbie here.


I have written code that does what I need it to:


import re
syns = ['professionals|experts|specialists|pros', 'repayed|payed back', 'ridiculous|absurd|preposterous', 'salient|prominent|significant' ]
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant', 'winter-time|winter|winter season', 'professionals|pros']

def pipe1(syn):
    # Find first word/phrase in list element up to and including the 1st pipe
    r = r'.*?\|'
    m = re.match(r, syn)
    m = m.group()
    return m

def find_non_match():
    # Compare 'new_syns' with 'syns' and create new list from non-matches in 'new_syns'
    p = '@#&' # Place holder created
    joined = p.join(syns)
    joined = p + joined # Adds place holder to beginning of string too
    non_match = []
    for syn in new_syns:
        m = pipe1(syn)
        m = p + m
        if m not in joined:
            non_match.append(syn)
    return non_match

print(find_non_match())

Printed output:


['winter-time|winter|winter season']

For each element in new_syns, the code takes the word/phrase up to and including the first pipe and checks whether any element in syns starts with that same text. The purpose is to find the non-matches and append them to a new list called non_match, which it does.


However, I am wondering whether it is possible to achieve the same thing in far fewer lines using a list comprehension. I have tried, but I am not getting exactly what I want. This is what I have come up with so far:


import re
syns = ['professionals|experts|specialists|pros', 'repayed|payed back', 'ridiculous|absurd|preposterous', 'salient|prominent|significant' ]
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant', 'winter-time|winter|winter season', 'professionals|pros']

def pipe1(syn):
    # Find first word/phrase in list element up to and including the 1st pipe
    r = r'.*?\|'
    m = re.match(r, syn)
    m = '@#&' + m.group() # Add unusual symbol combo to create match for beginning of element
    return m

non_match = [i for i in new_syns if pipe1(i) not in '@#&'.join(syns)]
print(non_match)

Printed output:


['winter-time|winter|winter season', 'professionals|pros'] # I don't want 'professionals|pros' in the list

The catch in the list comprehension is that when I join syns with @#&, there is no @#& at the beginning of the joined string, whereas my original code above (the version without the list comprehension) adds @#& to the beginning of the joined string. As a result, 'professionals|pros' has slipped through the net. I don't know how to add that leading @#& within the list comprehension.


So my question is: is this possible with a list comprehension?
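
For reference, here is a minimal sketch of one way the missing leading placeholder could be supplied while still using a list comprehension. It assumes the same data and the same '@#&' placeholder as above; the names P and joined are only illustrative. The joined string is built once, with the placeholder prepended, before the comprehension runs.

```python
import re

syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid',
            'salient|prominent|significant',
            'winter-time|winter|winter season', 'professionals|pros']

P = '@#&'  # same placeholder as in the question (the name P is an assumption)

def pipe1(syn):
    # First word/phrase up to and including the first pipe, prefixed with P
    return P + re.match(r'.*?\|', syn).group()

# Build the joined string once, with the placeholder at the very start too,
# so the first element of syns can be matched like every other element
joined = P + P.join(syns)

non_match = [s for s in new_syns if pipe1(s) not in joined]
print(non_match)
```

Building joined outside the comprehension also avoids re-joining syns for every element of new_syns.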




I think you want something like:


non_match = [i for i in new_syns if not any(any(w == s.split("|")[0]
                                                for w in i.split("|"))
                                            for s in syns)]

This doesn't use regular expressions, but it does give the result:


non_match == ['winter-time|winter|winter season']

The list includes every item from new_syns for which none (not any) of its '|'-separated words w equals the first word (split("|")[0]) of any synonym group s in syns.
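
Note that the comprehension above compares every word of each new item against the first words in syns, whereas the code in the question compares only the first word of each new item. A minimal sketch of a first-word-only variant is shown below, assuming the same data as in the question. The name first_terms is hypothetical, and this is a swapped-in technique (a set lookup), not the approach from either post.

```python
syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid',
            'salient|prominent|significant',
            'winter-time|winter|winter season', 'professionals|pros']

# First term of every existing synonym group, collected once in a set
first_terms = {s.split('|', 1)[0] for s in syns}

# Keep only the new groups whose first term is not already a first term in syns
non_match = [s for s in new_syns if s.split('|', 1)[0] not in first_terms]
print(non_match)
```

Because split returns the whole string when there is no pipe, this version also tolerates single-word entries, which would make re.match return None in the regex-based pipe1.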




