Splitting `line` into words using `words = line.split(" ")` will only split the line on space characters. If two words are separated by a tab character, or any white space character other than a space, the split will not happen at that point, which may confuse your explicit checker.
Using `words = line.split()` would split on spaces, tab characters, and any other white space characters.
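To see the difference concretely, here is a small sketch. The sample line is made up for illustration and is not taken from your lyrics files.

```python
# Hypothetical sample line: a tab between two words, a double space,
# and a trailing newline (as you get when iterating over a file).
line = "some\texplicit  words here\n"

# Splitting on a single space keeps the tab inside one "word", yields an
# empty string for the double space, and leaves the newline on the last word.
space_split = line.split(" ")

# Splitting with no argument treats any run of whitespace as one separator
# and ignores leading and trailing whitespace.
whitespace_split = line.split()

print(space_split)
print(whitespace_split)
```

With `split(" ")`, the word `explicit` never appears on its own in the result, so an equality check against it would miss it.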
You might also want to investigate splitting using the word break regular expression `\b`, which would treat non-word characters as break points, so that a line containing an explicit word immediately followed by a comma (such as `"explicit,"`), or a period, or a semicolon, etc. won't sneak through the filter.
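One possible way to get that effect, as a sketch rather than the only option: instead of splitting at `\b`, extract the runs of word characters with `re.findall(r"\w+", ...)`. The sample line below is invented for illustration.

```python
import re

# Hypothetical sample line with punctuation glued to the words.
line = "This is explicit, really; EXPLICIT."

# Plain whitespace splitting keeps the punctuation attached to the words.
naive_words = line.split()

# Extracting runs of word characters (\w+) leaves the punctuation behind,
# which has a similar effect to breaking the line at \b word boundaries.
regex_words = re.findall(r"\w+", line)

print(naive_words)
print(regex_words)
```

Be aware that `\w+` also breaks at apostrophes, so `don't` would become `don` and `t`. Whether that matters depends on your word list.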
    for explicit_word in explicit_words:
        if word == explicit_word:
            return False
Python has great searching built into it. Instead of explicitly writing a loop over all explicit words, you could just ask Python to tell you if a word is in the list of explicit words:
    if word in explicit_words:
        return False
Or better, use a `set()` of explicit words, and the `in` operation drops from $O(n)$ to $O(1)$ time complexity on average:
    explicit_words = {
        "explicit", "bad", "ugly", "offensive", "words"
    }
    # ...
    if word in explicit_words:
        return False
We've improved things a little, but we still have an explicit (pun intended) loop over the list of words in a line. We can use the `any()` function, and a generator expression, to get Python to do the looping for us, which should be faster.
    for line in song:
        words = line.split()
        if any(word in explicit_words for word in words):
            return False
If `word in explicit_words` returns `True` for any `word` in the list `words`, the `any()` function will "short-circuit" and immediately return `True` without examining the remaining items in `words`.
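If you want to convince yourself of the short-circuiting, here is a minimal sketch. The helper `is_explicit` and the `examined` list are hypothetical names added only to record which words actually get tested.

```python
explicit_words = {"explicit", "bad", "ugly", "offensive", "words"}

examined = []  # records every word that was actually tested


def is_explicit(word):
    """Hypothetical helper: log the word, then test set membership."""
    examined.append(word)
    return word in explicit_words


words = "a bad day for everyone".split()

# The generator expression is consumed lazily, so any() stops pulling
# items as soon as one of them is truthy.
found = any(is_explicit(word) for word in words)

print(found)
print(examined)
```

Only the words up to and including the first match end up in `examined`. Note that this relies on the generator expression: with a list comprehension inside `any()`, every word would be tested before `any()` even starts.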
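If `"explicit"` is a bad word, and the song lyrics contain `"Explicit"` or `"EXPLICIT"`, you wouldn't want to mark the song as "CLEAN", right? Perhaps you want `word.lower() in explicit_words`? As @wizzwizz4 pointed out, `.casefold()` is better than `.lower()`, because it is designed for caseless matching and also handles characters such as the German "ß", which `.lower()` leaves unchanged.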
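A small sketch of where the two differ. The word list here is hypothetical and chosen only to expose the difference.

```python
# Hypothetical word list, already casefolded.
explicit_words = {"explicit", "strasse"}

candidates = ("EXPLICIT", "Straße")

# .lower() handles plain ASCII case differences, but leaves "ß" as it is.
lower_hits = [word.lower() in explicit_words for word in candidates]

# .casefold() additionally maps "ß" to "ss", so both candidates match.
casefold_hits = [word.casefold() in explicit_words for word in candidates]

print(lower_hits)
print(casefold_hits)
```

For purely English word lists the two will usually behave the same, so this only matters once non-ASCII lyrics show up.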
Resulting function:
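    # Casefold the word list once, so lookups are case-insensitive.
    explicit_words = set(explicit_word.casefold()
        for explicit_word in ("explicit", "bad", "ugly", "offensive", "words"))

    def isClean(song_path):
        with open(song_path) as song:
            for line in song:
                # Split on any whitespace: spaces, tabs, newlines.
                words = line.split()
                # any() stops at the first explicit word it finds.
                if any(word.casefold() in explicit_words for word in words):
                    return False
        # No explicit word was found in any line.
        return True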
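If you also want the punctuation handling mentioned earlier, one possible variant looks like this. It is a sketch under assumptions: the name `is_clean_lines` is hypothetical, it takes any iterable of lines instead of a path so it can be tried without a file, and it uses `\w+` extraction in place of `split()`.

```python
import re

EXPLICIT_WORDS = {
    word.casefold()
    for word in ("explicit", "bad", "ugly", "offensive", "words")
}


def is_clean_lines(lines):
    """Return True if no line contains an explicit word (hypothetical variant)."""
    for line in lines:
        # \w+ pulls out the words and leaves punctuation behind.
        words = re.findall(r"\w+", line)
        if any(word.casefold() in EXPLICIT_WORDS for word in words):
            return False
    return True


print(is_clean_lines(["La la la", "nothing to see here"]))
print(is_clean_lines(["This one is Explicit, sadly"]))
```

Since iterating over an open file yields its lines, you could pass the file object straight in, for example `with open(song_path) as song: is_clean_lines(song)`.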
`song` contains? It cannot be a song, because that is something that you hear. So probably the lyrics then? Or the file? Ah, the path to the file that contains the lyrics of the song. I would try to make variable names as "explicit" as possible ;-) So `song_path` (as used in the function) is already much better :-)