Commit d818133

Fix UnicodeDecodeError on post text: decode it as UTF-8
1 parent 9cebdb7 commit d818133

1 file changed: +1 -0 lines

ch05/classify.py

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ def prepare_sent_features():
         if not text:
             meta[pid]['AvgSentLen'] = meta[pid]['AvgWordLen'] = 0
         else:
+            text = text.decode('utf-8')
             sent_lens = [len(nltk.word_tokenize(
                 sent)) for sent in nltk.sent_tokenize(text)]
             meta[pid]['AvgSentLen'] = np.mean(sent_lens)
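The added line is Python 2 code: there `text` is a byte string, and `.decode('utf-8')` turns it into `unicode` before the NLTK tokenizers see it. The UnicodeDecodeError presumably came from Python 2's implicit ASCII decoding of non-ASCII bytes. Below is a minimal Python 3 sketch of the same idea, assuming posts may arrive as raw UTF-8 bytes. The helper names `to_text` and `avg_sent_len` are hypothetical, and a simple regex split stands in for `nltk.sent_tokenize`/`nltk.word_tokenize` so the example runs without NLTK.

```python
import re
from statistics import mean


def to_text(raw, encoding="utf-8"):
    """Return `raw` as str, decoding bytes explicitly (hypothetical helper)."""
    if isinstance(raw, bytes):
        # Explicit decode: invalid UTF-8 raises UnicodeDecodeError here,
        # at a known place, instead of deep inside the tokenizer.
        return raw.decode(encoding)
    return raw


def avg_sent_len(raw):
    """Average words per sentence (like 'AvgSentLen').

    A regex sentence/word split stands in for NLTK's tokenizers (assumption).
    """
    text = to_text(raw)
    if not text:
        return 0
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0
    # \w+ matches Unicode letters in Python 3 str patterns.
    return mean(len(re.findall(r"\w+", s)) for s in sentences)


print(avg_sent_len("Héllo wörld. Ça va?".encode("utf-8")))
```

The decode happens once, right after the empty-text check and before any tokenizing, which is where the commit places it. If some posts are not valid UTF-8, `raw.decode(encoding, errors="replace")` would trade strictness for robustness.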
