Linguistic Inquiry and Word Count (LIWC) analyzer.
The LIWC lexicon is proprietary, so it is not included in this repository,
but this Python package requires it.
The lexicon data can be acquired (purchased) from liwc.net.
This package reads from the `LIWC2007_English100131.dic` file (MD5: `2a8c06ee3748218aa89b975574b4e84d`), which must be available on any system where this package is used.
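If you want to confirm that your copy of the lexicon matches, a quick check of its MD5 from Python (purely illustrative, not part of this package):

```python
import hashlib

# compute the MD5 of the local .dic file; it should match the hash above
with open('LIWC2007_English100131.dic', 'rb') as f:
    print(hashlib.md5(f.read()).hexdigest())
#=> 2a8c06ee3748218aa89b975574b4e84d
```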
The LIWC2007 `.dic` format looks like this:

```
%
1 funct
2 pronoun
[...]
%
a 1 10
abdomen* 146 147
about 1 16 17
[...]
```
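The first `%`-delimited section maps category IDs to category names; each line after the second `%` maps a word pattern (a literal word, or a prefix ending in `*`) to the IDs of its categories. As a rough illustration only (this is not the package's implementation; `liwc.load_token_parser` below handles it for you), a file in this layout could be read like so:

```python
def read_dic(filepath):
    """Minimal sketch: parse a LIWC2007-style .dic file into
    (lexicon, category_names), where lexicon maps each word pattern
    (possibly ending in *) to a list of category names."""
    category_by_id = {}
    lexicon = {}
    section = 0  # incremented at each '%' line: 1 = category definitions, 2 = lexicon entries
    with open(filepath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line == '%':
                section += 1
            elif section == 1:
                category_id, category_name = line.split()[:2]
                category_by_id[category_id] = category_name
            elif section == 2:
                parts = line.split()
                lexicon[parts[0]] = [category_by_id[i] for i in parts[1:]]
    return lexicon, list(category_by_id.values())
```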
Install from PyPI:

```
pip install -U liwc
```
```python
import re

def tokenize(text):
    # you may want to use a smarter tokenizer
    for match in re.finditer(r'\w+', text, re.UNICODE):
        yield match.group(0)
```
```python
import liwc
parse, category_names = liwc.load_token_parser('LIWC2007_English100131.dic')
```
- `parse` is a function from a token of text (a string) to a list of matching LIWC categories (a list of strings).
- `category_names` is all LIWC categories in the lexicon (a list of strings).
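For example, calling `parse` on a single token returns the categories it matches in your lexicon (nothing if the token has no entry); the exact output depends on your copy of the lexicon:

```python
# wrap in list() in case parse yields its matches lazily; output depends on your lexicon
print(list(parse('about')))   # e.g. ['funct', ...] given the excerpt above (category 1 is funct)
print(list(parse('qwerty')))  # => [] (no matching lexicon entry)
```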
```python
gettysburg = '''Four score and seven years ago our fathers brought forth on
this continent a new nation, conceived in liberty, and dedicated to the
proposition that all men are created equal. Now we are engaged in a great
civil war, testing whether that nation, or any nation so conceived and so
dedicated, can long endure. We are met on a great battlefield of that war.
We have come to dedicate a portion of that field, as a final resting place
for those who here gave their lives that that nation might live. It is
altogether fitting and proper that we should do this.'''.lower()
gettysburg_tokens = tokenize(gettysburg)
```
Now, count all the categories in all of the tokens, and print the results:
```python
from collections import Counter
gettysburg_counts = Counter(category for token in gettysburg_tokens for category in parse(token))
print(gettysburg_counts)
#=> Counter({'funct': 58, 'pronoun': 18, 'cogmech': 17, ...})
```
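If you also want each category's share of all tokens (a common way to report LIWC results), here is a small follow-on sketch; note that `gettysburg_tokens` above is a generator and is exhausted by the `Counter`, so it is re-tokenized into a list here:

```python
gettysburg_tokens = list(tokenize(gettysburg))  # materialize, since the generator above is spent
total = len(gettysburg_tokens)
gettysburg_counts = Counter(category for token in gettysburg_tokens for category in parse(token))
for category, count in gettysburg_counts.most_common(5):
    print(f'{category}: {count} ({count / total:.1%} of tokens)')
```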
- The LIWC lexicon only matches lowercase strings, so you will most likely want to lowercase your input text before passing it to `parse(...)`. In the example above, I call `.lower()` on the entire string, but you could alternatively incorporate that into your tokenization process (e.g., by using spaCy's `token.lower_`; see the sketch below).
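For example, a hedged sketch of the spaCy route (this assumes spaCy and a model such as `en_core_web_sm` are installed; the model name is just for illustration):

```python
import spacy
from collections import Counter

nlp = spacy.load('en_core_web_sm')

def spacy_tokenize(text):
    # token.lower_ is the lowercased text of each token,
    # so the raw input does not need to be lowercased first
    return [token.lower_ for token in nlp(text)]

counts = Counter(category for token in spacy_tokenize(gettysburg) for category in parse(token))
```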
Copyright (c) 2012-2019 Christopher Brown. MIT Licensed.