In this repository we present a pretrained model for classifying Wittgenstein's remarks. The model is a multi-label classifier that can assign a remark to one or more of 70 categories: Jetzt, Regel, Sprache, Gedanke, Behauptung, Mengenlehre, Gleich, Unendliche Möglichkeit, Begriff, Idealismus, Gegenstand, Kardinalzahlen, Phänomenologie, Hypothese, Ursache, Ungefähr, Unendlichkeit, Entdeckung, Problem, Mathematik Metamathematik, Schmerzen, Sprache Sprachspiel, Satz, Klasse, Erwartung und Erfüllung, Gesichtsraum, XXX, Bedeutung, Grund, Sinn, Philosophie, Versuchen Suchen, Vorstellung, Abbild, Fähigkeit, Zeit, Logik, Farben und Farbenmischung, Minima Visibilia, Grund des Denkens, W-F-Notation, Undeutlichkeit, Glaube, Wissen, Logische Form, Tabelle, Anwendung, Unmittelbares, Allgemeinheit, Grammatik, Zeichen, Schach, Folgen, Beweis, Mathematik, Induktion Induktionsbeweis, Wahrscheinlichkeit, Gebrauch, Meinen, Physikalischer Raum, Absicht, Im selben Sinn, Zahlen, Regel Erfahrungssatz, Nicht, Verifikation, Verstehen, Tonfolge, Physikalische Sprache and Denken.
This work was done during the summer semester of 2017 with support from Dr. Maximilian Hadersbeck (LMU Munich). The hand-labeled data was provided by Dr. Josef G. F. Rothhaupt (LMU Munich).
This project was funded by Lehre@LMU with an NVIDIA Jetson TX1.
Input remark:
Der Unterschied der Wortarten ist immer wie der Unterschied der Spielfiguren,
oder, wie der noch größere, einer Spielfigur und des Schachbrettes.
Hand-labeled gold label: "Grammatik"
The multi-label classification approach is implemented with Keras, TensorFlow and the magpie library. The following libraries must be installed:
Library | Version (tested) |
---|---|
magpie | 2.0 |
Keras | 2.1.3 |
TensorFlow | 1.5.0 |
h5py | 2.7.1 |
Note: magpie should be installed via:
pip3 install --user git+https://github.com/inspirehep/[email protected]
Hand-labeled data is available for the complete Ts-212, which gives hand-labeled categories for 7099 remarks. This corpus is split into training, development and test sets:
Dataset | # Remarks |
---|---|
Training | 5620 |
Development | 719 |
Test | 760 |
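
The three split sizes add up to the 7099 labeled remarks. As a minimal sketch of how a fixed-size split like this could be produced, assuming the remarks are held in a Python list and shuffled with a fixed seed (the helper name `split_corpus`, the seed and the shuffling are illustrative assumptions, not the procedure used for this repository):

```python
import random

# Split sizes taken from the table above.
SPLIT_SIZES = {"training": 5620, "development": 719, "test": 760}


def split_corpus(remarks, sizes=SPLIT_SIZES, seed=42):
    """Shuffle remarks and cut them into consecutive, non-overlapping splits.

    Hypothetical helper: the seed and the shuffling strategy are assumptions.
    """
    if len(remarks) != sum(sizes.values()):
        raise ValueError("split sizes must add up to the corpus size")
    shuffled = list(remarks)
    random.Random(seed).shuffle(shuffled)
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = shuffled[start:start + size]
        start += size
    return splits


if __name__ == "__main__":
    # Placeholder corpus: integers stand in for the 7099 remarks.
    splits = split_corpus(list(range(7099)))
    for name, part in splits.items():
        print(f"{name}: {len(part)} remarks ({len(part) / 7099:.1%})")
```

Using a fixed seed keeps the split reproducible across runs, which matters when results on the development and test sets are compared later.
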
The pretrained model consists of four files:
Description | Download |
---|---|
Word Embeddings | embedding.pkl |
Model | model.h5 |
Scaler | scaler.pkl |
Category labels | categories.labels |
The word embeddings, model and scaler are located in the current_model folder of this repository. categories.labels is located in the root folder of this repository.
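
Before loading the model it can help to confirm that all four files are where they are expected. The following is a minimal sketch that assumes it is run from the repository root; the helper name `missing_model_files` is hypothetical and not part of this repository:

```python
import os

# File locations as described above: three files in current_model/,
# categories.labels in the repository root.
MODEL_FILES = [
    os.path.join("current_model", "embedding.pkl"),
    os.path.join("current_model", "model.h5"),
    os.path.join("current_model", "scaler.pkl"),
    "categories.labels",
]


def missing_model_files(repo_root="."):
    """Return the expected model files that are absent under repo_root."""
    return [
        path for path in MODEL_FILES
        if not os.path.isfile(os.path.join(repo_root, path))
    ]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All four model files are present.")
```
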
To classify new remarks of Ludwig Wittgenstein, the following script can be used:
from magpie import Magpie
with open('categories.labels') as f:
    labels = [line.rstrip() for line in f.readlines()]
magpie = Magpie(
    keras_model='current_model/model.h5',
    word2vec_model='current_model/embedding.pkl',
    scaler='current_model/scaler.pkl',
    labels=labels
)
This loads the pretrained model together with its dependencies, such as the word embeddings and the labels.
The following command can then be used to classify a remark:
predicted = magpie.predict_from_text('“Ich denke, Du wirst die Scheibe irgendwo innerhalb dieses Kreises treffen”.')
print(predicted)
This outputs the 5 best predicted categories for the input remark:
[('Allgemeinheit', 0.66499853), ('Folgen', 0.53158545),
('Regel', 0.004923807), ('Satz', 0.0018804041), ('Meinen', 0.0017680882)]
The gold categories are "Allgemeinheit" and "Folgen".
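
Because the classifier is multi-label, the ranked scores still have to be turned into a set of categories. A minimal sketch, assuming a fixed cut-off of 0.5 applied to the output shown above (the threshold value and the helper name `select_labels` are illustrative assumptions, not part of magpie or of this repository):

```python
def select_labels(predictions, threshold=0.5):
    """Keep the labels whose confidence reaches the threshold.

    `predictions` is a list of (label, confidence) pairs, as printed above.
    The default threshold of 0.5 is an assumption, not a tuned value.
    """
    return [label for label, confidence in predictions if confidence >= threshold]


# Output copied from the example above.
predicted = [
    ("Allgemeinheit", 0.66499853),
    ("Folgen", 0.53158545),
    ("Regel", 0.004923807),
    ("Satz", 0.0018804041),
    ("Meinen", 0.0017680882),
]

print(select_labels(predicted))  # ['Allgemeinheit', 'Folgen']
```

With this cut-off the selected labels coincide with the two gold categories for this remark. A different threshold may suit other remarks better and would need to be checked on the development set.
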
This classification script is available as classification.py.
We would like to thank Dr. Maximilian Hadersbeck for his great support during the development phase. We also want to thank Dr. Josef G. F. Rothhaupt for providing us with high-quality hand-labeled data for over 7000 remarks of Ludwig Wittgenstein.
We are deeply grateful that Lehre@LMU funded our research project with an NVIDIA Jetson TX1 developer board, and we would like to thank LMU Munich for this awesome program. This really helps students and boosts research.
For questions about deep-wittgenstein, contact the current maintainer: Stefan Schweter [email protected].
To respect the Free Software Movement and the enormous work of Dr. Richard Stallman, this repository is released under the GNU Affero General Public License, version 3. More information can be found here and in COPYING.