Finnish language model for spaCy
Version 0.15.1, 2024-11-14
* Better cleaning of training data
* Redacted person names in email addresses in the word frequency data
Version 0.15.0, 2024-10-19
* Compatible with spaCy 3.8
* Improved spam filter on the MC4 corpus
Version 0.14.0, 2023-10-14
* Compatible with spaCy 3.7
* The noun chunker includes chains of flat and nmod dependencies, e.g. "maaliskuun 7. päivänä"
* The parser no longer tries to detect nsubj:outer, dislocated and goeswith
dependencies, because there is not enough training data to learn them.
* Tokenize "-kampanja" as ["-", "kampanja"]
* Tokenize "maa-" as ["maa", "-"]
* Tokenize "/kk" as ["/", "kk"]
* Other tokenizer improvements
Version 0.13.0, 2023-07-21
* Compatible with spaCy 3.6
Version 0.12.0, 2023-02-01
* Compatible with spaCy 3.5
* Working word occurrence probabilities (they had been broken in several previous versions)
Version 0.11.0, 2022-07-23
* Ported to spaCy 3.4
* Updated word vectors and word frequencies
* Minor fixes to the lemmatization
Version 0.10.0, 2022-05-07
* Floret embedding vectors trained on MC4_fi_cleaned
Version 0.10.0b1, 2022-04-09
* Ported to spaCy 3.3.0.dev0. Older spaCy versions are no longer supported.
* Noun chunker now splits off appositions as independent phrases
Version 0.9.0, 2022-01-19
* The pipeline now includes a named-entity recognizer (NER)
Version 0.8.0, 2021-11-21
* Ported to spaCy 3.2. Older spaCy versions are no longer supported.
* Vectors for out-of-vocabulary words generated by Floret embeddings
* Uses the default spaCy morphologizer instead of the custom Voikko-based morphologizer
Version 0.7.1, 2021-08-21
* Works on Python 3.7 again
Version 0.7.0, 2021-07-12
* Compatible with spaCy 3.1
* Minor improvements to analysis: prefer non-compound words
Version 0.6.0, 2021-04-11
* Improved tagging and parsing accuracy by pretraining
* Improved lemmatization accuracy by better handling of ambiguous inflections
* Morphological features (case, verb tense, person, etc.)
* Properly set POS SPACE on whitespace tokens
Version 0.5.0, 2021-03-14
* Ported to spaCy 3.0. spaCy 2.x is no longer supported.
Version 0.4.1, 2020-08-29
* Published as a PyPI package. The package name is spacy_fi_experimental_web_md.
Version 0.4.0, 2020-07-06
* Ported to spaCy 2.3
* Includes 500k keys and 20k vectors, like the official *_md models
* Includes word vectors for the most frequent words
Version 0.3.0, 2020-05-17
* Extract noun phrases
* Lemmatize inflected abbreviations: EU:ssa => EU
* Requires spaCy 2.2.4 or later
Version 0.2.0, 2020-01-26
* Tagging auxiliary verbs as AUX (previously VERB) following the UD convention
* Fixed bugs in lemmatization of compound words: ilmakuivata, esiopetus, etc.
* Improved lemmatization of pronouns, especially clitics: sinäkin, mekään, etc.
* Using the same Finnish tokenizer rules as the spaCy master branch
Version 0.1.0, 2020-01-11
Initial release