Tags: JuliaText/TextAnalysis.jl
## TextAnalysis v0.8.1

[Diff since v0.7.5](v0.7.5...v0.8.1)

**Merged pull requests:**
- allow DocumentMetadata to hold arbitrary data (#158) (@tanmaykm)
- Directional coom (#264) (@atantos)
- Fixed UNICODE processing with the `strip_non_letters` flag in src/preprocessing.jl (#265) (@sigmundv)
- ROUGE: fixed sentences calculation and some minor refactoring (#272) (@rssdev10)
- CI: updated scripts. Minimal Julia is 1.6 now (#275) (@rssdev10)
- Code refactoring (#276) (@rssdev10)
- documentation update (#277) (@rssdev10)
- CompatHelper: add new compat entry for Statistics at version 1, (keep existing compat) (#278) (@github-actions[bot])
- Fix/showprogress (#281) (@rssdev10)
- Fix/style improvement (#282) (@rssdev10)

**Closed issues:**
- error on LDA Julia 0.4 (#37)
- remove_corrupt_utf8() not working (#41)
- remove_corrupt_utf8! giving "no method matching zero" error (#68)
- stemming issue for certain words e.g. providing -> provid (#69)
- rouge_n not defined (#193)
- error strip_spares_terms not defined (#212)
- Eval can be replaced by getfield in tag_scheme! (#242)
- Seems there are some typos in documents (#249)
- StringIndexError when trying to create a StringDocument based on a UTF8 string (#255)
- Converting Corpus to Dataframe not working. (#279)
## TextAnalysis v0.7.5

[Diff since v0.7.4](v0.7.4...v0.7.5)

**Merged pull requests:**
- CompatHelper: add new compat entry for DelimitedFiles at version 1, (keep existing compat) (#269) (@github-actions[bot])
- Clean README, docs and docstrings (#270) (@pitmonticone)
- Update coom.jl (#271) (@ms10596)
- added BLEU score (#273) (@rssdev10)
- Update README.md (#274) (@ms10596)

**Closed issues:**
- Implementation of cosine similarity? (#215)
- Dependence on BinaryProvider.jl prevents TextAnalysis from working on arm64-apple-darwin natively. (#260)
## TextAnalysis v0.7.4

[Diff since v0.7.3](v0.7.3...v0.7.4)

**Closed issues:**
- PerceptronTagger is not defined (#262)
- Libstemmer not defined for ARM (M1 Mac) (#263)

**Merged pull requests:**
- Update README.md (#254) (@dunefox)
- Move some docs to TextModels (#256) (@AdarshKumar712)
- fix string indexing in `summary` (#257) (@ericphanson)
- CompatHelper: bump compat for StatsBase to 0.34, (keep existing compat) (#268) (@github-actions[bot])
## TextAnalysis v0.7.3

[Diff since v0.7.2](v0.7.2...v0.7.3)

**Closed issues:**
- CI is failing on the latest Julia master (#252)

**Merged pull requests:**
- add cosine similarity calculation (#248) (@hhaensel)
- Latent Dirichlet allocation: display a progress bar during Gibbs sampling (#250) (@DilumAluthge)
- remove `write_sub` (#253) (@aviks)
## TextAnalysis v0.7.2

[Diff since v0.7.1](v0.7.1...v0.7.2)

**Closed issues:**
- Methods to merge two DocumentTermMatrix instances (#243)

**Merged pull requests:**
- CompatHelper: bump compat for "DataFrames" to "0.22" (#239) (@github-actions[bot])
- Use Tables.jl, remove explicit DataFrames dependency (#240) (@aviks)
- methods to help manipulate and update DocumentTermMatrix incrementally (#244) (@tanmaykm)
- optimize document term sparse matrix operations (#245) (@tanmaykm)
- fix Project.toml, add Tables compat entry (#246) (@tanmaykm)
## TextAnalysis v0.7.1

[Diff since v0.7.0](v0.7.0...v0.7.1)

**Closed issues:**
- Move models to TextModels.jl (#111)
- Tag a new release (#177)
- Provide libstemmer through Yggdrasil (#204)
- Julia TextAnalysis NERTagger (#214)
- Unable to convert corpus to DataFrame (#236)

**Merged pull requests:**
- Fix conversion to DataFrame (#237) (@aviks)
- fix link to the docs in README.md (#238) (@gxyd)
## TextAnalysis v0.7.0

[Diff since v0.6.0](v0.6.0...v0.7.0)

**Closed issues:**
- Feature Request: Part of speech tagging (#2)
- Implement Named Entity Recognition (NER) (#117)
- Can a new release be tagged? (#139)
- Need API documentation (#146)
- Extend Naive Bayes Classifier to support the various document types (#152)
- Summarize function throws error for docs with less than 5 sentences. (#153)
- UndefVarError when `prepare!` called on Corpus (#171)
- Need to export Flux, Tracker (#178)
- Docs and docstring for Sentiment Analysis model needs fixing (#182)
- NaiveBayesClassifier scope error. (#192)
- APIs to avoid datatype constraint between CorpusLoaders.jl and TextAnalysis.jl (#195)
- Add entry for ULMFiT in docs/make.jl (#196)
- Unexpected behaviour of ngram(sd, 3) (#202)
- "resulting" bug (#205)
- Statistical tokenization algorithms (#207)
- Trying to use NaiveBayesClassifier results in UndefVarError (#216)

**Merged pull requests:**
- Simple document classifier (AKA spam filter) (#106) (@MikeInnes)
- Average Perceptron POS Tagger (Issue #2) (#131) (@ComputerMaestro)
- Remove HTML style tags in preprocessing (#137) (@phereford)
- PR: To address performance issues with stopword removal (#141) (@asbisen)
- Indentation fix patch (#142) (@Ayushk4)
- Fix deprecated function in extended example (#144) (@ViralBShah)
- Add characters to list of punctuations (#145) (@asbisen)
- Add API documentation (#147) (@aquatiko)
- Update ngramizer.jl (#148) (@djokester)
- Add offline Documentation (Docstrings) to the codebase (#150) (@Ayushk4)
- Documentation for Bayes.jl (#151) (@Ayushk4)
- Update summarizer.jl (#154) (@Ayushk4)
- Fix deprecations in show.jl (#155) (@Ayushk4)
- Added ROUGE Score to TextAnalysis.jl (#156) (@djokester)
- allow multiple ngram complexity in NGramDocument, ngrams and ngrammize (#157) (@tanmaykm)
- Update the documentation reflecting changes in show.jl (#159) (@Ayushk4)
- Add functions for Tagging Schemes and Conversion. (#161) (@Ayushk4)
- Conditional Random Fields (#162) (@Ayushk4)
- BM25, Co-occurrence Matrix, faster ROUGE, Fixing LSA. (#165) (@Ayushk4)
- Use datadeps for AvgPerceptronTagger, add pos tagging over document types (#166) (@Ayushk4)
- Named Entity Recognition (#167) (@Ayushk4)
- Add API for Part of Speech Tagging (#169) (@Ayushk4)
- Add favicon to the docs (#170) (@Ayushk4)
- Fix prepare! on strip_whitespace (#172) (@Ayushk4)
- Readme updated. Docs edited to provide API Reference online. (#173) (@Ayushk4)
- ULMFiT (#179) (@aviks)
- Fix Sequence Labelling Models, fixes #178 (#180) (@Ayushk4)
- Drop support for 0.7 and add support for 1.3 (#181) (@Ayushk4)
- Minor fix of doc and docstring of Sentiment Analysis (#184) (@tejasvaidhyadev)
- Remove duplicate entries in Project.toml, and fix a broken build (#189) (@DilumAluthge)
- Bump version number from "0.6.0" to "0.7.0" (#190) (@DilumAluthge)
- Install TagBot as a GitHub Action (#194) (@JuliaTagBot)
- updated docs/make.jl (#198) (@tejasvaidhyadev)
- make DTM type generic (#199) (@baggepinnen)
- bug fix in get_sentiment function (#206) (@tejasvaidhyadev)
- Language Model Interface (#210) (@tejasvaidhyadev)
- Modify loop in initial assignments of lda to use sparse structure. (#213) (@jmoralez)
- export NaiveBayesClassifier (#217) (@agarie)
- Extend NaiveBayesClassifier to support Documents as input #152 (#219) (@KimBue)
- Minor Fixes (#220) (@tejasvaidhyadev)
- LM doc fix (#233) (@tejasvaidhyadev)
- Split project, separate TextModels (#234) (@aviks)