Merge pull request #8 from fidelity/maintenance
Maintenance Updates for Updated Dependencies
dorukkilitcioglu authored Jun 7, 2022
2 parents 2c5bdd7 + 7c6e02b commit 5e2a16c
Showing 48 changed files with 12,077 additions and 943 deletions.
9 changes: 5 additions & 4 deletions .github/workflows/ci.yml
@@ -7,6 +7,8 @@ on:
pull_request:
branches:
- master
schedule:
- cron: '25 16 * * 1' # Schedule at 16:25 UTC on Mondays, which is midday in EST

jobs:
Test:
@@ -24,7 +26,6 @@ jobs:
- name: Check
shell: bash
run: |
- python3 -m pip install --upgrade pip
- pip install -e .
- python3 -m unittest discover -v tests
- python3 setup.py install
+ python3 -m pip install --upgrade pip wheel
+ pip3 install -e .[full]
+ PYTHONHASHSEED=0 TEST_WORD_EMBEDDINGS=bert,elmo python3 -m unittest discover -v tests
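The cron entry added to `ci.yml` above (`25 16 * * 1`) is expressed in UTC, and its comment describes the run as midday in EST. As a quick sanity check, the following sketch converts that schedule to a fixed UTC-5 offset using only the standard library. The helper name `cron_utc_to_est` and the reference date are illustrative assumptions, and daylight saving time is deliberately ignored (under EDT, UTC-4, the same run lands at 12:25).

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset for EST; daylight saving (EDT, UTC-4) is not modeled here.
EST = timezone(timedelta(hours=-5), name="EST")


def cron_utc_to_est(hour: int, minute: int) -> datetime:
    """Hypothetical helper: express a Monday UTC cron time in EST."""
    # Any Monday works as a reference date; 2022-06-06 was a Monday.
    utc_run = datetime(2022, 6, 6, hour, minute, tzinfo=timezone.utc)
    return utc_run.astimezone(EST)


local = cron_utc_to_est(16, 25)
print(local.strftime("%A %H:%M %Z"))  # Monday 11:25 EST
```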
18 changes: 18 additions & 0 deletions CHANGELOG.txt
@@ -2,6 +2,24 @@
TextWiser CHANGELOG
=====================

-------------------------------------------------------------------------------
Mar 03, 2022 1.4.0
-------------------------------------------------------------------------------
major:
- Update UMAP to return deterministic output in latest version
Source: https://github.com/fidelity/textwiser/blob/39bad042104c41d0d57174b49941882af79cc3db/textwiser/transformations/umap_.py#L23
- Update SVD to return deterministic output
Source: https://github.com/fidelity/textwiser/blob/39bad042104c41d0d57174b49941882af79cc3db/textwiser/transformations/svd.py#L26
- Update Word2Vec and Doc2Vec with latest gensim training
Source: https://github.com/fidelity/textwiser/blob/39bad042104c41d0d57174b49941882af79cc3db/textwiser/embeddings/word.py#L204
Source: https://github.com/fidelity/textwiser/blob/39bad042104c41d0d57174b49941882af79cc3db/textwiser/embeddings/doc2vec.py#L25

minor:
- Disable non-relevant transformers warning
- Add full requirement install option (see requirements_full.txt)
- Make gensim requirement explicit instead of relying on flair's dependency tree
- Directly utilize ELMo from allennlp instead of flair by bumping allennlp requirement

-------------------------------------------------------------------------------
Feb 23, 2022 1.3.2
-------------------------------------------------------------------------------
4 changes: 2 additions & 2 deletions README.md
@@ -92,8 +92,8 @@ vecs = emb.fit_transform(documents)
Examples can be found under the [notebooks](notebooks) folder.

## Installation

- TextWiser requires **Python 3.6+** and can be installed from PyPI using ``pip install textwiser`` or by building from source as shown in [installation instructions](https://fidelity.github.io/textwiser/installation.html).
+ TextWiser requires **Python 3.6+** and can be installed from PyPI using ``pip install textwiser``, using ``pip install textwiser[full]`` to install from PyPI with all optional dependencies, or by building from source by following the instructions
+ in our [documentation](https://fidelity.github.io/textwiser/installation.html).

## Compound Embedding
A unique research contribution of TextWiser lies in its novel approach in creating embeddings from components,
2 changes: 1 addition & 1 deletion docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
- config: 82e0b592078d6ed744d4345e0c2b35df
+ config: 6434429e0258ad68daae5d25176eb3ff
tags: 645f666f9bcd5a90fca523b33c5a78b7
2 changes: 1 addition & 1 deletion docs/_sources/embeddings.rst.txt
@@ -22,7 +22,7 @@ Embeddings
"Word Embedding: `Character <https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_3_WORD_EMBEDDING.md#character-embeddings>`_", "| Initialized randomly and not pretrained
| Useful when trained for a downstream task
| Enable :ref:`fine-tuning<fine_tuning>` to get good embeddings"
- "Word Embedding: `BytePair <https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/BYTE_PAIR_EMBEDDINGS.md>`_ ", "| Supported by these `pretrained embeddings <https://nlp.h-its.org/bpemb/#download>>`_
+ "Word Embedding: `BytePair <https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/BYTE_PAIR_EMBEDDINGS.md>`_ ", "| Supported by these `pretrained embeddings <https://nlp.h-its.org/bpemb/#download>`_
| Pretrained options can be specified with the string ``<lang>_<dim>_<vocab_size>``
| Default options can be omitted like ``en``, ``en_100``, or ``en__10000``
| Defaults to ``en``, which is equal to ``en_100_10000``"
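The BytePair row above describes an option string of the form ``<lang>_<dim>_<vocab_size>`` in which omitted fields fall back to defaults. A minimal sketch of how such a string could be expanded is shown below, assuming only the defaults stated in the docs (``en``, ``100``, ``10000``). The names `BpeOption` and `parse_bpe_option` are hypothetical and are not taken from TextWiser or flair.

```python
from typing import NamedTuple


class BpeOption(NamedTuple):
    lang: str
    dim: int
    vocab_size: int


def parse_bpe_option(option: str = "en") -> BpeOption:
    """Hypothetical helper: expand ``<lang>_<dim>_<vocab_size>`` with defaults."""
    defaults = ("en", "100", "10000")
    parts = option.split("_")
    if len(parts) > 3:
        raise ValueError(f"Expected at most 3 fields, got {len(parts)}: {option!r}")
    # Pad missing trailing fields, then fill every empty field from the defaults.
    parts += [""] * (3 - len(parts))
    lang, dim, vocab_size = (part or default for part, default in zip(parts, defaults))
    return BpeOption(lang, int(dim), int(vocab_size))


for spec in ("en", "en_100", "en__10000"):
    print(spec, "->", parse_bpe_option(spec))
```

Under these assumptions, all three shorthand forms listed in the docs resolve to the same triple as ``en_100_10000``.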
8 changes: 5 additions & 3 deletions docs/_sources/installation.rst.txt
@@ -28,14 +28,16 @@ The library is based on PyTorch but it also relies on:
* Spacy and its ``en`` model are optional imports for OpenAI GPT; the model can be installed using ``python -m spacy download en``
* Tensorflow is an optional import for Universal Sentence Encoder. If you want to use USE, make sure you satisfy ``tensorflow>=2.0.0`` and ``tensorflow-hub>=0.7.0``.
* AllenNLP is an optional import for ELMo. If you want to use ELMo, make sure you satisfy ``allennlp``
- * UMAP is an optional import for UMAP transformation. If you want to use UMAP, make sure you satisfy ``umap-learn``
+ * UMAP is an optional import for UMAP transformation. If you want to use UMAP, make sure you satisfy ``umap-learn>=0.5.1``

PyPI
----

TextWiser can be installed using ``pip install textwiser``, which will download the latest wheel from
`PyPI <http://pypi.org/project/textwiser/>`_. This will also install all required dependencies.

+ Alternatively, you can use ``pip install textwiser[full]`` to install TextWiser with all the optional dependencies.

Source Code
-----------

@@ -53,11 +55,11 @@ Alternatively, you can build a wheel package on your platform from scratch using

Test Your Setup
---------------
- To confirm that installing the package was successful, run the first example in the :ref:`Quick Start<quick>`. To confirm that the whole installation was successful, run the tests and all should pass. When running the tests, it will download a 50MB pretrained model.
+ To confirm that installing the package was successful, run the first example in the :ref:`Quick Start<quick>`. To confirm that the whole installation was successful, run the tests and all should pass. When running the tests, it will download a 50MB pretrained model. Note that the ``PYTHONHASHSEED=0`` variable is necessary to ensure Doc2Vec training is reproducible - you do not need this if reproducibility is not important, or if you're not using Doc2Vec.

.. code-block:: bash
- python -m unittest discover -v tests
+ PYTHONHASHSEED=0 python -m unittest discover -v tests
You can also set the ``TEST_WORD_EMBEDDINGS`` environmental variable to comma-separated word embeddings (ex: ``bert,flair``) to test them, or to ``all`` to test all possible word embeddings. Note that this will download all word embeddings, which is very time-consuming, and it assumes all optional requirements are satisfied.
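The ``PYTHONHASHSEED=0`` note above relates to Python's per-process string hash randomization: unless the seed is fixed, `hash()` of a string can differ between interpreter launches. The sketch below illustrates only that interpreter behavior with the standard library; it does not train Doc2Vec, and the helper name `str_hash_in_fresh_interpreter` is a hypothetical one chosen for this example.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(seed=None):
    """Return hash('textwiser') as computed by a newly launched Python process."""
    env = dict(os.environ)
    # Start from a clean slate so an inherited seed does not mask the effect.
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('textwiser'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # With a fixed seed, two separate launches agree on the hash value.
    print(str_hash_in_fresh_interpreter(0) == str_hash_in_fresh_interpreter(0))
    # Without a seed, the two values will usually (not always) differ.
    print(str_hash_in_fresh_interpreter(), str_hash_in_fresh_interpreter())
```

Setting the variable on the command line, as the test command in the diff does, has the same effect for the test process itself.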

134 changes: 134 additions & 0 deletions docs/_static/_sphinx_javascript_frameworks_compat.js
@@ -0,0 +1,134 @@
/*
* _sphinx_javascript_frameworks_compat.js
* ~~~~~~~~~~
*
* Compatability shim for jQuery and underscores.js.
*
* WILL BE REMOVED IN Sphinx 6.0
* xref RemovedInSphinx60Warning
*
*/

/**
* select a different prefix for underscore
*/
$u = _.noConflict();


/**
* small helper function to urldecode strings
*
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL
*/
jQuery.urldecode = function(x) {
if (!x) {
return x
}
return decodeURIComponent(x.replace(/\+/g, ' '));
};

/**
* small helper function to urlencode strings
*/
jQuery.urlencode = encodeURIComponent;

/**
* This function returns the parsed url parameters of the
* current request. Multiple values per key are supported,
* it will always return arrays of strings for the value parts.
*/
jQuery.getQueryParameters = function(s) {
if (typeof s === 'undefined')
s = document.location.search;
var parts = s.substr(s.indexOf('?') + 1).split('&');
var result = {};
for (var i = 0; i < parts.length; i++) {
var tmp = parts[i].split('=', 2);
var key = jQuery.urldecode(tmp[0]);
var value = jQuery.urldecode(tmp[1]);
if (key in result)
result[key].push(value);
else
result[key] = [value];
}
return result;
};

/**
* highlight a given string on a jquery object by wrapping it in
* span elements with the given class name.
*/
jQuery.fn.highlightText = function(text, className) {
function highlight(node, addItems) {
if (node.nodeType === 3) {
var val = node.nodeValue;
var pos = val.toLowerCase().indexOf(text);
if (pos >= 0 &&
!jQuery(node.parentNode).hasClass(className) &&
!jQuery(node.parentNode).hasClass("nohighlight")) {
var span;
var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
if (isInSVG) {
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
} else {
span = document.createElement("span");
span.className = className;
}
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
node.parentNode.insertBefore(span, node.parentNode.insertBefore(
document.createTextNode(val.substr(pos + text.length)),
node.nextSibling));
node.nodeValue = val.substr(0, pos);
if (isInSVG) {
var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
var bbox = node.parentElement.getBBox();
rect.x.baseVal.value = bbox.x;
rect.y.baseVal.value = bbox.y;
rect.width.baseVal.value = bbox.width;
rect.height.baseVal.value = bbox.height;
rect.setAttribute('class', className);
addItems.push({
"parent": node.parentNode,
"target": rect});
}
}
}
else if (!jQuery(node).is("button, select, textarea")) {
jQuery.each(node.childNodes, function() {
highlight(this, addItems);
});
}
}
var addItems = [];
var result = this.each(function() {
highlight(this, addItems);
});
for (var i = 0; i < addItems.length; ++i) {
jQuery(addItems[i].parent).before(addItems[i].target);
}
return result;
};

/*
* backward compatibility for jQuery.browser
* This will be supported until firefox bug is fixed.
*/
if (!jQuery.browser) {
jQuery.uaMatch = function(ua) {
ua = ua.toLowerCase();

var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
/(webkit)[ \/]([\w.]+)/.exec(ua) ||
/(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
/(msie) ([\w.]+)/.exec(ua) ||
ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
[];

return {
browser: match[ 1 ] || "",
version: match[ 2 ] || "0"
};
};
jQuery.browser = {};
jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
}
46 changes: 37 additions & 9 deletions docs/_static/basic.css
@@ -4,7 +4,7 @@
*
* Sphinx stylesheet -- basic theme.
*
- * :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
+ * :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
@@ -222,7 +222,7 @@ table.modindextable td {
/* -- general body styles --------------------------------------------------- */

div.body {
- min-width: 450px;
+ min-width: 360px;
max-width: 800px;
}

@@ -335,13 +335,13 @@ p.sidebar-title {
font-weight: bold;
}

- div.admonition, div.topic, blockquote {
+ div.admonition, div.topic, aside.topic, blockquote {
clear: left;
}

/* -- topics ---------------------------------------------------------------- */

- div.topic {
+ div.topic, aside.topic {
border: 1px solid #ccc;
padding: 7px;
margin: 10px 0 10px 0;
@@ -380,13 +380,15 @@ div.body p.centered {
div.sidebar > :last-child,
aside.sidebar > :last-child,
div.topic > :last-child,
aside.topic > :last-child,
div.admonition > :last-child {
margin-bottom: 0;
}

div.sidebar::after,
aside.sidebar::after,
div.topic::after,
aside.topic::after,
div.admonition::after,
blockquote::after {
display: block;
@@ -428,10 +430,6 @@ table.docutils td, table.docutils th {
border-bottom: 1px solid #aaa;
}

- table.footnote td, table.footnote th {
- border: 0 !important;
- }
-
th {
text-align: left;
padding-right: 5px;
@@ -615,6 +613,7 @@ ul.simple p {
margin-bottom: 0;
}

/* Docutils 0.17 and older (footnotes & citations) */
dl.footnote > dt,
dl.citation > dt {
float: left;
@@ -632,6 +631,33 @@ dl.citation > dd:after {
clear: both;
}

/* Docutils 0.18+ (footnotes & citations) */
aside.footnote > span,
div.citation > span {
float: left;
}
aside.footnote > span:last-of-type,
div.citation > span:last-of-type {
padding-right: 0.5em;
}
aside.footnote > p {
margin-left: 2em;
}
div.citation > p {
margin-left: 4em;
}
aside.footnote > p:last-of-type,
div.citation > p:last-of-type {
margin-bottom: 0em;
}
aside.footnote > p:last-of-type:after,
div.citation > p:last-of-type:after {
content: "";
clear: both;
}

/* Footnotes & citations ends */

dl.field-list {
display: grid;
grid-template-columns: fit-content(30%) auto;
@@ -731,8 +757,9 @@ dl.glossary dt {

.classifier:before {
font-style: normal;
- margin: 0.5em;
+ margin: 0 0.5em;
content: ":";
+ display: inline-block;
}

abbr, acronym {
@@ -756,6 +783,7 @@ span.pre {
-ms-hyphens: none;
-webkit-hyphens: none;
hyphens: none;
white-space: nowrap;
}

div[class*="highlight-"] {