ENH: Improve partial diagrams algo #116
Merged
Changes
Improves performance of `diagrams.generate_partial_diagrams` by efficiently filtering out invalid edge combinations that do not span all nodes in the kinetic diagram. For small diagrams the performance benefit is small, but for more complex diagrams (e.g., the 8-state model of EmrE), there is a statistically significant performance increase.
Fixes ENH: Update KDA partial and directional diagram generation algorithm #22
Description
While there is not a lot of performance to gain from changes to `generate_partial_diagrams`, I spent some time trying other algorithms to see how they compared to KDA.
Wang Algebra Algorithm
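For readers unfamiliar with it: Wang algebra is a commutative algebra in which x + x = 0 and x * x = 0. Multiplying the edge sums ("stars") of all but one node leaves exactly the spanning trees as the surviving terms. Below is a minimal pure-Python sketch of that idea. It is not the SymPy function from this PR; the names `wang_multiply` and `wang_spanning_trees` are hypothetical, and it assumes a simple, connected, undirected graph.

```python
from itertools import product


def wang_multiply(p, q):
    """Multiply two Wang-algebra expressions.

    An expression is a set of terms; each term is a frozenset of edges.
    """
    result = set()
    for a, b in product(p, q):
        if a & b:
            # A shared edge means x * x, which is 0 in Wang algebra.
            continue
        # x + x = 0: toggling the term cancels any duplicate.
        result ^= {a | b}
    return result


def wang_spanning_trees(edges):
    """Return spanning trees as a set of frozensets of edges (illustrative only)."""
    edges = [tuple(sorted(e)) for e in edges]
    nodes = list({n for e in edges for n in e})
    expr = {frozenset()}  # multiplicative identity: a single empty term
    # Multiply the stars of all nodes except one (here, the first is skipped).
    for node in nodes[1:]:
        star = {frozenset([e]) for e in edges if node in e}
        expr = wang_multiply(expr, star)
    return expr


triangle = [("A", "B"), ("B", "C"), ("A", "C")]
trees = wang_spanning_trees(triangle)
print(len(trees))  # 3
```

Representing terms as sets avoids symbolic expansion entirely, which is a different design from an approach built on `.expand()` and `.subs()`.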
The KAPattern software uses Wang algebra to efficiently generate the state probability expressions, so I gave that a shot first. I wrote a function that uses SymPy to generate the algebraic expressions. The code works perfectly well and all KDA tests pass. However, it is extremely slow. For EmrE it takes roughly 4.6 s to generate the spanning trees, whereas the KDA algo takes <20 ms. The issue is with the SymPy `.expand()` and `.subs()` methods, which have to expand ("foil") incredibly complex multivariate polynomials and then perform variable substitutions. I tried to improve the performance by expanding/substituting as the expressions are being built (~33% faster), but it was still nowhere close to the performance of KDA. I believe SymPy is written in pure Python and is not known to provide great performance for tasks like this.
NetworkX Algorithm
Disappointed by the Wang algebra code, I figured I might as well try the `NetworkX.SpanningTreeIterator`.
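As a rough illustration of this approach (not the code from this PR), the spanning trees of an undirected graph can be enumerated with NetworkX roughly as shown here. The helper name `spanning_tree_edge_lists` is hypothetical, and the sketch assumes a simple undirected `nx.Graph` and a NetworkX version recent enough to provide `SpanningTreeIterator` in `networkx.algorithms.tree.mst`.

```python
import networkx as nx
from networkx.algorithms.tree.mst import SpanningTreeIterator


def spanning_tree_edge_lists(G):
    """Return every spanning tree of G as a sorted list of edges (illustrative only)."""
    trees = []
    # The iterator yields each spanning tree as a graph object.
    for tree in SpanningTreeIterator(G):
        trees.append(sorted(tuple(sorted(e)) for e in tree.edges()))
    return trees


# A 4-node cycle: removing any one of its 4 edges leaves a spanning tree.
G = nx.cycle_graph(4)
trees = spanning_tree_edge_lists(G)
print(len(trees))
```

A kinetic diagram stored as a directed multigraph would first have to be reduced to its unique undirected edges before being handed to the iterator.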
Again, this passes all KDA tests and works perfectly well. In terms of code it is pretty straightforward since we hand off the spanning tree generation completely. However, again, it does not perform well compared to KDA. For the EmrE 8-state model I believe the spanning trees were generated in roughly 1 s, which is not terrible, but still considerably slower than the current KDA implementation.
KDA Updated Algorithm
This brings us to the changes here. I took another look at the current algorithm and couldn't find much room for improvement. I knew we were generating every combination of edges and filtering them, so I took a look at the invalid cases. I discovered that many of the invalid edge combinations did not include all nodes, so I found a fast way to reject these cases from the edges alone, before any diagrams are created. It turns out that for complex models this does show a noticeable improvement.
On this specific run the 3-state model came up as a slower case, which is why it says `PERFORMANCE DECREASED`, but I'm not worried about the 300 microseconds we lost 😄
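The early-rejection idea described above can be sketched roughly as follows. This is a simplified pure-Python illustration, not the implementation merged in this PR: the function names are hypothetical, edges are assumed to be unique undirected node pairs, and a union-find cycle check stands in for whatever validity test KDA actually applies to the remaining candidates.

```python
from itertools import combinations


def covers_all_nodes(edge_combo, n_nodes):
    """Cheap check from the edges alone: does the combination touch every node?"""
    return len({n for edge in edge_combo for n in edge}) == n_nodes


def is_acyclic(edge_combo, nodes):
    """Union-find cycle check; N-1 acyclic edges on N nodes form a spanning tree."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edge_combo:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return True


def enumerate_spanning_trees(edges):
    """Return (trees, n_rejected_early) for a list of unique undirected edges."""
    nodes = sorted({n for edge in edges for n in edge})
    trees, n_rejected_early = [], 0
    for combo in combinations(edges, len(nodes) - 1):
        if not covers_all_nodes(combo, len(nodes)):
            n_rejected_early += 1  # rejected before any graph object is built
            continue
        if is_acyclic(combo, nodes):
            trees.append(combo)
    return trees, n_rejected_early


# Triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
trees, n_rejected_early = enumerate_spanning_trees(edges)
print(len(trees), n_rejected_early)
```

The node-coverage test does not catch every invalid combination (in this example, the triangle plus the edge 3-4 covers all nodes but is not a tree), so a full validity check is still needed for the candidates that pass.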