WIP: Add memory_usage simulation method #432

benjello · 2017-01-15T11:10:54Z

When using survey data with the full openfisca-france model, I end up with very large memory
usage. I need to know which variables use a lot of memory so that I can either:

neutralize them
use a reform to lower memory usage (for example, compute a variable yearly instead of monthly)

With the help of @eraviart, the simulation method memory_usage was implemented. It is sufficient for my urgent tasks.

But you may have suggestions to improve it and to choose the right place to put it.
Since

all the information is available at the simulation level,
memory_usage can be useful in many situations (many types of SurveyScenario, but also test case scenarios with a large tax-benefit system like France),
simulations.py was my favorite location.

@cbenz @fpagnoux @MattiSG

…check

fpagnoux

I'm not convinced this should be in the core classes of openfisca.

If I understand correctly, you run this method, and it prints some info about the memory used by each variable.

This can be a useful tool, like the YAML test runner, so I would maybe put it in the tools folder, but not in the simulation class, which is very general and should remain as simple as possible.

fpagnoux · 2017-01-16T10:09:31Z

.noseids

@@ -0,0 +1,26 @@
+(dp1


This file doesn't belong here :)

fpagnoux · 2017-01-16T10:10:32Z

openfisca_core/simulations.py

@@ -258,6 +258,36 @@ def find_traceback_step(self, variable_name, period):
        step = self.traceback.get((variable_name, period))
        return step

+    def memory_usage(self):


Usually a method name contains a verb, as it does something

fpagnoux · 2017-01-16T10:12:06Z

openfisca_core/simulations.py

+    def memory_usage(self):
+        infos = []
+        for column_name in self.tax_benefit_system.column_by_name.iterkeys():
+            holder = self.holder_by_name.get(column_name)


Why not loop directly over self.holder_by_name?
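
The suggestion can be sketched as follows. This is a minimal illustration with stand-in objects, not the PR's code: `FakeHolder`, its `nbytes` attribute, `collect_usage` and the variable names are hypothetical; only the `holder_by_name` name comes from the diff above. Iterating over the holders that exist avoids looking up every column of the tax-benefit system and then skipping the ones that have no holder.

```python
class FakeHolder(object):
    """Hypothetical stand-in for a holder; it only stores a byte count."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


def collect_usage(holder_by_name):
    """Return {variable_name: nbytes}, visiting only the holders that exist."""
    return {name: holder.nbytes for name, holder in holder_by_name.items()}


# Hypothetical variable names and sizes, for illustration only.
holder_by_name = {'salaire': FakeHolder(800), 'age': FakeHolder(400)}
usage = collect_usage(holder_by_name)
print(usage)
```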

fpagnoux · 2017-01-16T10:20:04Z

Also, let's try to add clear doc when we add new features ;)

benjello · 2017-01-16T10:36:40Z

Thanks

benjello · 2017-01-16T10:54:39Z

Ok I will move it to tools

benjello · 2017-01-16T16:10:44Z

@fpagnoux: actually, I prefer to leave it as a simulation method (and maybe improve it to deliver more structured information about memory usage).
For my use case I rely on the Scenario (test case or survey-based scenario), and the natural holder of all the data at some point is the simulation. I do not see the point to create a separate function which argument is a simulation and not use a method.

fpagnoux · 2017-01-16T16:25:45Z

I do not see the point to create a separate function which argument is a simulation and not use a method.

My reasoning:

simulation is a core class of openfisca. It is used by every simulation on openfisca.
print_memory_usage() is never used when openfisca is doing its core job. It's a diagnostic tool that is called manually to investigate performance.
Core classes must be as simple as possible, to be more maintainable. Extra tools that are used outside the main workflows don't really belong there imo.

I understand that the drawback of splitting would be a coupling between this tool and the simulation class. I'm willing to hear other points of view @MattiSG @cbenz
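
The two designs under discussion can be compared with a small sketch. Everything here is an assumption made for illustration: the stripped-down `Simulation` class, the list-based storage and the 8-bytes-per-cell figure are hypothetical. The point is only that a helper living outside the core class still reads the simulation's data, which is the coupling mentioned above, while the class itself stays free of diagnostic code.

```python
class Simulation(object):
    """Hypothetical, stripped-down core class: it only holds data."""

    def __init__(self, holder_by_name):
        self.holder_by_name = holder_by_name


def get_memory_usage(simulation):
    """Diagnostic helper kept outside the core class, e.g. in a tools module."""
    return dict(
        (name, len(values) * 8)  # assumes 8 bytes per cell, for illustration
        for name, values in simulation.holder_by_name.items()
    )


simulation = Simulation({'a': [0.0] * 10, 'b': [0.0] * 3})
report = get_memory_usage(simulation)
print(report)
```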

MattiSG · 2017-01-17T18:32:38Z

I do not see the point to create a separate function which argument is a simulation and not use a method.

The point is separation of concerns. Performance diagnosis is not a concern of the core.

I must say I'm very surprised such an introspection function has to be written at all, no matter where it belongs. That's a job for profilers, not for custom code. Which profilers have you tried to diagnose memory? Have you had a look at objgraph (guide)?
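
As an illustration of what a memory profiler reports, here is a minimal sketch using `tracemalloc` from the Python standard library. This is a swapped-in example rather than the objgraph tool mentioned above, and `tracemalloc` only exists on Python 3.4 and later, so it would not have run on the Python 2 code shown in this PR. The allocation being measured is arbitrary.

```python
import tracemalloc

tracemalloc.start()

# Allocate something measurable; the sizes are arbitrary.
data = [list(range(1000)) for _ in range(100)]

# The snapshot must be taken while tracing is still active.
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group traced allocations by source line, largest first.
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:3]:
    print(stat)
```

Such a report attributes memory to lines of code, not to model variables.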

benjello · 2017-01-17T18:48:28Z

You need to have a simulation that fits into RAM.
The trade-off is between the number of variables and the size of your population.
When neutralizing variables, you may want to know which ones use a lot of memory.
This memory usage may also depend on your use case etc.

This is what you actually want to do when you use your tax-benefit system: find the right trade-off.
And I think that such a method is very convenient and suited to actual user needs.

And yes, I almost never used profilers and I was in a rush ;-)
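
The trade-off described in this comment can be put in numbers with a back-of-the-envelope sketch. All figures below are made up for illustration (1,000 cached variables, a population of one million, 4-byte cells), and `estimate_nbytes` is a hypothetical helper, not part of openfisca.

```python
def estimate_nbytes(nb_variables, nb_periods, nb_cells, itemsize):
    """Rough memory footprint: one array of nb_cells items per variable and period."""
    return nb_variables * nb_periods * nb_cells * itemsize


# Hypothetical case: every variable cached monthly over one year.
monthly = estimate_nbytes(nb_variables=1000, nb_periods=12, nb_cells=10 ** 6, itemsize=4)

# Same case with every variable stored once per year instead.
yearly = estimate_nbytes(nb_variables=1000, nb_periods=1, nb_cells=10 ** 6, itemsize=4)

print(monthly / 1e9, yearly / 1e9)  # sizes in GB (decimal)
```

Under these assumptions, moving from monthly to yearly storage divides the footprint by twelve, and neutralizing a variable removes its share entirely.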

MattiSG · 2017-01-17T19:31:50Z

OK. For the mid-term, I would definitely recommend you have a look at profilers, as they will probably bring many more insights into the trade-offs you want to make.
As a second step, and if they don't, we can consider adding such a feature to automatically advise on optimisations once some memory threshold is crossed.

Right now:

If you need to use this feature today, use this branch.
If you need to use this feature in the next two weeks, I recommend this piece of code is merged in a separate helper module, or at least in a separate file.

Add memory usage utility in tools

Fix scalar array

benjello · 2017-01-20T13:21:03Z

@fpagnoux @MattiSG: is it OK to merge it as is? I will definitely use memory usage, and it belongs near the simulation class.

Sorry for messing things up with other bits of code, I got confused with my branches.

fpagnoux · 2017-01-20T16:20:44Z

openfisca_core/tools/memory.py

+import numpy as np
+
+
+def memory_usage(simulation):


Verbs are better for functions.

get_memory_usage() ?

fpagnoux · 2017-01-20T16:21:28Z

openfisca_core/tools/memory.py

+        print(line.rjust(100))
+
+
+def print_memory_usage_old(simulation):


What is the difference from print_memory_usage? Do we need both?

My bad, I will clean this up.

benjello · 2017-01-27T15:06:54Z

Todo-list:

deal with variables that are not in cache

MattiSG

I see you added “WIP” after the reviews, most probably because you added the to-do “deal with variables that are not in cache”. However, this was one month ago, and there has seemingly been no further development since then.

Are you actively working on this?
Is this useful to you even without the handling of variables that are not in the cache?
Is the addition of “scalar” needed for the memory usage, or is it a side optimisation? It's OK, I'd just like to make sure I understood properly :)

MattiSG · 2017-03-01T10:03:23Z

openfisca_core/tests/dummy_country/model/model.py

+class rempli_obligation_scolaire(Variable):
+    column = BoolCol(default = True)
+    entity = Individu
+    label = u"La personne rempli ses obligations scolaires"


MattiSG · 2017-03-01T10:03:25Z

openfisca_core/tests/dummy_country/model/model.py

@@ -51,6 +51,12 @@ class a_charge_fiscale(Variable):
    label = u"La personne n'est pas fiscalement indépendante"


+class rempli_obligation_scolaire(Variable):


MattiSG · 2017-03-01T10:03:31Z

openfisca_core/tests/test_reforms.py

+        def apply(self):
+            self.neutralize_column('rempli_obligation_scolaire')
+
+    reform = test_rempli_obligation_scolaire_neutralization(tax_benefit_system)


MattiSG · 2017-03-01T10:04:10Z

setup.py

@@ -7,7 +7,7 @@

 setup(
    name = 'OpenFisca-Core',
-    version = '4.2.1',
+    version = '4.2.2',


New API feature, I would recommend a minor bump rather than a patch.
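
The difference between the two bumps can be sketched as below. `bump` is a hypothetical helper written for this illustration and assumes a plain `major.minor.patch` version string; it is not part of the project's release tooling.

```python
def bump(version, part):
    """Return the next version string for a 'major', 'minor' or 'patch' bump."""
    major, minor, patch = (int(number) for number in version.split('.'))
    if part == 'major':
        return '{}.0.0'.format(major + 1)
    if part == 'minor':
        # A minor bump resets the patch number.
        return '{}.{}.0'.format(major, minor + 1)
    if part == 'patch':
        return '{}.{}.{}'.format(major, minor, patch + 1)
    raise ValueError('unknown part: {}'.format(part))


print(bump('4.2.1', 'patch'))  # what the diff does
print(bump('4.2.1', 'minor'))  # what the review recommends
```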

MattiSG · 2017-03-01T10:05:04Z

openfisca_core/tools/memory.py

+# -*- coding: utf-8 -*-
+
+"""
+A module to investigate openfisca memory usage


Please add one or two lines explaining how to use it, when it can be useful (and where the alternatives such as profilers fail if you know of any).

MattiSG · 2017-03-01T10:05:49Z

CHANGELOG.md

-  Fix permanent and period size independent variables neutralization
+*  Fix permanent and period size independent variables neutralization
+
+* Fix occasionnal `NaN` creation in `MarginalRateTaxScale.calc` resulting from `0 * np.inf`


"occasionnal" → "occasional"

MattiSG · 2017-03-01T10:06:16Z

CHANGELOG.md

@@ -2,7 +2,9 @@

 ## 4.2.1

-  Fix permanent and period size independent variables neutralization
+*  Fix permanent and period size independent variables neutralization


There is no mention of the addition of “scalar” or of the memory usage API.

MattiSG · 2017-03-01T10:07:45Z

openfisca_core/taxscales.py

@@ -194,7 +194,9 @@ def calc(self, base, factor = 1, round_base_decimals = None):
        base1 = np.tile(base, (len(self.thresholds), 1)).T
        if isinstance(factor, (float, int)):
            factor = np.ones(len(base)) * factor
-        thresholds1 = np.outer(factor, np.array(self.thresholds + [np.inf]))
+        # thresholds1 = np.outer(factor, np.array(self.thresholds + [np.inf]))
+        # changed to below to avoind NaN creation


Do we really need to keep this history as comments? git blame is there for that kind of use :)
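
For context, the `NaN` that the changelog entry refers to comes from multiplying zero by infinity, which can be reproduced with plain Python floats. The guard shown afterwards is only one hypothetical way to avoid it and is not the change made in this PR, whose replacement line is not visible in the excerpt above; `scale_threshold` is an invented name.

```python
import math

INF = float('inf')

# IEEE 754: zero times infinity has no defined value, so the result is NaN.
product = 0.0 * INF
print(math.isnan(product))


def scale_threshold(factor, threshold):
    """Hypothetical guard: leave an infinite threshold untouched instead of scaling it."""
    if math.isinf(threshold):
        return threshold
    return factor * threshold


print(scale_threshold(0.0, INF), scale_threshold(2.0, 10.0))
```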

cbenz

I noted some changes needed regarding print and encoding.

I'd like to understand why the Column.scalar attribute has been added, and I'd like the changelog to explain it.

cbenz · 2017-03-01T15:13:26Z

openfisca_core/tools/memory.py

+    infos_by_variable = get_memory_usage(simulation)
+    infos_lines = list()
+    for variable, infos in infos_by_variable.iteritems():
+        infos_lines.append((infos['nbytes'], variable, "{}: {} periods * {} cells * item size {} ({}) = {}".format(


Add the u prefix to string literals for Python < 3

cbenz · 2017-03-01T15:15:01Z

openfisca_core/tools/memory.py

+            )))
+    infos_lines.sort()
+    for _, _, line in infos_lines:
+        print(line.rjust(100))


Encode the printed string in UTF-8:

print(line.rjust(100).encode('utf-8'))

(Or use the logging module.)
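
The logging alternative could look like the sketch below. It reuses the format string from the diff, but the argument order, the computed total, the `format_line` helper, the logger name and the sample values are all assumptions, since the original call is truncated in the excerpt. With `logging`, encoding the output is left to the handler, so the text does not have to be encoded at each call site.

```python
import logging

logging.basicConfig(level=logging.INFO, format='%(message)s')
log = logging.getLogger('memory_report')  # hypothetical logger name


def format_line(variable, nb_periods, nb_cells, itemsize, dtype_name):
    """Build one report line; the total is assumed to be periods * cells * item size."""
    nbytes = nb_periods * nb_cells * itemsize
    return u"{}: {} periods * {} cells * item size {} ({}) = {}".format(
        variable, nb_periods, nb_cells, itemsize, dtype_name, nbytes)


# Hypothetical variable and sizes, for illustration only.
line = format_line(u'salaire', 12, 1000, 4, u'float32')
log.info(line.rjust(100))
```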

benjello · 2017-12-09T14:03:29Z

@fpagnoux: may I close this PR and delete the branch?

benjello · 2017-12-13T14:28:48Z

@fpagnoux: I am closing it. Tell me if I can delete it permanently.

fpagnoux · 2017-12-15T18:43:51Z

Yes, we can close and delete it, thanks!

benjello added 10 commits January 9, 2017 22:21

Fix #410

4f0459c

Adjust absolute_error_margin for test_tax_cales

b4a9e80

Bump

81d3c9e

Merge branch 'master' into fix-nan-taxscale-calc

e399941

Bump

5ff9fdf

Add requested_period_default_value_neutralized to base_function type …

a932910

…check

Add memory_usage simulation method

0c7d425

Improve neutralization to reduce memory usage

213eabf

Add memory_usage simulation method

b6a3d40

flake8

6492e59

benjello requested review from cbenz and fpagnoux January 15, 2017 11:10

benjello added enhancement labels Jan 15, 2017

benjello mentioned this pull request Jan 15, 2017

Improve neutralization to reduce memory usage #433

Closed

fpagnoux requested changes Jan 16, 2017

benjello added 2 commits January 16, 2017 16:56

Merge remote-tracking branch 'origin/master' into memory_usage

2b72ae5

Improve and document method

58572fc

Delete unwanted file

e760273

benjello added 2 commits January 19, 2017 23:32

Merge branch 'fix-neutralization' into memory_usage

f2ae0ad

Fix merge

af45aed

Add memory usage utility in tools

benjello added 4 commits January 19, 2017 23:58

Fix setup.py

12f3a59

Add bareme scaled inverse test

d07fcba

Fix scalar array

Add memory to tools

ee8c727

Clean

dea550c

benjello changed the title ~~WIP: Add memory_usage simulation method~~ Add memory_usage simulation method Jan 20, 2017

benjello assigned fpagnoux Jan 20, 2017

fpagnoux reviewed Jan 20, 2017

benjello added 2 commits January 20, 2017 17:26

Fix memory usage tools

cb9d539

test neutralization variable with custom default

4f183aa

benjello changed the title ~~Add memory_usage simulation method~~ WIP: Add memory_usage simulation method Jan 27, 2017

MattiSG requested changes Mar 1, 2017

MattiSG assigned benjello and unassigned fpagnoux Mar 1, 2017

cbenz requested changes Mar 1, 2017

MattiSG removed prio-high labels May 2, 2017

fpagnoux added contribution and removed contribution labels Sep 18, 2017

benjello closed this Dec 13, 2017

fpagnoux deleted the memory_usage branch December 15, 2017 18:43

bonjourmauko added kind:perf and removed meta:performance labels Sep 30, 2024
