Bottleneck in 072 and 074 fuzzers #1214
Comments
Looks like the node job time is pretty small compared to the tiles job time, so that is where we should look for issues.
@litghost Yep, I have been thinking that maybe we are producing too much data in the tiles job. We could extract the tiles from only 1 or 2 clock regions, keeping the run-time constant as the part changes. I need to verify whether this is doable though.
This is a fragile solution, for a number of reasons. There are "weird" tiles around the following areas:
As a result, identifying all the "weird" stuff would be a fairly manual process. My baseline assumption right now is that we are doing something "expensive" in the tiles loop, e.g. a linear lookup, that needs to be fixed. I suggest bisecting the work that jobtiles.tcl does until the runtime drops. As a concrete example: if jobtiles.tcl only outputs the wires in the tiles, does it still take as long?
@litghost Right, I'll start profiling the run-time in more detail to find out exactly where the bottleneck of the process is.
@litghost I think I have identified what the problem is and where the bottleneck lies. By disabling the pip loop that extracts all the pips related to a tile, the run-time dropped substantially. Moreover, I am inclined to think that the issue is in the INT tiles: they are the most numerous tiles, and each of them contains hundreds of pips.
Try dropping anything that uses
@litghost That was the right call, the run-time is much lower now.
Ok, so rather than writing out the full timing info, just write the speed index. Then merge all the tile jsons (e.g. merging the speed indices), then create a tcl script to back-annotate the speed indices with the timing data originally dumped by the tcl script.
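A minimal sketch of what that merge step could look like, assuming a hypothetical per-tile JSON layout in which each pip record carries a "speed_model" name; the real prjxray schema, file names, and key names may differ:

import glob
import json


def merge_speed_indices(tile_json_glob):
    """Collect the distinct speed models referenced by all tile JSON files
    and assign each one a compact integer speed index."""
    merged = {}  # speed model name -> compact integer index
    for path in sorted(glob.glob(tile_json_glob)):
        with open(path) as f:
            tile = json.load(f)
        # "pips" / "speed_model" are assumed keys for illustration only.
        for pip in tile.get("pips", {}).values():
            model = pip.get("speed_model")
            if model is not None and model not in merged:
                merged[model] = len(merged)
    return merged


if __name__ == "__main__":
    index = merge_speed_indices("tile_*.json")
    with open("speed_index.json", "w") as f:
        json.dump(index, f, indent=2)

Under these assumptions, the back-annotation tcl step would then only need the single merged speed index plus the original timing dump, instead of every tile file duplicating the full timing info.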
Hi, the last comments may be a bit old, but the issue is still real ;-) Disk usage of 074 is 82 GB, and this is almost exclusively the 174k very tiny json5 files. I did an experiment: concatenating all of these and compressing with lz4 at its fastest setting produces one 4 GB file (to be compared to 40+ GB of file contents and 82 GB of actual disk usage), so a reduction of 40x. Looking casually into the Python code, these json files seem to be accessed in bulk with processing interleaved, so a packed+compressed storage could also make sense for CPU time, and everything would fit cached in RAM too :-) Another scalability issue: I monitored RAM usage and saw up to 66.5 GB of virtual memory. EDIT: fuzzer 074 took 20 days to finish xD What do you think of these observations?
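For reference, a rough sketch of the pack-and-compress idea, assuming the python-lz4 bindings and a simple length-prefixed container format made up for this illustration; it is not an agreed-on storage format for the fuzzer, and the paths are placeholders:

import glob
import os
import struct

import lz4.frame  # pip install lz4


def pack_tiles(json_glob, out_path):
    """Concatenate many small tile files into one lz4-compressed archive."""
    with lz4.frame.open(out_path, mode="wb",
                        compression_level=lz4.frame.COMPRESSIONLEVEL_MIN) as out:
        for path in sorted(glob.glob(json_glob)):
            with open(path, "rb") as src:
                data = src.read()
            name = os.path.basename(path).encode()
            # Length-prefix the name and payload so the archive can be
            # re-split into individual tile records when reading.
            out.write(struct.pack("<I", len(name)) + name)
            out.write(struct.pack("<I", len(data)) + data)


if __name__ == "__main__":
    pack_tiles("specimen_001/tile_*.json5", "tiles.lz4")

The reduction step could then stream-decompress this single archive once per run instead of opening 174k files, which is where the interleaved bulk access mentioned above might also save CPU time.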
The data dumping process for fuzzers 072 and 074 accounts for a large part of the run-time, especially for big parts (e.g. the Artix 200T).
Regarding fuzzer 074, the run-time to get the data is divided between the "tiles" and "nodes" jobs; the measurements refer to the zynq7010 part.
This is an issue, as it prevents scaling to bigger parts.
We need to find a better solution for dumping all the data necessary for the reduction step.