Applications

Basic Chares
- Primality test
  - Part A: Primality testing with a chare for each number being tested
  - Part B: Batched primality testing

Chare Arrays
- Data balancing
- Design exercise
  - Method 1
  - Method 2
  - Method 3
- K-Means Clustering
- Odd-even sort
- Particle exercise
- Particle exercise with load balancing
- Particle exercise with projections
- Particle exercise with liveViz

Structured Dagger
- Rewrite odd-even sort using SDAG (and see how simple it becomes)
- Key-value pairs

Threaded Methods and Futures

Libraries / Modules, and Quiescence Detection
- Extend each chare to contain particles of three different colors. Each color should move at a different speed: green should move at the default speed, red should move twice as fast, and blue should move at half the default speed.
- Distribute the colors to reproduce the imbalance shown in the figure: reds bunched in the center, greens in the upper-left, and blues in the lower-right.
- Download the LiveViz client from GitHub. At link time, use the -module liveViz flag. Note that LiveViz does not work with multicore Charm++ builds. For additional instructions, refer to the documentation.
- Run your program with ++server and ++server-port 1234, then launch the LiveViz client with liveViz localhost 1234, and observe how the particles behave.
- Let m × m be the chare-array dimensions and k the baseline number of particles per chare. Update the number of particles per chare at index (x, y) to be N(x,y) = k + k(x + y)/m.
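For a chare at array index (x, y), this can be computed directly when the particles are created; a minimal sketch (k, m, and the helper below are assumed names, with m available to every element, e.g. as a readonly):

    // Scale the particle count by the chare's position in the 2D chare array.
    int numParticles = k + (k * (thisIndex.x + thisIndex.y)) / m;
    for (int i = 0; i < numParticles; i++)
      particles.push_back(randomParticleInMyBoundingBox());   // hypothetical helper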
- Build with tracing enabled by linking with the -tracemode projections flag.
- Install Projections, and focus on the time profile view to see how the imbalance affects performance and efficiency.
The pseudocode for the overall algorithm is:
- For each of the particles that belong to my chare: change its x and y coordinates by a small random amount (a sketch of this step appears after this list). If the change would move the particle to an invalid coordinate, keep the particle in its original position.
- Move all the particles that no longer fall inside this chare's bounding box to their correct homes. Since the movement is small, this means communication with the eight nearest neighbor chares; some of these messages may contain no particles. For both efficiency and ease of implementation, all communication with neighbor chares should happen at the end of an iteration rather than after each particle.
- if (iteration % 10 == 0): do reductions to calculate the average and maximum number of particles.
- Variation: choose to move each particle along either the x or the y axis using an additional random variable, which simplifies communication since there are then only four neighbor chares. In this variation, if a movement would take the particle to an invalid coordinate, move it in the opposite direction instead.
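A minimal sketch of the perturbation step, assuming a Particle type with x/y fields and a chare that knows the valid domain (the names are illustrative, not the exercise's required interface):

    // Perturb each particle; leave it in place if the move would exit the valid domain.
    void Cell::perturb() {
      for (Particle& p : particles) {
        double nx = p.x + smallRandom();   // smallRandom(): assumed helper returning a small +/- delta
        double ny = p.y + smallRandom();
        if (nx >= 0.0 && nx < domainSize && ny >= 0.0 && ny < domainSize) {
          p.x = nx;
          p.y = ny;
        }
        // otherwise the destination is invalid, so the particle keeps its original position
      }
    }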
- Implement PUP serialization for your chare array as documented.
- At appropriate intervals, call AtSync() so the runtime knows the chare is ready for potential migration (see the sketch after this list).
- Select a load-balancing strategy. For this exercise we will use GreedyRefineLB (run with the option +balancer GreedyRefineLB).
- Re-run the application as in Part 1 and compare how the program behaves differently.
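A minimal sketch of the AtSync pattern, assuming the usual iterate/ResumeFromSync structure (method and variable names are placeholders):

    // Requires 'usesAtSync = true;' to be set in the chare's constructor.
    void Cell::endOfIteration() {
      iteration++;
      if (iteration % lbPeriod == 0) {
        AtSync();                              // the runtime may now migrate this chare
      } else {
        thisProxy[thisIndex].runIteration();   // continue directly with the next iteration
      }
    }

    void Cell::ResumeFromSync() {
      thisProxy[thisIndex].runIteration();     // resume after (possible) migration
    }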
First, get a copy of Charm++ from the download page. If you want to do development on Charm++ itself, you may want to clone our git repository at http://charm.cs.illinois.edu/gerrit/charm.git to make it easier to contribute patches and merge your work with the latest version of Charm.

Next, you need to compile Charm++ on your system. You'll need basic Unix build utilities like make and a C++ compiler. Charm++ provides a script called 'build' in its top-level directory that will build Charm++ for you, given some information about the system you're compiling for and the features you want access to. The README file in Charm++'s top-level directory gives details about the various options available to you when compiling Charm++.

Now it's time to get started building and running Charm++ programs. To get started, you may want to look at the many examples in the examples directory of Charm++. More information about compiling and running applications is available in the installation and usage guide. For more in-depth information about the features of Charm++, consult the Charm++ Parallel Programming System Manual.
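As a concrete, illustrative sequence on a Linux workstation (the build target, example directory, and processor count below are assumptions, not prescriptions):

    ./build charm++ netlrts-linux-x86_64 -j8      # build the runtime system
    cd examples/charm++/hello/1darray
    make                                          # compiles with the charmc wrapper
    ./charmrun ./hello +p4                        # run the example on 4 processing elements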
News
- Charm++ Release 7.0.0
- The 19th Annual Workshop on Charm++ and its Applications
- Charm++ Release 6.10.2
- Charm++ Release 6.10.1
- Charm++ Release 6.10.0
- Charm++ Release 6.9.0
- Charm++ at SuperComputing 2018
- Charm++ Release 6.8.2
- Charm++ at SuperComputing 2017
- Charm++ Release 6.8.1
- Charm++ Release 6.8.0
- The 15th Annual Workshop on Charm++ and its Applications
- Charm++ at SuperComputing 2016
- Stable release of Charm++ version 6.7.1
- The 14th Annual Workshop on Charm++ and its Applications
- Stable release of Charm++ version 6.7.0
- Charm++ issue tracker now publicly accessible
- Local SIAM chapter hosted Charm++ Tutorial
- Fernbach Award for Profs. Kale, Schulten
Environments
- x86 (workstations, laptops, etc.)
- ARM7/8
- Cray XC30/40, XE6, XK6/7
- IBM BlueGene Q
- IBM POWER8/9
- Beowulf clusters
- Linux, Mac, Windows

Network interfaces
- TCP, UDP
- Infiniband verbs
- MPI
- OFI
- PAMI (IBM BlueGene Q and POWER)
- uGNI (Cray Gemini and Aries)
- Shared Memory

Compilers
- clang
- cray
- fujitsu
- gcc
- ibm
- intel
- microsoft
- portland group
Workshops
- Hybrid Tutorial 2023
- 19th Workshop 2021
- 18th Workshop 2020
- 17th Workshop 2019
- 16th Workshop 2018
- 15th Workshop 2017
- 14th Workshop 2016
- 13th Workshop 2015
- 12th Workshop 2014
- 11th Workshop 2013
- 10th Workshop 2012
- 9th Workshop 2011
- 8th Workshop 2010
- 7th Workshop 2009
- 6th Workshop 2008
- 5th Workshop 2007
- 4th Workshop 2005
- 3rd Workshop 2004
- 2nd Workshop 2003
- 1st Workshop 2002
- Express algorithms and program designs using objects
- Decompose computation into interacting object collections
- Write native C++ code
  - Use all C++ capabilities (OO, generics, etc.)
- Some classes are elevated into global visibility
  - Objects of these classes can be addressed from any processor
  - Parallel control flow primarily involves these globally visible objects
- Some methods of globally visible classes are also elevated into global visibility
  - Only these methods of a globally visible object can be invoked from any processor
- Objects in the computation interact using method invocations
  - They do not return any data
  - Method invocation does not block
  - There is no promise of immediate execution
- Collections of objects of a given type can be created, managed, and addressed collectively
  - Collections are indexed
  - Each object in a collection is globally visible and can be addressed via the tuple of collection handle and object index
  - Method invocations on such collections are implicit broadcasts to all objects
- Easily control placement of objects on processors
- Objects can be migrated
  - Any data in the program can be migrated; it simply needs a serialization function
- Typical data decomposition is done by decomposing the data across multiple objects of a single class
  - Massively parallel computations on each portion of the data can be performed by simply invoking methods on the whole collection
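As a small illustration of these ideas, here is a sketch of a globally visible collection and a broadcast-style invocation on it (the Worker class, its doWork method, and the collection size are illustrative, not taken from a specific tutorial example):

    // worker.ci -- the entry methods are the globally visible methods
    module worker {
      array [1D] Worker {
        entry Worker();
        entry void doWork(int iteration);
      };
    };

    // worker.C -- creating and addressing the collection
    CProxy_Worker workers = CProxy_Worker::ckNew(64);  // a collection of 64 objects
    workers.doWork(0);       // invoking on the whole collection: an implicit broadcast
    workers[7].doWork(0);    // addressing one element via (collection handle, index)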
Building Projections
1. Check out Projections from the repository:
   git clone https://github.com/UIUC-PPL/projections
2. This will create a directory named projections. Move to this directory:
   cd projections
3. And now build Projections:
   make

Building Charm Debug
1. Check out Charm Debug from the repository:
   git clone https://github.com/UIUC-PPL/ccs_tools
2. This will create a directory named ccs_tools. Move to this directory:
   cd ccs_tools
3. And now build Charm Debug:
   ant
- broadcast: In the Array "Hello World" program, the last thing the Main::Main() constructor did was tell the first element of the helloArray to sayHi(). In this version of "Hello World," Main::Main tells the entire array of Hello chare objects to sayHi(). This is done by calling Hello::sayHi() on the array itself (instead of just a single element of the array).
- sayHi(): The Hello::sayHi() entry method no longer has the if statement that tests whether the object is the last object in the chare array. Instead, every element of the chare array invokes done() on the Main chare object (sends a message).
- done(): The Main::done() entry method is now invoked multiple times. The Main chare object has a counter, doneCount, that counts the number of times Main::done() has been called. Once this count reaches the number of elements in the Hello chare array, the program exits.
- numElements: Since the Main chare object is the only object that needs access to the numElements value, it has been made into a member variable of the Main chare class.
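A condensed sketch of the control flow described above (the member and proxy names follow the description, but this is not the tutorial's verbatim source):

    // Main::Main(): broadcast sayHi() to the whole chare array.
    Main::Main(CkArgMsg* msg) {
      numElements = 8;                    // e.g., fixed or parsed from msg->argv
      doneCount = 0;
      mainProxy = thisProxy;              // assumed readonly proxy used by the array elements
      CProxy_Hello helloArray = CProxy_Hello::ckNew(numElements);
      helloArray.sayHi();                 // called on the array itself => broadcast to every element
    }

    // Main::done(): each array element invokes this once; exit after all have checked in.
    void Main::done() {
      if (++doneCount >= numElements) CkExit();
    }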
- The particles will have three colors: blue, red, and green. To represent this, you need to add a variable to the Particle object.
- To make particles move at different speeds, add a constant speed factor to the perturb function depending on the color of the particle: blue particles move at half range, red particles have the full range (so they move faster), and green particles move at one-quarter speed (see the sketch after this list).
- You are also required to change the initial distribution of the particles depending on the color. Initially, green particles are distributed over the upper-right triangle of the cell, blue particles over the lower-left triangle, and red particles within the middle square. Chare array elements that hold red particles have 2 * particlesPerCell particles per cell, and the remaining chare array elements have particlesPerCell particles, where particlesPerCell is a command-line argument as in the Particles code.
- As a result, chares along the diagonal generate both green and blue particles in their bounding box, so they hold 2 * particlesPerCell particles. Chares along the diagonal that also lie inside the red box additionally generate red particles, for a total of (1 (green) + 1 (blue) + 2 (red)) * particlesPerCell = 4 * particlesPerCell. The other chares in the red box, off the diagonal, generate red plus either green or blue particles, for a total of (2 (red) + 1 (green or blue)) * particlesPerCell = 3 * particlesPerCell. See Figure 1.
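One way to fold the color into the perturbation step; the numeric factors follow the description above, while the enum and helper names are placeholders:

    enum Color { RED, GREEN, BLUE };

    // Scale the random displacement by a per-color factor.
    double speedFactor(Color c) {
      switch (c) {
        case RED:   return 1.0;    // full range
        case BLUE:  return 0.5;    // half range
        case GREEN: return 0.25;   // one-quarter speed
      }
      return 1.0;
    }

    void Cell::perturbParticle(Particle& p) {
      p.x += speedFactor(p.color) * smallRandom();   // smallRandom(): assumed +/- delta helper
      p.y += speedFactor(p.color) * smallRandom();
    }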
Migratable objects
Use our unified data / task parallel model. Express parallelism in terms of interacting collections of objects. Use work and data units natural to your app. Don't shackle performance by explicitly managing cores / threads.

Asynchronous methods
Communication is as simple as invoking methods on remote objects. Get zero-effort overlap of your computation with your communication. Define your own serializable data or message types.

Adaptive runtime system
Allow our intelligent runtime system to orchestrate execution. You design and decompose the parallel algorithm; the runtime observes and optimizes performance. Win-win!

More features
- Automatic overlap
- Automatic load balancing
- Automatic checkpointing
- Automatic fault tolerance
- Portable code
- Independent modules, interleaved execution
- Interoperable with MPI and OpenMP
- Ecosystem of tools
- The Parallel Programming Laboratory is pleased to announce that the next stable version of Charm++, 7.0.0, has been released! This release contains many new features, bug fixes, and performance improvements.
- Charm++ 6.10.2 has been released. This release contains bug fixes.
- Charm++ 6.10.1 has been released. This release contains bug fixes.
- Charm++ 6.10.0 has been released. This release contains many new features, bug fixes, and performance enhancements.
- Charm++ 6.9.0 has been released.
- This is a backwards-compatible patch/bug-fix release, containing just a few changes.

Charm++ Features
- Calls to entry methods taking a single fixed-size parameter can now automatically be aggregated and routed through the TRAM library by marking them with the [aggregate] attribute (see the short example after this list).
- More robust derived datatype support, optimizations for truly contiguous types.
- ROMIO is now built on AMPI and linked in by ampicc by default.
- A version of HDF5 v1.10.1 that builds and runs on AMPI with virtualization is now available at https://charm.cs.illinois.edu/gerrit/#/admin/projects/hdf5-ampi
- Improved support for performance analysis and visualization with Projections.

Platforms and Portability
- Charmrun can automatically detect rank and node count from Slurm/srun environment variables.
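The [aggregate] attribute mentioned above is declared in the interface file; a minimal sketch (the module, chare, and method names here are placeholders, not from the release notes):

    // In a .ci file: fine-grained hit() invocations are batched and routed through TRAM.
    module histogram {
      array [1D] Bucket {
        entry Bucket();
        entry [aggregate] void hit(int key);   // a single fixed-size parameter, as required
      };
    };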
Makefile: The Makefile for this application should be very similar to the makefiles used in the previous example programs.

Single Value per Chare Object: To keep things simple, start by having one array value per chare object.

Implement It

Parentheses ( ) indicate a pair of chare objects that are communicating during the given phase.

initial array: { 0, 1, 6, 3, 7, 2, 9, 3 }

Iteration and Phase   | Before Phase Starts              | After Phase Completes
iteration 1, phase 1  | { (0,1), (6,3), (7,2), (9,3) }   | { (0,1), (3,6), (2,7), (3,9) }
iteration 1, phase 2  | { 0, (1,3), (6,2), (7,3), 9 }    | { 0, (1,3), (2,6), (3,7), 9 }
iteration 2, phase 1  | { (0,1), (3,2), (6,3), (7,9) }   | { (0,1), (2,3), (3,6), (7,9) }

Figure 3: Example array being sorted using the parallel bubble sort algorithm.
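To make the phases concrete, here is one possible shape for a chare element holding a single value; the entry-method names, the readonly mainProxy/numElements, and the min/max convention are illustrative rather than the exercise's required design (assumes <algorithm> for std::min/std::max):

    // One phase of odd-even transposition sort with one value per chare element.
    void SortElem::doPhase(int phase) {
      bool amLower = ((thisIndex + phase) % 2 == 1);   // with phases numbered from 1, phase 1 pairs (0,1), (2,3), ...
      int partner  = amLower ? thisIndex + 1 : thisIndex - 1;
      if (partner >= 0 && partner < numElements)
        thisProxy[partner].exchange(phase, value);     // both partners send their value to each other
      else
        mainProxy.phaseDone();                         // an unpaired edge element just checks in
    }

    void SortElem::exchange(int phase, int otherValue) {
      bool amLower = ((thisIndex + phase) % 2 == 1);
      value = amLower ? std::min(value, otherValue)    // lower partner keeps the smaller value
                      : std::max(value, otherValue);   // upper partner keeps the larger value
      mainProxy.phaseDone();                           // each element checks in exactly once per phase
    }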
Solution

Extensions / Performance Considerations

Multiple Values per Chare Object

No Need for Barriers

Revisit bubble sort after having become a bit more experienced with Charm++ (specifically after learning more about callbacks).

Building the Charm++ Runtime System
In the root directory of the Charm++ distribution is a build script. This build script is used to compile the Charm++ Runtime System for a particular platform. The exact platform that should be used depends on the setup of the machine being used by the user. For the sake of this example, we will assume that the target platform is a cluster of Linux workstations.

Please Note: If the build script is executed with the "--help" command-line argument (i.e. the command "./build --help"), it will print out a usage message which includes a list of the platforms available.

The general usage form of the build script is:

  build <target> <version> <options> [ charmc-options ... ]

where:

- <target>: This indicates which portion of the Charm++ distribution should be built. Because there are many components to the Charm++ distribution, this allows the user to only build the portions that are needed for their purpose. Targets include: charm++, AMPI, FEM, LIBS, bigemulator, pose, jade, and msa. For example, the charm++ target builds the basic Charm++ Runtime System. The LIBS target builds additional libraries with various functionality (and also builds the charm++ target on which it depends). If the pose target is used, both the charm++ and LIBS targets will be built, followed by the pose target. The pose target will create the required libraries needed for applications that will use its PDES functionality.
- <version>: This is one of the available platforms. For a cluster of Linux workstations connected using an Ethernet network, the target platform is net-linux. For the full list of supported platforms, use "./build --help".
- <options> (Optional): This is where platform-specific options should be specified. For example, multiple compilers are supported for various platforms. If the user wishes to use a compiler different from the default compiler for the platform, this would be specified here. Other platform-specific options can include options for the type of network, SMP mode, PAPI support, and so on.
- [ charmc-options ... ] (Optional): This is where compiler-specific options can be specified. The compiler wrapper charmc will use the options specified here by default when it is used to compile Charm++ programs. The compiler that it wraps is either the default compiler for the platform or the user-defined compiler specified in the <options> portion of the build command.
Examples
(1) For a cluster of Linux workstations connected by an Ethernet network (32-bit x86 architecture):
  ./build charm++ net-linux

(2) Same as (1), with SMP support:
  ./build charm++ net-linux smp

(3) Same as (1), with the "-O3" compiler option on by default for charmc:
  ./build charm++ net-linux "-O3"

(4) Same as (1), with both SMP support and the "-O3" compiler option on by default for charmc:
  ./build charm++ net-linux smp "-O3"

WARNING: Depending on the speed of the machine being used and the options specified, the build process of the Charm++ Runtime System can take several minutes (as long as 10-20 minutes for older machines).

Converse Client-Server (CCS): Steering the 2D Jacobi Program
@@ -26,9 +129,11 @@Using CCS
- Modify a Charm++ program to add the methods that will perform the required operations, and register them to the system; -
- Create a client program with the desired user interface to use the +
+- Create a client program with the desired user interface to use the features exported by the server. - +
+ The server, when launched, will need to be instructed to listen for incoming requests. This is performed with the option ++server. The option @@ -37,9 +142,9 @@Using CCS
necessary for the client to connect: IP and port number. Following is an example: -+
- + ccs: Server IP = 192.168.3.23, Server port = 15763 @@ -57,13 +162,12 @@
Modifying the 2D Jacobi Program
modify the value and proceed to evolve the system until a new equilibrium is reached. - In Charm++ CCS requests can be mapped to callbacks (see here for more info on callbacks). The main function + In Charm++ CCS requests can be mapped to callbacks (see here for more info on callbacks). The main function to register callbacks to CCS is: -+
- + void CcsRegisterHandler(const char *handlerName, const CkCallback &cb); @@ -85,9 +189,10 @@
Modifying the 2D Jacobi Program
an incoming request named "changeValue", and a callback to the mainchare when there is an incoming request named "exit". -+ +
// Register the callbacks for CCS
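The registration calls themselves are not reproduced in this excerpt; a hedged sketch of what they typically look like (the entry-method and proxy names are assumptions, while the handler names match the text above):

    // In the mainchare, after the Jacobi chare array has been created:
    CcsRegisterHandler("changeValue",
        CkCallback(CkIndex_Jacobi::changeValueHandler(NULL), jacobiArrayProxy));  // delivered to the array
    CcsRegisterHandler("exit",
        CkCallback(CkIndex_Main::exitHandler(NULL), mainProxy));                  // delivered to the mainchare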
@@ -99,18 +204,17 @@Modifying the 2D Jacobi Program
+
struct CkCcsRequestMsg {
  CcsDelayedReply reply;   // Object used to send the reply to the client.
  int length;              // Number of bytes of request data.
  char *data;              // Actual data sent along with the request.
};
client, as well as the information to reply to such client. The structure of CkCcsRequestMsg is shown on the right. -Being "changeValue" triggered on the entire chare array, every element will +
+Being "changeValue" triggered on the entire chare array, every element will execute the code. Each element will therefore pull the information out of the request, and update the values that belong to it. The format of this information must be agreed between client as server. This agreement, in our @@ -133,7 +238,8 @@
Modifying the 2D Jacobi Program
SingleValue inside the header file jacobi-CS.h (which is included by both the client and the server). -Once the modification has happened, the array contributes back to the +
+Once the modification has happened, the array contributes back to the mainchare in stepCheckin, and a new iteration will be triggered if the modification exceeds the value of the threshold. The computation will then continue until a new balance is reached. To notice that this example follows @@ -141,27 +247,31 @@
Modifying the 2D Jacobi Program
left corner of the matrix is maintained fixed at one, and the border of the entire matrix is maintained fixed at zero. -Another modification we made to Jacobi is eliminating the call to CkExit in +
+Another modification we made to Jacobi is eliminating the call to CkExit in Main::stepCheckin when the equilibrium is reached. This gives the possibility to a client to connect and send requests. In this case, to terminate the application, the client has to send a request to the CCS handler "exit", which will provide to call CkExit. -
In the case of "exit" we have no desire to reply to the client, so we can +
+In the case of "exit" we have no desire to reply to the client, so we can simply ignore it. Nevertheless, in the case of "changeValue" we would like to let the client know if the modification has succeeded. In order to do this, we use the same structure of the request to send back a list of value change request that could not be applied by the server (clearly an empty list means that everything went ok). -
The only caveat to be careful when replying to the client, is that only +
+The only caveat to be careful when replying to the client, is that only one single reply is allowed per request. This implies that not every element of the chare array can respond to the client, but only one element can. In our example, the chare element with index (0, 0) will reply to the client. -
Creating the client
+ +Creating the client
Together with the modification of the parallel application to accept requests through CCS, we need to create a specific application to generate @@ -178,7 +288,8 @@Creating the client
before asking the user for the next input. An empty input from the user will trigger the other request to the server, the one to terminate. -To notice that the server does not implement security measures to prevent +
+To notice that the server does not implement security measures to prevent requests coming from a client to corrupt the data. This can happen if a request arrives to the server while it is still updating the matrix, before it has reached the equilibrium. Here we will consider only the case that the @@ -187,7 +298,8 @@
Creating the client
to allow the server to accept requests at any time, even if the matrix has not yet reached an equilibrium. -Other than the usual system libraries, the client requires to header files. +
+Other than the usual system libraries, the client requires to header files. One is jacobi-CS.h which specifies the format of request and replies; the other is ccs-client.h which contains the classes and functions necessary to use CCS. When compiling, we will still need to use the Charm++ @@ -197,11 +309,11 @@
Creating the client
The command line used to compile the client (present in the makefile) is shown here below. -- + +
+ + + + + + + + + + diff --git a/tutorial/Callbacks.html b/tutorial/Callbacks.html new file mode 100644 index 0000000..6bdd28f --- /dev/null +++ b/tutorial/Callbacks.html @@ -0,0 +1,209 @@ + + + + + + +- + charmc -o client.o client.C
@@ -213,3 +325,17 @@Creating the client
Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + ++ + + diff --git a/content/tutorial/CharmComponents.html b/tutorial/CharmComponents.html similarity index 57% rename from content/tutorial/CharmComponents.html rename to tutorial/CharmComponents.html index 043c5c4..291f8f5 100644 --- a/content/tutorial/CharmComponents.html +++ b/tutorial/CharmComponents.html @@ -1,14 +1,117 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- + + + + + + +++ + ++ + + ++Callbacks
+ ++ Callbacks are used by the Charm++ Runtime System, frameworks, libraries, and so on to notify + application code that some kind of event has occurred. For example, reductions use callbacks to + notify the application code when the reduction has completed. The basic idea is that the application + code can specify what action to take when the callback is triggered (i.e. the specific + event occurs). The action taken can be a variety of things including calling function or + having the application exit. +
+ ++"Do Nothing" Callback
+ ++ To create a callback object that does nothing, simply create it as follows: + CkCallback cb(CkCallback::ignore). When triggered, the callback does nothing. +
+ ++CkExit Callback
+ ++ To create a callback object that will cause CkExit() to be called when the callback is triggered, + simply create it as follows: CkCallback cb(CkCallback::ckExit). +
+ ++C Function Callback
+ ++ To create a callback object that will cause a C function to be executed, create it as + follows: CkCallback cb(CkCallbackFunc cFunc, void* param). Here, the type of the + CkCallbackFunc is void myCFunction(void* param, void* msg). The param + pointer used when the callback object was created will be passed to the C function as the + param parameter. The msg parameter will point to the message created by whatever + entity (e.g. a library) triggers the callback. +
+ ++Entry Method Callback
+ ++ To create a callback object that will cause an entry method to be called on a chare object, + create it as follows: CkCallback cb(int entryMethodIndex, CkChareID &id). Each entry + method for a chare class has an associated entry method index. The entry method + index is used here to identify which entry method on the chare object (specified by the + id parameter) should be executed. +
+ ++ To get the entry method index for any given + entry method, do the following: + int entryMethodIndex = CkIndex_MyChareClass::myEntryMethod(...parameters for entry method...) + where MyChareClass is the name of the chare class and myEntryMethod is the name + of the entry method. The parameters passed to this function are not actually used when retrieving + the entry method index; they are only used to identify which entry method is actually being + called in cases where multiple entry methods share the same name (i.e. are overloaded). For example, + to get the entry method index for an entry method with the following prototype: + void MyChareClass::myEntryMethod(int* x) one could use + int myEntryMethodIndex = CkIndex_MyChareClass::myEntryMethod((int*)NULL). (Remember, this is not + actually calling myEntryMethod so it is alright for the pointer to have a NULL value. Only + the type of the parameter is important so overloaded functions can be differentiated.) +
+ +Example Usage
+ ++ For an example of how a callback would be created an used, please see the + Reductions Section. +
+ +More Information
+ ++ For more information on callbacks, please see + Section 3.15: Callbacks of the + The Charm++ Programming Language Manual +
+ +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + ++ + + +++Components of a Charm++ Program
@@ -19,13 +122,13 @@Introduction
--
+ +
@@ -43,7 +146,7 @@ -- +
+ Figure 1: Files Needed by C++ and by Charm++ Introduction
have function bodies contained in the class definition, to have the class definition in the source file, and so on. Presenting these items in this manner is only meant to give the reader a clear idea of the components needed. - + Please note: For Charm++ programs, it is common to see the ".C" file extension used instead of the ".cpp" file extension for source files. Using either extension is fine. @@ -77,13 +180,13 @@Charmc
Charmxi and the Interface File
--
+ +
@@ -143,9 +246,9 @@ -- +
+ Figure 2: Compilation Process for a Chare Class Charmrun
The Nodelist File
-+
- + group main ++shell ssh
@@ -160,7 +263,7 @@The Nodelist File
- @@ -180,7 +283,7 @@+ Figure 3: Example Nodelist File The Nodelist File
the machines alpha and bravo have two processors each. If five processors are needed by charmrun, all of the hosts will have a single process running on them except for alpha which will have two processes running on it. - + Please Note: If number of processors used by charmrun is greater than the number of host lines specified in the nodelist file, charmrun will loop around on the list and start using host lines from the beginning of the list. Additionally, some target platforms @@ -203,3 +306,17 @@The Nodelist File
Charm++/Converse Installation and Usage Manual. + + + + + + + + + diff --git a/content/tutorial/CharmConcepts.html b/tutorial/CharmConcepts.html similarity index 69% rename from content/tutorial/CharmConcepts.html rename to tutorial/CharmConcepts.html index 1486b93..799740d 100644 --- a/content/tutorial/CharmConcepts.html +++ b/tutorial/CharmConcepts.html @@ -1,14 +1,117 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- + + + + + + +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + ++ + + +++Introduction to Charm++ Concepts
@@ -79,15 +182,16 @@What is a Charm++ Program?
--
- + +
+
- + entry method may perform one or more operations/calculations, it may send more messages to other chare objects, it may buffer the contents of the message for later @@ -96,7 +200,7 @@ -+ + Figure 1: User's View of a Charm++ Application What is a Charm++ Program?
does some computation and then sends out more messages to other chares, and so on, and so on. Execution begins with a special chare called the main chare (similar to how execution of a C++ program begins with the execution of a special function called main). - +It is worth pointing out that the description of a Charm++ application does not involve @@ -194,13 +298,13 @@
Entry Methods
Proxies
--
+ +
@@ -272,3 +376,17 @@ -- +
+ Figure 2: Proxies and the Global Object Space Chare NodeGroups
A chare nodegroup is similar to a chare group. The difference is that there is only a single representative per node, instead of per processor. + + + + + + + + + diff --git a/content/tutorial/CharmRuntimeSystem.html b/tutorial/CharmRuntimeSystem.html similarity index 61% rename from content/tutorial/CharmRuntimeSystem.html rename to tutorial/CharmRuntimeSystem.html index a3ad91a..68067b3 100644 --- a/content/tutorial/CharmRuntimeSystem.html +++ b/tutorial/CharmRuntimeSystem.html @@ -1,36 +1,139 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - + + + + + + +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + +++Introduction to the Charm++ Runtime System
-+
-
+ -- +
+ -Figure 1: User's View of a Charm++ Application + -- +
+ @@ -168,16 +271,16 @@Figure 2: System's View of a Charm++ Application Scheduler
-
+
--
+ +
@@ -203,21 +306,26 @@ -- +
+ Figure 3: A Simplified View of the Charm++ Universe Support for Other Languages / Models
- Adaptive MPI: MPI programs can take advantage of the virtualization, load balancing, fault tolerance, etc. capabilities of the Charm++ Runtime System. -
- +
+- Charisma: Charisma is a high-level language that allows a programmer to clearly specify the control flow of a Charm++ application. -
- +
+- Structured Dagger (SDag): Like Charisma, Structured Dagger allows a programmer to specify the control flow in a Charm++ program at a high-level. However, SDag code is specified directly in a chare object's interface file and has a scope local to the object (i.e. it can be used to specify the control flow for a particular chare class). -
- +
+- Multiphase Shared Arrays (MSA): MSAs are data arrays that can be placed into one of many access modes (read-only, write-once, accumulate). The idea is to reduce pressure on the memory system with programmer provided knowledge of the data access patterns. -
- - Threaded Charm (TCharm): ... fill me in ... - +
+- + Threaded Charm (TCharm): ... fill me in ... +
+Frameworks
@@ -228,9 +336,11 @@Frameworks
- ParFUM: ParFUM is an adaptive unstructured mesh framework. -
- + +
- POSE: POSE is a PDES (Parallel Discrete Event Simulation) framework. -
Tools
@@ -241,8 +351,25 @@Tools
- Projections: Performance analysis tool. -
- + +
- Faucets: Job submission tool. -
- + +
- CharmDebug: Debugging tool for Charm++ programs. -
Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + ++ + + diff --git a/content/tutorial/InterfaceFileReference.html b/tutorial/InterfaceFileReference.html similarity index 50% rename from content/tutorial/InterfaceFileReference.html rename to tutorial/InterfaceFileReference.html index d93430a..666122c 100644 --- a/content/tutorial/InterfaceFileReference.html +++ b/tutorial/InterfaceFileReference.html @@ -1,27 +1,130 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - + + + + + + +++ + ++ + + ++Commonly Used (or Otherwise Useful) Functions
+ + + ++ + ++ +
++ + ++ + Processor/Node/Rank Information + ++ + +int CkMyPe(): ++ Returns the processor number for the processor it is called on (0 through P-1 + where P is the number of processors available to the application). + ++ + +int CkNumPes(): ++ Returns the number of processors available to the application. + ++ + +int CkMyNode(): ++ Returns the node number of the node it was called on (0 through N-1 + where N is the number of nodes available to the application) and a node is + defined as a single address space. + ++ + +int CkNumNodes(): ++ Returns the number of nodes available to the application. + ++ + +int CkMyRank(): ++ Returns the rank number of the processor it was called on. The rank of a processor + is its number within the node (address space) starting at 0. For example, if a single + node contains four processors (i.e. a node size of four), the processors' rank numbers + would be 0 through 3 with each having a unique value. + ++ + +int CkNodeSize(): ++ Returns the number of processors within the node it was called. on. + ++ + +int CkNodeFirst(int nodeNumber): ++ Returns the processor number of the processor at rank zero within the specified node (address + space). + ++ + +int CkNodeOf(int procNumber): ++ Returns the node number of the specified processor. + ++ + + +int CkRankOf(int procNumber): ++ Returns the rank number of the specified processor. + ++ + ++ + Program Termination + ++ + +void CkExit(): ++ This function causes the Charm++ application to terminate. The call does not return. All + other processors to notified that execution of the application should end. + ++ + +void CkExitAfterQuiescence(): ++ Informs the Charm++ Runtime System that the Charm++ application should exit if quiescence is + detected. Quiescence is described in + Section 3.13: Quiescence Detection of the + The Charm++ Programming Language Manual. + ++ + + +void CkAbort(const char* message): ++ This function causes the Charm++ application to abort. The speicified message is displayed + before the application terminates. This function does not return. + ++ + ++ + Timing Functions + ++ + +double CkCpuTimer(): ++ Returns the value of the system timer (in seconds). The system timer is started when the + application begins and measures processor time (both user and system time). + ++ + +double CkWallTimer(): ++ Returns the amount of time that has elapsed since the application started from the wall + clock timer (in seconds). + ++ + + +double CkTimer(): ++ Aliases to either CkCpuTimer() or CkWallTimer() depending on the system being used. Typically, + dedicated machines (i.e. program has complete use of CPU such as ASCI Red) this function aliases to + CkWallTimer(). For machines which have multiple processes sharing the same CPU (such as a + workstation), this function aliases to CkCpuTimer(). + +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + +++ -Interface File Reference
+Interface File Reference
- +
-
- Covered Items ++ Covered Items - Chares and Groups: +Chares and Groups: Chare Objects, Chare Arrays, @@ -30,7 +133,7 @@ Interface File Reference
- Entry Methods: +Entry Methods: Read-Only Messages, Inline Messages, @@ -44,7 +147,7 @@ Interface File Reference
- Initialization Functions: +Initialization Functions: Nodes, Processors @@ -52,7 +155,7 @@ Interface File Reference
- Miscellaneous: +Miscellaneous: Include @@ -203,24 +306,24 @@Non-Traced
Initialization Functions
--
+ +
@@ -253,3 +356,17 @@ -module foo {
- initnode void nodeInitFunction(void);
- initproc void procInitFunction(void);
+ initnode void nodeInitFunction(void);
+ initproc void procInitFunction(void);
- chare myChare {
- initnode void charesNodeInitFunction(void);
- initnode void charesProcInitFunction(void);
- }
+ chare myChare {
+ initnode void charesNodeInitFunction(void);
+ initnode void charesProcInitFunction(void);
+ }
}+ Figure 1: Declaring Initialization Functions in the Interface File Include
...
+ + + + + + + + + diff --git a/content/tutorial/LiveViz.html b/tutorial/LiveViz.html similarity index 55% rename from content/tutorial/LiveViz.html rename to tutorial/LiveViz.html index e82b3fb..93f0a4a 100644 --- a/content/tutorial/LiveViz.html +++ b/tutorial/LiveViz.html @@ -1,35 +1,139 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- + + + + + + +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + ++ + + +++ -LiveViz (using the 2D Jacobi Program)
+LiveViz (using the 2D Jacobi Program) +
-+
- +
- + - (Before: Only a few outer-loop iterations) -(After: After many outer-loop iterations) +(Before: Only a few outer-loop iterations) +(After: After many outer-loop iterations) - + Figure 1: Before and after screenshots of LiveViz output. If the data values are thought of as temperature. Red represents hot (value near 1.0). Blue represents cold (value near 0.0). Black is a middle temperature (value near 0.5). @@ -41,12 +145,12 @@ LiveViz (using the 2D Jacobi Program)
- +
- +- @@ -187,3 +291,17 @@+ Figure 2: Color Scale More on LiveViz
Section 6: LiveViz Library in the Converse and Charm++ Libraries Manual. + + + + + + + + + diff --git a/content/tutorial/PackUnPackRoutines.html b/tutorial/PackUnPackRoutines.html similarity index 54% rename from content/tutorial/PackUnPackRoutines.html rename to tutorial/PackUnPackRoutines.html index ed85613..af27894 100644 --- a/content/tutorial/PackUnPackRoutines.html +++ b/tutorial/PackUnPackRoutines.html @@ -1,14 +1,117 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - + + + + + + +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + +++Pack-UnPack (PUP) Routines
@@ -36,7 +139,8 @@Object Migration
of the processing elements based on actual runtime data so that all of the processing elements are being fully utilized. -- +
+- Fault Tolerance: Another benefit of being able to migrate objects is being able to duplicate objects (either on disk for a checkpoint or in-memory to enabling dynamic fault recovery). If one of the processing elements were to fail/crash during the execution @@ -47,7 +151,8 @@
Object Migration
(e.g. temperature sensor warning), the chare objects on that particular processing element(s) can be migrated away from the defective hardware in an attempt to avoid the fault altogether. -- +
+- Shrinking/Growing Executions: As a Charm++ application is executing, the load of the overall cluster may changed (i.e. another job is launched, a more important job is launched, another job may complete freeing processing elements, and so on). Because the chare objects can be @@ -57,25 +162,27 @@
Object Migration
migrate work onto newly available processing elements. Once again, this can be done dynamically by the Charm++ Runtime System while the Charm++ application continues to run. -- +
+- Parameter Marshaling: Generally speaking, an object (not just chare objects; C++ objects, C structures, and so on) can have PUP routines. Any object that has a PUP routine can then be passed as an argument in an entry method invocation. - +
+ --
- +
+
- + -+ + Figure: Process of "Object Migration": A Chare Object Moving from One Processing
Element to Another Processing ElementPack-UnPack (PUP) Routines
@@ -99,21 +206,25 @@Pack-UnPack (PUP) Routines
to encapsulate the state of the chare object. Once the size of the buffer is known, the buffer is allocated by the Charm++ Runtime System. -- +
+- Packing: (On source processing element) Second, the PUP routine is called again to actually pack the state of the chare object into the buffer. -
- +
+- Migration: At this point, the buffer can be sent to another processing element via a message, it can be stored into a file on a hard disk for later use (checkpoint), and so on. -
- +
+- Unpacking: (On target processing element) Finally, once the buffer arrives to its target processing element, the PUP routine is called again to unpack the state of the chare object thus recreating the chare object that was originally packed on the source processing element. - +
+The process of object migration, can be controlled by the Charm++ application. By default, @@ -123,70 +234,70 @@
Pack-UnPack (PUP) Routines
times when it is safe to do so. -
+
PUP Routine Example
- +
- + - +
- Header File +Header File - + class MyChare : public CBase_MyChare {
- private:
- /// Member Variables ///
- int a;
- float b;
- char c;
- float localArray[LOCAL_SIZE];
- int heapArraySize; // length of heapArray array
- float* heapArray;
+ private:
+ /// Member Variables ///
+ int a;
+ float b;
+ char c;
+ float localArray[LOCAL_SIZE];
+ int heapArraySize; // length of heapArray array
+ float* heapArray;
- public:
- /// Constructor(s)/Destructor ///
- MyChare();
- MyChare(CkMigrateMessage *msg);
- ~MyChare() {
- if (heapArray != NULL) {
- delete [] heapArray;
- heapArray = NULL;
- }
- }
+ public:
+ /// Constructor(s)/Destructor ///
+ MyChare();
+ MyChare(CkMigrateMessage *msg);
+ ~MyChare() {
+ if (heapArray != NULL) {
+ delete [] heapArray;
+ heapArray = NULL;
+ }
+ }
- /// PUP Routine ///
- void pup(PUP::er &p) {
+ /// PUP Routine ///
+ void pup(PUP::er &p) {
- // Call PUP Routine for superclass
- CBase_MyChare::pup(p);
+ // Call PUP Routine for superclass
+ CBase_MyChare::pup(p);
- // Basic primitives
- p | a;
- p | b;
- p | c;
+ // Basic primitives
+ p | a;
+ p | b;
+ p | c;
- // Member Arrays
- p(localArray, LOCAL_SIZE);
+ // Member Arrays
+ p(localArray, LOCAL_SIZE);
- // Heap-Allocated Data Array
- p | heapArraySize;
- if (p.isUnpacking()) {
- heapArray = new float[heapArraySize];
- }
- p(heapArray, heapArraySize);
- }
+ // Heap-Allocated Data Array
+ p | heapArraySize;
+ if (p.isUnpacking()) {
+ heapArray = new float[heapArraySize];
+ }
+ p(heapArray, heapArraySize);
+ }
};
PUP Routine Example
- Please Note: The code to the right assumes that (1) the migration constructor, + Please Note: The code to the right assumes that (1) the migration constructor, MyChare::MyChare(CkMigrateMessage *msg), does not allocate memory for heapArray and that (2) the heapArray pointer points to something by the time the object migration process begins. A similar technique, as was used for heapArray in this example, can be used for handling the opening @@ -278,14 +389,32 @@
Useful PUP::er Functions
- isSizing(): Returns true if the sizing step is occurring, false otherwise. -
- +
+- isPacking(): Returns true if the packing step is occurring, false otherwise. -
- +
+- isDeleting(): Returns true if the packing step is occurring and the object being PUP'ed will be deleted immediately afterwards, false otherwise. -
- +
+- isUnpacking(): Returns true if the unpacking step is occurring, false otherwise. - +
+ + + + + + + + + + diff --git a/content/tutorial/ParallelPrefix.html b/tutorial/ParallelPrefix.html similarity index 59% rename from content/tutorial/ParallelPrefix.html rename to tutorial/ParallelPrefix.html index d778236..f8f0217 100644 --- a/content/tutorial/ParallelPrefix.html +++ b/tutorial/ParallelPrefix.html @@ -1,14 +1,117 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - + + + + + + +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + +++Parallel Prefix Program
@@ -16,42 +119,42 @@Parallel Prefix Program
The Prefix Calculation
--
+ +
- -// Assumed variable declarations for both methods
- #define N ... // Size of array
- double v[N]; // Contains input values
- double v_prime[N]; // Will contain output values
+ #define N ... // Size of array
+ double v[N]; // Contains input values
+ double v_prime[N]; // Will contain output values
// Method 1 - Straight Forward - O(N2)
- for (int k = 0; k < N; k++) {
- v_prime[k] = 0;
- for (int i = 0; i <= k; i++) {
- v_prime[k] = v_prime[k] + v[i];
- }
+ for (int k = 0; k < N; k++) {
+ v_prime[k] = 0;
+ for (int i = 0; i <= k; i++) {
+ v_prime[k] = v_prime[k] + v[i];
+ }
}
// Method 2 - Running Sum - O(N)
v_prime[0] = v[0];
- for (int k = 1; k < N; k++) {
- v_prime[k] = v_prime[k - 1] + v[k];
+ for (int k = 1; k < N; k++) {
+ v_prime[k] = v_prime[k - 1] + v[k];
}+ Figure 2: C++ Code for Prefix Calculation -
- +
+
@@ -74,29 +177,32 @@
Figure 1: Parallel Prefix Formula
The Prefix Calculation
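For reference, the relation illustrated in Figure 1 (restated here so the symbols match the serial code above) is: v'_k = v_0 + v_1 + ... + v_k = \sum_{i=0}^{k} v_i, for k = 0, 1, ..., N-1.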
To make the above a bit more concrete, below is an example set of arrays. The V array is the input array. The V' array is the output array (the result of the calculation).
V  = { 3, 4, 1, 0, 3, 2, 8 }
V' = { 3, 7, 8, 8, 11, 13, 21 }
--
- +
+
- + -+ + Figure 3: Overview of Parallel Prefix Calculation Since we are writing this program using Charm++, will setup the parallel version of this @@ -187,10 +293,10 @@
A Parallel Version of the Prefix Calculation
X, each chare object will send a message to the chare object at thisIndex + 2^(X-1). After step X has completed, the first 2^X chare objects in the chare array will have correct final results. If we set 2^X = N, we can conclude that ⌈ log2(N) ⌉ overall steps in the calculation will be needed. If we assume that each chare object is on a different processor, the overall calculation will take O(log2(N)) time.
Note: If we assume that communication costs are small and constant, then this parallel version of prefix is faster than either of the serial methods described above for large values of N.
@@ -203,7 +309,8 @@
Implement It
Try implementing the parallel prefix calculation in Charm++. To get you started, here are some hints that you might find useful: --
+
+
-
Makefile: The Makefile for this application should be very similar to the makefiles
used in the
@@ -214,13 +321,13 @@
Implement It
Initial Values: Create a chare array and have each chare object assign itself a random number in its constructor.
-
+
-
Data Dependencies: The most straightforward method of ensuring data dependencies are
met is to create a barrier between the steps. That is, tell each chare object in the array
of chare objects to perform a single step of the calculation (e.g.
PrefixChareClass::doStep(int stepNum) which will send a message to its neighbor at
- I = thisIndex + (1 << stepNum) assuming stepNum is zero-based OR, if I >= N,
+ I = thisIndex + (1 << stepNum) assuming stepNum is zero-based OR, if I >= N,
then it directly checks in with the main chare). Once the receiving chare object in
the chare array receives and adds the other chare object's value, have it check in with the
main chare. Once the main chare has received N check in messages,
@@ -258,21 +365,21 @@
Implement It
-
Initially:
State:
- Chare 0 : stepCounter = 0, value = v0
- Chare 1 : stepCounter = 0, value = v1
- Chare 2 : stepCounter = 0, value = v2
- Chare 3 : stepCounter = 0, value = v3
+ Chare 0 : stepCounter = 0, value = v0
+ Chare 1 : stepCounter = 0, value = v1
+ Chare 2 : stepCounter = 0, value = v2
+ Chare 3 : stepCounter = 0, value = v3
Messages In-Flight:
- none
+ none
-
First: All chare objects send their messages for step 1.
State:
- Chare 0 : stepCounter = 1, value = v0
- Chare 1 : stepCounter = 1, value = v1
- Chare 2 : stepCounter = 1, value = v2
- Chare 3 : stepCounter = 1, value = v3
+ Chare 0 : stepCounter = 1, value = v0
+ Chare 1 : stepCounter = 1, value = v1
+ Chare 2 : stepCounter = 1, value = v2
+ Chare 3 : stepCounter = 1, value = v3
Messages In-Flight:
- Chare 0 -> Chare 1 : value = v0
- Chare 1 -> Chare 2 : value = v1
- Chare 2 -> Chare 3 : value = v2
- Chare 3 -> Chare 4 : value = v3
+ Chare 0 -> Chare 1 : value = v0
+ Chare 1 -> Chare 2 : value = v1
+ Chare 2 -> Chare 3 : value = v2
+ Chare 3 -> Chare 4 : value = v3
-
@@ -343,33 +451,33 @@
No Need for Barriers
for Chare 1 (the incoming value is added, the sum is passed to Chare 3, and Chare 1's stepCounter is incremented).
State:
- Chare 0 : stepCounter = 1, value = v0
- Chare 1 : stepCounter = 2, value = v0+v1
- Chare 2 : stepCounter = 1, value = v2
- Chare 3 : stepCounter = 1, value = v3
+ Chare 0 : stepCounter = 1, value = v0
+ Chare 1 : stepCounter = 2, value = v0+v1
+ Chare 2 : stepCounter = 1, value = v2
+ Chare 3 : stepCounter = 1, value = v3
Messages In-Flight:
- Chare 1 -> Chare 2 : value = v1
- Chare 2 -> Chare 3 : value = v2
- Chare 3 -> Chare 4 : value = v3
- Chare 1 -> Chare 3 : value = v0+v1
+ Chare 1 -> Chare 2 : value = v1
+ Chare 2 -> Chare 3 : value = v2
+ Chare 3 -> Chare 4 : value = v3
+ Chare 1 -> Chare 3 : value = v0+v1
-
Third: Chare 3 receives the message from Chare 1 before it receives the message
from Chare 2.
State:
- Chare 0 : stepCounter = 1, value = v0
- Chare 1 : stepCounter = 2, value = v0+v1
- Chare 2 : stepCounter = 1, value = v2
- Chare 3 : stepCounter = 2, value = v0+v1+v3 !!! INCORRECT !!!
+ Chare 0 : stepCounter = 1, value = v0
+ Chare 1 : stepCounter = 2, value = v0+v1
+ Chare 2 : stepCounter = 1, value = v2
+ Chare 3 : stepCounter = 2, value = v0+v1+v3 !!! INCORRECT !!!
Messages In-Flight:
- Chare 1 -> Chare 2 : value = v1
- Chare 2 -> Chare 3 : value = v2
- Chare 3 -> Chare 4 : value = v3
- Chare 3 -> Chare 5 : value = v0+v1+v3 !!! INCORRECT !!!
+ Chare 1 -> Chare 2 : value = v1
+ Chare 2 -> Chare 3 : value = v2
+ Chare 3 -> Chare 4 : value = v3
+ Chare 3 -> Chare 5 : value = v0+v1+v3 !!! INCORRECT !!!
-
Initially:
State:
- Chare 0 : stepCounter = 0, value = v0
- Chare 1 : stepCounter = 0, value = v1
- Chare 2 : stepCounter = 0, value = v2
- Chare 3 : stepCounter = 0, value = v3
+ Chare 0 : stepCounter = 0, value = v0
+ Chare 1 : stepCounter = 0, value = v1
+ Chare 2 : stepCounter = 0, value = v2
+ Chare 3 : stepCounter = 0, value = v3
Messages In-Flight:
- none
+ none
-
First: All chare objects send their messages for step 1.
State:
- Chare 0 : stepCounter = 1, value = v0
- Chare 1 : stepCounter = 1, value = v1
- Chare 2 : stepCounter = 1, value = v2
- Chare 3 : stepCounter = 1, value = v3
+ Chare 0 : stepCounter = 1, value = v0
+ Chare 1 : stepCounter = 1, value = v1
+ Chare 2 : stepCounter = 1, value = v2
+ Chare 3 : stepCounter = 1, value = v3
Messages In-Flight:
- Chare 0 -> Chare 1 : fromStep = 1, value = v0
- Chare 1 -> Chare 2 : fromStep = 1, value = v1
- Chare 2 -> Chare 3 : fromStep = 1, value = v2
- Chare 3 -> Chare 4 : fromStep = 1, value = v3
+ Chare 0 -> Chare 1 : fromStep = 1, value = v0
+ Chare 1 -> Chare 2 : fromStep = 1, value = v1
+ Chare 2 -> Chare 3 : fromStep = 1, value = v2
+ Chare 3 -> Chare 4 : fromStep = 1, value = v3
-
@@ -426,15 +535,15 @@
No Need for Barriers
for Chare 1 (the incoming value is added, the sum is passed to Chare 3, and Chare 1's stepCounter is incremented).
State:
- Chare 0 : stepCounter = 1, value = v0
- Chare 1 : stepCounter = 2, value = v0+v1
- Chare 2 : stepCounter = 1, value = v2
- Chare 3 : stepCounter = 1, value = v3
+ Chare 0 : stepCounter = 1, value = v0
+ Chare 1 : stepCounter = 2, value = v0+v1
+ Chare 2 : stepCounter = 1, value = v2
+ Chare 3 : stepCounter = 1, value = v3
Messages In-Flight:
- Chare 1 -> Chare 2 : fromStep = 1, value = v1
- Chare 2 -> Chare 3 : fromStep = 1, value = v2
- Chare 3 -> Chare 4 : fromStep = 1, value = v3
- Chare 1 -> Chare 3 : fromStep = 2, value = v0+v1
+ Chare 1 -> Chare 2 : fromStep = 1, value = v1
+ Chare 2 -> Chare 3 : fromStep = 1, value = v2
+ Chare 3 -> Chare 4 : fromStep = 1, value = v3
+ Chare 1 -> Chare 3 : fromStep = 2, value = v0+v1
-
@@ -442,14 +551,14 @@
No Need for Barriers
from Chare 2. Chare 3 notices that this message is out of order (it was in step 1 and is expecting a message with fromStep = 1 but this message has fromStep = 2).
State:
- Chare 0 : stepCounter = 1, value = v0
- Chare 1 : stepCounter = 2, value = v0+v1
- Chare 2 : stepCounter = 1, value = v2
- Chare 3 : stepCounter = 1, value = v3, buffered message(s) = { (fromStep = 2, value = v0+v1) }
+ Chare 0 : stepCounter = 1, value = v0
+ Chare 1 : stepCounter = 2, value = v0+v1
+ Chare 2 : stepCounter = 1, value = v2
+ Chare 3 : stepCounter = 1, value = v3, buffered message(s) = { (fromStep = 2, value = v0+v1) }
Messages In-Flight:
- Chare 1 -> Chare 2 : fromStep = 1, value = v1
- Chare 2 -> Chare 3 : fromStep = 1, value = v2
- Chare 3 -> Chare 4 : fromStep = 1, value = v3
+ Chare 1 -> Chare 2 : fromStep = 1, value = v1
+ Chare 2 -> Chare 3 : fromStep = 1, value = v2
+ Chare 3 -> Chare 4 : fromStep = 1, value = v3
-
@@ -459,34 +568,34 @@
No Need for Barriers
fromStep value). In this case, there is one, so it processes the buffered message also (advancing Chare 3 through two steps).
State:
- Chare 0 : stepCounter = 1, value = v0
- Chare 1 : stepCounter = 2, value = v0+v1
- Chare 2 : stepCounter = 1, value = v2
- Chare 3 : stepCounter = 3, value = v0+v1+v2+v3
+ Chare 0 : stepCounter = 1, value = v0
+ Chare 1 : stepCounter = 2, value = v0+v1
+ Chare 2 : stepCounter = 1, value = v2
+ Chare 3 : stepCounter = 3, value = v0+v1+v2+v3
Messages In-Flight:
- Chare 1 -> Chare 2 : fromStep = 1, value = v1
- Chare 2 -> Chare 3 : fromStep = 1, value = v2
- Chare 3 -> Chare 4 : fromStep = 1, value = v3
- Chare 3 -> Chare 5 : fromStep = 2, value = v2+v3
- Chare 3 -> Chare 7 : fromStep = 3, value = v0+v1+v2+v3
+ Chare 1 -> Chare 2 : fromStep = 1, value = v1
+ Chare 2 -> Chare 3 : fromStep = 1, value = v2
+ Chare 3 -> Chare 4 : fromStep = 1, value = v3
+ Chare 3 -> Chare 5 : fromStep = 2, value = v2+v3
+ Chare 3 -> Chare 7 : fromStep = 3, value = v0+v1+v2+v3
- The pseudocode for the overall algorithm is: +
- For each of the particles that belong to my chare: change its x and y coordinate by a small random amount. +
- Move all the particles that do not belong to a chare's bounding box to their correct homes. Since the movement is small, this will mean communication to the eight near neighbor chares. Some of these messages may contain no particles. +
if(iteration%10 == 0)
+ - Do reductions to calculate average and max number of particles +
- The particles will have three colors: blue, red, and green. To represent this, you need to add a variable to the Particle object. +
- To make particles move at different speeds, add a constant speed factor to the perturb function that depends on the color of the particle: blue particles move at half the default range, red particles use the full range (so they move fastest), and green particles move at one-quarter of the range (a sketch follows this list).
- You are also required to change the initial distribution of the particles depending on the color. In the beginning, green particles are distributed over the upper-right triangle, blue particles over the lower-left triangle of the cell, and red particles within the middle square. Chare array elements holding red particles will have 2*particlesPerCell particles per cell, and the remaining chare array elements will have particlesPerCell particles, where particlesPerCell is a command line argument as in Particles Code.
- This will result in chares along the diagonal generating both green and blue particles in their bounding box, so they will have 2*particlesPerCell particles. Additionally, chares on the diagonal inside the red box will generate red particles in addition to the green and blue ones, for a total of (1 (Green) + 1 (Blue) + 2 (Red)) * particlesPerCell = 4*particlesPerCell particles. The other chares in the red box, not on the diagonal, will generate red plus either green or blue particles, for a total of (2 (Red) + 1 (Green or Blue)) * particlesPerCell = 3*particlesPerCell particles. See Figure 1.
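A minimal sketch of one way to express the color-dependent speed factor described in the first item above; the function name, the color encoding, and the call site are illustrative and not part of the provided skeleton code.

  // Sketch only: scale each particle's random displacement by its color.
  double speedFactor(char color) {
    switch (color) {
      case 'R': return 1.0;   // red: full range (fastest)
      case 'B': return 0.5;   // blue: half range
      case 'G': return 0.25;  // green: one-quarter range
      default:  return 1.0;
    }
  }

  // Inside perturb() (sketch): scale the small random deltas before applying them.
  //   double f = speedFactor(p.color);
  //   p.x += f * randomDelta();
  //   p.y += f * randomDelta();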
Solution
--
- +
+
- + -+ + Figure 4: Control Flow for Parallel Prefix Solution A simple solution for the parallel prefix calculation can be found @@ -281,7 +388,7 @@
Solution
-Extensions / Performance Considerations
+Extensions / Performance Considerations
Multiple Values per Chare Object
@@ -312,30 +419,31 @@No Need for Barriers
initial conditions and assuming that each chare object is on a different processor, a possible race condition is as follows (remember, assuming no barriers between steps): --
+
+
The problem is caused by the fact that messages are not guaranteed to be delivered in the order
@@ -395,30 +503,31 @@
No Need for Barriers
overhead to the program, i.e. resend to self many times and thus execute the receiveValue() entry method many times before the expected message arrives).
+
+
With this change in place (modification to chare array objects so they don't need to wait for a barrier), interaction with the main chare object is no longer needed during the prefix computation. Instead, each chare object in the array would only need to send a single message to the main chare object after its stepCounter has reached ⌈ log2(N) ⌉. This helps reduce the required amount of communication (assuming buffered messages, not re-sends). It also allows each chare object to proceed with computation based solely on data dependencies being met (i.e. it has the data it needs to move forward with its portion of the computation). With the barriers, all chare objects have to wait for the slowest chare object to complete the current step before they can move onto the next step even if they already have the data they need to continue.
In simple example programs like this, this effect may not give a large speedup (in fact, the additional overhead compared to the tiny amount of work being done, a single addition, may even cause a slowdown); however, in actual applications, this can prove quite useful and give
@@ -500,3 +609,17 @@
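To make the buffering idea concrete, here is a hedged sketch of a receiveValue() entry method that holds on to early messages; the class name, the pendingMsgs member (a std::map<int, double>), and the sendForStep() helper are illustrative and are not taken from the provided solution code.

  // Sketch only: apply in-order messages immediately, buffer early ones.
  void PrefixChareClass::receiveValue(int fromStep, double incoming) {
    if (fromStep != stepCounter) {
      pendingMsgs[fromStep] = incoming;   // arrived early: remember it for later
      return;
    }
    value += incoming;
    stepCounter++;
    sendForStep(stepCounter);             // hypothetical helper: forward the partial sum
    // Drain any buffered messages that have now become in-order.
    while (pendingMsgs.count(stepCounter) > 0) {
      value += pendingMsgs[stepCounter];
      pendingMsgs.erase(stepCounter);
      stepCounter++;
      sendForStep(stepCounter);
    }
  }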
This version of the solution does not have implicit barriers while the prefix calculation itself is being performed.
diff --git a/tutorial/ParticlesCode.html b/tutorial/ParticlesCode.html
new file mode 100644
index 0000000..8e30eec
--- /dev/null
+++ b/tutorial/ParticlesCode.html
@@ -0,0 +1,193 @@
+ + ++
+ + ++ + + +
++ Figure 1 + Description
+ ++ For the assignment, you will write code that simulates a set of particles moving randomly in a 2-dimensional space within a bounding box. The coordinates of the overall simulation box are between 0.0 and 100.0 along each dimension. The particles are divided among chares based on their coordinates. The chares should be organized in a 2-dimensional array of chares (using a chare array) of size k x k. So, each chare owns a bounding box of its own with size 100.0/k. See Figure 1. +
+ ++ Your program should generate n particles per chare during construction with a random (but valid, i.e. within the chare) position for particles. Your program should accept the number of particles per cell n, and k as command line parameters in that sequence. +
+ +Skeleton Code
+ ++ A base code for Particles Code can be found here. The skeleton code includes base code for Mainchare Main, 2-D Chare Array Cell and Particle class representing the particles the Chare Array contains. There are also comments in the skeleton code that will guide you through the assignment. +
+ +Expected Output
+ ++ Your program should calculate and print to screen the maximum and total number of particles every 10 iterations. Use the provided print function. Additionally, the simulation should not be delayed by this calculation (i.e. you should use reductions). +
+ ++ For testing your Particles Code, you can use 10000 (=n) particles per chare, simulated over 100 steps and a chare array of size 16 x 16 (k=16). Experiment with different number of particles and chare array sizes as our test cases will use values in addition to the defaults. +
+ ++ Note: There might be multiple particles having the same x and y coordinates, especially if you increase the density of each cell. You do not need to handle this case separately; it is a valid case assumption. +
+ + + ++
+-
+
for(iteration=0; iteration < ITERATION; iteration++){
+-
+
-
+
}
+Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + ++ + + diff --git a/tutorial/PiExample.html b/tutorial/PiExample.html new file mode 100644 index 0000000..c4c229d --- /dev/null +++ b/tutorial/PiExample.html @@ -0,0 +1,281 @@ + + + + + + +++ + ++ + + ++Particles Code with Load Balancing and Performance Analysis
+ + ++
+ + ++ + + +
++ Figure 1 + Description
+ ++ This assignment is an extension of Particles Code. You will use your code from Particles Code and extend it to do load balancing and visualization using LiveViz. +
+ ++ In Particles Code, the particles moved randomly. There was no prominent load imbalance between chare array elements. Now we will create load imbalance between chares by coloring the particles, moving them at different speeds, and changing their initial distribution. +
+ +-
+
Part A) Load Balancing and Projections:
+ ++ In this part, you'll try 3 different load balancing strategies and comment on the behavior you observe. The first two load balancing strategies you will use are GreedyLB and RefineLB. For the third strategy, you will get to choose one of the other load balancing strategies. Observe the effect of load balancing (overhead and benefit) on the total execution time of the application using Projections. Are they beneficial? Why or why not? How much is the overhead? Which strategy is the best for this Particle application? etc. +
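Before any of these strategies can act, the chare array elements have to opt in to load balancing; a minimal sketch, assuming a Cell chare class and an arbitrarily chosen balancing period, looks roughly like this.

  // Sketch only: hooks needed so the runtime can migrate Cell elements.
  Cell::Cell() {
    usesAtSync = true;                // this element participates in load balancing
  }

  void Cell::endOfIteration() {
    if (iteration % 20 == 0) {        // balancing period chosen arbitrarily here
      AtSync();                       // hand control to the load balancer
    } else {
      thisProxy[thisIndex].doStep();  // otherwise continue with the next step
    }
  }

  void Cell::ResumeFromSync() {       // called by the runtime after balancing
    thisProxy[thisIndex].doStep();
  }

A working version also needs a pup() routine for Cell (and its particles) so the runtime can actually move an element's data.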
+ +Part B) Visualization using LiveViz:
+ ++ In this part you will visualize the 2-D grid with moving particles using LiveViz. Particles should be shown in color. You need to submit two images from the visualization: one in the beginning showing the initial particle distribution and one when the application is more advanced (after the particles are moved and intermixed). +
+ ++ Please read the LiveViz manual for details about LiveViz setup and usage. You can also look at the Wave2D application for an example code that uses LiveViz. It's located in charm/examples/charm++/wave2d. +
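The exact setup is described in the LiveViz manual; the sketch below only shows the usual shape of the calls, and the names (cellArray, requestNextFrame, the image sizes and buffer handling) are assumptions rather than a verified listing.

  #include "liveViz.h"   // and link the program with "-module liveViz"
  #include <vector>

  // In the main chare, after creating the Cell array (sketch only):
  //   liveVizConfig cfg(liveVizConfig::pix_color, true);
  //   CkCallback c(CkIndex_Cell::requestNextFrame(0), cellArray);
  //   liveVizInit(cfg, cellArray, c);

  // In each Cell array element (sketch only): paint my particles and deposit.
  void Cell::requestNextFrame(liveVizRequestMsg *m) {
    std::vector<unsigned char> img(3 * myWidth * myHeight, 0);  // RGB buffer
    // ... color the pixels covered by this chare's particles ...
    liveVizDeposit(m, myStartX, myStartY, myWidth, myHeight, img.data(), this);
  }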
+ + +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + ++ + diff --git a/tutorial/Preface.html b/tutorial/Preface.html new file mode 100644 index 0000000..b3aedc4 --- /dev/null +++ b/tutorial/Preface.html @@ -0,0 +1,165 @@ + + + + + + ++++ + + ++Introducing Reductions
+ + +Reductions
+ ++ By now, we've spent a lot of time in this tutorial talking about how to decompose problems + into chares and distribute them across processors. That's great, but what good is splitting + up your problem if you can't put it back together? When you need to combine values from an array + of chares, you can turn to one of the most important parallel programming tools: a reduction. +
+ ++ Reductions turn data that is scattered across a chare array into a single value using a reduction + operation, such as sum or maximum. Each of the chares in the array contribute some local data +
+ +"Hello World" Code
+ +The "Hello" Chare Class
+ + ++ ++
++ + ++ + + + ++ + +
++ +Header File (hello.h) ++ + + ++ ++++ #ifndef __HELLO_H__
+
+ #define __HELLO_H__
+
+ class Hello : public CBase_Hello {
+
+ public:
+
+ /// Constructors ///
+ Hello();
+ Hello(CkMigrateMessage *msg);
+
+ /// Entry Methods ///
+ void sayHi(int from);
+ };
+
+ #endif //__HELLO_H__ ++ +Interface File (hello.ci) ++ + ++ ++++ module hello {
+
+
+ array [1D] Hello {
+ entry Hello();
+ entry void sayHi(int);
+ };
+
+ }; ++ + + + + + + + + + ++ + +
++ +Header File (hello.C) ++ + ++ ++++ #include "hello.decl.h"
+
+
+ #include "hello.h"
+ #include "main.decl.h"
+
+
+ extern /* readonly */ CProxy_Main mainProxy;
+ extern /* readonly */ int numElements;
+
+
+ Hello::Hello() {
+ // Nothing to do when the Hello chare object is created.
+ // This is where member variables would be initialized
+ // just like in a C++ class constructor.
+ }
+
+
+ // Constructor needed for chare object migration (ignore for now)
+ // NOTE: This constructor does not need to appear in the ".ci" file
+ Hello::Hello(CkMigrateMessage *msg) { }
+
+
+ void Hello::sayHi(int from) {
+
+ // Have this chare object say hello to the user.
+ CkPrintf("\"Hello\" from Hello chare # %d on "
+ "processor %d (told by %d).\n",
+ thisIndex, CkMyPe(), from);
+
+ // Tell the next chare object in this array of chare objects
+ // to also say hello. If this is the last chare object in
+ // the array of chare objects, then tell the main chare
+ // object to exit the program.
+ if (thisIndex < (numElements - 1))
+ thisProxy[thisIndex + 1].sayHi(thisIndex);
+ else
+ mainProxy.done();
+ }
+
+
+ #include "hello.def.h" +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + ++ + + diff --git a/tutorial/Projections.html b/tutorial/Projections.html new file mode 100644 index 0000000..c844fdc --- /dev/null +++ b/tutorial/Projections.html @@ -0,0 +1,202 @@ + + + + + + +++ + ++ + + ++Preface
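For completeness, here is a hedged sketch of the matching main chare (main.C is not shown on this page; the element count and entry-method layout below are illustrative and assume Main is declared as a mainchare with a done() entry method in main.ci):

  #include "main.decl.h"
  #include "hello.decl.h"

  /* readonly */ CProxy_Main mainProxy;
  /* readonly */ int numElements;

  class Main : public CBase_Main {
   public:
    Main(CkArgMsg *msg) {
      delete msg;
      numElements = 5;                       // illustrative default
      mainProxy = thisProxy;
      // Create the array of Hello chare objects and start the chain.
      CProxy_Hello helloArray = CProxy_Hello::ckNew(numElements);
      helloArray[0].sayHi(-1);
    }
    Main(CkMigrateMessage *msg) { }
    void done() { CkExit(); }                // called by the last Hello element
  };

  #include "main.def.h"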
+ + +Welcome to the Charm++ Tutorial
+ ++ Welcome to the Charm++ Programming Tutorial! We are happy that you are interested in learning about the Charm++ Programming Model and what it can do for you as a programmer. The purpose of this tutorial is to give the reader a basic understanding of how to program using the Charm++ parallel programming model. Charm++ is an object-oriented message passing parallel programming model which has mainly been used in the realm of HPC and scientific computing. Charm++ can be used to write parallel applications that run on a variety of machines ranging from a single notebook/desktop computer to computer clusters containing tens of thousands of processors. To date, Charm++ applications have been shown to scale to as many as forty thousand processors on some of the world's largest and most powerful supercomputers. + + We are always eager to get feedback from our users. If you have any questions or comments about this tutorial or about Charm++ in general, please feel free to contact us at charm@cs.illinois.edu. We hope you find this tutorial helpful. +
+ + + +What the Reader Should Know Before Starting
+ ++ The reader is expected to be familiar with the C++ programming language and, at least, + the basics of the object-oriented programming style used in C++. The reader is also + recommended to have a basic understanding of multiple "threads of execution" or concurrency. + Previous experience in parallel programming will help, but is not required. +
+ ++ The Charm++ distribution, which contains the Charm++ Runtime System and related tools, libraries, + and so on, can be downloaded + here. +
+ ++ While working through this tutorial, the reader may find the information in the various manuals helpful (in particular, the The Charm++ Programming Language Manual). For additional information on Charm++ and related software, please see the Charm++ Homepage. +
+ +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + ++ + + diff --git a/content/tutorial/Reductions.html b/tutorial/Reductions.html similarity index 58% rename from content/tutorial/Reductions.html rename to tutorial/Reductions.html index 5d70471..46f922a 100644 --- a/content/tutorial/Reductions.html +++ b/tutorial/Reductions.html @@ -1,14 +1,117 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - + + + + + + +++ + ++ + + ++Projections
+ ++ Projections is a performance visualization tool designed to work closely with the + Charm++ Runtime System. The Charm++ Runtime System collects performance data about the Charm++ + application during runtime. This data is then analyzed by Projections. Projections can + present this data to the user using several different graphs/representations. +
+ ++
+ + +How Projections Works
+ ++ The Charm++ Runtime System keeps track of when each entry methods starts and stops. Additionally, + it keeps track of the various messages that are sent by the entry methods (including aspects such + as where the message is headed, the size of the message, and so on). This information is recorded + in log files which are analyzed after the program has finished executing. + The user can also choose to enable and disable the collection of the performance data dynamically + during runtime. +
+ ++
+ +Commonly Used Graphs/Visualizations Available In Projections
+ + ++ + + ++ +
++ + + ++ + ++ +
+ Summary: When projections is initially opened, a summary graph of the application execution + is presented to the user (assuming the summary module was linked to the application; see the + Compile Time Options section below). The utilization is an + aggregate of all the processors. + ++ + ++ + ++ +
+ Overview: In the overview graph, each processor is represented by a horizontal bar + with time increasing time from left to right. The color of the bar at that point in time + represents the utilization of that processor (see the color scale to the left of the overview + graph). + +Using Projections
+ + +Compile Time Options
+ +Runtime Options
+ +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + ++ + + diff --git a/tutorial/ShadowArrays.html b/tutorial/ShadowArrays.html new file mode 100644 index 0000000..ec477d6 --- /dev/null +++ b/tutorial/ShadowArrays.html @@ -0,0 +1,197 @@ + + + + + + +++ + +++Reductions
@@ -33,7 +136,7 @@How to use Reductions
This call takes an integer specifying the size in bytes of the data you're contributing, a pointer to the data, and a special object of type CkReduction::reducerType that specifies what kind of reduction is being performed (e.g. sum of integers). For example, if you are summing a local variable representing force on each chare, the contribution call might look like this:
-double force = get_force(); // find local force
+double force = get_force(); // find local force
contribute(sizeof(double), &force, CkReduction::sum_double);In this tutorial we will only use simple reduction types like sum or max of basic types. For more information on the variety of reduction types available and on how to write your own reduction types, see the Charm++ manual.
@@ -56,3 +159,17 @@An Example
Now look at pireduce, which replaces the direct messages to the main chare with a reduction. Now instead of invoking one of Main's entry methods directly, the chares contribute to a reduction whose client invokes one of Main's methods. Main's receive method extracts the result of the reduction, reports the result, and then exits. Thus with only minor changes to the program we have converted it to use Charm++ reductions. Reductions are a basic building-block of parallel applications, and they will quickly become second nature to you as you become an experienced Charm++ programmer.
+ +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + ++ + + diff --git a/content/tutorial/TutorialStyle.css b/tutorial/TutorialStyle.css similarity index 100% rename from content/tutorial/TutorialStyle.css rename to tutorial/TutorialStyle.css diff --git a/content/tutorial/examples/index.htm b/tutorial/examples.htm similarity index 100% rename from content/tutorial/examples/index.htm rename to tutorial/examples.htm diff --git a/content/tutorial/examples/ArrayHelloWorld.tar.gz b/tutorial/examples/ArrayHelloWorld.tar.gz similarity index 100% rename from content/tutorial/examples/ArrayHelloWorld.tar.gz rename to tutorial/examples/ArrayHelloWorld.tar.gz diff --git a/content/tutorial/examples/Basic2DJacobi.tar.gz b/tutorial/examples/Basic2DJacobi.tar.gz similarity index 100% rename from content/tutorial/examples/Basic2DJacobi.tar.gz rename to tutorial/examples/Basic2DJacobi.tar.gz diff --git a/content/tutorial/examples/Basic2DJacobi_CCS.tar.gz b/tutorial/examples/Basic2DJacobi_CCS.tar.gz similarity index 100% rename from content/tutorial/examples/Basic2DJacobi_CCS.tar.gz rename to tutorial/examples/Basic2DJacobi_CCS.tar.gz diff --git a/content/tutorial/examples/Basic2DJacobi_LiveViz.tar.gz b/tutorial/examples/Basic2DJacobi_LiveViz.tar.gz similarity index 100% rename from content/tutorial/examples/Basic2DJacobi_LiveViz.tar.gz rename to tutorial/examples/Basic2DJacobi_LiveViz.tar.gz diff --git a/content/tutorial/examples/Basic2DJacobi_LiveVizOld.tar.gz b/tutorial/examples/Basic2DJacobi_LiveVizOld.tar.gz similarity index 100% rename from content/tutorial/examples/Basic2DJacobi_LiveVizOld.tar.gz rename to tutorial/examples/Basic2DJacobi_LiveVizOld.tar.gz diff --git a/content/tutorial/examples/BasicHelloWorld.tar.gz b/tutorial/examples/BasicHelloWorld.tar.gz similarity index 100% rename from content/tutorial/examples/BasicHelloWorld.tar.gz rename to tutorial/examples/BasicHelloWorld.tar.gz diff --git a/content/tutorial/examples/BroadcastHelloWorld.tar.gz b/tutorial/examples/BroadcastHelloWorld.tar.gz similarity index 100% rename from content/tutorial/examples/BroadcastHelloWorld.tar.gz rename to tutorial/examples/BroadcastHelloWorld.tar.gz diff --git a/content/tutorial/examples/BubbleSort.tar.gz b/tutorial/examples/BubbleSort.tar.gz similarity index 100% rename from content/tutorial/examples/BubbleSort.tar.gz rename to tutorial/examples/BubbleSort.tar.gz diff --git a/content/tutorial/examples/ParallelPrefix.tar.gz b/tutorial/examples/ParallelPrefix.tar.gz similarity index 100% rename from content/tutorial/examples/ParallelPrefix.tar.gz rename to tutorial/examples/ParallelPrefix.tar.gz diff --git a/content/tutorial/examples/ParallelPrefix_NoBarrier.tar.gz b/tutorial/examples/ParallelPrefix_NoBarrier.tar.gz similarity index 100% rename from content/tutorial/examples/ParallelPrefix_NoBarrier.tar.gz rename to tutorial/examples/ParallelPrefix_NoBarrier.tar.gz diff --git a/content/tutorial/examples/ParallelPrefix_sdag.tar.gz b/tutorial/examples/ParallelPrefix_sdag.tar.gz similarity index 100% rename from content/tutorial/examples/ParallelPrefix_sdag.tar.gz rename to tutorial/examples/ParallelPrefix_sdag.tar.gz diff --git a/content/tutorial/examples/particlescode.tar b/tutorial/examples/particlescode.tar similarity index 100% rename from content/tutorial/examples/particlescode.tar rename to tutorial/examples/particlescode.tar diff --git a/content/tutorial/examples/script.sh b/tutorial/examples/script.sh similarity index 100% rename from content/tutorial/examples/script.sh rename to 
tutorial/examples/script.sh diff --git a/content/tutorial/images/2DJacobi_Decomposition.jpg b/tutorial/images/2DJacobi_Decomposition.jpg similarity index 100% rename from content/tutorial/images/2DJacobi_Decomposition.jpg rename to tutorial/images/2DJacobi_Decomposition.jpg diff --git a/content/tutorial/images/2DJacobi_NeighborComm.jpg b/tutorial/images/2DJacobi_NeighborComm.jpg similarity index 100% rename from content/tutorial/images/2DJacobi_NeighborComm.jpg rename to tutorial/images/2DJacobi_NeighborComm.jpg diff --git a/content/tutorial/images/2DJacobi_NeighborComm_sm.jpg b/tutorial/images/2DJacobi_NeighborComm_sm.jpg similarity index 100% rename from content/tutorial/images/2DJacobi_NeighborComm_sm.jpg rename to tutorial/images/2DJacobi_NeighborComm_sm.jpg diff --git a/content/tutorial/images/Advanced.png b/tutorial/images/Advanced.png similarity index 100% rename from content/tutorial/images/Advanced.png rename to tutorial/images/Advanced.png diff --git a/content/tutorial/images/ArrayHelloWorld_ProgramFlow.jpg b/tutorial/images/ArrayHelloWorld_ProgramFlow.jpg similarity index 100% rename from content/tutorial/images/ArrayHelloWorld_ProgramFlow.jpg rename to tutorial/images/ArrayHelloWorld_ProgramFlow.jpg diff --git a/content/tutorial/images/ArrayHelloWorld_ProgramFlow_sm.jpg b/tutorial/images/ArrayHelloWorld_ProgramFlow_sm.jpg similarity index 100% rename from content/tutorial/images/ArrayHelloWorld_ProgramFlow_sm.jpg rename to tutorial/images/ArrayHelloWorld_ProgramFlow_sm.jpg diff --git a/content/tutorial/images/ArrayProxyIndexing.jpg b/tutorial/images/ArrayProxyIndexing.jpg similarity index 100% rename from content/tutorial/images/ArrayProxyIndexing.jpg rename to tutorial/images/ArrayProxyIndexing.jpg diff --git a/content/tutorial/images/ArrayProxyIndexing_sm.jpg b/tutorial/images/ArrayProxyIndexing_sm.jpg similarity index 100% rename from content/tutorial/images/ArrayProxyIndexing_sm.jpg rename to tutorial/images/ArrayProxyIndexing_sm.jpg diff --git a/content/tutorial/images/BGFade_Left_DDFFDD.png b/tutorial/images/BGFade_Left_DDFFDD.png similarity index 100% rename from content/tutorial/images/BGFade_Left_DDFFDD.png rename to tutorial/images/BGFade_Left_DDFFDD.png diff --git a/content/tutorial/images/BGFade_Left_EEEEFF.png b/tutorial/images/BGFade_Left_EEEEFF.png similarity index 100% rename from content/tutorial/images/BGFade_Left_EEEEFF.png rename to tutorial/images/BGFade_Left_EEEEFF.png diff --git a/content/tutorial/images/BGFade_Left_FFA4A4.png b/tutorial/images/BGFade_Left_FFA4A4.png similarity index 100% rename from content/tutorial/images/BGFade_Left_FFA4A4.png rename to tutorial/images/BGFade_Left_FFA4A4.png diff --git a/content/tutorial/images/BGFade_Left_FFDDEE.png b/tutorial/images/BGFade_Left_FFDDEE.png similarity index 100% rename from content/tutorial/images/BGFade_Left_FFDDEE.png rename to tutorial/images/BGFade_Left_FFDDEE.png diff --git a/content/tutorial/images/BGFade_Left_FFEECC.png b/tutorial/images/BGFade_Left_FFEECC.png similarity index 100% rename from content/tutorial/images/BGFade_Left_FFEECC.png rename to tutorial/images/BGFade_Left_FFEECC.png diff --git a/content/tutorial/images/Basic.png b/tutorial/images/Basic.png similarity index 100% rename from content/tutorial/images/Basic.png rename to tutorial/images/Basic.png diff --git a/content/tutorial/images/BeyondCharm.png b/tutorial/images/BeyondCharm.png similarity index 100% rename from content/tutorial/images/BeyondCharm.png rename to tutorial/images/BeyondCharm.png diff --git 
a/content/tutorial/images/BroadcastHelloWorld_ProgramFlow.jpg b/tutorial/images/BroadcastHelloWorld_ProgramFlow.jpg similarity index 100% rename from content/tutorial/images/BroadcastHelloWorld_ProgramFlow.jpg rename to tutorial/images/BroadcastHelloWorld_ProgramFlow.jpg diff --git a/content/tutorial/images/BroadcastHelloWorld_ProgramFlow_sm.jpg b/tutorial/images/BroadcastHelloWorld_ProgramFlow_sm.jpg similarity index 100% rename from content/tutorial/images/BroadcastHelloWorld_ProgramFlow_sm.jpg rename to tutorial/images/BroadcastHelloWorld_ProgramFlow_sm.jpg diff --git a/content/tutorial/images/BubbleSortPhase.jpg b/tutorial/images/BubbleSortPhase.jpg similarity index 100% rename from content/tutorial/images/BubbleSortPhase.jpg rename to tutorial/images/BubbleSortPhase.jpg diff --git a/content/tutorial/images/BubbleSortPhase_sm.jpg b/tutorial/images/BubbleSortPhase_sm.jpg similarity index 100% rename from content/tutorial/images/BubbleSortPhase_sm.jpg rename to tutorial/images/BubbleSortPhase_sm.jpg diff --git a/content/tutorial/images/ChareClassCompileProcess.jpg b/tutorial/images/ChareClassCompileProcess.jpg similarity index 100% rename from content/tutorial/images/ChareClassCompileProcess.jpg rename to tutorial/images/ChareClassCompileProcess.jpg diff --git a/content/tutorial/images/ChareClassCompileProcess_sm.jpg b/tutorial/images/ChareClassCompileProcess_sm.jpg similarity index 100% rename from content/tutorial/images/ChareClassCompileProcess_sm.jpg rename to tutorial/images/ChareClassCompileProcess_sm.jpg diff --git a/content/tutorial/images/CharmFiles.jpg b/tutorial/images/CharmFiles.jpg similarity index 100% rename from content/tutorial/images/CharmFiles.jpg rename to tutorial/images/CharmFiles.jpg diff --git a/content/tutorial/images/CharmFiles_sm.jpg b/tutorial/images/CharmFiles_sm.jpg similarity index 100% rename from content/tutorial/images/CharmFiles_sm.jpg rename to tutorial/images/CharmFiles_sm.jpg diff --git a/content/tutorial/images/CharmHierarchy.jpg b/tutorial/images/CharmHierarchy.jpg similarity index 100% rename from content/tutorial/images/CharmHierarchy.jpg rename to tutorial/images/CharmHierarchy.jpg diff --git a/content/tutorial/images/CharmHierarchy_sm.jpg b/tutorial/images/CharmHierarchy_sm.jpg similarity index 100% rename from content/tutorial/images/CharmHierarchy_sm.jpg rename to tutorial/images/CharmHierarchy_sm.jpg diff --git a/content/tutorial/images/Charm_SysView.jpg b/tutorial/images/Charm_SysView.jpg similarity index 100% rename from content/tutorial/images/Charm_SysView.jpg rename to tutorial/images/Charm_SysView.jpg diff --git a/content/tutorial/images/Charm_SysView_sm.jpg b/tutorial/images/Charm_SysView_sm.jpg similarity index 100% rename from content/tutorial/images/Charm_SysView_sm.jpg rename to tutorial/images/Charm_SysView_sm.jpg diff --git a/content/tutorial/images/Charm_UserView.jpg b/tutorial/images/Charm_UserView.jpg similarity index 100% rename from content/tutorial/images/Charm_UserView.jpg rename to tutorial/images/Charm_UserView.jpg diff --git a/content/tutorial/images/Charm_UserView_sm.jpg b/tutorial/images/Charm_UserView_sm.jpg similarity index 100% rename from content/tutorial/images/Charm_UserView_sm.jpg rename to tutorial/images/Charm_UserView_sm.jpg diff --git a/content/tutorial/images/Intro.png b/tutorial/images/Intro.png similarity index 100% rename from content/tutorial/images/Intro.png rename to tutorial/images/Intro.png diff --git a/content/tutorial/images/LiveViz_2DJacobiExample_After.jpg 
b/tutorial/images/LiveViz_2DJacobiExample_After.jpg similarity index 100% rename from content/tutorial/images/LiveViz_2DJacobiExample_After.jpg rename to tutorial/images/LiveViz_2DJacobiExample_After.jpg diff --git a/content/tutorial/images/LiveViz_2DJacobiExample_Before.jpg b/tutorial/images/LiveViz_2DJacobiExample_Before.jpg similarity index 100% rename from content/tutorial/images/LiveViz_2DJacobiExample_Before.jpg rename to tutorial/images/LiveViz_2DJacobiExample_Before.jpg diff --git a/content/tutorial/images/LiveViz_2DJacobiExample_BeforeAfter.jpg b/tutorial/images/LiveViz_2DJacobiExample_BeforeAfter.jpg similarity index 100% rename from content/tutorial/images/LiveViz_2DJacobiExample_BeforeAfter.jpg rename to tutorial/images/LiveViz_2DJacobiExample_BeforeAfter.jpg diff --git a/content/tutorial/images/LiveViz_2DJacobiExample_ColorScale.jpg b/tutorial/images/LiveViz_2DJacobiExample_ColorScale.jpg similarity index 100% rename from content/tutorial/images/LiveViz_2DJacobiExample_ColorScale.jpg rename to tutorial/images/LiveViz_2DJacobiExample_ColorScale.jpg diff --git a/content/tutorial/images/PPLLogo_icon60.jpg b/tutorial/images/PPLLogo_icon60.jpg similarity index 100% rename from content/tutorial/images/PPLLogo_icon60.jpg rename to tutorial/images/PPLLogo_icon60.jpg diff --git a/content/tutorial/images/PackUnPackProcess.jpg b/tutorial/images/PackUnPackProcess.jpg similarity index 100% rename from content/tutorial/images/PackUnPackProcess.jpg rename to tutorial/images/PackUnPackProcess.jpg diff --git a/content/tutorial/images/PackUnPackProcess_sm.jpg b/tutorial/images/PackUnPackProcess_sm.jpg similarity index 100% rename from content/tutorial/images/PackUnPackProcess_sm.jpg rename to tutorial/images/PackUnPackProcess_sm.jpg diff --git a/content/tutorial/images/ParallelPrefixCalculation.jpg b/tutorial/images/ParallelPrefixCalculation.jpg similarity index 100% rename from content/tutorial/images/ParallelPrefixCalculation.jpg rename to tutorial/images/ParallelPrefixCalculation.jpg diff --git a/content/tutorial/images/ParallelPrefixCalculation_sm.jpg b/tutorial/images/ParallelPrefixCalculation_sm.jpg similarity index 100% rename from content/tutorial/images/ParallelPrefixCalculation_sm.jpg rename to tutorial/images/ParallelPrefixCalculation_sm.jpg diff --git a/content/tutorial/images/ParallelPrefixFormula.jpg b/tutorial/images/ParallelPrefixFormula.jpg similarity index 100% rename from content/tutorial/images/ParallelPrefixFormula.jpg rename to tutorial/images/ParallelPrefixFormula.jpg diff --git a/content/tutorial/images/ParallelPrefixSolution_ControlFlow.jpg b/tutorial/images/ParallelPrefixSolution_ControlFlow.jpg similarity index 100% rename from content/tutorial/images/ParallelPrefixSolution_ControlFlow.jpg rename to tutorial/images/ParallelPrefixSolution_ControlFlow.jpg diff --git a/content/tutorial/images/ParallelPrefixSolution_ControlFlow_sm.jpg b/tutorial/images/ParallelPrefixSolution_ControlFlow_sm.jpg similarity index 100% rename from content/tutorial/images/ParallelPrefixSolution_ControlFlow_sm.jpg rename to tutorial/images/ParallelPrefixSolution_ControlFlow_sm.jpg diff --git a/content/tutorial/images/Projections_Overview.jpg b/tutorial/images/Projections_Overview.jpg similarity index 100% rename from content/tutorial/images/Projections_Overview.jpg rename to tutorial/images/Projections_Overview.jpg diff --git a/content/tutorial/images/Projections_Overview_sm.jpg b/tutorial/images/Projections_Overview_sm.jpg similarity index 100% rename from 
content/tutorial/images/Projections_Overview_sm.jpg rename to tutorial/images/Projections_Overview_sm.jpg diff --git a/content/tutorial/images/Projections_Summary.jpg b/tutorial/images/Projections_Summary.jpg similarity index 100% rename from content/tutorial/images/Projections_Summary.jpg rename to tutorial/images/Projections_Summary.jpg diff --git a/content/tutorial/images/Projections_Summary_sm.jpg b/tutorial/images/Projections_Summary_sm.jpg similarity index 100% rename from content/tutorial/images/Projections_Summary_sm.jpg rename to tutorial/images/Projections_Summary_sm.jpg diff --git a/content/tutorial/images/ProxiesAndGOS.jpg b/tutorial/images/ProxiesAndGOS.jpg similarity index 100% rename from content/tutorial/images/ProxiesAndGOS.jpg rename to tutorial/images/ProxiesAndGOS.jpg diff --git a/content/tutorial/images/ProxiesAndGOS_sm.jpg b/tutorial/images/ProxiesAndGOS_sm.jpg similarity index 100% rename from content/tutorial/images/ProxiesAndGOS_sm.jpg rename to tutorial/images/ProxiesAndGOS_sm.jpg diff --git a/content/tutorial/images/Ref.png b/tutorial/images/Ref.png similarity index 100% rename from content/tutorial/images/Ref.png rename to tutorial/images/Ref.png diff --git a/content/tutorial/images/TOCArrowRight.png b/tutorial/images/TOCArrowRight.png similarity index 100% rename from content/tutorial/images/TOCArrowRight.png rename to tutorial/images/TOCArrowRight.png diff --git a/content/tutorial/images/VertBackRepeat.jpg b/tutorial/images/VertBackRepeat.jpg similarity index 100% rename from content/tutorial/images/VertBackRepeat.jpg rename to tutorial/images/VertBackRepeat.jpg diff --git a/content/tutorial/images/VertBackRepeat_sm.jpg b/tutorial/images/VertBackRepeat_sm.jpg similarity index 100% rename from content/tutorial/images/VertBackRepeat_sm.jpg rename to tutorial/images/VertBackRepeat_sm.jpg diff --git a/content/tutorial/images/VertBackRepeat_sm_r.jpg b/tutorial/images/VertBackRepeat_sm_r.jpg similarity index 100% rename from content/tutorial/images/VertBackRepeat_sm_r.jpg rename to tutorial/images/VertBackRepeat_sm_r.jpg diff --git a/content/exercises/images/particlescode.png b/tutorial/images/particlescode.png similarity index 100% rename from content/exercises/images/particlescode.png rename to tutorial/images/particlescode.png diff --git a/content/exercises/images/particlescode_lb.png b/tutorial/images/particlescode_lb.png similarity index 100% rename from content/exercises/images/particlescode_lb.png rename to tutorial/images/particlescode_lb.png diff --git a/content/tutorial/index.html b/tutorial/index.html similarity index 66% rename from content/tutorial/index.html rename to tutorial/index.html index 1457a13..9a58c70 100644 --- a/content/tutorial/index.html +++ b/tutorial/index.html @@ -1,14 +1,117 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - + + + + + + +++ + ++ + + ++Shadow Arrays (AKA: Bound Arrays)
+ ++ It may be useful to have corresponding elements of two or more chare arrays to be located on the + same processor. For example, if the corresponding elements of chare arrays X and Y + frequently communicate with one another, it would be advantageous to have those elements, + X[i] and Y[i], on the same processor to reduce communication costs. Shadow arrays + in Charm++ are a way of accomplishing just this type of behavior. When two or more arrays are + bound, the Charm++ Runtime System ensures that the objects are always located on the same + physical processor. +
+ + ++
+ + ++ +++ // Assumed variable declarations
+
+ int numElements = ...;
+
+ // Creation of first array (standard process)
+ CProxy_myArrayClass1 myArray1 = CProxy_myArrayClass1::ckNew(...parameters... , numElements);
+
+ // Creation of the second array (bound to the first)
+ CkArrayOptions ckOptions(numElements);
+ opts.bindTo(myArray1);
+ CProxy_myArrayClass2 myArray2 = CProxy_myArrayClass2::ckNew(ckOptions);
+ ++ Figure 1: Code to create two chare arrays, myArray1 and myArray2, which are + bound together. + + Shadow (or bound) arrays create a relationship in terms of element mapping and load balancing between + two or more arrays. The idea is fairly straight forward. First, a chare array is created in the + standard manner. Then, a second array is created as a shadow of the first array (or bound to the first + array). This is done through the CkArrayOptions parameter to the ckNew call which creates + the second array. Additional arrays can also be bound to this set of bound arrays in a similar manner. +
+ ++ There is no restriction to the type or number of chare elements in the arrays. The bound arrays can + be instances of the same chare array class or be different chare array classes. Also, the arrays do not + have to have the same number of elements. For indexes where the other array doesn't have a corresponding + element, the element that exists is free to move between processors arbitrarily. The Charm++ Runtime + System simply ensures that when there are corresponding elements (that is, elements at + corresponding indexes), in a set of bound arrays, the elements will be located on the same physical + processor. +
+ ++ Since the corresponding objects are always located on the same processor, the objects can take advantage + of this in various ways. For example, they can obtain local pointers to one another via a call to + ckLocal() on the proxy for the element. However, each time the objects migrate (i.e. the objects + are unpacked via the Pack-UnPack Routines), the local pointer needs + to be refreshed so it is a valid pointer on the new physical processor. +
+ ++ For more information on callbacks, please see + Section 3.8.6: Advanced Array Creation: Bound Arrays of the + The Charm++ Programming Language Manual +
+ +Charm++: Tutorial + + + + + + + + + + + + + + + + + + + + + ++ + + +++ + +
Applications
Molecular Dynamics - NAMD
+Molecular Dynamics - NAMD
NAMD, recipient of a 2002 Gordon Bell Award, is a parallel molecular dynamics application designed for high-performance simulation of large biomolecular systems. NAMD is a result of many years of collaboration between Prof. @@ -59,8 +163,9 @@
Applications
-
N-Body Cosmological Simulations - ChaNGa
++
N-Body Cosmological Simulations - ChaNGa

Applications
GPU clusters. Over time, ChaNGa is being actively developed and improved, with an eye for efficient utilization and scaling of current and future supercomputing systems. -See ChaNGa web +See ChaNGa web site for download and instructions. --
Contagion in Social Networks - Episimdemics
++
Contagion in Social Networks - Episimdemics


Applications
-
Ab initio Molecular Dynamics - OpenAtom
++
Ab initio Molecular Dynamics - OpenAtom


Many important problems in material science, chemistry, solid-state physics, and biophysics require a modeling approach based on fundamental @@ -167,13 +271,16 @@
Applications
using a large number of virtual processors, which are mapped flexibly to available processors with assistance from the Charm++ runtime system. See the OpenAtom web site for more -details.+details.
+
+
-
Computational Science and Engineering Applications - CSE
++
Computational Science and Engineering Applications - CSE
Professor P. Geubelle and S. Breitenfeld of the Computational Solid Mechanics Group have developed CrackProp, an explicit Finite Element method @@ -184,8 +291,7 @@
Applications


Professor J. Dantzig and Jun-Ho Jeong of the Solidification Processing Lab @@ -204,12 +310,12 @@
Applications
-
Advanced Rocket Simulation
++
Advanced Rocket Simulation


@@ -238,11 +344,11 @@
Applications
-
PSTIP - Parallel Stochastic Integer Programming
++
PSTIP - Parallel Stochastic Integer Programming


Stochastic optimization is used for optimal resource allocation under @@ -271,6 +377,21 @@
Applications
tree and parallel evaluation of scenarios. Our designs show strong scaling to hundreds of cores. See the report for more details. - +Capabilities
+ +Automatic Overlap
+ +Because parallelism in Charm++ is expressed via interacting parallel objects +instead of processors, the runtime system can seamlessly provide overlap of +communication and computation as an application runs.
+ +Automatic Load Balancing
+ +Charm++ ships with an entire suite of load balancers, which can be selected +at runtime. All the application must do is provide a hint on when it is a good +time to synchronize for load balancing.
+ +Automatic Checkpointing and Fault Tolerance
+ +Charm++ can easily checkpoint an application's data to disk or to the memory +of a buddy node. If a fault occurs and the job persists, Charm++ will detect a +hard node failure and automatically continue execution from the previous +in-memory checkpoint. The programmer simply specifies the data to checkpoint +using a clean interface that is used for load balancing to serialize the +data.
+ +Power/Energy Optimization
+ +Charm++ runtime can reduce total energy consumption i.e. both +machine and cooling energy consumptions, by combining control over the +processor operating frequency/voltage with object migration. It also enables +restraining core temperatures to save cooling energy.
+ +Portable Code
+ +Charm++ comes pre-packaged with many machine layers that are tuned to the +latest supercomputer architectures, ranging from Blue Gene/Q to Cray XK6.
+ +Independent Modules, Interleaved Execution
+ +Because Charm++ programs are written in terms of a set of modules that +define parallel objects, multiple modules can execute concurrently. When one +module has little work to do or is idle, another module can fill the gap. Because +work can be prioritized in Charm++, the user can specify which objects have +priority and they will be treated accordingly by the Charm++ scheduler
+ +Interoperable with MPI, OpenMP and CUDA
+ ++Charm++ supports time and space sharing with MPI, allowing MPI to execute with Charm++ code either +in phases or partitioned by processors. Charm++ comes with its own OpenMP runtime which can be used +with Clang, GCC, and ICC compilers and which allows co-scheduling OpenMP tasks and Charm++ entry methods. +Charm++ also provides support for asynchronously executing CUDA kernels on the GPU and for orchestrating +data movement between hosts and devices.
+ +Ecosystem of Tools
+ +Charm++ is not just a programming language or runtime system; it also comes with a +full suite of tools, ranging from a parallel debugger to performance +visualization. You can even inject python code on the fly as your application +runs using the CCS tool.
+ +Topic | +Slides | +Further Reading | +
Introduction | +pptx pdf | ++ |
Migratable-Objects, Programming Model | +pptx pdf | +A short article about Charm | +
Model Contd. + Basic Charm++ | ++ | |
Task parallelism and grainsize | ++ | |
Chare Arrays | +Basic Chares Arrays | +|
Structure Dagger(Message coordination) | +LJdyanmics | +|
Dynamic Load Balancing and PUP | ++ | |
Static Assignment + Debugging Techniques | +pptx | ++ |
Load balancing, Basic Messages, LiveViz | + LiveViz + GreedyVsRefineLB |
+ + |
Using Projections, and Charmdebug | + Projections + CharmDebug |
+ Projections Manual + CharmDebug Manual + |
+
Threaded Methods and blocking methods | +pptx | ++ |
Threads, futures, Cth calls, and explicit suspension | +pptx | +Threaded Methods | +
Messages, Priorities, Groups and Nodegroups | +pptx | +Messages | +
Array Sections | +pptx | ++ |
Shared Memory within a node: Basic Mechanisms, Conventions, + Conditional Packing, Boom-arrays |
+ + | + |
Shared Memory: Charm+OpenMP, CkLoop | ++ | + |
Adaptive MPI, MSA | ++ | + |
Writing your own load balancer | ++ | + |
State Space Search, Divide/Conquer apps, Seed balancer, Mixing data-parallel programs with task parallelism |
+ + | + |
Writing your own load balancer | ++ | + |
Fault Tolerance | ++ | + |
Charisma, Charj | ++ | + |
Application Case studies: NAMD, ChaNGa | ++ | + |
Application Case studies: OpenAtom, CharmSimdemics + Interoperability with MPI |
+ + | + |
Email the Charm++ developers + at charm@lists.cs.illinois.edu for bug + reports or Charm++ questions.
+ +Email the Parallel Programming Laboratory + at ppl@lists.cs.illinois.edu.
+ +Contact Professor Laxmikant (Sanjay) Kale at: +
+Email: | +kale@illinois.edu | +
Phone: | +(217) 244-0094 | +
Office: | +4212 Siebel Center | +
Coordinator: | +Shanna M. DeSouza | +
Email: | +desouzas@illinois.edu | +
Phone: | +(217) 244-9104 | +
Capabilities
- -Automatic Overlap
- -Because parallelism in Charm++ is expressed via interacting parallel objects -instead of processors, the runtime system can seamlessly provide overlap of -communication and computation as an application runs.
- -Automatic Load Balancing
- -Charm++ ships with an entire suite of load balancers, which can be selected -at runtime. All the application must do is provide a hint on when it is a good -time to synchronize for load balancing.
- -Automatic Checkpointing and Fault Tolerance
- -Charm++ can easily checkpoint an application's data to disk or to the memory -of a buddy node. If a fault occurs and the job persists, Charm++ will detect a -hard node failure and automatically continue execution from the previous -in-memory checkpoint. The programmer simply specifies the data to checkpoint -using a clean interface that is used for load balancing to serialize the -data.
- -Power/Energy Optimization
- -Charm++ runtime can reduce total energy consumption i.e. both -machine and cooling energy consumptions, by combining control over the -processor operating frequency/voltage with object migration. It also enables -restraining core temperatures to save cooling energy.
- -Portable Code
- -Charm++ comes pre-packaged with many machine layers that are tuned to the -latest supercomputer architectures, ranging from Blue Gene/Q to Cray XK6.
- -Independent Modules, Interleaved Execution
- -Because Charm++ programs are written in terms of a set of modules that -define parallel objects, multiple modules can execute concurrently. When one -module has little work to do or is idle, another module can fill the gap. Because -work can be prioritized in Charm++, the user can specify which objects have -priority and they will be treated accordingly by the Charm++ scheduler
- -Interoperable with MPI, OpenMP and CUDA
- --Charm++ supports time and space sharing with MPI, allowing MPI to execute with Charm++ code either -in phases or partitioned by processors. Charm++ comes with its own OpenMP runtime which can be used -with Clang, GCC, and ICC compilers and which allows co-scheduling OpenMP tasks and Charm++ entry methods. -Charm++ also provides support for asynchronously executing CUDA kernels on the GPU and for orchestrating -data movement between hosts and devices.
- -Ecosystem of Tools
- -Charm++ is not just a programming language or runtime system; it also comes with a -full suite of tools, ranging from a parallel debugger to performance -visualization. You can even inject python code on the fly as your application -runs using the CCS tool.
diff --git a/content/classmaterial.html b/content/classmaterial.html deleted file mode 100644 index 050cf3b..0000000 --- a/content/classmaterial.html +++ /dev/null @@ -1,148 +0,0 @@ ---- -mec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - -Topic | -Slides | -Further Reading | -
Introduction | -pptx pdf | -- |
Migratable-Objects, Programming Model | -pptx pdf | -A short article about Charm | -
Model Contd. + Basic Charm++ | -- | |
Task parallelism and grainsize | -- | |
Chare Arrays | -Basic Chares Arrays | -|
Structure Dagger(Message coordination) | -LJdyanmics | -|
Dynamic Load Balancing and PUP | -- | |
Static Assignment + Debugging Techniques | -pptx | -- |
Load balancing, Basic Messages, LiveViz | - LiveViz - GreedyVsRefineLB |
- - |
Using Projections, and Charmdebug | - Projections - CharmDebug |
- Projections Manual - CharmDebug Manual |
-
Threaded Methods and blocking methods | -pptx | -- |
Threads, futures, Cth calls, and explicit suspension | -pptx | -Threaded Methods | -
Messages, Priorities, Groups and Nodegroups | -pptx | -Messages | -
Array Sections | -pptx | -- |
Shared Memory within a node: Basic Mechanisms, Conventions, - Conditional Packing, Boom-arrays |
- - | - |
Shared Memory: Charm+OpenMP, CkLoop | -- | - |
Adaptive MPI, MSA | -- | - |
Writing your own load balancer | -- | - |
State Space Search, Divide/Conquer apps, Seed balancer, Mixing data-parallel programs with task parallelism |
- - | - |
Writing your own load balancer | -- | - |
Fault Tolerance | -- | - |
Charisma, Charj | -- | - |
Application Case studies: NAMD, ChaNGa | -- | - |
Application Case studies: OpenAtom, CharmSimdemics - Interoperability with MPI |
- - | - |
Email the Charm++ developers - at charm@lists.cs.illinois.edu for bug - reports or Charm++ questions.
- -Email the Parallel Programming Laboratory - at ppl@lists.cs.illinois.edu.
- -Contact Professor Laxmikant (Sanjay) Kale at: -
Email: | kale@illinois.edu |
Phone: | (217) 244-0094 |
Office: | 4212 Siebel Center |
Coordinator: | Shanna M. DeSouza |
Email: | desouzas@illinois.edu |
Phone: | (217) 244-9104 |
Downloads
-Charm++
- -Latest release version: Charm++ v7.0.0 (tar.gz)
-Latest dev version: Nightly Snapshot (tar.gz)
- -The latest development version of Charm++ can be downloaded directly from our - git source archive. To get the Charm++ source code:
-git clone https://github.com/UIUC-PPL/charm-
This will create a directory named charm. Move to this directory:
-cd charm-
And now build Charm++ (netlrts-linux example):
-./build charm++ netlrts-linux-x86_64 [ --with-production | -g ]-
This will make a netlrts-linux-x86_64 directory, with bin, include, lib, etc. subdirectories.
-Note that this development version may not be as portable or robust as the released - versions. Therefore, it may be prudent to keep a backup of old copies of - Charm++.
- - -Projections
- -Projections is Charm++'s performance analysis tool.
- -Latest release binaries: Projections (tar.gz)
- -To get the latest Projections source code:
-git clone https://github.com/UIUC-PPL/projections-
To build (requires gradle):
-cd projections- - -
make
LiveViz and Charm Debug
- -LiveViz is Charm++'s live program visualization tool. Charm Debug is Charm++'s parallel debugger.
- -Latest Charm Debug release binaries: Charm Debug (tar.gz)
- -To get the latest ccs_tools (includes both LiveViz and Charm Debug source code):
-git clone https://github.com/UIUC-PPL/ccs_tools-
To build:
-cd ccs_tools- - -
ant
License
- -Charm++ and associated software are licensed under the Charm++/Converse License
. - -Learn Charm++ via a series of programming exercises
--
-
!! Under Construction, the following is currently being updated and will include links to the exercises !!
- --
-
-
-
-
-
-
-
-
-
-
-
Chare Arrays: Design Exercise
- -Data Balancing:
- -Assume you have a 1D chare array A. Each chare in it (say A[i]) holds a vector of numbers. The size of this vector is different on different chares (say size_i on A[i]). Your task is to equalize the load on all processors by exchanging the numbers. It is not necessary to do minimal data movement, but it is desirable. The balance at the end needs to be almost exact: if there are a total of N numbers and v chares, each chare should end up with between floor(N/v) and ceil(N/v) items. Note that the only way to send information to another chare is by invoking an (entry) method on it.
- -There are many distinct algorithms possible. Sketch the alternatives without coding them, and write cost estimates for them. Keep in mind the simplest (i.e. approximate) cost model in Charm++: an entry-method invocation costs \alpha + n\beta, where \alpha is a fixed per-message cost and \beta is a per-byte cost. For intuition, you may assume \alpha is about a thousand times larger than \beta, say a microsecond versus a nanosecond. Reductions and broadcasts of size-N data on P processors cost \alpha \log(P) + N\beta. Keep in mind that many (but not all) of the algorithms for this problem have two phases: a first phase to identify who should send how many numbers to whom, and a second to actually do the data exchange. Make sure to write your time estimates for both phases. Compare two of the interesting algorithms in terms of cost, performance tradeoffs if any (e.g. is each algorithm better in different scenarios), scalability, and coding complexity. By scalability, here, we mean how well the algorithm behaves with a large number of chares and/or a large number of physical processors.
\ No newline at end of file diff --git a/content/exercises/arraysmethod1.html b/content/exercises/arraysmethod1.html deleted file mode 100644 index 3eabb90..0000000 --- a/content/exercises/arraysmethod1.html +++ /dev/null @@ -1,14 +0,0 @@ ---- -title: Chare Arrays Method 1 Exercise -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - -Chare Arrays: Method 1 Exercise
- diff --git a/content/exercises/arraysmethod2.html b/content/exercises/arraysmethod2.html deleted file mode 100644 index e03d9ae..0000000 --- a/content/exercises/arraysmethod2.html +++ /dev/null @@ -1,14 +0,0 @@ ---- -title: Chare Arrays Method 2 Exercise -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - -Chare Arrays: Method 2 Exercise
- diff --git a/content/exercises/arraysmethod3.html b/content/exercises/arraysmethod3.html deleted file mode 100644 index 9a4493f..0000000 --- a/content/exercises/arraysmethod3.html +++ /dev/null @@ -1,14 +0,0 @@ ---- -title: Chare Arrays Method 3 Exercise -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - -Chare Arrays: Method 3 Exercise
- diff --git a/content/exercises/k-means.html b/content/exercises/k-means.html deleted file mode 100644 index e39ac56..0000000 --- a/content/exercises/k-means.html +++ /dev/null @@ -1,26 +0,0 @@ ---- -title: Chare Arrays Design Exercise -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - -Chare Arrays: K-Means Clustering Exercise
- -K-Means Clustering:
- - -Give a collection M points in 2 dimensions, optimally classify them in K clusters using an iterative refinement algorithm, and output only the centroid of each cluster.
-PART A: (for simplicity, we will use 2 dimensions, the real problems typically involve a larger number of dimensions). More concretely: create a chare array called Points of N chares, each with M/N data points. The constructors initialize each data point with random X and Y coordinates, (0 <= X,Y < 1.0). The main chare generates K random points as an initial guess for centroids of the K clusters, and broadcasts them (as an array of x-y pairs) to the Points chare array, to an entry method called Assign. This method decides, for each point it owns, which centroid it is closest to. It then contributes into 2 reductions: one a sum of points it added to each cluster (so an integer array of size K) and another a sum of X and Y coordinates for each cluster (so, an array of 2K doubles). The target of the reductions are the UpdateCounts and UpdateCoords methods in the main chare. When both reductions are complete the main chare updates the centroids of each cluster (simply calclulate, for each of the K clusters, the sum of X coordinates i’th cluster divided by the count of points assigned to i’th cluster, and similarly for Y.) The algorithm then repeats the Assign (via broadcast) and Update steps, until the assignment of points to cluster remains unchanged. We will approximate this by calculating (in the main chare) the changes to any centroid, and when no centroid coordinate changes beyond a small threshold T (say 0.001), we will assume the algorithm has converged.
-Part B: Reduce the number of reductions in each iteration from 2 to 1. Hint: there are 2 ways of doing this: first involves approximating the counts to be double precision numbers and the second involves writing a custom reduction).
-Part C: The “no change to centroids above a threshold” method above is an approximation. Implement a more accurate method by using an additional reduction of the number of points which have changed their “allegiance” (i.e. the cluster to which they belong in each iteration. If this number is 0, the algorithm has converged. Again, as in part B, reduce the number of reductions to just 1 if possible.
-Part D: Change the random number generation so that you create K sets, with each set centered around a distinct coordinate (purported centroid), and see if your program is able to retrieve the clusters correctly. Use any method to restrict points around a centroid to within the 0.0:1.0 box, along each dimension.
\ No newline at end of file diff --git a/content/exercises/liveViz.html b/content/exercises/liveViz.html deleted file mode 100644 index 3568061..0000000 --- a/content/exercises/liveViz.html +++ /dev/null @@ -1,44 +0,0 @@ ---- -title: LiveViz Particle Exercise (Extension) -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -LiveViz: Particle Exercise (Extension)
- -
-
- ![]() |
Figure 1 |
For this exercise, continue by building upon your work for the previous exercise. Add colour to emphasise the skew and watch the balancer work.
- --
-
Load Balancing: Particle Exercise (Extension)
- -For this exercise, continue by building upon your work for the previous exercise. You will now work to create measurable imbalance so that load balancers have something meaningful to fix.
- --
-
Variants: Instructors may choose to use a different formula for introducing imbalance
\ No newline at end of file diff --git a/content/exercises/oddevensort.html b/content/exercises/oddevensort.html deleted file mode 100644 index 2f12364..0000000 --- a/content/exercises/oddevensort.html +++ /dev/null @@ -1,22 +0,0 @@ ---- -title: Chare Arrays Odd-Even Sort Exercise -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - -Chare Arrays: Odd-Even Sort Exercise
- -A mainchare has a chare array of size N. Each chare in the array generates its own random number, with a unique seed. The array must be sorted in N steps, where at each step, each chare may communicate with at most one of its neighbors (i.e. on an 'odd' step adjacent chares are paired in one fashion, and on an 'even' step the pairing is switched). No barriers are allowed, and there is only one reduction, at the end, which checks that the elements have been properly sorted. No global synchronizations are allowed during iterations, the objective of the assignment is to learn to handle asynchrony.
- -For example:
-Even steps: 0--1 2--3 4--5 6--7
-Odd steps: 0 1--2 3--4 5--6 7
-N is a command line parameter.
- -No SDAG is to be used.
diff --git a/content/exercises/particle.html b/content/exercises/particle.html deleted file mode 100644 index ff77812..0000000 --- a/content/exercises/particle.html +++ /dev/null @@ -1,75 +0,0 @@ ---- -title: Chare Arrays Particle Exercise -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - -Chare Arrays: Particle Exercise
- - -
-
- ![]() |
- Figure 1 - |
Random Migrating Particles
- -For this exercise, you will write code that simulates a set of particles moving - randomly in a 2-dimensional space within a bounding box. The coordinates of the - overall simulation box are between 0.0 and 100.0 along each dimension. The particles - are divided among chares based on their coordinates. The chares should be organized - in a 2-dimensional array of chares (using a chare array) of size k \times k. So, each - chare owns a bounding box of its own with size 100.0/k. The particles in each chare - can be stored as a vector.
- -Your program should generate n particles per chare during construction with a - random (but valid, i.e. within the chare) position for particles. Your program - should accept the number of particles per cell n, and k as command line parameters - in that sequence.
- -Expected Output: Your program should calculate and print to screen the maximum and - total number of particles every 10 iterations. Additionally, the simulation should - not be delayed by this calculation (i.e. you should use reductions).
- -For testing your program, you can use 10000 (=n) particles per chare, simulated over 100 steps - and a chare array of size 16 \times 16 (k=16). Experiment with different number of particles and - chare array sizes.
- -Note: There might be multiple particles having the same x and y coordinates, especially - if you increase the density of each cell. You do not need to handle this case separately; - it is a valid case assumption.
- --
-
for(iteration=0; iteration<ITERATION; iteration++){
- -
-
-
-
}
-Instructor's Note: This exercise can be implemented in several ways, feel free to choose from the variations below or create your own based on your learning objectives:
--
-
Basic Chares: Primality Testing Exercise
-Part A
-Write a program based on the outline below. -(Note that the program has a few artificial restrictions/elements that -are meant to teach you specific concepts. So, please follow the -instructions to the letter.)
- -The main chare generates K random integers, and fires a -checkPrimality chare for each. -The chare checks if the number given to it is a prime using a variant of the function -below, and returns the result to the main chare. The main chare -maintains an array of pairs: <number, Boolean>, and prints it at the -end. An entry should be added to this array (with the number being tested, and a -default value such as "False") as soon as the chare is fired. In -particular, you are not allowed to delay adding the entry after the -result is returned by the chare. -Make sure that your program does not search the array when a response -comes. So, figure out a bookkeeping scheme to avoid it.
- -Obtain K from a command line argument. You may use rand() from -the math library for generating random integers.
- -For testing primality, use the following function. For extra credit, -modify it so that it is not -testing for every i, but (a) avoids testing even numbers except 2 and -(b) don’t let the loop run all the way to “number-1”).
- -
-
-int isPrime(const long number)
-
-{
- if(number<=1) return 0;
- for(int i=2; i<number; i++)
- {
- if(0 == number%i)
- return 0;
- }
- return 1;
-}
-
Part B (grainsize control)
- -Measuring performance and improving it via grainsize control:
- -Grainsize control -is a way to improve performance of the above program. -Use information from the Charm++ manual about how to pass -arrays of data to entry methods, and send a bunch (M) of numbers to be -tested to each -new Chare, and experiment with different values of M to get good -performance. -You may wish to read M as a command line parameter, for ease of experimentation. -Measure -performance by adding two calls to CkTimer() in the main chare, one -just before -starting creation of checkPrimality chares, and the other after all -the results have been returned (but before they are printed), and -printing the difference between the timers. You may -omit (and probably should omit) printing primality results for performance runs. -Vary M and report smallest M for -which performance was within 5% infinite grainsize (i.e. $M == K$). -Again, make sure our artificial restriction is obeyed: do not send -back the numbers the number being tested (because you are not allowed -to search for it anyway).
- -Part C:
-Let the numbers being tested be 64 bit random numbers. For simplicity, -generate them by concatenating 2 32 bit random numbers.
\ No newline at end of file diff --git a/content/exercises/projections.html b/content/exercises/projections.html deleted file mode 100644 index 2e005a6..0000000 --- a/content/exercises/projections.html +++ /dev/null @@ -1,33 +0,0 @@ ---- -title: Projections Particle Exercise (Extension) -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -Projections: Particle Exercise (Extension)
- -For this exercise, continue by building upon your work for the previous exercise. You will now work to enable dynamic migration so Charm++ balancers can actually move work around.
- --
-
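As a reminder of the mechanics involved (a rough sketch, not part of the exercise text; Cell, endOfIteration, startNextIteration, and LB_PERIOD are illustrative names, and the chare also needs a working pup() routine so it can migrate):

    Cell::Cell() {
      usesAtSync = true;              // opt this chare array into runtime-triggered load balancing
    }

    void Cell::endOfIteration() {
      if (iteration % LB_PERIOD == 0)
        AtSync();                     // safe point: the runtime may migrate this chare now
      else
        startNextIteration();
    }

    void Cell::ResumeFromSync() {     // called by the runtime once balancing is complete
      startNextIteration();
    }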
Getting Started with Charm++
--
-
-
-
History
- -The Charm software was developed as a group effort of the Parallel Programming Laboratory at the University of Illinois at Urbana-Champaign. Researchers at the Parallel Programming Laboratory keep Charm++ updated for new machines, new programming paradigms, and for supporting and simplifying the development of emerging applications for parallel processing.

The earliest prototype, Chare Kernel (1.0), was developed in the late eighties. It consisted only of basic remote method invocation constructs available as a library. The second prototype, Chare Kernel (2.0), was a complete re-write with major design changes. It included C language extensions to denote chares, messages, and asynchronous remote method invocation. Charm (3.0) improved on this syntax and contained important features such as information sharing abstractions and chare groups (called Branch Office Chares). Charm (4.0) included Charm++ and was released in fall 1993. Charm++ in its initial version consisted of syntactic changes to C++ and employed a special translator that parsed the entire C++ code while translating the syntactic extensions.

Charm (4.5) had a major change that resulted from a significant shift in the research agenda of the Parallel Programming Laboratory. The message-driven runtime system code of Charm++ was separated from the actual language implementation, resulting in an interoperable parallel runtime system called Converse. The Charm++ runtime system was retargeted on top of Converse, and popular programming paradigms such as MPI and PVM were also implemented on Converse. This allowed interoperability between these paradigms and Charm++. This release also eliminated the full-fledged Charm++ translator by replacing the syntactic extensions to C++ with C++ macros, and instead contained a small language and a translator for describing the interfaces of Charm++ entities to the runtime system. This version of Charm++, which in earlier releases was known as Interface Translator Charm++, is the default version of Charm++ now, and hence is referred to simply as Charm++.

In early 1999, the runtime system of Charm++ was rewritten in C++. Several new features were added. The interface language underwent significant changes, and the macros that replaced the syntactic extensions in the original Charm++ were replaced by natural C++ constructs. Late 1999 and early 2000 saw several additions to Charm++, when a load balancing framework and migratable objects were added to Charm++.

diff --git a/content/index.html b/content/index.html deleted file mode 100644 index 0b975a0..0000000 --- a/content/index.html +++ /dev/null @@ -1,121 +0,0 @@ ---- -title: Parallel Programming Framework -homec: home selected homeSelected -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - -
--
-
Charm++ is a parallel programming framework in C++ supported by - an adaptive runtime system, which enhances user productivity and allows - programs to run portably from small multicore computers (your laptop) to - the largest supercomputers.
- -It enables users to easily expose and express much of the parallelism - in their algorithms while automating many of the requirements for high - performance and scalability. It permits writing parallel programs in - units that are natural to the domain, without having to deal with - processors and threads.
- -Charm++ has been in production use for over 15 years in the - scientific and high performance computing communities and currently has - thousands of users in dozens of countries across a wide variety of - computing disciplines. It has been adopted by many computing teams, and - has been used to produce several large parallel applications. It is - actively developed, maintained, and supported by the - Parallel Programming Laboratory at - UIUC and its collaborators.
- - --
-
Charm++ Workshops:
- --Charm++ Workshop Introduction video. -For more tutorial material, slides and readings refer here. -
diff --git a/content/ppt_pdfs/tutorial_2023/01_a_CharmModel2023.pptx b/content/ppt_pdfs/tutorial_2023/01_a_CharmModel2023.pptx deleted file mode 100644 index 6369e15..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/01_a_CharmModel2023.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/01_a_CharmModel2023PDF.pdf b/content/ppt_pdfs/tutorial_2023/01_a_CharmModel2023PDF.pdf deleted file mode 100644 index 52e923b..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/01_a_CharmModel2023PDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/01_b_Grainsize_2023.pptx b/content/ppt_pdfs/tutorial_2023/01_b_Grainsize_2023.pptx deleted file mode 100644 index b6e2ac9..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/01_b_Grainsize_2023.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/01_b_Grainsize_2023PDF.pdf b/content/ppt_pdfs/tutorial_2023/01_b_Grainsize_2023PDF.pdf deleted file mode 100644 index eea744b..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/01_b_Grainsize_2023PDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/01_c_SomeBenefits.pptx b/content/ppt_pdfs/tutorial_2023/01_c_SomeBenefits.pptx deleted file mode 100644 index 7af457c..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/01_c_SomeBenefits.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/01_c_SomeBenefitsPDF.pdf b/content/ppt_pdfs/tutorial_2023/01_c_SomeBenefitsPDF.pdf deleted file mode 100644 index 34035a5..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/01_c_SomeBenefitsPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/02_a_HelloWorld.pptx b/content/ppt_pdfs/tutorial_2023/02_a_HelloWorld.pptx deleted file mode 100644 index 3abf699..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/02_a_HelloWorld.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/02_a_HelloWorldPDF.pdf b/content/ppt_pdfs/tutorial_2023/02_a_HelloWorldPDF.pdf deleted file mode 100644 index 4c70aa0..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/02_a_HelloWorldPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/02_b_CharmInterface.pptx b/content/ppt_pdfs/tutorial_2023/02_b_CharmInterface.pptx deleted file mode 100644 index e197a4e..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/02_b_CharmInterface.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/02_b_CharmInterfacePDF.pdf b/content/ppt_pdfs/tutorial_2023/02_b_CharmInterfacePDF.pdf deleted file mode 100644 index 8218f8b..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/02_b_CharmInterfacePDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/02_c_AsynchronousMethods.pptx b/content/ppt_pdfs/tutorial_2023/02_c_AsynchronousMethods.pptx deleted file mode 100644 index 4be5bd2..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/02_c_AsynchronousMethods.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/02_c_AsynchronousMethodsPDF.pdf b/content/ppt_pdfs/tutorial_2023/02_c_AsynchronousMethodsPDF.pdf deleted file mode 100644 index cb9c5d9..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/02_c_AsynchronousMethodsPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/04_a_ChareArrays.pptx b/content/ppt_pdfs/tutorial_2023/04_a_ChareArrays.pptx deleted file mode 100644 index 294a4af..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/04_a_ChareArrays.pptx and /dev/null differ diff --git 
a/content/ppt_pdfs/tutorial_2023/04_a_ChareArraysPDF.pdf b/content/ppt_pdfs/tutorial_2023/04_a_ChareArraysPDF.pdf deleted file mode 100644 index 812658b..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/04_a_ChareArraysPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/04_b_RuntimeBroadcastReduction.pptx b/content/ppt_pdfs/tutorial_2023/04_b_RuntimeBroadcastReduction.pptx deleted file mode 100644 index 6298904..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/04_b_RuntimeBroadcastReduction.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/04_b_RuntimeBroadcastReductionPDF.pdf b/content/ppt_pdfs/tutorial_2023/04_b_RuntimeBroadcastReductionPDF.pdf deleted file mode 100644 index ed3b548..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/04_b_RuntimeBroadcastReductionPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/06_a_LoadBalancing.pptx b/content/ppt_pdfs/tutorial_2023/06_a_LoadBalancing.pptx deleted file mode 100644 index 8bc7e45..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/06_a_LoadBalancing.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/06_a_LoadBalancingPDF.pdf b/content/ppt_pdfs/tutorial_2023/06_a_LoadBalancingPDF.pdf deleted file mode 100644 index 650b654..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/06_a_LoadBalancingPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/06_b_PUP.pptx b/content/ppt_pdfs/tutorial_2023/06_b_PUP.pptx deleted file mode 100644 index 0146e5c..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/06_b_PUP.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/06_b_PUPPDF.pdf b/content/ppt_pdfs/tutorial_2023/06_b_PUPPDF.pdf deleted file mode 100644 index d33f587..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/06_b_PUPPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/06_c_FaultTolerance.pptx b/content/ppt_pdfs/tutorial_2023/06_c_FaultTolerance.pptx deleted file mode 100644 index 98a02cf..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/06_c_FaultTolerance.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/06_c_FaultTolerancePDF.pdf b/content/ppt_pdfs/tutorial_2023/06_c_FaultTolerancePDF.pdf deleted file mode 100644 index 9ec6c4a..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/06_c_FaultTolerancePDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/08_a_sdag.pptx b/content/ppt_pdfs/tutorial_2023/08_a_sdag.pptx deleted file mode 100644 index dc1ee07..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/08_a_sdag.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/08_a_sdagPDF.pdf b/content/ppt_pdfs/tutorial_2023/08_a_sdagPDF.pdf deleted file mode 100644 index 926ee3a..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/08_a_sdagPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/08_b_ParallelPrefixFixed.pptx b/content/ppt_pdfs/tutorial_2023/08_b_ParallelPrefixFixed.pptx deleted file mode 100644 index f9a0cea..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/08_b_ParallelPrefixFixed.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/08_b_ParallelPrefixFixedPDF.pdf b/content/ppt_pdfs/tutorial_2023/08_b_ParallelPrefixFixedPDF.pdf deleted file mode 100644 index 14af1ac..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/08_b_ParallelPrefixFixedPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/09_advanced.pptx 
b/content/ppt_pdfs/tutorial_2023/09_advanced.pptx deleted file mode 100644 index 8acd600..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/09_advanced.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/09_advancedPDF.pdf b/content/ppt_pdfs/tutorial_2023/09_advancedPDF.pdf deleted file mode 100644 index 315058f..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/09_advancedPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/09xy_sections.pptx b/content/ppt_pdfs/tutorial_2023/09xy_sections.pptx deleted file mode 100644 index acf50bf..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/09xy_sections.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/09xy_sectionsPDF.pdf b/content/ppt_pdfs/tutorial_2023/09xy_sectionsPDF.pdf deleted file mode 100644 index d95c03a..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/09xy_sectionsPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/10_appDesign_2023.pptx b/content/ppt_pdfs/tutorial_2023/10_appDesign_2023.pptx deleted file mode 100644 index 790bdea..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/10_appDesign_2023.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/10_appDesign_2023PDF.pdf b/content/ppt_pdfs/tutorial_2023/10_appDesign_2023PDF.pdf deleted file mode 100644 index 1790cca..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/10_appDesign_2023PDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/11_threadedMethods.pptx b/content/ppt_pdfs/tutorial_2023/11_threadedMethods.pptx deleted file mode 100644 index d3babac..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/11_threadedMethods.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/11_threadedMethodsPDF.pdf b/content/ppt_pdfs/tutorial_2023/11_threadedMethodsPDF.pdf deleted file mode 100644 index 01d3df2..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/11_threadedMethodsPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/13_a1_LibsAndModules.pptx b/content/ppt_pdfs/tutorial_2023/13_a1_LibsAndModules.pptx deleted file mode 100644 index b33aa86..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/13_a1_LibsAndModules.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/13_a1_LibsAndModulesPDF.pdf b/content/ppt_pdfs/tutorial_2023/13_a1_LibsAndModulesPDF.pdf deleted file mode 100644 index 1548507..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/13_a1_LibsAndModulesPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/13_a2_prefixAsLib.pptx b/content/ppt_pdfs/tutorial_2023/13_a2_prefixAsLib.pptx deleted file mode 100644 index 208a8c2..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/13_a2_prefixAsLib.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/13_a2_prefixAsLibPDF.pdf b/content/ppt_pdfs/tutorial_2023/13_a2_prefixAsLibPDF.pdf deleted file mode 100644 index e8e1288..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/13_a2_prefixAsLibPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/Projections.pptx b/content/ppt_pdfs/tutorial_2023/Projections.pptx deleted file mode 100644 index d077f19..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/Projections.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/ProjectionsPDF.pdf b/content/ppt_pdfs/tutorial_2023/ProjectionsPDF.pdf deleted file mode 100644 index 0b45224..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/ProjectionsPDF.pdf and 
/dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/charm4py.pptx b/content/ppt_pdfs/tutorial_2023/charm4py.pptx deleted file mode 100644 index cae8c9a..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/charm4py.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/charm4pyPDF.pdf b/content/ppt_pdfs/tutorial_2023/charm4pyPDF.pdf deleted file mode 100644 index 8e6b855..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/charm4pyPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/examplesMedianQDhistoSort.pptx b/content/ppt_pdfs/tutorial_2023/examplesMedianQDhistoSort.pptx deleted file mode 100644 index 7dd1c1e..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/examplesMedianQDhistoSort.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/examplesMedianQDhistoSortPDF.pdf b/content/ppt_pdfs/tutorial_2023/examplesMedianQDhistoSortPDF.pdf deleted file mode 100644 index 1654da3..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/examplesMedianQDhistoSortPDF.pdf and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/usingGpgpusInCharm.pptx b/content/ppt_pdfs/tutorial_2023/usingGpgpusInCharm.pptx deleted file mode 100644 index 7f75a9a..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/usingGpgpusInCharm.pptx and /dev/null differ diff --git a/content/ppt_pdfs/tutorial_2023/usingGpgpusInCharmPDF.pdf b/content/ppt_pdfs/tutorial_2023/usingGpgpusInCharmPDF.pdf deleted file mode 100644 index 39615e5..0000000 Binary files a/content/ppt_pdfs/tutorial_2023/usingGpgpusInCharmPDF.pdf and /dev/null differ diff --git a/content/progmodel.html b/content/progmodel.html deleted file mode 100644 index 5025ca5..0000000 --- a/content/progmodel.html +++ /dev/null @@ -1,70 +0,0 @@ ---- -title: Programming Model ---- - -Programming Model
- -Object-based program design
--
-
Globally addressable objects
--
-
Globally invocable methods
--
-
Asynchronous methods
--
-
Object collections
--
-

Object placement
--
-
Migratable / serializable objects
--
-
Decompose data across object collection
--
-
Decompose data manipulation across methods
- -Easily task parallelism
- diff --git a/content/release.html b/content/release.html deleted file mode 100644 index a72977f..0000000 --- a/content/release.html +++ /dev/null @@ -1,7 +0,0 @@ ---- -title: Release Info ---- - -Release Information
- - diff --git a/content/tools.html b/content/tools.html deleted file mode 100644 index 25928a0..0000000 --- a/content/tools.html +++ /dev/null @@ -1,91 +0,0 @@ ---- -title: Tools -homec: home -tutorialc: tutorial -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools selected toolsSelected -helpc: help ---- -Tools
--
Parallel Performance Analysis: Projections
-


-
-
-
The latest development version of Projections can be downloaded directly from our source archive. The Git version control system is used, which is available from here.
--
-
-
-
Parallel Debugging: Charm Debug
-
-
- In addition, Charm++ offers several additional features designed to simplify application development. Linking with "-memory paranoid" checks all dynamic heap allocation calls for common errors, such as double-delete, random-delete, read-after-delete, buffer over- and under-write errors. Charm++, when compiled without "-DCMK_OPTIMIZE", contains hundreds of assertions to catch invalid parameters and uninitialized data passed to API routines. -
-
- Charm Debug is a sophisticated parallel debugger, with the ability to - set breakpoints, examine variables, objects, and messages across the - entire machine. The design of this debugger is described in the research page. -
-
- -
The latest development version of Charm Debug can be downloaded directly from our source archive. The Git version control system is used, which is available from here.
--
-
-
Mapping Chare Array Elements to Physical Processors
- -- For various reasons (performance, etc.) the programmer may wish to control where the initial chare - array elements in a chare array reside (that is, on which physical processor they are located). There - are multiple methods the programmer can use to control the initial placement of chare array elements - in a chare array on the physical processors. -
- - -Inserting Individual Chare Array Elements
- -- The most direct method of controlling which processors the chare array elements are located on - initially is to explicitly create the chare array element one-by-one, specifying with processor - it should be located on during creation. This can be done by creating the chare array with a - call to myArrayProxy = CProxy_myArrayClass::CkNew() with no arguments (or the number of elements set to zero - if a CkArrayOptions object is used). Once the chare array has been created, individual chare - array elements can be inserted by using the myArrayProxy[i].insert(pe) function call where i - is the index of the chare array element to create and pe is the processor number on which the - chare array element should be created on. Once all chare array elements have been added to the array, - the myArrayProxy::doneInserting() call must be called to indicate to the Charm++ Runtime System - that the initial elements of the chare array have been added. Additional chare array elements can be - added at a later time followed by an additional call to myArrayProxy::doneInserting(). -
- - -Using the CkArrayMap Class
- -- Chare arrays also have an associated CkArrayMap class which maps indexes in the arrays - to specific processors. By default, a round-robin (RRMap) map object is used. - When the chare array is being created, the programmer can specify which map object the array - should use via a call to CkArrayOptions::setMap(). For example, instead of round-robin, - the programmer might use the BlockMap map class to have the objects mapped in a blocked - scheme. -
- -- The programmer may create their own map object to be used for a chare array. Simply create a - child class of the CkArrayMap class. The important member functions are outlined below. Then, - create an instance of the class and associate the map object with the chare array (via CkArrayOptions) - during array creation time. -
- -- CkArrayMap::populateInitial(): This function is responsible for creating all of the - initial array elements located on the physical processor. That is, it is called on every processor - and makes a call to the chare array manager's insertInitial() function for each of the chare - array elements local to that processor (triggering the creation of the actual chare array elements on - the physical processor). This is a virtual function; the child class can provide - its own implementation of this function. However, one is provided by CkArrayMap itself which - simply uses the CkArrayMap::procNum() function to decide where the chare array elements - initially reside. -
- -- CkArrayMap::procNum(): This function simply maps an array index (input) to the number - of the physical processor (output). For example, the round-robin map class's (RRMap) version - of this function simply returns 'index % CkNumPes()'. This function is a pure virtual member - function. The child class must provide an implementation of this function. -
- -- CkArrayMap::homePe(): This function is similar to the procNum() function, however, the value - returned indicates which physical processor is the home processor for the chare array element. - This is a virtual function, the child class can provide its own implementation of this function. However, - one is provided by CkArrayMap itself which simply returns the same value that procNum() returns. -
\ No newline at end of file diff --git a/content/tutorial/ArraySections.html b/content/tutorial/ArraySections.html deleted file mode 100644 index 4f37aab..0000000 --- a/content/tutorial/ArraySections.html +++ /dev/null @@ -1,135 +0,0 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -Array Sections
- -- Sometimes only certain elements of a chare array need to have messages sent to them (i.e. their entry - methods invoked). Furthermore, sometimes the same subset of chares in the chare array may need to - have this occur many times throughout the course of the application execution. - Array sections are a way of specifying a arbitrary subset of elements in a chare array. Once the - array section has been defined, it can be used to multicast a message to the members of the chare - array subset. -
- - -Defining and Using Chare Array Sections
- -- The first way of creating array sections is to specify a triplet of numbers to the constructor of the - array section proxy's constructor. The triplet contains three values: the starting index, the ending - index, and a stride. One triplet can be given for each dimension of the array. For example, consider - a 1D chare array. To create array sections, one containing all of the even numbered indexes (including - index 0) and one containing all the odd numbered indexes, the following code could be used. -
- -
-
- // Create a chare array
- |
-
- Figure 1: Code to create even and odd array sections for a 1D chare array - |
- While the triplet method is a quick and easy way of quickly defining large subsets of chare array elements, - the subsets need to be regular. This is not always sufficient to identify useful subsets. Another method - of defining a subset of elements is to create a vector of indexes and then create the array section using - the vector of indexes. For example, the following code will also create an array section containing all of - the odd index elements in the chare array, however, it will do it by creating a vector with all the odd - indexes. -
- -
-
- // Create a chare array
- |
-
- Figure 2: Code to create an odd array section for a 1D chare array using a vector of indexes - |
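A condensed sketch of the two styles, assuming a 1D Hello chare array reachable through helloProxy (the exact ckNew overloads can vary between Charm++ versions, so consult the manual section cited below):

    // Triplet style: a section holding every even-indexed element.
    CkArrayID aid = helloProxy.ckGetArrayID();
    CProxySection_Hello evenSection = CProxySection_Hello::ckNew(aid, 0, numElements - 1, 2);

    // Vector-of-indexes style: a section holding every odd-indexed element.
    std::vector<CkArrayIndex> elems;
    for (int i = 1; i < numElements; i += 2)
      elems.push_back(CkArrayIndex1D(i));
    CProxySection_Hello oddSection = CProxySection_Hello::ckNew(aid, elems.data(), (int)elems.size());

    // Invoking an entry method on a section proxy multicasts it to just those members.
    evenSection.sayHi();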
-
- ![]() |
- Figure 3: Indexing scheme for array proxies and array section proxies - |
Indexing for Individual Elements of Array Sections
- -- The individual elements in the array section can be index individually. Starting with index zero, each - member of the subset of elements (i.e. the array section members) can be individually indexed according - to the scheme presented in Figure 3. -
- - -More Information
- -- For more information on callbacks, please see - Section 3.8.13: Array Section of the - The Charm++ Programming Language Manual -
diff --git a/content/tutorial/BasicsOfCharmProgramming.html b/content/tutorial/BasicsOfCharmProgramming.html deleted file mode 100644 index 2c9cde0..0000000 --- a/content/tutorial/BasicsOfCharmProgramming.html +++ /dev/null @@ -1,36 +0,0 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -Basics of Charm++ Programming
- -- Throughout this section, we will be using a simple Hello World program to demonstrate - the ideas presented. -
- - -Components of a Charm++ Program
- - - -Basic "Hello World" Program (single processor, single chare)
- - - -Array "Hello World": A Slightly More Advanced "Hello World" Program (multiple processors, multiple chares)
- - - -Broadcast "Hello World": A Parallel "Hello World" Program (multiple processors, multiple chares)
- - \ No newline at end of file diff --git a/content/tutorial/BroadcastHelloWorld.html b/content/tutorial/BroadcastHelloWorld.html deleted file mode 100644 index 31a7651..0000000 --- a/content/tutorial/BroadcastHelloWorld.html +++ /dev/null @@ -1,456 +0,0 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -Broadcast "Hello World" Program: A Parallel "Hello World" Program
- - - -
- ![]() |
- Figure: Control Flow of the Broadcast "Hello World" Program - |
- This version of "Hello World" is basically the same as the Array "Hello World" program - except that all of the Hello chare objects will be told to say "Hello" at the same time - instead of each of them doing it one-by-one. The figure to the right shows the control - flow of the program. The source code is located below. Here are the differences from - the Array "Hello World" program: -
- --
-
- - -
Broadcast "Hello World" Code
- -- The source code for this example can be found - here (BroadcastHelloWorld.tar.gz). -
- - -The "Hello" Chare Class
- - -
-
-
-
|
-
-
-
-
-
|
-
-
The "Main" Chare Class
- - -
-
-
-
|
-
-
-
-
-
|
-
-
Makefile
- -- The makefile for this program is the same as it is for the Array "Hello World" program. -
- - -Makefile | -
-
-
-
- CHARMDIR = [put Charm++ install directory here]
- |
-
Output
- -- The only difference in the output of this program and the Array "Hello World" program is - that all of the Hello chare objects are told to sayHi() by the Main chare object. - This is reflected by the fact that all the output lines below have "told by -1" in them. -
- - -
-
-
-
- $ ./charmrun +p3 ./hello 10
- |
-
- Figure: Output of the Broadcast "Hello World" Program - | -
Building the Charm++ Runtime System
- -- In the root directory of the Charm++ distribution is a build script. This build script - is used to compile the Charm++ Runtime System for a particular platform. The exact platform that - should be used depends on the setup of the machine being used by the user. For the sake of this - example, we will assume that the target platform is a cluster of Linux workstations. - - Please Note: If the build script is executed with the "--help" command-line argument - (i.e. the command "./build --help"), it will print out a usage message which includes a list - of platforms available. - -
- -
- The general usage form of the build script is:
-
- build <target> <version> <options> [ charmc-options ... ]
-
- where: -
- --
-
Examples
- -
- (1) For a cluster of Linux workstations connected by an Ethernet network (32-bit x86 architecture):
- ./build charm++ net-linux
-
- (2) 1 with SMP support:
- ./build charm++ net-linux smp
-
- (3) 1 with the "-O3" compiler option on by default for charmc:
- ./build charm++ net-linux "-O3"
-
- (4) 1 with both SMP support and the "-O3" compiler option on by default for charmc:
- ./build charm++ net-linux smp "-O3"
-
-
- - WARNING: Depending on the speed of the machine being used and the options specified, the build - process of the Charm++ Runtime System can take several minutes (as high as 10-20 minutes for - older machines). - -
diff --git a/content/tutorial/Callbacks.html b/content/tutorial/Callbacks.html deleted file mode 100644 index f0856e9..0000000 --- a/content/tutorial/Callbacks.html +++ /dev/null @@ -1,88 +0,0 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -Callbacks
- -- Callbacks are used by the Charm++ Runtime System, frameworks, libraries, and so on to notify - application code that some kind of event has occurred. For example, reductions use callbacks to - notify the application code when the reduction has completed. The basic idea is that the application - code can specify what action to take when the callback is triggered (i.e. the specific - event occurs). The action taken can be a variety of things including calling function or - having the application exit. -
- -"Do Nothing" Callback
- -- To create a callback object that does nothing, simply create it as follows: - CkCallback cb(CkCallback::ignore). When triggered, the callback does nothing. -
- -CkExit Callback
- -- To create a callback object that will cause CkExit() to be called when the callback is triggered, - simply create it as follows: CkCallback cb(CkCallback::ckExit). -
- -C Function Callback
- -- To create a callback object that will cause a C function to be executed, create it as - follows: CkCallback cb(CkCallbackFunc cFunc, void* param). Here, the type of the - CkCallbackFunc is void myCFunction(void* param, void* msg). The param - pointer used when the callback object was created will be passed to the C function as the - param parameter. The msg parameter will point to the message created by whatever - entity (e.g. a library) triggers the callback. -
- -Entry Method Callback
- -- To create a callback object that will cause an entry method to be called on a chare object, - create it as follows: CkCallback cb(int entryMethodIndex, CkChareID &id). Each entry - method for a chare class has an associated entry method index. The entry method - index is used here to identify which entry method on the chare object (specified by the - id parameter) should be executed. -
- -- To get the entry method index for any given - entry method, do the following: - int entryMethodIndex = CkIndex_MyChareClass::myEntryMethod(...parameters for entry method...) - where MyChareClass is the name of the chare class and myEntryMethod is the name - of the entry method. The parameters passed to this function are not actually used when retrieving - the entry method index; they are only used to identify which entry method is actually being - called in cases where multiple entry methods share the same name (i.e. are overloaded). For example, - to get the entry method index for an entry method with the following prototype: - void MyChareClass::myEntryMethod(int* x) one could use - int myEntryMethodIndex = CkIndex_MyChareClass::myEntryMethod((int*)NULL). (Remember, this is not - actually calling myEntryMethod so it is alright for the pointer to have a NULL value. Only - the type of the parameter is important so overloaded functions can be differentiated.) -
- -Example Usage
- -- For an example of how a callback would be created an used, please see the - Reductions Section. -
- -More Information
- -- For more information on callbacks, please see - Section 3.15: Callbacks of the - The Charm++ Programming Language Manual -
diff --git a/content/tutorial/CommonFunctions.html b/content/tutorial/CommonFunctions.html deleted file mode 100644 index 59ae7cc..0000000 --- a/content/tutorial/CommonFunctions.html +++ /dev/null @@ -1,176 +0,0 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -Commonly Used (or Otherwise Useful) Functions
- - - -- - Processor/Node/Rank Information - | -|
int CkMyPe(): | -- Returns the processor number for the processor it is called on (0 through P-1 - where P is the number of processors available to the application). - | -
int CkNumPes(): | -- Returns the number of processors available to the application. - | -
int CkMyNode(): | -- Returns the node number of the node it was called on (0 through N-1 - where N is the number of nodes available to the application) and a node is - defined as a single address space. - | -
int CkNumNodes(): | -- Returns the number of nodes available to the application. - | -
int CkMyRank(): | -- Returns the rank number of the processor it was called on. The rank of a processor - is its number within the node (address space) starting at 0. For example, if a single - node contains four processors (i.e. a node size of four), the processors' rank numbers - would be 0 through 3 with each having a unique value. - | -
int CkNodeSize(): | - Returns the number of processors within the node it was called on. - |
int CkNodeFirst(int nodeNumber): | -- Returns the processor number of the processor at rank zero within the specified node (address - space). - | -
int CkNodeOf(int procNumber): | -- Returns the node number of the specified processor. - | -
int CkRankOf(int procNumber): | -- Returns the rank number of the specified processor. - | -
- - Program Termination - | -|
void CkExit(): | - This function causes the Charm++ application to terminate. The call does not return. All other processors are notified that execution of the application should end. - |
void CkExitAfterQuiescence(): | -- Informs the Charm++ Runtime System that the Charm++ application should exit if quiescence is - detected. Quiescence is described in - Section 3.13: Quiescence Detection of the - The Charm++ Programming Language Manual. - | -
void CkAbort(const char* message): | - This function causes the Charm++ application to abort. The specified message is displayed before the application terminates. This function does not return. - |
- - Timing Functions - | -|
double CkCpuTimer(): | -- Returns the value of the system timer (in seconds). The system timer is started when the - application begins and measures processor time (both user and system time). - | -
double CkWallTimer(): | -- Returns the amount of time that has elapsed since the application started from the wall - clock timer (in seconds). - | -
double CkTimer(): | -- Aliases to either CkCpuTimer() or CkWallTimer() depending on the system being used. Typically, - dedicated machines (i.e. program has complete use of CPU such as ASCI Red) this function aliases to - CkWallTimer(). For machines which have multiple processes sharing the same CPU (such as a - workstation), this function aliases to CkCpuTimer(). - | -
Particles Code
- - -
-
- ![]() |
- Figure 1 - |
Description
- -- For the assignment, you will write code that simulates a set of particles moving randomly in a 2-dimensional space within a bounding box. The coordinates of the overall simulation box are between 0.0 and 100.0 along each dimension. The particles are divided among chares based on their coordinates. The chares should be organized in a 2-dimensional array of chares (using a chare array) of size k x k. So, each chare owns a bounding box of its own with size 100.0/k. See Figure 1. -
- -- Your program should generate n particles per chare during construction with a random (but valid, i.e. within the chare) position for particles. Your program should accept the number of particles per cell n, and k as command line parameters in that sequence. -
- -Skeleton Code
- -- A base code for Particles Code can be found here. The skeleton code includes base code for Mainchare Main, 2-D Chare Array Cell and Particle class representing the particles the Chare Array contains. There are also comments in the skeleton code that will guide you through the assignment. -
- -Expected Output
- -- Your program should calculate and print to screen the maximum and total number of particles every 10 iterations. Use the provided print function. Additionally, the simulation should not be delayed by this calculation (i.e. you should use reductions). -
- -- For testing your Particles Code, you can use 10000 (=n) particles per chare, simulated over 100 steps and a chare array of size 16 x 16 (k=16). Experiment with different number of particles and chare array sizes as our test cases will use values in addition to the defaults. -
- -- Note: There might be multiple particles having the same x and y coordinates, especially if you increase the density of each cell. You do not need to handle this case separately; it is a valid case assumption. -
- - - --
-
-
for(iteration=0; iteration < ITERATION; iteration++){
- -
-
-
-
}
- Particles Code with Load Balancing and Performance Analysis
- - -
-
- ![]() |
- Figure 1 - |
Description
- -- This assignment is an extension of Particles Code. You will use your code from Particles Code and extend it to do load balancing and visualization using LiveViz. -
- -- In Particles Code, the particles moved randomly. There was no prominent load imbalance between chare array elements. Now we will create load imbalance between chares by coloring the particles, moving them at different speeds, and changing their initial distribution. -
- --
-
Part A) Load Balancing and Projections:
- -- In this part, you'll try 3 different load balancing strategies and comment on the behavior you observe. The first two load balancing strategies you will use are GreedyLB and RefineLB. For the third strategy, you will get to choose one of the other load balancing strategies. Observe the effect of load balancing (overhead and benefit) on the total execution time of the application using Projections. Are they beneficial? Why or why not? How much is the overhead? Which strategy is the best for this Particle application? etc. -
- -Part B) Visualization using LiveViz:
- -- In this part you will visualize the 2-D grid with moving particles using LiveViz. Particles should be shown in color. You need to submit two images from the visualization: one in the beginning showing the initial particle distribution and one when the application is more advanced (after the particles are moved and intermixed). -
- -- Please read the LiveViz manual for details about LiveViz setup and usage. You can also look at the Wave2D application for an example code that uses LiveViz. It's located in charm/examples/charm++/wave2d. -
- diff --git a/content/tutorial/PiExample.html b/content/tutorial/PiExample.html deleted file mode 100644 index a354433..0000000 --- a/content/tutorial/PiExample.html +++ /dev/null @@ -1,158 +0,0 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -Introducing Reductions
- - -Reductions
- -- By now, we've spent a lot of time in this tutorial talking about how to decompose problems - into chares and distribute them across processors. That's great, but what good is splitting - up your problem if you can't put it back together? When you need to combine values from an array - of chares, you can turn to one of the most important parallel programming tools: a reduction. -
- -- Reductions turn data that is scattered across a chare array into a single value using a reduction operation, such as sum or maximum. Each of the chares in the array contributes some local data, and the runtime system combines those contributions and delivers the result to a single target, typically through a callback. -
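As a minimal sketch (the class and method names here are assumptions), a sum reduction over one integer per element looks like this:

// Each array element contributes one int; the runtime sums the contributions
// and invokes the callback exactly once with the result.
void MyArray::startReduction() {
  int localValue = computeLocalValue();
  contribute(sizeof(int), &localValue, CkReduction::sum_int,
             CkCallback(CkReductionTarget(Main, reportSum), mainProxy));
}

// Declared in the .ci file as:  entry [reductiontarget] void reportSum(int total);
void Main::reportSum(int total) {
  CkPrintf("Sum over all elements: %d\n", total);
}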
- -"Hello World" Code
- -The "Hello" Chare Class
- - -
diff --git a/content/tutorial/Preface.html b/content/tutorial/Preface.html
deleted file mode 100644
index 4140a91..0000000
--- a/content/tutorial/Preface.html
+++ /dev/null
@@ -1,48 +0,0 @@
----
-title: Tutorial
-homec: home
-tutorialc: tutorial selected tutorialSelected
-applicationsc: applications
-miniAppsc: miniApps
-downloadc: download
-toolsc: tools
-helpc: help
----
-
-
-
-
-
- ![]() |
- - Summary: When projections is initially opened, a summary graph of the application execution - is presented to the user (assuming the summary module was linked to the application; see the - Compile Time Options section below). The utilization is an - aggregate of all the processors. - | -
-
- ![]() |
- - Overview: In the overview graph, each processor is represented by a horizontal bar - with time increasing from left to right. The color of the bar at that point in time - represents the utilization of that processor (see the color scale to the left of the overview - graph). - |
Using Projections
- - -Compile Time Options
- -Runtime Options
diff --git a/content/tutorial/ShadowArrays.html b/content/tutorial/ShadowArrays.html deleted file mode 100644 index ee04260..0000000 --- a/content/tutorial/ShadowArrays.html +++ /dev/null @@ -1,80 +0,0 @@ ---- -title: Tutorial -homec: home -tutorialc: tutorial selected tutorialSelected -applicationsc: applications -miniAppsc: miniApps -downloadc: download -toolsc: tools -helpc: help ---- - - - -Shadow Arrays (AKA: Bound Arrays)
- -- It may be useful to have corresponding elements of two or more chare arrays to be located on the - same processor. For example, if the corresponding elements of chare arrays X and Y - frequently communicate with one another, it would be advantageous to have those elements, - X[i] and Y[i], on the same processor to reduce communication costs. Shadow arrays - in Charm++ are a way of accomplishing just this type of behavior. When two or more arrays are - bound, the Charm++ Runtime System ensures that the objects are always located on the same - physical processor. -
- - -
-
-
-
- // Assumed variable declarations
- |
- Figure 1: Code to create two chare arrays, myArray1 and myArray2, which are - bound together. - |
- Shadow (or bound) arrays create a relationship in terms of element mapping and load balancing between two or more arrays. The idea is fairly straightforward. First, a chare array is created in the standard manner. Then, a second array is created as a shadow of the first array (or bound to the first array). This is done through the CkArrayOptions parameter to the ckNew call that creates the second array. Additional arrays can also be bound to this set of bound arrays in a similar manner. -
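A minimal sketch of that creation sequence (the array class names and numElements are assumptions):

// Create the first array normally, then bind a second array to it via
// CkArrayOptions::bindTo(), so corresponding elements share a processor.
CkArrayOptions opts(numElements);
CProxy_MyArray1 a1 = CProxy_MyArray1::ckNew(opts);

CkArrayOptions boundOpts(numElements);
boundOpts.bindTo(a1);
CProxy_MyArray2 a2 = CProxy_MyArray2::ckNew(boundOpts);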
- -- There is no restriction to the type or number of chare elements in the arrays. The bound arrays can - be instances of the same chare array class or be different chare array classes. Also, the arrays do not - have to have the same number of elements. For indexes where the other array doesn't have a corresponding - element, the element that exists is free to move between processors arbitrarily. The Charm++ Runtime - System simply ensures that when there are corresponding elements (that is, elements at - corresponding indexes), in a set of bound arrays, the elements will be located on the same physical - processor. -
- -- Since the corresponding objects are always located on the same processor, the objects can take advantage - of this in various ways. For example, they can obtain local pointers to one another via a call to - ckLocal() on the proxy for the element. However, each time the objects migrate (i.e. the objects - are unpacked via the Pack-UnPack Routines), the local pointer needs - to be refreshed so it is a valid pointer on the new physical processor. -
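For example (class and proxy names are assumptions), an element can cache a pointer to its bound partner, remembering to re-fetch it after any migration:

// ckLocal() returns a pointer to the element if it lives on this processor,
// and NULL otherwise; for bound arrays the corresponding partner should be local.
Y *partner = yProxy[thisIndex].ckLocal();
if (partner != NULL) {
  partner->doSomethingLocally();   // hypothetical method on the partner element
}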
- -- For more information on bound arrays, please see Section 3.8.6: Advanced Array Creation: Bound Arrays of The Charm++ Programming Language Manual. -
diff --git a/content/tutorial/images/particlescode.png b/content/tutorial/images/particlescode.png deleted file mode 100644 index 252eca4..0000000 Binary files a/content/tutorial/images/particlescode.png and /dev/null differ diff --git a/content/tutorial/images/particlescode_lb.png b/content/tutorial/images/particlescode_lb.png deleted file mode 100644 index 69e3ec9..0000000 Binary files a/content/tutorial/images/particlescode_lb.png and /dev/null differ diff --git a/content/workshops_and_tutorials.html b/content/workshops_and_tutorials.html deleted file mode 100644 index 1940adc..0000000 --- a/content/workshops_and_tutorials.html +++ /dev/null @@ -1,72 +0,0 @@ ---- -title: Workshops and Tutorials ---- - -Workshops and Tutorials
- -Hybrid Tutorial 2023
-We have decided to postpone the normal workshop to spring 2024. Instead, we will conduct a 2-day Charm++ tutorial on October 23rd and 24th. --Dates: Monday October 23rd - Tuesday October 24th -
-Location: University of Illinois Urbana-Champaign (remote option available as well) -
-For in person attendees, both days of the tutorial will be at the Beckman Institute (Room 3035) on the UIUC campus. -
-For remote attendees, we will also have zoom available. -
-Registration: Please complete this form to register. -
-Schedule: -
-All times are central time -
-Monday October 23rd -
-
10:00 AM to 10:20 AM - Introduction
-10:20 AM to 10:50 AM - Chares and Asynchronous Method Invocations
-10:50 AM to 12:00 PM - Chare Arrays
-12:00 PM to 1:30 PM - Lunch/Hands on Exercise 1 - SimpleMax
-1:30 PM to 2:00 PM - Hands on Exercise 2 - CheckSorted
-2:00 PM to 2:30 PM - Load Balancing and Fault Tolerance
-2:30 PM to 3:00 PM - SDAG
-3:00 PM to 3:10 PM - Hands on Exercise 3 intro - Even/Odd Sort SDAG
-3:10 PM to 3:30 PM - Discussion and Q&A
-evening - hands on exercises with online help
--Day 1 Content -
- - - - - -
Asynchronous Methods: pptx pdf
- -Broadcast Reductions: pptx pdf
- - - - -Parallel Prefix Example: pptx pdf
-Advanced Topic Summary: pptx pdf
-Charm++ Application Examples: pptx pdf
--Tuesday October 24th -
-
10:30 AM to 11:15 AM - charm4py
-11:15 AM to 12:30 PM - Advanced Topics
-12:30 PM to 2:00 PM - Lunch/Exercises
-2:00 PM to 4:00 PM - Advanced Topics(continued)
-evening - hands on exercises with online help
--Day 2 Content -
- -
Median Calculation, Quiescence Detection, and Histogramming Example: pptx pdf
-Libraries and Modules: pptx pdf
- -Prefix as a Library Example: pptx pdf
- -Chare Array Sections: pptx pdf
- diff --git a/download/index.html b/download/index.html new file mode 100644 index 0000000..8b690e0 --- /dev/null +++ b/download/index.html @@ -0,0 +1,177 @@ + + + + + + +Downloads
+Charm++
+ +Latest release version: Charm++ v6.10.1 (tar.gz)
+Latest dev version: Current master (tar.gz)
+ +The latest development version of Charm++ can be downloaded directly from our + git source archive. To get the Charm++ source code:
+git clone https://github.com/UIUC-PPL/charm+
This will create a directory named charm. Move to this directory:
+cd charm+
And now build Charm++ (netlrts-linux example):
+./build charm++ netlrts-linux-x86_64 [ --with-production | -g ]+
This will make a netlrts-linux-x86_64 directory, with bin, include, lib, etc. subdirectories.
+Note that this development version may not be as portable or robust as the released + versions. Therefore, it may be prudent to keep a backup of old copies of + Charm++.
+ + +Projections
+ +Projections is Charm++'s performance analysis tool.
+ +Latest release binaries: Projections (tar.gz)
+ +To get the latest Projections source code:
+git clone https://github.com/UIUC-PPL/projections+
To build (requires gradle):
+cd projections+ + +
make
LiveViz and Charm Debug
+ +LiveViz is Charm++'s live program visualization tool. Charm Debug is Charm++'s parallel debugger.
+ +Latest Charm Debug release binaries: Charm Debug (tar.gz)
+ +To get the latest ccs_tools (includes both LiveViz and Charm Debug source code):
+git clone https://github.com/UIUC-PPL/ccs_tools+
To build:
+cd ccs_tools+ + +
ant
License
+ +Charm++ and associated software are licensed under the Charm++/Converse License
. + +Execution Model
@@ -111,3 +221,17 @@
Execution Model
an exception to this. They must be called from a threaded method, and so are allowed to return (certain types of) values. + +Chare Arrays: Design Exercise
+ +Data Balancing:
+ + +Assume you have a 1D chare array +A. Each chare (say A[i]) in it holds a vector of numbers. The size of +this vector is different on different chares (say sizei on +A[i]). Your task is to equalize the load on all processors by +exchanging the numbers. It is not necessary to do minimal data +movement, but it is desirable. The balance at the end needs to be +almost exact. If there are a total of N numbers, and v chares, there +should be between floor (N/v): ceil(N/v) items on each chare. Note +that the only way to send information to another chare is by sending +an (entry) method invocation to it.
+ +There are many distinct algorithms possible. Sketch the +alternatives without coding them, and write cost estimates for them. +Keep in mind that the simplest (i.e. approximate) cost model in +Charm++: entry methods invocation's cost \alpha + n . {\beta}, where +α is a fixed cost, and β is a per-byte cost. For the sake of +intuition, you may assume α is about a thousand times larger than +β, say a microsecond vs a nanosecond. +Reductions and broadcasts of size N data on P processors cost \alpha log (P) + +N \beta. +Keep in mind that many +(but not all) +of the algorithms for this problem have two phases: first phase to identify who +should send how many numbers to whom, and second to actually do the +data exchange. Make sure to write your time estimates for both phases. +Compare two of the interesting algorithms in terms of cost, performance +tradeoffs if any (e.g. is each algorithm better in different +scenarios), scalability and coding complexity. +By scalability, here, we mean how well the algorithm behaves with a +large number of chares and/or a large number of physical processors.
+Chare Arrays: Method 1 Exercise
+ + +Chare Arrays: Method 2 Exercise
+ + +Chare Arrays: Method 3 Exercise
+ + +Learn Charm++ via a series of programming exercises
++
+
!! Under Construction, the following is currently being updated and will include links to the exercises !!
+ +-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
Chare Arrays: Odd-Even Sort Exercise
+ +A mainchare has a chare array of size N. Each chare in the array generates its own random number. The array must be sorted in N steps, where at each step, each chare may communicate with at most one of its neighbors (i.e. on an 'odd' step adjacent chares are paired in one fashion, and on an 'even' step the pairing is switched). No barriers are allowed, and there is only one reduction, at the end, which checks that the elements have been properly sorted.
+ +N is a command line parameter.
+ +No SDAG is to be used.
+ +Chare Arrays: Particle Exercise
+ + +
+
+ ![]() |
+ Figure 1 + |
Random Migrating Particles
+ +For this exercise, you will write code that simulates a set of particles moving + randomly in a 2-dimensional space within a bounding box. The coordinates of the + overall simulation box are between 0.0 and 100.0 along each dimension. The particles + are divided among chares based on their coordinates. The chares should be organized + in a 2-dimensional array of chares (using a chare array) of size k \times k. So, each + chare owns a bounding box of its own with size 100.0/k. The particles in each chare + can be stored as a vector.
+ +Your program should generate n particles per chare during construction with a + random (but valid, i.e. within the chare) position for particles. Your program + should accept the number of particles per cell n, and k as command line parameters + in that sequence.
+ +Expected Output: Your program should calculate and print to screen the maximum and + total number of particles every 10 iterations. Additionally, the simulation should + not be delayed by this calculation (i.e. you should use reductions).
+ +For testing your program, you can use 10000 (=n) particles per chare, simulated over 100 steps + and a chare array of size 16 \times 16 (k=16). Experiment with different number of particles and + chare array sizes.
+ +Note: There might be multiple particles having the same x and y coordinates, especially + if you increase the density of each cell. You do not need to handle this case separately; + it is a valid case assumption.
+ +-
+
for(iteration=0; iteration<ITERATION; iteration++){
+ -
+
-
+
}
+Basic Chares: Primality Testing Exercise
+Part A
+Write a program based on the outline below. +(Note that the program has a few artificial restrictions/elements that +are meant to teach you specific concepts. So, please follow the +instructions to the letter.)
+ +The main chare generates K random integers and fires a checkPrimality chare for each. The chare checks whether the number given to it is prime using a variant of the function below, and returns the result to the main chare. The main chare maintains an array of pairs <number, Boolean> and prints it at the end. An entry should be added to this array (with the number being tested and a default value such as "False") as soon as the chare is fired. In particular, you are not allowed to delay adding the entry until the result is returned by the chare. Make sure that your program does not search the array when a response comes; figure out a bookkeeping scheme to avoid it.
+ +Obtain K from a command line argument. You may use rand() from +the math library for generating random integers.
+ +For testing primality, use the following function. For extra credit, +modify it so that it is not +testing for every i, but (a) avoids testing even numbers except 2 and +(b) don’t let the loop run all the way to “number-1”).
+ +
+
+// Naive trial division: returns 1 if number is prime, 0 otherwise.
+int isPrime(const long number)
+
+{
+ if(number<=1) return 0;
+ for(int i=2; i<number; i++)
+ {
+ if(0 == number%i)
+ return 0;
+ }
+ return 1;
+}
+
Part B (grainsize control)
+ +Measuring performance and improving it via grainsize control:
+ +Grainsize control +is a way to improve performance of the above program. +Use information from the Charm++ manual about how to pass +arrays of data to entry methods, and send a bunch (M) of numbers to be +tested to each +new Chare, and experiment with different values of M to get good +performance. +You may wish to read M as a command line parameter, for ease of experimentation. +Measure +performance by adding two calls to CkTimer() in the main chare, one +just before +starting creation of checkPrimality chares, and the other after all +the results have been returned (but before they are printed), and +printing the difference between the timers. You may +omit (and probably should omit) printing primality results for performance runs. +Vary M and report smallest G for +which performance was within 5% infinite grainsize (i.e. $G == K$). +Again, make sure our artificial restriction is obeyed: do not send +back the numbers the number being tested (because you are not allowed +to search for it anyway).
+ +Part C:
+Let the numbers being tested be 64 bit random numbers. For simplicity, +generate them by concatenating 2 32 bit random numbers.
+Getting Started with Charm++
+-
+
+
+ +
History
+ +The Charm software was developed as a group effort of the Parallel Programming +Laboratory at the University of Illinois at Urbana-Champaign. Researchers at +the Parallel Programming Laboratory keep Charm++ updated for the new machines, +new programming paradigms, and for supporting and simplifying development of +emerging applications for parallel processing. The earliest prototype, Chare +Kernel(1.0), was developed in the late eighties. It consisted only of basic +remote method invocation constructs available as a library. The second +prototype, Chare Kernel(2.0), a complete re-write with major design changes. +This included C language extensions to denote Chares, messages and +asynchronous remote method invocation. Charm(3.0) improved on this syntax, and +contained important features such as information sharing abstractions, and +chare groups (called Branch Office Chares). Charm(4.0) included Charm++ and +was released in fall 1993. Charm++ in its initial version consisted of +syntactic changes to C++ and employed a special translator that parsed the +entire C++ code while translating the syntactic extensions. Charm(4.5) had a +major change that resulted from a significant shift in the research agenda of +the Parallel Programming Laboratory. The message-driven runtime system code of +the Charm++ was separated from the actual language implementation, resulting +in an interoperable parallel runtime system called Converse. The Charm++ +runtime system was retargetted on top of Converse, and popular programming +paradigms such as MPI and PVM were also implemented on Converse. This allowed +interoperability between these paradigms and Charm++. This release also +eliminated the full-fledged Charm++ translator by replacing syntactic +extensions to C++ with C++ macros, and instead contained a small language and +a translator for describing the interfaces of Charm++ entities to the runtime +system. This version of Charm++, which, in earlier releases was known as +Interface Translator Charm++, is the default version of Charm++ now, and hence +referred simply as Charm++. In early 1999, the runtime system of Charm++ was +rewritten in C++. Several new features were added. The interface language +underwent significant changes, and the macros that replaced the syntactic +extensions in original Charm++, were replaced by natural C++ constructs. Late +1999, and early 2000 reflected several additions to Charm++, when a load +balancing framework and migratable objects were added to Charm++. + +Charm++
+parallel programming framework
+-
+
capabilities
+-
+
News
+-
+
Charm++ is a parallel programming framework in C++ supported by + an adaptive runtime system, which enhances user productivity and allows + programs to run portably from small multicore computers (your laptop) to + the largest supercomputers.
+ +It enables users to easily expose and express much of the parallelism + in their algorithms while automating many of the requirements for high + performance and scalability. It permits writing parallel programs in + units that are natural to the domain, without having to deal with + processors and threads.
+ +Charm++ has been in production use for over 15 years in the + scientific and high performance computing communities and currently has + thousands of users in dozens of countries across a wide variety of + computing disciplines. It has been adopted by many computing teams, and + has been used to produce several large parallel applications. It is + actively developed, maintained, and supported by the + Parallel Programming Laboratory at + UIUC and its collaborators.
+ + +-
+
Charm++
-parallel programming framework
--
-
Charm++ Workshop 2024 Announcement
-- We will be having the 21st Annual Workshop on Charm++ and its applications at UIUC on Thursday - April 25th and Friday April 26th 2024. - For more information and for registration, please follow this link. -
-Hybrid Tutorial 2023 Announcement
-- We will be having a hybrid tutorial at UIUC on Monday October 23rd and Tuesday October 24th. - For more information and for registration, please follow this link. -
-capabilities
--
-
LeanMD
The computation performed in this code mimics the short-range non-bonded force calculation in NAMD, and resembles the LJ force computation in miniMD benchmark in the Mantevo benchmark suite maintained by Sandia National Laboratories. -The force calculation in Lennard-Jones dynamics is done within a cutoff-radius, rc for every atom. +
+The force calculation in Lennard-Jones dynamics is done within a cutoff-radius, rc for every atom. In LeanMD, the computation is parallelized using a hybrid scheme of spatial and force decomposition. The three-dimensional (3D) simulation space consisting of atoms is divided into cells of dimensions that are equal to the sum of the cutoff distance, rc and a margin. In each iteration, @@ -54,19 +158,20 @@
LeanMD
the computes, the cells perform the force integration and update various properties of their atoms – acceleration, velocity and positions. -Features: Automated load balancing, fault tolerance, multicast manager.
Features: Automated load balancing, fault tolerance, multicast manager.








AMR
The parallel mesh restructuring algorithm operates in terms of near-neighbor communication among individual blocks, and a single synchronization-only collective. -Traditional AMR algorithms phrase the design in terms of processors that +
+Traditional AMR algorithms phrase the design in terms of processors that contain many blocks. Instead, we promote blocks to first-class entities that act as a virtual processor. As the mesh is refined or coarsened in AMR, the number of blocks will change and traditional algorithms require @@ -99,7 +205,8 @@
AMR
no data. Besides termination detection, blocks execute completely asynchronously, communicating only with neighboring blocks when required. -Traditional AMR implementations store the quad-tree instance on each +
+Traditional AMR implementations store the quad-tree instance on each process consuming O(P) memory and taking O(log N) time for neighbor lookup. We organize the blocks into a quad-tree but address each block by their location in the tree using bit vectors to represent quadrants recursively. @@ -109,23 +216,24 @@
AMR
provides direct, efficient communication between them. The runtime system can then redistribute the blocks periodically without any change to the logic. -Features: quiescence detection, dynamic chare creation, load balancing.
Features: quiescence detection, dynamic chare creation, load balancing.










Barnes-Hut
hierarchical recursive subdivision of space into cubic cells. Barnes-Hut method is widely used in cosmological simulations. -The N-body problem involves the numerical calculation of the trajectories +
+The N-body problem involves the numerical calculation of the trajectories of N point masses (or charges) moving under the influence of a conservative force field such as that induced by gravity (or electrical charges). In its simplest form, the method models bodies as particles of zero @@ -163,7 +272,8 @@
Barnes-Hut
quadratic complexity, the amount of work done by this all-pairs method makes it infeasible for systems with large N. -Barnes and Hut devised a hierarchical N-body method that +
+Barnes and Hut devised a hierarchical N-body method that performs significantly fewer computations but at the cost of a greater relative error in the computed solution. The method relies on the spatial partitioning of the input system of particles, thereby imposing a tree-structure on it. @@ -176,16 +286,17 @@
Barnes-Hut
yields an expected complexity of O(N lg N), making it suitable for large systems of particles. - + +



Dense LU
programming paradigm, our LU implementation does not employ any linear algebra specific notations. Our implementation is very succinct and presents several distinct benefits. -We use a typical block-decomposed algorithm for the LU factorization +
+We use a typical block-decomposed algorithm for the LU factorization process. Our focus in this effort has been less on choosing the best possible factorization algorithm than on demonstrating productivity with a reasonable choice. The input matrix of n x n elements is decomposed into square blocks @@ -209,19 +321,23 @@
Dense LU
high performance linear algebra library, typically a platform-specific implementation of BLAS and perform partial pivoting for each matrix column. -Features: composable library, flexible data placement, +
+Features: composable library, flexible data placement, block-centric control flow, separation of concerns. -
+ +

Dense LU
HPCCG
HPCCG is Charm++ implementation of HPCCG mini-application in the Mantevo Suite. It was originally developed as the first Mantevo mini-app in order to be the best representation of an unstructured implicit finite element or finite volume application in 800 lines of code. -
+ +Kripke
Kripke is a proxy application from Lawrence Livermore National Laboratory (LLNL) for Sn discrete particle transport codes. Given a volume of interest, knowledge of its boundary conditions and the particle-generating sources within and outside the domain, Kripke solves for the flux of particles at every point in the domain at a subsequent time. It does so by decomposing the domain into 3-dimensional subdomains that we call zone sets, and then sweeps the over the zone sets for all energy group sets and angular direction sets. These parallel sweeps are all independent of each other, and so Kripke sweeps from all corners of the domain simultaneously and pipelines sweeps originating from the same domain corner. -
+ +Triangular Solver
its independent parts and waits for its dependency messages from the left. Nondiagonal blocks wait for the solution values from their corresponding diagonal block, and then start their computation. - +

1D FFT
operations are executed via point-to-point messages and external libraries (FFTW or ESSL) perform serial FFTs on the rows of the matrix. -Features: interoperability with MPI, adaptive overlap.
Features: interoperability with MPI, adaptive overlap.


Random Access
topologies with dimensions matching the network representation of the current run led to good performance. -We used a Charm++ group to partition the global table across the nodes in +
+We used a Charm++ group to partition the global table across the nodes in a run. Each element of the group allocates its part of the global table, generates random update keys, and sends the updates to the appropriate destination. -
In the context of TRAM, each processor is limited to sending to and receiving +
+In the context of TRAM, each processor is limited to sending to and receiving messages from a subset of the processors involved in the run. When determining where to send a particular data item (in our case table update), TRAM selects a destination from among its peers so that data items always make forward @@ -320,15 +440,16 @@
Random Access
separately for each destination on the network. -Features: TRAM, automated topology discovery.
Features: TRAM, automated topology discovery.




Random Access
EP Stream
This benchmark is a simple Charm++ implementation of the HPC Challenge stream benchmark. -
+ +Charm++ Release 7.0.0
-10/25/2021 --
-
You can view the release notes and find the source code and binaries for download here. - -
The 19th Annual Workshop on Charm++ and its Applications (2021)
-10/18/2021 -- The Parallel Programming Laboratory is hosting the 19th Annual Workshop - on Charm++ and its Applications October 18-19, 2021. - - The workshop will be streamed live. You can find the list of talks and webcast - here. -
- - -Charm++ Release 6.10.2
-08/05/2020 --
-
You can view the release notes and find the source code and binaries for download here. - -
Charm++ Release 6.10.1
+ + + + + + ++Charm++ Release 6.10.1
03/05/2020You can view the release notes and find the source code and binaries for download here. -
Charm++ Release 6.10.0
+ + ++Charm++ Release 6.10.0
02/13/2020Charm++ Release 6.10.0
You can view the release notes and find the source code and binaries for download here. -
Charm++ Release 6.9.0
+ ++Charm++ Release 6.9.0
11/12/2018Charm++ Release 6.9.0
You can view the release notes and find the source code and binaries for download here. -
Charm++ at SuperComputing 2018
+ ++Charm++ at SuperComputing 2018
11/07/2018There are several Charm++ related talks and events at SuperComputing 2018, @@ -71,7 +154,9 @@
Charm++ at SuperComputing 2018
You can find a list of PPL and Charm++ talks here. -Charm++ Release 6.8.2
+ ++Charm++ Release 6.8.2
10/26/2017Charm++ Release 6.8.2
More release notes can be found here. -
Charm++ at SuperComputing 2017
+ ++Charm++ at SuperComputing 2017
10/22/2017@@ -89,7 +176,9 @@
Charm++ at SuperComputing 2017
You can find a list of PPL and Charm++ talks here. -Charm++ Release 6.8.1
+ ++Charm++ Release 6.8.1
10/13/2017@@ -99,12 +188,15 @@
Charm++ Release 6.8.1
In depth release notes as well as full version control change logs can be found here. -Charm++ Release 6.8.0
+ ++Charm++ Release 6.8.0
Over 900 commits (bugfixes + improvements + cleanups) have been applied across the entire system. Major changes are described below:
-
+
Charm++ Release 6.8.0
Charm++ Release 6.8.0
The complete list of issues that have been merged/resolved in 6.8.0 can be found here. The associated git commits can be viewed here.
-The 15th Annual Workshop on Charm++ and its Applications (2017)
++The 15th Annual Workshop on Charm++ and its Applications (2017)
4/5/2017@@ -217,7 +311,9 @@
The 15th Annual Workshop on Charm++ and its Appli
The workshop will be streamed live. You can find the list of talks and webcast
here.
-Charm++ at SuperComputing 2016
+
+
+Charm++ at SuperComputing 2016
11/14/2016
@@ -228,7 +324,9 @@
Charm++ at SuperComputing 2016
You can find a list of PPL and Charm++ talks here. -Stable release of Charm++ version 6.7.1
+ ++Stable release of Charm++ version 6.7.1
4/20/2016@@ -240,7 +338,9 @@
Stable release of Charm++ version 6.7.1
The source code for this release can be downloaded here -The 14th Annual Workshop on Charm++ and its Applications (2016)
+ ++The 14th Annual Workshop on Charm++ and its Applications (2016)
4/19/2016@@ -250,7 +350,9 @@
The 14th Annual Workshop on Charm++ and its Appli
You can find the list of talks and their slides
here.
-Stable release of Charm++ version 6.7.0
+
+
+Stable release of Charm++ version 6.7.0
12/22/2015
@@ -263,7 +365,9 @@
Stable release of Charm++ version 6.7.0
The source code for this release can be downloaded here -Stable release of Charm++ version 6.6.0
+ ++Stable release of Charm++ version 6.6.0
09/06/2014@@ -275,7 +379,9 @@
Stable release of Charm++ version 6.6.0
The source code for this release can be downloaded here -Stable release of Charm++ version 6.5.1
+ ++Stable release of Charm++ version 6.5.1
07/01/2013@@ -293,14 +399,14 @@
Stable release of Charm++ version 6.5.1
Oak Ridge National Lab, SDSC, and TACC. -Charm++ application results featured in Nature
++Charm++ application results featured in Nature
05/30/2013NAMD, an application developed using Charm++, was recently used in an all-atom molecular dynamics simulation to determine the chemical structure of the HIV -Capsid, as reported in a Nature research article. The simulation +Capsid, as reported in a Nature research article. The simulation involving about 64 million atoms, was carried out on the Blue Waters system at the University of Illinois, and benefited from many features and performance optimizations implemented in Charm++. Results from the simulation @@ -319,15 +425,17 @@
Charm++ application results featured in Nature
Theoretical and Computational Biophysics Group led by Prof. Klaus Schulten. -Charm++ issue tracker now publicly accessible.
++Charm++ issue tracker now publicly accessible.
04/03/2013-Charm++ issue tracker is now publicly accessible. +Charm++ issue tracker is now publicly accessible.
-Stable release of Charm++ version 6.5.0
++Stable release of Charm++ version 6.5.0
03/29/2013@@ -348,14 +456,18 @@
Stable release of Charm++ version 6.5.0
National Lab, NERSC, NCSA, NICS, Oak Ridge National Lab, SDSC, and TACC. -Fernbach Award for Profs. Kale, Schulten
+ ++Fernbach Award for Profs. Kale, Schulten
10/10/2012Profs. Kale was named one of the winners of the Sidney Fernbach Award, to be presented at Supercomputing 2012. -
Local SIAM chapter hosting Charm++ Tutorial
+ ++Local SIAM chapter hosting Charm++ Tutorial
04/27/2012@@ -380,7 +492,9 @@
Local SIAM chapter hosting Charm++ Tutorial
-First Beta of Charm++ version 6.4.0 released
+ ++ First Beta of Charm++ version 6.4.0 released
03/12/2012@@ -397,3 +511,18 @@
First Beta of Charm++ version 6.4.0 released
here, and compiled binaries for our autobuild platforms can be found here. + + +Charm++ Workshops:
+ ++Charm++ Workshop Introduction video. +For more tutorial material, slides and readings refer here. +
+ +Programming Model
+ +Object-based program design
+-
+
Globally addressable objects
+-
+
Globally invocable methods
+-
+
Asynchronous methods
+-
+
Object collections
+-
+

Object placement
+-
+
Migratable / serializable objects
+-
+
Decompose data across object collection
+-
+
Decompose data manipulation across methods
+ +Easily task parallelism
+ + +Release Information
+ + +Tools
++
Parallel Performance Analysis: Projections
+


+
+
+
The latest development version of Projections can be downloaded directly from our source archive. The Git version control system is used, which is available from here.
+-
+
+
+
Parallel Debugging: Charm Debug
+
+
+ In addition, Charm++ offers several additional features designed to simplify application development. Linking with "-memory paranoid" checks all dynamic heap allocation calls for common errors, such as double-delete, random-delete, read-after-delete, buffer over- and under-write errors. Charm++, when compiled without "-DCMK_OPTIMIZE", contains hundreds of assertions to catch invalid parameters and uninitialized data passed to API routines. +
+
+ Charm Debug is a sophisticated parallel debugger, with the ability to + set breakpoints, examine variables, objects, and messages across the + entire machine. The design of this debugger is described in the research page. +
+
+ +
The latest development version of Charm Debug can be downloaded directly from our source archive. The Git version control system is used, which is available from here.
+-
+
+
Array "Hello World": A Slightly More Advanced "Hello World" Program
@@ -81,15 +184,15 @@How the Array "Hello World" Program Works
-
- ![]()
There are two chare classes in this version of the "Hello World" program (aka. the Array @@ -139,7 +242,7 @@ How the Array "Hello World" Program Works(the Broadcast "Hello World" Program) will actually exhibit parallelism. -+ Array "Hello World" Code@@ -154,18 +257,18 @@The "Hello" Chare Class
|