Commit d700364: Docs(github wiki) and src.
QVQZZZ committed May 26, 2024 (1 parent: d3e9144)
Showing 27 changed files with 539 additions and 137 deletions.
docs/en/api/fed.md (72 additions, 0 deletions)
# heflwr.fed
The `heflwr.fed` module supports aggregation and distribution operations in system-heterogeneous federated learning.
Under system heterogeneity, parameters cannot be aggregated or distributed directly; `heflwr.fed` supplies aggregation and distribution functions for both PyTorch and Flower.

To facilitate research on federated learning under system heterogeneity, the module exposes three functions: `aggregate_layer`, `extract`, and `merge`.
- With the `aggregate_layer` function and the `reset_parameters_from_father` method of members in `heflwr.nn`, you can quickly build a simulated federated learning system implemented using PyTorch.
- With the `extract` and `merge` functions, you can quickly build a federated learning system implemented using Flower, which can be deployed in real-world application environments without the need to focus on low-level parameter transmission serialization protocols and communication details.

## aggregate_layer
> ```python
> aggregate_layer(global_layer: heflwr.nn.SUPPORT_LAYER,
>                 subset_layers: List[heflwr.nn.SUPPORT_LAYER],
>                 weights: List[int],
>                 ) -> None
> ```
Aggregates parameters from multiple client layers (`subset_layers`) into the global layer (`global_layer`);
each client layer's influence on the global layer is proportional to its associated weight.
For the definition of `SUPPORT_LAYER`, please refer to [`heflwr.nn` documentation](TODO).
### Parameters
- **global_layer** (<font color=#ED564A>_heflwr.nn.SUPPORT_LAYER_</font>) - The global parameter layer that serves as the aggregation target.
- **subset_layers** (<font color=#ED564A>_List[heflwr.nn.SUPPORT_LAYER]_</font>) - List of multiple local parameter layers.
- **weights** (<font color=#ED564A>_List[int]_</font>) - Weights for each client layer, typically a list of the number of training samples on each federated client.
### Returns and Side Effects
- Returns - `None`.
- Side Effects - Parameters of the `global_layer` object are reset to the weighted average of `subset_layers`.
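The core arithmetic of `aggregate_layer` is a sample-weighted average of client parameters. The following toy sketch shows this on flat lists of floats (`weighted_average` is an illustrative helper, not part of `heflwr`; the real function operates on `SUPPORT_LAYER` objects in place):

```python
def weighted_average(subset_params, weights):
    """Weighted average of client parameter vectors.

    subset_params: list of equal-length lists of floats, one per client.
    weights: list of ints, e.g. per-client training-sample counts.
    """
    total = sum(weights)
    n = len(subset_params[0])
    return [
        sum(w * params[i] for params, w in zip(subset_params, weights)) / total
        for i in range(n)
    ]

# Two clients with 30 and 10 samples; the first contributes 3x the weight.
global_params = weighted_average([[1.0, 2.0], [5.0, 6.0]], [30, 10])
print(global_params)  # [2.0, 3.0]
```

`aggregate_layer` applies the same weighting, but writes the result into `global_layer` instead of returning it.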
## extract
> ```python
> extract(parameters: flwr.common.typing.Parameters,
>         client_net: torch.nn.Module,
>         server_net: torch.nn.Module,
>         ) -> flwr.common.typing.Parameters
> ```
Extracts, from the complete parameters `parameters` (matching `server_net`), the subset that matches the pruning structure of `client_net`.
This is used when the server distributes model parameters to clients in Flower federated learning.
The function returns the extracted local parameters; as side effects, it resets the parameters of `server_net` to `parameters`
and the parameters of `client_net` to the returned values.
### Parameters
- **parameters** (<font color=#ED564A>_flwr.common.typing.Parameters_</font>) - Parameters object of the current global model.
- **client_net** (<font color=#ED564A>_torch.nn.Module_</font>) - Client model object constructed with `SUPPORT_LAYER` or a model object consistent with the client model structure.
- **server_net** (<font color=#ED564A>_torch.nn.Module_</font>) - Global model object constructed with `SUPPORT_LAYER` or a model object consistent with the global model structure.
### Returns and Side Effects
- Returns - Parameters extracted from `parameters` that match the structure of `client_net`.
- Side Effects
- Resets the parameters of `server_net` to `parameters`.
- Resets the parameters of `client_net` to the returned values.
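The idea of structural extraction can be illustrated with a toy sketch that keeps the leading fraction of each layer's parameters. Note the prefix-pruning scheme here is an illustrative assumption for the example, not necessarily the slicing `heflwr` actually performs:

```python
def extract_prefix(global_params, keep_ratios):
    """Toy structural extraction: for each layer, keep the leading
    fraction of the global parameters (prefix pruning)."""
    extracted = []
    for layer, ratio in zip(global_params, keep_ratios):
        k = int(len(layer) * ratio)
        extracted.append(layer[:k])
    return extracted

# A 4-unit layer pruned to 50% keeps its first 2 parameters.
print(extract_prefix([[0.1, 0.2, 0.3, 0.4]], [0.5]))  # [[0.1, 0.2]]
```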
## merge
> ```python
> merge(results: List[Tuple[flwr.server.client_proxy.ClientProxy,
>                           flwr.common.typing.FitRes]],
>       client_nets: List[torch.nn.Module],
>       server_net: torch.nn.Module,
>       ) -> flwr.common.typing.Parameters
> ```
Aggregates the parameters from multiple clients' models (contained in `results`) using heterogeneous weighted aggregation.
This is used when the server aggregates local training results from clients in Flower federated learning.
The function returns the parameters after heterogeneous weighted aggregation; as side effects, it resets the parameters of each object in `client_nets`
to the corresponding parameters in `results` and the parameters of `server_net` to the returned values.
### Parameters
- **results** (<font color=#ED564A>_List[Tuple[flwr.server.client_proxy.ClientProxy, flwr.common.typing.FitRes]]_</font>) -
Local training results of various clients.
- **client_nets** (<font color=#ED564A>_List[torch.nn.Module]_</font>) - Multiple client model objects constructed with `SUPPORT_LAYER` or model objects consistent with the client model structure.
- **server_net** (<font color=#ED564A>_torch.nn.Module_</font>) - Global model object constructed with `SUPPORT_LAYER` or a model object consistent with the global model structure.
### Returns and Side Effects
- Returns - Parameters after heterogeneous weighted aggregation.
- Side Effects
- Resets the parameters of `server_net` to the returned values.
- Resets the parameters of each object in `client_nets` to the corresponding parameters in `results`.
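Heterogeneous weighted aggregation differs from plain averaging in that a parameter position is averaged only over the clients that actually hold it. A toy sketch under the same prefix-pruning assumption as above (`merge_prefix` is an illustrative helper, not `heflwr`'s implementation):

```python
def merge_prefix(client_params, weights, global_size):
    """Toy heterogeneous aggregation: each global position is the
    weighted average over the clients whose pruned (prefix) slice
    covers that position."""
    merged = []
    for i in range(global_size):
        num = sum(w * p[i] for p, w in zip(client_params, weights) if i < len(p))
        den = sum(w for p, w in zip(client_params, weights) if i < len(p))
        merged.append(num / den if den else 0.0)
    return merged

# Client A (weight 1) holds 2 params, client B (weight 1) holds all 4;
# positions 2 and 3 are determined by client B alone.
print(merge_prefix([[2.0, 2.0], [4.0, 4.0, 4.0, 4.0]], [1, 1], 4))
# [3.0, 3.0, 4.0, 4.0]
```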
docs/en/api/log.md (48 additions, 0 deletions)
# heflwr.log

The `heflwr.log` module is responsible for logging and tracing the running status of the entire project. It provides three components: `logger`, `log`, and `configure`.

## logger
`logger` is a `logging.Logger` object with the name `hetero-flwr`. It is a global logger (_per-project logger_) and can be used like a common `logging.Logger` object. For example:

```python
from heflwr.log import logger

logger.debug("hello, world!")
logger.info("hello, world!")
logger.warning("hello, world!")
logger.error("hello, world!")
logger.critical("hello, world!")
```
The `logger` object is by default bound to `logging.StreamHandler(stream=sys.stdout)`, so the above example will print logs containing `"hello, world!"` to the standard output.

## log
The `log` function provides a quick way to log messages directly; it is equivalent to the global logger's `log` method. Pass the log level and the message, for example:
```python
from logging import INFO
from heflwr.log import log

log(INFO, "hello, world!")
```
The example above will print a log with the message `"hello, world!"` to the standard output.

## configure

The `configure` function is used to customize the behavior of the global logger, such as specifying log identifiers, output file paths, and remote HTTP server addresses. It accepts the following parameters:
- `identifier: str`: A string identifier that will be added as a prefix to each log entry to help trace the source of the log. A typical use is to set it as a client identifier, such as "client-A", to indicate the source of the log to a remote server.
- `file: Optional[str] = None`: An optional parameter for specifying the path to the log output file. If provided, logs will be written to this file as well.
- `host: Optional[str] = None`: An optional parameter for specifying the address of a remote HTTP server. If provided, logs will be sent to this server.
- `simple: Optional[bool] = None`: An optional parameter that specifies the format of remote logs; it must be used together with the `host` parameter. If `host` is `None`, specifying `simple` raises a warning. If `host` is set and `simple` is `None` or `False`, logs are sent in standard format to the `/log` URL of the remote server; if `simple` is `True`, logs are sent in a concise format to the `/simple_log` URL to reduce network transmission overhead.

Calling the `configure` function can add `FileHandler` and/or `HTTPHandler` to the global logger, for example:
```python
from heflwr.log import logger, configure

configure(identifier="my-app", file="logs/app.log")
configure(identifier="client-A", host="127.0.0.1:5000")
configure(identifier="client-A", file="logs/app.log", host="127.0.0.1:5000")
configure(identifier="client-A", file="logs/app.log", host="127.0.0.1:5000", simple=True)

logger.info("hello, world!")
```
By doing this, logs can not only be output to the standard output but also to the corresponding file and/or HTTP server.
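Under the hood, `configure` attaches standard `logging` handlers to the global logger. The file-handler part can be sketched with plain `logging` as follows (`configure_sketch` and `sketch_logger` are hypothetical names for illustration, not `heflwr`'s actual implementation):

```python
import logging

# A stand-in for heflwr's global logger, for illustration only.
sketch_logger = logging.getLogger("configure-sketch")
sketch_logger.setLevel(logging.DEBUG)

def configure_sketch(identifier: str, file: str = None) -> None:
    """Minimal analogue of heflwr.log.configure: prefix every record
    with an identifier and optionally attach a FileHandler."""
    fmt = logging.Formatter(f"{identifier} | %(levelname)s | %(message)s")
    if file is not None:
        handler = logging.FileHandler(file)
        handler.setFormatter(fmt)
        sketch_logger.addHandler(handler)
```

After `configure_sketch("client-A", "app.log")`, a call to `sketch_logger.info("hello")` appends a line like `client-A | INFO | hello` to `app.log`, which mirrors how an identifier helps a remote server trace the source of each log entry.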
docs/en/api/monitor.md (150 additions, 0 deletions)
# heflwr.monitor

The `heflwr.monitor` module provides resource monitoring for the execution of any Python program.
In scenarios like resource-constrained deep learning and Internet of Things (IoT) federated learning, training devices face limited computation, storage, network, and battery resources.
With the `heflwr.monitor` module, the resource usage of training devices can be continuously tracked
and written to the local file system or sent to a remote federated learning server in real time.

Most existing research measures the resource consumption of deep neural networks using theoretical metrics such as MACs (Multiply-Accumulate Operations), FLOPs (Floating Point Operations), and parameters.
However, these metrics are usually calculated under idealized conditions,
neglecting various factors in the actual hardware environment, such as memory bandwidth, storage speed, parallel computing capability, and power constraints.
Therefore, they may not accurately reflect the actual performance on specific hardware devices.

The `heflwr.monitor` module can provide detailed resource usage data in the actual operating environment.
Compared to evaluation based solely on theoretical metrics,
it can take into account the complexity of the actual hardware environment, providing more accurate and comprehensive performance analysis.

The `heflwr.monitor` module provides two sub-modules: `process_monitor` and `thread_monitor`.
Each sub-module offers three forms of monitors: `FileMonitor`, `PrometheusMonitor`, and `RemoteMonitor`.
Their organizational structure can be illustrated as follows:
```shell
heflwr.monitor
├── process_monitor
│   ├── FileMonitor
│   ├── PrometheusMonitor
│   └── RemoteMonitor
└── thread_monitor
    ├── FileMonitor
    ├── PrometheusMonitor
    └── RemoteMonitor
```
Please choose the monitor you want based on the characteristics of each monitor. The comparison between `process_monitor` and `thread_monitor` is as follows:
- `process_monitor` runs the monitor in the form of a process, providing more accurate resource and performance monitoring.
- `thread_monitor` runs the monitor in the form of a thread, offering simpler control and less additional overhead. However, it may slightly affect the accuracy of monitoring.

Furthermore, the comparison between `FileMonitor` / `PrometheusMonitor` / `RemoteMonitor` is as follows:
- `FileMonitor` stores resource monitoring information in the local file system.
- `PrometheusMonitor` exposes monitoring information on an HTTP port, to be scraped by a Prometheus instance running on the federated learning server (or another server).
- `RemoteMonitor` sends monitoring information to the federated learning server (or other servers).

Their platform support is as follows:

| | File | Prometheus | Remote |
|-------------|-----------------|-----------------|-----------------|
| **Process** | Linux / Windows | Linux | Linux / Windows |
| **Thread** | Linux / Windows | Linux / Windows | Linux / Windows |

If you have no specific preference and just want to run the monitor in the simplest form, you can use `heflwr.monitor.process_monitor.FileMonitor`. It runs the monitor in the form of a process and logs monitoring data to a local file.


## Quick Start
Copy the following code into your IDE, and replace `your_main_logic()` with your actual code.
```python
from heflwr.monitor.process_monitor import FileMonitor

# Initialize a monitor instance
monitor = FileMonitor(file='./log.txt', interval=5)

# Start - main - stop
monitor.start() # Monitor begins continuous monitoring and writes logs to log.txt.
your_main_logic() # Your deep learning code / federated learning code / any Python code
monitor.stop() # Monitor stops monitoring and ceases log writing.

# Post-processing
detail = monitor.stats()
summary = monitor.summary()
print(detail)
print(summary)
```
You should be able to observe similar output on the console:
```shell
{'cpu_usage': [0.0, 0.0], 'memory_usage': [0.2398851887620762, 0.2398851887620762], 'network_bytes_sent': [14399, 447], 'network_bytes_recv': [11201, 975], 'power_vdd_in': [], 'power_vdd_cpu_gpu_cv': [], 'power_vdd_soc': []}
{'avg_cpu_usage': 0.0, 'avg_memory_usage': 0.2398851887620762, 'total_network_bytes_sent': 14846, 'total_network_bytes_recv': 12176, 'total_power_vdd_in': 0, 'total_power_vdd_cpu_gpu_cv': 0, 'total_power_vdd_soc': 0}
```
At the same time, the program will generate a `log.txt` file in the running directory, recording detailed monitoring information therein.

## Import
Below are the import methods for all monitors:
```python
from heflwr.monitor.process_monitor import FileMonitor
from heflwr.monitor.process_monitor import PrometheusMonitor
from heflwr.monitor.process_monitor import RemoteMonitor
from heflwr.monitor.thread_monitor import FileMonitor
from heflwr.monitor.thread_monitor import PrometheusMonitor
from heflwr.monitor.thread_monitor import RemoteMonitor
```

## Initialization
Below are the initialization methods for `FileMonitor`, `PrometheusMonitor`, and `RemoteMonitor`.
- `FileMonitor`: specify the output file `file` (created if it does not exist) and the monitoring interval `interval` (in seconds).
```python
monitor = FileMonitor(file="./log.txt", interval=5)
```
- `PrometheusMonitor`: specify the port `port` on which to expose metrics (default 8003) and the monitoring interval `interval` (in seconds).
```python
monitor = PrometheusMonitor(port=8003, interval=5)
```
- `RemoteMonitor`: specify the remote server address `host`, the monitoring interval `interval` (in seconds),
the log identifier `identifier` (if `None`, a random string is used), and the remote log format `simple` (default `True`, the concise format).
```python
monitor = RemoteMonitor(host="127.0.0.1:5000", interval=5, identifier=None, simple=True)
```
The above initialization methods apply to both `process_monitor` and `thread_monitor`.
For detailed information about the initialization parameters `identifier` and `simple` for `RemoteMonitor`, please refer to the [`heflwr.log` documentation](TODO).


## Setting Monitoring Metrics
`heflwr.monitor` supports monitoring four types of metrics:
- CPU Usage: Monitoring metric enabled by default. It calculates the percentage of CPU usage of the monitored process compared to the total CPU usage of the system. Unit: %.
- Memory Usage: Monitoring metric enabled by default. It calculates the percentage of memory usage of the monitored process compared to the total memory of the system. Unit: %.
- Network Traffic: Monitoring metric enabled by default. It monitors the system's upstream and downstream traffic information. Unit: B.
- Device Power Consumption: Monitoring metric disabled by default; only available on devices that support `tegrastats`.
It monitors the device input voltage, the combined voltage of the CPU / GPU / CV cores, and the SoC voltage excluding the CPU / GPU / CV cores (such as memory and nvdec). Unit: mV.

After initializing the monitor, you can manually enable or disable each monitoring metric:

```python
monitor.set_metrics(cpu=False, memory=True, network=True, power=True)
```
The above example is equivalent to:
```python
monitor.set_metrics(cpu=False, power=True)
```

## Start and Stop the Monitor
Add `monitor.start()` and `monitor.stop()` before and after the function or code segment that needs monitoring.
```python
monitor.start()
your_main_logic()
monitor.stop()
```

## Post-processing
In addition to writing to a local file, exposing to Prometheus, or sending to a remote server,
you can also use the monitor's `stats()` and `summary()` methods to process or use the monitored information in subsequent programs.
- The `stats()` method provides monitoring logs for each sampling point (controlled by the `interval` parameter during initialization).
- The `summary()` method provides average (e.g., CPU usage, unit: %) or cumulative monitoring information (e.g., power consumption, unit: mJ) for all sampling points.

You can call these two methods at any time, including before `stop()` is called, as long as the monitor has been instantiated.

```python
detail = monitor.stats()
summary = monitor.summary()
print(detail)
print(summary)
```
The output information of the above example is similar to:
```shell
{'cpu_usage': [0.0, 0.0], 'memory_usage': [0.2398851887620762, 0.2398851887620762], 'network_bytes_sent': [14399, 447], 'network_bytes_recv': [11201, 975], 'power_vdd_in': [], 'power_vdd_cpu_gpu_cv': [], 'power_vdd_soc': []}
{'avg_cpu_usage': 0.0, 'avg_memory_usage': 0.2398851887620762, 'total_network_bytes_sent': 14846, 'total_network_bytes_recv': 12176, 'total_power_vdd_in': 0, 'total_power_vdd_cpu_gpu_cv': 0, 'total_power_vdd_soc': 0}
```
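The `summary()` values are simple reductions over the per-sample lists returned by `stats()`. Using the sample output shown above, the averages and totals can be reproduced by hand:

```python
# The sample stats() output from above (power lists are empty on
# hosts without tegrastats support).
stats = {
    'cpu_usage': [0.0, 0.0],
    'memory_usage': [0.2398851887620762, 0.2398851887620762],
    'network_bytes_sent': [14399, 447],
    'network_bytes_recv': [11201, 975],
}

avg_cpu = sum(stats['cpu_usage']) / len(stats['cpu_usage'])
total_sent = sum(stats['network_bytes_sent'])
total_recv = sum(stats['network_bytes_recv'])
print(avg_cpu, total_sent, total_recv)  # 0.0 14846 12176
```

These match the `avg_cpu_usage`, `total_network_bytes_sent`, and `total_network_bytes_recv` fields in the sample `summary()` output.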