Commit d700364: Docs(github wiki) and src.
QVQZZZ committed May 26, 2024 (1 parent: d3e9144)
Showing 27 changed files with 539 additions and 137 deletions.
docs/en/api/fed.md (72 additions, 0 deletions)
# heflwr.fed
The `heflwr.fed` module supports aggregation and distribution operations in system-heterogeneous federated learning.
Under system heterogeneity, parameters cannot be aggregated or distributed directly; `heflwr.fed` supplies aggregation and distribution functions for both PyTorch and Flower.

To facilitate research on federated learning under system heterogeneity, the module exposes three functions: `aggregate_layer`, `extract`, and `merge`.
- With the `aggregate_layer` function and the `reset_parameters_from_father` method of members in `heflwr.nn`, you can quickly build a simulated federated learning system implemented using PyTorch.
- With the `extract` and `merge` functions, you can quickly build a federated learning system implemented using Flower, which can be deployed in real-world application environments without the need to focus on low-level parameter transmission serialization protocols and communication details.

## aggregate_layer
> ```python
> aggregate_layer(global_layer: heflwr.nn.SUPPORT_LAYER,
>                 subset_layers: List[heflwr.nn.SUPPORT_LAYER],
>                 weights: List[int],
>                 ) -> None
> ```
Aggregates parameters from multiple client layers (`subset_layers`) into the global layer (`global_layer`);
each client layer's influence on the global layer is proportional to its associated weight.
For the definition of `SUPPORT_LAYER`, please refer to [`heflwr.nn` documentation](TODO).
### Parameters
- **global_layer** (<font color=#ED564A>_heflwr.nn.SUPPORT_LAYER_</font>) - The global parameter layer that serves as the aggregation target.
- **subset_layers** (<font color=#ED564A>_List[heflwr.nn.SUPPORT_LAYER]_</font>) - List of multiple local parameter layers.
- **weights** (<font color=#ED564A>_List[int]_</font>) - Weights for each client layer, typically a list of the number of training samples on each federated client.
### Returns and Side Effects
- Returns - `None`.
- Side Effects - Parameters of the `global_layer` object are reset to the weighted average of `subset_layers`.
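The core arithmetic of `aggregate_layer` is a sample-weighted average of client parameters. The following toy sketch shows this on flat lists of floats (`weighted_average` is an illustrative helper, not part of `heflwr`; the real function operates on `SUPPORT_LAYER` objects in place):

```python
def weighted_average(subset_params, weights):
    """Weighted average of client parameter vectors.

    subset_params: list of equal-length lists of floats, one per client.
    weights: list of ints, e.g. per-client training-sample counts.
    """
    total = sum(weights)
    n = len(subset_params[0])
    return [
        sum(w * params[i] for params, w in zip(subset_params, weights)) / total
        for i in range(n)
    ]

# Two clients with 30 and 10 samples; the first contributes 3x the weight.
global_params = weighted_average([[1.0, 2.0], [5.0, 6.0]], [30, 10])
print(global_params)  # [2.0, 3.0]
```

`aggregate_layer` applies the same weighting, but writes the result into `global_layer` instead of returning it.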
## extract
> ```python
> extract(parameters: flwr.common.typing.Parameters,
>         client_net: torch.nn.Module,
>         server_net: torch.nn.Module,
>         ) -> flwr.common.typing.Parameters
> ```
Extracts, from the complete parameters `parameters` (matching `server_net`), the subset that matches the pruning structure of `client_net`.
This is used when the server distributes model parameters to clients in Flower federated learning.
The function returns the extracted local parameters; as side effects, it resets the parameters of `server_net` to `parameters`
and the parameters of `client_net` to the returned values.
### Parameters
- **parameters** (<font color=#ED564A>_flwr.common.typing.Parameters_</font>) - Parameters object of the current global model.
- **client_net** (<font color=#ED564A>_torch.nn.Module_</font>) - Client model object constructed with `SUPPORT_LAYER` or a model object consistent with the client model structure.
- **server_net** (<font color=#ED564A>_torch.nn.Module_</font>) - Global model object constructed with `SUPPORT_LAYER` or a model object consistent with the global model structure.
### Returns and Side Effects
- Returns - Parameters extracted from `parameters` that match the structure of `client_net`.
- Side Effects
- Resets the parameters of `server_net` to `parameters`.
- Resets the parameters of `client_net` to the returned values.
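The idea of structural extraction can be illustrated with a toy sketch that keeps the leading fraction of each layer's parameters. Note the prefix-pruning scheme here is an illustrative assumption for the example, not necessarily the slicing `heflwr` actually performs:

```python
def extract_prefix(global_params, keep_ratios):
    """Toy structural extraction: for each layer, keep the leading
    fraction of the global parameters (prefix pruning)."""
    extracted = []
    for layer, ratio in zip(global_params, keep_ratios):
        k = int(len(layer) * ratio)
        extracted.append(layer[:k])
    return extracted

# A 4-unit layer pruned to 50% keeps its first 2 parameters.
print(extract_prefix([[0.1, 0.2, 0.3, 0.4]], [0.5]))  # [[0.1, 0.2]]
```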
## merge
> ```python
> merge(results: List[Tuple[flwr.server.client_proxy.ClientProxy,
>                           flwr.common.typing.FitRes]],
>       client_nets: List[torch.nn.Module],
>       server_net: torch.nn.Module,
>       ) -> flwr.common.typing.Parameters
> ```
Aggregates the parameters from multiple clients' models (contained in `results`) using heterogeneous weighted aggregation.
This is used when the server aggregates local training results from clients in Flower federated learning.
The function returns the parameters after heterogeneous weighted aggregation; as side effects, it resets the parameters of each object in `client_nets`
to the corresponding parameters in `results` and the parameters of `server_net` to the returned values.
### Parameters
- **results** (<font color=#ED564A>_List[Tuple[flwr.server.client_proxy.ClientProxy, flwr.common.typing.FitRes]]_</font>) -
Local training results of various clients.
- **client_nets** (<font color=#ED564A>_List[torch.nn.Module]_</font>) - Multiple client model objects constructed with `SUPPORT_LAYER` or model objects consistent with the client model structure.
- **server_net** (<font color=#ED564A>_torch.nn.Module_</font>) - Global model object constructed with `SUPPORT_LAYER` or a model object consistent with the global model structure.
### Returns and Side Effects
- Returns - Parameters after heterogeneous weighted aggregation.
- Side Effects
- Resets the parameters of `server_net` to the returned values.
- Resets the parameters of each object in `client_nets` to the corresponding parameters in `results`.
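Heterogeneous weighted aggregation differs from plain averaging in that a parameter position is averaged only over the clients that actually hold it. A toy sketch under the same prefix-pruning assumption as above (`merge_prefix` is an illustrative helper, not `heflwr`'s implementation):

```python
def merge_prefix(client_params, weights, global_size):
    """Toy heterogeneous aggregation: each global position is the
    weighted average over the clients whose pruned (prefix) slice
    covers that position."""
    merged = []
    for i in range(global_size):
        num = sum(w * p[i] for p, w in zip(client_params, weights) if i < len(p))
        den = sum(w for p, w in zip(client_params, weights) if i < len(p))
        merged.append(num / den if den else 0.0)
    return merged

# Client A (weight 1) holds 2 params, client B (weight 1) holds all 4;
# positions 2 and 3 are determined by client B alone.
print(merge_prefix([[2.0, 2.0], [4.0, 4.0, 4.0, 4.0]], [1, 1], 4))
# [3.0, 3.0, 4.0, 4.0]
```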
docs/en/api/log.md (48 additions, 0 deletions)
# heflwr.log

The `heflwr.log` module is responsible for logging and tracing the running status of the entire project. It provides three components: `logger`, `log`, and `configure`.

## logger
`logger` is a `logging.Logger` object with the name `hetero-flwr`. It is a global logger (_per-project logger_) and can be used like a common `logging.Logger` object. For example:

```python
from heflwr.log import logger

logger.debug("hello, world!")
logger.info("hello, world!")
logger.warning("hello, world!")
logger.error("hello, world!")
logger.critical("hello, world!")
```
The `logger` object is by default bound to `logging.StreamHandler(stream=sys.stdout)`, so the above example will print logs containing `"hello, world!"` to the standard output.

## log
The `log` function provides a quick way to log messages directly; it is equivalent to the global logger's `log` method. Pass the log level and the message, for example:
```python
from logging import INFO
from heflwr.log import log

log(INFO, "hello, world!")
```
The example above will print a log with the message `"hello, world!"` to the standard output.

## configure

The `configure` function is used to customize the behavior of the global logger, such as specifying log identifiers, output file paths, and remote HTTP server addresses. It accepts the following parameters:
- `identifier: str`: A string identifier that will be added as a prefix to each log entry to help trace the source of the log. A typical use is to set it as a client identifier, such as "client-A", to indicate the source of the log to a remote server.
- `file: Optional[str] = None`: An optional parameter for specifying the path to the log output file. If provided, logs will be written to this file as well.
- `host: Optional[str] = None`: An optional parameter for specifying the address of a remote HTTP server. If provided, logs will be sent to this server.
- `simple: Optional[bool] = None`: An optional parameter that specifies the format of remote logs; it must be used together with the `host` parameter. If `host` is `None`, specifying `simple` raises a warning. If `host` is set and `simple` is `None` or `False`, logs are sent in standard format to the `/log` URL of the remote server; if `simple` is `True`, logs are sent in a concise format to the `/simple_log` URL to reduce network transmission overhead.

Calling the `configure` function can add `FileHandler` and/or `HTTPHandler` to the global logger, for example:
```python
from heflwr.log import logger, configure

configure(identifier="my-app", file="logs/app.log")
configure(identifier="client-A", host="127.0.0.1:5000")
configure(identifier="client-A", file="logs/app.log", host="127.0.0.1:5000")
configure(identifier="client-A", file="logs/app.log", host="127.0.0.1:5000", simple=True)

logger.info("hello, world!")
```
By doing this, logs can not only be output to the standard output but also to the corresponding file and/or HTTP server.
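Under the hood, `configure` attaches standard `logging` handlers to the global logger. The file-handler part can be sketched with plain `logging` as follows (`configure_sketch` and `sketch_logger` are hypothetical names for illustration, not `heflwr`'s actual implementation):

```python
import logging

# A stand-in for heflwr's global logger, for illustration only.
sketch_logger = logging.getLogger("configure-sketch")
sketch_logger.setLevel(logging.DEBUG)

def configure_sketch(identifier: str, file: str = None) -> None:
    """Minimal analogue of heflwr.log.configure: prefix every record
    with an identifier and optionally attach a FileHandler."""
    fmt = logging.Formatter(f"{identifier} | %(levelname)s | %(message)s")
    if file is not None:
        handler = logging.FileHandler(file)
        handler.setFormatter(fmt)
        sketch_logger.addHandler(handler)
```

After `configure_sketch("client-A", "app.log")`, a call to `sketch_logger.info("hello")` appends a line like `client-A | INFO | hello` to `app.log`, which mirrors how an identifier helps a remote server trace the source of each log entry.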
docs/en/api/monitor.md (150 additions, 0 deletions)
# heflwr.monitor

The `heflwr.monitor` module provides resource monitoring for the execution of any Python program.
In scenarios like resource-constrained deep learning and Internet of Things (IoT) federated learning, training devices face limited computation, storage, network, and battery resources.
With the `heflwr.monitor` module, the resource usage of training devices can be continuously tracked
and written to the local file system or sent to a remote federated learning server in real time.

Most existing research measures the resource consumption of deep neural networks using theoretical metrics such as MACs (Multiply-Accumulate Operations), FLOPs (Floating Point Operations), and parameters.
However, these metrics are usually calculated under idealized conditions,
neglecting various factors in the actual hardware environment, such as memory bandwidth, storage speed, parallel computing capability, and power constraints.
Therefore, they may not accurately reflect the actual performance on specific hardware devices.

The `heflwr.monitor` module can provide detailed resource usage data in the actual operating environment.
Compared to evaluation based solely on theoretical metrics,
it can take into account the complexity of the actual hardware environment, providing more accurate and comprehensive performance analysis.

The `heflwr.monitor` module provides two sub-modules: `process_monitor` and `thread_monitor`.
Each sub-module offers three forms of monitors: `FileMonitor`, `PrometheusMonitor`, and `RemoteMonitor`.
Their organizational structure can be illustrated as follows:
```shell
heflwr.monitor
├── process_monitor
│   ├── FileMonitor
│   ├── PrometheusMonitor
│   └── RemoteMonitor
└── thread_monitor
    ├── FileMonitor
    ├── PrometheusMonitor
    └── RemoteMonitor
```
Please choose the monitor you want based on the characteristics of each monitor. The comparison between `process_monitor` and `thread_monitor` is as follows:
- `process_monitor` runs the monitor in the form of a process, providing more accurate resource and performance monitoring.
- `thread_monitor` runs the monitor in the form of a thread, offering simpler control and less additional overhead. However, it may slightly affect the accuracy of monitoring.

Furthermore, the comparison between `FileMonitor` / `PrometheusMonitor` / `RemoteMonitor` is as follows:
- `FileMonitor` stores resource monitoring information in the local file system.
- `PrometheusMonitor` exposes monitoring information on an HTTP port, to be scraped by a Prometheus instance running on the federated learning server (or another server).
- `RemoteMonitor` sends monitoring information to the federated learning server (or other servers).

Their platform support is as follows:

| | File | Prometheus | Remote |
|-------------|-----------------|-----------------|-----------------|
| **Process** | Linux / Windows | Linux | Linux / Windows |
| **Thread** | Linux / Windows | Linux / Windows | Linux / Windows |

If you have no specific preference and just want to run the monitor in the simplest form, you can use `heflwr.monitor.process_monitor.FileMonitor`. It runs the monitor in the form of a process and logs monitoring data to a local file.


## Quick Start
Copy the following code into your IDE, and replace `your_main_logic()` with your actual code.
```python
from heflwr.monitor.process_monitor import FileMonitor

# Initialize a monitor instance
monitor = FileMonitor(file='./log.txt', interval=5)

# Start - main - stop
monitor.start() # Monitor begins continuous monitoring and writes logs to log.txt.
your_main_logic() # Your deep learning code / federated learning code / any Python code
monitor.stop() # Monitor stops monitoring and ceases log writing.

# Post-processing
detail = monitor.stats()
summary = monitor.summary()
print(detail)
print(summary)
```
You should be able to observe similar output on the console:
```shell
{'cpu_usage': [0.0, 0.0], 'memory_usage': [0.2398851887620762, 0.2398851887620762], 'network_bytes_sent': [14399, 447], 'network_bytes_recv': [11201, 975], 'power_vdd_in': [], 'power_vdd_cpu_gpu_cv': [], 'power_vdd_soc': []}
{'avg_cpu_usage': 0.0, 'avg_memory_usage': 0.2398851887620762, 'total_network_bytes_sent': 14846, 'total_network_bytes_recv': 12176, 'total_power_vdd_in': 0, 'total_power_vdd_cpu_gpu_cv': 0, 'total_power_vdd_soc': 0}
```
At the same time, the program will generate a `log.txt` file in the running directory, recording detailed monitoring information therein.

## Import
Below are the import methods for all monitors:
```python
from heflwr.monitor.process_monitor import FileMonitor
from heflwr.monitor.process_monitor import PrometheusMonitor
from heflwr.monitor.process_monitor import RemoteMonitor
from heflwr.monitor.thread_monitor import FileMonitor
from heflwr.monitor.thread_monitor import PrometheusMonitor
from heflwr.monitor.thread_monitor import RemoteMonitor
```

## Initialization
Below are the initialization methods for `FileMonitor`, `PrometheusMonitor`, and `RemoteMonitor`.
- `FileMonitor`: specify the output file `file` (created if it does not exist) and the monitoring interval `interval` (in seconds).
```python
monitor = FileMonitor(file="./log.txt", interval=5)
```
- `PrometheusMonitor`: specify the port `port` on which to expose metrics (default 8003) and the monitoring interval `interval` (in seconds).
```python
monitor = PrometheusMonitor(port=8003, interval=5)
```
- `RemoteMonitor`: specify the remote server address `host`, the monitoring interval `interval` (in seconds),
the log identifier `identifier` (if `None`, a random string is used), and the remote log format `simple` (default `True`, the concise format).
```python
monitor = RemoteMonitor(host="127.0.0.1:5000", interval=5, identifier=None, simple=True)
```
The above initialization methods apply to both `process_monitor` and `thread_monitor`.
For detailed information about the initialization parameters `identifier` and `simple` for `RemoteMonitor`, please refer to the [`heflwr.log` documentation](TODO).


## Setting Monitoring Metrics
`heflwr.monitor` supports monitoring four types of metrics:
- CPU Usage: Monitoring metric enabled by default. It calculates the percentage of CPU usage of the monitored process compared to the total CPU usage of the system. Unit: %.
- Memory Usage: Monitoring metric enabled by default. It calculates the percentage of memory usage of the monitored process compared to the total memory of the system. Unit: %.
- Network Traffic: Monitoring metric enabled by default. It monitors the system's upstream and downstream traffic information. Unit: B.
- Device Power Consumption: Monitoring metric disabled by default; only available on devices that support `tegrastats`.
It monitors the device input voltage, the combined voltage of the CPU / GPU / CV cores, and the SoC voltage excluding the CPU / GPU / CV cores (such as memory and nvdec). Unit: mV.

After initializing the monitor, you can manually enable or disable each monitoring metric:

```python
monitor.set_metrics(cpu=False, memory=True, network=True, power=True)
```
The above example is equivalent to:
```python
monitor.set_metrics(cpu=False, power=True)
```

## Start and Stop the Monitor
Add `monitor.start()` and `monitor.stop()` before and after the function or code segment that needs monitoring.
```python
monitor.start()
your_main_logic()
monitor.stop()
```

## Post-processing
In addition to writing to a local file, exposing to Prometheus, or sending to a remote server,
you can also use the monitor's `stats()` and `summary()` methods to process or use the monitored information in subsequent programs.
- The `stats()` method provides monitoring logs for each sampling point (controlled by the `interval` parameter during initialization).
- The `summary()` method provides average (e.g., CPU usage, unit: %) or cumulative monitoring information (e.g., power consumption, unit: mJ) for all sampling points.

You can call these two methods at any time, including before `stop()` is called, as long as the monitor has been instantiated.

```python
detail = monitor.stats()
summary = monitor.summary()
print(detail)
print(summary)
```
The output information of the above example is similar to:
```shell
{'cpu_usage': [0.0, 0.0], 'memory_usage': [0.2398851887620762, 0.2398851887620762], 'network_bytes_sent': [14399, 447], 'network_bytes_recv': [11201, 975], 'power_vdd_in': [], 'power_vdd_cpu_gpu_cv': [], 'power_vdd_soc': []}
{'avg_cpu_usage': 0.0, 'avg_memory_usage': 0.2398851887620762, 'total_network_bytes_sent': 14846, 'total_network_bytes_recv': 12176, 'total_power_vdd_in': 0, 'total_power_vdd_cpu_gpu_cv': 0, 'total_power_vdd_soc': 0}
```
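The `summary()` values are simple reductions over the per-sample lists returned by `stats()`. Using the sample output shown above, the averages and totals can be reproduced by hand:

```python
# The sample stats() output from above (power lists are empty on
# hosts without tegrastats support).
stats = {
    'cpu_usage': [0.0, 0.0],
    'memory_usage': [0.2398851887620762, 0.2398851887620762],
    'network_bytes_sent': [14399, 447],
    'network_bytes_recv': [11201, 975],
}

avg_cpu = sum(stats['cpu_usage']) / len(stats['cpu_usage'])
total_sent = sum(stats['network_bytes_sent'])
total_recv = sum(stats['network_bytes_recv'])
print(avg_cpu, total_sent, total_recv)  # 0.0 14846 12176
```

These match the `avg_cpu_usage`, `total_network_bytes_sent`, and `total_network_bytes_recv` fields in the sample `summary()` output.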