Merge tag 'misc-habanalabs-next-2022-02-28' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next

Oded writes:

This tag contains habanalabs driver changes for v5.18:

- Add new feature of recording time-stamp when a completion
  queue counter reaches a target value as determined by the
  userspace application. This is used by the graph compiler
  to accurately measure the time it takes for certain workloads
  to execute, which helps to fine-tune future compilations.

- Add two new attributes to sysfs that expose the VRM and
  f/w OS version

- Add a delay to the reset path that allows the driver to
  receive and handle additional events from the f/w before
  doing the reset. This can help when debugging why a reset
  event was received from the f/w.

- Re-factor some of the sysfs code in the driver. Mainly,
  move functions from hwmgr.c to more relevant files and
  totally remove hwmgr.c file.

- Fix multiple bugs such as races, use-after-free, ignoring
  error codes, etc.

- As usual, multiple minor changes and small fixes.

* tag 'misc-habanalabs-next-2022-02-28' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (50 commits)
  habanalabs: remove deprecated firmware states
  habanalabs: add an option to delay a device reset
  habanalabs: Add check for pci_enable_device
  habanalabs: Fix reset upon device release bug
  habanalabs: make sure device mem alloc is page aligned
  habanalabs/gaudi: add missing handling of NIC related events
  habanalabs/gaudi: handle axi errors from NIC engines
  habanalabs: allow user to set allocation page size
  habanalabs: use kernel-doc for memory ioctl documentation
  habanalabs: avoid using an uninitialized variable
  habanalabs: set max power on device init per ASIC
  habanalabs: use proper max_power variable for device utilization
  habanalabs: enable stop-on-error debugfs setting per ASIC
  habanalabs: change function to static
  habanalabs: add missing include of vmalloc.h
  habanalabs: fix use-after-free bug
  habanalabs: rephrase error messages in PCI initialization
  habanalabs: fix spelling mistake
  habanalabs: Timestamps buffers registration
  habanalabs: fix race when waiting on encaps signal
  ...
gregkh committed Mar 10, 2022
2 parents 0245107 + 655221c commit 4dee7a7
Showing 27 changed files with 1,504 additions and 915 deletions.
20 changes: 2 additions & 18 deletions Documentation/ABI/testing/debugfs-driver-habanalabs
@@ -12,24 +12,7 @@ What: /sys/kernel/debug/habanalabs/hl<n>/clk_gate
 Date: May 2020
 KernelVersion: 5.8
 Contact: [email protected]
-Description: Allow the root user to disable/enable in runtime the clock
-gating mechanism in Gaudi. Due to how Gaudi is built, the
-clock gating needs to be disabled in order to access the
-registers of the TPC and MME engines. This is sometimes needed
-during debug by the user and hence the user needs this option.
-The user can supply a bitmask value, each bit represents
-a different engine to disable/enable its clock gating feature.
-The bitmask is composed of 20 bits:
-
-======= ============
-0 - 7 DMA channels
-8 - 11 MME engines
-12 - 19 TPC engines
-======= ============
-
-The bit's location of a specific engine can be determined
-using (1 << GAUDI_ENGINE_ID_*). GAUDI_ENGINE_ID_* values
-are defined in uapi habanalabs.h file in enum gaudi_engine_id
+Description: This setting is now deprecated as clock gating is handled solely by the f/w

 What: /sys/kernel/debug/habanalabs/hl<n>/command_buffers
 Date: Jan 2019
@@ -239,6 +222,7 @@ KernelVersion: 5.6
 Contact: [email protected]
 Description: Sets the stop-on_error option for the device engines. Value of
 "0" is for disable, otherwise enable.
+Relevant only for GOYA and GAUDI.

 What: /sys/kernel/debug/habanalabs/hl<n>/timeout_locked
 Date: Sep 2021
16 changes: 14 additions & 2 deletions Documentation/ABI/testing/sysfs-driver-habanalabs
@@ -69,6 +69,12 @@ KernelVersion: 5.1
 Contact: [email protected]
 Description: Displays the device's version from the eFuse

+What: /sys/class/habanalabs/hl<n>/fw_os_ver
+Date: Dec 2021
+KernelVersion: 5.18
+Contact: [email protected]
+Description: Version of the firmware OS running on the device's CPU
+
 What: /sys/class/habanalabs/hl<n>/hard_reset
 Date: Jan 2019
 KernelVersion: 5.1
@@ -115,7 +121,7 @@ What: /sys/class/habanalabs/hl<n>/infineon_ver
 Date: Jan 2019
 KernelVersion: 5.1
 Contact: [email protected]
-Description: Version of the Device's power supply F/W code
+Description: Version of the Device's power supply F/W code. Relevant only to GOYA and GAUDI

 What: /sys/class/habanalabs/hl<n>/max_power
 Date: Jan 2019
@@ -220,4 +226,10 @@ What: /sys/class/habanalabs/hl<n>/uboot_ver
 Date: Jan 2019
 KernelVersion: 5.1
 Contact: [email protected]
-Description: Version of the u-boot running on the device's CPU
+Description: Version of the u-boot running on the device's CPU
+
+What: /sys/class/habanalabs/hl<n>/vrm_ver
+Date: Jan 2022
+KernelVersion: not yet upstreamed
+Contact: [email protected]
+Description: Version of the Device's Voltage Regulator Monitor F/W code. N/A to GOYA and GAUDI
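
The two attributes added above (fw_os_ver and vrm_ver) are plain text files under /sys/class/habanalabs/hl<n>/. As a rough userspace illustration, and not part of this commit, the sketch below reads the first line of such a file. The helper name read_attr is hypothetical, and the sketch assumes the attribute holds a single newline-terminated string, which the ABI text does not state explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the driver): read the first line of a
 * sysfs-style attribute file into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
static int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Cut the string at the first newline, if there is one. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

On a machine with the driver loaded, a call such as read_attr("/sys/class/habanalabs/hl0/fw_os_ver", buf, sizeof(buf)) would be the expected usage, where hl0 is an assumed instance of hl<n>. Per the ABI text, vrm_ver is not applicable to GOYA and GAUDI, so a caller should treat a failed read there as normal.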
2 changes: 1 addition & 1 deletion drivers/misc/habanalabs/common/Makefile
@@ -11,4 +11,4 @@ HL_COMMON_FILES := common/habanalabs_drv.o common/device.o common/context.o \
 common/command_buffer.o common/hw_queue.o common/irq.o \
 common/sysfs.o common/hwmon.o common/memory.o \
 common/command_submission.o common/firmware_if.o \
-common/state_dump.o common/hwmgr.o
+common/state_dump.o
4 changes: 3 additions & 1 deletion drivers/misc/habanalabs/common/command_buffer.c
@@ -424,8 +424,8 @@ int hl_cb_ioctl(struct hl_fpriv *hpriv, void *data)
 {
 	union hl_cb_args *args = data;
 	struct hl_device *hdev = hpriv->hdev;
+	u64 handle = 0, device_va = 0;
 	enum hl_device_status status;
-	u64 handle = 0, device_va;
 	u32 usage_cnt = 0;
 	int rc;

@@ -464,6 +464,8 @@ int hl_cb_ioctl(struct hl_fpriv *hpriv, void *data)
 				args->in.flags,
 				&usage_cnt,
 				&device_va);
+		if (rc)
+			break;

 		memset(&args->out, 0, sizeof(args->out));

