forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pull] master from torvalds:master #1939
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fixes some spelling errors in the comments. Signed-off-by: Atul Kumar Pant <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
Report the task weight when dumping the task state during an error exit. Moreover, adjust the output format to display dsq_vtime, slice, and weight on the same line. This can help identify whether certain tasks were excessively prioritized or de-prioritized due to large niceness gaps. Signed-off-by: Andrea Righi <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
All scx enums are now automatically generated from vmlinux.h and they must be initialized using the SCX_ENUM_INIT() macro. Fix the scx selftests to use this macro to properly initialize these values. Fixes: 8da7bf2 ("tools/sched_ext: Receive updates from SCX repo") Reported-by: Ihor Solodrai <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Andrea Righi <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
The commit 68f8305("workqueue: Reap workers via kthread_stop() and remove detach_completion") adds code to reap the normal workers but mistakenly does not handle the rescuer and also removes the code waiting for the rescuer in put_unbound_pool(), which caused a use-after-free bug reported by Cheung Wall. To avoid the use-after-free bug, the pool’s reference must be held until the detachment is complete. Therefore, move the code that puts the pwq after detaching the rescuer from the pool. Reported-by: cheung wall <[email protected]> Cc: cheung wall <[email protected]> Link: https://lore.kernel.org/lkml/CAKHoSAvP3iQW+GwmKzWjEAOoPvzeWeoMO0Gz7Pp3_4kxt-RMoA@mail.gmail.com/ Fixes: 68f8305("workqueue: Reap workers via kthread_stop() and remove detach_completion") Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
dsp_local_on has several incorrect assumptions, one of which is that p->nr_cpus_allowed always tracks p->cpus_ptr. This is not true when a task is scheduled out while migration is disabled - p->cpus_ptr is temporarily overridden to the previous CPU while p->nr_cpus_allowed remains unchanged. This led to sporadic test faliures when dsp_local_on_dispatch() tries to put a migration disabled task to a different CPU. Fix it by keeping the previous CPU when migration is disabled. There are SCX schedulers that make use of p->nr_cpus_allowed. They should also implement explicit handling for p->migration_disabled. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Ihor Solodrai <[email protected]> Cc: Andrea Righi <[email protected]> Cc: Changwoo Min <[email protected]>
scx_move_task() is called from sched_move_task() and tells the BPF scheduler that cgroup migration is being committed. sched_move_task() is used by both cgroup and autogroup migrations and scx_move_task() tried to filter out autogroup migrations by testing the destination cgroup and PF_EXITING but this is not enough. In fact, without explicitly tagging the thread which is doing the cgroup migration, there is no good way to tell apart scx_move_task() invocations for racing migration to the root cgroup and an autogroup migration. This led to scx_move_task() incorrectly ignoring a migration from non-root cgroup to an autogroup of the root cgroup triggering the following warning: WARNING: CPU: 7 PID: 1 at kernel/sched/ext.c:3725 scx_cgroup_can_attach+0x196/0x340 ... Call Trace: <TASK> cgroup_migrate_execute+0x5b1/0x700 cgroup_attach_task+0x296/0x400 __cgroup_procs_write+0x128/0x140 cgroup_procs_write+0x17/0x30 kernfs_fop_write_iter+0x141/0x1f0 vfs_write+0x31d/0x4a0 __x64_sys_write+0x72/0xf0 do_syscall_64+0x82/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fix it by adding an argument to sched_move_task() that indicates whether the moving is for a cgroup or autogroup migration. After the change, scx_move_task() is called only for cgroup migrations and renamed to scx_cgroup_move_task(). Link: sched-ext/scx#370 Fixes: 8195136 ("sched_ext: Add cgroup support") Cc: [email protected] # v6.12+ Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
Introduce a new helper for BPF schedulers to determine whether a task can migrate or not (supporting both SMP and UP systems). Fixes: e9fe182 ("sched_ext: selftests/dsp_local_on: Fix sporadic failures") Signed-off-by: Andrea Righi <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
In UP systems p->migration_disabled is not available. Fix this by using the portable helper is_migration_disabled(p). Fixes: e9fe182 ("sched_ext: selftests/dsp_local_on: Fix sporadic failures") Signed-off-by: Andrea Righi <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
While performing the rq locking dance in dispatch_to_local_dsq(), we may trigger the following lock imbalance condition, in particular when multiple tasks are rapidly changing CPU affinity (i.e., running a `stress-ng --race-sched 0`): [ 13.413579] ===================================== [ 13.413660] WARNING: bad unlock balance detected! [ 13.413729] 6.13.0-virtme #15 Not tainted [ 13.413792] ------------------------------------- [ 13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at: [ 13.413954] [<ffffffff873c6c48>] dispatch_to_local_dsq+0x108/0x1a0 [ 13.414111] but there are no more locks to release! [ 13.414176] [ 13.414176] other info that might help us debug this: [ 13.414258] 1 lock held by kworker/1:1/80: [ 13.414318] #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90 [ 13.414612] [ 13.414612] stack backtrace: [ 13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15 [ 13.415505] Workqueue: 0x0 (events) [ 13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms [ 13.415570] Call Trace: [ 13.415700] <TASK> [ 13.415744] dump_stack_lvl+0x78/0xe0 [ 13.415806] ? dispatch_to_local_dsq+0x108/0x1a0 [ 13.415884] print_unlock_imbalance_bug+0x11b/0x130 [ 13.415965] ? dispatch_to_local_dsq+0x108/0x1a0 [ 13.416226] lock_release+0x231/0x2c0 [ 13.416326] _raw_spin_unlock+0x1b/0x40 [ 13.416422] dispatch_to_local_dsq+0x108/0x1a0 [ 13.416554] flush_dispatch_buf+0x199/0x1d0 [ 13.416652] balance_one+0x194/0x370 [ 13.416751] balance_scx+0x61/0x1e0 [ 13.416848] prev_balance+0x43/0xb0 [ 13.416947] __pick_next_task+0x6b/0x1b0 [ 13.417052] __schedule+0x20d/0x1740 This happens because dispatch_to_local_dsq() is racing with dispatch_dequeue() and, when the latter wins, we incorrectly assume that the task has been moved to dst_rq. Fix by properly tracking the currently locked rq. Fixes: 4d3ca89 ("sched_ext: Refactor consume_remote_task()") Signed-off-by: Andrea Righi <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
Tejun reported the following race between fork() and cgroup.kill at [1]. Tejun: I was looking at cgroup.kill implementation and wondering whether there could be a race window. So, __cgroup_kill() does the following: k1. Set CGRP_KILL. k2. Iterate tasks and deliver SIGKILL. k3. Clear CGRP_KILL. The copy_process() does the following: c1. Copy a bunch of stuff. c2. Grab siglock. c3. Check fatal_signal_pending(). c4. Commit to forking. c5. Release siglock. c6. Call cgroup_post_fork() which puts the task on the css_set and tests CGRP_KILL. The intention seems to be that either a forking task gets SIGKILL and terminates on c3 or it sees CGRP_KILL on c6 and kills the child. However, I don't see what guarantees that k3 can't happen before c6. ie. After a forking task passes c5, k2 can take place and then before the forking task reaches c6, k3 can happen. Then, nobody would send SIGKILL to the child. What am I missing? This is indeed a race. One way to fix this race is by taking cgroup_threadgroup_rwsem in write mode in __cgroup_kill() as the fork() side takes cgroup_threadgroup_rwsem in read mode from cgroup_can_fork() to cgroup_post_fork(). However that would be heavy handed as this adds one more potential stall scenario for cgroup.kill which is usually called under extreme situation like memory pressure. To fix this race, let's maintain a sequence number per cgroup which gets incremented on __cgroup_kill() call. On the fork() side, the cgroup_can_fork() will cache the sequence number locally and recheck it against the cgroup's sequence number at cgroup_post_fork() site. If the sequence numbers mismatch, it means __cgroup_kill() can been called and we should send SIGKILL to the newly created task. Reported-by: Tejun Heo <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ [1] Fixes: 661ee62 ("cgroup: introduce cgroup.kill") Cc: [email protected] # v5.14+ Signed-off-by: Shakeel Butt <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
When (s64)(after - before) > 0, the code returns the result of (s64)(after - before) > 0 while the intended result should be (s64)(after - before). That happens because the middle operand of the ternary operator was omitted incorrectly, returning the result of (s64)(after - before) > 0. Thus, add the middle operand -- (s64)(after - before) -- to return the correct time calculation. Fixes: d07be81 ("sched_ext: Add time helpers for BPF schedulers") Signed-off-by: Changwoo Min <[email protected]> Acked-by: Andrea Righi <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
Add the missing code to allocate P4D level page tables when cloning the the kernel page tables. This fixes a crash that may be observed when attempting to resume from hibernation on an LPA2 capable system with 4k pages, which therefore uses 5 levels of paging. Presumably, kexec is equally affected. Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
Commit a3ed415 ("fgraph: Replace fgraph_ret_regs with ftrace_regs") replaces the config HAVE_FUNCTION_GRAPH_RETVAL with the config HAVE_FUNCTION_GRAPH_FREGS, and it replaces all the select commands in the various architecture Kconfig files. In the arm64 architecture, the commit adds the 'select HAVE_FUNCTION_GRAPH_FREGS', but misses to remove the 'select HAVE_FUNCTION_GRAPH_RETVAL', i.e., the select on the replaced config. Remove selecting the replaced config. No functional change, just cleanup. Fixes: a3ed415 ("fgraph: Replace fgraph_ret_regs with ftrace_regs") Signed-off-by: Lukas Bulwahn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
In one of the renumberings of the GCS hwcap a stray reference to HWCAP2 was left, fix it. Reported-by: David Spickett <[email protected]> Fixes: 7058bf8 ("arm64/gcs: Document the ABI for Guarded Control Stacks") Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
Due to SME currently being disabled when removing the SF8MMx support it wasn't noticed that there were some stray references in the hwcap table, delete them. Fixes: 8199354 ("arm64/hwcap: Describe 2024 dpISA extensions to userspace") Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
The script uses non-POSIX features like `[[` for conditionals and hence does not work when run with a POSIX /bin/sh. Change the shebang to /bin/bash instead, like the other tests in cgroup. Signed-off-by: Bharadwaj Raju <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
A recent LLVM commit [1] started generating an .ARM.attributes section similar to the one that exists for 32-bit, which results in orphan section warnings (or errors if CONFIG_WERROR is enabled) from the linker because it is not handled in the arm64 linker scripts. ld.lld: error: arch/arm64/kernel/vdso/vgettimeofday.o:(.ARM.attributes) is being placed in '.ARM.attributes' ld.lld: error: arch/arm64/kernel/vdso/vgetrandom.o:(.ARM.attributes) is being placed in '.ARM.attributes' ld.lld: error: vmlinux.a(lib/vsprintf.o):(.ARM.attributes) is being placed in '.ARM.attributes' ld.lld: error: vmlinux.a(lib/win_minmax.o):(.ARM.attributes) is being placed in '.ARM.attributes' ld.lld: error: vmlinux.a(lib/xarray.o):(.ARM.attributes) is being placed in '.ARM.attributes' Discard the new sections in the necessary linker scripts to resolve the warnings, as the kernel and vDSO do not need to retain it, similar to the .note.gnu.property section. Cc: [email protected] Fixes: b3e5d80 ("arm64/build: Warn on orphan section placement") Link: llvm/llvm-project@ee99c4d [1] Signed-off-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/20250206-arm64-handle-arm-attributes-in-linker-script-v3-1-d53d169913eb@kernel.org Signed-off-by: Will Deacon <[email protected]>
The loop that detects/populates cache information already has a bounds check on the array size but does not account for cache levels with separate data/instructions cache. Fix this by incrementing the index for any populated leaf (instead of any populated level). Fixes: 5d425c1 ("arm64: kernel: add support for cpu cache information") Signed-off-by: Radu Rendec <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
The CPU usage time is the time when user, system or both are using the CPU. Steal time is the time when CPU is waiting to be run by the Hypervisor. It should not be added to the CPU usage time, hence removing it from the usage_usec entry. Fixes: 936f2a7 ("cgroup: add cpu.stat file to root cgroup") Acked-by: Axel Busch <[email protected]> Acked-by: Michal Koutný <[email protected]> Signed-off-by: Muhammad Adeel <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
Migration disabled tasks are special and pinned to their previous CPUs. They tripped up some unsuspecting BPF schedulers as their ->nr_cpus_allowed may not agree with the bits set in ->cpus_ptr. Make it easier for BPF schedulers by automatically dispatching them to the pinned local DSQs by default. If a BPF scheduler wants to handle migration disabled tasks explicitly, it can set SCX_OPS_ENQ_MIGRATION_DISABLED. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Andrea Righi <[email protected]>
A dispatch operation that can target a specific local DSQ - scx_bpf_dsq_move_to_local() or scx_bpf_dsq_move() - checks whether the task can be migrated to the target CPU using task_can_run_on_remote_rq(). If the task can't be migrated to the targeted CPU, it is bounced through a global DSQ. task_can_run_on_remote_rq() assumes that the task is on a CPU that's different from the targeted CPU but the callers doesn't uphold the assumption and may call the function when the task is already on the target CPU. When such task has migration disabled, task_can_run_on_remote_rq() ends up returning %false incorrectly unnecessarily bouncing the task to a global DSQ. Fix it by updating the callers to only call task_can_run_on_remote_rq() when the task is on a different CPU than the target CPU. As this is a bit subtle, for clarity and documentation: - Make task_can_run_on_remote_rq() trigger SCHED_WARN_ON() if the task is on the same CPU as the target CPU. - is_migration_disabled() test in task_can_run_on_remote_rq() cannot trigger if the task is on a different CPU than the target CPU as the preceding task_allowed_on_cpu() test should fail beforehand. Convert the test into SCHED_WARN_ON(). Signed-off-by: Tejun Heo <[email protected]> Fixes: 4c30f5c ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()") Fixes: 0366017 ("sched_ext: Use task_can_run_on_remote_rq() test in dispatch_to_local_dsq()") Cc: [email protected] # v6.12+
The iopf_queue_remove_device() helper removes a device from the per-iommu iopf queue when PRI is disabled on the device. It responds to all outstanding iopf's with an IOMMU_PAGE_RESP_INVALID code and detaches the device from the queue. However, it fails to release the group structure that represents a group of iopf's awaiting for a response after responding to the hardware. This can cause a memory leak if iopf_queue_remove_device() is called with pending iopf's. Fix it by calling iopf_free_group() after the iopf group is responded. Fixes: 1991123 ("iommu: Track iopf group instead of last fault") Cc: [email protected] Suggested-by: Kevin Tian <[email protected]> Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
… task_can_run_on_remote_rq() While fixing migration disabled task handling, 3296682 ("sched_ext: Fix migration disabled handling in targeted dispatches") assumed that a migration disabled task's ->cpus_ptr would only have the pinned CPU. While this is eventually true for migration disabled tasks that are switched out, ->cpus_ptr update is performed by migrate_disable_switch() which is called right before context_switch() in __scheduler(). However, the task is enqueued earlier during pick_next_task() via put_prev_task_scx(), so there is a race window where another CPU can see the task on a DSQ. If the CPU tries to dispatch the migration disabled task while in that window, task_allowed_on_cpu() will succeed and task_can_run_on_remote_rq() will subsequently trigger SCHED_WARN(is_migration_disabled()). WARNING: CPU: 8 PID: 1837 at kernel/sched/ext.c:2466 task_can_run_on_remote_rq+0x12e/0x140 Sched_ext: layered (enabled+all), task: runnable_at=-10ms RIP: 0010:task_can_run_on_remote_rq+0x12e/0x140 ... <TASK> consume_dispatch_q+0xab/0x220 scx_bpf_dsq_move_to_local+0x58/0xd0 bpf_prog_84dd17b0654b6cf0_layered_dispatch+0x290/0x1cfa bpf__sched_ext_ops_dispatch+0x4b/0xab balance_one+0x1fe/0x3b0 balance_scx+0x61/0x1d0 prev_balance+0x46/0xc0 __pick_next_task+0x73/0x1c0 __schedule+0x206/0x1730 schedule+0x3a/0x160 __do_sys_sched_yield+0xe/0x20 do_syscall_64+0xbb/0x1e0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fix it by converting the SCHED_WARN() back to a regular failure path. Also, perform the migration disabled test before task_allowed_on_cpu() test so that BPF schedulers which fail to handle migration disabled tasks can be noticed easily. While at it, adjust scx_ops_error() message for !task_allowed_on_cpu() case for brevity and consistency. Signed-off-by: Tejun Heo <[email protected]> Fixes: 3296682 ("sched_ext: Fix migration disabled handling in targeted dispatches") Acked-by: Andrea Righi <[email protected]> Reported-by: Jake Hillion <[email protected]>
For the time being, the amu_fie_cpus cpumask is being exclusively used by the AMU-related internals of FIE support and is guaranteed to be valid on every access currently made. Still the mask is not being invalidated on one of the error handling code paths, which leaves a soft spot with theoretical risk of UAF for CPUMASK_OFFSTACK cases. To make things sound, delay allocating said cpumask (for CPUMASK_OFFSTACK) avoiding otherwise nasty sanitising case failing to register the cpufreq policy notifications. Signed-off-by: Beata Michalska <[email protected]> Reviewed-by: Prasanna Kumar T S M <[email protected]> Reviewed-by: Sumit Gupta <[email protected]> Reviewed-by: Sudeep Holla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
Perhaps unsurprisingly there are some platforms where the GTDT isn't quite right and the Platforms Timer array overflows the length of the overall table. While the recently-added sanity checking isn't wrong, it makes it impossible to boot the kernel on offending platforms. Try to hobble along and limit the Platform Timer count to the bounds of the table. Cc: Marc Zyngier <[email protected]> Cc: Lorenzo Pieralisi <[email protected]> Cc: Zheng Zengkai <[email protected]> Cc: [email protected] Fixes: 263e22d ("ACPI: GTDT: Tighten the check for the array of platform timer structures") Signed-off-by: Oliver Upton <[email protected]> Acked-by: Marc Zyngier <[email protected]> Reviewed-by: Lorenzo Pieralisi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
Commit 8199354 ("arm64/hwcap: Describe 2024 dpISA extensions to userspace") added definitions for HWCAP_FPRCVT, HWCAP_F8MM8 and HWCAP_F8MM4 but did not include the crucial registration in arm64_elf_hwcaps. Add it. Fixes: 8199354 ("arm64/hwcap: Describe 2024 dpISA extensions to userspace") Reported-by: Mark Rutland <[email protected]> Signed-off-by: Mark Brown <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
Starting with Rust 1.85.0 (to be released 2025-02-20), `rustc` warns [1] about disabling neon in the aarch64 hardfloat target: warning: target feature `neon` cannot be toggled with `-Ctarget-feature`: unsound on hard-float targets because it changes float ABI | = note: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release! = note: for more information, see issue #116344 <rust-lang/rust#116344> Thus, instead, use the softfloat target instead. While trying it out, I found that the kernel sanitizers were not enabled for that built-in target [2]. Upstream Rust agreed to backport the enablement for the current beta so that it is ready for the 1.85.0 release [3] -- thanks! However, that still means that before Rust 1.85.0, we cannot switch since sanitizers could be in use. Thus conditionally do so. Cc: [email protected] # Needed in 6.12.y and 6.13.y only (Rust is pinned in older LTSs). Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Matthew Maurer <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Ralf Jung <[email protected]> Cc: Jubilee Young <[email protected]> Link: rust-lang/rust#133417 [1] Link: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/arm64.20neon.20.60-Ctarget-feature.60.20warning/near/495358442 [2] Link: rust-lang/rust#135905 [3] Link: rust-lang/rust#116344 Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Trevor Gross <[email protected]> Tested-by: Matthew Maurer <[email protected]> Reviewed-by: Ralf Jung <[email protected]> Reviewed-by: Alice Ryhl <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
When mapping a buffer for DMA via .map_page or .map_sg DMA operations, there is no need to check the machine frames to be aligned according to the mapped areas size. All what is needed in these cases is that the buffer is contiguous at machine level. So carve out the alignment check from range_straddles_page_boundary() and move it to a helper called by xen_swiotlb_alloc_coherent() and xen_swiotlb_free_coherent() directly. Fixes: 9f40ec8 ("xen/swiotlb: add alignment check for dma buffers") Reported-by: Jan Vejvalka <[email protected]> Tested-by: Jan Vejvalka <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
Today a PV guest (including dom0) can create 2MB contiguous memory regions for DMA buffers at max. This has led to problems at least with the megaraid_sas driver, which wants to allocate a 2.3MB DMA buffer. The limiting factor is the frame array used to do the hypercall for making the memory contiguous, which has 512 entries and is just a static array in mmu_pv.c. In order to not waste memory for non-PV guests, put the initial frame array into .init.data section and dynamically allocate an array from the .init_after_bootmem hook of PV guests. In case a contiguous memory area larger than the initially supported 2MB is requested, allocate a larger buffer for the frame list. Note that such an allocation is tried only after memory management has been initialized properly, which is tested via a flag being set in the .init_after_bootmem hook. Fixes: 9f40ec8 ("xen/swiotlb: add alignment check for dma buffers") Signed-off-by: Juergen Gross <[email protected]> Tested-by: Alan Robinson <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
It's sole user (pci_xen_swiotlb_init()) is __init, too. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Message-ID: <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
Fixed grammar for a few tests of sched_ext. Signed-off-by: Devaansh Kumar <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
Now BPF only supports bpf_list_push_{front,back}_impl kfunc, not bpf_list_ push_{front,back}. This patch fix this issue. Without this patch, if we use bpf_list kfunc in scx, the BPF verifier would complain: libbpf: extern (func ksym) 'bpf_list_push_back': not found in kernel or module BTFs libbpf: failed to load object 'scx_foo' libbpf: failed to load BPF skeleton 'scx_foo': -EINVAL With this patch, the bpf list kfunc will work as expected. Signed-off-by: Chuyi Zhou <[email protected]> Fixes: 2a52ca7 ("sched_ext: Add scx_simple and scx_example_qmap example schedulers") Signed-off-by: Tejun Heo <[email protected]>
Now when we use scx_bpf_task_cgroup() in ops.tick() to get the cgroup of the current task, the following error will occur: scx_foo[3795244] triggered exit kind 1024: runtime error (called on a task not being operated on) The reason is that we are using SCX_CALL_OP() instead of SCX_CALL_OP_TASK() when calling ops.tick(), which triggers the error during the subsequent scx_kf_allowed_on_arg_tasks() check. SCX_CALL_OP_TASK() was first introduced in commit 3645402 ("sched_ext: Track tasks that are subjects of the in-flight SCX operation") to ensure task's rq lock is held when accessing task's sched_group. Since ops.tick() is marked as SCX_KF_TERMINAL and task_tick_scx() is protected by the rq lock, we can use SCX_CALL_OP_TASK() to avoid the above issue. Similarly, the same changes should be made for ops.disable() and ops.exit_task(), as they are also protected by task_rq_lock() and it's safe to access the task's task_group. Fixes: 3645402 ("sched_ext: Track tasks that are subjects of the in-flight SCX operation") Signed-off-by: Chuyi Zhou <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
With recent kernel, AMDGPU failed to resume after suspend on certain laptop. Sample log: ----------- Nov 14 11:52:19 Thinkbook kernel: iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:06:00.0 pasid=0x00000 address=0x135300000 flags=0x0080] Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[0]: 7d90000000000003 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[1]: 0000100103fc0009 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[2]: 2000000117840013 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[3]: 0000000000000000 This is because in resume path, CNTRL[EPHEn] is not set. Fix this by setting CNTRL[EPHEn] to 1 in resume path if EFR[EPHSUP] is set. Note May be better approach is to save the control register in suspend path and restore it in resume path instead of trying to set indivisual bits. We will have separate patch for that. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219499 Fixes: c4cb231 ("iommu/amd: Add support for enable/disable IOPF") Tested-by: Hamish McIntyre-Bhatty <[email protected]> Signed-off-by: Vasant Hegde <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
Fix spelling error IDENITY -> IDENTITY in drivers/iommu/iommu.c. Signed-off-by: Easwar Hariharan <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ joro: Add commit message ] Signed-off-by: Joerg Roedel <[email protected]>
There are some typos in comments/messages: - modyfying -> modifying - Unabled -> Unable Fix them via codespell. Signed-off-by: Andrew Kreimer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
This driver supports page faults on PCI RID since commit <9f831c16c69e> ("iommu/vt-d: Remove the pasid present check in prq_event_thread") by allowing the reporting of page faults with the pasid_present field cleared to the upper layer for further handling. The fundamental assumption here is that the detach or replace operations act as a fence for page faults. This implies that all pending page faults associated with a specific RID or PASID are flushed when a domain is detached or replaced from a device RID or PASID. However, the intel_iommu_drain_pasid_prq() helper does not correctly handle faults for RID. This leads to faults potentially remaining pending in the iommu hardware queue even after the domain is detached, thereby violating the aforementioned assumption. Fix this issue by extending intel_iommu_drain_pasid_prq() to cover faults for RID. Fixes: 9f831c1 ("iommu/vt-d: Remove the pasid present check in prq_event_thread") Cc: [email protected] Suggested-by: Kevin Tian <[email protected]> Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Yi Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
…nux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Three fixes to xen-swiotlb driver: - two fixes for issues coming up due to another fix in 6.12 - addition of an __init annotation" * tag 'for-linus-6.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: Xen/swiotlb: mark xen_swiotlb_fixup() __init x86/xen: allow larger contiguous memory regions in PV guests xen/swiotlb: relax alignment requirements
…ux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - core: fix potential memory leak in iopf_queue_remove_device() - Intel VT-d: handle faults correctly in intel_iommu_drain_pasid_prq() - AMD-Vi: fix faults happening in resume path - typo and spelling fixes * tag 'iommu-fixes-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Make intel_iommu_drain_pasid_prq() cover faults for RID iommu/exynos: Fix typos iommu: Fix a spelling error iommu/amd: Expicitly enable CNTRL.EPHEn bit in resume path iommu: Fix potential memory leak in iopf_queue_remove_device()
…git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix kexec and hibernation when using 5-level page-table configuration - Remove references to non-existent SF8MM4 and SF8MM8 ID register fields, hooking up hwcaps for the FPRCVT, F8MM4 and F8MM8 fields instead - Drop unused .ARM.attributes ELF sections - Fix array indexing when probing CPU cache topology from firmware - Fix potential use-after-free in AMU initialisation code - Work around broken GTDT entries by tolerating excessively large timer arrays - Force use of Rust's "softfloat" target to avoid a threatening warning about the NEON target feature - Typo fix in GCS documentation and removal of duplicate Kconfig select * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: rust: clean Rust 1.85.0 warning using softfloat target arm64: Add missing registrations of hwcaps ACPI: GTDT: Relax sanity checking on Platform Timers array count arm64: amu: Delay allocating cpumask for AMU FIE support arm64: cacheinfo: Avoid out-of-bounds write to cacheinfo array arm64: Handle .ARM.attributes section in linker scripts arm64/hwcap: Remove stray references to SF8MMx arm64/gcs: Fix documentation for HWCAP arm64: Kconfig: Remove selecting replaced HAVE_FUNCTION_GRAPH_RETVAL arm64: Fix 5-level paging support in kexec/hibernate trampoline
…ux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: - Fix a regression where a worker pool can be freed before rescuer workers are done with it leading to user-after-free * tag 'wq-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Put the pwq after detaching the rescuer from the pool
…/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix a race window where a newly forked task could escape cgroup.kill - Remove incorrectly included steal time from cpu.stat::usage_usec - Minor update in selftest * tag 'cgroup-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Remove steal time from usage_usec selftests/cgroup: use bash in test_cpuset_v1_hp.sh cgroup: fix race between fork and cgroup.kill
…scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix lock imbalance in a corner case of dispatch_to_local_dsq() - Migration disabled tasks were confusing some BPF schedulers and its handling had a bug. Fix it and simplify the default behavior by dispatching them automatically - ops.tick(), ops.disable() and ops.exit_task() were incorrectly disallowing kfuncs that require the task argument to be the rq operation is currently operating on and thus is rq-locked. Allow them. - Fix autogroup migration handling bug which was occasionally triggering a warning in the cgroup migration path - tools/sched_ext, selftest and other misc updates * tag 'sched_ext-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Use SCX_CALL_OP_TASK in task_tick_scx sched_ext: Fix the incorrect bpf_list kfunc API in common.bpf.h. sched_ext: selftests: Fix grammar in tests description sched_ext: Fix incorrect assumption about migration disabled tasks in task_can_run_on_remote_rq() sched_ext: Fix migration disabled handling in targeted dispatches sched_ext: Implement auto local dispatching of migration disabled tasks sched_ext: Fix incorrect time delta calculation in time_delta() sched_ext: Fix lock imbalance in dispatch_to_local_dsq() sched_ext: selftests/dsp_local_on: Fix selftest on UP systems tools/sched_ext: Add helper to check task migration state sched_ext: Fix incorrect autogroup migration detection sched_ext: selftests/dsp_local_on: Fix sporadic failures selftests/sched_ext: Fix enum resolution sched_ext: Include task weight in the error state dump sched_ext: Fixes typos in comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )