Issue
When allocating physically contiguous memory with alloc_pages_node in Linux v6.0, the _refcount in struct page is not incremented for all of the allocated pages. Only the first page of the allocation has its _refcount incremented.
- Is this correct/intended behavior?
- Is this function only intended to be used in particular use cases, or in a particular way, such that the zero _refcount of the remaining pages is accounted for?
Context: the alloc_pages* functions are a family of kernel functions for allocating a physically contiguous set of pages (see the documentation). They return a pointer to the struct page corresponding to the first page of the allocated region.
I am using this function during early boot (in fact, while setting up the stacks for the init process and for kthreadd). By this point, the buddy allocator is functional and usable.
Similar APIs (ignoring the need for physical contiguity) such as vmalloc increment the _refcount for all allocated pages.
This is the code I am running. The output is also listed below.
Code
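/* get_order() takes a size in bytes, hence the shift by PAGE_SHIFT */
order = get_order(nr_pages << PAGE_SHIFT);
p = alloc_pages_node(node, gfp_mask, order);
if (!p)
    return;
/* Print the reference count of every page in the allocated block */
for (i = 0; i < nr_pages; i++, p++)
    printk("_refcount = %d", page_ref_count(p));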
Output
_refcount = 1
_refcount = 0
_refcount = 0
...
Arguments
gfp_mask is (THREADINFO_GFP & ~__GFP_ACCOUNT) | __GFP_NOWARN | __GFP_HIGHMEM.
- The first part, THREADINFO_GFP & ~__GFP_ACCOUNT, is passed by alloc_thread_stack_node
- __vmalloc_area_node adds __GFP_NOWARN | __GFP_HIGHMEM
order = get_order(nr_pages << PAGE_SHIFT) = 2, since nr_pages is 4.
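The order arithmetic can be checked with a small userspace sketch. This is not the kernel's get_order() implementation; sketch_get_order and SKETCH_PAGE_SHIFT are made-up names, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; the kernel takes this value from PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Userspace stand-in for get_order(): the smallest order such that
 * (1 << order) pages are enough to hold `size` bytes.
 */
int sketch_get_order(size_t size)
{
    size_t pages;
    int order = 0;

    assert(size > 0);
    /* Round the byte count up to a whole number of pages. */
    pages = (size + ((size_t)1 << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
    /* Find the smallest power of two that covers that many pages. */
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under these assumptions, sketch_get_order((size_t)4 << SKETCH_PAGE_SHIFT) gives order 2 for 4 pages, and a request for 5 pages would be rounded up to order 3 (8 pages).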
Solution
Is this correct/intended behavior?
Yes, this is normal. Page allocations of order higher than 0 are effectively treated as a single "high-order" page(1) by the buddy allocator, so functions such as alloc_pages() and __free_pages(), which operate on both order-0 and high-order pages, only care about the reference count of the first page.
Upon allocation (alloc_pages), only the first struct page of the group gets its refcount initialized. Upon deallocation (__free_pages), the refcount of the first page is decremented and tested: if it reaches zero, the whole group of pages actually gets freed(2). When this happens, a sanity check is also performed on every single page to ensure that its reference count is zero.
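The allocation and deallocation rules described above can be modelled with a toy userspace sketch. Everything here (toy_page, toy_mem, toy_alloc_pages, toy_free_pages) is hypothetical and only mimics the refcount bookkeeping; it is not kernel code and it ignores the special case mentioned in note (2).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_PAGES 16

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct toy_page {
    int refcount;
};

/* Hypothetical stand-in for a physically contiguous run of struct pages. */
struct toy_page toy_mem[TOY_MAX_PAGES];

/*
 * Models alloc_pages(): hands out (1 << order) pages starting at index 0
 * and initializes the refcount of the first page only.
 */
struct toy_page *toy_alloc_pages(unsigned int order)
{
    assert((1u << order) <= TOY_MAX_PAGES);
    memset(toy_mem, 0, sizeof(toy_mem));
    toy_mem[0].refcount = 1;
    return &toy_mem[0];
}

/*
 * Models __free_pages(): drops one reference on the first page. When it
 * reaches zero the whole block is released, after checking that every page
 * in the block has a zero refcount. Returns true if the block was freed.
 */
bool toy_free_pages(struct toy_page *page, unsigned int order)
{
    unsigned int i;

    if (--page->refcount != 0)
        return false; /* another reference to the first page remains */

    for (i = 0; i < (1u << order); i++)
        assert(page[i].refcount == 0); /* sanity check on every page */
    return true;
}
```

In this model, an order-2 allocation shows refcounts 1, 0, 0, 0, matching the output in the question, and a single toy_free_pages(p, 2) releases all four pages.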
If you intend to allocate multiple pages at once, but then manage them separately, you will need to split them using split_page(), which effectively "enables" reference counting for every single struct page and initializes its refcount to 1. You can then use __free_pages(p, 0) (or __free_page()) on each page separately.(3)
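The effect of splitting can be sketched with the same kind of toy userspace model. The names demo_page, demo_split_page and demo_free_page are hypothetical; the code only mimics what split_page() and __free_page() do to the reference counts and is not kernel code.

```c
#include <assert.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct demo_page {
    int refcount;
};

/*
 * Models split_page(): the first page already holds the reference taken at
 * allocation time, so each of the following pages gets its own refcount
 * initialized to 1.
 */
void demo_split_page(struct demo_page *page, unsigned int order)
{
    unsigned int i;

    assert(page->refcount == 1);
    for (i = 1; i < (1u << order); i++)
        page[i].refcount = 1;
}

/*
 * Models __free_page() on a single order-0 page: drops one reference and
 * returns 1 if the page was released, 0 if it is still in use.
 */
int demo_free_page(struct demo_page *page)
{
    assert(page->refcount > 0);
    return --page->refcount == 0;
}
```

After demo_split_page() on an order-2 block, all four pages have a refcount of 1, and freeing one of them leaves the other three untouched.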
Similar APIs (ignoring the need for physical contiguity) such as vmalloc increment the _refcount for all allocated pages.
Whether to allocate single order-0 pages or do a higher-order allocation is a choice that depends on the semantics of the specific memory allocation API. The problem is that these semantics can often change based on the actual API usage in kernel code(4). Indeed, as of now vmalloc() splits the high-order page obtained from alloc_pages() using split_page(), but this was only a recent change, made because some of its callers were relying on the allocated pages being independent (e.g., doing their own reference counting).
(1) Not to be confused with compound pages, although their refcounting is performed in the same way, i.e. only the first page (PageHead()) is refcounted.
(2) It is actually a little more complex than that: all pages except the first are freed regardless of the refcount of the first, to avoid memory leaks in rare situations (see this relevant commit). The refcount sanity check on all the freed pages is done anyway.
(3) Note that allocating high-order pages and then splitting them into order-0 pages is generally not a good idea, as you can guess from the comment on top of split_page(): "Note: this is probably too low level an operation for use in drivers. Please consult with lkml before using this in your driver." This is because high-order allocations are harder to satisfy than order-0 allocations, and breaking up high-order page blocks only makes it even harder.
(4) Welcome to the magic world of kernel APIs I guess. Much like Hogwarts' staircases, they like to change.
Answered By - Marco Bonelli Answer Checked By - Cary Denson (WPSolving Admin)