New Organization of NUMA nodes and Memory
Memory children
In hwloc v1.x, NUMA nodes were inside the tree, for instance Packages contained 2 NUMA nodes which contained a L3 and several cache.
Starting with hwloc v2.0, NUMA nodes are not in the main tree anymore. They are attached under objects as Memory Children on the side of normal children. This memory children list starts at obj->memory_first_child
and its size is obj->memory_arity
. Hence there can now exist two local NUMA nodes, for instance on Intel Xeon Phi processors.
The normal list of children (starting at obj->first_child
, ending at obj->last_child
, of size obj->arity
, and available as the array obj->children
) now only contains CPU-side objects: PUs, Cores, Packages, Caches, Groups, Machine and System. hwloc_get_next_child() may still be used to iterate over all children of all lists.
Hence the CPU-side hierarchy is built using normal children, while memory is attached to that hierarchy depending on its affinity.
Examples
-
a UMA machine with 2 packages and a single NUMA node is now modeled as a "Machine" object with two "Package" children and one "NUMANode" memory children (displayed first in lstopo below):
Machine (1024MB total)
NUMANode L#0 (P#0 1024MB)
Package L#0
Core L#0 + PU L#0 (P#0)
Core L#1 + PU L#1 (P#1)
Package L#1
Core L#2 + PU L#2 (P#2)
Core L#3 + PU L#3 (P#3)
-
a machine with 2 packages with one NUMA node and 2 cores in each is now:
Machine (2048MB total)
Package L#0
NUMANode L#0 (P#0 1024MB)
Core L#0 + PU L#0 (P#0)
Core L#1 + PU L#1 (P#1)
Package L#1
NUMANode L#1 (P#1 1024MB)
Core L#2 + PU L#2 (P#2)
Core L#3 + PU L#3 (P#3)
-
if there are two NUMA nodes per package, a Group object may be added to keep cores together with their local NUMA node:
Machine (4096MB total)
Package L#0
Group0 L#0
NUMANode L#0 (P#0 1024MB)
Core L#0 + PU L#0 (P#0)
Core L#1 + PU L#1 (P#1)
Group0 L#1
NUMANode L#1 (P#1 1024MB)
Core L#2 + PU L#2 (P#2)
Core L#3 + PU L#3 (P#3)
Package L#1
[...]
-
if the platform has L3 caches whose localities are identical to NUMA nodes, Groups aren't needed:
Machine (4096MB total)
Package L#0
L3 L#0 (16MB)
NUMANode L#0 (P#0 1024MB)
Core L#0 + PU L#0 (P#0)
Core L#1 + PU L#1 (P#1)
L3 L#1 (16MB)
NUMANode L#1 (P#1 1024MB)
Core L#2 + PU L#2 (P#2)
Core L#3 + PU L#3 (P#3)
Package L#1
[...]
NUMA level and depth
NUMA nodes are not in "main" tree of normal objects anymore. Hence, they don't have a meaningful depth anymore (like I/O and Misc objects). They have a virtual (negative) depth (HWLOC_TYPE_DEPTH_NUMANODE) so that functions manipulating depths and level still work, and so that we can still iterate over the level of NUMA nodes just like for any other level.
For instance we can still use lines such as
int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE);
hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 4);
hwloc_obj_t node = hwloc_get_next_obj_by_depth(topology, HWLOC_TYPE_DEPTH_NUMANODE, prev);
The NUMA depth should not be compared with others. An unmodified code that still compares NUMA and Package depths (to find out whether Packages contain NUMA or the contrary) would now always assume Packages contain NUMA (because the NUMA depth is negative).
However, the depth of the Normal parents of NUMA nodes may be used instead. In the last example above, NUMA nodes are attached to L3 caches, hence one may compare the depth of Packages and L3 to find out that NUMA nodes are contained in Packages. This depth of parents may be retrieved with hwloc_get_memory_parents_depth(). However, this function may return HWLOC_TYPE_DEPTH_MULTIPLE on future platforms if NUMA nodes are attached to different levels.
Finding Local NUMA nodes and looking at Children and Parents
Applications that walked up/down to find NUMANode parent/children must now be updated. Instead of looking directly for a NUMA node, one should now look for an object that has some memory children. NUMA node(s) will be be attached there. For instance, when looking for a NUMA node above a given core core
:
hwloc_obj_t parent = core->parent;
while (parent && !parent->memory_arity)
parent = parent->parent; /* no memory child, walk up */
if (parent)
/* use parent->memory_first_child (and its siblings if there are multiple local NUMA nodes) */
The list of local NUMA nodes (usually a single one) is also described by the nodeset
attribute of each object (which contains the physical indexes of these nodes). Iterating over the NUMA level is also an easy way to find local NUMA nodes:
hwloc_obj_t tmp = NULL;
while ((tmp = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NUMANODE, tmp)) != NULL) {
if (hwloc_bitmap_isset(obj->nodeset, tmp->os_index))
/* tmp is a NUMA node local to obj, use it */
}
Similarly finding objects that are close to a given NUMA nodes should be updated too. Instead of looking at the NUMA node parents/children, one should now find a Normal parent above that NUMA node, and then look at its parents/children as usual:
hwloc_obj_t tmp = obj->parent;
while (hwloc_obj_type_is_memory(tmp))
tmp = tmp->parent;
/* now use tmp instead of obj */
To avoid such hwloc v2.x-specific and NUMA-specific cases in the code, a generic lookup for any kind of object, including NUMA nodes, might also be implemented by iterating over a level. For instance finding an object of type type
which either contains or is included in object obj
can be performed by traversing the level of that type and comparing CPU sets:
hwloc_obj_t tmp = NULL;
while ((tmp = hwloc_get_next_obj_by_type(topology, type, tmp)) != NULL) {
if (hwloc_bitmap_intersects(tmp->cpuset, obj->cpuset))
/* tmp matches, use it */
}
This generic lookup works whenever type
or obj
are Normal or Memory objects since both have CPU sets. Moreover, it is compatible with the hwloc v1.x API.
4 Kinds of Objects and Children
I/O and Misc children
I/O children are not in the main object children list anymore either. They are in the list starting at obj->io_first_child
and whose size if obj->io_arity
.
Misc children are not in the main object children list anymore. They are in the list starting at obj->misc_first_child
nd whose size if obj->misc_arity
.
See hwloc_obj for details about children lists.
hwloc_get_next_child() may still be used to iterate over all children of all lists.
Kinds of objects
Given the above, objects may now be of 4 kinds:
-
Normal (everything not listed below, including Machine, Package, Core, PU, CPU Caches, etc);
-
Memory (currently NUMA nodes or Memory-side Caches), attached to parents as Memory children;
-
I/O (Bridges, PCI and OS devices), attached to parents as I/O children;
-
Misc objects, attached to parents as Misc children.
See hwloc_obj for details about children lists.
For a given object type, the kind may be found with hwloc_obj_type_is_normal(), hwloc_obj_type_is_memory(), hwloc_obj_type_is_normal(), or comparing with HWLOC_OBJ_MISC.
Normal and Memory objects have (non-NULL) CPU sets and nodesets, while I/O and Misc objects don't have any sets (they are NULL).
allowed_cpuset and allowed_nodeset only in the main topology
Objects do not have allowed_cpuset
and allowed_nodeset
anymore. They are only available for the entire topology using hwloc_topology_get_allowed_cpuset() and hwloc_topology_get_allowed_nodeset().
As usual, those are only needed when the INCLUDE_DISALLOWED topology flag is given, which means disallowed objects are kept in the topology. If so, one may find out whether some PUs inside an object is allowed by checking
hwloc_bitmap_intersects(obj->cpuset, hwloc_topology_get_allowed_cpuset(topology))
Replace cpusets with nodesets for NUMA nodes. To find out which ones, replace intersects() with and() to get the actual intersection.
Memory attributes become NUMANode-specific
Memory attributes such as obj->memory.local_memory
are now only available in NUMANode-specific attributes in obj->attr->numanode.local_memory
.
obj->memory.total_memory
is available in all objects as obj->total_memory
.
See hwloc_obj_attr_u::hwloc_numanode_attr_s and hwloc_obj for details.
Topology configuration changes
The old ignoring API as well as several configuration flags are replaced with the new filtering API, see hwloc_topology_set_type_filter() and its variants, and hwloc_type_filter_e for details.
-
hwloc_topology_ignore_type(), hwloc_topology_ignore_type_keep_structure() and hwloc_topology_ignore_all_keep_structure() are respectively superseded by
hwloc_topology_set_type_filter(topology, type, HWLOC_TYPE_FILTER_KEEP_NONE);
hwloc_topology_set_type_filter(topology, type, HWLOC_TYPE_FILTER_KEEP_STRUCTURE);
hwloc_topology_set_all_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_STRUCTURE);
Also, the meaning of KEEP_STRUCTURE has changed (only entire levels may be ignored, instead of single objects), the old behavior is not available anymore.
-
HWLOC_TOPOLOGY_FLAG_ICACHES is superseded by
hwloc_topology_set_icache_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_ALL);
-
HWLOC_TOPOLOGY_FLAG_WHOLE_IO, HWLOC_TOPOLOGY_FLAG_IO_DEVICES and HWLOC_TOPOLOGY_FLAG_IO_BRIDGES replaced.
To keep all I/O devices (PCI, Bridges, and OS devices), use:
hwloc_topology_set_io_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_ALL);
To only keep important devices (Bridges with children, common PCI devices and OS devices):
hwloc_topology_set_io_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_IMPORTANT);
XML changes
2.0 XML files are not compatible with 1.x
2.0 can load 1.x files, but only NUMA distances are imported. Other distance matrices are ignored (they were never used by default anyway).
2.0 can export 1.x-compatible files, but only distances attached to the root object are exported (i.e. distances that cover the entire machine). Other distance matrices are dropped (they were never used by default anyway).
Users are advised to negociate hwloc versions between exporter and importer: If the importer isn't 2.x, the exporter should export to 1.x. Otherwise, things should work by default.
Hence hwloc_topology_export_xml() and hwloc_topology_export_xmlbuffer() have a new flags argument. to force a hwloc-1.x-compatible XML export.
-
If both always support 2.0, don't pass any flag.
-
When the importer uses hwloc 1.x, export with HWLOC_TOPOLOGY_EXPORT_XML_FLAG_V1. Otherwise the importer will fail to import.
-
When the exporter uses hwloc 1.x, it cannot pass any flag, and a 2.0 importer can import without problem.
#if HWLOC_API_VERSION >= 0x20000
if (need 1.x compatible XML export)
hwloc_topology_export_xml(...., HWLOC_TOPOLOGY_EXPORT_XML_FLAG_V1);
else /* need 2.x compatible XML export */
hwloc_topology_export_xml(...., 0);
#else
hwloc_topology_export_xml(....);
#endif
Additionally, hwloc_topology_diff_load_xml(), hwloc_topology_diff_load_xmlbuffer(), hwloc_topology_diff_export_xml(), hwloc_topology_diff_export_xmlbuffer() and hwloc_topology_diff_destroy() lost the topology argument: The first argument (topology) isn't needed anymore.
Distances API totally rewritten
The new distances API is in hwloc/distances.h.
Distances are not accessible directly from objects anymore. One should first call hwloc_distances_get() (or a variant) to retrieve distances (possibly with one call to get the number of available distances structures, and another call to actually get them). Then it may consult these structures, and finally release them.
The set of object involved in a distances structure is specified by an array of objects, it may not always cover the entire machine or so.
Return values of functions
Bitmap functions (and a couple other functions) can return errors (in theory).
Most bitmap functions may have to reallocate the internal bitmap storage. In v1.x, they would silently crash if realloc failed. In v2.0, they now return an int that can be negative on error. However, the preallocated storage is 512 bits, hence realloc will not even be used unless you run hwloc on machines with larger PU or NUMAnode indexes.
hwloc_obj_add_info(), hwloc_cpuset_from_nodeset() and hwloc_cpuset_from_nodeset() also return an int, which would be -1 in case of allocation errors.
API removals and deprecations
-
HWLOC_OBJ_SYSTEM removed: The root object is always HWLOC_OBJ_MACHINE
-
_membind_nodeset() memory binding interfaces deprecated: One should use the variant without _nodeset suffix and pass the HWLOC_MEMBIND_BYNODESET flag.
-
HWLOC_MEMBIND_REPLICATE removed: no supported operating system supports it anymore.
-
hwloc_obj_snprintf() removed because it was long-deprecated by hwloc_obj_type_snprintf() and hwloc_obj_attr_snprintf().
-
hwloc_obj_type_sscanf() deprecated, hwloc_obj_type_of_string() removed.
-
hwloc_cpuset_from/to_nodeset_strict() deprecated: Now useless since all topologies are NUMA. Use the variant without the _strict suffix
-
hwloc_distribute() and hwloc_distributev() removed, deprecated by hwloc_distrib().
-
The Custom interface (hwloc_topology_set_custom(), etc.) was removed, as well as the corresponding command-line tools (hwloc-assembler, etc.). Topologies always start with object with valid cpusets and nodesets.
-
obj->online_cpuset
removed: Offline PUs are simply listed in the complete_cpuset
as previously.
-
obj->os_level
removed.