Browsing by Subject "Multicore Processing"
Disabled Core Patterns and Core Defect Rates in Xeon Phi x200 ("Knights Landing") Processors (2021-10-18) McCalpin, John D.
The Intel Xeon Phi x200 (“Knights Landing”, “KNL”) processor was Intel’s second-generation commercial many-core processor offering and the first offered as a standalone processor. Each processor die has 76 cores arranged in 38 pairs. Unlike Intel’s mainstream multicore processors, there were no product offerings with less than 84% of the cores enabled, making issues of yield critical. The Texas Advanced Computing Center deployed its 4200 Xeon Phi 7250 (68-core) processors in two phases: 504 nodes in June 2016 and the remaining 3696 nodes in April 2017. Over 1100 different patterns of disabled cores are observed across the systems, with approximately 75% appearing only once. The most common pattern is seen in over 30% of nodes, with cores disabled at the tiles immediately above and below the two memory controllers. Interpreting these as the “default” cores to be disabled in the absence of defective cores makes it possible to distinguish cores that are disabled due to defects from those disabled to meet the target enabled core count. Analysis of the statistics of disabled cores in each of these two deployments supports the hypothesis that core defects are random and independent, with a statistically significant reduction in the probability of defects between the first and second deployments.

Mapping Addresses to L3/CHA Slices in Intel Processors (2021-09-10) McCalpin, John D.
The distributed, shared L3 caches in Intel multicore processors are composed of “slices” (typically one “slice” per core), each assigned responsibility for a fraction of the address space. A high degree of interleaving of consecutive cache lines across the slices provides the appearance of a single cache resource shared by all cores.
A family of undocumented hash functions is used to distribute addresses to slices, with different hash functions required for different numbers of slices. In all systems studied to date, the hash consists of a relatively short (16 to 16384 elements) “base sequence” of slice numbers, which is repeated with binary permutations for consecutive blocks of memory. The specific binary permutation used is selected by XOR-reductions of different subsets of the higher-order address bits. This report provides the base sequences and permutation select masks for Intel Xeon Scalable Processors (1st and 2nd generation) with 14, 16, 18, 20, 22, 24, 26, or 28 slices, for 3rd Generation Intel Xeon Scalable Processors with 28 slices, and for Xeon Phi x200 processors with 38 slices.
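
The lookup structure described in this abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the report: that a "binary permutation" can be modeled by XOR-ing the position within the base sequence with a permutation number, that cache lines are 64 bytes, and that the base sequence length is a power of two. The base sequence and select masks below are toy values chosen for illustration only, not data for any real processor, and the name `slice_for_address` is hypothetical.

```python
def parity(x: int) -> int:
    """XOR-reduction: 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def slice_for_address(addr: int, base_seq: list[int],
                      select_masks: list[int], line_bits: int = 6) -> int:
    """Map a physical address to a slice number (illustrative model only).

    base_seq     -- toy base sequence of slice numbers (power-of-two length)
    select_masks -- toy masks over higher-order address bits; the parity of
                    (addr & mask) supplies one bit of the permutation number
    line_bits    -- log2 of the cache line size (assumed 64-byte lines)
    """
    line = addr >> line_bits          # cache line index
    idx = line % len(base_seq)        # position within the base sequence
    perm = 0
    for bit, mask in enumerate(select_masks):
        perm |= parity(addr & mask) << bit
    # Assumed model of a binary permutation: XOR the position with perm.
    return base_seq[idx ^ perm]


# Toy example: 3 slices, a 4-element base sequence, 2 select masks.
TOY_BASE_SEQ = [0, 1, 2, 0]
TOY_MASKS = [0x500, 0x600]  # hypothetical masks over address bits 8..10

if __name__ == "__main__":
    for a in (0x000, 0x040, 0x100, 0x200, 0x400):
        print(hex(a), slice_for_address(a, TOY_BASE_SEQ, TOY_MASKS))
```

In this model, consecutive cache lines within one block walk through the base sequence, while the higher-order address bits select which reordering of that sequence applies to the block. A real table would substitute the base sequences and permutation select masks published in the report.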