mirror of
https://git.FreeBSD.org/src.git
synced 2024-12-21 11:13:30 +00:00
1577 lines
63 KiB
Groff
.\" Copyright (c) 2010 Fabien Thomas. All rights reserved.
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
|
|
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
|
|
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
.\" SUCH DAMAGE.
|
|
.\"
|
|
.\" $FreeBSD$
|
|
.\"
|
|
.Dd March 24, 2010
|
|
.Dt PMC.COREI7 3
|
|
.Os
|
|
.Sh NAME
|
|
.Nm pmc.corei7
|
|
.Nd measurement events for
|
|
.Tn Intel
|
|
.Tn Core i7 and Xeon 5500
|
|
family CPUs
|
|
.Sh LIBRARY
|
|
.Lb libpmc
|
|
.Sh SYNOPSIS
|
|
.In pmc.h
|
|
.Sh DESCRIPTION
|
|
.Tn Intel
|
|
.Tn "Core i7"
|
|
CPUs contain PMCs conforming to version 2 of the
|
|
.Tn Intel
|
|
performance measurement architecture.
|
|
These CPUs may contain up to three classes of PMCs:
|
|
.Bl -tag -width "Li PMC_CLASS_IAP"
|
|
.It Li PMC_CLASS_IAF
|
|
Fixed-function counters that count only one hardware event per counter.
|
|
.It Li PMC_CLASS_IAP
|
|
Programmable counters that may be configured to count one of a defined
|
|
set of hardware events.
|
|
.El
|
|
.Pp
|
|
The number of PMCs available in each class and their widths need to be
|
|
determined at run time by calling
|
|
.Xr pmc_cpuinfo 3 .
|
|
.Pp
|
|
Intel Core i7 and Xeon 5500 PMCs are documented in
|
|
.Rs
|
|
.%B "Intel(R) 64 and IA-32 Architectures Software Developer's Manual"
|
|
.%T "Volume 3B: System Programming Guide, Part 2"
|
|
.%N "Order Number: 253669-033US"
|
|
.%D December 2009
|
|
.%Q "Intel Corporation"
|
|
.Re
|
|
.Ss COREI7 AND XEON 5500 FIXED FUNCTION PMCS
|
|
These PMCs and their supported events are documented in
|
|
.Xr pmc.iaf 3 .
|
|
Not all CPUs in this family implement fixed-function counters.
|
|
.Ss COREI7 AND XEON 5500 PROGRAMMABLE PMCS
|
|
The programmable PMCs support the following capabilities:
|
|
.Bl -column "PMC_CAP_INTERRUPT" "Support"
|
|
.It Em Capability Ta Em Support
|
|
.It PMC_CAP_CASCADE Ta \&No
|
|
.It PMC_CAP_EDGE Ta Yes
|
|
.It PMC_CAP_INTERRUPT Ta Yes
|
|
.It PMC_CAP_INVERT Ta Yes
|
|
.It PMC_CAP_READ Ta Yes
|
|
.It PMC_CAP_PRECISE Ta \&No
|
|
.It PMC_CAP_SYSTEM Ta Yes
|
|
.It PMC_CAP_TAGGING Ta \&No
|
|
.It PMC_CAP_THRESHOLD Ta Yes
|
|
.It PMC_CAP_USER Ta Yes
|
|
.It PMC_CAP_WRITE Ta Yes
|
|
.El
|
|
.Ss Event Qualifiers
|
|
Event specifiers for these PMCs support the following common
|
|
qualifiers:
|
|
.Bl -tag -width indent
|
|
.It Li rsp= Ns Ar value
|
|
Configure the Off-core Response bits.
|
|
.Bl -tag -width indent
|
|
.It Li DMND_DATA_RD
|
|
Counts the number of demand and DCU prefetch data reads of full
|
|
and partial cachelines as well as demand data page table entry
|
|
cacheline reads. Does not count L2 data read prefetches or
|
|
instruction fetches.
|
|
.It Li DMND_RFO
|
|
Counts the number of demand and DCU prefetch reads for ownership
|
|
(RFO) requests generated by a write to data cacheline. Does not
|
|
count L2 RFO.
|
|
.It Li DMND_IFETCH
|
|
Counts the number of demand and DCU prefetch instruction cacheline
|
|
reads. Does not count L2 code read prefetches.
|
|
.It Li WB
|
|
Counts the number of writeback (modified to exclusive) transactions.
|
|
.It Li PF_DATA_RD
|
|
Counts the number of data cacheline reads generated by L2 prefetchers.
|
|
.It Li PF_RFO
|
|
Counts the number of RFO requests generated by L2 prefetchers.
|
|
.It Li PF_IFETCH
|
|
Counts the number of code reads generated by L2 prefetchers.
|
|
.It Li OTHER
|
|
Counts one of the following transaction types, including L3 invalidate,
|
|
I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences,
|
|
lock, unlock, split lock.
|
|
.It Li UNCORE_HIT
|
|
L3 Hit: local or remote home requests that hit L3 cache in the uncore
|
|
with no coherency actions required (snooping).
|
|
.It Li OTHER_CORE_HIT_SNP
|
|
L3 Hit: local or remote home requests that hit L3 cache in the uncore
and were serviced by another core with a cross core snoop where no modified
copies were found (clean).
|
|
.It Li OTHER_CORE_HITM
|
|
L3 Hit: local or remote home requests that hit L3 cache in the uncore
and were serviced by another core with a cross core snoop where modified
copies were found (HITM).
|
|
.It Li REMOTE_CACHE_FWD
|
|
L3 Miss: local homed requests that missed the L3 cache and were serviced
by forwarded data following a cross package snoop where no modified
copies were found.
Remote home requests are not counted.
|
|
.It Li REMOTE_DRAM
|
|
L3 Miss: remote home requests that missed the L3 cache and were serviced
|
|
by remote DRAM.
|
|
.It Li LOCAL_DRAM
|
|
L3 Miss: local home requests that missed the L3 cache and were serviced
|
|
by local DRAM.
|
|
.It Li NON_DRAM
|
|
Non-DRAM requests that were serviced by IOH.
|
|
.El
|
|
.It Li cmask= Ns Ar value
|
|
Configure the PMC to increment only if the number of configured
|
|
events measured in a cycle is greater than or equal to
|
|
.Ar value .
|
|
.It Li edge
|
|
Configure the PMC to count the number of de-asserted to asserted
|
|
transitions of the conditions expressed by the other qualifiers.
|
|
If specified, the counter will increment only once whenever a
|
|
condition becomes true, irrespective of the number of clocks during
|
|
which the condition remains true.
|
|
.It Li inv
|
|
Invert the sense of comparison when the
|
|
.Dq Li cmask
|
|
qualifier is present, making the counter increment when the number of
|
|
events per cycle is less than the value specified by the
|
|
.Dq Li cmask
|
|
qualifier.
|
|
.It Li os
|
|
Configure the PMC to count events happening at processor privilege
|
|
level 0.
|
|
.It Li usr
|
|
Configure the PMC to count events occurring at privilege levels 1, 2
|
|
or 3.
|
|
.El
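.Pp
The interaction of the
.Dq Li cmask ,
.Dq Li inv
and
.Dq Li edge
qualifiers described above can be illustrated with a small software model.
The following is a minimal sketch and not part of libpmc: the names
.Vt struct qualifiers
and
.Fn model_count
are hypothetical, and the handling of a zero
.Dq Li cmask
and of the state before the first cycle are assumptions made for
illustration only.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the qualifiers of one programmable counter. */
struct qualifiers {
	uint8_t	cmask;	/* threshold; 0 means "no threshold" */
	bool	inv;	/* invert the cmask comparison */
	bool	edge;	/* count false-to-true transitions only */
};

/*
 * events[i] is the number of configured events seen in cycle i.
 * Returns the value the modelled counter would hold after ncycles.
 * Assumption: with cmask == 0 the condition is "at least one event",
 * and the condition is treated as false before the first cycle.
 */
static uint64_t
model_count(const unsigned *events, size_t ncycles, struct qualifiers q)
{
	uint64_t total = 0;
	bool prev = false;

	assert(events != NULL || ncycles == 0);
	for (size_t i = 0; i < ncycles; i++) {
		bool cond;

		if (q.cmask == 0) {
			cond = events[i] > 0;
		} else {
			/* Threshold comparison, optionally inverted. */
			cond = events[i] >= q.cmask;
			if (q.inv)
				cond = !cond;
		}

		if (q.edge)
			total += (cond && !prev) ? 1 : 0;
		else if (q.cmask == 0)
			total += events[i];	/* plain event counting */
		else
			total += cond ? 1 : 0;
		prev = cond;
	}
	return (total);
}
```
.Ed
.Pp
With per-cycle event counts of 0, 2, 3, 0, 0, 1, 4 and 4, this model yields
14 with no qualifiers, 4 with
.Dq Li cmask=2 ,
3 with
.Dq Li cmask=1
and
.Dq Li inv ,
and 2 with
.Dq Li cmask=2
and
.Dq Li edge .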
|
|
.Pp
|
|
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
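.Pp
To make the qualifiers concrete, the sketch below packs an event code, a unit
mask and the qualifiers into the bit layout that the Intel manual cited above
describes for the IA32_PERFEVTSELx registers: event select in bits 0-7, unit
mask in bits 8-15, USR in bit 16, OS in bit 17, edge detect in bit 18, invert
in bit 23 and counter mask in bits 24-31.
This is an illustration under stated assumptions, not the encoding code used by
libpmc or the kernel: the names
.Vt struct evspec
and
.Fn evtsel_encode
are hypothetical, and the enable and interrupt bits are deliberately left
clear.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the IA32_PERFEVTSELx fields used here. */
#define	EVTSEL_USR	(UINT32_C(1) << 16)
#define	EVTSEL_OS	(UINT32_C(1) << 17)
#define	EVTSEL_EDGE	(UINT32_C(1) << 18)
#define	EVTSEL_INV	(UINT32_C(1) << 23)

/* Hypothetical parsed form of an event specifier. */
struct evspec {
	uint8_t	event;	/* event select code, e.g. 0x0E */
	uint8_t	umask;	/* unit mask, e.g. 0x01 */
	uint8_t	cmask;	/* counter mask threshold, 0 if unused */
	bool	edge;
	bool	inv;
	bool	os;
	bool	usr;
};

static uint32_t
evtsel_encode(struct evspec s)
{
	uint32_t v;

	v = (uint32_t)s.event | ((uint32_t)s.umask << 8) |
	    ((uint32_t)s.cmask << 24);

	/* Neither "os" nor "usr" given: enable both privilege filters. */
	if (!s.os && !s.usr)
		s.os = s.usr = true;
	if (s.usr)
		v |= EVTSEL_USR;
	if (s.os)
		v |= EVTSEL_OS;
	if (s.edge)
		v |= EVTSEL_EDGE;
	if (s.inv)
		v |= EVTSEL_INV;
	return (v);
}
```
.Ed
.Pp
Under these assumptions, event 0EH with unit mask 01H,
.Dq Li inv
and
.Dq Li cmask=1
encodes to 0183010EH, and event 24H with unit mask 02H and only
.Dq Li usr
encodes to 00010224H.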
|
|
.Ss Event Specifiers (Programmable PMCs)
|
|
Core i7 and Xeon 5500 programmable PMCs support the following events:
|
|
.Bl -tag -width indent
|
|
.It Li SB_DRAIN.ANY
|
|
.Pq Event 04H , Umask 07H
|
|
Counts the number of store buffer drains.
|
|
.It Li STORE_BLOCKS.AT_RET
|
|
.Pq Event 06H , Umask 04H
|
|
Counts number of loads delayed with at-Retirement block code. The following
|
|
loads need to be executed at retirement and wait for all senior stores on
|
|
the same thread to be drained: load splitting across 4K boundary (page
|
|
split), load accessing uncacheable (UC or USWC) memory, load lock, and load
|
|
with page table in UC or USWC memory region.
|
|
.It Li STORE_BLOCKS.L1D_BLOCK
|
|
.Pq Event 06H , Umask 08H
|
|
Counts cacheable loads delayed with L1D block code.
|
|
.It Li PARTIAL_ADDRESS_ALIAS
|
|
.Pq Event 07H , Umask 01H
|
|
Counts false dependencies due to partial address aliasing.
|
|
.It Li DTLB_LOAD_MISSES.ANY
|
|
.Pq Event 08H , Umask 01H
|
|
Counts all load misses that cause a page walk.
|
|
.It Li DTLB_LOAD_MISSES.WALK_COMPLETED
|
|
.Pq Event 08H , Umask 02H
|
|
Counts number of completed page walks due to load miss in the STLB.
|
|
.It Li DTLB_LOAD_MISSES.STLB_HIT
|
|
.Pq Event 08H , Umask 10H
|
|
Number of cache load STLB hits
|
|
.It Li DTLB_LOAD_MISSES.PDE_MISS
|
|
.Pq Event 08H , Umask 20H
|
|
Number of DTLB cache load misses where the low part of the linear to
|
|
physical address translation was missed.
|
|
.It Li DTLB_LOAD_MISSES.LARGE_WALK_COMPLETED
|
|
.Pq Event 08H , Umask 80H
|
|
Counts number of completed large page walks due to load miss in the STLB.
|
|
.It Li MEM_INST_RETIRED.LOADS
|
|
.Pq Event 0BH , Umask 01H
|
|
Counts the number of instructions with an architecturally-visible load
|
|
retired on the architected path.
|
|
In conjunction with ld_lat facility
|
|
.It Li MEM_INST_RETIRED.STORES
|
|
.Pq Event 0BH , Umask 02H
|
|
Counts the number of instructions with an architecturally-visible store
|
|
retired on the architected path.
|
|
In conjunction with ld_lat facility
|
|
.It Li MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD
|
|
.Pq Event 0BH , Umask 10H
|
|
Counts the number of instructions exceeding the latency specified with
|
|
ld_lat facility.
|
|
In conjunction with ld_lat facility
|
|
.It Li MEM_STORE_RETIRED.DTLB_MISS
|
|
.Pq Event 0CH , Umask 01H
|
|
The event counts the number of retired stores that missed the DTLB. The DTLB
|
|
miss is not counted if the store operation causes a fault. Does not count
prefetches. Counts both primary and secondary misses to the TLB.
|
|
.It Li UOPS_ISSUED.ANY
|
|
.Pq Event 0EH , Umask 01H
|
|
Counts the number of Uops issued by the Register Allocation Table to the
|
|
Reservation Station, i.e. the UOPs issued from the front end to the back
|
|
end.
|
|
.It Li UOPS_ISSUED.STALLED_CYCLES
|
|
.Pq Event 0EH , Umask 01H
|
|
Counts the number of cycles in which no Uops were issued by the Register
Allocation Table to the Reservation Station, i.e. from the front end to the
back end.
Requires the
.Dq Li inv
qualifier and
.Dq Li cmask=1 .
|
|
.It Li UOPS_ISSUED.FUSED
|
|
.Pq Event 0EH , Umask 02H
|
|
Counts the number of fused Uops that were issued from the Register
|
|
Allocation Table to the Reservation Station.
|
|
.It Li MEM_UNCORE_RETIRED.L3_DATA_MISS_UNKNOWN
|
|
.Pq Event 0FH , Umask 01H
|
|
Counts number of memory load instructions retired where the memory reference
|
|
missed L3 and data source is unknown.
|
|
Available only for CPUID signature 06_2EH
|
|
.It Li MEM_UNCORE_RETIRED.OTHER_CORE_L2_HITM
|
|
.Pq Event 0FH , Umask 02H
|
|
Counts number of memory load instructions retired where the memory reference
|
|
hit modified data in a sibling core residing on the same socket.
|
|
.It Li MEM_UNCORE_RETIRED.REMOTE_CACHE_LOCAL_HOME_HIT
|
|
.Pq Event 0FH , Umask 08H
|
|
Counts number of memory load instructions retired where the memory reference
|
|
missed the L1, L2 and L3 caches and HIT in a remote socket's cache. Only
|
|
counts locally homed lines.
|
|
.It Li MEM_UNCORE_RETIRED.REMOTE_DRAM
|
|
.Pq Event 0FH , Umask 10H
|
|
Counts number of memory load instructions retired where the memory reference
|
|
missed the L1, L2 and L3 caches and was remotely homed. This includes both
|
|
DRAM access and HITM in a remote socket's cache for remotely homed lines.
|
|
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM
|
|
.Pq Event 0FH , Umask 20H
|
|
Counts number of memory load instructions retired where the memory reference
|
|
missed the L1, L2 and L3 caches and required a local socket memory
|
|
reference. This includes locally homed cachelines that were in a modified
|
|
state in another socket.
|
|
.It Li MEM_UNCORE_RETIRED.UNCACHEABLE
|
|
.Pq Event 0FH , Umask 80H
|
|
Counts number of memory load instructions retired where the memory reference
|
|
missed the L1, L2 and L3 caches and performed I/O.
|
|
Available only for CPUID signature 06_2EH
|
|
.It Li FP_COMP_OPS_EXE.X87
|
|
.Pq Event 10H , Umask 01H
|
|
Counts the number of FP Computational Uops Executed. The number of FADD,
|
|
FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTs, integer
|
|
DIVs, and IDIVs. This event does not distinguish an FADD used in the middle
|
|
of a transcendental flow from a separate FADD instruction.
|
|
.It Li FP_COMP_OPS_EXE.MMX
|
|
.Pq Event 10H , Umask 02H
|
|
Counts number of MMX Uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_FP
|
|
.Pq Event 10H , Umask 04H
|
|
Counts number of SSE and SSE2 FP uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE2_INTEGER
|
|
.Pq Event 10H , Umask 08H
|
|
Counts number of SSE2 integer uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_FP_PACKED
|
|
.Pq Event 10H , Umask 10H
|
|
Counts number of SSE FP packed uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_FP_SCALAR
|
|
.Pq Event 10H , Umask 20H
|
|
Counts number of SSE FP scalar uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
|
|
.Pq Event 10H , Umask 40H
|
|
Counts number of SSE* FP single precision uops executed.
|
|
.It Li FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION
|
|
.Pq Event 10H , Umask 80H
|
|
Counts number of SSE* FP double precision uops executed.
|
|
.It Li SIMD_INT_128.PACKED_MPY
|
|
.Pq Event 12H , Umask 01H
|
|
Counts number of 128 bit SIMD integer multiply operations.
|
|
.It Li SIMD_INT_128.PACKED_SHIFT
|
|
.Pq Event 12H , Umask 02H
|
|
Counts number of 128 bit SIMD integer shift operations.
|
|
.It Li SIMD_INT_128.PACK
|
|
.Pq Event 12H , Umask 04H
|
|
Counts number of 128 bit SIMD integer pack operations.
|
|
.It Li SIMD_INT_128.UNPACK
|
|
.Pq Event 12H , Umask 08H
|
|
Counts number of 128 bit SIMD integer unpack operations.
|
|
.It Li SIMD_INT_128.PACKED_LOGICAL
|
|
.Pq Event 12H , Umask 10H
|
|
Counts number of 128 bit SIMD integer logical operations.
|
|
.It Li SIMD_INT_128.PACKED_ARITH
|
|
.Pq Event 12H , Umask 20H
|
|
Counts number of 128 bit SIMD integer arithmetic operations.
|
|
.It Li SIMD_INT_128.SHUFFLE_MOVE
|
|
.Pq Event 12H , Umask 40H
|
|
Counts number of 128 bit SIMD integer shuffle and move operations.
|
|
.It Li LOAD_DISPATCH.RS
|
|
.Pq Event 13H , Umask 01H
|
|
Counts number of loads dispatched from the Reservation Station that bypass
|
|
the Memory Order Buffer.
|
|
.It Li LOAD_DISPATCH.RS_DELAYED
|
|
.Pq Event 13H , Umask 02H
|
|
Counts the number of delayed RS dispatches at the stage latch. If an RS
|
|
dispatch cannot bypass to LB, it has another chance to dispatch from the
|
|
one-cycle delayed staging latch before it is written into the LB.
|
|
.It Li LOAD_DISPATCH.MOB
|
|
.Pq Event 13H , Umask 04H
|
|
Counts the number of loads dispatched from the Reservation Station to the
|
|
Memory Order Buffer.
|
|
.It Li LOAD_DISPATCH.ANY
|
|
.Pq Event 13H , Umask 07H
|
|
Counts all loads dispatched from the Reservation Station.
|
|
.It Li ARITH.CYCLES_DIV_BUSY
|
|
.Pq Event 14H , Umask 01H
|
|
Counts the number of cycles the divider is busy executing divide or square
|
|
root operations. The divide can be integer, X87 or Streaming SIMD Extensions
|
|
(SSE). The square root operation can be either X87 or SSE.
|
|
Set
.Dq Li edge ,
.Dq Li inv
and
.Dq Li cmask=1
to count the number of divides.
The count may be incorrect when SMT is on.
|
|
.It Li ARITH.MUL
|
|
.Pq Event 14H , Umask 02H
|
|
Counts the number of multiply operations executed. This includes integer as
|
|
well as floating point multiply operations but excludes DPPS mul and MPSAD.
|
|
The count may be incorrect when SMT is on.
|
|
.It Li INST_QUEUE_WRITES
|
|
.Pq Event 17H , Umask 01H
|
|
Counts the number of instructions written into the instruction queue every
|
|
cycle.
|
|
.It Li INST_DECODED.DEC0
|
|
.Pq Event 18H , Umask 01H
|
|
Counts number of instructions that require decoder 0 to be decoded. Usually,
|
|
this means that the instruction maps to more than 1 uop.
|
|
.It Li TWO_UOP_INSTS_DECODED
|
|
.Pq Event 19H , Umask 01H
|
|
Counts instructions decoded that generate two uops.
|
|
.It Li INST_QUEUE_WRITE_CYCLES
|
|
.Pq Event 1EH , Umask 01H
|
|
This event counts the number of cycles during which instructions are written
|
|
to the instruction queue. Dividing the number of instructions written to the
instruction queue (INST_QUEUE_WRITES) by this counter yields the
average number of instructions decoded each cycle. If this number is less
|
|
than four and the pipe stalls, this indicates that the decoder is failing to
|
|
decode enough instructions per cycle to sustain the 4-wide pipeline.
|
|
If SSE* instructions that are 6 bytes or longer arrive one after another,
|
|
then front end throughput may limit execution speed. In such case,
|
|
.It Li LSD_OVERFLOW
|
|
.Pq Event 20H , Umask 01H
|
|
Counts number of loops that cannot stream from the instruction queue.
|
|
.It Li L2_RQSTS.LD_HIT
|
|
.Pq Event 24H , Umask 01H
|
|
Counts number of loads that hit the L2 cache. L2 loads include both L1D
|
|
demand misses as well as L1D prefetches. L2 loads can be rejected for
|
|
various reasons. Only non-rejected loads are counted.
|
|
.It Li L2_RQSTS.LD_MISS
|
|
.Pq Event 24H , Umask 02H
|
|
Counts the number of loads that miss the L2 cache. L2 loads include both L1D
|
|
demand misses as well as L1D prefetches.
|
|
.It Li L2_RQSTS.LOADS
|
|
.Pq Event 24H , Umask 03H
|
|
Counts all L2 load requests. L2 loads include both L1D demand misses as well
|
|
as L1D prefetches.
|
|
.It Li L2_RQSTS.RFO_HIT
|
|
.Pq Event 24H , Umask 04H
|
|
Counts the number of store RFO requests that hit the L2 cache. L2 RFO
|
|
requests include both L1D demand RFO misses as well as L1D RFO prefetches.
|
|
Count includes WC memory requests, where the data is not fetched but the
|
|
permission to write the line is required.
|
|
.It Li L2_RQSTS.RFO_MISS
|
|
.Pq Event 24H , Umask 08H
|
|
Counts the number of store RFO requests that miss the L2 cache. L2 RFO
|
|
requests include both L1D demand RFO misses as well as L1D RFO prefetches.
|
|
.It Li L2_RQSTS.RFOS
|
|
.Pq Event 24H , Umask 0CH
|
|
Counts all L2 store RFO requests. L2 RFO requests include both L1D demand
|
|
RFO misses as well as L1D RFO prefetches.
|
|
.It Li L2_RQSTS.IFETCH_HIT
|
|
.Pq Event 24H , Umask 10H
|
|
Counts number of instruction fetches that hit the L2 cache. L2 instruction
|
|
fetches include both L1I demand misses as well as L1I instruction
|
|
prefetches.
|
|
.It Li L2_RQSTS.IFETCH_MISS
|
|
.Pq Event 24H , Umask 20H
|
|
Counts number of instruction fetches that miss the L2 cache. L2 instruction
|
|
fetches include both L1I demand misses as well as L1I instruction
|
|
prefetches.
|
|
.It Li L2_RQSTS.IFETCHES
|
|
.Pq Event 24H , Umask 30H
|
|
Counts all instruction fetches. L2 instruction fetches include both L1I
|
|
demand misses as well as L1I instruction prefetches.
|
|
.It Li L2_RQSTS.PREFETCH_HIT
|
|
.Pq Event 24H , Umask 40H
|
|
Counts L2 prefetch hits for both code and data.
|
|
.It Li L2_RQSTS.PREFETCH_MISS
|
|
.Pq Event 24H , Umask 80H
|
|
Counts L2 prefetch misses for both code and data.
|
|
.It Li L2_RQSTS.PREFETCHES
|
|
.Pq Event 24H , Umask C0H
|
|
Counts all L2 prefetches for both code and data.
|
|
.It Li L2_RQSTS.MISS
|
|
.Pq Event 24H , Umask AAH
|
|
Counts all L2 misses for both code and data.
|
|
.It Li L2_RQSTS.REFERENCES
|
|
.Pq Event 24H , Umask FFH
|
|
Counts all L2 requests for both code and data.
|
|
.It Li L2_DATA_RQSTS.DEMAND.I_STATE
|
|
.Pq Event 26H , Umask 01H
|
|
Counts number of L2 data demand loads where the cache line to be loaded is
|
|
in the I (invalid) state, i.e. a cache miss. L2 demand loads are both L1D
|
|
demand misses and L1D prefetches.
|
|
.It Li L2_DATA_RQSTS.DEMAND.S_STATE
|
|
.Pq Event 26H , Umask 02H
|
|
Counts number of L2 data demand loads where the cache line to be loaded is
|
|
in the S (shared) state. L2 demand loads are both L1D demand misses and L1D
|
|
prefetches.
|
|
.It Li L2_DATA_RQSTS.DEMAND.E_STATE
|
|
.Pq Event 26H , Umask 04H
|
|
Counts number of L2 data demand loads where the cache line to be loaded is
|
|
in the E (exclusive) state. L2 demand loads are both L1D demand misses and
|
|
L1D prefetches.
|
|
.It Li L2_DATA_RQSTS.DEMAND.M_STATE
|
|
.Pq Event 26H , Umask 08H
|
|
Counts number of L2 data demand loads where the cache line to be loaded is
|
|
in the M (modified) state. L2 demand loads are both L1D demand misses and
|
|
L1D prefetches.
|
|
.It Li L2_DATA_RQSTS.DEMAND.MESI
|
|
.Pq Event 26H , Umask 0FH
|
|
Counts all L2 data demand requests. L2 demand loads are both L1D demand
|
|
misses and L1D prefetches.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.I_STATE
|
|
.Pq Event 26H , Umask 10H
|
|
Counts number of L2 prefetch data loads where the cache line to be loaded is
|
|
in the I (invalid) state, i.e. a cache miss.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.S_STATE
|
|
.Pq Event 26H , Umask 20H
|
|
Counts number of L2 prefetch data loads where the cache line to be loaded is
|
|
in the S (shared) state. A prefetch RFO will miss on an S state line, while
|
|
a prefetch read will hit on an S state line.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.E_STATE
|
|
.Pq Event 26H , Umask 40H
|
|
Counts number of L2 prefetch data loads where the cache line to be loaded is
|
|
in the E (exclusive) state.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.M_STATE
|
|
.Pq Event 26H , Umask 80H
|
|
Counts number of L2 prefetch data loads where the cache line to be loaded is
|
|
in the M (modified) state.
|
|
.It Li L2_DATA_RQSTS.PREFETCH.MESI
|
|
.Pq Event 26H , Umask F0H
|
|
Counts all L2 prefetch requests.
|
|
.It Li L2_DATA_RQSTS.ANY
|
|
.Pq Event 26H , Umask FFH
|
|
Counts all L2 data requests.
|
|
.It Li L2_WRITE.RFO.I_STATE
|
|
.Pq Event 27H , Umask 01H
|
|
Counts number of L2 demand store RFO requests where the cache line to be
|
|
loaded is in the I (invalid) state, i.e. a cache miss. The L1D prefetcher
does not issue an RFO prefetch.
This is a demand RFO request.
|
|
.It Li L2_WRITE.RFO.S_STATE
|
|
.Pq Event 27H , Umask 02H
|
|
Counts number of L2 store RFO requests where the cache line to be loaded is
|
|
in the S (shared) state. The L1D prefetcher does not issue an RFO prefetch.
This is a demand RFO request.
|
|
.It Li L2_WRITE.RFO.M_STATE
|
|
.Pq Event 27H , Umask 08H
|
|
Counts number of L2 store RFO requests where the cache line to be loaded is
|
|
in the M (modified) state. The L1D prefetcher does not issue a RFO prefetch.
|
|
This is a demand RFO request
|
|
.It Li L2_WRITE.RFO.HIT
|
|
.Pq Event 27H , Umask 0EH
|
|
Counts number of L2 store RFO requests where the cache line to be loaded is
|
|
in either the S, E or M states. The L1D prefetcher does not issue a RFO
|
|
prefetch.
|
|
This is a demand RFO request
|
|
.It Li L2_WRITE.RFO.MESI
|
|
.Pq Event 27H , Umask 0FH
|
|
Counts all L2 store RFO requests. The L1D prefetcher does not issue an RFO
prefetch.
This is a demand RFO request.
|
|
.It Li L2_WRITE.LOCK.I_STATE
|
|
.Pq Event 27H , Umask 10H
|
|
Counts number of L2 demand lock RFO requests where the cache line to be
|
|
loaded is in the I (invalid) state, i.e. a cache miss.
|
|
.It Li L2_WRITE.LOCK.S_STATE
|
|
.Pq Event 27H , Umask 20H
|
|
Counts number of L2 lock RFO requests where the cache line to be loaded is
|
|
in the S (shared) state.
|
|
.It Li L2_WRITE.LOCK.E_STATE
|
|
.Pq Event 27H , Umask 40H
|
|
Counts number of L2 demand lock RFO requests where the cache line to be
|
|
loaded is in the E (exclusive) state.
|
|
.It Li L2_WRITE.LOCK.M_STATE
|
|
.Pq Event 27H , Umask 80H
|
|
Counts number of L2 demand lock RFO requests where the cache line to be
|
|
loaded is in the M (modified) state.
|
|
.It Li L2_WRITE.LOCK.HIT
|
|
.Pq Event 27H , Umask E0H
|
|
Counts number of L2 demand lock RFO requests where the cache line to be
|
|
loaded is in either the S, E, or M state.
|
|
.It Li L2_WRITE.LOCK.MESI
|
|
.Pq Event 27H , Umask F0H
|
|
Counts all L2 demand lock RFO requests.
|
|
.It Li L1D_WB_L2.I_STATE
|
|
.Pq Event 28H , Umask 01H
|
|
Counts number of L1 writebacks to the L2 where the cache line to be written
|
|
is in the I (invalid) state, i.e. a cache miss.
|
|
.It Li L1D_WB_L2.S_STATE
|
|
.Pq Event 28H , Umask 02H
|
|
Counts number of L1 writebacks to the L2 where the cache line to be written
|
|
is in the S (shared) state.
|
|
.It Li L1D_WB_L2.E_STATE
|
|
.Pq Event 28H , Umask 04H
|
|
Counts number of L1 writebacks to the L2 where the cache line to be written
|
|
is in the E (exclusive) state.
|
|
.It Li L1D_WB_L2.M_STATE
|
|
.Pq Event 28H , Umask 08H
|
|
Counts number of L1 writebacks to the L2 where the cache line to be written
|
|
is in the M (modified) state.
|
|
.It Li L1D_WB_L2.MESI
|
|
.Pq Event 28H , Umask 0FH
|
|
Counts all L1 writebacks to the L2.
|
|
.It Li L3_LAT_CACHE.REFERENCE
|
|
.Pq Event 2EH , Umask 4FH
|
|
This event counts requests originating from the core that reference a cache
|
|
line in the last level cache. The event count includes speculative traffic
|
|
but excludes cache line fills due to an L2 hardware-prefetch. Because of
differences in cache hierarchy, cache sizes and other implementation-specific
characteristics, comparing values to estimate performance differences is not
recommended.
See Table A-1.
|
|
.It Li L3_LAT_CACHE.MISS
|
|
.Pq Event 2EH , Umask 41H
|
|
This event counts each cache miss condition for references to the last level
|
|
cache. The event count may include speculative traffic but excludes cache
|
|
line fills due to L2 hardware-prefetches. Because of differences in cache
hierarchy, cache sizes and other implementation-specific characteristics,
comparing values to estimate performance differences is not recommended.
See Table A-1.
|
|
.It Li CPU_CLK_UNHALTED.THREAD_P
|
|
.Pq Event 3CH , Umask 00H
|
|
Counts the number of thread cycles while the thread is not in a halt state.
|
|
The thread enters the halt state when it is running the HLT instruction. The
|
|
core frequency may change from time to time due to power or thermal
|
|
throttling.
|
|
see Table A-1
|
|
.It Li CPU_CLK_UNHALTED.REF_P
|
|
.Pq Event 3CH , Umask 01H
|
|
Increments at the frequency of TSC when not halted.
|
|
see Table A-1
|
|
.It Li L1D_CACHE_LD.I_STATE
|
|
.Pq Event 40H , Umask 01H
|
|
Counts L1 data cache read requests where the cache line to be loaded is in
|
|
the I (invalid) state, i.e. the read request missed the cache.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_LD.S_STATE
|
|
.Pq Event 40H , Umask 02H
|
|
Counts L1 data cache read requests where the cache line to be loaded is in
|
|
the S (shared) state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_LD.E_STATE
|
|
.Pq Event 40H , Umask 04H
|
|
Counts L1 data cache read requests where the cache line to be loaded is in
|
|
the E (exclusive) state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_LD.M_STATE
|
|
.Pq Event 40H , Umask 08H
|
|
Counts L1 data cache read requests where the cache line to be loaded is in
|
|
the M (modified) state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_LD.MESI
|
|
.Pq Event 40H , Umask 0FH
|
|
Counts L1 data cache read requests.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_ST.S_STATE
|
|
.Pq Event 41H , Umask 02H
|
|
Counts L1 data cache store RFO requests where the cache line to be loaded is
|
|
in the S (shared) state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_ST.E_STATE
|
|
.Pq Event 41H , Umask 04H
|
|
Counts L1 data cache store RFO requests where the cache line to be loaded is
|
|
in the E (exclusive) state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_ST.M_STATE
|
|
.Pq Event 41H , Umask 08H
|
|
Counts L1 data cache store RFO requests where the cache line to be loaded is in
|
|
the M (modified) state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_LOCK.HIT
|
|
.Pq Event 42H , Umask 01H
|
|
Counts retired load locks that hit in the L1 data cache or hit in an already
|
|
allocated fill buffer. The lock portion of the load lock transaction must
|
|
hit in the L1D.
|
|
The initial load will pull the lock into the L1 data cache. Counter 0, 1
|
|
only
|
|
.It Li L1D_CACHE_LOCK.S_STATE
|
|
.Pq Event 42H , Umask 02H
|
|
Counts L1 data cache retired load locks that hit the target cache line in
|
|
the shared state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_LOCK.E_STATE
|
|
.Pq Event 42H , Umask 04H
|
|
Counts L1 data cache retired load locks that hit the target cache line in
|
|
the exclusive state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_LOCK.M_STATE
|
|
.Pq Event 42H , Umask 08H
|
|
Counts L1 data cache retired load locks that hit the target cache line in
|
|
the modified state.
|
|
Counter 0, 1 only
|
|
.It Li L1D_ALL_REF.ANY
|
|
.Pq Event 43H , Umask 01H
|
|
Counts all references (uncached, speculated and retired) to the L1 data
|
|
cache, including all loads and stores with any memory types. The event
|
|
counts memory accesses only when they are actually performed. For example, a
|
|
load blocked by unknown store address and later performed is only counted
|
|
once.
|
|
The event does not include non-memory accesses, such as I/O accesses.
|
|
Counter 0, 1 only
|
|
.It Li L1D_ALL_REF.CACHEABLE
|
|
.Pq Event 43H , Umask 02H
|
|
Counts all data reads and writes (speculated and retired) from cacheable
|
|
memory, including locked operations.
|
|
Counter 0, 1 only
|
|
.It Li DTLB_MISSES.ANY
|
|
.Pq Event 49H , Umask 01H
|
|
Counts the number of misses in the STLB which cause a page walk.
|
|
.It Li DTLB_MISSES.WALK_COMPLETED
|
|
.Pq Event 49H , Umask 02H
|
|
Counts number of misses in the STLB which resulted in a completed page walk.
|
|
.It Li DTLB_MISSES.STLB_HIT
|
|
.Pq Event 49H , Umask 10H
|
|
Counts the number of DTLB first level misses that hit in the second level
|
|
TLB. This event is only relevant if the core contains multiple DTLB levels.
|
|
.It Li DTLB_MISSES.PDE_MISS
|
|
.Pq Event 49H , Umask 20H
|
|
Number of DTLB misses caused by low part of address, includes references to 2M pages because 2M pages do not use the PDE.
|
|
.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
|
|
.Pq Event 49H , Umask 80H
|
|
Counts number of misses in the STLB which resulted in a completed page walk for large pages.
|
|
.It Li LOAD_HIT_PRE
|
|
.Pq Event 4CH , Umask 01H
|
|
Counts load operations sent to the L1 data cache while a previous SSE
|
|
prefetch instruction to the same cache line has started prefetching but has
|
|
not yet finished.
|
|
.It Li L1D_PREFETCH.REQUESTS
|
|
.Pq Event 4EH , Umask 01H
|
|
Counts number of hardware prefetch requests dispatched out of the prefetch
|
|
FIFO.
|
|
.It Li L1D_PREFETCH.MISS
|
|
.Pq Event 4EH , Umask 02H
|
|
Counts number of hardware prefetch requests that miss the L1D. There are two
|
|
prefetchers in the L1D: a streamer, which predicts that lines sequentially
after this one should be fetched, and the IP prefetcher, which remembers access
|
|
patterns for the current instruction. The streamer prefetcher stops on an
|
|
L1D hit, while the IP prefetcher does not.
|
|
.It Li L1D_PREFETCH.TRIGGERS
|
|
.Pq Event 4EH , Umask 04H
|
|
Counts number of prefetch requests triggered by the Finite State Machine and
|
|
pushed into the prefetch FIFO. Some of the prefetch requests are dropped due
|
|
to overwrites or competition between the IP index prefetcher and streamer
|
|
prefetcher. The prefetch FIFO contains 4 entries.
|
|
.It Li L1D.REPL
|
|
.Pq Event 51H , Umask 01H
|
|
Counts the number of lines brought into the L1 data cache.
|
|
Counter 0, 1 only
|
|
.It Li L1D.M_REPL
|
|
.Pq Event 51H , Umask 02H
|
|
Counts the number of modified lines brought into the L1 data cache.
|
|
Counter 0, 1 only
|
|
.It Li L1D.M_EVICT
|
|
.Pq Event 51H , Umask 04H
|
|
Counts the number of modified lines evicted from the L1 data cache due to
|
|
replacement.
|
|
Counter 0, 1 only
|
|
.It Li L1D.M_SNOOP_EVICT
|
|
.Pq Event 51H , Umask 08H
|
|
Counts the number of modified lines evicted from the L1 data cache due to
|
|
snoop HITM intervention.
|
|
Counter 0, 1 only
|
|
.It Li L1D_CACHE_PREFETCH_LOCK_FB_HIT
|
|
.Pq Event 52H , Umask 01H
|
|
Counts the number of cacheable load lock speculated instructions accepted
|
|
into the fill buffer.
|
|
.It Li L1D_CACHE_LOCK_FB_HIT
|
|
.Pq Event 53H , Umask 01H
|
|
Counts the number of cacheable load lock speculated or retired instructions
|
|
accepted into the fill buffer.
|
|
.It Li CACHE_LOCK_CYCLES.L1D_L2
|
|
.Pq Event 63H , Umask 01H
|
|
Cycle count during which the L1D and L2 are locked. A lock is asserted when
|
|
there is a locked memory access, due to uncacheable memory, a locked
|
|
operation that spans two cache lines, or a page walk from an uncacheable
|
|
page table.
|
|
Counter 0, 1 only. L1D and L2 locks have a very high performance penalty and
|
|
it is highly recommended to avoid such accesses.
|
|
.It Li CACHE_LOCK_CYCLES.L1D
|
|
.Pq Event 63H , Umask 02H
|
|
Counts the number of cycles that a cache line in the L1 data cache unit is
|
|
locked.
|
|
Counter 0, 1 only.
|
|
.It Li IO_TRANSACTIONS
|
|
.Pq Event 6CH , Umask 01H
|
|
Counts the number of completed I/O transactions.
|
|
.It Li L1I.HITS
|
|
.Pq Event 80H , Umask 01H
|
|
Counts all instruction fetches that hit the L1 instruction cache.
|
|
.It Li L1I.MISSES
|
|
.Pq Event 80H , Umask 02H
|
|
Counts all instruction fetches that miss the L1I cache. This includes
|
|
instruction cache misses, streaming buffer misses, victim cache misses and
|
|
uncacheable fetches. An instruction fetch miss is counted only once and not
|
|
once for every cycle it is outstanding.
|
|
.It Li L1I.READS
|
|
.Pq Event 80H , Umask 03H
|
|
Counts all instruction fetches, including uncacheable fetches that bypass
|
|
the L1I.
|
|
.It Li L1I.CYCLES_STALLED
|
|
.Pq Event 80H , Umask 04H
|
|
Cycle counts for which an instruction fetch stalls due to an L1I cache miss,
|
|
ITLB miss or ITLB fault.
|
|
.It Li LARGE_ITLB.HIT
|
|
.Pq Event 82H , Umask 01H
|
|
Counts number of large ITLB hits.
|
|
.It Li ITLB_MISSES.ANY
|
|
.Pq Event 85H , Umask 01H
|
|
Counts the number of misses in all levels of the ITLB which cause a page
|
|
walk.
|
|
.It Li ITLB_MISSES.WALK_COMPLETED
|
|
.Pq Event 85H , Umask 02H
|
|
Counts number of misses in all levels of the ITLB which resulted in a
|
|
completed page walk.
|
|
.It Li ILD_STALL.LCP
|
|
.Pq Event 87H , Umask 01H
|
|
Cycles Instruction Length Decoder stalls due to length changing prefixes:
|
|
66, 67 or REX.W (for EM64T) instructions which change the length of the
|
|
decoded instruction.
|
|
.It Li ILD_STALL.MRU
|
|
.Pq Event 87H , Umask 02H
|
|
Instruction Length Decoder stall cycles due to Branch Prediction Unit (BPU)
|
|
Most Recently Used (MRU) bypass.
|
|
.It Li ILD_STALL.IQ_FULL
|
|
.Pq Event 87H , Umask 04H
|
|
Stall cycles due to a full instruction queue.
|
|
.It Li ILD_STALL.REGEN
|
|
.Pq Event 87H , Umask 08H
|
|
Counts the number of regen stalls.
|
|
.It Li ILD_STALL.ANY
|
|
.Pq Event 87H , Umask 0FH
|
|
Counts any cycles the Instruction Length Decoder is stalled.
|
|
.It Li BR_INST_EXEC.COND
|
|
.Pq Event 88H , Umask 01H
|
|
Counts the number of conditional near branch instructions executed, but not
|
|
necessarily retired.
|
|
.It Li BR_INST_EXEC.DIRECT
|
|
.Pq Event 88H , Umask 02H
|
|
Counts all unconditional near branch instructions excluding calls and
|
|
indirect branches.
|
|
.It Li BR_INST_EXEC.INDIRECT_NON_CALL
|
|
.Pq Event 88H , Umask 04H
|
|
Counts the number of executed indirect near branch instructions that are not
|
|
calls.
|
|
.It Li BR_INST_EXEC.NON_CALLS
|
|
.Pq Event 88H , Umask 07H
|
|
Counts all non call near branch instructions executed, but not necessarily
|
|
retired.
|
|
.It Li BR_INST_EXEC.RETURN_NEAR
|
|
.Pq Event 88H , Umask 08H
|
|
Counts indirect near branches that have a return mnemonic.
|
|
.It Li BR_INST_EXEC.DIRECT_NEAR_CALL
|
|
.Pq Event 88H , Umask 10H
|
|
Counts executed unconditional near call branch instructions, excluding
non-call branches.
|
|
.It Li BR_INST_EXEC.INDIRECT_NEAR_CALL
|
|
.Pq Event 88H , Umask 20H
|
|
Counts indirect near calls, including both register and memory indirect,
|
|
executed.
|
|
.It Li BR_INST_EXEC.NEAR_CALLS
|
|
.Pq Event 88H , Umask 30H
|
|
Counts all near call branches executed, but not necessarily retired.
|
|
.It Li BR_INST_EXEC.TAKEN
|
|
.Pq Event 88H , Umask 40H
|
|
Counts taken near branches executed, but not necessarily retired.
|
|
.It Li BR_INST_EXEC.ANY
|
|
.Pq Event 88H , Umask 7FH
|
|
Counts all near executed branches (not necessarily retired). This includes
|
|
only instructions and not micro-op branches. Frequent branching is not
|
|
necessarily a major performance issue. However, frequent branch
|
|
mispredictions may be a problem.
|
|
.It Li BR_MISP_EXEC.COND
|
|
.Pq Event 89H , Umask 01H
|
|
Counts the number of mispredicted conditional near branch instructions
|
|
executed, but not necessarily retired.
|
|
.It Li BR_MISP_EXEC.DIRECT
|
|
.Pq Event 89H , Umask 02H
|
|
Counts mispredicted macro unconditional near branch instructions, excluding
|
|
calls and indirect branches (should always be 0).
|
|
.It Li BR_MISP_EXEC.INDIRECT_NON_CALL
|
|
.Pq Event 89H , Umask 04H
|
|
Counts the number of executed mispredicted indirect near branch instructions
|
|
that are not calls.
|
|
.It Li BR_MISP_EXEC.NON_CALLS
|
|
.Pq Event 89H , Umask 07H
|
|
Counts mispredicted non call near branches executed, but not necessarily
|
|
retired.
|
|
.It Li BR_MISP_EXEC.RETURN_NEAR
|
|
.Pq Event 89H , Umask 08H
|
|
Counts mispredicted indirect branches that have a near return mnemonic.
|
|
.It Li BR_MISP_EXEC.DIRECT_NEAR_CALL
|
|
.Pq Event 89H , Umask 10H
|
|
Counts mispredicted non-indirect near calls executed (should always be 0).
|
|
.It Li BR_MISP_EXEC.INDIRECT_NEAR_CALL
|
|
.Pq Event 89H , Umask 20H
|
|
Counts mispredicted indirect near calls executed, including both register
|
|
and memory indirect.
|
|
.It Li BR_MISP_EXEC.NEAR_CALLS
|
|
.Pq Event 89H , Umask 30H
|
|
Counts all mispredicted near call branches executed, but not necessarily
|
|
retired.
|
|
.It Li BR_MISP_EXEC.TAKEN
|
|
.Pq Event 89H , Umask 40H
|
|
Counts executed mispredicted near branches that are taken, but not
|
|
necessarily retired.
|
|
.It Li BR_MISP_EXEC.ANY
|
|
.Pq Event 89H , Umask 7FH
|
|
Counts the number of mispredicted near branch instructions that were
|
|
executed, but not necessarily retired.
|
|
.It Li RESOURCE_STALLS.ANY
|
|
.Pq Event A2H , Umask 01H
|
|
Counts the number of Allocator resource related stalls. Includes register
|
|
renaming buffer entries, memory buffer entries. In addition to resource
|
|
related stalls, this event counts some other events. Includes stalls arising
|
|
during branch misprediction recovery, such as if retirement of the
|
|
mispredicted branch is delayed and stalls arising while store buffer is
|
|
draining from synchronizing operations.
|
|
Does not include stalls due to SuperQ (off core) queue full, too many cache
|
|
misses, etc.
|
|
.It Li RESOURCE_STALLS.LOAD
|
|
.Pq Event A2H , Umask 02H
|
|
Counts the stall cycles due to the lack of a load buffer for a load operation.
|
|
.It Li RESOURCE_STALLS.RS_FULL
|
|
.Pq Event A2H , Umask 04H
|
|
This event counts the number of cycles when the number of instructions in
|
|
the pipeline waiting for execution reaches the limit the processor can
|
|
handle. A high count of this event indicates that there are long latency
|
|
operations in the pipe (possibly load and store operations that miss the L2
|
|
cache, or instructions dependent upon instructions further down the pipeline
|
|
that have yet to retire).
|
|
When RS is full, new instructions can not enter the reservation station and
|
|
start execution.
|
|
.It Li RESOURCE_STALLS.STORE
|
|
.Pq Event A2H , Umask 08H
|
|
This event counts the number of cycles that a resource related stall will
|
|
occur due to the number of store instructions reaching the limit of the
|
|
pipeline (i.e., all store buffers are used). The stall ends when a store
|
|
instruction commits its data to the cache or memory.
|
|
.It Li RESOURCE_STALLS.ROB_FULL
|
|
.Pq Event A2H , Umask 10H
|
|
Counts the cycles of stall due to the re-order buffer being full.
|
|
.It Li RESOURCE_STALLS.FPCW
|
|
.Pq Event A2H , Umask 20H
|
|
Counts the number of cycles while execution was stalled due to writing the
|
|
floating-point unit (FPU) control word.
|
|
.It Li RESOURCE_STALLS.MXCSR
|
|
.Pq Event A2H , Umask 40H
|
|
Stalls due to the MXCSR register rename occurring too close to a previous
|
|
MXCSR rename. The MXCSR provides control and status for SSE floating-point operations.
|
|
.It Li RESOURCE_STALLS.OTHER
|
|
.Pq Event A2H , Umask 80H
|
|
Counts the number of cycles while execution was stalled due to other
|
|
resource issues.
|
|
.It Li MACRO_INSTS.FUSIONS_DECODED
|
|
.Pq Event A6H , Umask 01H
|
|
Counts the number of instructions decoded that are macro-fused but not
|
|
necessarily executed or retired.
|
|
.It Li BACLEAR_FORCE_IQ
|
|
.Pq Event A7H , Umask 01H
|
|
Counts number of times a BACLEAR was forced by the Instruction Queue. The IQ
|
|
is also responsible for providing conditional branch prediction direction
|
|
based on a static scheme and dynamic data provided by the L2 Branch
|
|
Prediction Unit. If the conditional branch target is not found in the Target
|
|
Array and the IQ predicts that the branch is taken, then the IQ will force
|
|
the Branch Address Calculator to issue a BACLEAR. Each BACLEAR asserted by
|
|
the BAC generates approximately an 8 cycle bubble in the instruction fetch
|
|
pipeline.
|
|
.It Li LSD.UOPS
|
|
.Pq Event A8H , Umask 01H
|
|
Counts the number of micro-ops delivered by the loop stream detector.
|
|
Use cmask=1 and invert to count cycles.
|
|
.It Li ITLB_FLUSH
|
|
.Pq Event AEH , Umask 01H
|
|
Counts the number of ITLB flushes.
|
|
.It Li OFFCORE_REQUESTS.L1D_WRITEBACK
|
|
.Pq Event B0H , Umask 40H
|
|
Counts number of L1D writebacks to the uncore.
|
|
.It Li UOPS_EXECUTED.PORT0
|
|
.Pq Event B1H , Umask 01H
|
|
Counts number of Uops executed that were issued on port 0. Port 0 handles
|
|
integer arithmetic, SIMD and FP add Uops.
|
|
.It Li UOPS_EXECUTED.PORT1
|
|
.Pq Event B1H , Umask 02H
|
|
Counts number of Uops executed that were issued on port 1. Port 1 handles
|
|
integer arithmetic, SIMD, integer shift, FP multiply and FP divide Uops.
|
|
.It Li UOPS_EXECUTED.PORT2_CORE
|
|
.Pq Event B1H , Umask 04H
|
|
Counts number of Uops executed that were issued on port 2. Port 2 handles
|
|
the load Uops. This is a core count only and can not be collected per
|
|
thread.
|
|
.It Li UOPS_EXECUTED.PORT3_CORE
|
|
.Pq Event B1H , Umask 08H
|
|
Counts number of Uops executed that were issued on port 3. Port 3 handles
|
|
store Uops. This is a core count only and can not be collected per thread.
|
|
.It Li UOPS_EXECUTED.PORT4_CORE
|
|
.Pq Event B1H , Umask 10H
|
|
Counts number of Uops executed that were issued on port 4. Port 4 handles
|
|
the value to be stored for the store Uops issued on port 3. This is a core
|
|
count only and can not be collected per thread.
|
|
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5
|
|
.Pq Event B1H , Umask 1FH
|
|
Counts cycles when the Uops executed were issued from any ports except port
|
|
5. Use Cmask=1 for active cycles; Cmask=0 for weighted cycles; use Cmask=1,
|
|
Invert=1 to count P0-4 stalled cycles; use Cmask=1, Edge=1, Invert=1 to count
|
|
P0-4 stalls.
|
|
.It Li UOPS_EXECUTED.PORT5
|
|
.Pq Event B1H , Umask 20H
|
|
Counts number of Uops executed that were issued on port 5.
|
|
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES
|
|
.Pq Event B1H , Umask 3FH
|
|
Counts cycles when the Uops are executing. Use Cmask=1 for active cycles;
|
|
Cmask=0 for weighted cycles; use Cmask=1, Invert=1 to count P0-4 stalled
|
|
cycles; use Cmask=1, Edge=1, Invert=1 to count P0-4 stalls.
|
|
.It Li UOPS_EXECUTED.PORT015
|
|
.Pq Event B1H , Umask 40H
|
|
Counts number of Uops executed that were issued on port 0, 1, or 5.
|
|
Use cmask=1, invert=1 to count stall cycles.
|
|
.It Li UOPS_EXECUTED.PORT234
|
|
.Pq Event B1H , Umask 80H
|
|
Counts number of Uops executed that were issued on port 2, 3, or 4.
|
|
.It Li OFFCORE_REQUESTS_SQ_FULL
|
|
.Pq Event B2H , Umask 01H
|
|
Counts number of cycles the SQ is full to handle off-core requests.
|
|
.It Li OFF_CORE_RESPONSE_0
|
|
.Pq Event B7H , Umask 01H
|
|
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
|
|
Processor Core
|
|
Requires programming MSR 01A6H
|
|
.It Li SNOOP_RESPONSE.HIT
|
|
.Pq Event B8H , Umask 01H
|
|
Counts HIT snoop response sent by this thread in response to a snoop
|
|
request.
|
|
.It Li SNOOP_RESPONSE.HITE
|
|
.Pq Event B8H , Umask 02H
|
|
Counts HIT E snoop response sent by this thread in response to a snoop
|
|
request.
|
|
.It Li SNOOP_RESPONSE.HITM
|
|
.Pq Event B8H , Umask 04H
|
|
Counts HIT M snoop response sent by this thread in response to a snoop
|
|
request.
|
|
.It Li OFF_CORE_RESPONSE_1
|
|
.Pq Event BBH , Umask 01H
|
|
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
|
|
Processor Core
|
|
Requires programming MSR 01A7H
|
|
.It Li INST_RETIRED.ANY_P
|
|
.Pq Event C0H , Umask 01H
|
|
See Table A-1
|
|
Notes: INST_RETIRED.ANY is counted by a designated fixed counter.
|
|
INST_RETIRED.ANY_P is counted by a programmable counter and is an
|
|
architectural performance event. Event is supported if CPUID.A.EBX[1] = 0.
|
|
Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not
|
|
count as retired instructions.
|
|
.It Li INST_RETIRED.X87
|
|
.Pq Event C0H , Umask 02H
|
|
Counts the number of floating point computational operations retired:
floating point computational operations executed by the assist handler and
sub-operations of complex floating point instructions like transcendental
instructions.
.It Li INST_RETIRED.MMX
.Pq Event C0H , Umask 04H
Counts the number of MMX instructions retired.
|
|
.It Li UOPS_RETIRED.ANY
|
|
.Pq Event C2H , Umask 01H
|
|
Counts the number of micro-ops retired (macro-fused=1, micro-fused=2,
|
|
others=1; maximum count of 8 per cycle). Most instructions are composed of
|
|
one or two micro-ops. Some instructions are decoded into longer sequences
|
|
such as repeat instructions, floating point transcendental instructions, and
|
|
assists.
|
|
Use cmask=1 and invert to count active cycles or stalled cycles.
|
|
.It Li UOPS_RETIRED.RETIRE_SLOTS
|
|
.Pq Event C2H , Umask 02H
|
|
Counts the number of retirement slots used each cycle.
|
|
.It Li UOPS_RETIRED.MACRO_FUSED
|
|
.Pq Event C2H , Umask 04H
|
|
Counts number of macro-fused uops retired.
|
|
.It Li MACHINE_CLEARS.CYCLES
|
|
.Pq Event C3H , Umask 01H
|
|
Counts the cycles machine clear is asserted.
|
|
.It Li MACHINE_CLEARS.MEM_ORDER
|
|
.Pq Event C3H , Umask 02H
|
|
Counts the number of machine clears due to memory order conflicts.
|
|
.It Li MACHINE_CLEARS.SMC
|
|
.Pq Event C3H , Umask 04H
|
|
Counts the number of times that a program writes to a code section.
|
|
Self-modifying code causes a severe penalty in all Intel 64 and IA-32
|
|
processors. The modified cache line is written back to the L2 and L3 caches.
|
|
.It Li BR_INST_RETIRED.ALL_BRANCHES
|
|
.Pq Event C4H , Umask 00H
|
|
See Table A-1
|
|
.It Li BR_INST_RETIRED.CONDITIONAL
|
|
.Pq Event C4H , Umask 01H
|
|
Counts the number of conditional branch instructions retired.
|
|
.It Li BR_INST_RETIRED.NEAR_CALL
|
|
.Pq Event C4H , Umask 02H
|
|
Counts the number of direct and indirect near unconditional calls retired.
|
|
.It Li BR_INST_RETIRED.ALL_BRANCHES
|
|
.Pq Event C4H , Umask 04H
|
|
Counts the number of branch instructions retired.
|
|
.It Li BR_MISP_RETIRED.ALL_BRANCHES
|
|
.Pq Event C5H , Umask 00H
|
|
See Table A-1
|
|
.It Li BR_MISP_RETIRED.NEAR_CALL
|
|
.Pq Event C5H , Umask 02H
|
|
Counts mispredicted direct and indirect near unconditional retired calls.
|
|
.It Li SSEX_UOPS_RETIRED.PACKED_SINGLE
|
|
.Pq Event C7H , Umask 01H
|
|
Counts SIMD packed single-precision floating point Uops retired.
|
|
.It Li SSEX_UOPS_RETIRED.SCALAR_SINGLE
|
|
.Pq Event C7H , Umask 02H
|
|
Counts SIMD scalar single-precision floating point Uops retired.
|
|
.It Li SSEX_UOPS_RETIRED.PACKED_DOUBLE
|
|
.Pq Event C7H , Umask 04H
|
|
Counts SIMD packed double-precision floating point Uops retired.
|
|
.It Li SSEX_UOPS_RETIRED.SCALAR_DOUBLE
|
|
.Pq Event C7H , Umask 08H
|
|
Counts SIMD scalar double-precision floating point Uops retired.
|
|
.It Li SSEX_UOPS_RETIRED.VECTOR_INTEGER
|
|
.Pq Event C7H , Umask 10H
|
|
Counts 128-bit SIMD vector integer Uops retired.
|
|
.It Li ITLB_MISS_RETIRED
|
|
.Pq Event C8H , Umask 20H
|
|
Counts the number of retired instructions that missed the ITLB when the
|
|
instruction was fetched.
|
|
.It Li MEM_LOAD_RETIRED.L1D_HIT
|
|
.Pq Event CBH , Umask 01H
|
|
Counts number of retired loads that hit the L1 data cache.
|
|
.It Li MEM_LOAD_RETIRED.L2_HIT
|
|
.Pq Event CBH , Umask 02H
|
|
Counts number of retired loads that hit the L2 data cache.
|
|
.It Li MEM_LOAD_RETIRED.L3_UNSHARED_HIT
|
|
.Pq Event CBH , Umask 04H
|
|
Counts number of retired loads that hit their own, unshared lines in the L3
|
|
cache.
|
|
.It Li MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM
|
|
.Pq Event CBH , Umask 08H
|
|
Counts number of retired loads that hit in a sibling core's L2 (on die
|
|
core). Since the L3 is inclusive of all cores on the package, this is an L3
|
|
hit. This counts both clean and modified hits.
|
|
.It Li MEM_LOAD_RETIRED.L3_MISS
|
|
.Pq Event CBH , Umask 10H
|
|
Counts number of retired loads that miss the L3 cache. The load was
|
|
satisfied by a remote socket, local memory or an IOH.
|
|
.It Li MEM_LOAD_RETIRED.HIT_LFB
|
|
.Pq Event CBH , Umask 40H
|
|
Counts number of retired loads that miss the L1D and the address is located
|
|
in an allocated line fill buffer and will soon be committed to cache. This
|
|
is counting secondary L1D misses.
|
|
.It Li MEM_LOAD_RETIRED.DTLB_MISS
|
|
.Pq Event CBH , Umask 80H
|
|
Counts the number of retired loads that missed the DTLB. The DTLB miss is
|
|
not counted if the load operation causes a fault. This event counts loads
|
|
from cacheable memory only. The event does not count loads by software
|
|
prefetches. Counts both primary and secondary misses to the TLB.
|
|
.It Li FP_MMX_TRANS.TO_FP
|
|
.Pq Event CCH , Umask 01H
|
|
Counts the first floating-point instruction following any MMX instruction.
|
|
You can use this event to estimate the penalties for the transitions between
|
|
floating-point and MMX technology states.
|
|
.It Li FP_MMX_TRANS.TO_MMX
|
|
.Pq Event CCH , Umask 02H
|
|
Counts the first MMX instruction following a floating-point instruction. You
|
|
can use this event to estimate the penalties for the transitions between
|
|
floating-point and MMX technology states.
|
|
.It Li FP_MMX_TRANS.ANY
|
|
.Pq Event CCH , Umask 03H
|
|
Counts all transitions from floating point to MMX instructions and from MMX
|
|
instructions to floating point instructions. You can use this event to
|
|
estimate the penalties for the transitions between floating-point and MMX
|
|
technology states.
|
|
.It Li MACRO_INSTS.DECODED
|
|
.Pq Event D0H , Umask 01H
|
|
Counts the number of instructions decoded, (but not necessarily executed or
|
|
retired).
|
|
.It Li UOPS_DECODED.MS
|
|
.Pq Event D1H , Umask 02H
|
|
Counts the number of Uops decoded by the Microcode Sequencer, MS. The MS
|
|
delivers uops when the instruction is more than 4 uops long or a microcode
|
|
assist is occurring.
|
|
.It Li UOPS_DECODED.ESP_FOLDING
|
|
.Pq Event D1H , Umask 04H
|
|
Counts number of stack pointer (ESP) instructions decoded: push, pop, call,
ret, etc. ESP instructions do not generate a Uop to increment or decrement
|
|
ESP. Instead, they update an ESP_Offset register that keeps track of the
|
|
delta to the current value of the ESP register.
|
|
.It Li UOPS_DECODED.ESP_SYNC
|
|
.Pq Event D1H , Umask 08H
|
|
Counts number of stack pointer (ESP) sync operations where an ESP
|
|
instruction is corrected by adding the ESP offset register to the current
|
|
value of the ESP register.
|
|
.It Li RAT_STALLS.FLAGS
|
|
.Pq Event D2H , Umask 01H
|
|
Counts the number of cycles during which execution stalled due to several
|
|
reasons, one of which is a partial flag register stall. A partial register
|
|
stall may occur when two conditions are met: 1) an instruction modifies
|
|
some, but not all, of the flags in the flag register and 2) the next
|
|
instruction, which depends on flags, depends on flags that were not modified
|
|
by this instruction.
|
|
.It Li RAT_STALLS.REGISTERS
|
|
.Pq Event D2H , Umask 02H
|
|
This event counts the number of cycles instruction execution latency became
|
|
longer than the defined latency because the instruction used a register that
|
|
was partially written by previous instruction.
|
|
.It Li RAT_STALLS.ROB_READ_PORT
|
|
.Pq Event D2H , Umask 04H
|
|
Counts the number of cycles when ROB read port stalls occurred, which did
|
|
not allow new micro-ops to enter the out-of-order pipeline. Note that, at
|
|
this stage in the pipeline, additional stalls may occur at the same cycle
|
|
and prevent the stalled micro-ops from entering the pipe. In such a case,
|
|
micro-ops retry entering the execution pipe in the next cycle and the
|
|
ROB-read port stall is counted again.
|
|
.It Li RAT_STALLS.SCOREBOARD
|
|
.Pq Event D2H , Umask 08H
|
|
Counts the cycles where we stall due to microarchitecturally required
|
|
serialization. Microcode scoreboarding stalls.
|
|
.It Li RAT_STALLS.ANY
|
|
.Pq Event D2H , Umask 0FH
|
|
Counts all Register Allocation Table stall cycles due to: cycles when ROB
read port stalls occurred, which did not allow new micro-ops to enter the
execution pipe; cycles when partial register stalls occurred; cycles when
flag stalls occurred; and cycles when floating-point unit (FPU) status word
stalls occurred. To count each of these conditions separately use the events:
|
|
RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and
|
|
RAT_STALLS.FPSW.
|
|
.It Li SEG_RENAME_STALLS
|
|
.Pq Event D4H , Umask 01H
|
|
Counts the number of stall cycles due to the lack of renaming resources for
|
|
the ES, DS, FS, and GS segment registers. If a segment is renamed but not
|
|
retired and a second update to the same segment occurs, a stall occurs in
|
|
the front-end of the pipeline until the renamed segment retires.
|
|
.It Li ES_REG_RENAMES
|
|
.Pq Event D5H , Umask 01H
|
|
Counts the number of times the ES segment register is renamed.
|
|
.It Li UOP_UNFUSION
|
|
.Pq Event DBH , Umask 01H
|
|
Counts unfusion events due to floating point exception to a fused uop.
|
|
.It Li BR_INST_DECODED
|
|
.Pq Event E0H , Umask 01H
|
|
Counts the number of branch instructions decoded.
|
|
.It Li BPU_MISSED_CALL_RET
|
|
.Pq Event E5H , Umask 01H
|
|
Counts number of times the Branch Prediction Unit missed predicting a call
|
|
or return branch.
|
|
.It Li BACLEAR.CLEAR
|
|
.Pq Event E6H , Umask 01H
|
|
Counts the number of times the front end is resteered, mainly when the
|
|
Branch Prediction Unit cannot provide a correct prediction and this is
|
|
corrected by the Branch Address Calculator at the front end. This can occur
|
|
if the code has many branches such that they cannot be consumed by the BPU.
|
|
Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble
|
|
in the instruction fetch pipeline. The effect on total execution time
|
|
depends on the surrounding code.
|
|
.It Li BACLEAR.BAD_TARGET
|
|
.Pq Event E6H , Umask 02H
|
|
Counts number of Branch Address Calculator clears (BACLEAR) asserted due to
|
|
conditional branch instructions in which there was a target hit but the
|
|
direction was wrong. Each BACLEAR asserted by the BAC generates
|
|
approximately an 8 cycle bubble in the instruction fetch pipeline.
|
|
.It Li BPU_CLEARS.EARLY
|
|
.Pq Event E8H , Umask 01H
|
|
Counts early (normal) Branch Prediction Unit clears: BPU predicted a taken
|
|
branch after incorrectly assuming that it was not taken.
|
|
The BPU clear leads to a 2 cycle bubble in the Front End.
|
|
.It Li BPU_CLEARS.LATE
|
|
.Pq Event E8H , Umask 02H
|
|
Counts late Branch Prediction Unit clears due to Most Recently Used
|
|
conflicts. The BPU clear leads to a 3 cycle bubble in the Front End.
|
|
.It Li L2_TRANSACTIONS.LOAD
|
|
.Pq Event F0H , Umask 01H
|
|
Counts L2 load operations due to HW prefetch or demand loads.
|
|
.It Li L2_TRANSACTIONS.RFO
|
|
.Pq Event F0H , Umask 02H
|
|
Counts L2 RFO operations due to HW prefetch or demand RFOs.
|
|
.It Li L2_TRANSACTIONS.IFETCH
|
|
.Pq Event F0H , Umask 04H
|
|
Counts L2 instruction fetch operations due to HW prefetch or demand ifetch.
|
|
.It Li L2_TRANSACTIONS.PREFETCH
|
|
.Pq Event F0H , Umask 08H
|
|
Counts L2 prefetch operations.
|
|
.It Li L2_TRANSACTIONS.L1D_WB
|
|
.Pq Event F0H , Umask 10H
|
|
Counts L1D writeback operations to the L2.
|
|
.It Li L2_TRANSACTIONS.FILL
|
|
.Pq Event F0H , Umask 20H
|
|
Counts L2 cache line fill operations due to load, RFO, L1D writeback or
|
|
prefetch.
|
|
.It Li L2_TRANSACTIONS.WB
|
|
.Pq Event F0H , Umask 40H
|
|
Counts L2 writeback operations to the L3.
|
|
.It Li L2_TRANSACTIONS.ANY
|
|
.Pq Event F0H , Umask 80H
|
|
Counts all L2 cache operations.
|
|
.It Li L2_LINES_IN.S_STATE
|
|
.Pq Event F1H , Umask 02H
|
|
Counts the number of cache lines allocated in the L2 cache in the S (shared)
|
|
state.
|
|
.It Li L2_LINES_IN.E_STATE
|
|
.Pq Event F1H , Umask 04H
|
|
Counts the number of cache lines allocated in the L2 cache in the E
|
|
(exclusive) state.
|
|
.It Li L2_LINES_IN.ANY
|
|
.Pq Event F1H , Umask 07H
|
|
Counts the number of cache lines allocated in the L2 cache.
|
|
.It Li L2_LINES_OUT.DEMAND_CLEAN
|
|
.Pq Event F2H , Umask 01H
|
|
Counts L2 clean cache lines evicted by a demand request.
|
|
.It Li L2_LINES_OUT.DEMAND_DIRTY
|
|
.Pq Event F2H , Umask 02H
|
|
Counts L2 dirty (modified) cache lines evicted by a demand request.
|
|
.It Li L2_LINES_OUT.PREFETCH_CLEAN
|
|
.Pq Event F2H , Umask 04H
|
|
Counts L2 clean cache lines evicted by a prefetch request.
|
|
.It Li L2_LINES_OUT.PREFETCH_DIRTY
|
|
.Pq Event F2H , Umask 08H
|
|
Counts L2 modified cache lines evicted by a prefetch request.
|
|
.It Li L2_LINES_OUT.ANY
|
|
.Pq Event F2H , Umask 0FH
|
|
Counts all L2 cache lines evicted for any reason.
|
|
.It Li SQ_MISC.SPLIT_LOCK
|
|
.Pq Event F4H , Umask 10H
|
|
Counts the number of SQ lock splits across a cache line.
|
|
.It Li SQ_FULL_STALL_CYCLES
|
|
.Pq Event F6H , Umask 01H
|
|
Counts cycles the Super Queue is full. Neither of the threads on this core
|
|
will be able to access the uncore.
|
|
.It Li FP_ASSIST.ALL
|
|
.Pq Event F7H , Umask 01H
|
|
Counts the number of floating point operations executed that required
|
|
micro-code assist intervention. Assists are required in the following cases:
|
|
SSE instructions, (Denormal input when the DAZ flag is off or Underflow
|
|
result when the FTZ flag is off): x87 instructions, (NaN or denormal are
|
|
loaded to a register or used as input from memory, Division by 0 or
|
|
Underflow output).
|
|
.It Li FP_ASSIST.OUTPUT
|
|
.Pq Event F7H , Umask 02H
|
|
Counts number of floating point micro-code assist when the output value
|
|
(destination register) is invalid.
|
|
.It Li FP_ASSIST.INPUT
|
|
.Pq Event F7H , Umask 04H
|
|
Counts number of floating point micro-code assist when the input value (one
|
|
of the source operands to an FP instruction) is invalid.
|
|
.It Li SIMD_INT_64.PACKED_MPY
|
|
.Pq Event FDH , Umask 01H
|
|
Counts number of SIMD integer 64 bit packed multiply operations.
|
|
.It Li SIMD_INT_64.PACKED_SHIFT
|
|
.Pq Event FDH , Umask 02H
|
|
Counts number of SIMD integer 64 bit packed shift operations.
|
|
.It Li SIMD_INT_64.PACK
|
|
.Pq Event FDH , Umask 04H
|
|
Counts number of SIMD integer 64 bit pack operations.
|
|
.It Li SIMD_INT_64.UNPACK
|
|
.Pq Event FDH , Umask 08H
|
|
Counts number of SIMD integer 64 bit unpack operations.
|
|
.It Li SIMD_INT_64.PACKED_LOGICAL
|
|
.Pq Event FDH , Umask 10H
|
|
Counts number of SIMD integer 64 bit logical operations.
|
|
.It Li SIMD_INT_64.PACKED_ARITH
|
|
.Pq Event FDH , Umask 20H
|
|
Counts number of SIMD integer 64 bit arithmetic operations.
|
|
.It Li SIMD_INT_64.SHUFFLE_MOVE
|
|
.Pq Event FDH , Umask 40H
|
|
Counts number of SIMD integer 64 bit shuffle or move operations.
|
|
.El
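.Pp
The Cmask, Invert and Edge hints in the list above (for example under
.Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES )
correspond to fields of the event select register that accompanies each
programmable counter.
The following is a minimal sketch, assuming the Intel architectural
IA32_PERFEVTSELx bit layout, of how an event code, a unit mask and these
modifiers combine into one register value.
The helper
.Fn evtsel_encode
is a hypothetical name used for illustration only and is not part of libpmc;
in normal use the library derives this value from the event name and its
qualifiers.
The sketch also assumes that both user and kernel mode are counted.
.Bd -literal -offset indent
```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of libpmc: build an event select
 * value, assuming the architectural IA32_PERFEVTSELx layout.
 */
uint64_t
evtsel_encode(uint8_t event, uint8_t umask, uint8_t cmask, int edge, int inv)
{
	uint64_t v = 0;

	v |= (uint64_t)event;		/* bits 0-7: event select */
	v |= (uint64_t)umask << 8;	/* bits 8-15: unit mask */
	v |= (uint64_t)1 << 16;		/* USR: count in user mode (assumed) */
	v |= (uint64_t)1 << 17;		/* OS: count in kernel mode (assumed) */
	if (edge)
		v |= (uint64_t)1 << 18;	/* E: count rising edges only */
	v |= (uint64_t)1 << 22;		/* EN: enable the counter */
	if (inv)
		v |= (uint64_t)1 << 23;	/* INV: invert the cmask comparison */
	v |= (uint64_t)cmask << 24;	/* bits 24-31: counter mask */

	return (v);
}
```
.Ed
.Pp
Under these assumptions, Event B1H with Umask 3FH and Cmask=1 encodes as
01433FB1H (active cycles), adding Invert=1 gives 01C33FB1H (stalled cycles),
and adding Edge=1 as well gives 01C73FB1H (number of stalls).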
|
|
.Ss Event Specifiers (Programmable PMCs)
|
|
Core i7 and Xeon 5500 programmable PMCs support the following events as
|
|
of the June 2009 document (removed in December 2009):
|
|
.Bl -tag -width indent
|
|
.It Li SB_FORWARD.ANY
|
|
.Pq Event 02H , Umask 01H
|
|
Counts the number of store forwards.
|
|
.It Li LOAD_BLOCK.STD
|
|
.Pq Event 03H , Umask 01H
|
|
Counts the number of loads blocked by a preceding store with unknown data.
|
|
.It Li LOAD_BLOCK.ADDRESS_OFFSET
|
|
.Pq Event 03H , Umask 04H
|
|
Counts the number of loads blocked by a preceding store address.
|
|
.It Li LOAD_BLOCK.ADDRESS_OFFSET
|
|
.Pq Event 01H , Umask 04H
|
|
Counts the cycles of store buffer drains.
|
|
.It Li MISALIGN_MEM_REF.LOAD
|
|
.Pq Event 05H , Umask 01H
|
|
Counts the number of misaligned load references.
|
|
.It Li MISALIGN_MEM_REF.STORE
|
|
.Pq Event 05H , Umask 02H
|
|
Counts the number of misaligned store references.
|
|
.It Li MISALIGN_MEM_REF.ANY
|
|
.Pq Event 05H , Umask 03H
|
|
Counts the number of misaligned memory references.
|
|
.It Li STORE_BLOCKS.NOT_STA
|
|
.Pq Event 06H , Umask 01H
|
|
This event counts the number of load operations delayed by preceding
|
|
stores whose addresses are known but whose data is unknown, and preceding
|
|
stores that conflict with the load but which incompletely overlap the load.
|
|
.It Li STORE_BLOCKS.STA
|
|
.Pq Event 06H , Umask 02H
|
|
This event counts load operations delayed by preceding stores whose
|
|
addresses are unknown (STA block).
|
|
.It Li STORE_BLOCKS.ANY
|
|
.Pq Event 06H , Umask 0FH
|
|
Counts all loads delayed due to store blocks.
|
|
.It Li MEMORY_DISAMBIGURATION.RESET
|
|
.Pq Event 09H , Umask 01H
|
|
Counts memory disambiguation reset cycles.
|
|
.It Li MEMORY_DISAMBIGURATION.SUCCESS
|
|
.Pq Event 09H , Umask 02H
|
|
Counts the number of loads for which memory disambiguation succeeded.
|
|
.It Li MEMORY_DISAMBIGURATION.WATCHDOG
|
|
.Pq Event 09H , Umask 04H
|
|
Counts the number of times the memory disambiguation watchdog kicked in.
|
|
.It Li MEMORY_DISAMBIGURATION.WATCH_CYCLES
|
|
.Pq Event 09H , Umask 08H
|
|
Counts the cycles that the memory disambiguation watchdog is active.
|
|
Set invert=1, cmask=1.
|
|
.It Li HW_INT.RCV
|
|
.Pq Event 1DH , Umask 01H
|
|
Number of interrupts received.
|
|
.It Li HW_INT.CYCLES_MASKED
|
|
.Pq Event 1DH , Umask 02H
|
|
Number of cycles interrupts are masked.
|
|
.It Li HW_INT.CYCLES_PENDING_AND_MASKED
|
|
.Pq Event 1DH , Umask 04H
|
|
Number of cycles interrupts are pending and masked.
|
|
.It Li HW_INT.CYCLES_PENDING_AND_MASKED
|
|
.Pq Event 04H , Umask 04H
|
|
Counts number of L2 store RFO requests where the cache line to be loaded is
|
|
in the E (exclusive) state. The L1D prefetcher does not issue a RFO
|
|
prefetch.
|
|
This is a demand RFO request.
|
|
.It Li HW_INT.CYCLES_PENDING_AND_MASKED
|
|
.Pq Event 27H , Umask 04H
|
|
LONGEST_LAT_CACHE.MISS
|
|
.It Li UOPS_DECODED.DEC0
|
|
.Pq Event 3DH , Umask 01H
|
|
Counts micro-ops decoded by decoder 0.
|
|
.It Li UOPS_DECODED.DEC0
|
|
.Pq Event 01H , Umask 01H
|
|
Counts L1 data cache store RFO requests where the cache line to be loaded is
|
|
in the I state.
|
|
Counter 0, 1 only
|
|
.It Li 0FH
|
|
.Pq Event 41H , Umask 41H
|
|
L1D_CACHE_ST.MESI
|
|
Counts L1 data cache store RFO requests.
|
|
Counter 0, 1 only
|
|
.It Li DTLB_MISSES.PDE_MISS
|
|
.Pq Event 49H , Umask 20H
|
|
Number of DTLB cache misses where the low part of the linear to physical
|
|
address translation was missed.
|
|
.It Li DTLB_MISSES.PDP_MISS
|
|
.Pq Event 49H , Umask 40H
|
|
Number of DTLB misses where the high part of the linear to physical address
|
|
translation was missed.
|
|
.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
|
|
.Pq Event 49H , Umask 80H
|
|
Counts number of completed large page walks due to misses in the STLB.
|
|
.It Li SSE_MEM_EXEC.NTA
|
|
.Pq Event 4BH , Umask 01H
|
|
Counts number of SSE NTA prefetch/weakly-ordered instructions which missed
|
|
the L1 data cache.
|
|
.It Li SSE_MEM_EXEC.STREAMING_STORES
|
|
.Pq Event 4BH , Umask 08H
|
|
Counts number of SSE non-temporal stores.
|
|
.It Li SFENCE_CYCLES
|
|
.Pq Event 4DH , Umask 01H
|
|
Counts store fence cycles
|
|
.It Li EPT.EPDE_MISS
.Pq Event 4FH , Umask 02H
Counts Extended Page Directory Entry misses.
The Extended Page Directory
cache is used by Virtual Machine operating systems while the guest operating
systems use the standard TLB caches.
.It Li EPT.EPDPE_HIT
.Pq Event 4FH , Umask 04H
Counts Extended Page Directory Pointer Entry hits.
.It Li EPT.EPDPE_MISS
.Pq Event 4FH , Umask 08H
Counts Extended Page Directory Pointer Entry misses.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA
.Pq Event 60H , Umask 01H
Counts weighted cycles of offcore demand data read requests.
Does not include L2 prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
.Pq Event 60H , Umask 02H
Counts weighted cycles of offcore demand code read requests.
Does not include L2 prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO
.Pq Event 60H , Umask 04H
Counts weighted cycles of offcore demand RFO requests.
Does not include L2 prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.ANY.READ
.Pq Event 60H , Umask 08H
Counts weighted cycles of offcore read requests of any kind.
Includes L2 prefetch requests.
Counter 0 only.
.It Li IFU_IVC.FULL
.Pq Event 81H , Umask 01H
Instruction fetch unit victim cache full.
.It Li IFU_IVC.L1I_EVICTION
.Pq Event 81H , Umask 02H
L1 instruction cache evictions.
.It Li L1I_OPPORTUNISTIC_HITS
.Pq Event 83H , Umask 01H
Opportunistic hits in streaming.
.It Li ITLB_MISSES.WALK_CYCLES
.Pq Event 85H , Umask 04H
Counts ITLB miss page walk cycles.
.It Li ITLB_MISSES.PMH_BUSY_CYCLES
.Pq Event 85H , Umask 04H
Counts PMH busy cycles.
.It Li ITLB_MISSES.STLB_HIT
.Pq Event 85H , Umask 10H
Counts the number of ITLB misses that hit in the second level TLB.
.It Li ITLB_MISSES.PDE_MISS
.Pq Event 85H , Umask 20H
Number of ITLB misses where the low part of the linear to physical address
translation was missed.
.It Li ITLB_MISSES.PDP_MISS
.Pq Event 85H , Umask 40H
Number of ITLB misses where the high part of the linear to physical address
translation was missed.
.It Li ITLB_MISSES.LARGE_WALK_COMPLETED
.Pq Event 85H , Umask 80H
Counts number of completed large page walks due to misses in the STLB.
.It Li OFFCORE_REQUESTS.DEMAND.READ_DATA
.Pq Event B0H , Umask 01H
Counts number of offcore demand data read requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.READ_CODE
.Pq Event B0H , Umask 02H
Counts number of offcore demand code read requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.RFO
.Pq Event B0H , Umask 04H
Counts number of offcore demand RFO requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.READ
.Pq Event B0H , Umask 08H
Counts number of offcore read requests.
Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.RFO
.Pq Event B0H , Umask 10H
Counts number of offcore RFO requests.
Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.UNCACHED_MEM
.Pq Event B0H , Umask 20H
Counts number of offcore uncached memory requests.
.It Li OFFCORE_REQUESTS.ANY
.Pq Event B0H , Umask 80H
Counts all offcore requests.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.DATA
.Pq Event B3H , Umask 01H
Counts weighted cycles of snoopq requests for data.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE
.Pq Event B3H , Umask 02H
Counts weighted cycles of snoopq invalidate requests.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
.Pq Event B3H , Umask 04H
Counts weighted cycles of snoopq requests for code.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li PIC_ACCESSES.TPR_READS
.Pq Event BAH , Umask 01H
Counts number of TPR reads.
.It Li PIC_ACCESSES.TPR_WRITES
.Pq Event BAH , Umask 02H
Counts number of TPR writes.
one or two micro-ops. Some instructions are decoded into longer sequences
.It Li MACHINE_CLEARS.FUSION_ASSIST
.Pq Event C3H , Umask 10H
Counts the number of macro-fusion assists.
Counts SIMD packed single-precision floating point Uops retired.
.It Li BOGUS_BR
.Pq Event E4H , Umask 01H
Counts the number of bogus branches.
.It Li L2_HW_PREFETCH.HIT
.Pq Event F3H , Umask 01H
Counts L2 HW prefetcher detector hits.
.It Li L2_HW_PREFETCH.ALLOC
.Pq Event F3H , Umask 02H
Counts L2 HW prefetcher allocations.
.It Li L2_HW_PREFETCH.DATA_TRIGGER
.Pq Event F3H , Umask 04H
Counts L2 HW data prefetcher triggers.
.It Li L2_HW_PREFETCH.CODE_TRIGGER
.Pq Event F3H , Umask 08H
Counts L2 HW code prefetcher triggers.
.It Li L2_HW_PREFETCH.DCA_TRIGGER
.Pq Event F3H , Umask 10H
Counts L2 HW DCA prefetcher triggers.
.It Li L2_HW_PREFETCH.KICK_START
.Pq Event F3H , Umask 20H
Counts L2 HW prefetcher kick starts.
.It Li SQ_MISC.PROMOTION
.Pq Event F4H , Umask 01H
Counts the number of L2 secondary misses that hit the Super Queue.
.It Li SQ_MISC.PROMOTION_POST_GO
.Pq Event F4H , Umask 02H
Counts the number of L2 secondary misses during the Super Queue filling L2.
.It Li SQ_MISC.LRU_HINTS
.Pq Event F4H , Umask 04H
Counts number of Super Queue LRU hints sent to L3.
.It Li SQ_MISC.FILL_DROPPED
.Pq Event F4H , Umask 08H
Counts the number of SQ L2 fills dropped due to L2 busy.
.It Li SEGMENT_REG_LOADS
.Pq Event F8H , Umask 01H
Counts number of segment register loads.
.El
.Sh SEE ALSO
.Xr pmc 3 ,
.Xr pmc.atom 3 ,
.Xr pmc.core 3 ,
.Xr pmc.corei7uc 3 ,
.Xr pmc.iaf 3 ,
.Xr pmc.k7 3 ,
.Xr pmc.k8 3 ,
.Xr pmc.p4 3 ,
.Xr pmc.p5 3 ,
.Xr pmc.p6 3 ,
.Xr pmc.soft 3 ,
.Xr pmc.tsc 3 ,
.Xr pmc.ucf 3 ,
.Xr pmc.westmere 3 ,
.Xr pmc.westmereuc 3 ,
.Xr pmc_cpuinfo 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .