According to the current RISC-V specification, the Svnapot extension
defines pages of size 64KB. Therefore, when performing TLB or bypass hit
tag matching, the lower 4 bits of the VPN and tag do not need to be
compared.
In MMU, the hit determination logic typically consists of two parts: tag
matching and level matching. For pages of different levels—512GB, 1GB,
2MB, or 4KB—different bit fields of the tag need to be matched. In our
implementation, the napot case is treated as a variant of the 4KB page.
As such, we need to modify the 4KB tag matching logic (`tag_match(0)`)
accordingly so that when napot is enabled, the lower 4 bits of the tag
are ignored during comparison.
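As an illustration of the rule above, here is a minimal behavioral sketch in Python (not the Chisel code from this commit). The function name `tag_match_4k` and the constant `NAPOT_BITS` are hypothetical; only the "ignore the lower 4 bits when napot is set" behavior comes from the description.

```python
# Behavioral model only; names are illustrative, not taken from the MMU code.
NAPOT_BITS = 4  # 64KB / 4KB = 16 pages, so the low 4 VPN bits are don't-care


def tag_match_4k(vpn: int, tag: int, napot: bool) -> bool:
    """Level-0 (4KB) tag comparison.

    For a napot entry the lower 4 bits of both the VPN and the tag are
    dropped before comparing; otherwise every bit must match.
    """
    if napot:
        return (vpn >> NAPOT_BITS) == (tag >> NAPOT_BITS)
    return vpn == tag


if __name__ == "__main__":
    print(tag_match_4k(0x12345, 0x12340, napot=True))   # same 64KB region
    print(tag_match_4k(0x12345, 0x12340, napot=False))  # different 4KB pages
```

Every `def hit` that performs the 4KB comparison would need the same masking, which is why missing one of them shows up as a bug.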
However, the MMU codebase contains numerous `def hit` definitions (which
is also one of the reasons for the code complexity), and previously we
did not account for all cases or modify all relevant `def hit`
definitions appropriately. This commit fixes those bugs.
In addition, this commit also removes some redundant code and introduces
a few code additions to slightly improve overall readability.
Consider the following two-stage address translation process:
In the vs-stage, a 2MB large page is found during the lookup, which
means the lower 9 bits of the resulting ppn are zeros. When generating
the gvpn for the g-stage, it is formed as `gvpn = {s1_ppn, s1_vpn(9
bits)}`.
Then, in the g-stage translation, a 1GB large page is found, so the
lower 9 * 2 = 18 bits of the resulting ppn are zeros. The final physical
page number is constructed as ppn = `{s2_ppn, s2_gvpn(18 bits)}` =
`{s2_ppn, s1_ppn(9 bits), s1_vpn(9 bits)}`.
In other words, if the g-stage page is larger than the vs-stage page,
the final ppn should be composed of three parts: s2_ppn, s1_ppn, and
s1_vpn.
However, in `handle_block`, the original implementation incorrectly
concatenated the lower 18 bits of the ppn solely from s1_vpn, i.e.,
`{s2_ppn, s1_vpn(18 bits)}`, instead of the correct `{s2_ppn, s1_ppn(9
bits), s1_vpn(9 bits)}`. This commit fixes that bug.
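The composition can be checked with a small Python model. This is a sketch under assumptions: the function `final_ppn`, the level encoding (0 = 4KB, 1 = 2MB, 2 = 1GB), and the sample values are hypothetical and only mirror the bit arithmetic described above.

```python
# Behavioral model of the two-stage PPN composition; names are illustrative.
LEVEL_BITS = 9  # each page-table level covers 9 VPN bits


def low_bits(value: int, n: int) -> int:
    """Return the lowest n bits of value."""
    return value & ((1 << n) - 1)


def final_ppn(s1_ppn: int, s1_vpn: int, s2_ppn: int,
              s1_level: int, s2_level: int) -> int:
    """Compose the final PPN (level: 0 = 4KB, 1 = 2MB, 2 = 1GB)."""
    s1_off = s1_level * LEVEL_BITS
    s2_off = s2_level * LEVEL_BITS
    # gvpn = {s1_ppn, low bits of s1_vpn}
    gvpn = ((s1_ppn >> s1_off) << s1_off) | low_bits(s1_vpn, s1_off)
    # ppn = {s2_ppn, low bits of gvpn}
    return ((s2_ppn >> s2_off) << s2_off) | low_bits(gvpn, s2_off)


if __name__ == "__main__":
    s1_ppn, s1_vpn, s2_ppn = 0x12200, 0xABCDE, 0x5 << 18
    good = final_ppn(s1_ppn, s1_vpn, s2_ppn, s1_level=1, s2_level=2)
    # The buggy form took all 18 low bits from s1_vpn.
    buggy = s2_ppn | low_bits(s1_vpn, 18)
    print(hex(good), hex(buggy))
```

With a 2MB vs-stage page and a 1GB g-stage page, the two results differ in bits 9 to 17, which is exactly the `s1_ppn(9 bits)` field that the old code dropped.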
For implementations that support Smcdeleg, Sscofpmf, and Smaia, the
local counter overflow interrupt (LCOFI) bit (bit 13) in each of CSRs
mvip and mvien is implemented and writable.
For implementations that support Smcdeleg/Ssccfg, Sscofpmf, Smaia/Ssaia,
and the H extension, the LCOFI bit (bit 13) in each of hvip and hvien is
implemented and writable.
* xstateen.IMSIC should control access to sireg/vsireg when
siselect/vsiselect is between 0x70 and 0xff.
* xstateen.AIA should control access to sireg when
siselect is between 0x30 and 0x3f.
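The two rules above amount to a small lookup from the indirect register and select value to the controlling xstateen field. The following Python sketch is only an illustration; the function name `stateen_field` and the string return values are hypothetical, and cases not mentioned above return `None` rather than guessing.

```python
# Behavioral lookup; names and return values are illustrative assumptions.
from typing import Optional


def stateen_field(reg: str, select: int) -> Optional[str]:
    """Return which xstateen field gates the access, per the two rules above.

    reg is "sireg" (indexed by siselect) or "vsireg" (indexed by vsiselect).
    """
    if reg not in ("sireg", "vsireg"):
        raise ValueError(f"unexpected register: {reg}")
    if 0x70 <= select <= 0xFF:
        return "IMSIC"  # applies to both sireg and vsireg
    if reg == "sireg" and 0x30 <= select <= 0x3F:
        return "AIA"    # the rule above names sireg only
    return None         # not covered by the rules listed here


if __name__ == "__main__":
    for reg, sel in [("sireg", 0x30), ("vsireg", 0x70), ("vsireg", 0x30)]:
        print(reg, hex(sel), stateen_field(reg, sel))
```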
Previously, addrvalid was set only at s1.
A misaligned store that enters the misalignbuffer at s1 may generate a
revoke at s2 due to an exception.
If no revoke is generated, addrvalid should not be set, because it will
be set by the subsequent split request.
If a revoke is generated, an exception has occurred, and addrvalid
should be set at s2 to make this uop available for enq.
1. Level: Previously, the code always used the smaller value between
s1.stage and s2.stage, regardless of the virtualization stage. In fact,
only the allStage case should compare both stages; other cases should
determine the level independently based on their respective stage.
2. VPN: Previously, only the allStage case checked the level to decide
whether to concatenate the lower bits of the VPN. However, in reality,
other cases also need to perform VPN concatenation based on the level.
Bug description:
* A younger misaligned store enqueues at `s1` and is revoked at `s2`, but
`req_valid` is not set to false. As a result, the entry is not freed, and
a later, older misaligned store cannot enter the buffer and cannot
complete, resulting in a stuck state.
How to fix:
* When a misaligned store enqueues at `s1` but is revoked at `s2`,
`req_valid` is set to `false` at `s2`.
* mmio or nc should report `Hardware Error` when the response returns `nderr`
* loadunit should report `Hardware Error` when it should be `delay kill`
from fast replay
This allows the Git commit SHA and dirty status to be known directly
from the simulation output.
It helps in scenarios such as responding to issues in the XiangShan
repository, where the version used can be identified directly from the
provided log file.
## Bug Description
A represents a normal store, and B represents an NC (Non-Cacheable)
store. In the SQ (Store Queue), there exists a sequence like A1 B1 B2 B3
B4 .... A1 enters the `dataBuffer`, and `rdataPtr` moves to B1. Due to a
blockage in the `Sbuffer`, A1 is unable to be issued, and `deqPtr`
remains at A1.
At `rdataPtr`, B1 is found and successfully sent to the `UncacheBuffer`,
causing `rdataPtr` to move forward. Since the NC handshake succeeds,
`deqPtr` also moves forward (from A1 to B1), even though A1 itself has
not completed.
Subsequently, as NC(Bx) handshakes continue to succeed, `deqPtr` keeps
advancing. Meanwhile, `enqPtr` continues to insert new entries,
eventually overwriting the original A1 entry.
When the `Sbuffer` finally becomes available, the design attempts to
clear (set `false.B`) the allocated status of the dequeueing entry of
`dataBuffer` based on its original `sqIdx`. However, at this point, the
A1 slot already holds a new entry, leading to incorrect deallocation.
Later, when `rdataPtr` cycles back to A1’s position, it finds
`allocated(A1)` is false, resulting in no action, causing all pointers
to stall and ultimately deadlock.
## Bug Analysis
The deadlock occurs because `deqPtr` is not updated in strict order.
* Before introducing NC support, the design guaranteed ordering due to
the sequential nature of `Sbuffer` (for regular stores) and MMIO
(head-of-queue processing).
* After introducing NC, when running mixed tests with both NC and normal
stores, the issue arises: there is a cycle gap between when entries
complete and when `deqPtr` updates, allowing later entries to complete
while earlier ones remain unfinished.
## Solution
The current solution introduces a new `completed` signal. Instead of
directly advancing `deqPtr` based on previous conditions, the
corresponding entry’s `completed` flag is first set to true. Then,
`deqPtrNext` is updated according to whether the entry pointed to by
`deqPtr` is completed.
This approach preserves the characteristic of updating at most two
entries per cycle, as in the original PR, and leverages the two-cycle
write delay of `Sbuffer` along with registered `completed` signals,
ensuring that `deqPtr` still raises in the last two cycles for normal
stores, thus maintaining their performance.
However, for MMIO and NC stores, `deqPtr` will now be delayed by one
cycle.
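The in-order dequeue idea can be sketched as a small Python model. This is a behavioral illustration only: `advance_deq_ptr` and the list-based `completed` flags are hypothetical stand-ins for the registered signals, and the limit of two entries per cycle is the only parameter taken from the description.

```python
# Behavioral model of "dequeue only when the head entry is completed".
from typing import List


def advance_deq_ptr(deq_ptr: int, completed: List[bool],
                    max_per_cycle: int = 2) -> int:
    """Return the next deqPtr value for one cycle.

    The pointer only moves past entries whose completed flag is set,
    starting at the head, so a later entry finishing early cannot
    drag the pointer past an unfinished earlier entry.
    """
    size = len(completed)
    advanced = 0
    while advanced < max_per_cycle and completed[deq_ptr]:
        completed[deq_ptr] = False  # entry is released
        deq_ptr = (deq_ptr + 1) % size
        advanced += 1
    return deq_ptr


if __name__ == "__main__":
    # Slots: A1, B1, B2, free. B1 (NC) finishes before A1 (normal store).
    completed = [False, True, False, False]
    ptr = advance_deq_ptr(0, completed)
    print(ptr)  # still at A1
    completed[0] = True  # A1 finally leaves through the Sbuffer
    ptr = advance_deq_ptr(ptr, completed)
    print(ptr)  # moved past A1 and B1
```

In this model the A1 slot cannot be reallocated while A1 is still pending, which is the property whose absence caused the deadlock described above.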
**Please wait for the results of the verification feedback.**
---
Fixed a problem caused by manually expanding (unrolling) assignments
inside a for loop.
The expected condition for `ncStall` is:
`val ncStall = if(i == 0) nc(rdataPtrExt(0).value) else
(nc(rdataPtrExt(i).value) || nc(rdataPtrExt(i-1).value))`.
The assignment in the loop expands to:
```
dataBuffer.io.enq(0).valid := canDeqMisaligned && allocated(rdataPtrExt(0).value) && committed(rdataPtrExt(0).value) &&
((!isVec(rdataPtrExt(0).value) && allvalid(rdataPtrExt(0).value) || vecMbCommit(rdataPtrExt(0).value)) &&
(!isCross4KPage || isCross4KPageCanDeq) || hasException(rdataPtrExt(0).value)) && !ncStall
dataBuffer.io.enq(1).valid := canDeqMisaligned && allocated(rdataPtrExt(0).value) && committed(rdataPtrExt(0).value) &&
(!isVec(rdataPtrExt(0).value) && allvalid(rdataPtrExt(0).value) || vecMbCommit(rdataPtrExt(0).value)) &&
(!isCross4KPage || isCross4KPageCanDeq) && !hasException(rdataPtrExt(0).value) && !ncStall
```
This causes `ncStall` to always use the result of the last loop iteration,
`(nc(rdataPtrExt(i).value) || nc(rdataPtrExt(i-1).value))`,
which can lead to stalls. In addition, `!ncStall` is no longer needed in
the `enq.valid` condition for misaligned accesses,
because misaligned accesses to
`nc space` are supposed to throw an exception.
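To make the intended per-index condition concrete, here is a minimal Python model of it. The function `nc_stall` and the list arguments are hypothetical; they mirror the `if (i == 0) ... else ...` expression quoted above, not the actual StoreQueue code.

```python
# Behavioral model of the expected per-port ncStall condition.
from typing import List


def nc_stall(i: int, nc: List[bool], rdata_ptrs: List[int]) -> bool:
    """Stall condition for enqueue port i.

    Port 0 looks only at its own entry; port i > 0 also looks at the
    entry of the previous port.
    """
    if i == 0:
        return nc[rdata_ptrs[0]]
    return nc[rdata_ptrs[i]] or nc[rdata_ptrs[i - 1]]


if __name__ == "__main__":
    nc = [False, True, False, False]  # only slot 1 holds an NC store
    rdata_ptrs = [0, 1]               # ports 0 and 1 read slots 0 and 1
    print(nc_stall(0, nc, rdata_ptrs))  # port 0 sees a normal store
    print(nc_stall(1, nc, rdata_ptrs))  # port 1 sees the NC store
```

If port 0 were to reuse the value computed for port 1, it would stall even though its own entry is a normal store, which matches the behavior described above.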
---
Some `enq.valid` conditions for `DataBuffer` have been lightly
reorganized.
By default, XiangShan uses a fixed 48-bit physical address width, which
is not configurable. However, some SoCs require support for different
address widths (e.g., CHI buses support 44-52-bit addressing). To
accommodate these SoC needs, this PR introduces a parameterized physical
address width configured via `CHI_ADDR_WIDTH`. Key notes:
1. `CHI_ADDR_WIDTH` only modifies the address width for interactions
between CoupledL2 and the CHI bus. Addresses within CoupledL2 and XSCore
remain 48-bit, incurring some area overhead but functionally correct.
2. If `CHI_ADDR_WIDTH` < 48, CoupledL2 truncates the upper bits of
addresses. For snoops, the truncated bits are treated as zero. It is
therefore critical to configure the PMA at compile time so that XiangShan
never generates an address beyond the `CHI_ADDR_WIDTH`-defined address
space.
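The truncation behavior in note 2 can be illustrated with a short Python sketch. The helper names `to_chi_addr` and `region_fits` are hypothetical, and passing addresses through unchanged when `CHI_ADDR_WIDTH` is 48 or more is an assumption, since only the narrower case is described above.

```python
# Behavioral model of address truncation at the CoupledL2/CHI boundary.
CORE_ADDR_WIDTH = 48  # width used inside XSCore and CoupledL2


def to_chi_addr(addr: int, chi_addr_width: int) -> int:
    """Drop the upper bits when the CHI side is narrower than 48 bits."""
    if chi_addr_width >= CORE_ADDR_WIDTH:
        return addr  # assumption: no truncation needed in this case
    return addr & ((1 << chi_addr_width) - 1)


def region_fits(base: int, size: int, chi_addr_width: int) -> bool:
    """Check that [base, base + size) survives truncation unchanged."""
    limit = 1 << min(chi_addr_width, CORE_ADDR_WIDTH)
    return base + size <= limit


if __name__ == "__main__":
    print(hex(to_chi_addr(0x8000_1234_5678, 44)))  # bit 47 is lost
    print(region_fits(0x8000_0000, 0x8000_0000, 44))
    print(region_fits(1 << 44, 0x1000, 44))
```

A check like `region_fits` over every PMA region is one way to catch a configuration in which two distinct core addresses would alias to the same CHI address.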
## Bug Discovery
The Svpbmt CI of master at
https://github.com/OpenXiangShan/XiangShan/actions/runs/14639358525/job/41077890352
reported the following implicit output error:
```
check_misa_h PASSED
test_pbmt_perf
TEST: read 4 Bytes 1000 times
Svpbmt IO test...
addr:0x10006d000
start: 8589, end: 59845, ticks: 51256
Svpbmt NC test...
addr:0x10006c000
start: 67656, end: 106762, ticks: 39106
Svpbmt NC OUTSTANDING test...
smblockctl = 0x3f7
addr:0x10006c000
start: 118198, end: 134513, ticks: 16315
Svpbmt PMA test...
addr:0x100000000
start: 142696, end: 144084, ticks: 1388
PASSED
test_pbmt_ldld_violate ERROR: untested exception! cause NO: 5
(mhandler, 219)
[FORK_INFO pid(1251274)] clear processes...
Core 0: HIT GOOD TRAP at pc = 0x80005d64
Core-0 instrCnt = 174,141, cycleCnt = 240,713, IPC = 0.723438
```
## Design Background
For NC (Non-Cacheable) store operations, the handshake logic between the
StoreQueue and Uncache is as follows:
1. **Without Outstanding Enabled:**
In the `nc_idle` state, when an executable `nc store` is encountered, it
transitions to the `nc_req` state. After `req.fire`, it moves to the
`nc_resp` state. Once `resp.fire` is triggered, it returns to `nc_idle`,
and both `rdataPtrExtNext` and `deqPtrExtNext` are updated to handle the
next request.
2. **With Outstanding Enabled:**
In the `nc_idle` state, upon encountering an executable `nc store`, it
transitions to the `nc_req` state. After `req.fire`, it **returns to
`nc_idle`** (Point A). Once the request is fully written into Uncache,
i.e., upon receiving `ncSlaveAck` (Point B), it updates
`rdataPtrExtNext` and `deqPtrExtNext` to handle the next request.
## Bug Description
In the above scenario, since the transition to `nc_idle` at Point A
occurs earlier (by two cycles) than Point B due to timing differences,
the `rdataPtr` at Point A still points to the location of the previous
uncache request (let’s call it NC1). The condition for sending an uncache
request is still met at this moment, so a **duplicate `uncache` request**
for NC1 is issued at Point A.
By the time Point B occurs, **two identical requests for NC1** have
already been sent. At Point B, `rdataPtr` is updated to proceed to the
next request. However, when the **second `ncSlaveAck`** for NC1 returns,
`rdataPtr` is updated **again**, causing it to move forward **twice**
for a single request. This eventually results in one of the following
requests never being executed.
## Bug Fix
Given that multiple cycles are required to ensure that a request is
fully written to Uncache, a new state called `nc_req_ack` is introduced.
The revised handshake logic with outstanding enabled is as follows:
In the `nc_idle` state, when an executable `nc store` is encountered, it
transitions to the `nc_req` state. After `req.fire`, it moves to the
`nc_req_ack` state. Once the request is fully written to Uncache and
`ncSlaveAck` is received, it transitions back to `nc_idle`, and updates
`rdataPtrExtNext` and `deqPtrExtNext` to handle the next request.
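The revised handshake can be modeled as a three-state machine. The Python sketch below is a behavioral illustration, not the Chisel implementation: `NcState`, `step`, and `simulate` are hypothetical names, and the fixed two-cycle `ack_delay` is an assumption taken from the "two cycles" mentioned in the bug description.

```python
# Behavioral model of the outstanding-enabled NC store handshake.
from enum import Enum


class NcState(Enum):
    IDLE = "nc_idle"
    REQ = "nc_req"
    REQ_ACK = "nc_req_ack"


def step(state, can_send, req_fire, nc_slave_ack):
    """Return (next_state, advance_pointers) for one cycle."""
    if state is NcState.IDLE:
        return (NcState.REQ if can_send else NcState.IDLE), False
    if state is NcState.REQ:
        return (NcState.REQ_ACK if req_fire else NcState.REQ), False
    # REQ_ACK: wait here until Uncache acknowledges the write.
    if nc_slave_ack:
        return NcState.IDLE, True
    return NcState.REQ_ACK, False


def simulate(ack_delay=2, cycles=12):
    """Run one pending NC store; return (requests_sent, pointer_updates)."""
    state = NcState.IDLE
    requests_sent = 0
    pointer_updates = 0
    ack_timer = None  # cycles left until ncSlaveAck; None when idle
    for _ in range(cycles):
        ack = ack_timer == 0
        can_send = pointer_updates == 0  # the single store is still pending
        req_fire = state is NcState.REQ  # assume the request is always accepted
        if req_fire:
            requests_sent += 1
            ack_timer = ack_delay
        elif ack_timer is not None and ack_timer > 0:
            ack_timer -= 1
        state, advance = step(state, can_send, req_fire, ack)
        if advance:
            pointer_updates += 1
            ack_timer = None
    return requests_sent, pointer_updates


if __name__ == "__main__":
    print(simulate())
```

Because the machine stays in `nc_req_ack` until the acknowledgement arrives, it cannot re-enter `nc_req` for the same entry, so the store is sent once and the pointers advance once.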
Bump CoupledL2. This PR includes:
1. Set data SRAM's dataSplit = 8
* Set data SRAM (`dataArray` in `DataStorage`) dataSplit = 8.
Previously dataSplit = 4 and encDataBankBits = 137, but
due to area demands, the `dataArray` SRAM bankBits should
be 69. Therefore, after ECC encoding, the data needs a further
split by 2, with 4 bits of zero padding added per cache line.
* Avoid tag split when the tag SRAM's `dataSplit` requirement cannot
be met. This occurs when the L2 size, `dataSplit`, or address
width changes.
* Parameterize the split of tag and data.
2. Remove unused register in the WriteEvictOrEvict logic
3. Remove deprecated cache step
4. Support parameterized address width via cde
* AIA Spec:
* Ties in nominal priority are broken as usual by the default priority
* order from Table 8, unless hvictl fields VTI = 1 and IID ≠ 9
* (last item in the candidate list above), in which case
* default priority order is determined solely by hvictl.DPR.
* If bit IPRIOM (IPRIO Mode) of hvictl is zero, IPRIO in vstopi is 1;
* else, if the priority number for the highest-priority candidate
* is within the range 1 to 255, IPRIO is that value; else, IPRIO
* is set to either 0 or 255 in the manner documented for stopi
* in Section 5.4.2.
* If all bytes of the supervisor-level iprio array are read-only zeros,
* a simplified implementation of field IPRIO is allowed in which
* its value is always 1 whenever stopi is not zero.
*
* We are configurable and do not need to simplify the implementation.
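The IPRIO selection quoted above can be sketched as a small Python function. This is a hedged reading, not the CSR implementation: `vstopi_iprio` and its parameters are hypothetical, and the handling of out-of-range priority numbers (0 or 255) reflects my understanding of the stopi rule referenced as Section 5.4.2, which should be checked against the AIA specification.

```python
# Behavioral model of the vstopi.IPRIO field; names are illustrative.
def vstopi_iprio(iprio_mode: int, priority: int,
                 higher_than_external: bool) -> int:
    """Compute IPRIO for the highest-priority candidate.

    iprio_mode: hvictl.IPRIOM.
    priority: priority number of the highest-priority candidate.
    higher_than_external: assumption used only for priority number 0,
        meaning the interrupt's default priority is above that of the
        external interrupt (my reading of the stopi rule).
    """
    if iprio_mode == 0:
        return 1
    if 1 <= priority <= 255:
        return priority
    if priority > 255:
        return 255  # lowest expressible priority
    return 0 if higher_than_external else 255


if __name__ == "__main__":
    print(vstopi_iprio(0, 42, False))   # IPRIOM = 0
    print(vstopi_iprio(1, 42, False))   # in range, reported as-is
    print(vstopi_iprio(1, 300, False))  # out of range, saturates
```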
For "unit-stride access with element granularity misaligned and emul<0",
it could be the case that
there is only one valid element, but it splits into two flows (misaligned),
which results in the same `elemidx` for both flows, making it impossible
for the exception handling logic in the `mergebuffer` to recognise the
correct order.
Instead of adding a new variable, we have chosen to reuse `elemidx` as a
marker. Note that this does pollute the original semantics of `elemidx`.
The miselect register implements at least enough bits to support all
implemented miselect values.
The siselect register will support the value range 0..0xFFF at a
minimum.
The vsiselect register will support the value range 0..0xFFF at a
minimum.