Commit Graph

422 Commits

Author SHA1 Message Date
Anzo b4473bd3f0
fix(LSU): misalign exceptions are generated directly within the pipeline (#4757)
Previously, when accessing mmio space after splitting, we generated an
exception in misalignbuffer, which prevented the exception address from
being written to exceptionbuffer.
Therefore, we now generate the exception in the pipeline so that
the exception address can be written correctly.
2025-06-03 11:23:22 +08:00
Anzo 8769efd7b4
fix(LoadUnit): no longer allow tlb missing misaligned load to enter misalignbuffer (#4760)
Misaligned loads that cause a TLB miss are no longer allowed to enter the
loadmisalignbuffer.
Otherwise, subsequent exception addresses would be incorrect:

The original address of the misaligned load is 0x07, while the first
request address after splitting is 0x00.
Thus, when a page fault exception occurs, 0x00 would be considered the
exception address instead of the original 0x07.
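
The address mix-up can be illustrated with a minimal Python sketch. The 8-byte split granule, the 4-byte access size, and the helper name `split_misaligned` are assumptions made for illustration; they are not taken from the LoadUnit source.

```python
def split_misaligned(addr, size, granule=8):
    """Return base addresses of the aligned requests covering [addr, addr + size).

    Assumes size <= granule, so an access touches at most two granules.
    """
    first = addr & ~(granule - 1)
    last = (addr + size - 1) & ~(granule - 1)
    return [first] if first == last else [first, last]


orig_addr = 0x07
requests = split_misaligned(orig_addr, size=4)

# The first split request starts at 0x00. If a page fault on it were
# reported with the request address, the original 0x07 would be lost.
print([hex(r) for r in requests])  # ['0x0', '0x8']
```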

---

Stores have always been handled this way and do not require modification.
2025-05-31 16:38:27 +08:00
Anzo 14608211a7
fix(LoadUnit): misaligned exception addr should use split addr (#4751) 2025-05-30 15:27:13 +08:00
Yanqin Li 076c8dd2db
fix(NCMM): nc access in main memory should not skip difftest (#4704)
**Bug Trigger:** In a self-modifying program, the program modifies its
own instructions in a region where PBMT=NC and PMA=MM. If difftest is
skipped in this case, NEMU will not execute the corresponding memory
access instruction. This causes NEMU and DUT to execute different
instructions later on, ultimately leading to an error.

**Solution:** For regions where PBMT=NC and PMA=MM, difftest should not
be skipped, since PMA=MM indicates that NEMU can perform normal
synchronization. However, for regions with PMA=IO, difftest should still
be skipped because NEMU might not be able to access the corresponding
devices. Instruction self-modification in PMA=IO regions is generally
not a concern, as such regions are typically non-writable. Therefore,
synchronization of self-modifying IO instructions is not handled here
(as doing so would be overly complex).
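
The rule described above can be summarised as a small Python predicate. The function and parameter names are hypothetical; this is only a sketch of the decision, not the difftest interface.

```python
def skip_difftest(pbmt_nc, pma_is_io):
    """Decide whether difftest is skipped for a memory access.

    `pbmt_nc` is deliberately ignored after the fix: only the PMA
    attribute decides.
    """
    if pma_is_io:
        return True   # NEMU might not be able to access the device
    # PMA=MM: NEMU can synchronise normally, even when PBMT=NC
    return False


for pbmt_nc in (False, True):
    for pma_is_io in (False, True):
        print(pbmt_nc, pma_is_io, skip_difftest(pbmt_nc, pma_is_io))
```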
2025-05-20 16:02:04 +08:00
Anzo 9c021a8c49
fix(LoadUnit): preventing raw jams caused by misalignment (#4674) 2025-05-10 02:18:50 +08:00
Anzo d76526f545
fix(LoadUnit): misaligned exception addr should use split addr (#4673) 2025-05-10 02:18:26 +08:00
Anzo 6a3636fd23
fix(LoadUnit): prefetch no longer generates nc access (#4636)
We no longer allow software prefetch requests to generate nc signals,
thus preventing weirdness in LoadUnit.
2025-04-29 16:47:00 +08:00
cz4e bd1e467399
fix(LoadUnit, LSQ): fix report exception type for hardware error (#4619)
* mmio or nc accesses should report `Hardware Error` when the response carries `nderr`
* loadunit should report `Hardware Error` when it should be `delay kill`
from fast replay
2025-04-29 16:34:46 +08:00
Huijin Li efee2982bb
fix(LoadUnit): fix ldld && stld query revoke logic (#4580)
The prior design reassigns `io.lsq.ldin.bits.rep_info.need_rep` to 0
when source comes from MisalignBuffer, preventing cancellation of
rar/raw enqueue requests during misaligned instruction reissuance.

Thus, we must use `io.misalign_ldout.bits.rep_info.need_rep` to
determine whether to revoke rar/raw enqueue requests when source is from
MisalignBuffer.
2025-04-18 12:32:07 +08:00
Anzo 35bb77967d
fix(LSU): fix exception for misalign access to `nc` space (#4526)
For misaligned accesses, if the access after the split goes to `nc`
space, a misaligned exception should also be generated.

Co-authored-by: Yanqin Li <maxpicca@qq.com>
2025-04-14 07:24:32 +08:00
cz4e 4ec1f46275
timing(StoreMisalignBuffer): fix misalign buffer enq timing (#4493)
* a misaligned store enqueues into the misalign buffer at s1 and is revoked
at s2 if necessary
2025-04-09 17:53:23 +08:00
Yan Xu 1592abd11e
feat: support inst lifetime trace (#4007)
PerfCCT (performance counter commit trace) is an instruction-level
performance counter, like the one in GEM5.
How to use this:
1. Make with the "WITH_CHISELDB=1" argument
2. Run with "--dump-db --dump-select-db lifetime" to get the database
3. To visualize instruction lifetimes, run "python3 scripts/perfcct.py
"the-db-file-path" -p 1 -v | less"
4. The analysis script is currently in the XS-GEM5 repo, see
https://github.com/OpenXiangShan/GEM5/blob/xs-dev/util/ClockAnalysis.py

How it works:
1. Allocate one unique tag "seqNum", like GEM5, for each instruction at
the fetch stage
2. Pass the "seqNum" through each pipeline stage
3. Record perf data through the DPIC interface
2025-04-08 11:21:04 +08:00
Anzo 83e1708387
fix(LoadUnit): not enter misalignbuffer on exception (#4477) 2025-04-01 14:12:52 +08:00
Yanqin Li 0b8a9d16b9
fix(LDU): only selected can be used in address mux (#4466) 2025-03-28 11:37:54 +08:00
Anzo dac94c4957
fix(LoadUnit): uncache should not be generated when page fault (#4442)
As the comment says, even if a `PF` is generated, an address is still
generated for `PMP/PMA` checking, which can lead to some strange
responses.
Since the previous modification
(https://github.com/OpenXiangShan/XiangShan/pull/4426) removed
`s2_exception`, `s2_uncache` was generated incorrectly.

This is now represented using clearer semantics:
`s2_actually_uncache`: the real physical address belongs to uncache space.
`s2_uncache` has been retained to distinguish requests that come from
prefetching, which may be handed over to **YQ** in a subsequent change.

I synchronised the changes to StoreUnit in this PR
(https://github.com/OpenXiangShan/XiangShan/pull/4441).
2025-03-20 19:39:14 +08:00
Anzo bbed9f8de9
fix(LoadUnit): fix misalign exception and clearer uncache semantics (#4426)
The loadAddrMisaligned exception is generated when a misaligned access
targets uncache space.

---

A misaligned load sets the loadAddrMisaligned exception flag at s0 to
ensure that it only enters the loadmisalignbuffer and has no other side
effects.
However, this prevents s2_uncache from being generated properly.
Previously we used an additional `s2_un_misalign_exception` to flag
this.
Now, after examining the semantics of s2_uncache, they can be
represented appropriately by directly removing the
exception-related signals.
2025-03-17 14:00:10 +08:00
Anzo 522c7f99f1
fix(LSU): misaligned violation detection stuck (#4369)
Since a load instruction that crosses a 16-byte boundary needs to be split
and accessed twice, it needs to enter the `RAR Queue` twice, but occupies
only one `virtual load queue` entry, so in the extreme case 36 load
instructions that each cross a 16-byte boundary can fill all 72 `RAR Queue` entries.
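
The arithmetic behind this worst case, as a minimal Python sketch. The queue size of 72 comes from the text above; the helper name is hypothetical.

```python
RAR_QUEUE_SIZE = 72         # from the commit message
ENTRIES_PER_SPLIT_LOAD = 2  # a load crossing 16 bytes is accessed twice


def crosses_16_bytes(addr, size):
    """True when [addr, addr + size) spans two 16-byte blocks."""
    return addr // 16 != (addr + size - 1) // 16


loads_to_fill = RAR_QUEUE_SIZE // ENTRIES_PER_SPLIT_LOAD
print(crosses_16_bytes(0x0E, 4))  # True
print(loads_to_fill)              # 36
```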

---

Our previous handling had a problem: if the oldest load
instruction crossing a 16-byte boundary enters the `replayqueue` while
the `loadmisalignbuffer` holds an instruction that can't
finish executing because the `RAR Queue` is full, then the oldest load
instruction can never be issued, because the `loadmisalignbuffer`
is occupied all the time.

---

Therefore, we use a more aggressive scheme.
When the `RAR Queue` is full, we let the misaligned load generate a rollback,
and the next load instruction that the loadmisalignbuffer accepts
must be the oldest one (if it is misaligned).
2025-03-07 11:50:50 +08:00
cz4e 90f8d3cfc2
fix(LoadUnit): exclude prefetch requests (#4367)
* To meet timing, the RAR enqueue conditions need to be relaxed; the
worst timing paths come from `pmp` and `missQueue`.

* If `LoadQueueRARSize` == `VirtualLoadQueueSize`, only prefetches need
to be excluded.

* If `LoadQueueRARSize` < `VirtualLoadQueueSize`, `s2_can_query` also
needs to be considered.
2025-03-06 19:02:30 +08:00
Anzo 25381b72d6
fix(LoadUnit): misalign wakeup should not set s0 valid (#4359)
`s0_src_valid_vec` is not `s0_src_select_vec`: a bit of
`s0_src_valid_vec` is set whenever the corresponding input is `valid`. Therefore,
`misalign wakeup` needs to gate `s0_valid` globally.
2025-03-05 14:40:25 +08:00
Anzo 7ea48366e4
fix(LoadUnit): misalign load wakeup not enter loadunit (#4333) 2025-03-03 15:22:24 +08:00
cz4e 0d55e1db4c
timing(LoadQueueRAR, LoadUnit): adjust rar/raw query logic (#4297)
* Because `LoadQueueRARSize == VirtualLoadQueueSize`, no additional
logic is needed for rar enq
* When no fast replay is needed, the loadunit allocates a raw entry
2025-02-28 11:09:04 +08:00
Yanqin Li 66e9b546ea
fix(LDU): nc is also not mis-aligned (#4326) 2025-02-27 23:10:07 +08:00
cz4e 99ce5576f0
style(Bundles): rewrite bundles with new style (#4274) 2025-02-20 10:52:42 +08:00
Yanqin Li 48f7f553b3
fix(LDU): only tlb hit can use tlb resp (#4293) 2025-02-20 10:36:04 +08:00
Anzo 5a36f63d70
fix(LoadUnit): corrupt should be triggered on valid mshr (#4292) 2025-02-20 10:35:12 +08:00
Yanqin Li 638f3d8429
fix(uncache): uncache load fails to replay (#4275)
Fixed the situation where the nc_with_data was not replayed correctly.
2025-02-17 11:31:36 +08:00
cz4e ccde5272a6
fix(LoadUnit): fix misalign load wrong wakeup (#4263)
When `io.dcache.req.ready` is false, the misaligned load stalls, but
`wakeup` still fires normally and is not canceled in `s3`, which
causes the backend to get wrong data.
2025-02-16 17:38:10 +08:00
cz4e 9e12e8edb2
style(Bundles): move bundles to Bundles.scala (#4247) 2025-02-09 01:03:37 +08:00
Anzo faeef3281c
fix(LoadUnit): `dcache_kill` if `prf_wr` has no permissions (#4226)
`prefetch.w` sends a write request to `TLB/PMA/PMP`.
As a result, `PMA/PMP` returns a permission check (`io.pmp.st`) for the
write request.

---

Previously, we only handled the case where `prefetch.r` did not have
read permissions, but not the case where `prefetch.w` did not have
write permissions.
**So, when `prefetch.w` has an address without write permissions, the
request will still be sent to `Dcache`, which generates an error.**

**This pr fixes that, when `PMA/PMP` returns `io.pmp.st`, we generate
`dcache.s2_kill`.**
2025-01-27 21:48:54 +08:00
Yanqin Li 74050fc0c8
perf(Uncache): add merge policy when entering (#4154)
# Background

## Problem

How to design a more efficient entry rule for a new load/store request
when a load/store with the same address already exists in the `ubuffer`?

* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.

## Merge Scenarios

‼️ For the new request to be merged into the existing one, both must be
`NC`.

1. **New Store Request:**
   1. **Existing Store:** Merge (the new store is younger).
   2. **Existing Load:** Reject.

2. **New Load Request:**
   1. **Existing Load:** Merge (the new load may be younger or older; either
      is fine to merge).
   2. **Existing Store:** Reject.

# What does this PR do?

## 1. Entry Actions

1. **Allocate** a new entry and mark as `valid`
   1. When there is no matching address.
2. **Allocate** a new entry and mark as `valid` and `waitSame`:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is either selected to issue or issued.
3. **Merge** into an Existing Entry:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is **not** selected to issue or issued.
4. **Reject** the New Request:
   1. When the ubuffer is full.
   2. When there is a matching address, but:
      * The virtual addresses or attributes are **different**.
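
The four entry actions can be read as a decision function. The following Python sketch uses hypothetical names, and it checks the "ubuffer is full" case first, following the list literally; the source does not say whether a merge is still possible when the buffer is full, so that ordering is an assumption.

```python
from enum import Enum


class Action(Enum):
    ALLOCATE = "allocate (valid)"
    ALLOCATE_WAIT_SAME = "allocate (valid, waitSame)"
    MERGE = "merge"
    REJECT = "reject"


def entry_action(full, addr_match, same_vaddr_and_attrs, older_issuing_or_issued):
    """Pick the action for a new request entering the ubuffer."""
    if full:
        return Action.REJECT
    if not addr_match:
        return Action.ALLOCATE
    if not same_vaddr_and_attrs:
        return Action.REJECT
    if older_issuing_or_issued:
        return Action.ALLOCATE_WAIT_SAME
    return Action.MERGE


print(entry_action(False, True, True, False).value)  # merge
```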

**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must
be continuous and naturally aligned, and the `addr` must correspond to
the mask. Therefore, the "**same attributes**" here introduces a new
condition: the merged `mask` must meet the requirements of being
continuous and naturally aligned (function `continueAndAlign`). During
merging, the block offset of addr must be synchronously updated in
`UncacheEntry.update`.
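
As an illustration of the "continuous and naturally aligned" requirement, here is a minimal Python check on a byte mask. It is a sketch written from the description above, not a port of `continueAndAlign`; treating an empty mask as invalid is an assumption.

```python
def continue_and_align(mask):
    """True if the set bits form one contiguous, naturally aligned run.

    The run length must be a power of two and the run must start at a
    multiple of its own length.
    """
    if mask == 0:
        return False
    low = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    run = mask >> low
    length = run.bit_length()
    contiguous = run == (1 << length) - 1
    power_of_two = (length & (length - 1)) == 0
    aligned = low % length == 0
    return contiguous and power_of_two and aligned


# Merging two entries ORs their masks; the result must still be legal.
print(continue_and_align(0b0011 | 0b1100))  # True
print(continue_and_align(0b0001 | 0b0100))  # False
```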

## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache
(S)`

> `mid`: master id
>
> `sid`: slave id

**Old Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req`, records the **`mid`**.
- `S` sends a `resp` with the  **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.

**New Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}` .
- `M` matches it with the **`mid`** and updates its record with the
received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.

**Benefit:** The new design allows `S` to merge requests when a new
request enters.
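
A toy Python model of the new handshake may make the benefit concrete. The class and method names are hypothetical, and merging purely by address is a simplification of the real merge conditions.

```python
class Uncache:
    """Slave side (S): owns the entries and hands out `sid`s."""

    def __init__(self):
        self.sid_by_addr = {}

    def accept(self, mid, addr):
        # A request whose address already has an entry is merged into it,
        # so it receives the same sid.
        sid = self.sid_by_addr.setdefault(addr, len(self.sid_by_addr))
        return mid, sid


class LoadQueueUncache:
    """Master side (M): remembers which `sid` each `mid` waits for."""

    def __init__(self):
        self.sid_by_mid = {}

    def on_accept(self, mid, sid):
        self.sid_by_mid[mid] = sid

    def on_resp(self, sid):
        done = [m for m, s in self.sid_by_mid.items() if s == sid]
        for m in done:
            del self.sid_by_mid[m]
        return done


m, s = LoadQueueUncache(), Uncache()
for mid, addr in [(0, 0x1000), (1, 0x1000), (2, 0x2000)]:
    m.on_accept(*s.accept(mid, addr))

print(m.on_resp(0))  # [0, 1]
print(m.on_resp(1))  # [2]
```

Because responses are matched by `sid`, one response from `S` can complete every `mid` that was merged into the same entry.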

## 3. Forwarding Mechanism

**Old Design:** Each address in the `ubuffer` is **unique**, so
forwarding is straightforward based on a match.

**New Design:** 

* A single address may have up to two entries matched in the `ubuffer`.
* If it has two matched entries, it must be true that one entry is marked
`inflight` and the other entry is marked `waitSame`. In this case, the
forwarded data comes from the merged data of two entries, with the
`inflight` entry being the older one.
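
A minimal Python sketch of forwarding from two matched entries. Letting the bytes of the younger (`waitSame`) entry override those of the older (`inflight`) entry is an assumption of this sketch, as are the names and the 8-byte width.

```python
def forward(older, younger, nbytes=8):
    """Merge two (mask, data) entries into the forwarded (mask, data)."""
    o_mask, o_data = older
    y_mask, y_data = younger
    merged = bytearray(nbytes)
    for i in range(nbytes):
        if (y_mask >> i) & 1:
            merged[i] = y_data[i]   # younger byte wins
        elif (o_mask >> i) & 1:
            merged[i] = o_data[i]
    return o_mask | y_mask, bytes(merged)


inflight = (0b00001111, bytes([0x11, 0x22, 0x33, 0x44, 0, 0, 0, 0]))
wait_same = (0b00001100, bytes([0, 0, 0xAA, 0xBB, 0, 0, 0, 0]))
mask, data = forward(inflight, wait_same)
print(bin(mask), data.hex())  # 0b1111 1122aabb00000000
```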

## 4. Bug Fixes

1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`,
because when `tlbValid` is false, `!tlbMiss` can still be true.
2. `Uncache` state machine transition: The state indicating "**able to
send requests**" (previously `s_refill_req`, now `s_inflight`) should
not be triggered by `reqFire` but rather by `acquireFire`.

<img width="747" alt="image"
src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef"
/>

# Evaluation

-  timing
-  performance

| Type           | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO             | 51026   | 1           | 208149  | 1.00        |
| NC             | 42343   | 1.21        | 169248  | 1.23        |
| NC+OT          | 20379   | 2.50        | 160101  | 1.30        |
| NC+OT+mergeOpt | 16308   | 3.13        | 126369  | 1.65        |
| cache          | 1298    | 39.31       | 4410    | 47.20       |
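
The speedup columns can be reproduced from the raw columns, assuming the raw numbers are cycle counts (lower is better) and each speedup is the IO row divided by the given row:

```python
# Raw columns from the table: (4B*1000, 1B*4096)
cycles = {
    "IO": (51026, 208149),
    "NC": (42343, 169248),
    "NC+OT": (20379, 160101),
    "NC+OT+mergeOpt": (16308, 126369),
    "cache": (1298, 4410),
}

base = cycles["IO"]
speedups = {
    name: (round(base[0] / a, 2), round(base[1] / b, 2))
    for name, (a, b) in cycles.items()
}

for name, (s1, s2) in speedups.items():
    print(f"{name:15s} {s1:6.2f} {s2:6.2f}")
```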
2025-01-26 18:04:11 +08:00
cz4e fa5e530d3c
timing(VSegmentUnit): duplicate latchVAddr (#4209)
* `latchVAddr` needs to index all dcache data sram from top to bottom,
which causes a large fanout, so duplicate `latchVaddr`
2025-01-21 18:46:55 +08:00
cz4e 0b4afd3490
timing(LoadUnit): optimization load unit writeback data generate logic (#4167)
Optimize the load unit's writeback data generation logic:
* merge multi-source data at `s2`; select and expand data at `s3`
* select data with a one-hot mux instead of a shifter
cz4e 37f33e11bc
timing(LoadUnit): fpWen and pdest reg out (#4144)
On loadunit writeback:
* **fpWen** is driven directly from a register
* **pdest** is driven directly from a register
2025-01-13 14:30:06 +08:00
Anzo 1021e139ea
fix(LoadUnit): `fast replay` no longer requests to `RAR/RAW Queue` (#4149)
For `fast replay`, there is no need to request access to the `RAR/RAW
Queue`.
This prevents the `RAW Queue` from constantly ping-ponging between `not
full/full` due to `revoke`.

These two lines were removed because they would lead to combinational
logic loops and were an unwanted condition:

dfc474ebe1/src/main/scala/xiangshan/mem/pipeline/LoadUnit.scala (L1269-L1270)

---

**This may result in some performance gains.**
2025-01-09 16:04:54 +08:00
Anzo dfc474ebe1
style(LoadUnit): removes redundant 'fast_rep_out' assignments (#4148) 2025-01-09 14:27:37 +08:00
Anzo da51a7acf9
fix(VLSU): fix vector exception writeback to 'MergeBuffer' logic (#4137)
Fixed a bug where the exception signal was lost during writeback.

Previously, we expected to compare only the writeback ports that
triggered an exception and pick the oldest.

But amazingly, I just realised that the implementation doesn't match the
comment. The current implementation is problematic: if a writeback port
without an exception is older, the port that triggered the exception is
not selected.

Use s3_exception to try to optimise timing.
2025-01-07 09:50:11 +08:00
Anzo c75efc00f0
fix(LoadUnit): use `lqIdx` to determine age (#4136)
1. `lqIdx` has a smaller bit width.
2. For vectors, the `robIdx` is the same for multiple `flow`s.
Previously, for vectors, we additionally used `uopIdx` for the
comparison. But in theory, we only need to use `lqIdx/sqIdx`.

Here we change the age judgement for vectors to `lqIdx` to ensure
accurate age judgement. And change the age judgement of scalar to
`lqIdx` as well to reduce the cost.
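
Age comparison on a circular queue index is commonly done with a wrap flag plus a value. The Python sketch below shows that general technique under the assumption that `lqIdx` is such a pointer; the names `QueuePtr` and `is_before` are illustrative.

```python
from collections import namedtuple

# flag toggles each time the pointer wraps around the queue
QueuePtr = namedtuple("QueuePtr", ["flag", "value"])


def is_before(left, right):
    """True when `left` is older than `right`.

    Valid while the two pointers are less than one queue size apart.
    """
    different_flag = left.flag != right.flag
    return different_flag ^ (left.value < right.value)


print(is_before(QueuePtr(0, 3), QueuePtr(0, 5)))   # True
print(is_before(QueuePtr(0, 70), QueuePtr(1, 2)))  # True (right has wrapped)
print(is_before(QueuePtr(1, 2), QueuePtr(0, 70)))  # False
```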
2025-01-07 09:49:12 +08:00
Anzo 370a252d94
fix(LoadUnit): fix Vector priority related issues (#4101)
Vector loads should be treated the same as scalar loads: their
priority must be arbitrated against the instructions for replay.
Otherwise a hang can occur.
2024-12-30 17:28:55 +08:00
Anzo 0aeeba0ea3
fix(LoadUnit): `fastReplay` can only happen once (#4102)
Currently, when `RAW` is full, a `RAW nack` is generated, which sends the
load to `LoadQueueReplay`.
And when `RAW` is no longer full, instructions are reissued from `Replay`.

Currently, a load instruction goes into `LoadUnit` at `S2`, and then if
an exception occurs, a `revoke` is generated at `S3`.
Therefore, this will happen:

`RAW` has only one item remaining.

The instructions in `LoadQueueReplay` are sent to `LoadUnit1`.
A load instruction also exists in `LoadUnit0`, so `LoadUnit0` gets
access to `RAW`, while the load in `LoadUnit1` produces a `RAW nack`.
`LoadUnit0` and `LoadUnit1` then generate a `bank conflict`, which
causes `LoadUnit0` to generate a `fast replay` and a `revoke` at `S3`.
The `revoke` makes `RAW` non-full, so the instruction in `LoadQueueReplay`
that received the `RAW nack` is allowed to reissue.
The reissued instruction in turn creates a `bank conflict` with the
`fast replay` and, due to priority issues, receives another `RAW nack`.

When the above loop expands, it causes this to happen over and over
again, leading to a jam.

`Wu Shen` suggested that this bug could be solved by allowing `fast
replay` to spawn only once.
2024-12-30 14:12:18 +08:00
zhanglinjuan 066ca2498b
fix(MemBlock): support non-data error handling for cacheable region (#4093)
When a DCache refill responds with `denied` or `corrupt` asserted, the
loads belonging to the cache line should report a load access fault. This
is accomplished by including a `corrupt` bit in the DCache MSHR
forwarding and TileLink channel D forwarding logic and triggering an
exception when `corrupt` is detected.

A store non-data error that comes from a DCache store miss cannot
trigger a precise access fault trap, only an imprecise bus-error
interrupt. That will be handled in another commit.
2024-12-27 18:54:23 +08:00
Anzo 6aee9d0b62
fix(LoadUnit): fix Load misalign related bugs (#4085)
1. An access can be considered `mmio` only if no `pf/af` occurs. This
allows a misaligned Load to generate a misalign exception.
The store also suffers from this problem, but I will modify `StoreUnit`
later in some other way.

2. Prefetching shouldn't produce misalignment, and I previously placed
the prefetch handling logic in the wrong place.
2024-12-27 10:37:09 +08:00
Yanqin Li 519244c70f
submodule(CoupledL2, OpenLLC): support pbmt in CHI scene (#4071)
* L1: deliver the NC and PMA signals of uncacheReq to L2
* L2: [support Svpbmt on CHI
MemAttr](https://github.com/OpenXiangShan/CoupledL2/pull/273)
* LLC: [Non-cache requests are forwarded directly downstream without
entering the slice](https://github.com/OpenXiangShan/OpenLLC/pull/28)
2024-12-25 10:03:38 +08:00
klin02 8b33cd30e0 feat(XSLog): move all XSLog outside WhenContext for collection
Data in a WhenContext is not accessible from another module. To support
XSLog collection, we move all XSLog calls and related signals outside
WhenContext. For example, `when(cond1){XSDebug(cond2, pable)}` becomes
`XSDebug(cond1 && cond2, pable)`.
2024-12-23 10:14:24 +08:00
Anzo 0ae34b3816
fix(LoadUnit): fix trigger exception when writeback and wakeup logic (#4057)
When a misaligned access encounters mmio, we should actually generate the
misaligned exception and write it back directly. Therefore
`s2_real_exception`, instead of `s2_exception`, should be used for the
`s_safe_writeback` and `s2_wakeup` decisions.
2024-12-18 11:43:04 +08:00
Anzooooo 562eaa0c86 fix(MemBlock): fix misaligned exception and remove redundant reg from `SQ` 2024-12-17 00:15:56 +08:00
cz4e 72dab9745c
feat(CtrlUnit, DCache): support L1 DCache RAS (#4009)
# L1 DCache RAS extension support

The L1 DCache supports part of the Reliability, Availability, and
Serviceability (RAS) Extension.
* L1 DCache protection with Single Error Correct Double Error Detect
(SECDED) ECC on the RAMs. This includes the L1 DCache tag and data RAMs.
Erroneous tags or data are not recovered.
* Fault Handling Interrupt (Bus Error Unit Interrupt, BEU, 65)
* Error injection

## ECC Error Detect
An error might be triggered when accessing the L1 DCache.
* **Error Report**:
  * Tag ECC Error: If an ECC error occurs on any path, an ECC error is
    considered to have occurred.
  * Data ECC Error: If an ECC error occurs in the hit line, an ECC error
    is considered to have occurred. If the access does not hit, the error
    is not processed.
  * If an instruction access triggers an ECC error, a Hardware error is
    considered and an exception is reported.
  * Whenever there is an error in starting, an error message needs to
    be sent to BEU.
  * When the hardware detects an error, it reports it to the BEU and
    triggers the NMI external interrupt (65).

* **Load instruction**:
  * Only ECC errors of tags or data will be triggered during execution,
  and the errors will be reported to the BEU and a `Hardware Error`
  will be reported.

* **Probe/Snoop**:
  * If a tag ECC error occurs, there is no need to change the cache
    status, and a `ProbeAck` with `corrupt=1` needs to be returned to l2.
  * If a data ECC error occurs, change the cache status according to
    the rules. If data needs to be returned, `ProbeAckData` with
    `corrupt=1` needs to be returned to l2.

* **Replace/Evict**:
  * `ReleaseData` with `corrupt=1` needs to be returned to l2.

* **Store to L1 DCache**:
  * If a tag ECC error occurs, the cacheline is released according to the
    `Replace/Evict` process and the data is written to L1 DCache without
    reporting errors to l2.
  * If a data ECC error occurs, the data is written directly without
    reporting the error to l2.

* **Atomics**:
  * report `Hardware Error`, do not report errors to l2.

## Error Inject
Each core's L1 DCache is configured with a memory-mapped,
register-controlled controller, and each hardware unit that supports ECC
is configured with a control bank. After the bank registers are
configured, the L1 DCache triggers an ECC error on the first L1 DCache
access.
<div style="text-align: center;">
<img
src="https://github.com/user-attachments/assets/8c4d23c5-0324-4e52-bcf4-29b47a282d72"
alt="err_inject" width="200" />
</div>

### Address Space
Address space `0x38022000`-`0x3802207F`, a total of 128 bytes of space, 
this space is the local space of each hart.
<div style="text-align: center;">
<img width="292" alt="ctl_bank"
src="https://github.com/user-attachments/assets/89f88b24-37a4-4786-a192-401759eb95cf">
</div>

### L1 DCache Control Bank
Each Control Bank contains registers: `ECCCTL`, `ECCEID`, `ECCMASK`, 
each register is 8 bytes.
<img width="414" alt="eccctl"
src="https://github.com/user-attachments/assets/b22ff437-d05d-4b3c-a353-dbea1afdc156">
* ECCCTL(ECC Control): ECC injection control register.
  * `ese(error signaling enable)`: Indicates that the injection is valid
    and is initialized to 0. When the injection is successful and `pst==0`,
    `ese` is cleared.
  * `pst(persist)`: Continuously inject signals. When `pst==1`, after the
    `ECCEID` counter decreases to 0 and the injection succeeds, the
    injection timer is restored to the last set `ECCEID` and the error is
    injected again; when `pst==0`, the error is injected only once.
  * `ede(error delay enable)`: Indicates that the counter is valid and
    is initialized to 0. If
    * `ese==1` and `ede==0`, error injection is effective immediately.
    * `ese==1` and `ede==1`, you need to wait until `ECCEID`
      decrements to 0 before the injection is effective.
  * `cmp(component)`: Injection target, initialized to 0.
    * 1'b0: The injection object is tag.
    * 1'b1: The injection object is data.
  * `bank`: The bank valid signal is initialized to 0. When a bit in
    `bank` is set, the corresponding mask is valid.
 
<img width="414" alt="ecceid"
src="https://github.com/user-attachments/assets/8cea0d8d-2540-44b1-b1f9-c1ed6ec5341e">

* ECCEID(ECC Error Inject Delay): ECC injection delay controller.
  * When `ese==1` and `ede==1`, it starts to decrease until it reaches 0.
    Currently, the counter uses the same clock as the core, which can
    also be divided. Since ECC injection relies on L1 DCache access, the
    time at which `EID` expires and the time at which the ECC error is
    triggered may not be the same.
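
The interplay of `ese`, `ede`, `pst` and the `ECCEID` counter can be modelled with a small Python class. This is a behavioural sketch written from the register descriptions above; the class and method names are hypothetical, and details such as exact cycle timing are not modelled.

```python
class EccInjector:
    def __init__(self, ese=0, pst=0, ede=0, eid=0):
        self.ese, self.pst, self.ede = ese, pst, ede
        self.eid = eid       # last value written to ECCEID
        self.counter = eid   # running delay counter

    def tick(self):
        """Advance one clock cycle."""
        if self.ese and self.ede and self.counter > 0:
            self.counter -= 1

    def access(self):
        """Model one L1 DCache access; return True if an error is injected."""
        if not self.ese:
            return False
        if self.ede and self.counter > 0:
            return False             # delay has not expired yet
        if self.pst:
            self.counter = self.eid  # re-arm for the next injection
        else:
            self.ese = 0             # one-shot: clear the enable
        return True


inj = EccInjector(ese=1, pst=0, ede=1, eid=3)
print(inj.access())  # False
for _ in range(3):
    inj.tick()
print(inj.access())  # True
print(inj.access())  # False
```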

<img width="414" alt="eccmask"
src="https://github.com/user-attachments/assets/b1be83fd-17a6-4324-8aa6-45858249c476">

* ECCMASK(ECC Mask): ECC injection mask register.
  * 0 means no inversion, 1 means flip.
    Tag injection only uses the bits in `ECCMASK0` corresponding to
    the tag length.
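
Since a mask bit of 1 means "flip", applying the mask amounts to an XOR. A minimal Python sketch, in which the 24-bit tag length and the function name are hypothetical values chosen for illustration:

```python
def inject_error(stored, ecc_mask, width):
    """Flip the bits of `stored` selected by `ecc_mask` (1 = flip, 0 = keep).

    Only the low `width` bits of the mask are used.
    """
    return stored ^ (ecc_mask & ((1 << width) - 1))


TAG_BITS = 24  # hypothetical tag length, for illustration only
tag = 0xABCDEF
corrupted = inject_error(tag, 0x1, TAG_BITS)
print(hex(corrupted))  # 0xabcdee
```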

### Error Inject Example
```
# set control bank base address
li x3, $(BASEADDR)  # $(BASEADDR) is a placeholder for the bank base

# set eid
li x5, 500     # delay 500 cycles
sd x5, 8(x3)   # mmio store

# set mask
li x5, 0x1     # flip bit 0
sd x5, 16(x3)  # mmio store

# set ctl
li x5, 0x7     # cmp = 0, ede = 1, pst = 1, ese = 1
sd x5, 0(x3)   # mmio store
```
2024-12-16 19:34:26 +08:00
Anzooooo 1b5499a2d9 fix(LSU): `rfwen` not be set when `WakeUp` cancelled or not need `WakeUp` 2024-12-11 10:11:07 +08:00
Anzooooo b240e1c0b8 feat(Zicclsm): refactoring misalign and support vector misalign 2024-12-11 10:11:07 +08:00
Haoyuan Feng 189833a16f
feat(pointer masking): support Ssnpm & Smnpm & Smmpm (#3921)
2024-12-05 14:21:35 +08:00
Yanqin Li e10e20c653 style(pbmt): remove the useless and standardize code
* style(pbmt): remove outstanding constant which is just for self-test

* fix(uncache): added mask comparison for `addrMatch`

* style(mem): code normalization

* fix(pbmt): handle cases where the load unit is byte, word, etc

* style(uncache): fix an import

* fix(uncache): address match should use non-offset address when forwarding

  In this case, to ensure correct forwarding, stores with the same address but overlapping masks cannot be entered at the same time.

* style(RAR): remove redundant design of `nc` reg
2024-12-04 19:25:46 +08:00