* introduce new fields created_by in rule tables
* update domain model and compat layer to support UpdatedBy
* add alert rule generator mutators for UpdatedBy
* ignore UpdatedBy in diff and hash calculation
* Add user context to alert rule insert/update operations
Updated InsertAlertRules and UpdateAlertRules methods to accept a user context parameter. This change ensures auditability and better tracking of user actions when creating or updating alert rules. Adjusted all relevant calls and interfaces to pass the user context accordingly.
* set UpdatedBy in PreSave because this is where Updated is set
* Use nil userID for system-initiated updates
This ensures differentiation between system and user-initiated changes for better traceability and clarity in update origins.
---------
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* Publish event when one or more rules are changed
* Publish affected rules
* Use a fake bus to test publish event without listening
* Wire alerting store into provisioning service
* Revert "chore: add replDB to team service (#91799)"
This reverts commit c6ae2d7999.
* Revert "experiment: use read replica for Get and Find Dashboards (#91706)"
This reverts commit 54177ca619.
* Revert "QuotaService: refactor to use ReplDB for Get queries (#91333)"
This reverts commit 299c142f6a.
* Revert "refactor replCfg to look more like plugins/plugin config (#91142)"
This reverts commit ac0b4bb34d.
* Revert "chore (replstore): fix registration with multiple sql drivers, again (#90990)"
This reverts commit daedb358dd.
* Revert "Chore (sqlstore): add validation and testing for repl config (#90683)"
This reverts commit af19f039b6.
* Revert "ReplStore: Add support for round robin load balancing between multiple read replicas (#90530)"
This reverts commit 27b52b1507.
* Revert "DashboardStore: Use ReplDB and get dashboard quotas from the ReadReplica (#90235)"
This reverts commit 8a6107cd35.
* Revert "accesscontrol service read replica (#89963)"
This reverts commit 77a4869fca.
* Revert "Fix: add mapping for the new mysqlRepl driver (#89551)"
This reverts commit ab5a079bcc.
* Revert "fix: sql instrumentation dual registration error (#89508)"
This reverts commit d988f5c3b0.
* Revert "Experimental Feature Toggle: databaseReadReplica (#89232)"
This reverts commit 50244ed4a1.
Removes legacy alerting, so long and thanks for all the fish! 🐟
---------
Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>
Co-authored-by: Sonia Aguilar <soniaAguilarPeiron@users.noreply.github.com>
Co-authored-by: Armand Grillet <armandgrillet@users.noreply.github.com>
Co-authored-by: William Wernert <rwwiv@users.noreply.github.com>
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* update GetUserVisibleNamespaces to use FolderSeriver
* update GetNamespaceByUID to use FolderService.GetFolders
* update GetAlertRulesForScheduling to use FolderService.GetFolders
* Update API and GetAlertRulesForScheduling to use the folder's full path
* get full path of folder in RouteTestGrafanaRuleConfig
* fix escaping of titles for MySQL
* Alerting: Move migration from background service run to ngalert init
sqlite database write contention between the migration's single transaction and
dashboard provisioning's frequent commits was causing the migration to
fail with SQLITE_BUSY/SQLITE_BUSY_SNAPSHOT on all retries.
This is not a new issue for sqlite+grafana, but the discrepancy between the
length of the transactions was causing it to be very consistent. In addition,
since a failed migration has implications on the assumed correctness of the
alertmanager and alert rule definition state, we cause a server shutdown on
error. This can make e2e tests as well as some high-load provisioned
sqlite installations flaky on startup.
The correct fix for this is better transaction management across various
services and is out of scope for this change as we're primarily interested in
mitigating the current bout of server failures in e2e tests when using sqlite.
* add folder data migration, fix unique index
* fix unique index
* pass a fake store in tests
* pass store into other providers in tests
* and now with alerting!
* Let alert rule service implement registry service
* Add count method to RuleStore interface
* Add implementation for deletion of alert rules
* Rename uid to folderUID in registry methods
* Check forceDeleteRule value for registry deletion
* Register alerting store with folder service
* Move folder test functions to separate package
* Add testing for alert rule counting, deletion
* Remove redundant count method
* Fix deleteChildrenInFolder signature
* Update pkg/services/ngalert/store/alert_rule.go
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* Add tests for nested folder deletion
* Refactor TestIntegrationNestedFolderService
* Add rules store as parameter for alertng provider
---------
Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
* revert to using folder store from the resolvers
* fixing tests after revert
* api test fixes
---------
Co-authored-by: Kristin Laemmert <mildwonkey@users.noreply.github.com>
* chore(services): replace dependencies on dashboard store with dashboard service
This continues the backend service/store split by replacing dashboard store dependencies with service dependencies. the folder service remains the single exception for now; otherwise we'd have a dependency cycle between the folder and dashboard services. I have some ideas for that, but I'll take care of all the easy parts first.
While doing this, I identified and removed a number of unused arguments from the following functions:
NewFolderNameScopeResolver
NewFolderIDScopeResolver
NewFolderUIDScopeResolver
NewDashboardIDScopeResolver
NewDashboardUIDScopeResolver
resolveDashboardScope
I have a small enterprise PR to support this commit.
* lingering fmt
* Use suggested value for uid
* update the snapshot
* use __expr__
* replace all -100 with __expr__
* update snapshot
* more changes
* revert redundant change
* Use expr.DatasourceUID where it's possible
* generate files
* add nested folder scope inheritance to managed permission services
* add a more specific erorr
* remove circular dependencies
* use errutil for returning erorr
* fix tests
* fix tests
* define a new error in ac package
* Access Control: Add folder service dependency to the dashboard/folder resolvers
* Expose the function fetching parents to folder interface
* Add generic prepend utility
* Modify dashboard resolvers to return inherited scopes
* add feature flag `alertingNoNormalState`
* update instance database to support exclusion of state in list operation
* do not save normal state and delete transitions to normal
* update get methods to filter out normal state
* Nested Folders: Support getting of nested folder in folder service when feature flag is set
* Fix lint
* Fix some tests
* Fix ngalert test
* ngalert fix
* Fix API tests
* Fix some tests and lint
* Fix lint 2
* Fix library elements and panels
* Add access control to get folder
* Cleanup and minor test change
* chore: add alias for InitTestDB and Session
Adds an alias for the sqlstore InitTestDB and Session, and updates tests using these to reduce dependencies on the sqlstore.Store.
* next pass of removing sqlstore imports
* last little bit
* remove mockstore where possible
* Chore: move folder service interface into a separate package
* copy implementation into a standalone package
* move implementation and tests to the new folder package
* remove leftovers from wire
* add test doubles for folder service
* fix tests in library panels/elements
* fix provideservice in ngalert
Prior to this change, all alert instance writes and deletes happened
individually, in their own database transaction. This change batches up
writes or deletes for a given rule's evaluation loop into a single
transaction before applying it.
These new transactions are off by default, guarded by the feature toggle "alertingBigTransactions"
Before:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s
```
After:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/grafana/pkg/services/ngalert/store
BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op
--- BENCH: BenchmarkAlertInstanceOperations-8
util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created
util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created
PASS
ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s
```
So we cut time by about 75% and memory allocations by about 60% when
storing and deleting 100 instances.