Remove some flaky scenarios, and switch to chat for better reliability
* tests: add single threaded history test
  Also tidies up some existing tests to handle more model output variation
* test: add support for testing specific architectures
* Only load supported models on new engine
  Verify the model is supported before trying to load
* int: testcase for all library models