Commit Graph

7 Commits

Author SHA1 Message Date
Benjamin Trent 14ca8fee20
[ML] add support for xlm_roberta tokenized models (#94089)
Many multi-lingual and newer models use a tokenization scheme similar to
sentence-piece. This PR adds support for one of those tokenization
schemes, XLMRoBERTa. 

The main changes are:  - Support for xlm_roberta tokenization
configuration  - Adding `scores` to the vocabulary document stored,
requiring that scores be the same size as the vocabulary  - Adding a new
flat text file to resources that is the spm char normalizer.
2023-06-13 08:40:55 -04:00
David Roberts 6fa3d73fd5
[ML] Make native inference generally available (#92213)
Previously this functionality was beta. This PR changes it to GA.
2022-12-12 15:43:30 +00:00
Nik Everett 6481342466
Fix sneaky docs test failure (#91829)
This prevents docs files from *starting* with a "response" because when
that happens the response is converted to an assertion and appended
to the last snippet that was processed. If that last snipper was in a
different file then it's very hard to reason about the tests. That goes
double because the order we iterate files isn't defined....

Anyway! This adds a guard in the build, removes the offending
"response", and reenables the tests that we'd thought we failing here.

Closes #91081
2022-12-07 11:02:44 -05:00
David Roberts d9ea080d10
[ML] Release native inference functionality as beta (#90418)
Previously this functionality was tech preview (aka experimental).
This PR changes it to beta.
2022-09-28 11:09:02 +01:00
Lisa Cawley 89a3e18e10
[DOCS] Add preview admonition to infer API (#86486) 2022-05-05 13:49:02 -07:00
Benjamin Trent 258d2b71e2
[ML] add roberta/bart docs (#85001)
adds roberta section to NLP tokenization documentation.
2022-03-17 12:14:57 -04:00
Lisa Cawley 429bdd9afc
[DOCS] Move trained model APIs out of dataframe analytics (#81315) 2021-12-03 09:21:09 -08:00