Comparing changes
base repository: jf-tech/omniparser
base: v1.0.4
head repository: jf-tech/omniparser
compare: master
- 17 commits
- 67 files changed
- 3 contributors
Commits on Sep 19, 2022
- 0f61d34
Commits on Oct 14, 2022
- 4f51e0a
Commits on Dec 6, 2022
- 0637d60
Commits on Jan 7, 2023
- a3adf70
- 862e398
Commits on Mar 13, 2023
- 6fa373c
Commits on Apr 5, 2023
- b4d60ad
Commits on Jun 29, 2023
- 1604d8f: Allow extra properties in EDI file declaration (#207)
  Adds a `_comment` property to the EDI schema for annotating segments and/or elements.
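As a rough illustration of what an annotated declaration could look like, the sketch below decodes a small schema fragment in Go. Only the `_comment` property comes from the commit message; the struct types, the `name`/`elements`/`index` keys, and the sample values are assumptions made for this example and are not taken from omniparser's schema definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// elemDecl and segDecl are hypothetical, trimmed-down mirrors of an EDI
// element/segment declaration. They deliberately have no field for `_comment`.
type elemDecl struct {
	Name  string `json:"name"`
	Index int    `json:"index"`
}

type segDecl struct {
	Name     string     `json:"name"`
	Elements []elemDecl `json:"elements"`
}

// schemaFragment is illustrative only; `_comment` is the property from #207.
const schemaFragment = `{
  "name": "ISA",
  "_comment": "interchange control header",
  "elements": [
    { "name": "sender_id", "index": 6, "_comment": "ISA06" }
  ]
}`

// parseSegDecl decodes a fragment. encoding/json skips keys that have no
// matching struct field, so the `_comment` annotations are simply ignored.
func parseSegDecl(s string) (segDecl, error) {
	var d segDecl
	err := json.Unmarshal([]byte(s), &d)
	return d, err
}

func main() {
	d, err := parseSegDecl(schemaFragment)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s has %d element(s); annotations were ignored\n", d.Name, len(d.Elements))
}
```

The point of the sketch is only that annotations ride along in the schema without affecting the declaration itself.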
Commits on Jul 19, 2023
- 37370fa
Commits on Jul 25, 2023
- 9e0c8da: Multi-lined envelope reading corruption caused by direct reference into bufio.Reader's internal buffer rollover. (#214)
  BUG: #213
  When we're dealing with a multi-lined envelope (either by rows or by header/footer), readLine() is called several times, so whatever ios.ByteReadLine (which uses bufio.Reader underneath) returned in a previous call may be invalidated by bufio.Reader's internal buffer rollover. Reading the previous line directly would then yield corrupted data.
  The easiest fix would be to copy the []byte returned from ios.ByteReadLine every single time. But for files with single-line envelopes, which are the vast majority of cases, this copy is unnecessary and burdensome on the GC. So the trick is to keep a flag on the last element of reader.linesBuf that tells whether it contains a reference into bufio.Reader's internal buffer or a copy. Every time before we call a bufio.Reader read, we check the flag on reader.linesBuf's last element; if it is not a copy, we turn it into a copy. This way we optimize for the vast majority of cases without needing allocations, and avoid any potential corruption in the multi-lined envelope cases.
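The copy-on-demand idea from this commit message can be sketched as follows. This is a minimal illustration, not omniparser's code: the `line` and `lineReader` types, the `detach` switch, and `readEnvelope` are hypothetical names, and `bufio.Reader.ReadSlice` stands in for `ios.ByteReadLine`. The 16-byte buffer is chosen only to force a buffer rollover on a tiny input.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// line is a hypothetical stand-in for an element of reader.linesBuf.
type line struct {
	b      []byte
	copied bool // false: b still aliases bufio.Reader's internal buffer
}

// lineReader buffers the lines that make up one envelope.
type lineReader struct {
	r      *bufio.Reader
	lines  []line
	detach bool // true: apply the copy-on-demand fix before each read
}

// readLine turns the previously returned line into a private copy (only if
// it is still a reference), then reads the next line from the bufio.Reader.
func (lr *lineReader) readLine() error {
	if n := len(lr.lines); lr.detach && n > 0 && !lr.lines[n-1].copied {
		lr.lines[n-1].b = append([]byte(nil), lr.lines[n-1].b...)
		lr.lines[n-1].copied = true
	}
	b, err := lr.r.ReadSlice('\n') // b is only valid until the next read
	if err != nil {
		return err
	}
	lr.lines = append(lr.lines, line{b: b})
	return nil
}

// readEnvelope reads `rows` lines and returns them as strings afterwards.
func readEnvelope(input string, rows int, detach bool) ([]string, error) {
	lr := &lineReader{r: bufio.NewReaderSize(strings.NewReader(input), 16), detach: detach}
	for i := 0; i < rows; i++ {
		if err := lr.readLine(); err != nil {
			return nil, err
		}
	}
	out := make([]string, 0, rows)
	for _, l := range lr.lines {
		out = append(out, string(l.b))
	}
	return out, nil
}

func main() {
	const input = "header-0001\nbody-000002\n"
	safe, _ := readEnvelope(input, 2, true)
	naive, _ := readEnvelope(input, 2, false)
	fmt.Printf("copy on demand: %q\n", safe)
	fmt.Printf("no copy:        %q\n", naive)
}
```

With `detach` set, a single-line envelope never allocates, because the copy only happens when a second read is about to occur. Without it, the first line of the two-line envelope no longer reads `header-0001` after the second read, since the reader has reused that part of its buffer.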
- 79a540b: Introducing `repetition_delimiter` to EDI schema. (#215)
  Issue: #212
  `repetition_delimiter`: delimiter to separate multiple data instances for an element. For example, if `^` is the repetition delimiter for a segment `DMG*D8*19690815*M**A^B^C^D~`, then the last element has 4 pieces of data: `A`, `B`, `C`, and `D`. Any element in which no `repetition_delimiter` is present has essentially one piece of data. Similarly, if `^` is the repetition delimiter for a segment `CLM*A37YH556*500***11:B:1^12:B:2~`, the last element has 2 pieces of data: `11:B:1` and `12:B:2`, each of which is further delimited by the `component_delimiter` `:`. Note that since `repetition_delimiter` creates multiple pieces of data under the same element name in the schema, in most cases the suitable construct type in `transform_declarations` is `array`.
  Currently we read all the elements and their components serially in `NonValidatingReader` into a slice, `[]RawSegElem`, each entry of which contains the element value, the element index, and the component index if there is more than one component. With `repetition_delimiter` added, we continue the same pattern: `NonValidatingReader` still reads everything into the slice, except that now multiple `RawSegElem` entries can share the same `ElemIndex` and `CompIndex`. Using the example above, `^` is the repetition delimiter and the segment is `CLM*A37YH556*500***11:B:1^12:B:2~`. After `NonValidatingReader.Read()` is done, we'll have the following `[]RawSegElem` (simplified):
  ```
  {
    {'CLM', ElemIndex: 0, CompIndex: 1},
    {'A37YH556', ElemIndex: 1, CompIndex: 1},
    {'500', ElemIndex: 2, CompIndex: 1},
    {'', ElemIndex: 3, CompIndex: 1},
    {'', ElemIndex: 4, CompIndex: 1},
    {'11', ElemIndex: 5, CompIndex: 1},
    {'B', ElemIndex: 5, CompIndex: 2},
    {'1', ElemIndex: 5, CompIndex: 3},
    {'12', ElemIndex: 5, CompIndex: 1},
    {'B', ElemIndex: 5, CompIndex: 2},
    {'2', ElemIndex: 5, CompIndex: 3},
  }
  ```
  Note that the last 3 entries have the same `ElemIndex` and `CompIndex` as the previous 3 entries. This behavior is new and introduced in this PR.
  On the EDI reader side (reader.go), when we previously matched element decls against the raw element slice, we only did a one-way scan, because `ElemIndex` and `CompIndex` were always increasing and we never needed to back-scan. With the introduction of potentially duplicate `ElemIndex` and `CompIndex` values, we now simply do a full `[]RawSegElem` scan for each element decl. Yes, it is a bit more expensive, but given that the total number of elements and components in a segment is usually very small (around 20), we feel this trade-off is acceptable without making the already-complex code even more so.
  With this reader change, the IDR produced will potentially contain child element nodes with the same element name. Thus, when writing a schema, a user of the `repetition_delimiter` feature is practically required to use the `array` type in `transform_declarations`.
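The splitting order described in this commit message (element, then repetition, then component) can be sketched in a few lines of Go. This is an illustration under assumptions, not the `NonValidatingReader` implementation: `rawSegElem`, `splitSegment`, and `splitOrWhole` are hypothetical names, the segment is passed in with its trailing segment delimiter already stripped, and release characters are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// rawSegElem is a simplified, hypothetical stand-in for the RawSegElem
// described in the commit message.
type rawSegElem struct {
	Data      string
	ElemIndex int // 0 is the segment name
	CompIndex int // 1-based
}

// splitOrWhole splits s on delim, or returns s unsplit when delim is empty
// (strings.Split with an empty separator would split into single characters).
func splitOrWhole(s, delim string) []string {
	if delim == "" {
		return []string{s}
	}
	return strings.Split(s, delim)
}

// splitSegment flattens one segment into a slice. Repeated data instances
// keep the ElemIndex of their element, so index pairs can repeat.
func splitSegment(seg, elemDelim, repDelim, compDelim string) []rawSegElem {
	var out []rawSegElem
	for ei, elem := range splitOrWhole(seg, elemDelim) {
		for _, rep := range splitOrWhole(elem, repDelim) {
			for ci, comp := range splitOrWhole(rep, compDelim) {
				out = append(out, rawSegElem{Data: comp, ElemIndex: ei, CompIndex: ci + 1})
			}
		}
	}
	return out
}

func main() {
	for _, e := range splitSegment("CLM*A37YH556*500***11:B:1^12:B:2", "*", "^", ":") {
		fmt.Printf("{%q, ElemIndex: %d, CompIndex: %d}\n", e.Data, e.ElemIndex, e.CompIndex)
	}
}
```

Because index pairs can repeat, a consumer matching declarations against this slice cannot stop at the first hit for a given `ElemIndex`/`CompIndex`, which is why a full scan per element declaration is the simple option.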
Commits on Oct 9, 2023
- dd04a11 (by @jose-sherpa)
  While the omniparser tool currently outputs JSON, you will often need another tool or package to stream that JSON output. While I am aware this tool will only be used for JSON output, there is a type of JSON called NDJSON, which stands for newline-delimited JSON. It makes it easy to stream-parse and process a JSON array with no added packages or complexity, since you just read each line and parse the lines one by one. Since a strength of omniparser is stream-parsing large files, we think it makes sense to make the output easily streamable while keeping every output line valid JSON. It also results in a smaller file size. http://ndjson.org/
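To make the "read each line and parse them one by one" point concrete, here is a minimal sketch of consuming NDJSON with only the Go standard library. The function names `eachRecord` and `countRecords` and the sample records are made up for this example; nothing here is omniparser's API.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// eachRecord reads newline-delimited JSON from r and calls fn once per record.
// Only one line is held in memory at a time.
func eachRecord(r io.Reader, fn func(rec map[string]interface{}) error) error {
	sc := bufio.NewScanner(r)
	// Allow lines up to 1 MiB; the Scanner default limit is 64 KiB per line.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue // tolerate blank lines
		}
		var rec map[string]interface{}
		if err := json.Unmarshal(line, &rec); err != nil {
			return err
		}
		if err := fn(rec); err != nil {
			return err
		}
	}
	return sc.Err()
}

// countRecords is a small helper that counts the records in an NDJSON string.
func countRecords(s string) (int, error) {
	n := 0
	err := eachRecord(strings.NewReader(s), func(map[string]interface{}) error {
		n++
		return nil
	})
	return n, err
}

func main() {
	const out = "{\"id\":1,\"name\":\"a\"}\n{\"id\":2,\"name\":\"b\"}\n"
	err := eachRecord(strings.NewReader(out), func(rec map[string]interface{}) error {
		fmt.Println(rec["id"], rec["name"])
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

A regular JSON array would instead require either loading the whole document or using a token-level streaming decoder.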
Commits on Jul 3, 2024
- 0ae53cc: ISSUE-221 Per request, upgrading dependency goja to v0.0.0-20230812105242-81d76064690d to include all ES6 features (#222)
  This is a major upgrade for omniparser: the upgraded goja requires Go 1.16, so we're bumping omniparser's minimum Go version to 1.16 as well. Be careful when you upgrade omniparser.
- e00c3f7
Commits on Feb 6, 2025
- 141d966
- da04593
Commits on Feb 21, 2025
- d4371ab: ISSUE 227: add `parser_settings.debug` flag and enable `csv2` reader to optionally inject debug (line) info into record IDR. (#228)
  Decided to make the debug flag a global setting in `parser_settings` and to leave its type as an int (>= 0) for future flexibility. It could have been a string/enum to make it more defined and strict, but it is an advanced setting and current usage is scarce, so we are leaving it flexible until further requirements arise.
  For the `csv2` reader, if `parser_settings.debug` is 0 or omitted, which is the case for the vast majority of existing and future `csv2` schemas, there is no behavior change. If `parser_settings.debug` isn't 0, a `__debug` node is added to the record IDR structure, underneath which currently only `line_num` debug info is added. This design is flexible enough for future adoption in all other file format readers, yet has zero impact on any existing schemas.
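The rule described in this commit message (debug is an int >= 0, and a non-zero value adds `__debug` with `line_num`) can be modeled roughly as below. This is a loose sketch: a plain Go map stands in for the record IDR, which is really a node tree, and `parserSettings`, `parseSettings`, and `annotate` are hypothetical names rather than omniparser types or functions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// parserSettings models only the `debug` field of `parser_settings`.
type parserSettings struct {
	Debug int `json:"debug"` // 0 or omitted: debug off
}

// parseSettings decodes the settings and enforces debug >= 0.
func parseSettings(raw string) (parserSettings, error) {
	var ps parserSettings
	if err := json.Unmarshal([]byte(raw), &ps); err != nil {
		return ps, err
	}
	if ps.Debug < 0 {
		return ps, errors.New("parser_settings.debug must be >= 0")
	}
	return ps, nil
}

// annotate adds a `__debug` entry holding `line_num` when debug is non-zero.
// A map is used here in place of the record IDR (an assumption of this sketch).
func annotate(rec map[string]interface{}, ps parserSettings, lineNum int) map[string]interface{} {
	if ps.Debug != 0 {
		rec["__debug"] = map[string]interface{}{"line_num": lineNum}
	}
	return rec
}

func main() {
	ps, err := parseSettings(`{"debug": 1}`)
	if err != nil {
		panic(err)
	}
	rec := annotate(map[string]interface{}{"col1": "x"}, ps, 7)
	fmt.Println(rec)
}
```

Keeping the debug data under a single reserved key is what lets existing schemas ignore it entirely.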
You can try running this command locally to see the comparison on your machine:
git diff v1.0.4...master