Comparing changes

fixed typo

playground suspension.

Adding `_comment` property to EDI schema to annotate segments and/or elements.

…to bufio.Reader's internal buffer rollover. (#214)

BUG: #213

If we're dealing with a multi-line envelope (either by rows or by header/footer), readLine() will be called several times, so whatever ios.ByteReadLine (which uses bufio.Reader underneath) returned in a previous call may be invalidated by bufio.Reader's internal buffer rollover. If we then read the previous line directly, we would see corrupted data.

The easiest fix would be to copy the []byte returned from ios.ByteReadLine every single time. But for files with a single-line envelope, which are the vast majority of cases, this copy is unnecessary and a burden on the GC.

So the trick is to have a flag on the last element of reader.linesBuf that tells whether it holds a reference into bufio.Reader's internal buffer or a copy. Every time before we call a bufio.Reader read, we check the flag on the last element of reader.linesBuf; if it is not a copy, we turn it into a copy. This way we optimize for the vast majority of cases without extra allocations, and avoid any potential corruption in the multi-line envelope cases.
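The copy-on-demand idea can be sketched in Go as follows. This is a minimal illustration, not omniparser's actual code: the `line` and `lineReader` types, the `copied` field, and the use of `bufio.Reader.ReadSlice` in place of `ios.ByteReadLine` are all assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// line is a hypothetical stand-in for one entry of the reader's line buffer.
type line struct {
	b      []byte
	copied bool // false: b still aliases bufio.Reader's internal buffer
}

// lineReader is a hypothetical, stripped-down reader holding buffered lines.
type lineReader struct {
	r     *bufio.Reader
	lines []line
}

// readLine detaches the most recent line from bufio's buffer (only if it is
// still a reference) and then reads the next line without copying it.
func (lr *lineReader) readLine() error {
	if n := len(lr.lines); n > 0 && !lr.lines[n-1].copied {
		last := &lr.lines[n-1]
		last.b = append([]byte(nil), last.b...) // private copy
		last.copied = true
	}
	b, err := lr.r.ReadSlice('\n') // b is only valid until the next read
	if err != nil {
		return err
	}
	lr.lines = append(lr.lines, line{b: b})
	return nil
}

// readLines reads exactly n lines through a bufio.Reader of the given size.
func readLines(input string, bufSize, n int) ([]line, error) {
	lr := &lineReader{r: bufio.NewReaderSize(strings.NewReader(input), bufSize)}
	for i := 0; i < n; i++ {
		if err := lr.readLine(); err != nil {
			return lr.lines, err
		}
	}
	return lr.lines, nil
}

func main() {
	// A 16-byte buffer forces a rollover between the two 13-byte lines.
	lines, err := readLines("aaaaaaaaaaaa\nbbbbbbbbbbbb\n", 16, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Printf("%q copied=%v\n", l.b, l.copied)
	}
}
```

With a single-line envelope `readLine` is called once per record, so the copy branch never runs; the allocation is paid only when a second line is requested while the first is still held.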

Issue: #212

`repetition_delimiter`: delimiter to separate multiple data instances for an element. For example, if `^` is the repetition delimiter for a segment `DMG*D8*19690815*M**A^B^C^D~`, then the last element has 4 pieces of data: `A`, `B`, `C`, and `D`. Any element without `repetition_delimiter` present has essentially one piece of data. Similarly, if `^` is the repetition delimiter for a segment `CLM*A37YH556*500***11:B:1^12:B:2~`, the last element has 2 pieces of data: `11:B:1` and `12:B:2`, each of which is further delimited by a `component_delimiter` `:`.

Note: since `repetition_delimiter` creates multiple pieces of data under the same element name in the schema, in most cases the suitable construct type in `transform_declarations` is `array`.

Currently we read in all the elements and their components serially in `NonValidatingReader` into a slice, `[]RawSegElem`, each entry of which contains the element value, the element index, and the component index if there is more than one component. With `repetition_delimiter` added, we continue the same pattern: `NonValidatingReader` still reads everything into the slice, except that now multiple `RawSegElem` entries can share the same `ElemIndex` and `CompIndex`.

Using the example above: `^` is the repetition delimiter and the segment is `CLM*A37YH556*500***11:B:1^12:B:2~`. After `NonValidatingReader.Read()` is done, we'll have the following `[]RawSegElem` (simplified):

```
{
  {'CLM', ElemIndex: 0, CompIndex: 1},
  {'A37YH556', ElemIndex: 1, CompIndex: 1},
  {'500', ElemIndex: 2, CompIndex: 1},
  {'', ElemIndex: 3, CompIndex: 1},
  {'', ElemIndex: 4, CompIndex: 1},
  {'11', ElemIndex: 5, CompIndex: 1},
  {'B', ElemIndex: 5, CompIndex: 2},
  {'1', ElemIndex: 5, CompIndex: 3},
  {'12', ElemIndex: 5, CompIndex: 1},
  {'B', ElemIndex: 5, CompIndex: 2},
  {'2', ElemIndex: 5, CompIndex: 3},
}
```

Note the last 3 entries have the same `ElemIndex` and `CompIndex` as the previous 3 entries. This behavior is new and introduced in this PR.

On the EDI reader side (reader.go), when we previously matched an element decl against the raw element slice, we only did a one-way scan: `ElemIndex` and `CompIndex` always increased, so we never needed to scan backwards. With the introduction of potentially duplicate `ElemIndex` and `CompIndex` values, we now simply do a full `[]RawSegElem` scan for each element decl. Yes, it is a bit more expensive, but given that the total number of elements and components in a segment is usually very small (around 20), we feel this trade-off is acceptable and avoids making the already complex code even more so.

With this reader change, the IDR produced can contain child element nodes with the same element name. Thus, when writing a schema, a user of the `repetition_delimiter` feature practically needs to use the `array` type in `transform_declarations`.
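The flat-slice layout and the full scan described above can be sketched in Go. This is a hedged illustration under simplifying assumptions, not the actual `NonValidatingReader` or reader.go code: the `rawSegElem` type, `splitSegment`, and `matchAll` are hypothetical names, the segment terminator `~` is assumed to be already stripped, and release characters are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// rawSegElem is a simplified, hypothetical stand-in for RawSegElem.
type rawSegElem struct {
	Data      string
	ElemIndex int
	CompIndex int
}

// splitSegment flattens a segment into one slice. Repeated data under the
// same element yields entries that share ElemIndex and CompIndex.
func splitSegment(seg, elemDelim, repDelim, compDelim string) []rawSegElem {
	var out []rawSegElem
	for ei, elem := range strings.Split(seg, elemDelim) {
		for _, rep := range strings.Split(elem, repDelim) {
			for ci, comp := range strings.Split(rep, compDelim) {
				out = append(out, rawSegElem{Data: comp, ElemIndex: ei, CompIndex: ci + 1})
			}
		}
	}
	return out
}

// matchAll does a full scan of the slice and returns every value that
// matches the given element and component index.
func matchAll(elems []rawSegElem, elemIndex, compIndex int) []string {
	var vals []string
	for _, e := range elems {
		if e.ElemIndex == elemIndex && e.CompIndex == compIndex {
			vals = append(vals, e.Data)
		}
	}
	return vals
}

func main() {
	elems := splitSegment("CLM*A37YH556*500***11:B:1^12:B:2", "*", "^", ":")
	for _, e := range elems {
		fmt.Printf("%+v\n", e)
	}
	// Two values come back for element 5, component 1, which is why an
	// `array` construct is the natural fit on the schema side.
	fmt.Println(matchAll(elems, 5, 1))
}
```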


by @jose-sherpa

While the omniparser tool currently outputs JSON, you will often need another tool or package to stream that JSON output. While I am aware this tool will only be used for JSON output, there is a type of JSON called NDJSON, which stands for newline-delimited JSON. It makes it easy to stream-parse and process a JSON array with no added packages or complexity, since you just read each line and parse the lines one by one. Since a strength of omniparser is stream-parsing large files, we think it makes sense to make the output easily streamable without departing from JSON output. It also results in a smaller file size. http://ndjson.org/
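Consuming such output needs only the Go standard library, as the sketch below suggests. It assumes one JSON object per line; `eachRecord`, `collectIDs`, and the `id` field are hypothetical names chosen for the example and are not part of omniparser.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// eachRecord reads NDJSON from r and calls fn once per non-empty line.
// Only one record is held in memory at a time.
func eachRecord(r io.Reader, fn func(rec map[string]interface{}) error) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		text := strings.TrimSpace(sc.Text())
		if text == "" {
			continue // tolerate blank lines
		}
		var rec map[string]interface{}
		if err := json.Unmarshal([]byte(text), &rec); err != nil {
			return err
		}
		if err := fn(rec); err != nil {
			return err
		}
	}
	return sc.Err()
}

// collectIDs gathers the hypothetical "id" field of every record.
func collectIDs(input string) ([]string, error) {
	var ids []string
	err := eachRecord(strings.NewReader(input), func(rec map[string]interface{}) error {
		id, _ := rec["id"].(string)
		ids = append(ids, id)
		return nil
	})
	return ids, err
}

func main() {
	ids, err := collectIDs("{\"id\":\"a\"}\n{\"id\":\"b\"}\n")
	fmt.Println(ids, err)
}
```

Note that `bufio.Scanner` limits line length by default (64 KiB); for very large records, `Scanner.Buffer` can raise that limit.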

…5242-81d76064690d to include all ES6 features (#222)

This is a major upgrade for omniparser: the upgraded goja requires Go 1.16, so we're bumping omniparser's minimum Go version to 1.16 as well. Be careful when you upgrade omniparser.

…to optionally inject debug (line) info into record IDR. (#228)

Decided to make the debug flag a global setting in `parser_settings` and leave its type as an int (>= 0) for future flexibility. We could have used a string/enum to make it more defined and strict, but given that it is an advanced setting and current usage is so scarce, we are leaving it flexible until further requirements arise.

For the `csv2` reader, if `parser_settings.debug` is 0 or omitted, which is the case for the vast majority of existing and future `csv2` schemas, there is no behavior change. If `parser_settings.debug` isn't 0, a `__debug` node is added to the record IDR structure, underneath which currently only the `line_num` debug info is added. This design is flexible enough for future adoption in all other file format readers, yet has zero impact on any existing schemas.
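The described behavior can be modeled with a small Go sketch. It only illustrates the rule "no `__debug` node unless `debug` is non-zero"; `parserSettings`, `buildRecord`, and the use of a plain map in place of an IDR node are assumptions for the example and do not reflect the real `csv2` reader.

```go
package main

import "fmt"

// parserSettings holds only the hypothetical subset of `parser_settings`
// that this sketch needs.
type parserSettings struct {
	Debug int // 0 (or omitted in the schema): no debug info is injected
}

// buildRecord returns a record as a plain map, standing in for an IDR node.
// A `__debug` child carrying `line_num` is added only when Debug is non-zero.
func buildRecord(cols map[string]string, lineNum int, s parserSettings) map[string]interface{} {
	rec := make(map[string]interface{}, len(cols)+1)
	for k, v := range cols {
		rec[k] = v
	}
	if s.Debug != 0 {
		rec["__debug"] = map[string]interface{}{"line_num": lineNum}
	}
	return rec
}

func main() {
	cols := map[string]string{"name": "alice"}
	fmt.Println(buildRecord(cols, 7, parserSettings{}))         // unchanged record
	fmt.Println(buildRecord(cols, 7, parserSettings{Debug: 1})) // record with __debug
}
```

Keeping the setting an int means a later level (for example, 2) could add more debug fields under `__debug` without changing the schema shape.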

Commits on Sep 19, 2022

Update README.md

jf-tech authored Sep 19, 2022

0f61d34

Commits on Oct 14, 2022

README.md (#181)

fixed typo

jarede-dev authored Oct 14, 2022

4f51e0a

Commits on Dec 6, 2022

Update README.md

jf-tech authored Dec 6, 2022

0637d60

Commits on Mar 13, 2023

Update README.md

playground suspension.

jf-tech authored Mar 13, 2023

6fa373c

Commits on Apr 5, 2023

entos llc (#198)

jf-tech authored Apr 5, 2023

b4d60ad

Commits on Jul 19, 2023

sponsor: healthsherpa (#211)

jf-tech authored Jul 19, 2023

37370fa

Commits on Sep 19, 2022

Commits on Oct 14, 2022

Commits on Dec 6, 2022

Commits on Jan 7, 2023

Commits on Mar 13, 2023

Commits on Apr 5, 2023

Commits on Jun 29, 2023

Commits on Jul 19, 2023

Commits on Jul 25, 2023

Commits on Oct 9, 2023

Commits on Jul 3, 2024

Commits on Feb 6, 2025

Commits on Feb 21, 2025
