Comparing changes

base repository: jf-tech/omniparser
base: v1.0.4
head repository: jf-tech/omniparser
compare: master
  • 17 commits
  • 67 files changed
  • 3 contributors

Commits on Sep 19, 2022

  1. Update README.md

    jf-tech authored Sep 19, 2022
    0f61d34

Commits on Oct 14, 2022

  1. README.md (#181)

    fixed typo
    jarede-dev authored Oct 14, 2022
    4f51e0a

Commits on Dec 6, 2022

  1. Update README.md

    jf-tech authored Dec 6, 2022
    0637d60

Commits on Jan 7, 2023

  1. a3adf70
  2. 862e398

Commits on Mar 13, 2023

  1. Update README.md

    playground suspension.
    jf-tech authored Mar 13, 2023
    6fa373c

Commits on Apr 5, 2023

  1. entos llc (#198)

    jf-tech authored Apr 5, 2023
    b4d60ad

Commits on Jun 29, 2023

  1. allowing for extra properties in edi file declaration (#207)

    Adding `_comment` property to EDI schema to annotate segments and/or elements.
    jose-sherpa authored Jun 29, 2023
    1604d8f

Commits on Jul 19, 2023

  1. sponsor: healthsherpa (#211)

    jf-tech authored Jul 19, 2023
    37370fa

Commits on Jul 25, 2023

  1. Multi-lined envelope reading corruption caused by direct reference into bufio.Reader's internal buffer rollover. (#214)
    
    BUG: #213
    
    If we're dealing with a multi-line envelope (either by rows or by header/footer), readLine()
    will be called several times, so whatever ios.ByteReadLine (which uses bufio.Reader underneath)
    returned in a previous call may be invalidated by bufio.Reader's internal buffer
    rollover. Reading that previous line directly would then return corrupted data.
    
    To fix the problem, the easiest solution would be to simply copy the returned []byte from
    ios.ByteReadLine every single time. But for files with single-line envelopes, which are the vast
    majority of cases, this copy is unnecessary and burdensome on the GC. So the trick is to keep a flag
    on reader.linesBuf's last element that tells whether it contains a reference into bufio.Reader's
    internal buffer or is a copy. Every time before we call a bufio.Reader read, we check
    reader.linesBuf's last element flag; if it is not a copy, we turn it into one.
    
    This way, we optimize for the vast majority of cases without needing allocations, and avoid any potential
    corruption in the multi-line envelope cases.
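The copy-on-demand trick described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not omniparser's code: `multiLineReader`, `line`, `readLine` and `reset` are hypothetical names, and `bufio.Reader.ReadSlice` stands in for `ios.ByteReadLine`. Only the most recent line may alias the reader's buffer, and it is copied only when another read is about to happen while it is still held.

```go
package main

import (
	"bufio"
	"bytes"
	"strings"
)

// line is one line read from the input. While copied is false, b aliases
// bufio.Reader's internal buffer and is only valid until the next read.
type line struct {
	b      []byte
	copied bool
}

// multiLineReader is a hypothetical reader that accumulates the lines of a
// (possibly multi-line) envelope in linesBuf.
type multiLineReader struct {
	r        *bufio.Reader
	linesBuf []line
}

func newMultiLineReader(s string, bufSize int) *multiLineReader {
	return &multiLineReader{r: bufio.NewReaderSize(strings.NewReader(s), bufSize)}
}

// reset is called once an envelope has been fully processed. For single-line
// envelopes this means the aliasing line is dropped and never copied.
func (m *multiLineReader) reset() { m.linesBuf = m.linesBuf[:0] }

// readLine appends the next line to linesBuf without copying it. Before it
// touches the bufio.Reader again, it turns the previous (still aliasing)
// line into an owned copy, because the next read may shift or overwrite
// the reader's internal buffer.
func (m *multiLineReader) readLine() ([]byte, error) {
	if n := len(m.linesBuf); n > 0 && !m.linesBuf[n-1].copied {
		last := &m.linesBuf[n-1]
		last.b = append([]byte(nil), last.b...)
		last.copied = true
	}
	// ReadSlice returns a slice into the reader's buffer (no allocation).
	// Assumption for brevity: every line fits in the buffer, so
	// bufio.ErrBufferFull is not handled here.
	b, err := m.r.ReadSlice('\n')
	if len(b) > 0 {
		b = bytes.TrimRight(b, "\r\n")
		m.linesBuf = append(m.linesBuf, line{b: b})
	}
	return b, err
}
```

With a 16-byte buffer (the minimum `bufio.NewReaderSize` allows) and three 11-byte lines, removing the copy step lets the second read move the unread bytes to the front of the buffer, so in the current standard library implementation the first stored line would silently turn into `BBBBBBBBBB`.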
    jf-tech authored Jul 25, 2023
    Configuration menu
    Copy the full SHA
    9e0c8da View commit details
    Browse the repository at this point in the history
  2. Introducing repetition_delimiter to EDI schema. (#215)

    Issue: #212
    
    `repetition_delimiter`: delimiter to separate multiple data instances for an element. For example,
    if `^` is the repetition delimiter for a segment `DMG*D8*19690815*M**A^B^C^D~`, then the last
    element has 4 pieces of data: `A`, `B`, `C`, and `D`. Any element without `repetition_delimiter`
    present has essentially one piece of data; similarly, if `^` is the repetition delimiter for a
    segment `CLM*A37YH556*500***11:B:1^12:B:2~`, the last element has 2 pieces of data: `11:B:1` and
    `12:B:2`, each of which is further delimited by a `component_delimiter` `:`. Note that since
    `repetition_delimiter` creates multiple pieces of data under the same element name in the schema,
    in most cases the suitable construct type in `transform_declarations` is `array`.
    
    Currently we read all the elements and their components serially in `NonValidatingReader` into
    a slice, `[]RawSegElem`, each entry of which contains the element value, the element index, and the component
    index if there is more than one component. When `repetition_delimiter` is added, we continue down
    the same path: `NonValidatingReader` still reads everything into the slice, except that now multiple
    `RawSegElem` entries can share the same `ElemIndex` and `CompIndex`.
    
    Using the example above: `^` is the rep delim and seg is `CLM*A37YH556*500***11:B:1^12:B:2~`. After
    `NonValidatingReader.Read()` is done, we'll have the following `[]RawSegElem` (simplified):
    
    ```
    {
       {'CLM', ElemIndex: 0, CompIndex: 1},
       {'A37YH556', ElemIndex: 1, CompIndex: 1},
       {'500', ElemIndex: 2, CompIndex: 1},
       {'', ElemIndex: 3, CompIndex: 1},
       {'', ElemIndex: 4, CompIndex: 1},
       {'11', ElemIndex: 5, CompIndex: 1},
       {'B', ElemIndex: 5, CompIndex: 2},
       {'1', ElemIndex: 5, CompIndex: 3},
       {'12', ElemIndex: 5, CompIndex: 1},
       {'B', ElemIndex: 5, CompIndex: 2},
       {'2', ElemIndex: 5, CompIndex: 3},
    }
    ```
    
    Note the last 3 elements have the same `ElemIndex` and `CompIndex` as the previous 3 elements.
    This behavior is new and introduced in this PR.
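A minimal Go sketch of how such a flat `[]RawSegElem` with repeated indices could be produced. `splitSegment` and this simplified `RawSegElem` are hypothetical and are not `NonValidatingReader`'s actual code; the sketch assumes the segment terminator (`~`) has already been stripped, treats element 0 as the segment name, and assumes a non-empty element and component delimiter. For the `CLM` example, both repeated instances `11:B:1` and `12:B:2` yield `ElemIndex: 5` with `CompIndex` 1 through 3.

```go
package main

import "strings"

// RawSegElem is a simplified, hypothetical stand-in for the reader's raw
// element record: one piece of data plus its element/component position.
type RawSegElem struct {
	Data      string
	ElemIndex int
	CompIndex int
}

// splitSegment splits one segment (terminator already removed) into a flat
// []RawSegElem. With a non-empty repDelim, each repeated data instance of an
// element reuses the same ElemIndex/CompIndex values.
func splitSegment(seg, elemDelim, compDelim, repDelim string) []RawSegElem {
	var out []RawSegElem
	for ei, elem := range strings.Split(seg, elemDelim) {
		if ei == 0 {
			// Element 0 is the segment name; it is never split further.
			out = append(out, RawSegElem{elem, 0, 1})
			continue
		}
		reps := []string{elem}
		if repDelim != "" {
			reps = strings.Split(elem, repDelim)
		}
		for _, rep := range reps {
			// Component indices are 1-based, restarting for every repetition.
			for ci, comp := range strings.Split(rep, compDelim) {
				out = append(out, RawSegElem{comp, ei, ci + 1})
			}
		}
	}
	return out
}
```

Calling `splitSegment("CLM*A37YH556*500***11:B:1^12:B:2", "*", ":", "^")` yields 11 entries, with two empty elements (indices 3 and 4) between `500` and the repeated composite.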
    
    Now on the EDI reader side (reader.go): previously, when matching element decls against the raw element
    slice, we only did a one-way scan, because `ElemIndex` and `CompIndex` always increased, so we
    never needed to back-scan. With the introduction of potentially duplicate `ElemIndex` and `CompIndex` values,
    we now simply do a full `[]RawSegElem` scan for each element decl. Yes, it is a bit more expensive,
    but given that the total number of elements and components in a segment is usually very small (around
    20), we feel this trade-off is acceptable rather than making the already-complex code even more so.
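The full-scan matching can be sketched as follows. `ElemDecl` and `matchAll` are hypothetical, simplified names rather than reader.go's actual types; the point is that every decl scans the whole slice and collects every match, so repeated `(ElemIndex, CompIndex)` pairs all end up under the same element name. The cost is O(decls × raw elements), which stays small for segments of around 20 elements and components.

```go
package main

// RawSegElem is a simplified, hypothetical raw element record.
type RawSegElem struct {
	Data      string
	ElemIndex int
	CompIndex int
}

// ElemDecl is a hypothetical, simplified element declaration from the schema.
type ElemDecl struct {
	Name      string
	ElemIndex int
	CompIndex int
}

// matchAll does a full scan of raw for every decl and returns, per decl
// name, all matching data instances in order. A forward-only scan that
// never backs up would miss later repetitions, because after consuming
// CompIndex 3 of one repetition the next match may be CompIndex 1 again.
func matchAll(decls []ElemDecl, raw []RawSegElem) map[string][]string {
	out := make(map[string][]string)
	for _, d := range decls {
		for _, r := range raw {
			if r.ElemIndex == d.ElemIndex && r.CompIndex == d.CompIndex {
				out[d.Name] = append(out[d.Name], r.Data)
			}
		}
	}
	return out
}
```

Because one element name can now collect several values, this is also why the resulting IDR needs an `array` construct downstream.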
    
    With this reader change, the resulting IDR may contain multiple child element nodes with the same
    element name. Thus, when writing a schema that uses the `repetition_delimiter` feature, it's practically
    required to use the `array` type in `transform_declarations`.
    jf-tech authored Jul 25, 2023
    79a540b

Commits on Oct 9, 2023

  1. adding ndjson format (#218)

    by @jose-sherpa 
    
    While the omniparser tool currently outputs JSON, you will often need another tool or package to stream-parse that JSON output. The output can stay JSON, though: there is a variant called NDJSON, which stands for newline-delimited JSON, where each line is one complete JSON value. This makes the output easy to stream-parse and process with no added packages or complexity, since you just read each line and parse the lines one by one. Since a strength of omniparser is stream-parsing large files, we think it makes sense to make the output easily streamable without giving up valid JSON. It also results in a smaller file size.
    
    http://ndjson.org/
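For illustration, a minimal Go sketch of writing and reading NDJSON with only the standard library. It does not show how omniparser's CLI exposes the format; `writeNDJSON` and `readNDJSON` are hypothetical helpers. It relies on `json.Encoder.Encode` appending a newline after each value, and uses `interface{}` instead of `any` to stay compatible with Go versions before 1.18.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
)

// writeNDJSON writes each record as one compact JSON value followed by '\n'.
// json.Encoder.Encode already appends the newline after every value.
func writeNDJSON(records []map[string]interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, r := range records {
		if err := enc.Encode(r); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// readNDJSON decodes NDJSON one line at a time, so a consumer never has to
// parse a whole JSON array up front. Blank lines are skipped.
func readNDJSON(data []byte) ([]map[string]interface{}, error) {
	var out []map[string]interface{}
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue
		}
		var rec map[string]interface{}
		if err := json.Unmarshal(line, &rec); err != nil {
			return nil, err
		}
		out = append(out, rec)
	}
	return out, sc.Err()
}
```

`bufio.Scanner` limits a single line to 64 KiB by default; records larger than that would need a bigger buffer via `Scanner.Buffer`.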
    jose-sherpa authored Oct 9, 2023
    dd04a11

Commits on Jul 3, 2024

  1. ISSUE-221 Per request, upgrading dependency goja to v0.0.0-20230812105242-81d76064690d to include all ES6 features (#222)
    
    This is a major upgrade for omniparser: the new goja version requires Go 1.16, so we're bumping
    omniparser's minimum Go version to 1.16 as well. Be careful when you upgrade omniparser.
    jf-tech authored Jul 3, 2024
    0ae53cc
  2. Update README.md

    jf-tech authored Jul 3, 2024
    e00c3f7

Commits on Feb 6, 2025

  1. 141d966
  2. da04593

Commits on Feb 21, 2025

  1. ISSUE 227: add parser_settings.debug flag and enable csv2 reader to optionally inject debug (line) info into record IDR. (#228)
    
    We decided to make the debug flag a global setting in `parser_settings` and to make its type an int (>= 0) for
    future flexibility. We could have used a string/enum to make it more defined and strict, but given that it is an advanced
    setting with scarce current usage, we're leaving it flexible until further requirements arise.
    
    For the `csv2` reader, if `parser_settings.debug` is 0 or omitted, which covers the vast majority of existing and
    future `csv2` schemas, behavior is unchanged; if `parser_settings.debug` isn't 0, a `__debug` node is
    added to the record IDR structure, under which, for now, only `line_num` debug info is added.
    
    This design can be adopted by all other file format readers in the future, yet has zero impact on any existing
    schemas.
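A minimal sketch of the described `debug` behavior, assuming a record can be modeled as a plain map; omniparser's real record IDR is a node tree, and `record` and `addDebugInfo` are hypothetical names. With `debug` equal to 0 the record is left untouched; any other value adds a `__debug` node holding `line_num`.

```go
package main

// record is a hypothetical, simplified stand-in for a record IDR:
// a map from node name to value.
type record map[string]interface{}

// addDebugInfo leaves rec unchanged when debug == 0 (the default, matching
// an omitted setting). Otherwise it adds a "__debug" node whose only child
// is "line_num", the line number the record was read from.
func addDebugInfo(rec record, debug int, lineNum int) record {
	if debug == 0 {
		return rec
	}
	rec["__debug"] = map[string]interface{}{"line_num": lineNum}
	return rec
}
```

Keeping all debug info under a single `__debug` node means new debug fields can be added later without touching the names existing schemas already rely on.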
    jf-tech authored Feb 21, 2025
    d4371ab