nodejs pause sax stream

I am parsing very large XML files using sax.
I create a read stream from the XML file and pipe it into sax like this:

this.sourceStream = fs.createReadStream(file);
this.sourceStream
    .pipe(this.saxStream);

I am listening to some events like this:

this.saxStream.on("error", (err) => {
    logger.error(`Error during XML Parsing`, err);
});
this.saxStream.on("opentag", (node) => {
    // doing some stuff
});
this.saxStream.on("text", (t) => {
    // doing some stuff
});
this.saxStream.on("closetag", () => {
    if( this.current_element.parent === null ) {
        this.sourceStream.pause();
        this.process_company_information(this.current_company, (err) => {
            if( err ) {
                logger.error("An error appeared while parsing company", err);
            }
            this.sourceStream.resume();
        });
    }
    else {
        this.current_element = this.current_element.parent;
    }
});
this.saxStream.on("end", () => {
    logger.info("Finished reading through stream");
});

When a specific end tag arrives in the sax stream, the stream needs to pause, the current elements need to be processed, and only then should the stream continue.
As you can see in my code, I tried to pause the sourceStream, but I found out that pausing a read stream does not work while it is piped.

So my general question is: how can I make the sax parser pause until the currently parsed elements have been processed?

I've read about unpiping and pausing, then piping again and resuming. Is this really the way to do it, and is it reliable?

To illustrate the problem, here are some logs:

debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream

What I actually want is a log like this:

debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found

In its current state, sax is much faster than the processor, so not pausing the stream will lead to memory issues.
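
For reference, here is a minimal sketch of the behaviour I am after, written with Node core streams only. It assumes that the parser fires its events synchronously inside write(), which is how I understand sax to behave, so pausing the source can only ever take effect between chunks. The sketch therefore queues every record found in a chunk, processes the queue one record at a time, and only then invokes the Writable's callback to request the next chunk. createLineParser, createPausingSink and runDemo are hypothetical names, and the line-based parser is just a stand-in for sax so that the example runs without dependencies.

```javascript
const { Readable, Writable } = require("stream");
const { StringDecoder } = require("string_decoder");

// Hypothetical stand-in for the sax parser: it calls onrecord synchronously
// for every complete line in a chunk, the way a SAX-style parser fires its
// close-tag handler for every tag contained in the text passed to write().
function createLineParser() {
    let rest = "";
    const parser = {
        onrecord: () => {},
        write(text) {
            const lines = (rest + text).split("\n");
            rest = lines.pop(); // keep the incomplete last line for later
            lines.forEach((line) => parser.onrecord(line));
        },
    };
    return parser;
}

// Writable that parses one chunk, processes every record found in it one
// after another, and only then signals that it is ready for the next chunk.
function createPausingSink(parser, processRecord) {
    const pending = [];
    const decoder = new StringDecoder("utf8"); // handles split multi-byte chars
    parser.onrecord = (record) => pending.push(record);

    return new Writable({
        write(chunk, encoding, callback) {
            parser.write(decoder.write(chunk));
            const next = (err) => {
                if (err) return callback(err);
                if (pending.length === 0) return callback(); // chunk fully handled
                processRecord(pending.shift(), next);
            };
            next();
        },
    });
}

// Small demo: two chunks, three records, slow asynchronous processing.
function runDemo() {
    return new Promise((resolve, reject) => {
        const log = [];
        const sink = createPausingSink(createLineParser(), (record, done) => {
            log.push(`found ${record}`);
            setTimeout(() => {
                log.push(`done ${record}`);
                done();
            }, 5);
        });
        sink.on("finish", () => resolve(log));
        sink.on("error", reject);
        Readable.from([Buffer.from("a\nb\n"), Buffer.from("c\n")]).pipe(sink);
    });
}

runDemo().then((log) => console.log(log.join("\n")));
```

With the real parser, the idea would be to push the finished company onto the queue in the closetag handler instead of processing it there, and to pipe the file read stream into a sink like this one. Because pipe() respects the write callback, no manual pause(), resume() or unpipe() calls should be needed. I have not verified this against sax itself.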
