Monday, April 29, 2019

Splitting large XML file using Mule


In this article we will learn all the options we have to split small and large XML files using Mule.

Since we are dealing with large XML, we need to understand a bit about DOM vs. SAX parsing. Both DOM and SAX parsers are widely used to read and parse XML files.

DOM stands for Document Object Model. It represents an XML document as a tree, with each element forming a branch. A DOM parser builds an in-memory tree of the entire XML file before you can work with it, so it needs more memory, and it is advisable to increase the heap size to avoid an OutOfMemoryError (Java heap space). Parsing with DOM is quite fast when the XML file is small. With a large file, however, it is likely to take a long time, or fail to load the file at all, simply because building the DOM tree requires a lot of memory.

SAX stands for Simple API for XML. It is an event-based parser that reads the XML file sequentially, which makes it well suited to large files. The parser fires an event whenever it encounters, for example, an opening tag, a closing tag or text content (attributes are delivered together with the opening tag), and your code handles those events as they arrive. SAX is recommended for large XML files because it does not need to load the whole file into memory and can read a big file in small parts. One disadvantage is that reading XML with SAX requires more custom Java code than DOM does.
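To make the event-based model concrete, here is a minimal sketch that uses the JDK's built-in SAX parser to collect the text of every Rating element without building a tree. The class name SaxRatingReader and the sample document layout (a root element wrapping School_Name records) are assumptions made for illustration; only the tag names come from the example used in this article.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the text of every <Rating> element via SAX events.
class SaxRatingReader {

    static List<String> readRatings(String xml) throws Exception {
        List<String> ratings = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            // Non-null only while we are inside a <Rating> element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("Rating".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("Rating".equals(qName) && text != null) {
                    ratings.add(text.toString().trim());
                    text = null;
                }
            }
        });
        return ratings;
    }
}
```

The handler keeps only the text of the element it is currently inside, so memory use stays flat regardless of how many schools the file contains. The price is the hand-written state tracking that DOM would otherwise do for us.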

Having covered DOM vs. SAX, let’s look at the options Mule offers for splitting XML files.

Mule provides an out-of-the-box flow control, the Splitter, which takes an XML file as input and outputs DOM objects. The Splitter is simple, effective and quick to implement, provided the source XML is small. In this example, we have a list of schools that we need to split on the School_Name tag. An XPATH3 expression inside the Splitter does the split. The Splitter outputs DOM, which we convert to XML with the out-of-the-box DOM to XML transformer to get access to each split fragment. Finally, XPATH3 expressions inside an Expression component retrieve details such as Address, Rating and Contact_Info for each school, and the flow emails that information to the users (parents) who asked for it.







Expression Component:
flowVars.School_Address=xpath3('/School_Name/School_Address',payload, 'STRING');
flowVars.Rating=xpath3('/School_Name/Rating',payload, 'STRING');
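For comparison, the same split-then-extract idea can be sketched in plain Java with the JDK's DOM and XPath APIs. This is not what Mule runs internally; it only illustrates why the approach is memory-hungry, since the whole document is parsed into a tree before the first school is read. The class name DomSchoolSplitter is hypothetical, and javax.xml.xpath implements XPath 1.0 rather than XPath 3.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: DOM-based equivalent of "split on School_Name, then read fields".
class DomSchoolSplitter {

    /** Returns one {address, rating} pair per School_Name element. */
    static List<String[]> split(String xml) throws Exception {
        // The entire document is loaded into memory here.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList schools = (NodeList) xpath.evaluate("//School_Name", doc, XPathConstants.NODESET);

        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < schools.getLength(); i++) {
            Node school = schools.item(i);
            // Relative paths are evaluated against the current School_Name node.
            String address = xpath.evaluate("School_Address", school);
            String rating = xpath.evaluate("Rating", school);
            result.add(new String[] {address, rating});
        }
        return result;
    }
}
```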



The Splitter approach becomes expensive as the XML file grows. If we use the Splitter, we need to pay attention to the Mule infrastructure: the number of cores and the RAM allocated are key. If resources are limited and the file is medium to large, we can consider the DataWeave approach described next.




Here we use DataWeave, which takes the XML file as input and outputs a collection of Map (key/value) objects. This approach performs better because it consumes significantly less memory than the DOM approach. We iterate over the collection with a For-Each scope, one record at a time, to retrieve the required data and email the info.

 



Expression Component:
flowVars.School_Address = payload.get("School_Address");
flowVars.Rating = payload.get("Rating");


If we are dealing with a really large file, the best approach is to implement a SAX or StAX parser in Java: invoke a Java component, pass the payload to the Java layer, and let the StAX parser handle the large file piece by piece. MuleSoft has an excellent article on how to split a large XML file using a StAX parser, and DZone has another great article on the difference between DOM, SAX and StAX.
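As a rough illustration of the StAX idea (not the code from the MuleSoft article), the sketch below pulls events from the JDK's XMLStreamReader and hands each School_Name record to a callback as soon as its closing tag is reached. The class name SchoolStaxSplitter and the callback shape are assumptions, and the code assumes every child of a record is a flat, text-only element (getElementText fails on nested elements).

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper: streams records out of a large XML file one at a time.
class SchoolStaxSplitter {

    /** Calls handler once per recordTag element and returns the number of records seen. */
    static int split(InputStream in, String recordTag, Consumer<Map<String, String>> handler)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        // Defensive defaults: no DTDs, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        int count = 0;
        Map<String, String> record = null; // non-null only while inside a record
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (recordTag.equals(name)) {
                        record = new LinkedHashMap<>();
                    } else if (record != null) {
                        // Reads the text and leaves the reader on the child's end tag.
                        record.put(name, reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && record != null && recordTag.equals(reader.getLocalName())) {
                    handler.accept(record); // e.g. send the email for this school
                    record = null;          // let the finished record be garbage collected
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

Because only one record is held at a time, memory use depends on the size of a single school entry rather than on the size of the file. Inside a Mule Java component, the InputStream would typically be the message payload.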

How to resolve Mule SFTP Kerberos username error & Authentication failure


In this article, we will learn how to resolve Mule SFTP outbound-endpoint Kerberos username issue and Authentication failure.

<sftp:outbound-endpoint exchange-pattern="one-way" connector-ref="SFTP" outputPattern="#[flowVars.targetFileName]"
        host="${sftp.target.host}" port="${sftp.target.port}" path="${sftp.target.path}" user="${sftp.target.username}"
        password="${sftp.target.password}" responseTimeout="10000" doc:name="SFTP"/>

When I executed a flow with the SFTP outbound-endpoint, the console prompted for a Kerberos username, as shown below, and the connection to the SFTP endpoint stalled.
Kerberos username [Vishnu.Ramakrishnan]:

The Mule SFTP Connector uses the JSch library (a pure Java implementation of the SSH2 protocol). JSch depends on the Java Cryptography Extension (JCE) and supports four types of user authentication: (1) gssapi-with-mic, (2) keyboard-interactive, (3) publickey (DSA, RSA, ECDSA) and (4) password.
You can read all about JSch here

What is SSH:
Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network. SSH provides support for secure remote login, secure file transfer, and secure TCP/IP and X11 forwarding. It can automatically encrypt, authenticate, and compress transmitted data. The SSH protocol is available in two incompatible varieties: SSH1 and SSH2. 

Why do we get this Kerberos username request in the console?
This is a known issue with Java 7 and above: when JSch runs on Java 1.7 or later and connects to an SFTP server with Kerberos enabled, it prompts for a Kerberos username.


Mulesoft Ticket & Solution:

From Mulesoft Documentation:
SFTP Connector Attribute:
preferredAuthenticationMethods: Comma-separated list of authentication methods used by the SFTP client. Valid values are: gssapi-with-mic, publickey, keyboard-interactive and password.

Solution:
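Add the preferredAuthenticationMethods attribute to the SFTP connector. Because gssapi-with-mic is left out of the list, JSch no longer attempts Kerberos authentication and the username prompt goes away.
<sftp:connector name="SFTP" validateConnections="true" sizeCheckWaitTime="500" doc:name="SFTP"
        preferredAuthenticationMethods="publickey,password,keyboard-interactive">
</sftp:connector>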

Now, let’s take a look at handling the authentication error.

Authentication Error:
ERROR 2018-08-24 09:39:52,698 [[fin602-dynamics-cashbookentries].SFTP.dispatcher.01] org.mule.transport.sftp.SftpClient: Error during login to abc@abc-test.xyz.net
com.jcraft.jsch.JSchException: Auth fail

Solution:
I was using a password that contained the special characters dollar (“$”) and backslash (“\”). To escape these special characters, we need to prefix each one with a backslash. The backslash acts as a marker that tells the compiler/interpreter that the next character has no special meaning. For example, \\n is read as a literal backslash followed by n rather than as a newline.

The meta characters that we usually need to escape are:
<([{\^-=$!|]})?*+.>
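The effect is easy to reproduce with the JDK's own properties loader. The sketch below assumes the password is kept in a standard Java .properties file behind the ${sftp.target.password} placeholder; the class name PasswordEscapeDemo and the sample password are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo: shows how java.util.Properties treats backslashes in a value.
class PasswordEscapeDemo {

    /** Parses the given .properties text and returns the value of sftp.target.password. */
    static String loadPassword(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return props.getProperty("sftp.target.password");
    }
}
```

With the file line sftp.target.password=Pa$s\word, loadPassword returns Pa$sword: the loader silently drops the single backslash, so the wrong password is sent and the server answers with "Auth fail". With the escaped line sftp.target.password=Pa\$s\\word, it returns the intended Pa$s\word.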

Connecting to Workday using Mule HTTP Connector


In this article, we will see how to connect to Workday HCM (Human Capital Management), a SaaS system, using the Mule HTTP Connector.

There are multiple ways to connect to Workday HCM: (1) the MuleSoft Workday Connector (a Select connector), (2) custom Java code using JAXP (Java API for XML Processing), or (3) the HTTP connector.



The problem with option 1 is that we need an enterprise license to use the MuleSoft Workday Connector. The issue with option 2 is that there is little point in writing custom Java code (JAXP) when we are already using Mule.

Requirement:
Connect to Workday at a specific time of day to download employee data, transform and enrich it into a CSV file, and send it to external systems.

Drag and drop a Poll scope and set up a cron expression to kick off the Mule flow.




Use an Until Successful scope and drop the HTTP connector inside it; this helps with reconnection attempts. This example uses basic authentication to connect to Workday, although OAuth2 would be a better option.
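For readers who want to see what these two pieces amount to outside of Mule, here is a small plain-Java sketch: a retry loop in the spirit of the Until Successful scope, and the standard HTTP Basic Authorization header value. The class and method names (RetrySketch, untilSuccessful, basicAuthHeader) are hypothetical, and this is not how Mule implements the scope internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.concurrent.Callable;

// Hypothetical helper illustrating retry-until-success and HTTP basic authentication.
class RetrySketch {

    /** Builds the value of the Authorization header for HTTP basic authentication. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Runs the action until it succeeds or maxAttempts is reached,
     * sleeping delayMillis between attempts. Rethrows the last failure.
     */
    static <T> T untilSuccessful(int maxAttempts, long delayMillis, Callable<T> action)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // wait before reconnecting
                }
            }
        }
        throw last;
    }
}
```

A caller would wrap the Workday request in untilSuccessful(...) and set the Authorization header to basicAuthHeader(user, password). Since basic authentication sends the credentials with every request, it should only be used over HTTPS.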



 






When the payload is huge (say 20 MB or more), the two attributes below are important: the connection needs to stay open and active for longer (say 1 to 3 minutes), and the HTTP Connector should wait for the large payload to download fully before control passes to the next message processor in the flow.


https.connection.idle.timeout
The number of milliseconds that a connection can remain idle before it is closed. The value of this attribute is only used when persistent connections are enabled.

https.response.timeout
The maximum time that the request element will block the execution of the flow waiting for the HTTP response. If this value is not present, the default response timeout from the Mule configuration will be used.

Since the response from Workday is an XML file (DOM object), we use the Object to Byte Array transformer. We then use a Message filter (with an Expression filter inside it that specifies an XPATH3 expression) to filter out null or empty payloads.



unaccepted-messages-handler-flow

When the expression is not satisfied (in this case, when the payload is null or empty), control passes to another sub-flow, where we print a suitable log message.