In this code I use PySpark to process the data, and pandas together with HTML to render the output into an HTML file.
First, I read the file with the Spark read command and assigned it to a DataFrame. I then used this DataFrame to filter the data and display the profiling attributes. I used pandas in a few places for data formatting and reporting purposes.
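The steps above can be sketched roughly as follows. This is only a minimal illustration, not the exact code behind this post: the function names (column_summary, profile_csv), the app name, and the chosen attributes (row count, nulls, distinct values) are my assumptions for the example.

```python
def column_summary(name, dtype, total, non_null, distinct):
    """Combine the raw counts for one column into a row of the profiling report."""
    nulls = total - non_null
    null_pct = round(100.0 * nulls / total, 2) if total else 0.0
    return {
        "column": name,
        "type": dtype,
        "rows": total,
        "nulls": nulls,
        "null_pct": null_pct,
        "distinct": distinct,
    }


def profile_csv(path, out_html):
    """Read a CSV with a header row, profile each column, write an HTML table."""
    # Imported here so column_summary can be used without Spark installed.
    import pandas as pd
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("csv-profile").getOrCreate()
    df = spark.read.csv(path, header=True)  # first line supplies column names
    total = df.count()

    rows = []
    for name, dtype in df.dtypes:
        col = F.col(f"`{name}`")  # backticks protect names containing dots
        stats = df.agg(
            F.count(col).alias("non_null"),          # count() skips nulls
            F.countDistinct(col).alias("distinct"),  # nulls are not counted
        ).first()
        rows.append(
            column_summary(name, dtype, total, stats["non_null"], stats["distinct"])
        )

    # pandas is used only for the final formatting step.
    pd.DataFrame(rows).to_html(out_html, index=False)
```

Calling profile_csv("input.csv", "profile.html") (both file names are placeholders) would write one table row per column. Running one aggregation per column is simple but slow on wide files; the aggregations could be combined into a single pass if that matters.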
The file I used is a CSV file with a header row. To get the data types I used df.dtypes, so a date column will be shown as string, since that is the type Spark assigned to it when reading the CSV.
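If you want a date column reported as a date rather than a string, one option is to cast it before profiling. The sketch below is an assumption on my part and not part of the original code: the helper names and the "yyyy-MM-dd" format are placeholders you would adapt to your file.

```python
def string_columns(dtypes):
    """Return the names of string-typed columns from df.dtypes-style pairs."""
    return [name for name, dtype in dtypes if dtype == "string"]


def cast_to_date(df, column, fmt="yyyy-MM-dd"):
    """Return a new DataFrame with `column` parsed as a date using `fmt`."""
    # Imported here so string_columns can be used without Spark installed.
    from pyspark.sql import functions as F

    return df.withColumn(column, F.to_date(F.col(column), fmt))
```

For example, string_columns(df.dtypes) lists the candidates, and cast_to_date(df, "dob") would convert a hypothetical "dob" column, after which df.dtypes reports it as date.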
The output will look like this.
For any custom enhancements or feedback, please contact me at dileep.psdk@gmail.com