Of late, thanks to the industry-wide push to make the most of available data, there has been a surge of applications and programs that help you make data work for you. Hooked on machine learning and data analytics for the past two years, I have been trying out different software and coding platforms that make both easier for a mechanical engineer with a decent coding background like me. This weekend, I tried my hand at Data Science Studio (DSS) by Dataiku, a Paris-based company that helps enterprises develop AI and data analytics workflows. One special aspect of this software is its duality: it pleases both "clickers" and "coders". I tried the visual click-and-drag workflow, and will elaborate on my experience so far in this post.

To get accustomed to Dataiku DSS, you can either download it to your personal computer or use the web-based trial for 14 days. The Learn section hosts a variety of get-started tutorials that help you get on board with the interface.

For my first experiment with Dataiku, I used the loan prediction data set downloaded from Analytics Vidhya. The task is to predict whether an applicant's loan request will be approved, based on parameters such as applicant income, loan period, geographic factors, dependents, and credit history. The crux of the problem is to develop a feature-based predictive model that helps decide on loan approval for a particular applicant. I will refrain from delving deep into the problem itself and focus instead on the Dataiku modeling interface.

Starting with a blank project, you can upload the "train" data set into the DSS project dashboard. An interesting and intuitive aspect of Dataiku DSS is the visual flow, which represents the data's journey through the model as we apply different processing "recipes" to it. Once the data is imported, DSS automatically parses each column and categorizes it as integer, decimal, bigint, string, Boolean, etc. You can also see columns being classified into categories such as "Gender" or "Date/Time".

[Figure: Loan Prediction data columns as viewed on importing the data set into Dataiku DSS.]

In this data exploration stage, you can investigate samples of the total data to identify shortcomings such as missing values, invalid data, and outliers. Take, for example, the Dependents column in the train data set. DSS has categorized most of its values as "integer"; however, the red portion of the bar in the column header indicates that some values may not be integers. Dependents are entered as 0, 1, 2, or 3+. For the sake of model building, we can replace 3+ with an integer, say 3. To do this, you can add a data processing step called Replace from the Formulas bar.

[Figure: Processors library displaying the different actions that can be performed to clean the data, based on the data type.]

[Figure: The "Replace" operation from the formulae list replaces all cells in the Dependents column displaying 3+ with 3.]

Next, for numerical variables, we may need to perform some univariate analyses, such as checking for normality, analyzing outliers, and imputing missing cells with desired values. All such analyses can be performed by clicking on the desired column and selecting Analyze.
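For the "coders" among us, the Dependents cleanup described in this post (capping 3+ at 3) can be roughly sketched in plain Python. This is only an illustration of the idea, not what DSS runs internally; the function name `clean_dependents` and the sample values are made up for the example, and blank cells are simply marked as `None` so they can be imputed later.

```python
def clean_dependents(raw_values):
    """Map raw Dependents entries ("0", "1", "2", "3+", or blank) to integers.

    "3+" is capped at 3, as suggested in the post. Blank or unrecognized
    entries become None so they can be imputed in a later step.
    """
    cleaned = []
    for value in raw_values:
        text = (value or "").strip()
        if text == "3+":
            cleaned.append(3)
        elif text.isdigit():
            cleaned.append(int(text))
        else:
            cleaned.append(None)  # missing or invalid cell
    return cleaned


# Hypothetical sample values, not taken from the actual data set.
sample = ["0", "1", "3+", "", "2", "3+"]
cleaned = clean_dependents(sample)

print(cleaned)  # [0, 1, 3, None, 2, 3]
print(sum(v is None for v in cleaned), "missing value(s)")
```

Capping at 3 loses the distinction between three and more dependents, which is the same trade-off the Replace step makes in the visual workflow.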