Consulting and Services

Natural Language Processing

We apply NLP techniques and evaluate a range of classification models on various datasets. In our experiments, Naive Bayes, Random Forest, Logistic Regression, Decision Tree, and KNN gave the better results, while SVM and kernel SVM performed poorly.
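As an illustration of this kind of model comparison, here is a minimal sketch in Python with scikit-learn. It is not our production code: the toy corpus, the labels, the bag-of-words features, and the function name compare_models are all assumptions made for the example, and only some of the models named above are included.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus: 1 = positive review, 0 = negative review.
texts = [
    "great product works well", "excellent service very happy",
    "love it highly recommend", "fantastic quality fast shipping",
    "very good value great buy", "happy with this purchase",
    "terrible product broke fast", "awful service very unhappy",
    "hate it do not recommend", "poor quality slow shipping",
    "very bad value waste of money", "unhappy with this purchase",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def compare_models(texts, labels, cv=3):
    """Return the mean cross-validated accuracy for each candidate model."""
    models = {
        "naive_bayes": MultinomialNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "knn": KNeighborsClassifier(n_neighbors=3),
    }
    scores = {}
    for name, model in models.items():
        # Bag-of-words features feed each classifier inside one pipeline,
        # so the vocabulary is learned on the training folds only.
        pipeline = make_pipeline(CountVectorizer(), model)
        scores[name] = cross_val_score(pipeline, texts, labels, cv=cv).mean()
    return scores


scores = compare_models(texts, labels)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

On a corpus this small the ranking is not meaningful; the point is only the shape of the comparison loop.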

Machine Learning Techniques for the LendingClub

This project has 887,379 observations and 74 variables. First we perform data exploration and analysis. Next we select the important variables for the data mining techniques. The best-fitting models, in order, are gradient boosting (91.21%), recursive partitioning (88.65%), logistic regression (69.54%), and random forest (67.34%).

Data Preprocessing / Analysis / Visualization

We handle missing data, encode categorical data for both independent and dependent variables, and apply feature scaling to speed up performance while maintaining accuracy. We also explore the correlation between the independent variables and visualize the training/test sets and the predicted models.
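The preprocessing steps described above can be sketched roughly as follows in Python with pandas and scikit-learn. The column names and values are hypothetical, and mean / most-frequent imputation, one-hot encoding, and standard scaling are only one reasonable set of choices, not necessarily the ones used on a given project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data with missing values in numeric and categorical columns.
df = pd.DataFrame({
    "loan_amount": [5000.0, 12000.0, np.nan, 8000.0],
    "income": [40000.0, np.nan, 65000.0, 52000.0],
    "grade": ["A", "B", "A", np.nan],
})
numeric_cols = ["loan_amount", "income"]
categorical_cols = ["grade"]

# Numeric columns: fill gaps with the column mean, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0.0 makes the combined output a dense NumPy array.
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0.0,
)

X = preprocess.fit_transform(df)

# Pairwise correlation between the numeric independent variables.
correlation = df[numeric_cols].corr()
print(X.shape)
print(correlation)
```

Wrapping the steps in a single ColumnTransformer means the same fitted transformations can later be applied to a test set with preprocess.transform, which avoids leaking test-set statistics into training.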

Statistical Analysis

Natural Language Processing (NLP), Data Mining, Data Modeling, Machine Learning, Deep Learning, Statistical Modeling, Artificial Intelligence, Information Retrieval, Regression, Matrix Factorization, Classification, and Clustering.

Tools and IDE

R Programming, Python (NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn), Tableau, SAS, Hadoop (HDFS, MapReduce, Spark), Hive/Pig, Spark, Java, Scala, SQL, C++, MATLAB, PySpark, PyTorch.

Eclipse, Spyder, PyCharm, RStudio, Tableau, Visual Studio, MATLAB


PostgreSQL, SQL, NoSQL, Google BigQuery, Spark, Google Cloud Storage


Monday - Friday: 9am - 6pm