Q: What model selection and data manipulation techniques do you follow to solve a problem?
a. Generally, I try almost everything for most problems.
b. In principle, for:
i. Time series: GARCH, ARCH, regression, ARIMA models
ii. Image classification: deep learning (convolutional nets)
iii. Sound: commonly neural networks
iv. High-cardinality categorical (like text data): linear models, FTRL, Vowpal Wabbit, LibFFM, libFM, SVD
v. Everything else: everything, especially gradient boosting machines (like XGBoost and LightGBM) and deep learning (like Keras, Lasagne, Caffe, Cxxnet)
c. I decide which models to keep or drop in meta-modelling using feature selection techniques. These may be:
i. Forward (cv or not)
ii. Backward (cv or not)
iii. Mixed (or stepwise)
iv. Permutations
v. Using feature importance or similar
vi. Apply some stats logic
d. Data manipulation could be different for every problem:
i. time series: moving averages, derivative, outlier removal
ii. text: TF-IDF, count vectorizers, word2vec, SVD (dimensionality reduction), stemming, spell checking, sparse matrices, likelihood encoding, one-hot encoding (or dummies)
iii. image classification: scaling, resizing, removing noise (smoothing), annotating
iv. sounds: Fourier transforms, MFCC (Mel-frequency cepstral coefficients), low-pass filters
v. everything else:
Notes: deep learning in Python for text problems: Keras (supports sparse data), Gensim (for word2vec)
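
To make the TF-IDF and sparse-matrix items for text a bit more concrete, here is a minimal sketch using only the Python standard library. The function name `tfidf`, the whitespace tokenisation, the smoothed idf formula and the toy documents are all assumptions chosen for illustration, not something taken from the answer above; in practice a library vectoriser such as scikit-learn's `TfidfVectorizer` would normally be used.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse {term: weight} dict per document.

    Assumptions (illustrative only): lowercase + whitespace tokenisation,
    smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2 normalisation per row.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Only terms present in the document are stored, so the row is sparse.
        row = {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in row.values()))
        rows.append({t: w / norm for t, w in row.items()} if norm else row)
    return rows


# Hypothetical toy corpus.
docs = ["the cat sat", "the dog sat", "the bird flew away"]
rows = tfidf(docs)
```

Each row keeps only the terms that occur in that document, which is the same idea as a sparse matrix: with a large vocabulary, almost every entry would otherwise be zero. Terms that occur in fewer documents (here "cat") get a higher weight than terms that occur everywhere (here "the").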