- The overall data mining procedure
- How to carefully analyze the provided data
- Mean Average Precision (MAP): the evaluation metric
- Conditions that make it hard to apply machine learning directly:
  - There are millions of rows, which increases runtime and memory usage for algorithms.
  - There are 100 different clusters, and according to the competition admins, the boundaries are fairly fuzzy, so it will likely be hard to make predictions. As the number of clusters increases, classifiers generally decrease in accuracy.
  - No feature is linearly correlated with the target (hotel_clusters), so fast linear techniques like linear regression are unlikely to work well.
- How to engineer new features ourselves
- How to check the correlation between each feature and the label
- Use PCA or downsampling when the number of rows or features is very large
- How to use Pandas flexibly
- The cross-validation (CV) data sometimes needs to be prepared manually, and its format should match that of the test data
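
To make the Mean Average Precision metric concrete, here is a minimal sketch in plain Python. It assumes each row has exactly one true label and that predictions are ranked lists cut off at `k`; the default `k=5` and the function names are assumptions for illustration, not something stated in these notes.

```python
def average_precision_at_k(actual, predicted, k=5):
    """AP@k for one row with a single true label.

    With one true label, the score is 1/rank of the first correct
    prediction within the top k, or 0.0 if it does not appear.
    """
    for rank, label in enumerate(predicted[:k], start=1):
        if label == actual:
            return 1.0 / rank
    return 0.0


def mean_average_precision(actuals, predictions, k=5):
    """Average the per-row AP@k over all rows."""
    if not actuals:
        return 0.0
    scores = [average_precision_at_k(a, p, k) for a, p in zip(actuals, predictions)]
    return sum(scores) / len(scores)


# Toy example: true labels and ranked guesses for three rows.
toy_actuals = [1, 2, 3]
toy_predictions = [[1, 5, 6], [9, 2, 7], [4, 5, 6]]
toy_map = mean_average_precision(toy_actuals, toy_predictions)
```

In the toy example the first row is correct at rank 1, the second at rank 2, and the third misses entirely, so a correct guess placed earlier in the list is rewarded more than one placed later.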
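
Checking how features relate to the label can be sketched with Pandas as follows. The helper name, the toy columns, and the `label` column are hypothetical; the sketch uses Pearson correlation, which only captures linear relationships and is a rough signal when the target is a category id.

```python
import pandas as pd


def correlation_with_label(df, label):
    """Return each numeric feature's Pearson correlation with `label`,
    ordered from strongest to weakest absolute correlation."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[label].drop(label)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy data with hypothetical column names.
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [4, 3, 2, 1],
        "c": [1, 0, 1, 0],
        "label": [2, 4, 6, 8],
    }
)
toy_corr = correlation_with_label(toy, "label")
```

Values near zero for every feature are the situation described in the list above, where linear models have little to work with.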
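
Downsampling and manual CV preparation could look roughly like the sketch below. It assumes the data has a user identifier and a timestamp column (named `user_id` and `date_time` here purely for illustration), samples whole users so their histories stay intact, and then holds out the most recent rows so the holdout resembles later, unseen test data. PCA is not shown.

```python
import pandas as pd


def downsample_by_group(df, group_col, n_groups, seed=0):
    """Keep all rows belonging to a random sample of groups (e.g. users)."""
    groups = pd.Series(df[group_col].unique())
    keep = groups.sample(n=min(n_groups, len(groups)), random_state=seed)
    return df[df[group_col].isin(keep)]


def time_based_split(df, date_col, cutoff):
    """Split rows into train (before cutoff) and holdout (cutoff or later)."""
    dates = pd.to_datetime(df[date_col])
    cutoff = pd.Timestamp(cutoff)
    return df[dates < cutoff], df[dates >= cutoff]


# Toy data with hypothetical column names and values.
toy_rows = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3, 3, 4, 4],
        "date_time": [
            "2014-01-05", "2014-09-01", "2014-02-10", "2014-08-15",
            "2014-03-20", "2014-10-02", "2014-04-25", "2014-11-11",
        ],
    }
)
toy_sample = downsample_by_group(toy_rows, "user_id", n_groups=2)
toy_train, toy_holdout = time_based_split(toy_sample, "date_time", "2014-08-01")
```

If the real test set contains only certain kinds of rows or omits some columns, the holdout would need the same filtering applied so that both have the same format.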