Data says

This #Halloween2021, we look back at the global temperature record for past Octobers.
Note how much warming has happened in about 100 years, since the first official citywide Halloween celebration in the U.S. (in Anoka, Minn.) in 1921.
👉 By the way, in the natural sciences, the word “anomaly” (as seen in the graphic) refers to a measurement that is different from an expected trend or a model prediction.
In this case, the plot’s data points are compared with the average of the data from 1951 to 1980, which is NASA’s reference baseline. (via @NASA)
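To make the idea of an anomaly concrete, here is a minimal Python sketch: each value has the average of a baseline period (1951 to 1980 by default) subtracted from it. The function name `temperature_anomalies` and the sample temperatures are made up for illustration; they are not NASA data, and NASA's actual analysis is more involved than this.

```python
from statistics import mean


def temperature_anomalies(temps_by_year, base_start=1951, base_end=1980):
    """Return {year: temperature - baseline mean}.

    The baseline mean is the average of all values whose year falls
    within base_start..base_end (inclusive).
    """
    baseline_values = [
        temp for year, temp in temps_by_year.items()
        if base_start <= year <= base_end
    ]
    if not baseline_values:
        raise ValueError("no data in the baseline period")
    baseline = mean(baseline_values)
    return {year: temp - baseline for year, temp in temps_by_year.items()}


# Hypothetical October temperatures in °C (illustrative values only).
october_temps = {1951: 13.5, 1965: 14.0, 1980: 14.5, 2021: 15.0}

anomalies = temperature_anomalies(october_temps)
print(anomalies[2021])  # baseline mean is 14.0, so this prints 1.0
```

A positive anomaly means the month was warmer than the baseline average, and a negative one means it was cooler.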

Kaggle’s 2021 State of Data Science and Machine Learning survey

Top 5 IDEs

  1. Jupyter Notebook
  2. Visual Studio Code
  3. JupyterLab
  4. PyCharm
  5. RStudio

ML Algorithms Usage: Top 9

  1. Linear/logistic regression
  2. Decision trees/random forests
  3. Gradient boosting machines (XGBoost, LightGBM)
  4. Convnets
  5. Bayesian approaches
  6. Dense neural networks (MLPs)
  7. Recurrent neural networks (RNNs)
  8. Transformers (BERT, GPT-3)
  9. GANs

Machine Learning Tools Landscape – Top 8

  1. Scikit-Learn
  2. TensorFlow (tf.keras included)
  3. XGBoost
  4. Keras
  5. PyTorch
  6. LightGBM
  7. CatBoost
  8. Hugging Face 🤗

Cloud Computing Tools – Top 3

  1. AWS
  2. GCP
  3. Microsoft Azure

Enterprise ML Tools – Top 5

  1. Amazon SageMaker
  2. Databricks
  3. Azure ML Studio
  4. Google Cloud Vertex AI
  5. DataRobot

Note: Judging by the graph, more than half of the survey respondents don’t use any of these tools.

Databases – Top 4

  1. MySQL
  2. PostgreSQL
  3. Microsoft SQL Server
  4. MongoDB

CONCLUSIONS:

  1. Notebooks are still the most popular way of experimenting with ML. If you have never used them, try them in VS Code.
  2. Scikit-Learn is ahead of the game.
  3. All you need is XGBoost (CC: @tunguz).
  4. No need for model tracking on Kaggle: there is a leaderboard.