Tag: AI

  • Demand Forecasting with Darts: A Tutorial

    Sandra E.G.

    A hands-on tutorial with Python and Darts for demand forecasting, showcasing the power of TiDE and TFT


    Demand forecasting for retailing companies can become a complex task, as several factors need to be considered from the start of the project to the final deployment. This article provides an overview of the main steps required to train and deploy a demand forecasting model, alongside some tips and recommendations I have gained with experience as a consultant.

    Each section will have two main types of subsections. The first subsections will be dedicated to tips and advice I gathered from working on my projects. The other subsection is called Let’s code! and will include the code snippets of a hands-on Python tutorial. Note that this tutorial is meant to be simple, show how you can use Darts for demand forecasting, and highlight a couple of deep learning models for this task.

    You can access the full source code for the tutorial at: https://github.com/egomezsandra/demand-forecasting-darts.git

    Why demand forecasting isn’t as straightforward as it may seem

    You just landed your first job as a data scientist at a retail company. Your new boss has heard that every big company uses AI to boost its inventory management efficiency, so your first project is to deploy a demand forecasting model. You know all about time series forecasting, but when you start working with the transactional database, you feel unsure and don’t know how to begin.

    In business, use cases have more layers and complexity than simple forecasting or prediction machine learning tasks. This difference is something most of us learn once we gain professional experience. Let’s break down the different steps this guide will go over:

    1. Have you gathered all the relevant data?
    2. Have you checked for wrong values? Are you forecasting sales or demand?
    3. Do you need to forecast all products? Can you extract any more relevant information?
    4. Have you defined a baseline set of predictions? Is your model local or global? What libraries and models did you choose?
    5. How are you evaluating the performance of your model? Will your model’s predictions deteriorate with time?
    6. How will you provide the predictions? Will you deploy the model in the company’s cloud services?

    The dataset

    Have you gathered all the relevant data?

    Let’s assume your company has provided you with a transactional database with sales of different products and different sale locations. This data is called panel data, which means that you will be working with many time series simultaneously.

    The transactional database will probably have the following format: the date of the sale, the location identifier where the sale took place, the product identifier, the quantity, and probably the monetary cost. Depending on how this data is collected, it will be aggregated differently, by time (daily, weekly, monthly) and by group (by customer or by location and product).

    But is this all the data you need for demand forecasting? Yes and no. Of course, you can work with this data and make some predictions, and if the relations between the series are not complex, a simple model might work. But if you are reading this tutorial, you are probably interested in predicting demand when the data is not as simple. In this case, there’s additional information that can be a gamechanger if you have access to it:

    • Historical stock data: It is crucial to be aware of when stockouts occur, as the demand could still be high when sales data doesn’t reflect it.
    • Promotions data: Discounts and promotions can also affect demand as they affect the customers’ shopping behavior.
    • Events data: As discussed later, one can extract time features from the date index. However, holiday data or special dates can also condition consumption.
    • Other domain data: Any other data that could affect the demand for the products you are working with can be relevant to the task.

    Let’s code!

For this tutorial, we will work with monthly sales data aggregated by product and sale location. This example dataset is from the Stallion Kaggle Competition and records beer products (SKU) distributed to retailers through wholesalers (Agencies). The first step is to format the dataset and select the columns that we want to use for training the models. As you can see in the code snippet, we are combining all the event columns into one called ‘special days’ for simplicity. As previously mentioned, this dataset lacks stock data, so if stockouts occurred we could be misinterpreting the real demand.

    # Load data with pandas
    sales_data = pd.read_csv(f'{local_path}/price_sales_promotion.csv')
    volume_data = pd.read_csv(f'{local_path}/historical_volume.csv')
    events_data = pd.read_csv(f'{local_path}/event_calendar.csv')

    # Merge all data
    dataset = pd.merge(volume_data, sales_data, on=['Agency','SKU','YearMonth'], how='left')
    dataset = pd.merge(dataset, events_data, on='YearMonth', how='left')

    # Datetime
    dataset.rename(columns={'YearMonth': 'Date', 'SKU': 'Product'}, inplace=True)
    dataset['Date'] = pd.to_datetime(dataset['Date'], format='%Y%m')

    # Format discounts
    dataset['Discount'] = dataset['Promotions']/dataset['Price']
    dataset = dataset.drop(columns=['Promotions','Sales'])

    # Format events
    special_days_columns = ['Easter Day','Good Friday','New Year','Christmas','Labor Day','Independence Day','Revolution Day Memorial','Regional Games ','FIFA U-17 World Cup','Football Gold Cup','Beer Capital','Music Fest']
    dataset['Special_days'] = dataset[special_days_columns].max(axis=1)
    dataset = dataset.drop(columns=special_days_columns)

    Preprocessing

    Have you checked for wrong values?

    While this part is more obvious, it’s still worth mentioning, as it can avoid feeding wrong data into our models. In transactional data, look for zero-price transactions, sales volume larger than the remaining stock, transactions of discontinued products, and similar.

    Are you forecasting sales or demand?

This is a key distinction we should make when forecasting demand, as the goal is to foresee the demand for products to optimize re-stocking. If we look at sales without observing the stock values, we could be underestimating demand when stockouts occur, thus introducing bias into our models. In this case, we can ignore transactions after a stockout or try to fill those values correctly, for example with a moving average of the demand, as sketched below.
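Since the tutorial’s dataset has no stock column, the following is only a hypothetical sketch of the masking-and-filling idea; the ‘Stock’ and ‘Sales’ columns and the grouping keys are assumptions for illustration.

import numpy as np

# Hypothetical sketch: assumes 'Stock' and 'Sales' columns, which the
# tutorial dataset does not have.
def fill_stockout_demand(group, window=4):
    group = group.sort_values('Date').copy()
    stockout = group['Stock'] <= 0
    # Treat sales during stockouts as unknown demand
    group['Demand'] = group['Sales'].where(~stockout, np.nan)
    # Fill the unknown values with a moving average of the known demand
    moving_avg = group['Demand'].rolling(window=window, min_periods=1).mean()
    group['Demand'] = group['Demand'].fillna(moving_avg)
    return group

# Applied per series (e.g., per location and product):
# df = df.groupby(['Location', 'Product'], group_keys=False).apply(fill_stockout_demand)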

    Let’s code!

    In the case of the selected dataset for this tutorial, the preprocessing is quite simple as we don’t have stock data. We need to correct zero-price transactions by filling them with the correct value and fill the missing values for the discount column.

    # Fill prices
    dataset.Price = np.where(dataset.Price==0, np.nan, dataset.Price)
    dataset.Price = dataset.groupby(['Agency', 'Product'])['Price'].ffill()
    dataset.Price = dataset.groupby(['Agency', 'Product'])['Price'].bfill()

    # Fill discounts
    dataset.Discount = dataset.Discount.fillna(0)

    # Sort
    dataset = dataset.sort_values(by=['Agency','Product','Date']).reset_index(drop=True)

    Feature extraction

    Do you need to forecast all products?

Depending on conditions such as budget, cost savings, and the models you are using, you might not want to forecast the whole catalog of products. Let’s say that after experimenting, you decide to work with neural networks. These are usually costly to train and require more time and resources. If you choose to train and forecast the complete set of products, the costs of your solution will increase, maybe even making it not worth the investment for your company. In this case, a good alternative is to segment the products based on specific criteria, for example using your model to forecast just the products that produce the highest volume of income, as sketched below. The demand for the remaining products could be predicted using a simpler and cheaper model.
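As a rough sketch of what an income-based segmentation could look like with the columns used in this tutorial (Volume and Price), one could keep the series that account for roughly 80% of the revenue; the 80% cut-off is an arbitrary choice for illustration, and the tutorial itself uses a rotation-based segmentation instead.

# Rough sketch of income-based segmentation; the 80% cut-off is arbitrary.
revenue = (
    dataset.assign(Revenue=dataset['Volume'] * dataset['Price'])
    .groupby(['Agency', 'Product'], as_index=False)['Revenue'].sum()
    .sort_values('Revenue', ascending=False)
)
revenue['Cum_share'] = revenue['Revenue'].cumsum() / revenue['Revenue'].sum()
top_series = revenue.loc[revenue['Cum_share'] <= 0.8, ['Agency', 'Product']]
high_income_dataset = dataset.merge(top_series, on=['Agency', 'Product'], how='inner')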

    Can you extract any more relevant information?

Feature extraction can be applied in any time series task, as you can extract some interesting variables from the date index. In demand forecasting tasks in particular, these features are interesting because some consumer habits are seasonal. Extracting the day of the week, the week of the month, or the month of the year could help your model identify these patterns. It is key to encode these features correctly, and I advise you to read about cyclical encoding, as it can be more suitable for time features in some situations.
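For example, a minimal sketch of cyclical encoding for the month of the year, assuming a datetime ‘Date’ column as in this tutorial’s dataset:

import numpy as np

# Cyclical encoding maps the month onto a circle so that December and January
# end up close to each other instead of 12 units apart.
month = dataset['Date'].dt.month
dataset['Month_sin'] = np.sin(2 * np.pi * month / 12)
dataset['Month_cos'] = np.cos(2 * np.pi * month / 12)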

    Let’s code!

The first thing we do in this tutorial is segment our products and keep only the high-rotation ones. Doing this step before performing feature extraction helps reduce performance costs when you have too many low-rotation series that you are not going to use. For computing rotation, we only use training data, so we define the data splits beforehand. Notice that we have two dates for the validation set: VAL_DATE_IN indicates the dates that also belong to the training set but can be used as input to the validation set, and VAL_DATE_OUT indicates the point from which timesteps are used to evaluate the models’ output. In this case, we tag as high-rotation all series that have sales in at least 75% of the year, but you can play around with the function implemented in the source code. After that, we perform a second segmentation to ensure that we have enough historical data to train the models.

    # Split dates 
    TEST_DATE = pd.Timestamp('2017-07-01')
    VAL_DATE_OUT = pd.Timestamp('2017-01-01')
    VAL_DATE_IN = pd.Timestamp('2016-01-01')
    MIN_TRAIN_DATE = pd.Timestamp('2015-06-01')

    # Rotation
    rotation_values = rotation_tags(dataset[dataset.Date<VAL_DATE_OUT], interval_length_list=[365], threshold_list=[0.75])
    dataset = dataset.merge(rotation_values, on=['Agency','Product'], how='left')
    dataset = dataset[dataset.Rotation=='high'].reset_index(drop=True)
    dataset = dataset.drop(columns=['Rotation'])

    # History
first_transactions = dataset[dataset.Volume!=0].groupby(['Agency','Product'], as_index=False).agg(
    First_transaction=('Date', 'min'),
)
    dataset = dataset.merge(first_transactions, on=['Agency','Product'], how='left')
    dataset = dataset[dataset.Date>=dataset.First_transaction]
    dataset = dataset[MIN_TRAIN_DATE>=dataset.First_transaction].reset_index(drop=True)
    dataset = dataset.drop(columns=['First_transaction'])

As we are working with monthly aggregated data, there aren’t many time features to be extracted. In this case, we include the position, which is just a numerical index of the order of the series. Time features can be computed at training time by specifying them to Darts via encoders. Moreover, we also compute the moving average and exponential moving average of the previous four months.

    dataset['EMA_4'] = dataset.groupby(['Agency','Product'], group_keys=False).apply(lambda group: group.Volume.ewm(span=4, adjust=False).mean())
    dataset['MA_4'] = dataset.groupby(['Agency','Product'], group_keys=False).apply(lambda group: group.Volume.rolling(window=4, min_periods=1).mean())

    # Darts' encoders
encoders = {
    "position": {"past": ["relative"], "future": ["relative"]},
    "transformer": Scaler(),
}

    Training the model

    Have you defined a baseline set of predictions?

As in other use cases, before training any fancy models, you need to establish a baseline that you want to beat. Usually, when choosing a baseline model, you should aim for something simple that has barely any cost. A common practice in this field is using the moving average of demand over a time window as a baseline. This baseline can be computed without requiring any models, but for code simplicity, in this tutorial we will use Darts’ baseline model, NaiveMovingAverage.

    Is your model local or global?

    You are working with multiple time series. Now, you can choose to train a local model for each of these time series or train just one global model for all the series. There is not a ‘right’ answer, both work depending on your data. If you have data that you believe has similar behaviors when grouped by store, types of products, or other categorical features, you might benefit from a global model. Moreover, if you have a very high volume of series and you want to use models that are more costly to store once trained, you may also prefer a global model. However, if after analyzing your data you believe there are no common patterns between series, your volume of series is manageable, or you are not using complex models, choosing local models may be best.

    What libraries and models did you choose?

    There are many options for working with time series. In this tutorial, I suggest using Darts. Assuming you are working with Python, this forecasting library is very easy to use. It provides tools for managing time series data, splitting data, managing grouped time series, and performing different analyses. It offers a wide variety of global and local models, so you can run experiments without switching libraries. Examples of the available options are baseline models, statistical models like ARIMA or Prophet, Scikit-learn-based models, Pytorch-based models, and ensemble models. Interesting options are models like Temporal Fusion Transformer (TFT) or Time Series Deep Encoder (TiDE), which can learn patterns between grouped series, supporting categorical covariates.

    Let’s code!

The first step in using the different Darts models is to turn the pandas DataFrames into Darts TimeSeries objects and split them correctly. To do so, I have implemented two functions that use Darts’ functionalities to perform these operations. The features for prices, discounts, and events will be known at forecasting time, while for the calculated features we will only know past values.

    # Darts format
series_raw, series, past_cov, future_cov = to_darts_time_series_group(
    dataset=dataset,
    target='Volume',
    time_col='Date',
    group_cols=['Agency','Product'],
    past_cols=['EMA_4','MA_4'],
    future_cols=['Price','Discount','Special_days'],
    freq='MS',  # first day of each month
    encode_static_cov=True,  # so that the models can use the categorical variables (Agency & Product)
)

    # Split
train_val, test = split_grouped_darts_time_series(
    series=series,
    split_date=TEST_DATE
)

train, _ = split_grouped_darts_time_series(
    series=train_val,
    split_date=VAL_DATE_OUT
)

_, val = split_grouped_darts_time_series(
    series=train_val,
    split_date=VAL_DATE_IN
)

    The first model we are going to use is the NaiveMovingAverage baseline model, to which we will compare the rest of our models. This model is really fast as it doesn’t learn any patterns and just performs a moving average forecast given the input and output dimensions.

    maes_baseline, time_baseline, preds_baseline = eval_local_model(train_val, test, NaiveMovingAverage, mae, prediction_horizon=6, input_chunk_length=12)

    Normally, before jumping into deep learning, you would try using simpler and less costly models, but in this tutorial, I wanted to focus on two special deep learning models that have worked well for me. I used both of these models to forecast the demand for hundreds of products across multiple stores by using daily aggregated sales data and different static and continuous covariates, as well as stock data. It is important to note that these models work better than others specifically in long-term forecasting.

The first model is the Temporal Fusion Transformer. This model allows you to work with many time series simultaneously (i.e., it is a global model) and is very flexible when it comes to covariates. It works with static, past (values only known in the past), and future (values known in both the past and future) covariates. It manages to learn complex patterns and supports probabilistic forecasting. The only drawback is that, while it is well optimized, it can be costly to tune and train. In my experience, it can give very good results, but tuning the hyperparameters takes too much time if you are short on resources. In this tutorial, we are training the TFT with mostly the default parameters, and the same input and output windows that we used for the baseline model.

    # PyTorch Lightning Trainer arguments
early_stopping_args = {
    "monitor": "val_loss",
    "patience": 50,
    "min_delta": 1e-3,
    "mode": "min",
}

pl_trainer_kwargs = {
    "max_epochs": 200,
    # "accelerator": "gpu",  # uncomment for gpu use
    "callbacks": [EarlyStopping(**early_stopping_args)],
    "enable_progress_bar": True,
}

common_model_args = {
    "output_chunk_length": 6,
    "input_chunk_length": 12,
    "pl_trainer_kwargs": pl_trainer_kwargs,
    "save_checkpoints": True,  # checkpoint to retrieve the best performing model state
    "force_reset": True,
    "batch_size": 128,
    "random_state": 42,
}

# TFT params
best_hp = {
    'optimizer_kwargs': {'lr': 0.0001},
    'loss_fn': MAELoss(),
    'use_reversible_instance_norm': True,
    'add_encoders': encoders,
}

    # Train
    start = time.time()
    ## COMMENT TO LOAD PRE-TRAINED MODEL
fit_mixed_covariates_model(
    model_cls=TFTModel,
    common_model_args=common_model_args,
    specific_model_args=best_hp,
    model_name='TFT_model',
    past_cov=past_cov,
    future_cov=future_cov,
    train_series=train,
    val_series=val,
)
time_tft = time.time() - start

# Predict
best_tft = TFTModel.load_from_checkpoint(model_name='TFT_model', best=True)
preds_tft = best_tft.predict(
    series=train_val,
    past_covariates=past_cov,
    future_covariates=future_cov,
    n=6,
)

    The second model is the Time Series Deep Encoder. This model is a little bit more recent than the TFT and is built with dense layers instead of LSTM layers, which makes the training of the model much less time-consuming. The Darts implementation also supports all types of covariates and probabilistic forecasting, as well as multiple time series. The paper on this model shows that it can match or outperform transformer-based models on forecasting benchmarks. In my case, as it was much less costly to tune, I managed to obtain better results with TiDE than I did with the TFT model in the same amount of time or less. Once again for this tutorial, we are just doing a first run with mostly default parameters. Note that for TiDE the number of epochs needed is usually smaller than for the TFT.

    # PyTorch Lightning Trainer arguments
early_stopping_args = {
    "monitor": "val_loss",
    "patience": 10,
    "min_delta": 1e-3,
    "mode": "min",
}

pl_trainer_kwargs = {
    "max_epochs": 50,
    # "accelerator": "gpu",  # uncomment for gpu use
    "callbacks": [EarlyStopping(**early_stopping_args)],
    "enable_progress_bar": True,
}

common_model_args = {
    "output_chunk_length": 6,
    "input_chunk_length": 12,
    "pl_trainer_kwargs": pl_trainer_kwargs,
    "save_checkpoints": True,  # checkpoint to retrieve the best performing model state
    "force_reset": True,
    "batch_size": 128,
    "random_state": 42,
}

# TiDE params
best_hp = {
    'optimizer_kwargs': {'lr': 0.0001},
    'loss_fn': MAELoss(),
    'use_layer_norm': True,
    'use_reversible_instance_norm': True,
    'add_encoders': encoders,
}

    # Train
    start = time.time()
    ## COMMENT TO LOAD PRE-TRAINED MODEL
fit_mixed_covariates_model(
    model_cls=TiDEModel,
    common_model_args=common_model_args,
    specific_model_args=best_hp,
    model_name='TiDE_model',
    past_cov=past_cov,
    future_cov=future_cov,
    train_series=train,
    val_series=val,
)
time_tide = time.time() - start

# Predict
best_tide = TiDEModel.load_from_checkpoint(model_name='TiDE_model', best=True)
preds_tide = best_tide.predict(
    series=train_val,
    past_covariates=past_cov,
    future_covariates=future_cov,
    n=6,
)

    Evaluating the model

    How are you evaluating the performance of your model?

While typical time series metrics are useful for evaluating how good your model is at forecasting, it is recommended to go a step further. First, when evaluating against a test set, you should discard all series that have stockouts, as you wouldn’t be comparing your forecast against real data. Second, it is also interesting to incorporate domain knowledge or KPIs into your evaluation. One key metric could be how much money you would be earning with your model by avoiding stockouts. Another could be how much money you are saving by avoiding overstocking short shelf-life products. Depending on the stability of your prices, you could even train your models with a custom loss function, such as a price-weighted Mean Absolute Error (MAE) loss.
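As an illustration, here is a minimal sketch of a price-weighted MAE, assuming aligned arrays of actuals, predictions, and unit prices; errors on expensive products weigh more heavily in the final score.

import numpy as np

# Minimal sketch of a price-weighted MAE over aligned arrays.
def price_weighted_mae(y_true, y_pred, prices):
    abs_errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    weights = np.asarray(prices)
    return np.sum(weights * abs_errors) / np.sum(weights)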

    Will your model’s predictions deteriorate with time?

Dividing your data into a train, validation, and test split is not enough to evaluate the performance of a model that could go into production. By evaluating only a short window of time with the test set, your model choice is biased by how well your model performs in that very specific predictive window. Darts provides an easy-to-use implementation of backtesting, allowing you to simulate how your model would perform over time by forecasting moving windows of time. With backtesting you can also simulate retraining the model every N steps.

    Let’s code!

    If we look at our models’ results in terms of MAE across all series we can see that the clear winner is TiDE, as it manages to reduce the baseline’s error the most while keeping the time cost fairly low. However, let’s say that our beer company’s best interest is to reduce the monetary cost of stockouts and overstocking equally. In that case, we can evaluate the predictions using a price-weighted MAE.


After computing the price-weighted MAE for all series, TiDE is still the best model, although it could have been otherwise. If we compute the improvement of TiDE with respect to the baseline model, it is 6.11% in terms of MAE, and slightly larger in terms of monetary cost. Conversely, for the TFT, the improvement is greater when looking at just sales volume than when prices are taken into account.


    For this dataset, we aren’t using backtesting to compare predictions because of the limited amount of data due to it being monthly aggregated. However, I encourage you to perform backtesting with your projects if possible. In the source code, I include this function to easily perform backtesting with Darts:

def backtesting(model, series, past_cov, future_cov, start_date, horizon, stride):
    historical_backtest = model.historical_forecasts(
        series, past_cov, future_cov,
        start=start_date,
        forecast_horizon=horizon,
        stride=stride,  # Predict every N months
        retrain=False,  # Keep the model fixed (no retraining)
        overlap_end=False,
        last_points_only=False
    )
    maes = model.backtest(series, historical_forecasts=historical_backtest, metric=mae)

    return np.mean(maes)

    Deploying the model

    How will you provide the predictions?

    In this tutorial, it is assumed that you are already working with a predefined forecasting horizon and frequency. If this wasn’t provided, it is also a separate use case on its own, where delivery or supplier lead times should also be taken into account. Knowing how often your model’s forecast is required is important as it could require a different level of automation. If your company needs predictions every two months, maybe investing time, money, and resources in the automation of this task isn’t necessary. However, if your company needs predictions twice a week and your model takes longer to make these predictions, automating the process can save future efforts.

    Will you deploy the model in the company’s cloud services?

    Following the previous advice, if you and your company decide to deploy the model and put it into production, it is a good idea to follow MLOps principles. This would allow anyone to easily make changes in the future, without disrupting the whole system. Moreover, it is also important to monitor the model’s performance once in production, as concept drift or data drift could happen. Nowadays numerous cloud services offer tools that manage the development, deployment, and monitoring of machine learning models. Examples of these are Azure Machine Learning and Amazon Web Services.

    Conclusion

    Now you have a brief introduction to the basics of demand forecasting. We have gone over each step: data extraction, preprocessing and feature extraction, model training and evaluation, and deployment. We have gone through different interesting model options for demand forecasting using only Darts, showcasing the availability of simpler benchmark models, and also the potential of TiDE and TFT models with a first run.

    What to do next?

    Now it’s your turn to incorporate these different tips into your data or play around with this tutorial and other datasets you may find online. There are many models, and each demand forecasting dataset has its peculiarities, so the possibilities are endless.

    Other problems

    There are some other problems that we haven’t covered in this article. One that I have encountered is that sometimes products are discontinued and then substituted by a very similar version with minor changes. Since this affects how much history you have for a product, you need to map these changes, and often you can’t compare the demand of these two products due to the changes made.

    If you can think of other problems related to this use case, I encourage you to share them in the comments and start a discussion.

    References

    [1] N. Vandeput, How To: Machine Learning-Driven Demand Forecasting (2021), Towards Data Science

    [2] N. Vandeput, Data Science & Machine Learning for Demand Forecasting (2023), YouTube

    [3] S. Saci, Machine Learning for Retail Demand Forecasting (2020), Towards Data Science

    [4] E. Ortiz Recalde, AI Frontiers Series: Supply Chain (2023), Towards Data Science

    [5] B. Lim, S. O. Arik, N. Loeff and T. Pfister, Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting (2019), arXiv:1912.09363

    [6] A. Das, W. Kong, A. Leach, S. Mathur, R. Sen and R. Yu, Long-term Forecasting with TiDE: Time-series Dense Encoder (2023), arXiv:2304.08424



  • The Fallacy of Complacent Distroless Containers


    Cristovao Cordeiro

    Making containers smaller is the most popular practice when reducing your attack surface. But how real is this sense of security?


  • Sensor Fusion — KITTI — ‘Lidar-based Obstacle Detection’ — Part-1


    Erol Çıtak

    Mastering Sensor Fusion: LiDAR Obstacle Detection with KITTI Data — Part 1

    How to use Lidar data for obstacle detection with unsupervised learning

    Sensor fusion, multi-modal perception, autonomous vehicles — if these keywords pique your interest, this Medium blog is for you. Join me as I explore the fascinating world of LiDAR and color image-based environment understanding, showcasing how these technologies are combined to enhance obstacle detection and decision-making for autonomous vehicles. This blog and the following series dive into practical implementations and theoretical insights, offering an engaging read for all curious eyes.

    In this Medium blog series, we will examine the KITTI 3D Object Detection dataset [1][3] in three distinct parts. In the first article, which is this one, we will be talking about the KITTI Velodyne Lidar sensor and single-mode obstacle detection with this sensor only. In the second article of the series, we will be working on detection studies on color images with a uni-modal approach. In the last article of the series, we will work on multi-modal object detection, which can be called sensor fusion. During that process, both the Lidar and the Color Image sensors come into play to work together.

    One last note before we get into the topic! I promise that I will provide all the theoretical information about each subtopic at a basic level throughout this series 🙂 However, I will also be leaving very high-quality references for each subtopic without forgetting those who want more in-depth information.

    Introduction

KITTI, or the KITTI Vision Benchmark Suite, is a project created in collaboration between the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. We can say that it is a platform that includes many different test scenarios, including 2D/3D object detection, multi-object tracking, semantic segmentation, and so forth.

For 3D object detection, which is the subject of this article series, there are 7481 training and 7518 test samples from different sensors, namely the Velodyne Lidar sensor and the stereo color image sensors.

    A sample image for 3D Object Detection [3] (Image Taken from https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d)

In this blog post, we will perform obstacle detection using Velodyne Lidar point clouds. In this context, reading point clouds, visualization, and segmentation with the help of unsupervised machine learning algorithms will be the main topics. In addition, we will talk a lot about camera calibration and its intrinsic and extrinsic parameters, the RANSAC algorithm for road segmentation, and the basic evaluation metrics needed to measure the performance of the outputs while performing these steps.

    Also, I will be using Python language throughout this series, but don’t worry, I will share with you the information about the virtual environment I use. This way, you can quickly get your own environment up and running. Please check the Github repo to get the requirements.txt file.

    Problem Definition

The main goal of this blog post is to detect obstacles in the environment perceived by the sensor, using unsupervised learning on point clouds obtained with the Velodyne Lidar in the KITTI dataset.

    Within this scope, I am sharing an example Lidar point cloud image below to visualize the problem. If we analyze the following sample point cloud, we can easily recognize some cars at the left bottom or some other objects on the road.

    A sample Lidar point cloud [3] ( from KITTI dataset)

To make it more visible, let me draw some arrows and boxes. In the following image, red arrows indicate cars, orange arrows stand for pedestrians, and red boxes are drawn for street lamps.

    A sample Lidar point cloud [3] ( from KITTI dataset)

    Then, you may wonder and ask this question “Wouldn’t we also say that there are other objects around, perhaps walls or trees?” The answer is YES! The proof of my answer can be obtained from the color image corresponding to this point cloud. As can be seen from the image below, there are people, a car, street lights, and trees on the scene.

    A sample color image [3] ( from KITTI dataset)

After this visual analysis, we come to something careful readers will immediately notice: while the Lidar point cloud provides a 360-degree view of the scene, the color image only provides a limited field of view. The following blog post will consider only this color image for object detection, and the last one will try to fuse the Lidar point cloud and color image sensors to handle the problem (I hope they will be available soon!)

    Sensor Setup

Now let’s talk about the sensors and how they were installed. The KITTI 3D object detection dataset was collected using a specially modified Volkswagen Passat B6. Data recording was handled by an eight-core i7 computer with a RAID system, running Ubuntu Linux alongside a real-time database for efficient data management.

    The following sensors were used for data collection:

    • Inertial Navigation System (GPS/IMU): OXTS RT 3003
    • Lidar Sensor: Velodyne HDL-64E
    • Grayscale Cameras: Two Point Grey Flea 2 (FL2–14S3M-C), each with 1.4 Megapixels
    • Color Cameras: Two Point Grey Flea 2 (FL2–14S3C-C), each with 1.4 Megapixels
    • Varifocal Lenses: Four Edmund Optics NT59–917 (4–8 mm)

    The visualization of the aforementioned setup is presented in the following figure.

    KITTI dataset setup visualization [3] ( Image Taken from KITTI)

The Velodyne Lidar sensor and the color cameras are installed on top of the car, but their heights from the ground and their coordinates differ from each other. No worries! As promised, we will go step by step. This means that, before getting to the core algorithm of this blog post, we need to revisit the camera calibration topic first!

    Camera Calibration

Cameras, or sensors in a broader sense, provide perceptual outputs of the surrounding environment in different ways. In this context, let’s take an RGB camera; it could be your webcam or maybe a professional digital compact camera. It projects 3D points in the world onto a 2D image plane using two sets of parameters: the intrinsic and extrinsic parameters.

    Projection of 3D points in the world to the 2D image plane ( Image taken from: https://de.mathworks.com/help/vision/ug/camera-calibration.html)

    While the extrinsic parameters are about the location and the orientation of the camera in the world frame domain, the intrinsic parameters map the camera coordinates to the pixel coordinates in the image frame.

In this setting, the camera extrinsic parameters can be represented as a matrix T = [R | t], where R stands for the rotation matrix, which is 3×3, and t stands for the translation vector, which is 3×1. As a result, T is a 3×4 matrix that takes a point in the world and maps it to the camera coordinate domain.

On the other hand, the camera’s intrinsic parameters can be represented as a 3×3 matrix. The corresponding matrix, K, can be given as follows. While fx and fy represent the focal lengths of the camera, cx and cy stand for the principal point coordinates, and s indicates the pixel skew.

    The camera’s intrinsic parameters

As a result, any 3D point can be projected onto the 2D image plane via the following complete camera matrix.

    The complete camera matrix to project a 3D world point into the image plane
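In standard pinhole-camera notation (using the same symbols introduced above), the intrinsic matrix and the full projection can be written as:

K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [\, R \mid t \,] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}

where (u, v) are the pixel coordinates, (X_w, Y_w, Z_w) is the 3D world point, and λ is a scale factor coming from the homogeneous coordinates.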

    I know that camera calibration seems a little bit complicated especially if you encounter it for the first time. But I have searched for some really good references for you. Also, I will be talking about the applied camera calibration operations for our problem in the following sections.

    References for the camera calibration topic:

    — Carnegie Mellon University, https://www.cs.cmu.edu/~16385/s17/Slides/11.1_Camera_matrix.pdf

    — Columbia University, https://www.youtube.com/watch?v=GUbWsXU1mac

    — Camera Calibration Medium Post, https://yagmurcigdemaktas.medium.com/visual-perception-camera-calibration-9108f8be789

    Dataset Understanding

After covering some terminology and the required basic theory, we can now get into the problem.

First of all, I highly suggest you download the following parts of the dataset from here [2]:

    • Left Color Images (size is 12GB)
    • Velodyne Point Cloud (size is 29GB)
    • Camera Calibration Matrices of the Object Dataset (size is negligible)
    • Training Labels (size is negligible)

The first data we are going to analyze are the ground truth (G.T.) label files. G.T. files are provided in ‘.txt’ format, and each object is labeled with 15 different fields. No worries, I prepared a detailed G.T. file read function in my Github repo, as follows.

    def parse_label_file(label_file_path):
    """
    KITTI 3D Object Detection Label Fields:

    Each line in the label file corresponds to one object in the scene and contains 15 fields:

    1. Type (string):
    - The type of object (e.g., Car, Van, Truck, Pedestrian, Cyclist, etc.).
    - "DontCare" indicates regions to ignore during training.

    2. Truncated (float):
    - Value between 0 and 1 indicating how truncated the object is.
    - 0: Fully visible, 1: Completely truncated (partially outside the image).

    3. Occluded (integer):
    - Level of occlusion:
    0: Fully visible.
    1: Partly occluded.
    2: Largely occluded.
    3: Fully occluded (annotated based on prior knowledge).

    4. Alpha (float):
    - Observation angle of the object in the image plane, ranging from [-π, π].
    - Encodes the orientation of the object relative to the camera plane.

    5. Bounding Box (4 floats):
    - (xmin, ymin, xmax, ymax) in pixels.
    - Defines the 2D bounding box in the image plane.

    6. Dimensions (3 floats):
    - (height, width, length) in meters.
    - Dimensions of the object in the 3D world.

    7. Location (3 floats):
    - (x, y, z) in meters.
    - 3D coordinates of the object center in the camera coordinate system:
    - x: Right, y: Down, z: Forward.

    8. Rotation_y (float):
    - Rotation around the Y-axis in camera coordinates, ranging from [-π, π].
    - Defines the orientation of the object in 3D space.

    9. Score (float) [optional]:
    - Confidence score for detections (used for results, not training).

    Example Line:
    Car 0.00 0 -1.82 587.00 156.40 615.00 189.50 1.48 1.60 3.69 1.84 1.47 8.41 -1.56

    Notes:
    - "DontCare" objects: Regions ignored during training and evaluation. Their bounding boxes can overlap with actual objects.
    - Camera coordinates: All 3D values are given relative to the camera coordinate system, with the camera at the origin.
    """

The color images are provided as files in a folder and can be read directly, without any further operations. As a result, we can confirm the number of training and testing images: 7481 / 7518.

    The next data that we will be taking into consideration is the calibration files for each scene. As I did before, I prepared another function to parse calibration files as follows.

    def parse_calib_file(calib_file_path):
    """
    Parses a calibration file to extract and organize key transformation matrices.

    The calibration file contains the following data:
    - P0, P1, P2, P3: 3x4 projection matrices for the respective cameras.
    - R0: 3x3 rectification matrix for aligning data points across sensors.
    - Tr_velo_to_cam: 3x4 transformation matrix from the LiDAR frame to the camera frame.
    - Tr_imu_to_velo: 3x4 transformation matrix from the IMU frame to the LiDAR frame.

    Parameters:
    calib_file_path (str): Path to the calibration file.

    Returns:
    dict: A dictionary where each key corresponds to a calibration parameter
    (e.g., 'P0', 'R0') and its value is the associated 3x4 NumPy matrix.

    Process:
    1. Reads the calibration file line by line.
    2. Maps each line to its corresponding key ('P0', 'P1', etc.).
    3. Extracts numerical elements, converts them to a NumPy 3x4 matrix,
    and stores them in a dictionary.

    Example:
    Input file line for 'P0':
    P0: 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
    Output dictionary:
    {
    'P0': [[1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0]]
    }
    """

    The final data is the Velodyne point cloud and they are presented in ‘.bin’ format. In this format, each point cloud line consists of the location of x, y, and z plus the reflectivity score. As before, the corresponding parse function is as follows.

    def read_velodyne_bin(file_path):
    """
    Reads a KITTI Velodyne .bin file and returns the point cloud data as a numpy array.

    :param file_path: Path to the .bin file
    :return: Numpy array of shape (N, 4) where N is the number of points,
    and each point has (x, y, z, reflectivity)

    ### For KITTI's Velodyne LiDAR point cloud, the coordinate system used is forward-right-up (FRU).
    KITTI Coordinate System (FRU):
    X-axis (Forward): Points in the positive X direction move forward from the sensor.
    Y-axis (Right): Points in the positive Y direction move to the right of the sensor.
    Z-axis (Up): Points in the positive Z direction move upward from the sensor.


    ### Units: All coordinates are in meters (m). A point (10, 5, 2) means:

    It is 10 meters forward.
    5 meters to the right.
    2 meters above the sensor origin.
    Reflectivity: The fourth value in KITTI’s .bin files represents the reflectivity or intensity of the LiDAR laser at that point. It is unrelated to the coordinate system but adds extra context for certain tasks like segmentation or object detection.

    Velodyne Sensor Placement:

    The LiDAR sensor is mounted on a vehicle at a specific height and offset relative to the car's reference frame.
    The point cloud captures objects relative to the sensor’s position.

    """

    At the end of this section, all the required files will be loaded and ready to be used.

    For the sample scene, which was presented at the top of this post in the ‘Problem Definition’ section, there are 122794 points in the point cloud.

    But since that amount of information could be hard to analyze for some systems in terms of CPU or GPU power, we may want to reduce the number of points in the cloud. To make it possible we can use the “Voxel Downsampling” operation, which is similar to the “Pooling” operation in deep neural networks. Roughly it divides the complete point cloud into a grid of equally sized voxels and chooses a single point from each voxel.

    print(f"Points before downsampling: {len(sample_point_cloud.points)} ")
    sample_point_cloud = sample_point_cloud.voxel_down_sample(voxel_size=0.2)
    print(f"Points after downsampling: {len(sample_point_cloud.points)}")

    The output of this downsampling looks like this;

    Points before downsampling: 122794
    Points after downsampling: 33122

But it shouldn’t be forgotten that reducing the number of points may cause some loss of information, as might be expected. Another crucial point is the voxel grid size, a hyper-parameter we have to choose: smaller sizes keep a higher number of points, and vice versa.

    But, before getting into the road segmentation by RANSAC, let’s quickly re-visit the Voxel Downsampling operation together.

    Voxel Downsampling

Voxel Downsampling is a technique for creating a downsampled point cloud. It helps to reduce noise and unneeded points, and it also reduces the required computational power, depending on the selected voxel grid size hyperparameter. The visualization of this operation can be given as follows.

    The illustration of Voxel Downsampling (Image taken from https://www.mdpi.com/2076-3417/14/8/3160)

Besides that, the key steps of this algorithm can be sketched as follows.
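Here is a minimal NumPy sketch of the idea, for illustration only (in this tutorial we rely on open3d’s implementation shown below): points are binned into voxels and one representative point, here the centroid, is kept per voxel.

import numpy as np

# Minimal NumPy sketch of voxel downsampling (illustration only; the actual
# implementation used in this tutorial comes from open3d).
def voxel_downsample(points, voxel_size):
    # 1. Assign every point to a voxel via integer division of its coordinates
    voxel_idx = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # 2. Group points that fall into the same voxel
    _, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    # 3. Keep one representative per voxel (here, the centroid)
    sums = np.zeros((inverse.max() + 1, 3))
    np.add.at(sums, inverse, points[:, :3])
    counts = np.bincount(inverse)
    return sums / counts[:, None]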

To apply this operation in practice, we will be using the “open3d” library with a single line:

    sample_point_cloud = sample_point_cloud.voxel_down_sample(voxel_size=0.2)

    In the above single-line code, it can be observed that the voxel size is chosen as 0.2

    RANSAC

    The next step will be segmenting the largest plane, which is the road for our problem. RANSAC, Random Sample Consensus, is an iterative algorithm and works by randomly sampling a subset of the data points to hypothesize a model and then evaluating its fit to the entire dataset. It aims to find the model that best explains the inliers while ignoring the outliers.

While the algorithm is highly robust to extreme outliers, it requires sampling n points at the start (n=2 for a 2D line or n=3 for a 3D plane) and then evaluates how well the resulting mathematical model fits the entire dataset. This means:

— the points chosen at the beginning are crucial

— the number of iterations needed to find the best model is crucial

— it may require significant computational power, especially for large datasets

Still, it is a de-facto operation for many different cases. So first, let’s look at RANSAC fitting a 2D line, and then let me present the key steps of the algorithm.

    The key steps and working flow of the RANSAC algorithm
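To make the working flow concrete, here is a minimal RANSAC sketch for fitting a 2D line y = a*x + b, assuming x and y are NumPy arrays of point coordinates (illustration only; for the road plane we will use open3d’s segment_plane below).

import numpy as np

# Minimal RANSAC sketch for fitting a 2D line y = a*x + b (illustration only).
def ransac_line(x, y, n_iters=100, threshold=0.1):
    best_model, best_inliers = None, np.array([], dtype=int)
    for _ in range(n_iters):
        # 1. Randomly sample the minimal number of points (2 for a line)
        i, j = np.random.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical line, skip this sample
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        # 2. Count the points that fall within the threshold of this line
        inliers = np.where(np.abs(y - (a * x + b)) < threshold)[0]
        # 3. Keep the model that explains the most inliers
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers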

    After reviewing the concept of RANSAC, it is time to apply the algorithm on the point cloud to determine the largest plane, which is a road, for our problem.

    # 3. RANSAC Segmentation to identify the largest plane
    plane_model, inliers = sample_point_cloud.segment_plane(distance_threshold=0.3, ransac_n=3, num_iterations=150)

    ## Identify inlier points -> road
    inlier_cloud = sample_point_cloud.select_by_index(inliers)
    inlier_cloud.paint_uniform_color([0, 1, 1]) # R, G, B format

    ## Identify outlier points -> objects on the road
    outlier_cloud = sample_point_cloud.select_by_index(inliers, invert=True)
    outlier_cloud.paint_uniform_color([1, 0, 0]) # R, G, B format

    The output of this process will show the outside of the road in red and the road will be colored in a mixture of Green and Blue.

    The output of the RANSAC algorithm (Image taken from KITTI dataset [3])

    DBSCAN — a density-based clustering non-parametric algorithm

    At this stage, the detection of objects outside the road will be performed using the segmented version of the road with RANSAC.

    In this context, we will be using unsupervised learning algorithms. However, the question that may come to mind here is “Can’t a detection be made using supervised learning algorithms?” The answer is very short and clear: Yes! However, since we want to introduce the problem and get a quick result with this blog post, we will continue with DBSCAN, which is a segmentation algorithm in the unsupervised learning domain. If you would like to see the results with a supervised learning-based object detection algorithm on point clouds, please indicate this in the comments.

    Anyway, let’s try to answer these three questions: What is DBSCAN and how does it work? What are the hyper-parameters to consider? How do we apply it to this problem?

DBSCAN, also known as a density-based non-parametric clustering algorithm, is an unsupervised clustering algorithm. While there are other unsupervised clustering algorithms, perhaps the most popular being K-Means, DBSCAN can cluster objects of arbitrary shape, whereas K-Means assumes the clusters are spherical. Moreover, probably the most important feature of DBSCAN is that it does not require the number of clusters to be defined or estimated in advance, as the K-Means algorithm does. If you would like to analyze some really good visualizations for specific problems like “2Moons”, you can visit here: https://www.kaggle.com/code/ahmedmohameddawoud/dbscan-vs-k-means-visualizing-the-difference

DBSCAN works much like our eyes: it looks at the densities of different groups in the data and then decides how to cluster them. It has two hyper-parameters: “Epsilon” and “MinimumPoints”. Initially, DBSCAN identifies core points, which are points with at least a minimum number of neighbors (minPts) within a specified radius (epsilon). Clusters are then formed by expanding from these core points, connecting all reachable points within the density criteria. Points that cannot be connected to any cluster are classified as noise. To get in-depth information about this algorithm, including the ‘Core Point’, ‘Border Point’, and ‘Noise Point’ concepts, please see: Josh Starmer, https://www.youtube.com/watch?v=RDZUdRSDOok&t=61s

    A sample clustering result of the DBSCAN algorithm

For our problem, while we could use DBSCAN from the scikit-learn library, let’s use open3d as follows.

    # 4. Clustering using DBSCAN -> To further segment objects on the road
    with o3d.utility.VerbosityContextManager(o3d.utility.VerbosityLevel.Debug) as cm:
    labels = np.array(outlier_cloud.cluster_dbscan(eps=0.45, min_points=10, print_progress=True))

As we can see, ‘epsilon’ was chosen as 0.45 and ‘MinPts’ was chosen as 10. A quick comment on these: since they are hyper-parameters, there are no universally best values. Unfortunately, it’s a matter of trying and measuring success. But no worries! After you read the last chapter of this blog post, “Evaluation Metrics”, you will be able to measure your algorithm’s overall performance. That means you can apply GridSearch (ref: https://www.analyticsvidhya.com/blog/2021/06/tune-hyperparameters-with-gridsearchcv/) to find the best hyper-parameter pairs!

Now let me visualize the output of DBSCAN for our point cloud, and then let’s move on to the next step!

    The output of the DBSCAN clustering algorithm (Image taken from KITTI dataset [3])

To recall, we can see that some of the objects I first showed and marked by hand are separated and shown in different colors here! This shows that these objects belong to different clusters (as they should).

    G.T. Labels and Their Calibration Process

    Now it’s time to analyze G.T. labels and Calibration files of the KITTI 3D Object Detection benchmark. In the previous section, I shared some tips about them like how to read, how to parse, etc.

    But now I want to mention the relation between the G.T. object and the Calibration matrices. First of all, let me share a figure of the G.T. file and the Calibration file side by side.

    A sample training label file in .txt format

    As we discussed before, the last element of the training label refers to the rotation of the object around the y-axis. The three numbers before the rotation element (1.84, 1.47, and 8.41) stand for the 3D location of the object’s centroid in the camera coordinate system.

    A sample calibration file in .txt format

On the calibration file side: P0, P1, P2, and P3 are the camera projection matrices for the corresponding cameras. In this blog post, as indicated before, we are using the ‘Left Color Images’, which corresponds to P2. Also, R0_rect is a rectification matrix for aligning the stereo setup. As can be understood from their names, Tr_velo_to_cam and Tr_imu_to_velo are transformation matrices used to move between different coordinate systems. For example, Tr_velo_to_cam transforms Velodyne coordinates into the unrectified camera coordinate system.
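To make the chain of matrices concrete, here is a minimal sketch of the standard KITTI projection from the Velodyne frame to image pixels. It assumes `calib` is the dictionary returned by parse_calib_file (with the ‘P2’, ‘R0’, and ‘Tr_velo_to_cam’ keys) and pads the matrices to homogeneous form; it is an illustration, not the repo’s exact code.

import numpy as np

# Sketch of the standard KITTI projection chain:
# Velodyne point -> camera frame (Tr_velo_to_cam) -> rectified frame (R0) -> image (P2).
def project_velo_to_image(points_velo, calib):
    Tr = np.vstack([calib['Tr_velo_to_cam'], [0, 0, 0, 1]])  # 4x4 homogeneous
    R0 = np.eye(4)
    R0[:3, :3] = calib['R0'][:3, :3]                         # 4x4 rectification
    P2 = calib['P2']                                         # 3x4 projection

    pts_h = np.hstack([points_velo[:, :3], np.ones((len(points_velo), 1))])
    pts_img = (P2 @ R0 @ Tr @ pts_h.T).T                     # Nx3 homogeneous pixels
    # Normalize homogeneous coordinates to get pixel positions (u, v)
    return pts_img[:, :2] / pts_img[:, 2:3]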

Now that we have carefully established which matrix, and which label, lives in which coordinate system, we can discuss the transformation of G.T. object coordinates into the Velodyne coordinate system. This is a good exercise both for understanding how the matrices move points between coordinate systems and for evaluating our predicted bounding boxes against the G.T. object bounding boxes.

The first thing we will be doing is computing the G.T. object bounding boxes in 3D. To do so, you can refer to the following function in the repo.

    def compute_box_3d(obj, Tr_cam_to_velo):
    """
    Compute the 8 corners of a 3D bounding box in Velodyne coordinates.
    Args:
    obj (dict): Object parameters (dimensions, location, rotation_y).
    Tr_cam_to_velo (np.ndarray): Camera to Velodyne transformation matrix.
    Returns:
    np.ndarray: Array of shape (8, 3) with the 3D box corners.
    """

    Given an object’s dimensions (height, width, length) and position (x, y, z) in the camera coordinate system, this function first rotates the bounding box based on its orientation (rotation_y) and then computes the corners of the box in 3D space.

This computation relies on a matrix capable of transferring any point from the camera coordinate system to the Velodyne coordinate system. But wait, we don’t have a camera-to-Velodyne matrix, do we? Right, we need to calculate it first by taking the inverse of the Tr_velo_to_cam matrix, which is provided in the calibration files.

No worries, this whole workflow is covered by the following functions.

    def transform_points(points, transformation):
    """
    Apply a transformation matrix to 3D points.
    Args:
    points (np.ndarray): Nx3 array of 3D points.
    transformation (np.ndarray): 4x4 transformation matrix.
    Returns:
    np.ndarray: Transformed Nx3 points.
    """
    def inverse_rigid_trans(Tr):
    """
    Inverse a rigid body transform matrix (3x4 as [R|t]) to [R'|-R't; 0|1].
    Args:
    Tr (np.ndarray): 4x4 transformation matrix.
    Returns:
    np.ndarray: Inverted 4x4 transformation matrix.
    """

    In the end, we can easily see the G.T. objects and project them into the Velodyne point cloud coordinate system. Now let’s visualize the output and then jump into the evaluation section!

    The projected G.T. object bounding boxes (Image taken from KITTI dataset [3])

    (I know the green bounding boxes can be a little hard to see, so I added arrows next to them in black.)

    Evaluation Metrics

    Now we have the predicted bounding boxes by our pipeline and G.T. object boxes! Then let’s calculate some metrics to evaluate our pipeline. In order to perform the hyperparameter optimization that we talked about earlier, we must be able to continuously monitor our performance for each parameter group.

But before getting into the evaluation metrics, I need to mention two things. First of all, KITTI has different evaluation criteria for different objects. For example, while a 50% overlap between the predicted labels and the G.T. is sufficient for pedestrians, the threshold is 70% for vehicles. Another issue is that while the pipeline we created performs object detection in a 360-degree environment, the KITTI G.T. labels only include objects within the field of view of the color cameras. Consequently, we may detect more bounding boxes than are present in the G.T. label files. So what can we do? Based on the concepts discussed here, you can reach the final result by carefully analyzing KITTI’s evaluation criteria. For now, though, I will leave a more detailed analysis for the follow-up posts of this blog series.

    To evaluate the predicted bounding boxes and G.T. bounding boxes, we will be using the TP, FP, and FN metrics.

TP represents the predicted boxes that match a G.T. box, FP stands for the predicted boxes that do NOT match any G.T. box, and FN covers the G.T. boxes for which there is no corresponding predicted bounding box.

In this context, of course, we need a tool to measure how well a predicted bounding box and a G.T. bounding box match. That tool is IoU, Intersection over Union.

You can find the IoU and evaluation functions below.

    def compute_iou(box1, box2):
    """
    Calculate the Intersection over Union (IoU) between two bounding boxes.
    :param box1: open3d.cpu.pybind.geometry.AxisAlignedBoundingBox object for the first box
    :param box2: open3d.cpu.pybind.geometry.AxisAlignedBoundingBox object for the second box
    :return: IoU value (float)
    """
    # Function to evaluate metrics (TP, FP, FN)
    def evaluate_metrics(ground_truth_boxes, predicted_boxes, iou_threshold=0.5):
    """
    Evaluate True Positives (TP), False Positives (FP), and False Negatives (FN).
    :param ground_truth_boxes: List of AxisAlignedBoundingBox objects for ground truth
    :param predicted_boxes: List of AxisAlignedBoundingBox objects for predictions
    :param iou_threshold: IoU threshold for a match
    :return: TP, FP, FN counts
    """

Let me finalize this section by showing the predicted bounding boxes (RED) and the G.T. bounding boxes (GREEN) overlaid on the point cloud.

    Predicted bounding boxes and G.T. bounding boxes are presented together on the point cloud (Image taken from KITTI dataset [3])

    Conclusion

    Yeah, it’s a little bit long, but we are about to finish it. First, we have learned a couple of things about the KITTI 3D Object Detection Benchmark and some terminology about different topics, like camera coordinate systems and unsupervised learning, etc.

Interested readers can now extend this study by adding a grid search to find the best hyper-parameters. For example, the minimum number of points in segmentation, the number of RANSAC iterations, or the voxel grid size in the Voxel Downsampling operation could all be potential improvement points.

    What’s next?

The next part will investigate object detection using ONLY the Left Color Camera frames. This is another fundamental step of this series because we will be fusing the Lidar point cloud and color camera frames in the last part of the blog series. Then we will be able to draw a conclusion and answer this question: “Does Sensor Fusion reduce the uncertainty and improve the performance in the KITTI 3D Object Detection Benchmark?”

    Any comments, error fixes, or improvements are welcome!

    Thank you all and I wish you healthy days.


    Github link: https://github.com/ErolCitak/KITTI-Sensor-Fusion/tree/main/lidar_based_obstacle_detection

    References

    [1] — https://www.cvlibs.net/datasets/kitti/

    [2] — https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d

    [3] — Geiger, Andreas, et al. “Vision meets robotics: The kitti dataset.” The International Journal of Robotics Research 32.11 (2013): 1231–1237.

    Disclaimer

    The images used in this blog series are taken from the KITTI dataset for education and research purposes. If you want to use it for similar purposes, you must go to the relevant website, approve the intended use there, and use the citations defined by the benchmark creators as follows.

    For the stereo 2012, flow 2012, odometry, object detection, or tracking benchmarks, please cite:
    @inproceedings{Geiger2012CVPR,
    author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
    title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
    booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2012}
    }

    For the raw dataset, please cite:
    @article{Geiger2013IJRR,
    author = {Andreas Geiger and Philip Lenz and Christoph Stiller and Raquel Urtasun},
    title = {Vision meets Robotics: The KITTI Dataset},
    journal = {International Journal of Robotics Research (IJRR)},
    year = {2013}
    }

    For the road benchmark, please cite:
    @inproceedings{Fritsch2013ITSC,
    author = {Jannik Fritsch and Tobias Kuehnl and Andreas Geiger},
    title = {A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms},
    booktitle = {International Conference on Intelligent Transportation Systems (ITSC)},
    year = {2013}
    }

    For the stereo 2015, flow 2015, and scene flow 2015 benchmarks, please cite:
    @inproceedings{Menze2015CVPR,
    author = {Moritz Menze and Andreas Geiger},
    title = {Object Scene Flow for Autonomous Vehicles},
    booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2015}
    }



  • Start a New Year of Learning on the Right Foot

    TDS Editors

    Feeling inspired to write your first TDS post? We’re always open to contributions from new authors.

    Happy new year! Welcome back to the Variable!

    The ink has barely dried on our 2024 highlights roundup (it’s never too late to browse it, of course), and here we are, ready to dive headfirst into a fresh year of learning, growth, and exploration.

    We have a cherished tradition of devoting the first edition of the year to our most inspiring—and accessible—resources for early-stage data science and machine learning professionals (we really do!). We continue it this year with a selection of top-notch recent articles geared at beginner-level learners and job seekers. For the rest of our readers, we’re thrilled to kick things off with a trio of excellent posts from industry veterans who reflect on the current state of data science and AI, and share their opinionated, bold predictions for what the year ahead might look like. Let’s get started!

    2025: Ready, Set, Go!


    Data science and machine learning, step by step by step

    Thank you for supporting the work of our authors! As we mentioned above, we love publishing articles from new authors; if contributing to TDS in 2025 is one of your new year’s resolutions—or even if you’ve just recently written an interesting project walkthrough, tutorial, or theoretical reflection on any of our core topics—don’t hesitate to share it with us.

    Until the next Variable,

    TDS Team

