Ion Switching Challenge

Subhankar Halder
3 min read · Mar 3, 2020

During my time at the university, my roommate would often take me to his research lab. He was a graduate student at the Department of Computational Biology and frequently spent the nights working on his protein design experiments. His work involved designing proteins to change the behavior of ion channels.

Ion channels act like gatekeepers, controlling whether electrically charged particles called ions can pass through a channel pore. Ion channels are an integral part of all humans, and their detailed study could help us develop pharmaceutical products for various diseases.

When Kaggle started the Ion Switching Challenge, I recalled my days at my roommate’s lab and decided to take a shot at the competition. The challenge is to predict the number of open channels from signal data.

Dataset

The dataset for this competition is not complicated. Here’s a snapshot of the training data:

Train Data Snapshot

So we have just three columns: time, signal, and open_channels. What we do have a lot of is rows.

Here’s a count of open channels in the training set:

In this competition, we need to predict the number of open channels for the signal and time data. Here’s a snapshot of the test data:

Test Data Snapshot

LightGBM + XGBoost + CatBoost

My initial plan was to use XGBoost for this competition. Boosting algorithms usually do well on this kind of data. However, one of the Kagglers shared a kernel that ensembled LightGBM, XGBoost, and CatBoost together. I implemented the shared kernel and got a better score than my single XGBoost model.
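The core of such an ensemble is blending each model's predictions into one. As a minimal sketch, assuming we already have per-row predictions from the three trained regressors (the arrays below are hard-coded stand-ins, not real model output), a simple average blend could look like this:

```python
import numpy as np

# Hypothetical predictions from LightGBM, XGBoost, and CatBoost
# for the same four validation rows. In practice these would come
# from the trained models' predict() calls.
pred_lgb = np.array([0.1, 1.2, 2.9, 4.1])
pred_xgb = np.array([0.0, 0.8, 3.2, 3.9])
pred_cat = np.array([0.2, 1.1, 3.0, 4.0])

# Average the three predictions, then round and clip to the
# valid open_channels range (0 to 10 in this competition).
blend = (pred_lgb + pred_xgb + pred_cat) / 3
open_channels = np.clip(np.round(blend), 0, 10).astype(int)

print(open_channels)  # [0 1 3 4]
```

Weighted averages (or stacking a meta-model on top) are common refinements, but even an unweighted mean often beats any single model.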

I found some Kagglers using UNet to make channel predictions and I was quite taken aback. To my knowledge, UNet is usually used to segment images. This was the first time I saw an image neural network being used to predict structured data.

I asked on the Kaggle discussion forums how such networks could be used to predict ion channel data. The host of the competition replied that images are just data, so one could arrange this competition’s data in the form of an image array and make predictions. And it struck me that he was right! Neural networks make predictions on array data. I was like, “Of course! Why didn’t I think of this? I am so dumb!”
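The idea above boils down to a reshape: a 1-D signal becomes a 2-D array that an image-style network can consume. Here is a minimal sketch with made-up dimensions (the actual kernel authors' shapes are not known to me):

```python
import numpy as np

# Hypothetical 1-D signal of 4000 samples.
signal = np.random.randn(4000).astype(np.float32)

# Reshape to a 2-D "image": each row is a contiguous window
# of the time series.
image = signal.reshape(50, 80)

# Add batch and channel axes -> (batch, height, width, channels),
# the layout most image models such as a UNet expect.
batch = image[np.newaxis, :, :, np.newaxis]
print(batch.shape)  # (1, 50, 80, 1)
```

Once the signal is in this layout, a segmentation network can predict a per-pixel (i.e., per-timestep) class, which maps naturally onto predicting open channels at every time point.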

These are still early days for the competition. I am thinking of implementing CNNs, time series models and LSTMs in the future. Wish me luck!
