LiPCoT

LiPCoT: Linear Predictive Coding based Tokenizer for Self-Supervised Learning of Time Series Data via BERT

LiPCoT (Linear Predictive Coding based Tokenizer for time series) is a novel tokenizer that encodes time series data into a sequence of discrete tokens, enabling self-supervised learning of time series with existing language model architectures such as BERT.
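
For intuition: linear predictive coding models each sample of a signal window as a linear combination of the preceding samples, so the fitted coefficients compactly describe the window's spectral content and can later be quantized into discrete tokens. Below is a minimal, illustrative sketch of LPC coefficient estimation via the autocorrelation method and the Levinson-Durbin recursion; it is not the paper's exact implementation, and the model order and window are arbitrary placeholders:

    import numpy as np

    def lpc_coefficients(window, order=8):
        """Estimate LPC coefficients of one 1-D signal window using the
        autocorrelation method and the Levinson-Durbin recursion."""
        n = len(window)
        # Autocorrelation at lags 0..order
        r = np.array([np.dot(window[:n - k], window[k:]) for k in range(order + 1)])
        a = np.zeros(order + 1)
        a[0] = 1.0                           # a[0] is fixed at 1 by convention
        err = r[0]
        for i in range(1, order + 1):
            acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
            k = -acc / err                   # reflection coefficient
            prev = a.copy()
            for j in range(1, i):
                a[j] = prev[j] + k * prev[i - j]
            a[i] = k
            err *= 1.0 - k * k               # prediction error shrinks with order
        return a[1:]                         # drop the leading 1

    coeffs = lpc_coefficients(np.random.randn(256))   # placeholder window
    print(coeffs)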

Main Article: LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models

Citation

If you use this dataset or code in your research, please cite the following paper:

    @misc{anjum2024lipcot,
        title={LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models},
        author={Md Fahim Anjum},
        year={2024},
        eprint={2408.07292},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
    }

Dataset

We use an EEG dataset of 28 Parkinson's disease (PD) and 28 control participants.

How to run

Steps

1. Data Preparation

First, the data must be processed. The data_processing notebook loads the raw data and prepares the training, validation, and test datasets.
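
The exact preprocessing lives in the notebook; the sketch below only illustrates the key idea of splitting at the participant level, so that no subject's recordings leak across sets. The subject IDs and the 70/15/15 ratio are assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical subject IDs; the dataset has 28 PD and 28 control participants
    subjects = [f"pd_{i:02d}" for i in range(28)] + [f"ctl_{i:02d}" for i in range(28)]
    subjects = rng.permutation(subjects)

    # Split at the subject level (assumed 70/15/15), not at the window level
    n = len(subjects)
    train, val, test = np.split(subjects, [int(0.7 * n), int(0.85 * n)])
    print(len(train), len(val), len(test))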

2. Tokenization via LiPCoT

The data_tokenizer notebook tokenizes the data using the LiPCoT model.
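
LiPCoT's actual quantization scheme is defined in the paper; as a rough illustration of the general idea, per-window LPC features can be mapped to a discrete vocabulary by clustering them and using the cluster index as the token. The vocabulary size and placeholder features below are assumptions:

    import numpy as np
    from sklearn.cluster import KMeans

    # lpc_features: (num_windows, order) array of per-window LPC coefficients,
    # e.g. produced by lpc_coefficients() above; random placeholders here
    lpc_features = np.random.randn(1000, 8)

    # Assumed vocabulary of 64 tokens, learned on the training set only
    codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(lpc_features)
    tokens = codebook.predict(lpc_features)     # one integer token per window
    print(tokens[:20])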

3. Prepare tokenized dataset for BERT

The data_prepare notebook prepares the tokenized datasets for the BERT models. If you are downloading from GitHub, everything up to and including this step has already been done for you.
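
BERT expects fixed-length sequences of integer ids with special tokens and an attention mask. The sketch below shows one plausible way to pack LiPCoT token sequences into that format; the special-token ids, maximum length, and example sequences are assumptions, not the notebook's actual choices:

    from datasets import Dataset

    # Hypothetical LiPCoT token sequences (one list of token ids per recording).
    # Assumed special-token ids: 0 = [PAD], 1 = [CLS], 2 = [SEP].
    sequences = [[5, 17, 42, 8], [9, 9, 23]]
    MAX_LEN = 8

    def pack(ids):
        ids = [1] + ids[:MAX_LEN - 2] + [2]      # add [CLS]/[SEP] and truncate
        pad = MAX_LEN - len(ids)
        return {"input_ids": ids + [0] * pad,
                "attention_mask": [1] * len(ids) + [0] * pad}

    dataset = Dataset.from_list([pack(s) for s in sequences])
    print(dataset[0])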

4. Self-supervised learning via BERT

The pretrain_bert notebook pretrains the BERT model on the tokenized data.

If you are running the code with the data from GitHub, start at this step.
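
BERT pretraining is conventionally done with masked language modeling: a fraction of tokens is hidden and the model is trained to recover them. Below is a compact sketch using Hugging Face Transformers; the model size, vocabulary size, special-token ids, and masking details (the standard 80/10/10 replacement split is omitted) are illustrative assumptions:

    import torch
    from transformers import BertConfig, BertForMaskedLM

    # Assumed vocabulary: 64 LiPCoT tokens + 4 special tokens
    VOCAB, MASK_ID, PAD_ID = 68, 3, 0
    config = BertConfig(vocab_size=VOCAB, hidden_size=128, num_hidden_layers=4,
                        num_attention_heads=4, pad_token_id=PAD_ID)
    model = BertForMaskedLM(config)

    def mask_tokens(input_ids, p=0.15):
        """Hide a fraction p of non-padding tokens and predict only those."""
        labels = input_ids.clone()
        masked = (torch.rand(input_ids.shape) < p) & (input_ids != PAD_ID)
        labels[~masked] = -100                   # loss ignores unmasked positions
        input_ids = input_ids.clone()
        input_ids[masked] = MASK_ID
        return input_ids, labels

    batch = torch.randint(4, VOCAB, (8, 32))     # placeholder token batch
    x, y = mask_tokens(batch)
    loss = model(input_ids=x, labels=y).loss
    loss.backward()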

5. Classification task: with pretrained BERT

The finetune_bert notebook fine-tunes the pretrained BERT model for binary classification.
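
A plausible shape for this step with Hugging Face Transformers: load the pretrained checkpoint into a sequence-classification head and train on labeled batches. The checkpoint directory name and the placeholder batch are assumptions:

    import torch
    from transformers import BertForSequenceClassification

    # "pretrained_lipcot_bert" is a hypothetical directory saved by step 4
    model = BertForSequenceClassification.from_pretrained(
        "pretrained_lipcot_bert", num_labels=2)  # PD vs. control

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    input_ids = torch.randint(4, 68, (8, 32))    # placeholder tokenized batch
    labels = torch.randint(0, 2, (8,))           # placeholder binary labels

    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()

Note that only the encoder weights come from pretraining; the classification head is freshly initialized by from_pretrained.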

6. Classification task: without pretraining

The finetune_bert_without_pretrain notebook uses a randomly initialized BERT model and fine-tunes it for classification.
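
This baseline differs from step 5 only in how the model is constructed: the same architecture is instantiated directly from a config, so all weights start random. A minimal sketch (the sizes are assumptions matching the pretraining sketch above):

    from transformers import BertConfig, BertForSequenceClassification

    # Same assumed architecture as before, but no pretrained weights are loaded
    config = BertConfig(vocab_size=68, hidden_size=128, num_hidden_layers=4,
                        num_attention_heads=4, num_labels=2)
    model = BertForSequenceClassification(config)    # randomly initialized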

7. Classification task: CNN-based architectures

  1. The cnn_classifier notebook uses the CNN model described in Oh et al. (2018).
  2. The deepnet_classifier notebook uses the Deep Convolutional Network described in Schirrmeister et al. (2017).
  3. The shallownet_classifier notebook uses the Shallow Convolutional Network described in Schirrmeister et al. (2017).
  4. The eegnet_classifier notebook uses EEGNet as described in Lawhern et al. (2018). A generic sketch of this family of models is given after this list.
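
None of these architectures is reproduced here, but for orientation, the sketch below shows the general shape of such a model: a small 1-D CNN over (channels x time) EEG windows ending in a linear classifier. It is a generic illustration, not a reimplementation of any of the cited baselines:

    import torch
    import torch.nn as nn

    class TinyEEGCNN(nn.Module):
        """Generic 1-D CNN for (channels, time) EEG windows; not an exact
        reimplementation of any of the published baselines above."""
        def __init__(self, n_channels=1, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1))
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):                    # x: (batch, channels, time)
            return self.classifier(self.features(x).squeeze(-1))

    model = TinyEEGCNN()
    print(model(torch.randn(4, 1, 512)).shape)   # -> torch.Size([4, 2])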