LiPCoT (Linear Predictive Coding based Tokenizer for time series) is a novel tokenizer that encodes time series data into a sequence of tokens, enabling self-supervised learning of time series with existing language model architectures such as BERT.
Main Article: LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models
If you use this dataset or code in your research, please cite the following paper:
@misc{anjum2024lipcot,
  title={LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models},
  author={Md Fahim Anjum},
  year={2024},
  eprint={2408.07292},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
We use an EEG dataset of 28 Parkinson's disease (PD) and 28 control participants.
The raw data are located in the data/raw folder. First, the data must be processed: the data_processing notebook loads the raw data and prepares the training, validation, and test datasets.
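For illustration, here is a minimal sketch of a subject-wise train/validation/test split, so that no participant's recordings leak across splits. The subject id pattern, split ratios, and random seed below are placeholders, not the notebook's actual settings:

```python
import numpy as np
from sklearn.model_selection import train_test_split

subject_ids = [f"sub-{i:02d}" for i in range(56)]   # 28 PD + 28 control (ids hypothetical)
labels = [1] * 28 + [0] * 28                        # 1 = PD, 0 = control

# Hold out whole subjects, stratified by diagnosis.
train_ids, test_ids, y_train, y_test = train_test_split(
    subject_ids, labels, test_size=0.2, stratify=labels, random_state=0)
train_ids, val_ids, y_train, y_val = train_test_split(
    train_ids, y_train, test_size=0.2, stratify=y_train, random_state=0)
```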
The data_tokenizer notebook tokenizes the data using the LiPCoT model.
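For intuition, the toy sketch below shows the general idea of LPC-based tokenization: fit linear predictive coding coefficients to fixed-length frames and quantize the coefficient vectors into a discrete vocabulary with k-means. The frame length, model order, vocabulary size, and the use of scikit-learn's KMeans are illustrative assumptions; the actual LiPCoT tokenizer described in the paper may use different features and quantization.

```python
import numpy as np
from sklearn.cluster import KMeans

def lpc_coefficients(frame, order):
    # Autocorrelation (Yule-Walker) method: solve R a = r for the
    # LPC coefficients of a single frame.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def tokenize(signal, frame_len=256, order=8, vocab_size=64):
    # Split the signal into non-overlapping frames and describe each
    # frame by its LPC coefficient vector.
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    feats = np.array([lpc_coefficients(f, order) for f in frames])
    # Quantize the coefficient vectors into a discrete token vocabulary.
    km = KMeans(n_clusters=vocab_size, n_init=10).fit(feats)
    return km.predict(feats)  # one token id per frame

tokens = tokenize(np.random.randn(256 * 100))  # one token per 256-sample frame
```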
The data_prepare notebook prepares the datasets for the BERT models. If you downloaded the data from GitHub, everything up to this step has already been done for you.
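A hedged sketch of what this packaging might look like with the Hugging Face datasets library; the field names and toy token sequences below are illustrative, not the notebook's actual schema:

```python
from datasets import Dataset

token_sequences = [[5, 12, 12, 40, 7], [3, 3, 19, 44, 8]]  # toy LiPCoT token ids
labels = [1, 0]                                            # 1 = PD, 0 = control

dataset = Dataset.from_dict({
    "input_ids": token_sequences,
    "attention_mask": [[1] * len(seq) for seq in token_sequences],
    "labels": labels,
}).train_test_split(test_size=0.5, seed=0)
```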
The pretrain_bert notebook pretrains the BERT model. If you are running the code with the data downloaded from GitHub, start at this step.
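The sketch below shows a minimal masked-language-modeling pretraining loop with Hugging Face Transformers. The vocabulary size, special tokens, model dimensions, and toy training data are all assumptions made for illustration; the notebook's actual configuration will differ.

```python
from datasets import Dataset
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from transformers import (BertConfig, BertForMaskedLM,
                          DataCollatorForLanguageModeling,
                          PreTrainedTokenizerFast, Trainer, TrainingArguments)

# Wrap a LiPCoT-style vocabulary (64 signal tokens + 5 special tokens,
# chosen arbitrarily here) in a fast tokenizer so the MLM collator can mask.
vocab = {str(i): i for i in range(64)}
vocab.update({"[PAD]": 64, "[UNK]": 65, "[CLS]": 66, "[SEP]": 67, "[MASK]": 68})
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=Tokenizer(WordLevel(vocab, unk_token="[UNK]")),
    pad_token="[PAD]", unk_token="[UNK]", cls_token="[CLS]",
    sep_token="[SEP]", mask_token="[MASK]")

# A small BERT trained from scratch on the token sequences.
model = BertForMaskedLM(BertConfig(vocab_size=len(vocab), hidden_size=128,
                                   num_hidden_layers=4, num_attention_heads=4))

train_data = Dataset.from_dict({"input_ids": [[66, 5, 12, 40, 67]] * 8})  # toy data
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="bert_pretrained",
                                         num_train_epochs=1),
                  train_dataset=train_data, data_collator=collator)
trainer.train()
trainer.save_model("bert_pretrained")  # so fine-tuning can reload the encoder
```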
The finetune_bert notebook fine-tunes the BERT model for binary classification.
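Conceptually, fine-tuning swaps the masked-LM head for a classification head. A sketch, assuming the pretrained encoder was saved to the bert_pretrained directory as in the pretraining sketch above:

```python
from transformers import BertForSequenceClassification

# Load the pretrained encoder and attach a randomly initialized two-class
# head (PD vs. control); training then proceeds with the same Trainer
# setup as above, but on the labeled dataset.
model = BertForSequenceClassification.from_pretrained("bert_pretrained",
                                                      num_labels=2)
```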
The finetune_bert_without_pretrain notebook uses a randomly initialized BERT model and fine-tunes it for classification.
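The no-pretraining baseline builds the same architecture from a config instead of loading a checkpoint, isolating the benefit of pretraining; the dimensions below mirror the earlier sketch and are assumptions:

```python
from transformers import BertConfig, BertForSequenceClassification

# Same architecture, but every weight is randomly initialized.
config = BertConfig(vocab_size=69, hidden_size=128,
                    num_hidden_layers=4, num_attention_heads=4, num_labels=2)
model = BertForSequenceClassification(config)
```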
The cnn_classifier notebook uses the CNN model described in Oh et al. (2018).
The deepnet_classifier notebook uses the Deep Convolutional Network described in Schirrmeister et al. (2017).
The shallownet_classifier notebook uses the Shallow Convolutional Network described in Schirrmeister et al. (2017).
The eegnet_classifier notebook uses EEGNet as described here.
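For orientation, a generic 1-D convolutional EEG classifier in PyTorch is sketched below; it only illustrates the overall shape of these baselines and is not the exact architecture from any of the cited papers (the channel count and layer sizes are placeholders):

```python
import torch
import torch.nn as nn

class SimpleEEGCNN(nn.Module):
    """Generic 1-D CNN over (batch, channels, time) EEG windows."""

    def __init__(self, n_channels=64, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=25, padding=12),
            nn.BatchNorm1d(16), nn.ELU(), nn.AvgPool1d(4),
            nn.Conv1d(16, 32, kernel_size=11, padding=5),
            nn.BatchNorm1d(32), nn.ELU(),
            nn.AdaptiveAvgPool1d(1))          # global average pooling over time
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                     # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))

logits = SimpleEEGCNN()(torch.randn(8, 64, 1000))  # -> shape (8, 2)
```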