LiPCoT (Linear Predictive Coding based Tokenizer for time series) is a novel tokenizer that encodes time series data into a sequence of tokens, enabling self-supervised learning of time series with existing language model architectures such as BERT.
Main Article: LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models
If you use this dataset or code in your research, please cite the following paper:
```
@misc{anjum2024lipcot,
      title={LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models},
      author={Md Fahim Anjum},
      year={2024},
      eprint={2408.07292},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
We use an EEG dataset of 28 Parkinson's disease (PD) and 28 control participants, located in the `data/raw` folder.

First, the data must be processed. The `data_processing` notebook loads the raw data and prepares the training, validation, and test datasets.
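For illustration, a subject-level split might look like the sketch below; the subject naming, split ratios, and label coding are assumptions here, not the notebook's exact procedure.

```python
# Hypothetical sketch: stratified subject-level train/val/test split.
# Subject names, ratios, and label coding are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

subjects = [f"sub{i:02d}" for i in range(56)]   # 28 PD + 28 control participants
labels = np.array([1] * 28 + [0] * 28)          # 1 = PD, 0 = control (assumed coding)

train_subj, test_subj, y_train, y_test = train_test_split(
    subjects, labels, test_size=0.2, stratify=labels, random_state=42)
train_subj, val_subj, y_train, y_val = train_test_split(
    train_subj, y_train, test_size=0.25, stratify=y_train, random_state=42)
```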
The `data_tokenizer` notebook tokenizes the data using the LiPCoT model.
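As a rough illustration of the idea, the sketch below extracts linear predictive coding (LPC) coefficients from sliding windows and quantizes them into a discrete vocabulary with k-means. The actual LiPCoT procedure differs in detail (see the paper), and all parameter values here are assumptions.

```python
# Hypothetical sketch of LPC-based tokenization; not the paper's exact method.
import numpy as np
from sklearn.cluster import KMeans

def lpc_coeffs(x, order=8):
    """Yule-Walker (autocorrelation) LPC coefficients for one window."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def tokenize(signal, win=256, hop=128, order=8, vocab_size=64, seed=0):
    """Slide a window over the signal, extract LPC features, quantize to token IDs.

    Needs at least `vocab_size` windows for k-means to fit.
    """
    feats = np.stack([lpc_coeffs(signal[s:s + win], order)
                      for s in range(0, len(signal) - win + 1, hop)])
    km = KMeans(n_clusters=vocab_size, n_init=10, random_state=seed).fit(feats)
    return km.labels_  # one token per window
```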
The `data_prepare` notebook prepares the tokenized datasets for the BERT models. If you are downloading the data from GitHub, everything up to and including this step has already been done for you.
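A minimal sketch of what preparing token sequences for BERT can look like, assuming the Hugging Face `datasets` library and toy token IDs:

```python
# Hypothetical sketch: packing LiPCoT token sequences into a Hugging Face Dataset.
from datasets import Dataset

token_seqs = [[5, 12, 3, 40], [7, 7, 19, 2]]  # toy token IDs, one list per recording
labels = [0, 1]                               # 0 = control, 1 = PD (assumed coding)

ds = Dataset.from_dict({"input_ids": token_seqs, "labels": labels})
```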
The `pretrain_bert` notebook pretrains the BERT model. If you are running the code with the data from GitHub, start with this step.
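The following is a minimal sketch of masked-language-model pretraining on token sequences with Hugging Face `transformers`. The model size, vocabulary size, masking scheme, and special-token IDs are assumptions, not the paper's configuration.

```python
# Hypothetical sketch: one MLM training step on LiPCoT-style token IDs.
import torch
from transformers import BertConfig, BertForMaskedLM

config = BertConfig(vocab_size=68, hidden_size=128, num_hidden_layers=4,
                    num_attention_heads=4, intermediate_size=256)  # assumed sizes
model = BertForMaskedLM(config)

input_ids = torch.randint(4, 68, (8, 64))  # toy batch; IDs 0-3 reserved (assumed)
labels = input_ids.clone()
mask = torch.rand(input_ids.shape) < 0.15  # mask 15% of positions
labels[~mask] = -100                       # only masked positions contribute to loss
input_ids[mask] = 3                        # assumed [MASK] token ID

out = model(input_ids=input_ids, labels=labels)
out.loss.backward()
```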
The `finetune_bert` notebook fine-tunes the pretrained BERT model for binary classification.
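A minimal fine-tuning sketch, assuming a saved pretrained checkpoint (the path is a placeholder):

```python
# Hypothetical sketch: binary classification head on a pretrained checkpoint.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "path/to/pretrained", num_labels=2)        # placeholder checkpoint path
input_ids = torch.randint(4, 68, (8, 64))      # toy batch of token IDs
labels = torch.randint(0, 2, (8,))             # PD vs. control
out = model(input_ids=input_ids, labels=labels)  # cross-entropy loss + logits
out.loss.backward()
```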
The `finetune_bert_without_pretrain` notebook fine-tunes a randomly initialized BERT model for classification, providing a no-pretraining baseline.
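The no-pretraining baseline differs only in how the model is created: it is built from a config rather than loaded from a checkpoint, so all weights start random. A sketch, with the same assumed hyperparameters as above:

```python
# Hypothetical sketch: random initialization instead of a pretrained checkpoint.
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig(vocab_size=68, hidden_size=128, num_hidden_layers=4,
                    num_attention_heads=4, intermediate_size=256, num_labels=2)
model = BertForSequenceClassification(config)  # random init; then train as before
```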
The `cnn_classifier` notebook uses the CNN model described in Oh et al. (2018).

The `deepnet_classifier` notebook uses the Deep Convolutional Network described in Schirrmeister et al. (2017).

The `shallownet_classifier` notebook uses the Shallow Convolutional Network described in Schirrmeister et al. (2017).

The `eegnet_classifier` notebook uses EEGNet as described in Lawhern et al. (2018). A rough sketch of the common shape of these CNN baselines follows below.
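For orientation only, the sketch below shows the general shape of such an EEG CNN in PyTorch (temporal convolution, spatial convolution, pooling, linear head). It is not any of the cited architectures, and the channel and sample counts are assumptions.

```python
# Hypothetical sketch: a minimal EEG CNN classifier, illustrative only.
import torch
import torch.nn as nn

class TinyEEGNet(nn.Module):
    def __init__(self, n_channels=60, n_samples=512, n_classes=2):  # assumed sizes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(1, 32), padding=(0, 16)),  # temporal conv
            nn.Conv2d(8, 16, kernel_size=(n_channels, 1)),          # spatial conv
            nn.BatchNorm2d(16), nn.ELU(), nn.AvgPool2d((1, 8)), nn.Dropout(0.5),
        )
        with torch.no_grad():  # infer flattened feature size from a dummy input
            n_feat = self.features(torch.zeros(1, 1, n_channels, n_samples)).numel()
        self.classifier = nn.Linear(n_feat, n_classes)

    def forward(self, x):  # x: (batch, 1, channels, samples)
        return self.classifier(self.features(x).flatten(1))
```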