So, you recently discovered Hugging Face and its host of open source models such as BERT, Llama and BART, along with the generative language models from Mistral AI, Facebook, Salesforce and other companies. Now you want to experiment with fine tuning some Large Language Models for your side projects. Things start off great, but then you discover how computationally greedy they are, and you do not have a GPU handy.
Google Colab generously offers you a way to access free computation so you can solve this problem. The downside is that you need to do it all inside a transitory, browser-based environment. To make matters worse, the whole session is time limited, so it seems that no matter what you do, you are going to lose your precious fine-tuned model and all your results when the kernel is eventually shut down and the environment nuked.
Never fear. There is a way around this: make use of Google Drive to save any of your intermediate results or model parameters. This will allow you to continue experimentation at a later stage, or take and use a trained model for inference elsewhere.
To do this you will need a Google account with sufficient Google Drive space for both your training data and your model checkpoints. I will presume you have created a folder called data in Google Drive containing your dataset, and another empty folder called checkpoints.
Inside your Google Colab Notebook you then mount your Drive using the following command:
from google.colab import drive

drive.mount('/content/drive')
You now list the contents of your data and checkpoints directories with the following two commands in a new cell:
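Assuming your data and checkpoints folders sit at the top level of My Drive (which mounts at /content/drive/MyDrive by default), the two listing commands would look something like this:

!ls /content/drive/MyDrive/data
!ls /content/drive/MyDrive/checkpoints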
If these commands work, you now have access to these directories inside your notebook. If they do not work, you might have missed the authorisation step. The drive.mount command above should have spawned a pop-up window that requires you to click through and authorise access. You may have missed the pop-up, or not selected all of the required access rights. Try re-running the cell and checking.
Once you have that access sorted, you can then write your scripts such that models and results are serialised into the Google Drive directories so they persist over sessions. In an ideal world, you would code your training job so that any script that takes too long to run can load partially trained models from the previous session and continue training from that point.
A simple way to achieve this is to write a save function and a load function that your training scripts use. The training process should always check for a partially trained model before initialising a new one. Here is an example save function:
import os
import torch

def save_checkpoint(epoch, model, optimizer, scheduler, loss, model_name, overwrite=True):
    # Bundle the model state and some training metadata into one dictionary.
    checkpoint = {
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'scheduler_state_dict': scheduler.state_dict(),
        'loss': loss
    }
    direc = get_checkpoint_dir(model_name)
    if overwrite:
        file_path = direc + '/checkpoint.pth'
    else:
        file_path = direc + '/epoch_' + str(epoch) + '_checkpoint.pth'
    if not os.path.isdir(direc):
        try:
            os.mkdir(direc)
        except OSError:
            print("Error: directory does not exist and cannot be created")
            # Fall back to a flat file named after the intended directory.
            file_path = direc + '_epoch_' + str(epoch) + '_checkpoint.pth'
    torch.save(checkpoint, file_path)
    print(f"Checkpoint saved at epoch {epoch}")
In this instance we save the model state along with some metadata (the epoch and the loss) inside a dictionary structure. We include an option to either overwrite a single checkpoint file or create a new file for every epoch. We are using the torch.save function, but in principle you could use other serialisation methods. The key idea is that your program can open the checkpoint file and determine how many epochs of training have already been completed, and so decide whether to continue training or move on.
Similarly, in the load function we pass in a reference to the model we wish to use. If there is already a serialised model, we load its parameters into our model and return the number of epochs it was trained for. That epoch value determines how many additional epochs of training are required. If there is no checkpoint, we get the default value of zero epochs and know the model still has the parameters it was initialised with.
def load_checkpoint(model_name, model, optimizer, scheduler):
    direc = get_checkpoint_dir(model_name)
    if os.path.exists(direc):
        # Restore the most recent checkpoint into the supplied model, optimizer and scheduler.
        file_path = get_path_with_max_epochs(direc)
        checkpoint = torch.load(file_path, map_location=torch.device('cpu'))
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
        epoch = checkpoint['epoch']
        loss = checkpoint['loss']
        print(f"Checkpoint loaded from epoch {epoch}")
        return epoch, loss
    else:
        print("No checkpoint found, starting from epoch 1.")
        return 0, None
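Both functions above lean on two small helpers, get_checkpoint_dir and get_path_with_max_epochs. A minimal sketch of how they could be implemented is below; it assumes the checkpoints live under the mounted Drive checkpoints folder and that per-epoch files follow the epoch_<n>_checkpoint.pth naming used in save_checkpoint.

import os
import re

# Assumed location of the checkpoints folder created in Google Drive earlier.
CHECKPOINT_ROOT = '/content/drive/MyDrive/checkpoints'

def get_checkpoint_dir(model_name):
    # One sub-directory per model/experiment name.
    return os.path.join(CHECKPOINT_ROOT, model_name)

def get_path_with_max_epochs(direc):
    # Use the single rolling checkpoint if it exists, otherwise pick the
    # per-epoch file with the highest epoch number in its name.
    rolling = os.path.join(direc, 'checkpoint.pth')
    if os.path.isfile(rolling):
        return rolling
    epochs_and_names = []
    for name in os.listdir(direc):
        match = re.match(r'epoch_(\d+)_checkpoint\.pth$', name)
        if match:
            epochs_and_names.append((int(match.group(1)), name))
    latest_name = max(epochs_and_names)[1]
    return os.path.join(direc, latest_name)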
The save_checkpoint and load_checkpoint functions need to be called inside your training loop, and you must make sure the epoch value returned by load_checkpoint is used as the starting point for your training iterations. The result is a training process that can be restarted when a kernel dies, picking up and continuing from where it left off.
That core training loop might look something like the following:
EPOCHS = 10

for exp in experiments:
    model, optimizer, scheduler = initialise_model_components(exp)
    train_loader, val_loader = generate_data_loaders(exp)
    start_epoch, prev_loss = load_checkpoint(exp, model, optimizer, scheduler)
    for epoch in range(start_epoch, EPOCHS):
        print(f'Epoch {epoch + 1}/{EPOCHS}')
        # ALL YOUR TRAINING CODE HERE
        save_checkpoint(epoch + 1, model, optimizer, scheduler, train_loss, exp)
Note: in this example I am experimenting with training several different model setups (held in a list called experiments), potentially with different training datasets. The supporting functions initialise_model_components and generate_data_loaders take care of providing the correct model and data for each experiment.
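For completeness, here is a minimal sketch of the shape those two supporting functions might take. Everything specific in it, including the BERT sequence classification model, the AdamW learning rate, the dummy TensorDatasets and the experiment registry dictionaries, is an assumption for illustration rather than the actual setup; in this sketch the experiments list would simply be list(EXPERIMENT_MODELS).

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, get_linear_schedule_with_warmup

# Hypothetical experiment registry: checkpoint names and datasets are placeholders.
EXPERIMENT_MODELS = {'bert_baseline': 'bert-base-uncased'}
EXPERIMENT_DATASETS = {
    'bert_baseline': (
        TensorDataset(torch.randint(0, 1000, (64, 32)), torch.randint(0, 2, (64,))),  # dummy train set
        TensorDataset(torch.randint(0, 1000, (16, 32)), torch.randint(0, 2, (16,))),  # dummy val set
    )
}

def initialise_model_components(exp):
    # Build a fresh model, optimiser and scheduler for this experiment.
    model = AutoModelForSequenceClassification.from_pretrained(EXPERIMENT_MODELS[exp], num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=1000)
    return model, optimizer, scheduler

def generate_data_loaders(exp):
    # Wrap this experiment's train/validation datasets in DataLoaders.
    train_ds, val_ds = EXPERIMENT_DATASETS[exp]
    return DataLoader(train_ds, batch_size=16, shuffle=True), DataLoader(val_ds, batch_size=16)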
The core training loop above lets us reuse the same overall code structure to train and serialise each of these models, ensuring that every model reaches the desired number of epochs of training. If we restart the process, it will iterate through the experiment list again, but it will skip any experiments that have already reached the maximum number of epochs, because range(start_epoch, EPOCHS) is empty once the loaded epoch count equals EPOCHS.
Hopefully you can use this boilerplate code to set up your own process for experimenting with training deep learning language models inside Google Colab. Please comment and let me know what you are building and how you use this code.
Massive thank you to Aditya Pramar for his initial scripts that prompted this piece of work.