Practical Tips With Google Colab


1. TL;DR

Here’s a collection of useful Google Colab tips and code snippets.

2. Resources

3. Working with files

Scenario: someone shared a Google Drive link to a dataset (.csv) with you, and you want to read it into Colab for processing.


from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate with PyDrive
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Create a file wrapper from the Drive file ID
downloaded = drive.CreateFile({"id": "INSERT_GOOGLE_DRIVE_FILE_ID_HERE"})

# Download the actual content of the file
downloaded.GetContentFile("FILE_NAME.csv")

# Check that the file is now present in the working directory
!ls

Note: the ID of the file is the string between https://drive.google.com/file/d/ and /view?usp=sharing in the shared link.

You will be required to enter an authorization code from your Google Account. This does NOT mean that the file is copied onto your Google Drive.
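Once GetContentFile has written the CSV into the Colab VM, it can be read like any local file. A minimal sketch with pandas (the in-memory string below is a stand-in for the downloaded file; in Colab you would pass the real filename):

```python
import io
import pandas as pd

# Stand-in for the downloaded file: in Colab you would instead call
# pd.read_csv("FILE_NAME.csv") with the name used in GetContentFile.
csv_text = "a,b\n1,2\n3,4\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)
```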

Download a file to your computer

(TO DO)
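In the meantime, a hedged sketch: inside Colab, google.colab.files.download pushes a file from the VM to your browser. The helper name below is illustrative, and the import is guarded so the snippet degrades gracefully outside Colab:

```python
# Illustrative helper: send a file from the Colab VM to the browser as
# a download. google.colab.files only exists inside Colab, so the
# import is guarded and the function reports availability instead of
# crashing elsewhere.
def download_to_local(path):
    try:
        from google.colab import files
    except ImportError:
        print("Not running in Colab; files.download() is unavailable.")
        return False
    files.download(path)  # triggers a browser download prompt
    return True

download_to_local("FILE_NAME.csv")
```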

4. Imports/Libraries

Installing an older version of pytorch


!pip install torch==1.3.0 torchvision==0.4.1

Importing an older version of TensorFlow

Colab has both TensorFlow 1.x and 2.x installed, with 2.x as the default. To select a version, use the %tensorflow_version magic command.


%tensorflow_version 1.x
import tensorflow
print(tensorflow.__version__)

Do NOT use !pip install to specify a particular TensorFlow version for GPU/TPU backends. Colab builds TensorFlow from source to ensure compatibility with its fleet of accelerators; versions of TensorFlow fetched from PyPI by pip may suffer from performance problems or may not work at all.

Running FastAI


!pip install -Uqq fastbook   # fastbook is not preinstalled in Colab
from fastbook import *
from fastai.vision.widgets import *

Using Pyspark

Note: the Spark/Hadoop versions below change over time; check https://spark.apache.org/downloads.html for the current download URL.

See tutorial here for setup + a quick regression example.

# Install Java (Spark needs a JDK)
!apt-get install openjdk-8-jdk-headless -qq > /dev/null

# Note: check https://spark.apache.org/downloads.html for version/URL changes
!wget https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
!tar xf spark-3.1.2-bin-hadoop3.2.tgz
!pip install -q findspark

# Configure paths for Java/Spark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.2-bin-hadoop3.2"

# Locate the Spark install and start a local session
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
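As a quick smoke test of the setup above, you can build a tiny DataFrame and count its rows. This sketch re-creates the session so the cell is self-contained, and guards the import in case pyspark is missing:

```python
# Smoke-test the Spark setup with a tiny DataFrame. The import is
# guarded so the snippet degrades gracefully where pyspark is absent.
try:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    rows = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    n = rows.count()
    spark.stop()
except ImportError:
    n = None  # pyspark is not installed in this environment

print("row count:", n)
```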

Author: Zhao Du
Reprint policy: unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If reproduced, please indicate the source: Zhao Du.