1. TL;DR
Here’s a collection of useful Google Colab tips/code snippets.
2. Resources
- Neptune.ai article for dealing with files in Colab (TODO: check this thing out!)
- Colab’s official documentation about dealing with files
- PyDrive Documentation
3. Working with files
Reading from a shared Google Drive Link
Scenario: Someone shared a Google Drive link to a dataset (.csv) with you, and you want to read it into Colab for processing.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate with PyDrive
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Read file into wrapper class
downloaded = drive.CreateFile({"id": "INSERT_GOOGLE_DRIVE_FILE_ID_HERE"})
# Get the actual content of the file
downloaded.GetContentFile("FILE_NAME.csv")
# Check that file is actually present in root directory
!ls
Note: the file ID is the string between https://drive.google.com/file/d/ and /view?usp=sharing in the shared link.
You will be asked to enter an authorization code from your Google Account. This does NOT copy the file onto your Google Drive.
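Once GetContentFile has saved the CSV locally, you can load it with pandas. A minimal sketch — the sample file written here just stands in for the downloaded FILE_NAME.csv, which would already exist in your Colab working directory:

```python
import pandas as pd

# Stand-in for the CSV saved by GetContentFile above; in Colab,
# FILE_NAME.csv is already in the working directory, so skip this part.
with open("FILE_NAME.csv", "w") as f:
    f.write("id,value\n1,10\n2,20\n")

# Read the downloaded CSV into a DataFrame for processing
df = pd.read_csv("FILE_NAME.csv")
print(df.shape)  # (2, 2)
```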
Download a file to your computer
(TO DO)
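Until this section is filled in properly, here is a minimal sketch using google.colab.files, assuming the file you want already exists on the Colab VM's filesystem. The module is only importable inside Colab, so the import is guarded:

```python
# Colab-only: send a file from the VM to your local machine via the browser.
try:
    from google.colab import files
    files.download("FILE_NAME.csv")  # triggers a browser download prompt
    in_colab = True
except ImportError:
    in_colab = False  # not running inside Colab; nothing to download
```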
4. Imports/Libraries
Installing an older version of pytorch
!pip install torch==1.3.0 torchvision==0.4.1
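After pinning a version, it is worth confirming what actually got installed. A generic sketch using the standard library (shown here with `pip` as the package name so it runs anywhere, but the same call works for `torch` or `torchvision`):

```python
from importlib.metadata import version, PackageNotFoundError
from typing import Optional

def installed_version(pkg: str) -> Optional[str]:
    """Return the installed version of a package, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# e.g. installed_version("torch") after the pip install above
print(installed_version("pip"))
```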
Importing an older version of TensorFlow
Colab has both TensorFlow 1.x and 2.x installed, with 2.x as the default. To choose a version, use the %tensorflow_version magic command.
%tensorflow_version 1.x
import tensorflow
print(tensorflow.__version__)
Do NOT use !pip install to specify a particular TensorFlow version for GPU/TPU backends. Colab builds TensorFlow from source to ensure compatibility with its fleet of accelerators; versions of TensorFlow fetched from PyPI by pip may suffer from performance problems or may not work at all.
Running FastAI
# fastbook is not preinstalled in Colab
!pip install -Uqq fastbook
from fastbook import *
from fastai.vision.widgets import *
Using Pyspark
Note: still to finish and check; the Spark/Hadoop versions below may be outdated.
See tutorial here for setup + a quick regression example.
# get pyspark
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
#Note, check https://spark.apache.org/downloads.html for version/URL changes
!wget https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
!tar xf spark-3.1.2-bin-hadoop3.2.tgz
!pip install -q findspark
# Configure paths for Java/Spark (SPARK_HOME must match the version downloaded above)
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.2-bin-hadoop3.2"
#
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()