Google Cloud Next is an annual conference focused on Google Cloud Platform (GCP), where Google presents all of the latest features that are coming to the cloud. We get announcements on many new features, updates on existing ones, and even new public betas that are ready for use. There are also hundreds of sessions, panels, and bootcamps to attend. One of our favorite parts of the conference: there are Googlers everywhere! You can directly connect with the product teams and there are endless opportunities for interactive demos, discussions, and networking.
This year’s conference focused on three main topics: Machine Learning & Artificial Intelligence (AI), Data Analytics, and Application Development. We spent three days attending and are excited to share with you our highlights primarily focused around Machine Learning & AI. Here is the top five list of our favorite announcements from Google Cloud Next.
Number 1: BigQuery ML (Machine Learning)
BigQuery just got a huge update! We now have BigQuery ML which is a way for users to create and execute machine learning models directly in BigQuery using standard SQL. We’ve been using BigQuery ML for a few months and it’s awesome! Now that it’s in public beta, you can use it, too!
This is important because Google has made it very easy to train machine learning models inside BigQuery with just a few simple SQL statements. That means no more exporting data back and forth, building separate TensorFlow models in Python, or trying to run off sample data on your local machine. Anyone with a basic understanding of SQL and machine learning now has a quick way to get a model running. And with everything staying inside BigQuery, tasks that were tedious in the past—retraining, prediction, and result analysis—become that much simpler.
If you’re getting started, keep in mind that BigQuery ML is currently limited to two model types: linear regression and logistic regression. So, for complex custom models, TensorFlow and Cloud ML Engine are still the way to go.
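Here’s a minimal sketch of what training and prediction look like. The dataset, table, and column names (`mydataset.visits`, `purchased`, `pageviews`, `country`) are hypothetical stand-ins:

```sql
-- Train a logistic regression model directly in BigQuery.
CREATE MODEL `mydataset.purchase_model`
OPTIONS(model_type = 'logistic_reg') AS
SELECT
  IF(purchased, 1, 0) AS label,  -- BigQuery ML expects the target column to be named `label`
  pageviews,
  country
FROM `mydataset.visits`;

-- Predict on new rows with ML.PREDICT -- no data export needed.
SELECT *
FROM ML.PREDICT(MODEL `mydataset.purchase_model`,
  (SELECT pageviews, country FROM `mydataset.new_visits`));
```

That’s it: one statement to train, one to predict, all inside BigQuery.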
Number 2: BigQuery Clustering
The name of the feature, clustering, may be a bit confusing because it suggests the unsupervised learning technique of the same name, but it has little to do with that. This is about how BigQuery physically organizes your data in storage.
If you regularly use BigQuery, you know that you can partition tables either by ingestion time or by a partition column. Partitioning data is nice because queries that filter on those time-based partitions cost less and perform better.
BigQuery clustering extends that idea: within those partitions, you can now organize a table by one or more columns you frequently filter on. BigQuery will sort data internally based on those columns and store related rows together. So, at query time, there is no need to do a full column scan of the data—only the cluster you want to read from gets scanned. This adds big performance and cost benefits, which is a big win for all!
Below is a comparison of the two features:

| | Partitioning | Clustering |
| --- | --- | --- |
| Cardinality | Less than 10k | Unlimited |
| Dry run pricing | Available | Not available |
| Query pricing | Exact | Best effort |
| Data management | Like a table | Use DML |
Let’s consider an example using stock data. Every stock is essentially a big time series, which is great because we can time-partition the data on the timestamp. Imagine we have a table called “stock” that houses that data; for simplicity, the columns might be “timestamp”, “stock_name”, and “price”. Column “timestamp” is our partition column, and using it we can efficiently navigate date ranges. Usually we’d want to query data for one single stock, so we’d add a WHERE clause where “stock_name” equals “GOOGL”.
That’s OK, but BigQuery will do a full column scan of the “stock_name” column (and of any other column in the SELECT clause), reading a lot of stock names we don’t want and then filtering down to what we do.
With clustering, we can cluster on the “stock_name” column. Behind the scenes, BigQuery will store the data so that when we run the same query again, it reads only from the “GOOGL” cluster—just the data we actually want—avoiding full column scans of the columns in the SELECT clause.
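In DDL terms, the stock example could look like the sketch below (the source table `mydataset.stock_raw` and the date range are made up for illustration):

```sql
-- Create the stock table: partitioned by day, clustered by stock name.
CREATE TABLE `mydataset.stock`
PARTITION BY DATE(timestamp)
CLUSTER BY stock_name AS
SELECT timestamp, stock_name, price
FROM `mydataset.stock_raw`;

-- The date filter prunes partitions; the stock_name filter means BigQuery
-- only reads the blocks belonging to the GOOGL cluster.
SELECT timestamp, price
FROM `mydataset.stock`
WHERE DATE(timestamp) BETWEEN '2018-07-01' AND '2018-07-31'
  AND stock_name = 'GOOGL';
```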
Number 3: Training and online prediction through scikit-learn and XGBoost in Cloud ML Engine
While we love TensorFlow here at AP, we welcome the new additions to Cloud ML Engine. We use scikit-learn a lot, especially in the first stages of the ML cycle, and while we often transition to TensorFlow models later, those aren’t always needed. Now we can easily deploy the scikit-learn models as-is and continue our dive into more complex TensorFlow models only when it pays off.
With the new additions, there’s a quicker path onto Cloud ML Engine since you’re no longer constrained to TensorFlow alone. For us, we hope that means delivering production models faster, as well as iterating on and improving them more efficiently.
There’s not much else to say here except this is now generally available.
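As a sketch of the local half of that workflow: train a scikit-learn model and serialize it with joblib, which produces the `model.joblib` artifact that a scikit-learn model version on Cloud ML Engine is pointed at. The toy data below is a made-up stand-in, and the upload/deploy steps (`gsutil`, `gcloud`) are omitted:

```python
# Train a scikit-learn model locally, then serialize it for deployment.
from sklearn.linear_model import LogisticRegression
import joblib

# Toy training data: one numeric feature, binary label.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# Cloud ML Engine looks for the serialized model file in the Cloud Storage
# directory you point the model version at.
joblib.dump(model, "model.joblib")

# Sanity check before uploading: reload the artifact and predict with it.
restored = joblib.load("model.joblib")
print(restored.predict([[0.5], [2.5]]))
```

From there, it’s a matter of copying `model.joblib` to Cloud Storage and creating a model version against it.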
Number 4: AutoML
Google is bringing ML even closer to developers. Our first highlight above, BigQuery ML, is primarily meant for data analysts who know SQL and for data scientists, but AutoML goes a step further: it’s meant for developers who don’t need to know anything about machine learning. Three versions are in beta: Vision, Natural Language, and Translation.
Take AutoML Vision as an example. Let’s say you have a lot of images of home interiors and you want to be able to say whether an image shows the kitchen, living room, yard, bedroom, etc. What you need to do is pair those labels (kitchen, living room, etc.) with your images and show them to AutoML Vision, which will train on your specific data and get back to you with a fully trained model.
AutoML Natural Language follows the same idea, except it works on text instead of images. Let’s say you have a lot of articles and your labels are article categories (politics, sports, etc.). Again, you label your articles with your categories and expose the data to AutoML Natural Language, which will train on your data and return a model you can use to make predictions on previously unseen articles.
Google already has a Translation API, so the value of AutoML Translation isn’t immediately obvious. The custom models AutoML Translation can provide are most beneficial on jargon-heavy text where the generic Translation API might not perform as well.
All of the versions provide you with a scalable REST API prediction endpoint, so you can easily integrate it with your code and start making predictions.
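As an illustrative sketch of what calling such an endpoint could look like, here is a helper that assembles an AutoML Vision prediction request. The project and model IDs are made up, and the exact URL and payload shape are assumptions to verify against the AutoML documentation for your product:

```python
import base64
import json

def build_predict_request(project, model_id, image_bytes):
    """Assemble the URL and JSON body for an AutoML Vision predict call.

    All identifiers here are hypothetical; check the AutoML docs for the
    exact request format and for how to attach OAuth credentials.
    """
    url = (
        "https://automl.googleapis.com/v1beta1/"
        f"projects/{project}/locations/us-central1/models/{model_id}:predict"
    )
    body = {
        "payload": {
            # Images are sent base64-encoded in the JSON body.
            "image": {"imageBytes": base64.b64encode(image_bytes).decode("ascii")}
        }
    }
    return url, json.dumps(body)

# Hypothetical usage: POST this URL and body with your HTTP client of choice.
url, body = build_predict_request("my-project", "ICN1234567890", b"\x89PNG...")
print(url)
```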
Number 5: New BigQuery UI and Data Studio Explorer
BigQuery UI got a makeover and, yes, you guessed it, standard SQL is now the default. Yes!!!
The UI, for now, has pretty much the same functionality as the old one, but one cool addition is a deeper integration with Data Studio, where you can visualize your data with the click of a button.
The new UI brings many of the little features we missed in the old one, and the look and feel now aligns with the rest of GCP. Searching projects is faster and easier: you can now text-search for the one you want instead of scrolling down a list of them.
Creating and, especially, updating views has also become easier to do.
The new UI is already available to try.
Number 6: New App Engine runtimes and Cloud Functions
This may not be a very exciting announcement to some, but we use App Engine a lot for all sorts of things, and we also use Cloud Functions in many scenarios. It was nice to see Cloud Functions finally out of beta and into general availability, as well as a new Python 3.7 standard App Engine environment being introduced.
That’s it. We promised our top five, but you got six instead. There was a huge list of further announcements; Google published a roundup of 100 of them. We can’t wait for next year!