Public datasets are datasets that BigQuery hosts for you to access and integrate into your applications. Google pays for the storage of these datasets and provides public access to the data via a project. You pay only for the queries that you run on the data; the first 1 TB per month is free.
```go
func queryPartitionedTable(w io.Writer, projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %w", err)
	}
	defer client.Close()
	q := client.Query(fmt.Sprintf(
		"SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31')",
		datasetID, tableID))
	// Run the query and process the returned row iterator.
	it, err := q.Read(ctx)
	if err != nil {
		return fmt.Errorf("query.Read(): %w", err)
	}
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, row)
	}
	return nil
}
```
desired offset
If you would like to start from a specific row, RowIterator has a StartIndex field that you can set to the desired offset.
```go
func queryPartitionedTable(w io.Writer, projectID, datasetID, tableID string, pageOffset uint64) error {
	...
	q := client.Query(fmt.Sprintf(
		"SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31')",
		datasetID, tableID))
	// Run the query and process the returned row iterator.
	it, err := q.Read(ctx)
	if err != nil {
		return fmt.Errorf("query.Read(): %w", err)
	}
	it.StartIndex = pageOffset
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, row)
	}
	return nil
}
```
limiting the number of rows
You can limit the number of rows returned by setting the MaxResults field of the RowIterator.
```go
func queryPartitionedTable(w io.Writer, projectID, datasetID, tableID string, pageOffset uint64, pageSize int) error {
	...
	q := client.Query(fmt.Sprintf(
		"SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31')",
		datasetID, tableID))
	// Run the query and process the returned row iterator.
	it, err := q.Read(ctx)
	if err != nil {
		return fmt.Errorf("query.Read(): %w", err)
	}
	it.StartIndex = pageOffset
	it.MaxResults = pageSize
	for {
		if it.PageInfo().Remaining() == 0 {
			break
		}
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, row)
	}
	return nil
}
```
Note: the snippet above does not work as expected. Once the remaining result size reported by it.PageInfo().Remaining() reaches a certain threshold (the exact size is unclear), the loop exits early and misses some rows.
refactoring the code
Because the PageInfo().Remaining() method does not behave as expected, we introduced a new variable to keep track of the number of rows read.
```go
func queryPartitionedTable(w io.Writer, projectID, datasetID, tableID string, pageOffset uint64, pageSize int) error {
	...
	q := client.Query(fmt.Sprintf(
		"SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31')",
		datasetID, tableID))
	// Run the query and process the returned row iterator.
	it, err := q.Read(ctx)
	if err != nil {
		return fmt.Errorf("query.Read(): %w", err)
	}
	it.StartIndex = pageOffset
	it.MaxResults = pageSize
	var rowsRead int
	for {
		if rowsRead >= pageSize {
			break
		}
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, row)
		rowsRead++
	}
	return nil
}
```
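The offset-plus-counter loop itself can be exercised without touching BigQuery at all. Below is a minimal sketch, using a hypothetical fakeIterator type (not part of any library) that mimics the iterator.Done semantics of bigquery.RowIterator, so you can verify the paging logic in isolation:

```go
package main

import (
	"errors"
	"fmt"
)

// errDone mimics iterator.Done from google.golang.org/api/iterator.
var errDone = errors.New("no more items in iterator")

// fakeIterator is a hypothetical stand-in for bigquery.RowIterator,
// used only to exercise the pagination loop locally.
type fakeIterator struct {
	rows []int
	pos  int
}

// Next copies the next row into dst, or returns errDone when exhausted.
func (it *fakeIterator) Next(dst *int) error {
	if it.pos >= len(it.rows) {
		return errDone
	}
	*dst = it.rows[it.pos]
	it.pos++
	return nil
}

// readPage applies the same logic as the refactored snippet:
// skip to startIndex (standing in for it.StartIndex), then read
// at most pageSize rows, counting them explicitly.
func readPage(it *fakeIterator, startIndex, pageSize int) []int {
	it.pos = startIndex
	var out []int
	for {
		if len(out) >= pageSize { // rowsRead >= pageSize
			break
		}
		var row int
		err := it.Next(&row)
		if errors.Is(err, errDone) {
			break
		}
		out = append(out, row)
	}
	return out
}

func main() {
	it := &fakeIterator{rows: []int{10, 20, 30, 40, 50}}
	fmt.Println(readPage(it, 1, 2)) // the second and third rows
}
```

The counter guarantees the loop never returns more than pageSize rows, and the iterator.Done check handles a short final page.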
Is there a better way to do this?
I suggest checking out the new BigQuery Storage API, and the code samples on the samples page.
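Another workaround, not covered in the original walkthrough, is to push the paging into the SQL itself with LIMIT and OFFSET. The helper below is a hypothetical sketch (its name and parameters are illustrative); note that without an ORDER BY the page contents are not guaranteed to be stable, and BigQuery still bills for the full scan on every page:

```go
package main

import "fmt"

// pageQuery builds a paged variant of the date-range query from the post.
// Hypothetical helper for illustration only; add an ORDER BY clause on a
// unique column if you need deterministic page boundaries.
func pageQuery(datasetID, tableID string, pageSize, pageOffset uint64) string {
	return fmt.Sprintf(
		"SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31') LIMIT %d OFFSET %d",
		datasetID, tableID, pageSize, pageOffset)
}

func main() {
	// The resulting string can be passed to client.Query as before.
	fmt.Println(pageQuery("mydataset", "mytable", 100, 200))
}
```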