BigQuery-Pagination Query in Go

 ·  ☕ 4 

Pagination Query

The following code snippet is an example of a pagination query using the BigQuery Go client library.
Read data with BigQuery API using pagination

public dataset are datasets that BigQuery hosts for you to access and integrate into your applications. Google pays for the storage of these datasets and provides public access to the data via a project. You pay only for the queries that you perform on the data. The first 1 TB per month is free

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
func queryPartitionedTable(w io.Writer, projectID, datasetID, tableID string) error {
        // projectID := "my-project-id"
        // datasetID := "mydataset"
        // tableID := "mytable"
        ctx := context.Background()
        client, err := bigquery.NewClient(ctx, projectID)
        if err != nil {
                return fmt.Errorf("bigquery.NewClient: %w", err)
        }
        defer client.Close()

        q := client.Query(fmt.Sprintf("SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31')", datasetID, tableID))
        // Run the query and process the returned row iterator.
        it, err := q.Read(ctx)
        if err != nil {
                return fmt.Errorf("query.Read(): %w", err)
        }
        for {
                var row []bigquery.Value
                err := it.Next(&row)
                if err == iterator.Done {
                        break
                }
                if err != nil {
                        return err
                }
                fmt.Fprintln(w, row)
        }
        return nil
}

desired offset

if you would like to start from a specific row, RowIterator has a StartIndex field that you can set to the desired offset.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
func queryPartitionedTable(w io.Writer, projectID, datasetID, tableID string,pageOffset uint64) error {
        
	    ...

        q := client.Query(fmt.Sprintf("SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31')", datasetID, tableID))
        // Run the query and process the returned row iterator.
        it, err := q.Read(ctx)
        if err != nil {
                return fmt.Errorf("query.Read(): %w", err)
        }
        it.StartIndex = pageOffset
		
        for {
            var row []bigquery.Value
            err := it.Next(&row)
            if err == iterator.Done {
                break
            }
            if err != nil {
                return err
            }
            fmt.Fprintln(w, row)
        }        
}

limiting the number of rows

You can limit the number of rows returned by setting the MaxResults field of the RowIterator.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
func queryPartitionedTable(w io.Writer, projectID, datasetID, tableID string,pageOffset uint64, pageSize int) error {
        
	    ...

        q := client.Query(fmt.Sprintf("SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31')", datasetID, tableID))
        // Run the query and process the returned row iterator.
        it, err := q.Read(ctx)
        if err != nil {
            return fmt.Errorf("query.Read(): %w", err)
        }
        it.StartIndex = pageOffset
        it.MaxResults = pageSize
		
        for {
            if it.PageInfo().Remaining() == 0 {
                break
			}
            var row []bigquery.Value
            err := it.Next(&row)
            if err == iterator.Done {
                break
            }
            if err != nil {
                return err
            }
            fmt.Fprintln(w, row)
        }        
}

Note: Above code snippet is not working as expected!!!!!!!!!!!
if the query result size, it.PageInfo().Remaining(), reach a magic size(I don’t know the exact size), it will miss some row.

refactoring the code

The PageInfo Remaining method not working as expected, so we created a new variable to keep track of the number of rows read.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
func queryPartitionedTable(w io.Writer, projectID, datasetID, tableID string,pageOffset uint64, pageSize int) error {
        
        ...
        
        q := client.Query(fmt.Sprintf("SELECT * FROM `%s.%s` WHERE `date` BETWEEN DATE('1800-01-01') AND DATE('1899-12-31')", datasetID, tableID))
        // Run the query and process the returned row iterator.
        it, err := q.Read(ctx)
        if err != nil {
                return fmt.Errorf("query.Read(): %w", err)
        }
        it.StartIndex = pageOffset
        it.MaxResults = pageSize
        
        var rowsRead int
        for {
            if rowsRead >= pageSize {
                break
            }
            var row []bigquery.Value
            err := it.Next(&row)
            if err == iterator.Done {
                break
            }
            if err != nil {
                return err
            }
            fmt.Fprintln(w, row)
            rowsRead++
        }        
}

Is there a better way to do this?
suggest checking out the new storage api, and the code sample on the samples page.

reference


Meng Ze Li
Meng Ze Li
Kubernetes / DevOps / Backend