Filtering multiple data sources simultaneously with SQL Parameters

Last updated 1 year ago

Was this helpful?

Filtering multiple data sources simultaneously with SQL Parameters

Context

Data, particularly visualized on a map, provides powerful insights that can guide and accelerate decision-making. However, working with multiple data sources, each of them filled with numerous variables, can be a challenge.

In this tutorial, we're going to show you how to use SQL Parameters to handle multiple data sources at once when building an interactive map with CARTO Builder. We'll be focusing on the start and end locations of Citi Bike trips in New York City, considering different time periods and neighborhoods. By the end, you'll have a well-crafted, interactive Builder map completed with handy widgets and parameters. It'll serve as your guide for understanding biking patterns across the city. Sounds good? Let's dive in!

Step-by-Step Guide:

Access the Data Explorer from your CARTO Workspace using the Navigation menu.

Search for the demo_data > demo_tables within the CARTO Data Warehouse and select “manhattan_citibike_trips”.

Examine "manhattan_citibike_trips" Map and Data preview, focusing on the geometry columns (start_geom and end_geom) that correspond to trip start and end bike station points.

Return to the Navigation Menu, select Maps, and create a "New map".

Begin by adding the start station locations of Citi Bike Trips as the first data source.
- Select the Add source from button at the bottom left on the page.
- Click on the CARTO Data Warehouse connection.
- Select Type your own query.
- Click on the Add Source button.

The SQL Editor panel will be opened.

Extract the bike stations of the start of the Citi bike trips grouping by the start_station_name while obtaining the COUNT() of all the trips starting at that specific location. For that, run the query below:

SELECT 
    start_station_name, 
    COUNT(*) as trip_count, 
    ANY_VALUE(geoid) as geoid,
    ANY_VALUE(start_geom) as geom, 
    ANY_VALUE(start_ntaname) as start_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips` 
GROUP BY start_station_name

Rename the layer to "Trip Start" and style it by Trip_count using Color based on option and set the radius size by the same Trip_count variable using 2 to 6 range.

Extract the bike stations of the end of the trips. We will repeat Step 7 and Step 8, this time retrieving the end station variables. For that, execute the following query.

SELECT 
    end_station_name, 
    COUNT(*) as trip_count, 
    ANY_VALUE(geoid) as geoid,
    ANY_VALUE(end_geom) as geom, 
    ANY_VALUE(end_ntaname) as end_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips` 
GROUP BY end_station_name

Once the data has been added to the map display, you will notice that is overlaying with the 'Trip Start' layer.

Edit the name, style of the new layer and update the visualisation of 'Trip Start' layer as follows:
- Disable 'Trip Start' layer visibility by clicking over the eye located right on the layer tab.
- Rename "Layer 2" to "Trip End".
- Style 'Trip End' layer by trip_count using a different color palette.

Change the Basemap to Dark Matter for better visibility.

Enable the Layer selector and Open when loading the map options within Legend > More Legend Options.

Use the Split View mode to examine the 'Trip Start' and 'Trip End' layers before creating SQL Parameters.

Ensure that the 'Trip Start' layer is positioned above the 'Trip End' layer. You can adjust layer visibility by toggling the eye icon in the Legend.

As per below screenshot, the left panel is dedicated to showcasing the 'Trip Start' layer, while the right panel displays the 'Trip End' layer. Split View mode is highly beneficial for comparison purposes.

Now we are ready to start using SQL Parameters over both SQL Query sources.

SQL Parameters are a powerful feature in Builder that serve as placeholders in SQL Query data sources. They provide flexibility and ease in performing data analysis by allowing dynamic input and customization of queries.

Create a SQL Parameter by clicking over Create a SQL Parameter icon located on the top right of your Sources panel.

A pop-up window will be opened where you can extract further information about SQL Parameters and select the SQL Parameter type you would like to use.

Click Continue to jump into the next page where you can choose the parameter type.
Select Dates as the parameter type and click Continue.

Navigate to the configuration page for the Dates parameter and set the parameters as indicated in the following screenshot and click Create parameter.

Please note that the dataset for Manhattan Citi Bike Trips only includes data from January until May 2018. Please ensure your date selection falls within this range.

A new parameter named Time Period appears on the left panel.

Edit the SQL Query for both 'SQL Query 1' and 'SQL Query 2' data sources to include the WHERE statement that will filter starttime column by the input Time Period date range and execute the queries.

WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}

The output query for 'SQL Query 1' linked to 'Trip Start' layer should be as follows:

SELECT 
    start_station_name, 
    COUNT(*) as trip_count, 
    ANY_VALUE(geoid) as geoid,
    ANY_VALUE(start_geom) as geom, 
    ANY_VALUE(start_ntaname) as start_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips` 
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
GROUP BY start_station_name

The output query for 'SQL Query 2' linked to 'Trip End' layer should be as below, as we are interested on the start time of the trip for both sources:

SELECT 
    end_station_name, 
    COUNT(*) as trip_count, 
    ANY_VALUE(geoid) as geoid,
    ANY_VALUE(end_geom) as geom, 
    ANY_VALUE(end_ntaname) as end_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips` 
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
GROUP BY end_station_name

Once you have executed the SQL Queries, a calendar will appear within Trip Period parameter.

Users will have the flexibility to alter the time frame using the provided calendar. This allows you to filter the underlying data sources to suit your needs, affecting both the 'Trip Start' and 'Trip End' data sources.

Create a new SQL Parameter. This time, select the Text parameter type and set the configuration as below, using start_ntaname column from 'SQL Query 1' source to add Manhattan neighborhoods. Once complete, click on Create Parameter button.

A new parameter named Start Neighborhood will be added to the Map.

Edit the SQL Query for both 'SQL Query 1' and 'SQL Query 2' to include the WHERE statement that will filter start_ntaname column by the input of Start Neighborhood parameter and execute the queries.

start_ntaname IN {{start_neighborhood}}

The output query for 'SQL Query 1' linked to 'Trip Start layer' should be as follows:

SELECT 
    start_station_name, 
    COUNT(*) as trip_count, 
    ANY_VALUE(geoid) as geoid,
    ANY_VALUE(start_geom) as geom, 
    ANY_VALUE(start_ntaname) as start_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips` 
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
AND start_ntaname IN {{start_neighborhood}}
GROUP BY start_station_name

The output query for 'SQL Query 2' linked to 'Trip End' layer should be as below, as we are interested on the start time of the trip for both sources.

SELECT 
    end_station_name, 
    COUNT(*) as trip_count, 
    ANY_VALUE(geoid) as geoid,
    ANY_VALUE(end_geom) as geom, 
    ANY_VALUE(end_ntaname) as end_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips` 
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
AND start_ntaname IN {{start_neighborhood}}
GROUP BY end_station_name

After executing the SQL Queries, a drop-down list of start trip neighborhoods will populate. This interactive element allows users to selectively choose which neighborhood(s) serve as the starting point of their trip.

Repeat Step 20 and Step 21 to create a SQL Parameter, but this time we will filter the end trip neighborhoods.

The output query for 'SQL Query 1' linked to Trip Start layer should be as follows:

SELECT 
    start_station_name, 
    COUNT(*) as trip_count, 
    ANY_VALUE(geoid) as geoid,
    ANY_VALUE(start_geom) as geom, 
    ANY_VALUE(start_ntaname) as start_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips` 
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
AND start_ntaname IN {{start_neighborhood}} AND end_ntaname IN {{end_neighborhood}}
GROUP BY start_station_name

The output query for 'SQL Query 2' linked to 'Trip Start' layer should be as follows:

SELECT 
    end_station_name, 
    COUNT(*) as trip_count, 
    ANY_VALUE(geoid) as geoid,
    ANY_VALUE(end_geom) as geom, 
    ANY_VALUE(end_ntaname) as end_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips` 
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
AND start_ntaname IN {{start_neighborhood}} AND end_ntaname IN {{end_neighborhood}}
GROUP BY end_station_name

Disable Split View Mode, make both 'Trip Start' and 'Trip Layer' visible using the Legend eye icons and compare the bike trips between two different neighborhoods. For that, set the Start Neighborhood parameter to be "Upper West Side" and the End Neighborhood parameter to be "Chinatown".

We can clearly see which are the start and end stations which are gathering most of the bike trips for this neighborhood combination.

Create a Formula Widget to represent the Total Trips setting the configuration as below.

Add a Category Widget to display the Start Stations ordered by the Total Trips.

Add a Category Widget to display the End Stations ordered by the Total Trips.

The Builder Map provides user with an interactive application to gather insights about New York Citi Trips and the patterns between the different neighborhoods. However, it is difficult to visualize the boundary limits between both the start trips and the end trips.

For that, let's use "newyork_neighborhood_tabulation_areas" table, available on CARTO Data Warehouse within demo_data > demo_tables.

Add a new SQL Query as the data source using the following query which aggregates geometry of the start trip neighborhood(s).

SELECT 
    ST_UNION_AGG(geom) as geom 
FROM `carto-demo-data.demo_tables.newyork_neighborhood_tabulation_areas` 
WHERE ntaname IN {{start_neighborhood}}

Add a new SQL Query as the data source using the following query. This time the aggregated geometry will be for the end trip neighborhood(s).

SELECT 
    ST_UNION_AGG(geom) as geom 
FROM `carto-demo-data.demo_tables.newyork_neighborhood_tabulation_areas` 
WHERE ntaname IN {{end_neighborhood}}

Rename the recently added layers, and position them beneath the 'Trip Start' and 'Trip End' layers for better visibility.
Feel free to experiment with styling options - adjusting layer opacity, trying out different color palettes, until you achieve the optimal visual representation.

Change the name of the map to "New York Citi Bike Trips".
Finally we can make the map public and share the link to anybody.
- For that you should go to Share section on the top right corner and set the map as Public.
- Activate SQL parameters controls options so that Viewer users can control the exposed parameters.

Finally, we can visualize the results!

By the end of this tutorial, you should have a clear understanding of how to utilize SQL Parameters to filter multiple data sources, particularly in the context of Citi Bike trips in New York City.

Last updated 1 year ago

Was this helpful?