This tutorial guides you through building a fully functional Streamlit application on top of Couchbase. The app uses Couchbase's travel-sample dataset to let users search for flights between cities and visualize routes dynamically on an interactive map.
By the end of this tutorial, you will have a working flight visualization tool and a deeper understanding of how to integrate Couchbase with Streamlit for interactive data applications.
You can experience the application live on Streamlit Cloud: Try the Couchbase Connector Demo App. If the application doesn't load, the app may have gone to sleep due to inactivity. If you see a 'Zzzz' screen, click the 'Yes, get this app back up!' button to wake it up and wait for a while as it restarts.
The original code for this demo is available here.
Couchbase is a NoSQL document database that stores data in JSON format. This allows for flexible and scalable data modeling.
Couchbase uses JSON because:
Couchbase organizes data into a hierarchical structure:
| Couchbase Concept | Relational Equivalent | Description |
|---|---|---|
| Bucket | Database | Top-level data storage container. |
| Scope | Schema | Logical namespace within a bucket. |
| Collection | Table | Group of related JSON documents. |
| Document | Row | Individual JSON record. |
By understanding these key concepts, you'll be well-prepared to build and optimize applications using Couchbase and Streamlit.
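To make the hierarchy concrete, the sketch below builds the fully qualified bucket.scope.collection path that the SQL++ queries in this tutorial use in their FROM clauses. The helper name keyspace is hypothetical and only illustrates how the three levels combine; it is not part of any Couchbase SDK.

```python
def keyspace(bucket: str, scope: str, collection: str) -> str:
    """Return a backtick-escaped bucket.scope.collection path for a SQL++ FROM clause."""
    # Backticks let identifiers contain characters such as the hyphen in travel-sample.
    return ".".join(f"`{part}`" for part in (bucket, scope, collection))


path = keyspace("travel-sample", "inventory", "airport")
query = f"SELECT airportname FROM {path} LIMIT 5;"
print(query)
```

Escaping every level is a conservative choice; only names containing special characters strictly need it.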
The travel-sample dataset in Couchbase consists of multiple scopes and collections related to travel and transportation data. The primary scope used in this application is inventory, which contains five collections:
airline (190 documents): Contains information about airlines, including their name, country, ICAO, IATA codes, and callsigns.
Fields: name, country, icao, iata, callsign, id, type
airport (1,968 documents): Stores details of airports worldwide, including names, locations, ICAO and FAA codes, and geographical coordinates.
Fields: airportname, city, country, faa, geo (latitude, longitude, altitude), icao, id, type, tz
hotel (917 documents): Contains information about hotels, including addresses, contact details, pricing, amenities, and reviews.
Fields: name, address, city, country, price, free_internet, free_breakfast, pets_ok, reviews, ratings, geo (latitude, longitude, accuracy)
landmark (4,495 documents): Includes data on notable landmarks, their locations, descriptions, images, and accessibility details.
Fields: name, city, country, content, geo (latitude, longitude, accuracy), type
route (24,024 documents): Contains airline routes with details about the source and destination airports, airline operators, distances, schedules, and stopovers.
Fields: airline, airlineid, sourceairport, destinationairport, distance, stops, schedule (day, flight, UTC)
These collections provide the necessary data for flight visualization, enabling efficient search and filtering of routes between airports.
Efficient Data Retrieval from Couchbase
Data retrieval functions are cached with @st.cache_data to avoid redundant queries.
User Authentication and Configuration
Seamless User Experience
This section outlines the step-by-step process of building the Streamlit application that integrates with Couchbase for retrieving and visualizing data related to airports, flight routes, landmarks, and hotels.
Before setting up the environment, ensure you have the following:
To create an isolated Python environment, run the following commands:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the required libraries for this project:
pip install pandas plotly geopy couchbase-streamlit-connector
Run the following command to check if Streamlit is installed correctly:
streamlit hello
If everything is set up correctly, a browser window should open with Streamlit's demo application.
To optimize performance, data retrieval functions are cached using @st.cache_data, which stores previously fetched data to prevent redundant queries and speed up the app. @st.cache_data hashes a function's arguments to build the cache key, so every argument must be hashable. Database connection objects are not, which is why the connection parameter is named _connection: the underscore prefix tells Streamlit to skip that argument when hashing, which avoids an error. For more details, refer to the official documentation: Streamlit st.cache_data.
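The idea can be illustrated without Streamlit. The following is a simplified sketch of a cache that builds its key only from arguments whose names do not start with an underscore. It is not Streamlit's implementation, and simple_cache, FakeConnection, and get_hotels are hypothetical names used only for illustration.

```python
import functools
import inspect


def simple_cache(func):
    """Cache results, ignoring parameters whose names start with an underscore."""
    cache = {}
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        # Only non-underscore parameters take part in the cache key.
        key = tuple(
            (name, value)
            for name, value in bound.arguments.items()
            if not name.startswith("_")
        )
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


class FakeConnection:
    """Stand-in for a database connection; deliberately unhashable."""
    __hash__ = None

    def __init__(self):
        self.calls = 0

    def query(self, city):
        self.calls += 1
        return f"hotels in {city}"


@simple_cache
def get_hotels(_connection, city):
    return _connection.query(city)


conn = FakeConnection()
first = get_hotels(conn, "London")
second = get_hotels(conn, "London")  # served from the cache, no second query
print(first, conn.calls)
```

Because the connection never enters the key, two different connection objects with the same remaining arguments would share a cache entry in this sketch.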
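get_all_airports(_connection): Fetches airport details.
# Imports used by the data retrieval functions in this section
import pandas as pd
import streamlit as st
from couchbase.options import QueryOptions
from geopy.distance import geodesic

@st.cache_data
def get_all_airports(_connection):
    # Return every airport that has coordinates and an FAA code
    query = """
    SELECT geo.lat, geo.lon, city, country, airportname as name, faa, icao, id
    FROM `travel-sample`.inventory.airport
    WHERE geo.lat IS NOT NULL
    AND geo.lon IS NOT NULL
    AND faa IS NOT NULL;
    """
    result = _connection.query(query)
    return pd.DataFrame([row for row in result.rows()])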
get_routes_for_airports(_connection, selected_airports_df): Retrieves routes between selected airports.
This function fetches route information from the route collection in the travel-sample.inventory scope based on the selected airports. It first builds a list of FAA codes from selected_airports_df and passes it to the query as a named parameter. The query then finds routes where both the source and destination airports are in that list. The results are extracted from the query response and returned as a Pandas DataFrame.
@st.cache_data
def get_routes_for_airports(_connection, selected_airports_df):
    # Pass the FAA codes as a list so that IN receives an array rather than a string
    airports_faa = selected_airports_df["faa"].to_list()
    query = """
    SELECT * FROM `travel-sample`.`inventory`.`route`
    WHERE (sourceairport IN $airports_faa AND destinationairport IN $airports_faa);
    """
    result = _connection.query(query, opts=QueryOptions(named_parameters={"airports_faa": airports_faa}))
    data = []
    for row in result:
        data.append(row["route"])  # SELECT * nests each document under the collection name
    return pd.DataFrame(data)
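The filter that the query applies can be mirrored in plain Python, which may help when reasoning about which routes come back. This is an illustrative sketch; filter_routes is a hypothetical helper, and the sample records reuse the airport codes that appear in the hardcoded routes later in this tutorial.

```python
def filter_routes(routes, selected_faa):
    """Keep routes whose source and destination are both in selected_faa."""
    selected = set(selected_faa)  # set membership checks are fast
    return [
        r for r in routes
        if r["sourceairport"] in selected and r["destinationairport"] in selected
    ]


# Hand-written sample records for illustration only
sample_routes = [
    {"sourceairport": "TLV", "destinationairport": "MRS"},
    {"sourceairport": "TLV", "destinationairport": "NCE"},
    {"sourceairport": "TNR", "destinationairport": "CDG"},
]

print(filter_routes(sample_routes, ["TLV", "MRS", "CDG"]))
```

A route is kept only when both of its endpoints are selected, so selecting CDG without TNR does not bring back the TNR to CDG route.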
get_all_landmarks(_connection): Retrieves landmarks from the Couchbase database.
@st.cache_data
def get_all_landmarks(_connection):
    query = """
    SELECT name, geo.lat, geo.lon, activity, address, city, country, content, hours, price, type
    FROM `travel-sample`.inventory.landmark
    WHERE geo.lat IS NOT MISSING
    AND geo.lon IS NOT MISSING
    """
    result = _connection.query(query)
    landmarks = []
    for row in result:
        landmark_info = {
            'name': row['name'],
            'lat': row['lat'],
            'lon': row['lon'],
            'activity': row.get('activity', 'Not specified'),
            'address': row.get('address', 'Not specified'),
            'city': row.get('city', 'Not specified'),
            'country': row.get('country', 'Not specified'),
            'content': row.get('content', 'No description available'),
            'hours': row.get('hours', 'Not specified'),
            'price': row.get('price', 'Not specified'),
            'type': row.get('type', 'Not specified')
        }
        landmarks.append(landmark_info)
    return landmarks
get_hotels_near_landmark(_connection, landmark_lat, landmark_lon): Finds hotels near a given landmark.
This function retrieves hotel data from the Couchbase travel-sample.inventory.hotel collection and filters hotels based on their proximity to a specified landmark. It first executes a query to fetch hotel details, ensuring latitude and longitude values are available. Then, it iterates through the results, calculating the geographical distance between each hotel and the given landmark using the geodesic function from the geopy.distance module. Hotels within the specified maximum distance (default 10 km) are added to the final list, including relevant details such as price, description, and available amenities.
@st.cache_data
def get_hotels_near_landmark(_connection, landmark_lat, landmark_lon, max_distance_km=10):
    query = """
    SELECT
        h.name,
        h.geo.lat,
        h.geo.lon,
        h.price,
        h.description,
        h.free_breakfast,
        h.free_internet,
        h.free_parking
    FROM `travel-sample`.inventory.hotel h
    WHERE h.geo.lat IS NOT MISSING
    AND h.geo.lon IS NOT MISSING
    """
    result = _connection.query(query)
    hotels = []
    landmark_coords = (landmark_lat, landmark_lon)
    for row in result:
        hotel_coords = (row['lat'], row['lon'])
        distance = geodesic(hotel_coords, landmark_coords).kilometers  # Calculate distance in km
        if distance <= max_distance_km:
            hotels.append({
                'name': row['name'],
                'lat': row['lat'],
                'lon': row['lon'],
                'distance': distance,
                'price': row.get('price'),  # .get avoids a KeyError if the field is absent
                'description': row.get('description', 'No description available'),
                'free_breakfast': row.get('free_breakfast', False),
                'free_internet': row.get('free_internet', False),
                'free_parking': row.get('free_parking', False)
            })
    return hotels
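For readers who want to see what a distance filter of this kind does without installing geopy, here is a sketch using the haversine great-circle formula. Note that this swaps in a different technique: geopy's geodesic uses an ellipsoidal Earth model, so its results differ slightly. The function names, the 6371 km mean Earth radius, and the sample coordinates are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, centre, max_distance_km=10):
    """Return the points that lie within max_distance_km of centre."""
    return [
        p for p in points
        if haversine_km(p["lat"], p["lon"], centre[0], centre[1]) <= max_distance_km
    ]


# Made-up points: one close to central London, one in Paris
points = [
    {"name": "near", "lat": 51.51, "lon": -0.12},
    {"name": "far", "lat": 48.8566, "lon": 2.3522},
]
nearby_names = [p["name"] for p in within_radius(points, (51.5074, -0.1278))]
print(nearby_names)
```

For a 10 km radius the difference between the spherical and ellipsoidal models is small, but hotels very close to the cutoff could be classified differently.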
get_all_cities(_connection): Retrieves a list of cities with hotels.
@st.cache_data
def get_all_cities(_connection):
    query = """
    SELECT DISTINCT city
    FROM `travel-sample`.inventory.hotel
    WHERE geo.lat IS NOT MISSING
    AND type = "hotel"
    AND geo.lon IS NOT MISSING
    """
    result = _connection.query(query)
    cities = []
    for row in result:
        cities.append(row["city"])
    return pd.DataFrame(cities, columns=["city"])
get_all_hotels(_connection, cities): Fetches hotels in the selected cities.
This function retrieves hotel data from the travel-sample.inventory.hotel collection in Couchbase for the given list of cities. The list is passed to the query as a named parameter, and only hotels with valid latitude (geo.lat) and longitude (geo.lon) values are selected. Additionally, the query calculates the average overall rating of each hotel by aggregating ratings from its reviews. The function caches its results using Streamlit's @st.cache_data decorator to avoid redundant database queries.
@st.cache_data
def get_all_hotels(_connection, cities):
    # A plain string is enough here: the city list is bound through $cities
    query = """
    SELECT h.*, h.geo.lat as lat, h.geo.lon as lon, ARRAY_AVG(ARRAY r.ratings.Overall FOR r IN h.reviews WHEN r.ratings.Overall IS NOT MISSING END) as avg_rating
    FROM `travel-sample`.inventory.hotel h
    WHERE h.geo.lat IS NOT MISSING
    AND h.type = "hotel"
    AND h.geo.lon IS NOT MISSING
    AND h.city IN $cities;
    """
    result = _connection.query(query, opts=QueryOptions(named_parameters={"cities": cities}))
    hotels = []
    for row in result:
        hotels.append(row)
    return pd.DataFrame(hotels)
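The ARRAY_AVG expression in the query averages the Overall rating across a hotel's reviews, skipping reviews that lack one. A rough Python equivalent is sketched below on a hand-written document. average_overall_rating is a hypothetical helper, and returning None for hotels with no usable ratings is an assumption of this sketch.

```python
def average_overall_rating(hotel):
    """Average ratings.Overall over a hotel's reviews, or None if there are none."""
    overall = [
        review["ratings"]["Overall"]
        for review in hotel.get("reviews", [])
        if "Overall" in review.get("ratings", {})
    ]
    return sum(overall) / len(overall) if overall else None


# Hand-written document shaped like a hotel record, for illustration only
hotel = {
    "name": "Example Hotel",
    "reviews": [
        {"ratings": {"Overall": 4}},
        {"ratings": {"Overall": 5}},
        {"ratings": {"Cleanliness": 3}},  # no Overall rating, so it is skipped
    ],
}
print(average_overall_rating(hotel))
```

Doing the aggregation inside the query, as the tutorial does, avoids shipping every review to the client just to compute one number.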
plot_airports_and_routes(airports_df, routes_df): Displays airports and their flight routes.
This function visualizes flight routes between airports using Plotly. It first extracts airport coordinates from airports_df into a dictionary for quick lookup. Then, it iterates through routes_df to collect latitude and longitude pairs for each flight route, skipping routes that reference airports not in the dictionary. A scatter map plot is created using Scattermap to represent routes as blue lines. Additionally, a separate scatter plot of airports is overlaid, with red markers that display airport details on hover. The final visualization is displayed using st.plotly_chart.
# Imports used by the plotting functions
import plotly.express as px
import plotly.graph_objects as go

def plot_airports_and_routes(airports_df, routes_df):
    fig = go.Figure()
    # Create a dictionary mapping FAA codes to latitude and longitude for quick lookup
    filtered_airports_df = airports_df.dropna(subset=["faa"])  # Remove rows where faa is NaN
    airport_coords = dict(zip(filtered_airports_df["faa"], zip(filtered_airports_df["lat"], filtered_airports_df["lon"])))
    lats = []
    lons = []
    # Iterate through routes to fetch airport coordinates and construct flight paths
    for _, row in routes_df.iterrows():
        source_coords = airport_coords.get(row["sourceairport"])
        dest_coords = airport_coords.get(row["destinationairport"])
        if source_coords and dest_coords:
            lats.extend([source_coords[0], dest_coords[0], None])  # None for breaks
            lons.extend([source_coords[1], dest_coords[1], None])
    # Add flight routes as blue lines on the map
    fig.add_trace(go.Scattermap(
        mode="lines",
        lat=lats,
        lon=lons,
        line=dict(width=1, color="blue")
    ))
    # Overlay airport locations as red markers with hover details
    airports_markers = px.scatter_map(
        airports_df,
        lat="lat",
        lon="lon",
        hover_name="name",  # Show airport name on hover
        hover_data={
            "faa": True,
            "city": True,
            "country": True
        },  # Additional details
        color_discrete_sequence=["red"],  # Color of airport markers
    )
    fig.add_traces(airports_markers.data)
    # Set map style and layout
    fig.update_geos(fitbounds="locations")
    fig.update_layout(
        map_zoom=0.5,  # Zoom level
        showlegend=False,  # Hide legend
        map_style="open-street-map",  # Scattermap traces use layout.map, so the key is map_style
        margin=dict(l=0, r=0, t=50, b=0),  # Remove extra margins
        title="Airports and Flight Routes"
    )
    st.plotly_chart(fig, use_container_width=True)
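The None entries deserve a closer look: a single line trace draws all routes, and each None breaks the line so that consecutive routes are not joined to each other. The sketch below isolates that step. build_route_paths is a hypothetical helper, and the airport codes and coordinates are made up.

```python
def build_route_paths(routes, airport_coords):
    """Return (lats, lons) lists with a None after each route to break the line."""
    lats, lons = [], []
    for source, destination in routes:
        src = airport_coords.get(source)
        dst = airport_coords.get(destination)
        if src is None or dst is None:
            continue  # skip routes that reference an unknown airport
        lats.extend([src[0], dst[0], None])
        lons.extend([src[1], dst[1], None])
    return lats, lons


# Made-up coordinates keyed by placeholder airport codes
coords = {"AAA": (10.0, 20.0), "BBB": (30.0, 40.0)}
lats, lons = build_route_paths([("AAA", "BBB"), ("AAA", "ZZZ")], coords)
print(lats, lons)
```

Using one trace with breaks rather than one trace per route keeps the figure small even when thousands of routes are drawn.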
create_landmark_map(landmarks, hotels_near_landmark): Shows landmarks along with nearby hotels.
This function visualizes landmarks and nearby hotels on an interactive map using Plotly. Hotels are color-coded based on their distance from landmarks: red for distances ≤3 km, orange for ≤6 km, and gold for farther hotels. Each hotel is plotted with a marker, and a tooltip displays the name and distance. Landmarks are represented as blue star-shaped markers. The map is centered on the average position of all plotted points, uses OpenStreetMap styling, and is embedded in the Streamlit app.
def create_landmark_map(landmarks, hotels_near_landmark):
    fig = go.Figure()
    centre = {"lat": 0, "lon": 0}
    num_points = 0
    # Plot hotels with color-coded markers based on distance
    for hotel in hotels_near_landmark:
        color = 'red' if hotel.get('distance') <= 3 else 'orange' if hotel.get('distance') <= 6 else 'gold'
        fig.add_trace(go.Scattermap(
            lat=[hotel.get('lat')],
            lon=[hotel.get('lon')],
            mode='markers',
            marker=dict(size=10, color=color),
            text=(
                f"HOTEL: {hotel.get('name')}<br>Distance: {hotel.get('distance'):.2f} km",
            ),
            hoverinfo='text',
            name=f'Hotel ({color})'
        ))
        # Accumulate coordinates so the map can be centered on the mean position
        centre = {"lat": centre["lat"] + hotel.get('lat'), "lon": centre["lon"] + hotel.get('lon')}
        num_points += 1
    # Plot landmarks as blue star markers
    for landmark in landmarks:
        fig.add_trace(go.Scattermap(
            lat=[landmark.get('lat', 'N/A')],
            lon=[landmark.get('lon', 'N/A')],
            mode='markers',
            marker=dict(size=10, color='blue', symbol='star'),
            text=(
                f"LANDMARK: {landmark.get('name', 'N/A')}",
            ),
            hoverinfo='text',
            name='Landmark'
        ))
        centre = {"lat": centre["lat"] + landmark.get('lat', 0), "lon": centre["lon"] + landmark.get('lon', 0)}
        num_points += 1
    if num_points > 0:
        centre = {"lat": centre["lat"] / num_points, "lon": centre["lon"] / num_points}
    fig.update_geos(fitbounds="locations")
    # Configure map layout
    fig.update_layout(
        map_zoom=11,
        map_center=centre,
        map_style='open-street-map',  # Scattermap traces use layout.map, so the key is map_style
        margin=dict(l=0, r=0, t=50, b=0),
        title='Landmarks and Hotels Nearby',
        showlegend=False,
    )
    st.plotly_chart(fig, use_container_width=True)
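The two small calculations inside this function, the distance-to-color mapping and the map center, can be pulled out and checked on their own. The following is a sketch under that assumption; distance_color and mean_centre are hypothetical helper names, and the sample points are made up.

```python
def distance_color(distance_km):
    """Map a hotel's distance from a landmark to a marker color."""
    if distance_km <= 3:
        return "red"
    if distance_km <= 6:
        return "orange"
    return "gold"


def mean_centre(points):
    """Average the lat/lon of all points to get a map center; (0, 0) if empty."""
    if not points:
        return {"lat": 0, "lon": 0}
    return {
        "lat": sum(p["lat"] for p in points) / len(points),
        "lon": sum(p["lon"] for p in points) / len(points),
    }


# Made-up points for illustration
sample_points = [{"lat": 0.0, "lon": 0.0}, {"lat": 2.0, "lon": 4.0}]
print(distance_color(4.2), mean_centre(sample_points))
```

Averaging raw coordinates is adequate for points that are a few kilometres apart, which is the case here; it would be a poor choice for points spread across the antimeridian.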
create_hotel_map(hotels_df): Plots hotels, color-coded by their average ratings.
This function visualizes hotel locations on an interactive map using Plotly and Streamlit. Hotels are color-coded based on their average ratings, with a continuous color scale for rated hotels and a distinct color (orange) for those without ratings. When no data is available, it still renders an interactive map by adding an invisible marker. The function also converts numeric ratings into a star-based format for better readability in the hover tooltips.
import numpy as np

def create_hotel_map(hotels_df):
    if hotels_df.empty:
        fig = go.Figure()
        fig.update_layout(
            map_style="open-street-map",  # Scattermap traces use layout.map, so the key is map_style
            margin=dict(l=0, r=0, t=50, b=0),
            title="Hotels (colored by average rating)"
        )
        # Add an invisible marker at lat:0 and lon:0
        fig.add_trace(go.Scattermap(
            lat=[0],
            lon=[0],
            mode='markers',
            marker=dict(size=0, opacity=0)
        ))
        st.plotly_chart(fig, use_container_width=True)
        return
    if 'avg_rating' not in hotels_df.columns:
        hotels_df['avg_rating'] = np.nan  # Add avg_rating column if it doesn't exist
    hotels_df['avg_rating'] = pd.to_numeric(hotels_df['avg_rating'], errors='coerce')
    centre = {
        "lat": hotels_df['lat'].mean(),
        "lon": hotels_df['lon'].mean()
    }
    # Create a column for star ratings
    hotels_df['star_rating'] = hotels_df['avg_rating'].apply(lambda x: '⭐' * int(round(x)) if not np.isnan(x) else 'No rating')
    # Separate hotels with no rating
    no_rating_hotels = hotels_df[hotels_df['avg_rating'].isna()]
    rated_hotels = hotels_df[hotels_df['avg_rating'].notna()]
    # Plot hotels with ratings
    fig = px.scatter_map(
        rated_hotels,
        lat="lat",
        lon="lon",
        hover_name="name",
        hover_data={
            "avg_rating": True,
            "star_rating": True
        },
        color="avg_rating",
        color_continuous_scale=px.colors.sequential.Viridis_r,  # Reversed Viridis color scale
        range_color=[0, 5],  # Ratings typically range from 0 to 5
        zoom=1,
        size_max=10
    )
    fig.update_traces(
        hovertemplate="<b>%{hovertext}</b><br>Avg Rating: %{customdata[0]:.2f} <br>Stars: %{customdata[1]}"
    )
    # Plot hotels with no ratings in orange
    no_rating_markers = px.scatter_map(
        no_rating_hotels,
        lat="lat",
        lon="lon",
        hover_name="name",
        hover_data={"avg_rating": False},  # Hide the empty rating in the hover box
        custom_data=["name"],  # Add custom data to use in hover template
        color_discrete_sequence=["orange"],
        size_max=10
    )
    no_rating_markers.update_traces(
        hovertemplate="<b>%{customdata[0]}</b><br>No rating available"
    )
    fig.add_traces(no_rating_markers.data)  # Add no-rating hotels to the map
    # Set up layout and color bar for ratings
    fig.update_layout(
        map_zoom=10,
        map_center=centre,
        map_style="open-street-map",
        margin=dict(l=0, r=0, t=50, b=0),
        title="Hotels (colored by average rating)",
        coloraxis_colorbar=dict(
            title="Avg Rating",
            tickvals=[0, 1, 2, 3, 4, 5],
            ticktext=["0", "1", "2", "3", "4", "5"]
        )
    )
    st.plotly_chart(fig, use_container_width=True)
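The star conversion used for the hover text can be written as a small standalone function, which makes its edge cases easier to see. This is a sketch; star_rating is a hypothetical helper name, and treating None the same as NaN is an assumption added here.

```python
import math


def star_rating(avg_rating):
    """Render a numeric rating as stars, or 'No rating' when it is missing."""
    if avg_rating is None or math.isnan(avg_rating):
        return "No rating"
    return "⭐" * int(round(avg_rating))


for value in (4.4, 4.6, float("nan")):
    print(value, star_rating(value))
```

Python's round uses round-half-to-even, so a rating of exactly 4.5 yields four stars rather than five.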
The application is structured into three tabs:
The tab1_visual function displays a selection interface for airports and visualizes flight routes between them. It first fetches all available airports and identifies a subset of airports involved in predefined routes. Using Streamlit's UI components, users can choose specific airports or select all at once. Upon clicking the "Update Map" button, the selected airports are filtered, relevant routes are retrieved, and both are plotted on a map.
def tab1_visual():
    all_airports = get_all_airports(connection)  # Fetch all available airports
    route_airports = set()
    # Define a set of hardcoded routes
    for route in [
        {"sourceairport": "TLV", "destinationairport": "MRS"},
        {"sourceairport": "TLV", "destinationairport": "NCE"},
        {"sourceairport": "TNR", "destinationairport": "CDG"}
    ]:
        route_airports.add(route["sourceairport"])
        route_airports.add(route["destinationairport"])
    # User selection interface for choosing airports
    with st.expander("Select Airports"):
        st.checkbox("Select All Airports", key="select_all")
        container = st.container()
        with container:
            selected_airports = st.multiselect(
                "Choose airports",
                options=all_airports["name"],
                default=all_airports["name"] if st.session_state.get("select_all") else []
            )
    # Process selection and update the visualization
    if st.button("Update Map"):
        filtered_airports = all_airports[all_airports["name"].isin(selected_airports)]  # Filter selected airports
        selected_routes = get_routes_for_airports(connection, filtered_airports)  # Retrieve routes for selected airports
        plot_airports_and_routes(filtered_airports, selected_routes)  # Plot the airports and corresponding routes
The tab2_visual() function lets users select landmarks from a list and visualizes nearby hotels on a map. It first retrieves all available landmarks from the database. If landmarks exist, the first one is pre-selected by default. Users can select multiple landmarks using a multi-select dropdown. The function then filters the selected landmarks and fetches nearby hotels for each chosen landmark using their latitude and longitude. Finally, the selected landmarks and corresponding hotels are passed to create_landmark_map() for visualization.
def tab2_visual():
    # Fetch all available landmarks from the database
    landmarks = get_all_landmarks(connection)
    # Set a default selection (first landmark) if landmarks exist
    default_landmark = [landmarks[0]['name']] if landmarks else []
    # Allow users to select multiple landmarks from the available list
    selected_landmarks = st.multiselect("Select landmarks", [landmark['name'] for landmark in landmarks], default=default_landmark)
    # Filter selected landmarks to get their details
    selected_landmarks_info = [landmark for landmark in landmarks if landmark['name'] in selected_landmarks]
    hotels_near_landmarks = []
    # Retrieve hotels near each selected landmark
    for landmark in selected_landmarks_info:
        hotels_near_landmarks.extend(get_hotels_near_landmark(
            connection,
            landmark['lat'],
            landmark['lon']
        ))
    # Display the selected landmarks and nearby hotels on a map
    create_landmark_map(selected_landmarks_info, hotels_near_landmarks)
The tab3_visual function lets users select multiple cities from the list of cities that have hotels, fetches the corresponding hotels from the database, and visualizes them on a map, so users can explore hotels across different locations interactively.
def tab3_visual():
    # Retrieve the list of all available cities from the database
    all_cities = get_all_cities(connection)["city"].tolist()
    # Allow users to select multiple cities; defaults to London
    cities = st.multiselect("Select cities", all_cities, default=["London"])
    # Fetch hotels based on the selected cities
    hotels = get_all_hotels(connection, cities)
    # Display the hotels on a map for visualization
    create_hotel_map(hotels)
In the Streamlit sidebar, users need to enter their Couchbase credentials to connect to the database. The connection is established using the CouchbaseConnector class.
This code sets up a Streamlit application that connects to a Couchbase database using user-provided credentials. The sidebar collects connection parameters: the connection string, username, password, and the bucket, scope, and collection names. Once the user clicks the "Connect" button, the app attempts to establish a connection and stores it in st.session_state so that it persists across reruns. If the connection is successful, the app provides three tabs for data visualization: one for mapping flight routes, another for locating hotels near landmarks, and the last for finding hotels in cities.
from couchbase_streamlit_connector.connector import CouchbaseConnector

st.title("Couchbase Streamlit App")
# Sidebar inputs for Couchbase connection parameters
st.sidebar.header("Enter Couchbase Credentials")
conn_str = st.sidebar.text_input("Connection String", "couchbases://your-cluster-url")
username = st.sidebar.text_input("Username", "Administrator")
password = st.sidebar.text_input("Password", type="password")  # Password input is masked
bucket_name = st.sidebar.text_input("Bucket Name", "travel-sample")
scope_name = st.sidebar.text_input("Scope Name", "inventory")
collection_name = st.sidebar.text_input("Collection Name", "airline")
if st.sidebar.button("Connect"):
    try:
        # Establish connection to Couchbase using the provided credentials
        connection = st.connection(
            "couchbase",
            type=CouchbaseConnector,
            CONNSTR=conn_str,
            USERNAME=username,
            PASSWORD=password,
            BUCKET_NAME=bucket_name,
            SCOPE_NAME=scope_name,
            COLLECTION_NAME=collection_name
        )
        st.session_state["connection"] = connection  # Store connection in session state
        st.sidebar.success("Connected successfully!")  # Display success message
    except Exception as e:
        st.sidebar.error(f"Connection failed: {e}")  # Handle connection errors
# Check if a connection exists before proceeding to visualization
if "connection" in st.session_state:
    connection = st.session_state["connection"]
    tab1, tab2, tab3 = st.tabs(["Flight Routes Map", "Find hotels near Landmarks", "Find hotel in cities"])
    # Visualization components for each tab
    with tab1:
        tab1_visual()
    with tab2:
        tab2_visual()
    with tab3:
        tab3_visual()
To start the Streamlit app, run the following command:
streamlit run app.py
This will launch the app in your browser, allowing you to interactively explore Couchbase data with intuitive visualizations.
Now that you've built your demo app, it's time to deploy it for free using Streamlit Community Cloud!
Ensure your app is stored in a GitHub repository with the following files:
app.py (your main script)
requirements.txt (dependencies)
.streamlit/config.toml (for customization)
To generate requirements.txt, use this command in your terminal:
pip freeze > requirements.txt
Sign up or log in at Streamlit Community Cloud, then link your GitHub account.
When creating the app, point it at your repository and main script (app.py).
Your app will be live in minutes, and any future updates to the GitHub repo will auto-deploy!
For a detailed guide, check out: Host Your Streamlit App for Free.
In this guide, we explored building a Streamlit app and deploying it on Streamlit Community Cloud for free. From setting up the development environment to hosting your app online, we covered essential steps to get your app live with minimal effort. Streamlit’s simplicity, combined with seamless GitHub integration, makes it a great choice for quickly showcasing your projects.
Here are some helpful resources for learning more about Couchbase and Streamlit: