title: Design your service architecture
description: The steps to design and implement the best architecture to match your business and technical requirements.
products: [cloud]
content_group: Getting started
Multitenancy is a system architecture that enables multiple users (tenants) to utilize the same application or database while ensuring their data remains isolated and secure. This approach offers several advantages, such as cost efficiency, improved scalability, and optimized performance. By allowing tenants to share infrastructure and resources, multitenancy helps minimize expenses while maintaining operational effectiveness.
Timescale supports two approaches for separating tenant data in a multitenant architecture:
Service per tenant.
Schema per tenant.
Each of these designs comes with its own advantages and trade-offs. Let’s explore these multitenant database strategies in more detail to understand how they work and when to use them.
Service per tenant
In Tiger Cloud, a service-per-tenant model aligns with the database-per-tenant strategy, where each tenant is assigned a dedicated TigerData service to store and manage their data independently. Since Tiger Cloud provisions one database per service, adopting a service-per-tenant approach involves creating a separate Timescale service for each tenant. This can be done through the Tiger Cloud interface or API. Once provisioned, the service is dedicated to that tenant, ensuring full data isolation and security.
Advantages
Each tenant operates within its own Timescale service, providing the highest level of separation and security.
Deleting a tenant is as straightforward as decommissioning the associated service, eliminating risks of unintentional data exposure.
Businesses operating under strict regulatory requirements benefit from strong data separation and easier compliance management.
Since tenants do not share the same database instance, performance issues in one tenant’s service do not impact others.
This approach aligns naturally with Tiger Cloud’s architecture, making it easy to implement.
Limitations
Managing multiple services can become complex as the number of tenants increases. Each service requires monitoring, maintenance, backups, and updates, which can introduce administrative challenges.
Running a separate Timescale service for each tenant results in higher infrastructure costs compared to shared database models.
Since each service operates independently, CPU, memory, and disk resources are not shared across tenants, potentially leading to inefficiencies.
Referential integrity constraints, such as foreign keys, cannot be used across services, limiting the ability to enforce relationships between shared datasets.
Common datasets, such as global configuration settings or metadata, must be replicated across all services, requiring synchronization mechanisms to keep data consistent.
Aggregating metrics across multiple tenants requires querying separate services, making multi-tenant analytics more challenging.
The service-per-tenant approach in Tiger Cloud is ideal for organizations that prioritize strict data isolation, regulatory compliance, and performance guarantees for each tenant. However, for businesses looking to optimize cost and operational efficiency, other multitenancy strategies, such as schema-per-tenant or shared table models, may be more suitable.
Schema per tenant
The schema-per-tenant model is a multitenancy strategy that organizes each tenant’s data within its own schema while using a shared database. A schema serves as a logical container for database objects such as tables, views, and functions, ensuring data isolation within a single Timescale service. A new schema is created for each tenant using the CREATE SCHEMA statement. Once created, tenant-specific tables and other database objects can be added within this schema, and the schema’s existence can be verified by querying the catalog. Each tenant must have access only to their respective schema; this is enforced by creating a user and assigning the necessary privileges. To ensure that the tenant operates within their designated schema by default, configure the search path: when the tenant logs in, operations then default to their schema, and objects created within it remain inaccessible to other users unless explicitly granted access. The sketch below walks through these steps.
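A minimal sketch of these steps, assuming a hypothetical tenant named acme:

```sql
-- Create a schema for the tenant
CREATE SCHEMA acme;

-- Verify that the schema exists
SELECT schema_name FROM information_schema.schemata WHERE schema_name = 'acme';

-- Create a dedicated user and grant privileges on the tenant schema only
CREATE USER acme_user WITH PASSWORD 'change-me';
GRANT USAGE, CREATE ON SCHEMA acme TO acme_user;

-- Default the tenant's operations to their own schema
ALTER ROLE acme_user SET search_path TO acme;
```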
Advantages
Unlike the service-per-tenant model, all tenants share the same database instance, optimizing resource usage.
Common data, typically stored in the public schema, can be referenced across schemas.
While tenants share the same database, their data is logically separated within distinct schemas.
Maintenance tasks such as backups, monitoring, and indexing are centralized within a single database.
Unlike the service-per-tenant model, transactions spanning multiple tenants can be executed efficiently.
Adding new tenants is straightforward; simply create a new schema.
Since connections are established at the database level rather than per tenant, connection pooling is more effective.
Limitations
Since all tenant data resides in the same database, physical isolation is not achieved.
If each tenant has minimal data but the total tenant count is high, schema management may become complex.
Since all tenants share a single database, performance optimizations must account for workload balancing.
Wide, narrow, and medium table layouts
When designing a PostgreSQL database schema, choosing between wide and narrow table layouts is crucial, as each has distinct advantages and trade-offs.
A narrow table typically has few data value columns but may include multiple metadata or categorizing columns. This design is beneficial when dealing with a low number of known metrics and data types. It allows for individual consideration of each metric’s data type and simplifies table design by using independent tables for each metric. Extending the schema with additional metrics is straightforward; simply create a new table for the new metric.
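For illustration, a sketch of a narrow layout using hypothetical metric tables:

```sql
-- One table per metric, each with a data type chosen for that metric
CREATE TABLE cpu_usage (
    time      TIMESTAMPTZ NOT NULL,
    device_id TEXT NOT NULL,
    value     DOUBLE PRECISION
);

-- Adding a new metric means simply adding a new table
CREATE TABLE fan_speed (
    time      TIMESTAMPTZ NOT NULL,
    device_id TEXT NOT NULL,
    value     INTEGER
);
```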
Advantages
Simplifies the addition of new metrics by creating new tables.
Allows for precise data type management for each metric.
Disadvantages
Managing a high number of tables can become complex.
Querying multiple metrics simultaneously may require complex joins or unions, increasing query complexity and system I/O load.
In contrast, a wide table has many columns, often with one column per metric. This design is suitable when all potential metrics are known upfront and are relatively static over time. It simplifies queries, as all related data is in a single row, eliminating the need for joins between metric tables.
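A sketch of the equivalent wide layout, assuming the same hypothetical metrics are all known upfront:

```sql
-- One column per metric, all in a single row per reading
CREATE TABLE device_metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT NOT NULL,
    cpu_usage   DOUBLE PRECISION,
    fan_speed   INTEGER,
    temperature DOUBLE PRECISION
);
```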
Advantages
Simplifies query construction, as all metrics are in a single table.
Efficient querying for subsets of columns, especially when using columnar compression.
Disadvantages
Adding new metrics requires altering the table schema, which can be resource-intensive.
Dropping columns does not immediately free up disk space, potentially leading to storage inefficiencies.
A medium table layout strikes a balance between narrow and wide designs by creating one column for each necessary data type. This approach is practical if you know the expected data types but are uncertain about all potential metrics.
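A sketch of a medium layout, with one hypothetical value column per expected data type and a column mapping each row to its metric:

```sql
-- One value column per data type; metric_name maps each row to its metric
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT NOT NULL,
    metric_name TEXT NOT NULL,
    value_num   DOUBLE PRECISION,
    value_text  TEXT,
    value_bool  BOOLEAN
);
```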
Advantages
Easier to extend with new metrics without altering the table schema.
Simplifies multi-tenancy by creating one hypertable per customer.
Disadvantages
Requires a mechanism to map metrics to their corresponding data type columns.
Querying multiple metrics of different data types may involve complex SQL constructs.
When using Tiger Cloud, all table layouts are fully supported with compression. Narrow tables often achieve excellent compression ratios due to their uniformity. Medium and wide tables also benefit from Tiger Cloud’s compression capabilities, and columns with NULL values compress efficiently, minimizing storage bloat. The choice between wide, narrow, or medium table layouts depends on your specific use case, including factors like the number of metrics, data types, schema evolution, and query patterns. Understanding these factors will help you design a PostgreSQL schema that balances performance, scalability, and maintainability. For a more in-depth discussion, refer to this article.
Hypertables are a key feature in Tiger Cloud that transforms regular PostgreSQL tables into scalable, partitioned versions ideal for managing large datasets. By automatically breaking data into smaller, manageable pieces called chunks, hypertables help your database perform efficiently, even when dealing with billions of rows. This section is designed to help you get the hypertable settings right from the start, ensuring your database not only performs well but can also scale without issues. Below, we cover essential best practices and practical steps for optimizing hypertables in Tiger Cloud.
Creating Hypertables
To create a hypertable, you first create a regular PostgreSQL table and then convert it using the create_hypertable() function, as in the sketch below. The function takes several arguments; the ones to be aware of are listed after the example.
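A minimal sketch, assuming a hypothetical conditions table partitioned on its time column:

```sql
-- A regular PostgreSQL table
CREATE TABLE conditions (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT NOT NULL,
    temperature DOUBLE PRECISION
);

-- Convert it to a hypertable partitioned on "time"
SELECT create_hypertable(
    'conditions', 'time',
    chunk_time_interval    => INTERVAL '7 days',
    create_default_indexes => true,
    migrate_data           => false
);
```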
The default chunk time interval is 7 days and is controlled by the chunk_time_interval argument. Tune the chunk_time_interval relative to your data ingest rates, as this interval determines how large your chunks will be. This setup must align with your service’s workload and total memory allocation. Refer to the Sizing Hypertable Chunks section for a more detailed discussion on how to size your hypertable chunks.
An index on the partitioning column of a hypertable is created by default. This behavior is controlled by the create_default_indexes setting, which can be true or false. If you plan to add a lot of data to a new hypertable, consider setting this to false to speed up the data migration. You can create the index later. Please note that if you have a unique or primary key constraint, the partitioning column must be included in the unique or primary key index.
If you want to convert an existing table with data into a hypertable, you must set migrate_data to true; otherwise, the function will fail. Using the migrate_data option may take longer to execute, depending on the table size. The recommended approach is to first create an empty hypertable and then insert the data into it.
You can partition a hypertable using one column or a set of columns. The partitioning column(s) can be of type timestamp, date, or integer. For an integer column, chunk_time_interval is the number of records per chunk; for example, a hypertable partitioned on an integer id column with chunk_time_interval set to 1000 stores 1,000 records per chunk. For UNIX time stored as an integer, use the same approach as for any integer-based column. By default, UNIX time is represented in seconds, but depending on your requirements you can convert it into other time units, e.g. milliseconds; a 1-day chunk interval is the equivalent in seconds (60 * 60 * 24). For integer-based partitioning columns, you need to create a custom integer_now function; policies (compression, refresh, retention, or tiering) use this function to determine the age of a chunk. For UNIX time, you only need to create the function once, and all hypertables can share it. Sequence-style integer partitioning columns, on the other hand, need the current maximum value to evaluate the age of chunks, which means you create a custom function for each hypertable. The sketch below demonstrates both cases. It is also possible to sub-partition a chunk using additional dimensions, for example partitioning on a device_id column in addition to the primary time column. This can improve performance by parallelizing I/O operations in a setup with multiple physical disks (RAID). However, we DO NOT encourage this design in Tiger Cloud services because all chunks are created in a single EBS volume, and therefore there’s little benefit to query performance.
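A sketch of both cases; the table and function names are hypothetical:

```sql
-- Integer-based partitioning: each chunk holds 1,000 id values
CREATE TABLE events_by_id (id BIGINT NOT NULL, payload TEXT);
SELECT create_hypertable('events_by_id', 'id', chunk_time_interval => 1000);

-- UNIX time in seconds, with a 1-day chunk interval (60 * 60 * 24 seconds)
CREATE TABLE events_unix (time BIGINT NOT NULL, value DOUBLE PRECISION);
SELECT create_hypertable('events_unix', 'time', chunk_time_interval => 86400);

-- integer_now function for UNIX time: create once, share across hypertables
CREATE OR REPLACE FUNCTION unix_now() RETURNS BIGINT
LANGUAGE SQL STABLE AS $$ SELECT extract(epoch FROM now())::BIGINT $$;
SELECT set_integer_now_func('events_unix', 'unix_now');

-- integer_now function for a sequence-style id column: one per hypertable
CREATE OR REPLACE FUNCTION events_by_id_now() RETURNS BIGINT
LANGUAGE SQL STABLE AS $$ SELECT coalesce(max(id), 0) FROM events_by_id $$;
SELECT set_integer_now_func('events_by_id', 'events_by_id_now');
```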
Sizing Hypertable Chunks
Ideally, chunks should be small enough to fit into memory to enable quick data access without needing to read from disk, yet large enough to avoid excessive numbers of chunks, which can slow down query planning and lower the compression ratio. A practical approach is to configure the chunk_time_interval so that all “active” chunks (including indexes) fit into the shared_buffers memory allocation. “Active” chunks are those currently receiving new data; “inactive” chunks hold older data that is unlikely to be changed or updated. In Tiger Cloud, shared_buffers is set to 25% of the service’s total memory by default. For instance, on a system with 64 GB of memory, ensure all active hypertable chunks plus indexes do not exceed 16 GB (25% of 64 GB).
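To check how your current chunks measure up, a sketch using the standard informational functions (hypertable name assumed):

```sql
-- Per-chunk sizes, including indexes
SELECT chunk_name, pg_size_pretty(total_bytes) AS total_size
FROM chunks_detailed_size('conditions')
ORDER BY total_bytes DESC;

-- Adjust the interval for future chunks if they are too large or too small
SELECT set_chunk_time_interval('conditions', INTERVAL '3 days');
```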
By default, when you create a hypertable, Tiger Cloud automatically generates indexes on the partitioning column, but you can control this behavior by setting the create_default_indexes argument of the create_hypertable() function to false. Indexes primarily speed up SELECT queries but can slow down INSERT operations; creating unused indexes not only reduces INSERT performance but also increases storage costs unnecessarily. When creating composite unique indexes, especially for maintaining data integrity, the index must include all of the hypertable’s partitioning columns, including the time column. For example, if you have a hypertable partitioned on time and you want a composite unique index on device_id and location_id, the index must also cover time. Using the timescaledb.transaction_per_chunk option avoids exclusive locks on a hypertable by creating indexes at the chunk level. It is also possible to have invalid indexes, which can result from block corruption or a CREATE INDEX command that fails midway; identify such indexes, then drop and recreate them. The sketch below demonstrates each of these. When ingesting large amounts of data that involve creating many chunks (for example, migrating historical data), properly set maintenance_work_mem to speed up index creation operations. As a general rule of thumb, you can set maintenance_work_mem to about 5% of your total system RAM. This article shares more details on key parameters to consider when tuning performance, while this article goes further to explain more about index optimization.
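A sketch of these techniques, assuming a hypertable partitioned on time that also has device_id and location_id columns:

```sql
-- A unique index on a hypertable must include the partitioning column(s)
CREATE UNIQUE INDEX conditions_device_loc_time_idx
    ON conditions (device_id, location_id, time);

-- Create an index chunk-by-chunk to avoid one long exclusive lock
CREATE INDEX conditions_device_idx ON conditions (device_id)
    WITH (timescaledb.transaction_per_chunk);

-- Identify invalid indexes left behind by failed CREATE INDEX runs
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
```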
Use the DROP TABLE command to delete a hypertable. If the hypertable is too large, the DROP command may time out due to insufficient memory; in such cases, drop individual chunks first using the drop_chunks() function before issuing the DROP TABLE command. You can set chunk time intervals when creating a new hypertable or when modifying an existing one; a new interval only applies to new chunks. The ALTER commands on a hypertable work like on a regular PostgreSQL table. Currently, there is a limitation that you need to decompress data first to add a column with constraints or defaults to a hypertable with compression enabled. To clear the bloat introduced by INSERT/DELETE/UPDATE operations, Timescale uses the autovacuum process to reclaim wasted space. Autovacuum is on by default, but the default settings may not be sufficient given the frequency of your INSERT/DELETE/UPDATE operations; this EDB blog post covers VACUUM/ANALYZE best practices in detail. Queries against bloated hypertables don’t perform optimally, and sometimes you might need to manually VACUUM ANALYZE hypertables to clear the bloat and update hypertable statistics. If you intend to delete all the data in a hypertable chunk, use the drop_chunks() function instead of the DELETE command to remove the entire chunk at once and avoid the bloat introduced by DELETE. If you’re deleting data as part of your business retention policies, Timescale provides a data retention feature that can automate the process of deleting old data that meets your criteria. You can view hypertable details from the service explorer page; there are also informational views that you can query directly from the database, as in the sketch below.
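A sketch of chunk-wise deletion and the informational views (hypertable name assumed):

```sql
-- Drop whole chunks older than a year instead of a row-wise DELETE
SELECT drop_chunks('conditions', older_than => INTERVAL '1 year');

-- Manually clear bloat and refresh statistics when needed
VACUUM ANALYZE conditions;

-- Informational view with per-chunk details
SELECT * FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions';
```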
Compression reduces the amount of storage space required to store data, which can lead to significant cost savings, especially for large datasets. Tiger Cloud uses a hybrid row/columnar storage design, where recent data is stored in an uncompressed, row-oriented format for efficient shallow and wide queries. As data ages, you can enable compression, automatically converting the data to a compressed, columnar format for deep and narrow queries. This approach allows you to achieve high ingestion rates and efficient querying of historical data while saving on storage. Compression is a good fit when:
Data is not frequently modified: While Tiger Cloud supports data modifications inside a compressed chunk, they may become inefficient once a chunk is compressed.
Query patterns shift from shallow and wide to deep and narrow: As your application matures and you start analyzing historical trends, your queries will likely shift from needing to access many columns (wide) to focusing on specific columns over time (narrow). This is where compression shines.
Data exhibits trends or patterns: Compression algorithms are most effective when data is ordered and exhibits trends or patterns.
At a high level, compression works on two principles:
Chunking: Hypertables are divided into smaller “chunks.” Compression operates on these chunks individually, optimizing data storage within each one.
Array-like Structure: Instead of storing individual data points separately, compression groups similar values together, similar to creating a list. For example, instead of storing each data point in a separate row, you group related values in a single row, making it much more compact - and then the compression algorithms compress that group to reduce the storage further.
Compression in Tiger Cloud is highly configurable. You can fine-tune the compression settings to achieve optimal performance and storage efficiency. Here are the key configuration options:
timescaledb.compress: This option enables or disables compression for a hypertable.
timescaledb.compress_orderby: This option specifies how the rows in a compressed chunk are ordered. For most time-series data, ordering by the time column (time DESC or time ASC) is sufficient and often optimal. However, you can specify other columns or combinations of columns if your data has specific patterns or your queries require a different ordering.
timescaledb.compress_segmentby: This option specifies how the compressed data is segmented. Segmenting by a column frequently used in WHERE clauses or GROUP BY clauses can significantly speed up queries. This is because Tiger Cloud can efficiently filter and access data within compressed chunks based on the segmentby column. For example, if you frequently analyze data by device_id, segmenting by device_id will improve query performance.
timescaledb.compress_chunk_time_interval: This is optional, but it allows you to specify a time interval for rolling up compressed chunks, further reducing the number of chunks.
When choosing segmentby and orderby columns, consider:
Uniqueness: Select columns with low cardinality (fewer unique values) for better compression.
Query Patterns: Consider how your queries access data. Segmenting by columns frequently used in WHERE clauses or GROUP BY clauses can significantly improve query performance.
Tools:
count(distinct …): This SQL function allows you to count the number of distinct values in a column.
pg_stats view: This PostgreSQL system view provides detailed statistics about columns, including n_distinct (estimated number of distinct values).
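A sketch of both tools against a hypothetical measurements table:

```sql
-- Exact distinct count for a candidate segmentby column
SELECT count(DISTINCT device_id) FROM measurements;

-- Planner's estimate from table statistics
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'measurements' AND attname = 'device_id';
```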
Compression in Tiger Cloud is enabled at the hypertable level, using the commands in the sketch below. The first command enables compression on the measurements hypertable, sets the segmentby option to device_id, and sets the orderby option to time DESC; the second installs a compression policy that automatically compresses chunks older than 7 days in the background. Note that the first syntax applies to Tiger Cloud versions older than 2.18.0; for 2.18.0 and newer, use the columnstore syntax shown after it.
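A sketch of both syntaxes, following the documented forms for each release line:

```sql
-- Versions older than 2.18.0
ALTER TABLE measurements SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'time DESC'
);
SELECT add_compression_policy('measurements', INTERVAL '7 days');

-- Versions 2.18.0 and newer (columnstore API)
ALTER TABLE measurements SET (
    timescaledb.enable_columnstore = true,
    timescaledb.segmentby = 'device_id',
    timescaledb.orderby   = 'time DESC'
);
CALL add_columnstore_policy('measurements', after => INTERVAL '7 days');
```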
Identify a Test Chunk: To isolate testing, choose an older, uncompressed chunk. Older chunks are ideal because they are less likely to be modified and typically represent data accessed less frequently. This minimizes the performance impact of any modifications or queries during testing. You can use the show_chunks() function to identify suitable chunks. For example, to find chunks older than 3 days in a hypertable called conditions:
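A sketch using the documented show_chunks() arguments:

```sql
-- Chunks of "conditions" older than 3 days
SELECT show_chunks('conditions', older_than => INTERVAL '3 days');
```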
Baseline Query Performance: Identify the queries that are most important for your application and run them against the test chunk, recording the execution time of each. These queries should reflect your typical usage patterns, including aggregations, filters, and time-based selections. Use EXPLAIN ANALYZE to understand the query plan and measure execution time.
Enable Compression: Use the ALTER TABLE command to define compression settings (segment by and order by columns).
Manually Compress: Manually compressing the test chunk allows you to test the compression settings without waiting for the automatic compression policy to run. Use the compress_chunk() function with the new settings to compress the test chunk.
For versions older than 2.18.0, use compress_chunk(); from 2.18.0 onwards, use convert_to_columnstore(), as in the sketch below. Replace _timescaledb_internal._hyper_1_2_chunk with the actual name of your test chunk.
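A sketch of both variants:

```sql
-- Versions older than 2.18.0
SELECT compress_chunk('_timescaledb_internal._hyper_1_2_chunk');

-- Versions 2.18.0 and newer (columnstore API)
CALL convert_to_columnstore('_timescaledb_internal._hyper_1_2_chunk');
```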
Check Compression Ratio: Monitor the compression ratio achieved using the chunk_compression_stats() function, as in the sketch below. A higher compression ratio means more effective compression.
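For example, against the hypothetical conditions hypertable:

```sql
SELECT chunk_name,
       before_compression_total_bytes,
       after_compression_total_bytes
FROM chunk_compression_stats('conditions');
```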
This command provides statistics on the compressed and uncompressed sizes of your chunks.
Compare Query Performance: Re-run the baseline queries from step 2 against the compressed test chunk. Compare the execution times to the uncompressed baseline. Ideally, you should see an improvement in performance for queries that benefit from compression, such as those involving aggregations or filters on the compress_segmentby column.
Iterate and Adjust: Compression settings are not one-size-fits-all; iterating and adjusting allows you to fine-tune them for your specific data and query patterns. If the initial compression settings don’t provide the desired performance or compression ratio, decompress the chunk using decompress_chunk() (for versions below 2.18.0) or convert_to_rowstore() (for 2.18.0 onwards) and modify the settings. Experiment with different compress_segmentby and compress_orderby columns, and consider adjusting the chunk time interval if necessary. Repeat steps 4-6 with the new settings.
Continuous aggregates themselves can also become quite large, especially when dealing with high-volume data. This is where compression comes in: they can also be compressed, further reducing storage costs and improving query performance for pre-aggregated data. You can enable compression on a continuous aggregate using the ALTER MATERIALIZED VIEW command with the timescaledb.compress option (timescaledb.enable_columnstore from 2.18.0 onwards), as in the sketch below for a continuous aggregate named my_cagg. While you can still refresh the compressed regions of a continuous aggregate, it is best practice to review the refresh policy and ensure that the compression policy does not compress actively refreshed regions. You can similarly add a compression policy on the continuous aggregate, as we previously did on the hypertable.
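A sketch for my_cagg; the 30-day policy interval is an assumption:

```sql
-- Versions older than 2.18.0
ALTER MATERIALIZED VIEW my_cagg SET (timescaledb.compress = true);

-- Versions 2.18.0 and newer
ALTER MATERIALIZED VIEW my_cagg SET (timescaledb.enable_columnstore = true);

-- Compress aggregate chunks older than 30 days
SELECT add_compression_policy('my_cagg', compress_after => INTERVAL '30 days');
```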
Rollup compression was introduced in version 2.9. It allows you to combine multiple smaller, uncompressed chunks into a single, larger compressed chunk. This can be particularly useful for further reducing storage costs and improving query performance, especially when dealing with a large number of small uncompressed chunks. Here’s how roll-up compression works:
Set the compress_chunk_time_interval: When defining your compression policy, you can specify the compress_chunk_time_interval option. This option determines the time interval for rolling up compressed chunks. For example, if your uncompressed chunk interval is one week, you could set compress_chunk_time_interval to '2 weeks' or '6 weeks'.
Optimize compress_orderby: The default setting for compress_orderby is 'time DESC', which can lead to inefficient re-compression during rollup. To avoid this, set timescaledb.compress_orderby = 'time ASC' to ensure efficient rollup.
Run compression operations: Once you have configured the compress_chunk_time_interval, Tiger Cloud will automatically roll up uncompressed chunks into a compressed chunk when the compression policy is executed.
Here’s an example of how to enable roll-up compression and manually re-compress the chunks. In the sketch below, uncompressed chunks older than one week are rolled up into a compressed chunk with a time interval of two weeks.
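A sketch, assuming a measurements hypertable with one-week chunks:

```sql
-- Enable compression with a 2-week rollup interval and ASC ordering
ALTER TABLE measurements SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'time ASC',
    timescaledb.compress_chunk_time_interval = '2 weeks'
);

-- Manually (re)compress chunks older than one week
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('measurements', older_than => INTERVAL '1 week') AS c;
```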
The latest Tiger Cloud version allows you to modify (UPDATE/DELETE) compressed data. To ensure only the necessary data is decompressed while updating, filter on the segmentby and orderby columns. When performing bulk inserts, decompress the affected chunks first, as inserting data into compressed chunks is more computationally expensive. Insert operations against a compressed chunk with a unique constraint or primary key cause a small amount of data to be decompressed, allowing speculative insertion and blocking any inserts that could violate constraints; inserting into a hypertable with no unique constraint does not incur decompression. You can decompress individual chunks, use a range to filter chunks, or decompress specific chunks based on query filters. The latest Tiger Cloud versions support direct backfill into compressed chunks. You might encounter a limit error when the decompression process exceeds the timescaledb.max_tuples_decompressed_per_dml_transaction limit (100k by default); the solution is to disable or increase the limit.
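For example, to lift the cap for a large backfill session (0 disables the limit):

```sql
SET timescaledb.max_tuples_decompressed_per_dml_transaction = 0;
```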
To make backfilling easier, you can use the backfilling functions in the Tiger Cloud extras GitHub repository. In particular, the decompress_backfill procedure automates many of the backfilling steps for you. At the psql prompt, create a temporary table with the same schema as the hypertable you want to backfill into; in this example, the hypertable is named example and the temporary table is named cpu_temp. Insert your data into the temporary table, then call the decompress_backfill procedure. This procedure halts the compression policy, identifies the compressed chunks that the backfilled data corresponds to, decompresses the chunks, inserts data from the backfill table into the main hypertable, and then re-enables the compression policy.
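A sketch of the workflow, following the procedure’s documented arguments:

```sql
-- Staging table with the same schema as the target hypertable
CREATE TEMPORARY TABLE cpu_temp (LIKE example INCLUDING DEFAULTS);

-- Load the backfill data into the staging table
-- INSERT INTO cpu_temp SELECT ... ;

-- Move the staged rows into the hypertable, handling (de)compression
CALL decompress_backfill(
    staging_table          => 'cpu_temp',
    destination_hypertable => 'example'
);
```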
Backfill operations cause the affected chunks to be partially compressed. The compression policy will eventually recompress partially compressed chunks. To identify partially compressed chunks, query the catalog tables for chunks with a status of 9.
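A sketch querying the catalog for such chunks:

```sql
SELECT table_name, status
FROM _timescaledb_catalog.chunk
WHERE status = 9;
```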
Tiger Cloud provides multiple migration strategies based on the volume of data and the downtime you can accommodate. Each approach is designed to streamline the transition process while ensuring data integrity and minimal service disruption. Below is an overview of the available migration methods. Organizations migrating over 400GB should request Timescale support to pre-provision resources for a smoother process.
pg_dump and pg_restore
For datasets under 100GB, pg_dump and pg_restore offer a straightforward migration method for transferring data from a self-hosted PostgreSQL or TimescaleDB instance to Tiger Cloud. This method is ideal for environments where some downtime is acceptable. It supports compressed hypertables without requiring decompression before migration. However, for larger datasets exceeding 100GB, live migration is recommended for efficiency. For full technical details and step-by-step guidance, refer to the pg_dump and restore migration guide.
Live migration
Live migration provides an end-to-end solution for moving databases with minimal downtime, making it suitable for datasets between 100GB and 10TB+. This method leverages PostgreSQL logical replication and pgcopydb to copy data while keeping the source and target databases synchronized. It is best suited for scenarios where modifying application logic for dual writes is impractical. Live migration is not ideal for workloads exceeding 20,000 rows per second or databases with frequent UPDATE/DELETE operations on compressed tables. Organizations migrating over 400GB should engage Timescale support for assistance. For a detailed guide on live migration, visit the live migration documentation.
Dual-Write and Backfill for High-Ingestion Workloads
This strategy is designed for large-scale time-series workloads (100GB-10TB+) where minimal downtime is required. It involves three key steps: schema and relational data cloning, dual-write implementation, and backfilling historical time-series data. This method is particularly effective for append-only workloads where historical data does not require frequent updates. It supports migrations from PostgreSQL, Tiger Cloud, or any database that can export data in CSV format. Since it is more complex than pg_dump/restore, it requires careful planning and validation. For detailed implementation guidance, refer to the dual-write and backfill documentation.
Livesync
Livesync enables continuous real-time synchronization between a PostgreSQL source database and a Tiger Cloud service, effectively making Tiger Cloud a logical replica. It is optimized for scenarios where real-time analytics on a Tiger Cloud replica is required rather than a one-time migration. Livesync supports copying large datasets at speeds of up to 150GB per hour, but it does not migrate schema changes automatically. This feature is currently in alpha and is not recommended for production use. For more details, visit the Livesync documentation. Each of these migration strategies provides a tailored approach based on workload requirements, ensuring a seamless transition to Tiger Cloud. Migration support is available for PostgreSQL, Tiger Cloud, AWS RDS, and Managed Service for Timescale users. If you encounter any issues during migration:
Additionally, the #migration channel in the Timescale community Slack provides direct access to developers who can assist with migration-related questions.
For users migrating from non-PostgreSQL databases, the recommended approach is to export data as a .csv file and use timescaledb-parallel-copy for efficient ingestion into Tiger Cloud. Other ingestion methods are available depending on the source database; more details can be found in the Ingest data from other sources guide.
Continuous aggregates (Caggs) speed up query performance by pre-aggregating raw data (in hypertables). This reduces the overall query cost and lowers total query execution time. Statistics queries that use general-purpose aggregate functions like AVG, COUNT, MAX, MIN, and SUM can benefit from Caggs. This blog talks more about data aggregation, and you can learn more about aggregation best practices from this other blog. If you have statistical queries that are too expensive (taking more time and resources at run time), consider implementing Caggs. By default, when you create a continuous aggregate, the entire hypertable is materialized immediately. This can take time, especially if you have a lot of historical data. We recommend using the WITH NO DATA option when creating a Cagg: it creates the view instantly, without waiting for the entire dataset to be processed. Use refresh policies to refresh the Cagg automatically; for example, you can add a refresh policy to a cagg_rides_view Cagg that does a full refresh. After a full refresh, remember to set the refresh window (start_offset) back to a reasonable size. If your workload is append-only, keep the refresh window very narrow. If you choose to set end_offset to NULL, confirm that you do not have future chunks, as these might lead to wrong aggregate values; you can, for example, modify the refresh policy to use a smaller refresh window of 1 hour and set end_offset to NULL. If necessary, you can also refresh a Cagg manually, say over a refresh window of 5 days. If the underlying hypertable (or Cagg, in the case of hierarchical Caggs) has retention policies, ensure the Cagg refresh policy doesn’t overlap with data retention; this can cause unintended data loss if not well planned. The sketch after the following list walks through these steps. You can automate manual refresh using a User Defined Action (UDA) in the following scenarios:
When all your policies exceed the timescaledb.max_background_workers limit.
This is not a hard limit, but if all your policies execute concurrently, the database will run out of background workers.
This would require you to cluster several Cagg refreshes in a single UDA.
To improve Cagg refresh performance by customizing database parameters, e.g. tuning memory requirements or the parallelism level.
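A sketch of the full workflow, assuming a hypothetical rides hypertable with pickup_time and fare columns:

```sql
-- Create the Cagg instantly, without materializing historical data
CREATE MATERIALIZED VIEW cagg_rides_view
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', pickup_time) AS bucket,
       count(*) AS rides,
       avg(fare) AS avg_fare
FROM rides
GROUP BY bucket
WITH NO DATA;

-- Refresh policy that performs a full refresh (start_offset => NULL)
SELECT add_continuous_aggregate_policy('cagg_rides_view',
    start_offset      => NULL,
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');

-- Afterwards, narrow the window to 1 hour and set end_offset to NULL
SELECT remove_continuous_aggregate_policy('cagg_rides_view');
SELECT add_continuous_aggregate_policy('cagg_rides_view',
    start_offset      => INTERVAL '1 hour',
    end_offset        => NULL,
    schedule_interval => INTERVAL '30 minutes');

-- Manual refresh over a 5-day window
CALL refresh_continuous_aggregate('cagg_rides_view',
     now() - INTERVAL '5 days', now());
```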
Below is an example that uses a UDA to manually refresh two Caggs, set work_mem to 512MB, set max_parallel_workers_per_gather to 4, and finally add a background job that runs every 10 minutes; see the sketch after the questions below. When you create a Cagg, the real-time aggregate feature is disabled by default. Real-time aggregation JOINs materialized data with hypertable data to include the most recent chunk’s data. Here are a few points to help you determine whether your use case is a fit for real-time Caggs:
How sensitive are your reporting needs to the most recent data?
What gap can you tolerate between aggregated data (Cagg) and raw hypertable data (new data arriving after the last Cagg refresh)?
Does enabling real-time aggregate affect query performance?
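A sketch of such an action; the procedure body and Cagg names are assumptions, and refresh_continuous_aggregate() must not run inside an explicit transaction, so run this via the job scheduler:

```sql
CREATE OR REPLACE PROCEDURE refresh_my_caggs(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
BEGIN
    -- Tune memory and parallelism for this refresh only
    SET work_mem = '512MB';
    SET max_parallel_workers_per_gather = 4;

    CALL refresh_continuous_aggregate('cagg_rides_view',
         now() - INTERVAL '1 day', now());
    CALL refresh_continuous_aggregate('cagg_rides_daily',
         now() - INTERVAL '7 days', now());
END;
$$;

-- Run the action every 10 minutes as a background job
SELECT add_job('refresh_my_caggs', '10 minutes');
```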
An example of how to turn real-time aggregates on or off using the materialized_only parameter is included in the sketch after the list below. You can also create Caggs on top of other Caggs using hierarchical continuous aggregates. This helps reduce the computational cost and resources required to aggregate raw data. A simple example would look like: hypertable (raw data) -> seconds -> minutes -> hourly -> daily -> weekly -> monthly, etc. Take note of the restrictions when creating hierarchical Caggs; a demonstration is included in the sketch below. By default, composite indexes are automatically created for each GROUP BY column and the bucket; this behavior can be turned off when creating a Cagg. You can also manually add indexes on Caggs if necessary: use EXPLAIN to evaluate query performance against the Cagg before deciding to add a new index, and check the Cagg index limitations. To add an index on a Cagg, use the standard CREATE INDEX command. Tiger Cloud uses the UTC timezone by default. You can change a Tiger Cloud service’s timezone at the database level (ALTER DATABASE tsdb SET TIMEZONE TO 'Europe/Berlin';) or at the role level (ALTER ROLE readonly SET TIMEZONE TO 'Europe/Berlin';). The pg_timezone_names view provides a list of all supported timezones. Caggs use the default timezone; you can define a timezone, but it will be static. If you have users spread across different regions and want Caggs to report in timezones specific to each region, create a Cagg for each region as a workaround. The sketch after the list below also includes an example that keeps the default UTC bucket and adds another column to display a preferred timezone. To drop data from a Cagg, you can either:
You can manually drop hypertable chunks using the drop_chunks function or set up retention policies and then refresh the Cagg manually or by adjusting the start_offset argument of the refresh policy.
Drop the Cagg’s materialized hypertable chunks using the drop_chunks function.
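A combined sketch of the examples referenced above, continuing with the hypothetical rides Caggs; the timezone column is rendered through a plain view over the Cagg, a low-risk alternative to defining the extra column inside the Cagg itself:

```sql
-- Turn real-time aggregation on (false) or off (true)
ALTER MATERIALIZED VIEW cagg_rides_view SET (timescaledb.materialized_only = false);

-- Hierarchical Cagg: a daily rollup built on top of the hourly Cagg
CREATE MATERIALIZED VIEW cagg_rides_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', bucket) AS day,
       sum(rides) AS rides
FROM cagg_rides_view
GROUP BY day
WITH NO DATA;

-- Manually add an index to a Cagg
CREATE INDEX ON cagg_rides_view (rides, bucket DESC);

-- Keep the default UTC bucket and display a preferred timezone alongside it
CREATE VIEW cagg_rides_view_berlin AS
SELECT bucket,
       bucket AT TIME ZONE 'Europe/Berlin' AS bucket_berlin,
       rides,
       avg_fare
FROM cagg_rides_view;
```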
Caggs are backed by materialized hypertables, which adopt the same structure as regular hypertables. Enabling compression on Caggs follows a similar approach to hypertable compression: when compression is enabled and no other options are provided, segmentby is automatically set to the GROUP BY columns of the continuous aggregate, and the time_bucket column is used as the orderby column in the compression configuration. You can also customize the segmentby and orderby columns to align with the filters used in your queries; if you add a custom index to a Cagg, ensure the index’s columns are part of the segmentby or orderby arguments. Always check the troubleshooting page for common Cagg errors.
You can automatically delete historical data by setting up data retention policies, or by manually dropping hypertable chunks that meet certain criteria, e.g. “data older than one year”.
If you’ve defined a retention policy and no longer need it, use the remove_retention_policy() function to remove it. You can also disable the policy by setting the scheduled argument to false using the alter_job() function, as in the sketch below.
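A sketch of these operations; the hypertable name and job id are assumptions:

```sql
-- Automate dropping data older than one year
SELECT add_retention_policy('conditions', INTERVAL '1 year');

-- Remove the policy entirely...
SELECT remove_retention_policy('conditions');

-- ...or just pause its job (find the id in timescaledb_information.jobs)
SELECT alter_job(1015, scheduled => false);
```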
Timescale provides layered storage: the first layer holds uncompressed data, the second holds compressed data, and both are stored on high-performance EBS volumes. The third layer (object storage) stores infrequently accessed data with lower performance requirements in AWS S3. Object storage lowers storage costs, scales storage beyond the current 16 TB limit, and keeps data transparently queryable.
To tier data, you first enable tiered storage from the service console Overview page. Then, add a data tiering policy to automate tiering using the add_tiering_policy() function, as in the sketch below.
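A minimal sketch; the interval is an assumption:

```sql
-- Move chunks older than 3 months to object storage
SELECT add_tiering_policy('conditions', INTERVAL '3 months');
```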
Sometimes, there’s a need to tier or untier chunks manually. For example, when you’re close to the 16 TB storage limit, you can manually tier chunks to free up storage space. Tiered data is immutable; if you need to modify tiered data, you must untier it first using the untier_chunk() function.
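A sketch with a hypothetical chunk name; the invocation forms follow the tiered storage documentation:

```sql
-- Queue a specific chunk for tiering to object storage
SELECT tier_chunk('_timescaledb_internal._hyper_1_2_chunk');

-- Bring a tiered chunk back to EBS storage so it can be modified
CALL untier_chunk('_timescaledb_internal._hyper_1_2_chunk');
```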
Sometimes, you want to move data from tiered storage back to high-performance EBS storage; before doing so, ensure you have enough free storage to accommodate the untiered chunks. You must untier all data, or drop the tiered data, before disabling data tiering.
To query tiered data, you must enable the timescaledb.enable_tiered_reads parameter, which is disabled by default. You can configure the setting in the service console as demonstrated on this page, or by using the SET command at the database or session level. Once the setting is enabled, you can use a standard SELECT statement to query tiered data, as in the sketch below.
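For example, at the session level:

```sql
SET timescaledb.enable_tiered_reads = true;

-- Standard SQL now transparently includes tiered data
SELECT count(*) FROM conditions
WHERE time < now() - INTERVAL '1 year';
```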
If you’d prefer, you can interact with your Timescale database via the Tiger Cloud Console rather than directly through your terminal. By doing so, you can take advantage of several console features that help you visualize, modify, and manage your data.
By going to the Services tab, you can view all of the services created under your project ID. If you do not have a service yet, you can click the New service button to create your first one. You first choose your required capabilities based on what you intend to use the service for. Many of our customers use us primarily for time series and analytics; however, you can also create a vanilla PostgreSQL service, or take advantage of our AI and vector capabilities. For demonstration purposes, let’s select the first option.
You then select the region that you would like to create your service in. You should generally consider the geographic location of the majority of your end users when selecting a region. You can then choose the compute and memory size for your service; by providing an estimate of how much data you will be working with, Timescale gives you a recommendation for the most appropriate size. Then, select the type of environment this will be, as well as whether you would like to create an HA replica to ensure that your service is always available. If you are unsure of whether to create one, Timescale will again recommend one based on the kind of environment you are creating. Finally, you can configure additional power-ups such as connection pooling or connecting to your Timescale service directly from your own AWS VPC. These features can always be enabled or disabled afterwards if your business needs change.
Once you have created your service, go back to the Services tab and click on your service to begin interacting with it. The Cloud Console offers several headers that allow you to interact with your database in various ways. An example is the Explorer tab, where you can view information about your hypertables, compression, and configured policies. Another useful tab is the Jobs tab, where you can find information on the last run of your jobs, such as a compression job or a refresh of a continuous aggregate; this view is helpful for noticing job failures, which may indicate an underlying issue.
There is also a built-in SQL Editor, as well as the Operations tab, which allows you to perform administrative actions on your service, such as modifying the CPU/memory, creating HA/read replicas, setting database parameters, and even changing the password of the service. The Operations tab also allows you to see which continuous aggregates you have created by going to the Service management section.
Some tabs are very useful for observing trends in your metrics over time, or for analyzing log messages to troubleshoot issues; these can be found under the Metrics and Logs headers respectively. The Metrics tab allows you to visualize changes in your CPU and memory over time; in the top-left corner, you can adjust the timeframe of the graph, and if you have created a replica, you can view its metrics on this page by selecting it in the dropdown menu. The Logs header provides helpful insight into the activity inside your service. You can filter based on the severity of each log message, allowing you to narrow your focus while searching for specific messages. If you experience an unexpected issue, you can note the timestamp and examine your service logs at that time to search for a clear root cause.
The Insights tab is also incredibly helpful for visualizing changes in metrics over time and identifying the underlying cause. It provides information on lock contention, so you can identify any jobs that may be conflicting with your applications, for example. Additionally, you can view your top queries in this tab, which is very helpful for visualizing how often queries are called and their resulting execution time. This is just a subset of what you can do through the Tiger Cloud Console, which is a very convenient way of monitoring, configuring, and managing your database.
In some instances, it may be helpful to create a fork of your database for troubleshooting, or to test the effects of a proposed change before applying it in production. A fork is an exact copy of your database at a given time; after the fork is created, it diverges from your original database and is treated as its own independent service whose data is not replicated from the original. You can create a fork of your service by navigating to the Services tab, selecting the Operations header, and then the Service management section. You can create a fork of your service in the exact state it is in at that moment, or, if you prefer, make a recovery fork of your service to any point in the last 14 days using the backups taken from your environment. If you are not given the option to “Fork to a previous point in time”, it is likely that no backups are available yet.
A read replica is a read-only copy of your primary database that is kept in sync by replaying WAL records shipped from the primary as each transaction occurs. This enables you to interact with up-to-date production data for analysis, or to scale out reads beyond the limits of your primary data instance. You can create as many read replicas as you need. Each read replica appears as its own service, and you use a unique connection string to interact with it; this provides both security and resource isolation. To restrict access without isolation, you can instead create a read-only role for each Tiger Cloud service. Users with read-only permissions cannot access the primary data instance directly. To create a read replica:
Navigate to the Tiger Cloud console, and select your desired service
Click on the Operations header
Click on the Read replicas section
Select Add read replica, and select your desired configurations
Make note of the connection information for your replica, as each connection string is unique, and different from your primary instance
To manage lag for your replicas, you can view the current replication lag under the Operations header as well, which helps you understand trends affecting your replica’s ability to keep up with your primary. If you see replication lag issues, you might benefit from adjusting the max_standby_streaming_delay and max_standby_archive_delay parameters, which configure the maximum amount of time the database waits before cancelling a query on the replica in order to apply pending WAL records. Lag typically builds when a long-running, expensive query holds up conflicting WAL records, which must wait for the query to complete before being applied. By setting the above parameters, you can avoid excessive replication lag, but it is important to balance this with your need for long-running queries to complete, as cancelling them may affect your end users.
Timescale also offers HA replicas as a means of maintaining a copy of your primary database that is as up to date as possible, which you can fail over to in disaster scenarios. These copies are hosted in different AWS availability zones (AZs) within the same region as your primary node, and can take over if your primary becomes unavailable. HA replicas can be either synchronous or asynchronous:
Synchronous (SYNC):
Commits its next write once the replica confirms the previous write is complete
No lag between the primary and the replica; they are in the same state at all times
Highest level of data integrity, however ingest is affected due to the wait for confirmation
Asynchronous (ASYNC):
Commits its next write without confirmation of the previous write completion
Often lags behind the primary a little
Preferable if you need the shortest ingest time
To configure an HA replica, or to modify the configuration of an existing replica, go to the Operations header and select the High availability section. There are a couple of different configurations available:
Non-production: no replica, best for developer environments.
High availability: a single async replica in a different AWS availability zone from your primary. Provides high availability with cost efficiency. Best for production apps.
Highest availability: two replicas in different AWS availability zones from your primary. Available replication modes are:
High performance - two async replicas. Provides the highest level of availability with two AZs and the ability to query the HA system. Best for absolutely critical apps.
High data integrity - one sync replica and one async replica. The sync replica is identical to the primary at all times. Best for apps that can tolerate no data loss.
You should select the appropriate configuration for your use case based on your tolerance for data loss, and desire for maximum performance.