
Table Design

When designing a PostgreSQL database schema, choosing between wide and narrow table layouts is crucial, as each has distinct advantages and trade-offs.

Narrow Table Layout

A narrow table typically has few data value columns but may include multiple metadata or categorizing columns. This design is beneficial when dealing with a low number of known metrics and data types. It allows for individual consideration of each metric’s data type and simplifies table design by using independent tables for each metric. Extending the schema with additional metrics is straightforward; simply create a new table for the new metric.
Advantages
  • Simplifies the addition of new metrics by creating new tables.
  • Allows for precise data type management for each metric.
Disadvantages
  • Managing a high number of tables can become complex.
  • Querying multiple metrics simultaneously may require complex joins or unions, increasing query complexity and system I/O load.
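For illustration, here is a minimal sketch of a narrow layout, assuming a hypothetical cpu_usage metric:

```sql
-- One table per metric; the value column's type is chosen per metric.
CREATE TABLE cpu_usage (
    time      TIMESTAMPTZ NOT NULL,
    device_id TEXT        NOT NULL,
    value     DOUBLE PRECISION
);
-- Adding a new metric means adding a new table (e.g. memory_usage, disk_io, ...).
```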

Wide Table Layout

In contrast, a wide table has many columns, often with one column per metric. This design is suitable when all potential metrics are known upfront and are relatively static over time. It simplifies queries, as all related data is in a single row, eliminating the need for joins between metric tables.
Advantages
  • Simplifies query construction, as all metrics are in a single table.
  • Efficient querying for subsets of columns, especially when using columnar compression.
Disadvantages
  • Adding new metrics requires altering the table schema, which can be resource-intensive.
  • Dropping columns does not immediately free up disk space, potentially leading to storage inefficiencies.
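A minimal sketch of a wide layout, assuming a hypothetical set of device metrics that are all known up front:

```sql
-- One column per metric; all metrics live in a single row per timestamp and device.
CREATE TABLE device_metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT        NOT NULL,
    cpu_usage   DOUBLE PRECISION,
    mem_usage   DOUBLE PRECISION,
    disk_io     BIGINT,
    temperature DOUBLE PRECISION
);
```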

Medium Table Layout

A medium table layout strikes a balance between narrow and wide designs by creating one column for each necessary data type. This approach is practical if you know the expected data types but are uncertain about all potential metrics.
Advantages
  • Easier to extend with new metrics without altering the table schema.
  • Simplifies multi-tenancy by creating one hypertable per customer.
Disadvantages
  • Requires a mechanism to map metrics to their corresponding data type columns.
  • Querying multiple metrics of different data types may involve complex SQL constructs.
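A minimal sketch of a medium layout, assuming numeric, boolean, and text metrics; the metric_name column provides the metric-to-column mapping mentioned above:

```sql
-- One value column per data type; metric_name maps each row to its metric.
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT        NOT NULL,
    metric_name TEXT        NOT NULL,
    value_num   DOUBLE PRECISION,  -- numeric metrics
    value_bool  BOOLEAN,           -- boolean metrics
    value_text  TEXT               -- text metrics
);
```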

Table Design Considerations for Tiger Cloud

When using Tiger Cloud, all table layouts are fully supported with compression. Narrow tables often achieve excellent compression ratios due to their uniformity. Medium and wide tables also benefit from Tiger Cloud’s compression capabilities, and columns with NULL values compress efficiently, minimizing storage bloat. The choice between wide, narrow, or medium table layouts depends on your specific use case, including factors like the number of metrics, data types, schema evolution, and query patterns. Understanding these factors will help you design a PostgreSQL schema that balances performance, scalability, and maintainability. For a more in-depth discussion, refer to this article.

Hypertables

Hypertables are a key feature in Tiger Cloud that transform regular PostgreSQL tables into scalable, partitioned versions ideal for managing large datasets. By automatically breaking data into smaller, manageable pieces called chunks, hypertables help your database perform efficiently, even when dealing with billions of rows. This section helps you get the hypertable settings right from the start, so your database not only performs well but can also scale without issues. Below, we cover essential best practices and practical steps for optimizing hypertables in Tiger Cloud.

Creating Hypertables

To create a hypertable, you first create a regular PostgreSQL table and then convert it with the create_hypertable() function (see the example after the list below). The function takes several arguments, but here's what you need to be aware of:
  • The default chunk time interval is 7 days and is controlled by the chunk_time_interval argument. Tune the chunk_time_interval relative to your data ingest rates, as this interval determines how large your chunks will be. This setup must align with your service’s workload and total memory allocation. Refer to the Sizing Hypertable Chunks section for a more detailed discussion on how to size your hypertable chunks.
  • An index on the partitioning column of a hypertable is created by default. This behavior is controlled by the create_default_indexes setting, which can be true or false. If you plan to add a lot of data to a new hypertable, consider setting this to false to speed up the data migration. You can create the index later. Please note that if you have a unique or primary key constraint, the partitioning column must be included in the unique or primary key index.
  • If you want to convert an existing table that already contains data into a hypertable, you must set migrate_data to true; otherwise, the function will fail. With migrate_data enabled, the conversion may take a long time, depending on the table size. The recommended approach is to first create an empty hypertable and then insert the data into it.
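A minimal sketch of both steps, assuming a hypothetical conditions table:

```sql
-- 1. Create a regular PostgreSQL table.
CREATE TABLE conditions (
    time        TIMESTAMPTZ       NOT NULL,
    device_id   TEXT              NOT NULL,
    temperature DOUBLE PRECISION
);

-- 2. Convert it into a hypertable partitioned on the time column.
SELECT create_hypertable(
    'conditions',
    'time',
    chunk_time_interval    => INTERVAL '7 days', -- default; tune to your ingest rate
    create_default_indexes => true,              -- set to false to speed up bulk loads
    migrate_data           => false              -- true only when the table already holds data
);
```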

Partitioning Hypertables

You can partition a hypertable using one or more columns. The partitioning column(s) can be of type timestamp, date, or integer. For example, you can partition a hypertable on an integer id column so that each chunk contains 1,000 records.

For UNIX-time data, use the same approach as for any other integer-based column. By default, UNIX time is represented in seconds, but depending on your requirements you can convert it to other time units, e.g. milliseconds. The example below uses a chunk interval equivalent to one day in seconds (60 * 60 * 24).

For integer-based partitioning columns, you also need to create a custom integer_now_func. Policies (compression, refresh, retention, or tiering) use this function to determine the age of a chunk. For UNIX-based time, you only need to create the function once, and all hypertables can share it. Other integer-based partitioning columns typically rely on a sequence to evaluate the age of chunks, which means you need a custom function for each hypertable. Both the UNIX-time and integer-based cases are shown in the sketch below.

It is possible to sub-partition a chunk using additional dimensions, for example partitioning on the device_id column in addition to the primary time column. This can improve performance by parallelizing I/O in setups with multiple physical disks (RAID). However, we DO NOT encourage this design in Tiger Cloud services, because all chunks are created on a single EBS volume and there is therefore little benefit to query performance.
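The sketch below illustrates these cases; table and function names are illustrative:

```sql
-- Integer-partitioned hypertable: each chunk covers 1000 id values.
CREATE TABLE events (id BIGINT NOT NULL, temperature DOUBLE PRECISION);
SELECT create_hypertable('events', 'id', chunk_time_interval => 1000);

-- UNIX-time (seconds) hypertable: chunk interval of one day expressed in seconds.
CREATE TABLE readings (ts BIGINT NOT NULL, value DOUBLE PRECISION);
SELECT create_hypertable('readings', 'ts', chunk_time_interval => 60 * 60 * 24);

-- integer_now_func for UNIX time: returns "now" in seconds and can be shared by
-- every hypertable that partitions on UNIX time.
CREATE OR REPLACE FUNCTION unix_now() RETURNS BIGINT
LANGUAGE SQL STABLE AS $$
    SELECT extract(epoch FROM now())::BIGINT
$$;
SELECT set_integer_now_func('readings', 'unix_now');

-- integer_now_func for a sequence-backed id column: one function per hypertable.
CREATE OR REPLACE FUNCTION events_now() RETURNS BIGINT
LANGUAGE SQL STABLE AS $$
    SELECT coalesce(max(id), 0) FROM events
$$;
SELECT set_integer_now_func('events', 'events_now');
```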

Sizing Hypertable Chunks

Ideally, chunks should be small enough to fit into memory to enable quick data access without needing to read from disk, yet large enough to avoid an excessive number of chunks, which can slow down query planning and lower the compression ratio. A practical approach is to configure chunk_time_interval so that all “active” chunks (including their indexes) fit into the shared_buffers memory allocation. “Active” chunks are those currently receiving new data; “inactive” chunks hold older data that is unlikely to be changed or updated. In Tiger Cloud, shared_buffers is set to 25% of the service’s total memory by default. For instance, on a system with 64 GB of memory, ensure all active hypertable chunks plus indexes do not exceed 16 GB (25% of 64 GB).

Hypertable Indexes

By default, when you create a hypertable, Tiger Cloud automatically generates an index on the partitioning column. You can control this behavior by setting the create_default_indexes argument of the create_hypertable() function to false. Indexes primarily speed up SELECT queries but can slow down INSERT operations; creating unused indexes not only reduces INSERT performance but also increases storage costs unnecessarily.

Any unique (or primary key) index on a hypertable must include all of the hypertable's partitioning columns, including the time column. For example, if you have a hypertable partitioned on time and you want to enforce uniqueness across device_id and location_id, the unique index must also include the time column. Using the timescaledb.transaction_per_chunk option avoids a long exclusive lock on the hypertable by creating the index chunk by chunk.

It is possible for some indexes to become invalid, for example after block corruption or when a CREATE INDEX command fails midway. Identify such invalid indexes, then drop and recreate them.

When ingesting large amounts of data that create many chunks (for example, when migrating historical data), set maintenance_work_mem appropriately to speed up index creation. As a general rule of thumb, you can set maintenance_work_mem to about 5% of your total system RAM. This article shares more details on key parameters to consider when tuning performance, while this article goes further to explain more about index optimization. The sketch below illustrates these operations.
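A sketch of the operations above, assuming a hypertable named conditions partitioned on time; index and column names are illustrative:

```sql
-- A composite unique index must include the partitioning column (time).
CREATE UNIQUE INDEX idx_device_location_time
    ON conditions (device_id, location_id, time);

-- Build an index chunk by chunk to avoid holding an exclusive lock on the whole hypertable.
CREATE INDEX idx_conditions_device
    ON conditions (device_id, time DESC)
    WITH (timescaledb.transaction_per_chunk);

-- Identify invalid indexes (e.g. left behind by a failed CREATE INDEX), then drop and recreate them.
SELECT n.nspname AS schema_name, c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;

-- Raise maintenance_work_mem (roughly 5% of RAM) before large index builds or migrations.
SET maintenance_work_mem = '3GB';  -- example value for a 64 GB instance
```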

Hypertable Operations

Use the DROP TABLE command to delete a hypertable. If the hypertable is very large, the DROP command may time out due to insufficient memory; in that case, drop individual chunks first using the drop_chunks() function before issuing the DROP TABLE command.

You can set the chunk time interval when creating a new hypertable or when modifying an existing one. When you change the chunk interval of an existing hypertable, the new interval applies only to new chunks. ALTER commands on a hypertable work as they do on a regular PostgreSQL table. One current limitation: to add a column with constraints or defaults to a hypertable with compression enabled, you need to decompress the data first.

To clear the bloat introduced by INSERT/DELETE/UPDATE operations, Timescale relies on the autovacuum process to reclaim wasted space. Autovacuum is on by default, but the default settings may not be sufficient for workloads with frequent INSERT/DELETE/UPDATE operations. This EDB blog post covers VACUUM/ANALYZE best practices in detail. Queries against bloated hypertables don't perform optimally, and sometimes you may need to manually VACUUM ANALYZE hypertables to clear bloat and update statistics.

If you intend to delete all the data in a hypertable chunk, use the drop_chunks() function instead of the DELETE command; dropping the entire chunk at once avoids the bloat introduced by DELETE. If you're deleting data as part of your business retention policies, Timescale provides a data retention feature that can automate deleting old data that meets your criteria. You can view hypertable details from the service explorer page, and there are also informational views that you can query directly from the database.
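A sketch of a few of these operations, assuming a hypertable named conditions:

```sql
-- Drop old chunks in bulk (also useful before dropping a very large hypertable).
SELECT drop_chunks('conditions', older_than => INTERVAL '12 months');

-- Change the chunk interval; only newly created chunks use the new value.
SELECT set_chunk_time_interval('conditions', INTERVAL '1 day');

-- Manually clear bloat and refresh statistics after heavy UPDATE/DELETE activity.
VACUUM ANALYZE conditions;
```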

Compression

Compression reduces the amount of storage space required to store data. This can lead to significant cost savings, especially for large datasets. Tiger Cloud uses a hybrid row/columnar storage design, where recent data is stored in an uncompressed, row-oriented format for efficient shallow and wide queries. As data ages, you can enable compression, automatically converting the data to a compressed, columnar format suited to deep and narrow queries. This approach allows you to achieve high ingestion rates and efficient querying of historical data while saving on storage.

When to Use Compression

Compression is most effective when:
  • Data is not frequently modified: While Tiger Cloud supports data modifications inside a compressed chunk, they may become inefficient once a chunk is compressed.
  • Query patterns shift from shallow and wide to deep and narrow: As your application matures and you start analyzing historical trends, your queries will likely shift from needing to access many columns (wide) to focusing on specific columns over time (narrow). This is where compression shines.
  • Data exhibits trends or patterns: Compression algorithms are most effective when data is ordered and exhibits trends or patterns.

How Does It Work

  • Chunking: Hypertables are divided into smaller “chunks.” Compression operates on these chunks individually, optimizing data storage within each one.
  • Array-like Structure: Instead of storing individual data points separately, compression groups similar values together, similar to creating a list. For example, instead of storing each data point in a separate row, you group related values in a single row, making it much more compact - and then the compression algorithms compress that group to reduce the storage further.

Compression Configuration Options

Compression in Tiger Cloud is highly configurable. You can fine-tune the compression settings to achieve optimal performance and storage efficiency. Here are the key configuration options:
  • timescaledb.compress: This option enables or disables compression for a hypertable.
  • timescaledb.compress_orderby: This option specifies how the rows in a compressed chunk are ordered. For most time-series data, ordering by the time column (time DESC or time ASC) is sufficient and often optimal. However, you can specify other columns or combinations of columns if your data has specific patterns or your queries require a different ordering.
  • timescaledb.compress_segmentby: This option specifies how the compressed data is segmented. Segmenting by a column frequently used in WHERE clauses or GROUP BY clauses can significantly speed up queries. This is because Tiger Cloud can efficiently filter and access data within compressed chunks based on the segmentby column. For example, if you frequently analyze data by device_id, segmenting by device_id will improve query performance.
  • timescaledb.compress_chunk_time_interval: This is optional, but it allows you to specify a time interval for rolling up compressed chunks, further reducing the number of chunks.

Choosing Segmentation Columns

  • Uniqueness: Select columns with low cardinality (fewer unique values) for better compression.
  • Query Patterns: Consider how your queries access data. Segmenting by columns frequently used in WHERE clauses or GROUP BY clauses can significantly improve query performance.
  • Tools (see the sketch below):
    • count(distinct …): counts the number of distinct values in a column.
    • pg_stats: this PostgreSQL system view provides detailed statistics about columns, including n_distinct (the estimated number of distinct values).
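A sketch of both tools, assuming a conditions hypertable with a candidate device_id column:

```sql
-- Exact distinct count (can be slow on large tables).
SELECT count(DISTINCT device_id) FROM conditions;

-- Planner statistics; negative n_distinct values are a fraction of the row count.
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'conditions' AND attname = 'device_id';
```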

How to Enable Compression

Compression in Tiger Cloud is enabled at the hypertable level, using the commands shown in the sketch below. The first command enables compression, and the second installs a compression policy that automates compression of chunks in the background. Note that the first form applies to Tiger Cloud versions older than 2.18.0; for 2.18.0 and newer, use the columnstore form instead. In the sketch, compression is enabled on the measurements hypertable with the segmentby option set to device_id and the orderby option set to time DESC, and the policy automatically compresses chunks older than 7 days.
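A sketch of these commands, assuming a measurements hypertable; the 2.18.0+ columnstore forms are shown commented out as alternatives:

```sql
-- Versions older than 2.18.0: enable compression with segmentby/orderby settings.
ALTER TABLE measurements SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'time DESC'
);

-- 2.18.0 and newer (columnstore naming):
-- ALTER TABLE measurements SET (
--     timescaledb.enable_columnstore = true,
--     timescaledb.segmentby = 'device_id',
--     timescaledb.orderby   = 'time DESC'
-- );

-- Automatically compress chunks older than 7 days.
SELECT add_compression_policy('measurements', INTERVAL '7 days');
-- 2.18.0 and newer alternative:
-- CALL add_columnstore_policy('measurements', after => INTERVAL '7 days');
```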

Testing Compression

  1. Identify a Test Chunk: To isolate testing, choose an older, uncompressed chunk. Older chunks are ideal because they are less likely to be modified and typically represent data accessed less frequently. This minimizes the performance impact of any modifications or queries during testing. You can use the show_chunks() function to identify suitable chunks, for example to find chunks older than 3 days in a hypertable called conditions (see the sketch after this list).

  2. Baseline Query Performance: Identify the queries that are most important for your application and run them against the test chunk. Record the execution times for each query. These queries should reflect your typical usage patterns, including aggregations, filters, and time-based selections. Use EXPLAIN ANALYZE to understand the query plan and capture each query's execution time.
  3. Enable Compression: Use the ALTER TABLE command to define compression settings (segment by and order by columns).
  4. Manually Compress: Manually compressing the test chunk allows you to test the compression settings without waiting for the automatic compression policy to run. Use the compress_chunk() function (for versions older than 2.18.0) or the columnstore equivalent (2.18.0 and later) to compress the test chunk, replacing '_timescaledb_internal._hyper_1_2_chunk' in the sketch after this list with the actual name of your test chunk.
  5. Check Compression Ratio: Monitor the compression ratio achieved using the chunk_compression_stats() function, which provides statistics on the compressed and uncompressed sizes of your chunks. A higher compression ratio means more effective compression.
  6. Compare Query Performance: Re-run the baseline queries from step 2 against the compressed test chunk. Compare the execution times to the uncompressed baseline. Ideally, you should see an improvement in performance for queries that benefit from compression, such as those involving aggregations or filters on the compress_segmentby column.
  7. Iterate and Adjust: Compression settings are not one-size-fits-all; iterating and adjusting allows you to fine-tune the settings for your specific data and query patterns. If the initial compression settings don't provide the desired performance or compression ratio, decompress the chunk using decompress_chunk() (versions below 2.18.0) or convert_to_rowstore() (2.18.0 and later) and modify the settings. Experiment with different compress_segmentby and compress_orderby columns, and consider adjusting the chunk time interval if necessary. Repeat steps 4-6 with the new settings.
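A sketch of the workflow above; the chunk name and queries are illustrative, and the 2.18.0+ forms are shown as commented alternatives:

```sql
-- Step 1: find uncompressed chunks older than 3 days in the conditions hypertable.
SELECT show_chunks('conditions', older_than => INTERVAL '3 days');

-- Step 2: baseline a representative query with EXPLAIN ANALYZE.
EXPLAIN ANALYZE
SELECT device_id, avg(temperature)
FROM conditions
WHERE time > now() - INTERVAL '5 days'
GROUP BY device_id;

-- Step 4: manually compress the test chunk.
SELECT compress_chunk('_timescaledb_internal._hyper_1_2_chunk');
-- CALL convert_to_columnstore('_timescaledb_internal._hyper_1_2_chunk');  -- 2.18.0 and later

-- Step 5: check the compression ratio.
SELECT chunk_name,
       before_compression_total_bytes,
       after_compression_total_bytes
FROM chunk_compression_stats('conditions');

-- Step 7: decompress to iterate on the settings.
SELECT decompress_chunk('_timescaledb_internal._hyper_1_2_chunk');
-- CALL convert_to_rowstore('_timescaledb_internal._hyper_1_2_chunk');     -- 2.18.0 and later
```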

Compression for Continuous Aggregates

Continuous aggregates themselves can also become quite large, especially when dealing with high-volume data, and they too can be compressed, further reducing storage costs and improving query performance for pre-aggregated data. You can enable compression on a continuous aggregate using the ALTER MATERIALIZED VIEW command with the timescaledb.compress option (for Tiger Cloud versions older than 2.18.0) or the columnstore equivalent (2.18.0 and later); the sketch below enables compression on a continuous aggregate named my_cagg. While you can still refresh the compressed regions of a continuous aggregate, it is best practice to review the refresh policy and ensure that the compression policy does not compress actively refreshed regions. You can also add a compression policy on the continuous aggregate, as we previously did on the hypertable.
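A sketch, assuming a continuous aggregate named my_cagg; the 2.18.0+ columnstore form is a commented alternative:

```sql
-- Versions older than 2.18.0:
ALTER MATERIALIZED VIEW my_cagg SET (timescaledb.compress = true);

-- 2.18.0 and newer (columnstore naming):
-- ALTER MATERIALIZED VIEW my_cagg SET (timescaledb.enable_columnstore = true);

-- Optionally automate compression of older buckets, as with a hypertable.
SELECT add_compression_policy('my_cagg', compress_after => INTERVAL '30 days');
```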

Rollup-Compression

Rollup compression was introduced in Tiger Cloud 2.9. It allows you to combine multiple smaller, uncompressed chunks into a single, larger compressed chunk. This can be particularly useful for further reducing storage costs and improving query performance, especially when dealing with a large number of small uncompressed chunks. Here's how roll-up compression works:
  1. Set the compress_chunk_time_interval: When defining your compression settings, specify the compress_chunk_time_interval option. This option determines the time interval for rolling up compressed chunks. For example, if your uncompressed chunk interval is one week, you could set compress_chunk_time_interval to '2 weeks' or '6 weeks'.
  2. Optimize compress_orderby: The default setting for compress_orderby is 'time DESC', which can lead to inefficient re-compression during rollup. To avoid this, set timescaledb.compress_orderby = 'time ASC' to ensure efficient rollup.
  3. Run compression operations: Once you have configured the compress_chunk_time_interval, Tiger Cloud automatically rolls up uncompressed chunks into a compressed chunk when the compression policy is executed.
Here's an example of how to enable roll-up compression and manually re-compress chunks (see the sketch below). In this example, uncompressed chunks older than one week are rolled up into compressed chunks with a time interval of two weeks.
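A sketch, assuming a measurements hypertable with a one-week chunk interval:

```sql
ALTER TABLE measurements SET (
    timescaledb.compress,
    timescaledb.compress_segmentby           = 'device_id',
    timescaledb.compress_orderby             = 'time ASC',   -- ASC avoids costly re-sorting during rollup
    timescaledb.compress_chunk_time_interval = '2 weeks'     -- must be a multiple of the chunk interval
);

-- Chunks older than one week are rolled up into 2-week compressed chunks when the policy runs.
SELECT add_compression_policy('measurements', INTERVAL '1 week');

-- Or manually (re-)compress a specific chunk (name is illustrative).
SELECT compress_chunk('_timescaledb_internal._hyper_1_3_chunk');
```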

Essential considerations for roll-up compression

  • The compress_chunk_time_interval must be a multiple of the uncompressed chunk interval.
  • Rollup compression is most effective when you have many small uncompressed chunks.
  • Carefully monitor your database performance and storage usage after enabling roll-up compression to ensure it’s providing the desired benefits.

Disable compression

To disable compression on a hypertable, take the following steps (see the sketch below):
  1. Remove the compression policy: If you have a compression policy set up, remove it to prevent future compression runs.
  2. Decompress all chunks: Decompress every compressed chunk in the hypertable.
  3. Disable compression on the hypertable.
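A sketch of the three steps, assuming a measurements hypertable:

```sql
-- 1. Remove the compression policy.
SELECT remove_compression_policy('measurements');

-- 2. Decompress all compressed chunks.
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('measurements') AS c;

-- 3. Disable compression on the hypertable.
ALTER TABLE measurements SET (timescaledb.compress = false);
```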

Backfill Operations

The latest Tiger Cloud versions allow you to modify (UPDATE/DELETE) compressed data. To ensure only the necessary data is decompressed during an update, filter on the segmentby and orderby columns. When performing bulk inserts, decompress the affected chunks first, as inserting data into compressed chunks is more computationally expensive. Insert operations against a compressed chunk with a unique constraint or primary key cause a small amount of data to be decompressed, allowing speculative insertion and blocking any inserts that would violate constraints; inserting into a hypertable with no unique constraint does not incur decompression. You can decompress individual chunks, use a range to filter chunks, or decompress specific chunks based on query filters. The latest Tiger Cloud versions also support direct backfill into compressed chunks. You might encounter a limit error when a DML statement needs to decompress more than timescaledb.max_tuples_decompressed_per_dml_transaction tuples (100k by default); the solution is to increase the limit or disable it, as in the sketch below.
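A sketch of adjusting the limit for the current session:

```sql
-- Raise the per-transaction decompression limit...
SET timescaledb.max_tuples_decompressed_per_dml_transaction = 500000;
-- ...or disable the limit entirely.
SET timescaledb.max_tuples_decompressed_per_dml_transaction = 0;
```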

Backfill with a supplied function

To make backfilling easier, you can use the backfilling functions in the Tiger Cloud extras GitHub repository. In particular, the decompress_backfill procedure automates many of the backfilling steps for you. At the psql prompt, create a temporary table with the same schema as the hypertable you want to backfill into; in this example, the hypertable is named example and the temporary table is named cpu_temp. Insert your data into the temporary table, then call the decompress_backfill procedure (see the sketch below). This procedure halts the compression policy, identifies the compressed chunks that the backfilled data corresponds to, decompresses those chunks, inserts the data from the backfill table into the main hypertable, and then re-enables the compression policy.
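A sketch of the workflow, following the example and cpu_temp names above; the decompress_backfill parameter names follow the timescaledb-extras repository, so check the version you install:

```sql
-- 1. Staging table with the same schema as the hypertable.
CREATE TEMPORARY TABLE cpu_temp (LIKE example INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- 2. Load the backfill data into the staging table (COPY, INSERT, etc.).
-- COPY cpu_temp FROM '/path/to/backfill.csv' CSV HEADER;

-- 3. Move the data into the compressed hypertable.
CALL decompress_backfill(
    staging_table          => 'cpu_temp',
    destination_hypertable => 'example'
);
```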

Partially compressed chunks

Backfill operations cause the affected chunks to be partially compressed. The compression policy will eventually recompress partially compressed chunks. To identify partially compressed chunks, query the catalog tables for chunks with a status of 9.
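A sketch of the catalog query:

```sql
-- Status 9 marks chunks that are compressed but also contain uncompressed (backfilled) rows.
SELECT schema_name, table_name, status
FROM _timescaledb_catalog.chunk
WHERE status = 9;
```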

Continuous Aggregates (Caggs)

Caggs speed up query performance by pre-aggregating raw data (in hypertables). This reduces the overall query cost and lowers total query execution time. Statistical queries that use general-purpose aggregate functions like AVG, COUNT, MAX, MIN, and SUM can benefit from Caggs. This blog talks more about data aggregation, and you can learn more about aggregation best practices from this other blog. If you have statistical queries that are too expensive (taking too much time and too many resources at run time), consider implementing Caggs.

By default, when you create a continuous aggregate, it is immediately populated with data from the entire hypertable. This can take time, especially if you have a lot of historical data. We recommend using the WITH NO DATA option when creating a Cagg: it creates the view instantly, without waiting for the entire dataset to be processed. Use refresh policies to refresh the Cagg automatically. The sketch after the list below creates a Cagg named cagg_rides_view with the WITH NO DATA option and adds a refresh policy that does a full refresh. After a full refresh, remember to set the refresh window (start_offset) back to a reasonable value; if your workload is append-only, keep the refresh window very narrow. If you choose to set end_offset to NULL, confirm that you do not have future chunks, as these might lead to wrong aggregate values. The sketch also modifies the refresh policy to use a smaller refresh window of 1 hour with end_offset set to NULL.

If necessary, you can also refresh a Cagg manually; the sketch includes a manual refresh with a refresh window of 5 days. If the underlying hypertable (or Cagg, in the case of hierarchical Caggs) has retention policies, ensure the Cagg refresh policy doesn't overlap with data retention; this can cause unintended data loss if not well planned. You can automate manual refreshes using a user-defined action (UDA) in the following scenarios:
  • When all your policies exceed the timescaledb.max_background_workers limit.
    • This is not a hard limit, but if all your policies execute concurrently, the database can run out of background workers.
    • In that case, you can cluster several Cagg refreshes in a single UDA.
  • To improve Cagg refresh performance by customizing database parameters, e.g. tuning memory requirements or the parallelism level.
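As referenced above, here is a sketch of these steps; the rides hypertable, its columns, and the bucket width are illustrative:

```sql
-- Create the Cagg instantly, without materializing historical data up front.
CREATE MATERIALIZED VIEW cagg_rides_view
WITH (timescaledb.continuous) AS
SELECT time_bucket('15 minutes', pickup_time) AS bucket,
       vendor_id,
       count(*)  AS trips,
       avg(fare) AS avg_fare
FROM rides
GROUP BY bucket, vendor_id
WITH NO DATA;

-- Refresh policy that does a full refresh (start_offset => NULL covers all history).
SELECT add_continuous_aggregate_policy('cagg_rides_view',
    start_offset      => NULL,
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');

-- Later, narrow the refresh window to 1 hour and set end_offset to NULL
-- (recreating the policy is the simplest way to change its offsets).
SELECT remove_continuous_aggregate_policy('cagg_rides_view');
SELECT add_continuous_aggregate_policy('cagg_rides_view',
    start_offset      => INTERVAL '1 hour',
    end_offset        => NULL,
    schedule_interval => INTERVAL '30 minutes');

-- Manual refresh over a 5-day window.
CALL refresh_continuous_aggregate('cagg_rides_view', now() - INTERVAL '5 days', now());
```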
The sketch after the list of questions below uses a UDA to manually refresh two Caggs, sets work_mem to 512MB, sets max_parallel_workers_per_gather to 4, and finally adds a background job to run every 10 minutes.

When you create a Cagg, the real-time aggregation feature is disabled by default. Real-time aggregation JOINs the materialized data with raw hypertable data so that results include the most recent chunk's data. Here are a few points to help you determine whether real-time Caggs fit your use case:
  • How sensitive are your reporting needs to the most recent data?
  • What gap can you tolerate between the aggregated data in the Cagg and new raw data that arrives in the hypertable after the last refresh?
  • Does enabling real-time aggregate affect query performance?
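As referenced above, a sketch of such a UDA; the Cagg names, refresh windows, settings, and schedule are illustrative. Depending on your version, refresh_continuous_aggregate may refuse to run inside an outer transaction, so test the procedure with a manual CALL before scheduling it:

```sql
CREATE OR REPLACE PROCEDURE refresh_reporting_caggs(job_id INT, config JSONB)
LANGUAGE plpgsql AS
$$
BEGIN
    -- Customize database parameters for these refreshes.
    SET work_mem TO '512MB';
    SET max_parallel_workers_per_gather TO 4;

    -- Cluster several Cagg refreshes in a single background job.
    CALL refresh_continuous_aggregate('cagg_rides_view',  now() - INTERVAL '1 day',  now());
    CALL refresh_continuous_aggregate('cagg_rides_daily', now() - INTERVAL '7 days', now());
END
$$;

-- Run the action every 10 minutes.
SELECT add_job('refresh_reporting_caggs', '10 minutes');
```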
The sketch after the list below shows how to turn real-time aggregation on and off using the materialized_only parameter.

You can also create Caggs on top of other Caggs using hierarchical continuous aggregates. This helps reduce the computational cost and resources required to aggregate raw data. A simple hierarchy would look like: hypertable (raw data) -> seconds -> minutes -> hourly -> daily -> weekly -> monthly, and so on. Take note of the restrictions when creating hierarchical Caggs; a hierarchical Cagg built on the earlier example is included in the sketch after the list below.

By default, composite indexes are automatically created for each GROUP BY column together with the bucket. This behavior can be turned off when creating a Cagg. You can also manually add indexes on Caggs if necessary; use EXPLAIN to evaluate query performance against the Cagg before deciding to add a new index, and check the Cagg index limitations. To add an index on a Cagg, use the standard CREATE INDEX command.

Tiger Cloud uses the UTC timezone by default. You can change a Tiger Cloud service's timezone at the database level (ALTER DATABASE tsdb SET TIMEZONE TO 'Europe/Berlin';) or at the role level (ALTER ROLE readonly SET TIMEZONE TO 'Europe/Berlin';). The pg_timezone_names view lists all supported timezones. Caggs use the local timezone by default; you can define a timezone explicitly, but it will be static. If you have users spread across different regions and want Caggs to report in timezones specific to each region, create a Cagg for each region as a workaround, keeping the default UTC timezone for the view bucket and explicitly defining another column to display the preferred timezone.

To drop data from a Cagg, you can either:
  • Drop the view directly using DROP MATERIALIZED VIEW.
  • Manually drop chunks of the underlying hypertable using the drop_chunks function, or set up retention policies, and then refresh the Cagg manually or by adjusting the start_offset argument of the refresh policy.
  • Drop Cagg’s materialized hypertable chunks using the drop_chunks function.
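As referenced above, a sketch of these operations, building on the illustrative cagg_rides_view example:

```sql
-- Turn real-time aggregation on or off.
ALTER MATERIALIZED VIEW cagg_rides_view SET (timescaledb.materialized_only = false);  -- real-time on
ALTER MATERIALIZED VIEW cagg_rides_view SET (timescaledb.materialized_only = true);   -- real-time off

-- Hierarchical Cagg: a daily rollup built on top of the 15-minute Cagg.
CREATE MATERIALIZED VIEW cagg_rides_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', bucket) AS day_bucket,
       vendor_id,
       sum(trips) AS trips
FROM cagg_rides_view
GROUP BY day_bucket, vendor_id
WITH NO DATA;

-- Manually add an index on a Cagg.
CREATE INDEX ON cagg_rides_view (vendor_id, bucket DESC);

-- Change the service timezone at the database or role level.
ALTER DATABASE tsdb SET TIMEZONE TO 'Europe/Berlin';
ALTER ROLE readonly SET TIMEZONE TO 'Europe/Berlin';
```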
Caggs are backed by materialized hypertables, which have the same structure as regular hypertables. Enabling compression on Caggs follows a similar approach to hypertable compression. When compression is enabled and no other options are provided, segment_by is automatically set to the GROUP BY columns of the continuous aggregate and the time_bucket column is used as the order_by column in the compression configuration. You can also customize the segment_by and order_by columns to align with the filters used in your queries. If you add a custom index to a Cagg, ensure the index's columns are part of the segment_by or order_by arguments. Always check the troubleshooting page for common Cagg errors.

Data Retention

You can automatically delete historical data by setting up data retention policies or by manually dropping hypertable chunks that meet certain criteria e.g. “data older than one year”.

Add data retention policy

The example below adds a retention policy on the conditions hypertable to delete data older than one year.
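A sketch of the policy:

```sql
-- Automatically drop chunks of the conditions hypertable once they are older than one year.
SELECT add_retention_policy('conditions', INTERVAL '1 year');
```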

Remove data retention policy

If you’ve defined a retention policy and it is no longer needed, use the remove_retention_policy() function to remove it. You can also disable the policy by setting the scheduled argument to false using the alter_job() function.
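A sketch of both options, assuming the conditions hypertable from the previous example:

```sql
-- Remove the retention policy entirely.
SELECT remove_retention_policy('conditions');

-- Or pause it: find the policy's background job and disable its schedule.
SELECT alter_job(job_id, scheduled => false)
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention' AND hypertable_name = 'conditions';
```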

Tiered Storage

Timescale provides tiered storage: the first tier holds uncompressed data, the second holds compressed data, and both are stored on high-performance EBS volumes. The third tier (object storage) stores infrequently accessed data with lower performance requirements in AWS S3. Object storage lowers storage cost, scales storage beyond the current 16 TB limit, and keeps data transparently queryable.

Add a data tiering policy

To tier data, you first enable tiered storage from the service console Overview page. Then, add a data tiering policy to automate data tiering using the add_tiering_policy() function. The example below adds a tiering policy that moves chunks older than a given age to object storage.
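A sketch, assuming chunks older than 90 days should move to object storage:

```sql
SELECT add_tiering_policy('conditions', INTERVAL '90 days');
```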

Remove a tiering policy

Use remove_tiering_policy() to remove an existing tiering policy.
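For example:

```sql
SELECT remove_tiering_policy('conditions');
```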

Tier or untier chunks manually

Sometimes there’s a need to tier or untier chunks manually. For example, when you’re approaching the 16 TB storage limit, you can manually tier chunks to free up storage space. Tiered data is immutable; if you need to modify tiered data, you must untier it first using the untier_chunk() function.
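A sketch with an illustrative chunk name; tier_chunk() schedules tiering asynchronously, while untier_chunk() runs synchronously as a procedure:

```sql
-- Manually tier a chunk to object storage.
SELECT tier_chunk('_timescaledb_internal._hyper_1_4_chunk');

-- Bring it back to EBS storage so it can be modified.
CALL untier_chunk('_timescaledb_internal._hyper_1_4_chunk');
```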

Exploring tiered chunks

To see all tiered chunks for a hypertable, and to see chunks scheduled for tiering after invoking the tier_chunk() function, query the tiered-storage informational views (see the sketch below).
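A sketch using the tiered-storage informational views; treat the view names as an assumption and confirm them against your service's documentation, as they may differ between versions:

```sql
-- All chunks already tiered for a given hypertable.
SELECT * FROM timescaledb_osm.tiered_chunks WHERE hypertable_name = 'conditions';

-- Chunks scheduled for tiering but not yet moved.
SELECT * FROM timescaledb_osm.chunks_queued_for_tiering;
```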

Disabling Tiering

Sometimes, you want to move data from tiered storage back to high-performance EBS storage. Before moving forward, ensure you have enough free storage to accommodate untiered chunks. You must first untier all data or drop the tiered data before disabling data tiering.

Querying tiered data

To query tiered data, you must enable the timescaledb.enable_tiered_reads parameter, which is disabled by default. You can configure the setting in the service console as demonstrated on this page, or with the SET command at the database or session level. Once the setting is enabled, you can use a standard SELECT statement to query tiered data.
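A sketch of enabling tiered reads for a single session:

```sql
-- Allow this session's queries to transparently read tiered data.
SET timescaledb.enable_tiered_reads = true;

SELECT * FROM conditions
WHERE time > now() - INTERVAL '2 years';
```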