Amazon Redshift suggests to define primary key or foreign key constraint wherever applicable. But they are information only. Redshift does not enforce these constraints. In other words, a column with primary key accepts duplicate values as well as a foreign key column also allows such a value that does not exists in the referenced table. So why does Redshift consider defining primary or foreign key as their best practice list? Because query optimizer uses them to choose the most suitable execution plan. But you need to be very careful while defining these constraint in your table. Let us see why with some real-life example.
At Sqreen we use Amazon Kinesis service to process data from our agents in near real-time. This kind of processing became recently popular with the appearance of general use platforms that support it (such as Apache Kafka). Since these platforms deal with the stream of data, such processing is commonly called the “stream processing”. It’s a departure from the old model of analytics that ran the analysis in batches (hence its name “batch processing”) rather than online.
Bruno Gonçalves Data Science, Machine Learning, Human Behavior