Sunday, June 22, 2008

Important Database Concepts

KEYS

A key is a set of attributes (usually just one) that will always uniquely identify a record.

• A key consisting of more than one attribute is called a composite key.

• The primary key for a table is the key chosen to uniquely identify records within the table. A primary key may be defined on one column or several, and it cannot contain null values.

• A foreign key is a field from a table that refers to (or targets) a specific key, usually the primary key, in another table.

A unique key is defined as having no two of its values the same. Unlike a primary key, the columns of a unique key may contain null values (SQL Server permits a single NULL per unique column). A table can have multiple unique keys or no unique keys at all.

Any field or combination of fields that could serve as the primary key is a candidate key; the candidate keys not chosen as the primary key are known as alternate keys.


CONSTRAINTS

A constraint is a property assigned to a column or a set of columns in a table that prevents certain types of inconsistent data values from being placed in the column(s). Constraints are used to enforce data integrity, which ensures the accuracy and reliability of the data in the database.

Primary Key constraint

This constraint is used to guarantee that a column or set of columns on a table contain unique values for every record in the given table. This lets you ensure data integrity by always being able to uniquely identify the record in the table. A table can have only one primary key constraint defined on it, and the rows in the primary key columns cannot contain null values. A primary key constraint can be defined when a table is created, or it can be added later.

Unique Constraints

Unique constraints may be placed on one or more columns. They constrain UPDATE and INSERT statements so that the values being updated or inserted do not duplicate the corresponding values in any other row of the table.

Foreign Key (FK) Constraints

A foreign key constraint allows certain fields in one table to refer to fields in another table. This type of constraint is useful if you have two tables, one of which has partial information, details on which can be sought from another table with a matching entry. A foreign key constraint in this case will prevent the deletion of an entry from the table with detailed information if there is an entry in the table with partial information that matches it.

Check Constraints

A check constraint prevents updates/inserts on the table by placing a check condition on the selected column. The UPDATE/INSERT is allowed only if the check condition qualifies.

Not Null Constraint

A Not Null constraint enforces that the column will not accept null values, guaranteeing that every row has an entry for that column.
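The constraint types above can be sketched in a pair of CREATE TABLE statements. The table and column names here are purely illustrative:

```sql
-- Hypothetical tables showing each kind of constraint.
CREATE TABLE Department (
    DeptID   INT         NOT NULL PRIMARY KEY,            -- primary key constraint
    DeptName VARCHAR(50) NOT NULL UNIQUE                  -- unique + not null constraints
);

CREATE TABLE Employee (
    EmpID  INT          NOT NULL PRIMARY KEY,
    Email  VARCHAR(100) UNIQUE,                           -- unique, but NULL is allowed
    Rating INT          CHECK (Rating BETWEEN 1 AND 5),   -- check constraint
    DeptID INT          NOT NULL
        REFERENCES Department (DeptID)                    -- foreign key constraint
);
```

With these in place, inserting an Employee row with a Rating of 6, or with a DeptID that has no matching Department row, fails with a constraint violation instead of silently storing bad data.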



STORED PROCEDURE

A stored procedure is a set of SQL statements written in advance and saved in the database. We can run the stored procedure by name from the database's command environment. Stored procedures offer several advantages:

  1. Precompiled execution. SQL Server compiles each stored procedure once and then reutilizes the execution plan. This results in tremendous performance boosts when stored procedures are called repeatedly.

  2. Reduced client/server traffic. If network bandwidth is a concern in your environment, you'll be happy to learn that stored procedures can reduce long SQL queries to a single line that is transmitted over the wire.

  3. Efficient reuse of code and programming abstraction. Stored procedures can be used by multiple users and client programs. If you utilize them in a planned manner, you'll find the development cycle takes less time.

  4. Enhanced security controls. You can grant users permission to execute a stored procedure independently of underlying table permissions.
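A minimal T-SQL sketch of points 2 and 3 above (the procedure, table, and column names are made up for illustration):

```sql
-- Define the procedure once on the server...
CREATE PROCEDURE GetEmployeesByDept
    @DeptID INT
AS
    SELECT EmpID, EmpName
    FROM Employee
    WHERE DeptID = @DeptID
GO

-- ...then any client runs it by name; only this one line crosses the wire.
EXEC GetEmployeesByDept @DeptID = 10
```

Users can be granted EXECUTE permission on GetEmployeesByDept without any direct SELECT permission on the underlying Employee table.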



JOINS

SQL joins are used to relate information in different tables. A join condition is the part of the SQL query that retrieves rows from two or more tables.

Self Join

A self-join is a query in which a table is joined (compared) to itself. Self-joins are used to compare values in a column with other values in the same column in the same table.
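For example, a self-join can pair each employee with their manager when both are rows in the same table (the table and column names here are assumed):

```sql
-- The Employee table appears twice under two aliases,
-- once as the employee and once as the manager.
SELECT e.EmpName AS Employee, m.EmpName AS Manager
FROM Employee e
INNER JOIN Employee m ON e.ManagerID = m.EmpID
```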

Inner Join

An INNER JOIN is most often (but not always) created between the primary key column of one table and the foreign key column of another table.

The INNER JOIN will select all rows from both tables as long as there is a match between the columns.

Outer Joins

An outer join returns all rows from the joined tables whether or not there's a matching row between them.

Left Outer Joins

When you use a left outer join to combine two tables, all the rows from the left-hand table are included in the results.

Right Outer Joins

A right outer join is conceptually the same as a left outer join except that all the rows from the right-hand table are included in the results.

Full outer join

A full outer join combines the results of both left and right outer joins. The joined table will contain all records from both tables, and fill in NULLs for missing matches on either side.


Cross Join

A cross join returns not the sum but the product of two tables. Each row in the left-hand table is matched up with each row in the right-hand table. It's the set of all possible row combinations, without any filtering
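The join types above can be sketched against two hypothetical tables, Customer and Orders, related by a CustID column:

```sql
-- INNER JOIN: only customers that have at least one matching order.
SELECT c.CustName, o.OrderID
FROM Customer c
INNER JOIN Orders o ON o.CustID = c.CustID

-- LEFT OUTER JOIN: every customer; OrderID is NULL where no order matches.
SELECT c.CustName, o.OrderID
FROM Customer c
LEFT OUTER JOIN Orders o ON o.CustID = c.CustID

-- FULL OUTER JOIN: all rows from both sides, with NULLs filling either gap.
SELECT c.CustName, o.OrderID
FROM Customer c
FULL OUTER JOIN Orders o ON o.CustID = c.CustID

-- CROSS JOIN: every customer paired with every order (the product, not the sum).
SELECT c.CustName, o.OrderID
FROM Customer c
CROSS JOIN Orders o
```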


Delete & Truncate

DELETE is a logged operation on a per-row basis: the deletion of each row is written to the transaction log, and the row is physically deleted.

You can DELETE any row that will not violate a constraint, while leaving the foreign key or any other constraint in place.

TRUNCATE is also a logged operation, but in a different way. TRUNCATE logs the deallocation of the data pages in which the data exists. The deallocation of data pages means that your data rows still actually exist in the data pages, but the extents have been marked as empty for reuse. This is what makes TRUNCATE a faster operation to perform over DELETE.

You cannot TRUNCATE a table that has any foreign key constraints. You will have to remove the constraints, TRUNCATE the table, and reapply the constraints.

TRUNCATE will reset any identity column to its seed value. This means if you have a table with an identity column seeded at 1 and containing 264 rows, your last record will have the value 264 in its identity column. After truncating the table, the first record you insert into the empty table will get an identity value of 1. DELETE will not do this: in the same scenario, if the rows are deleted, the first row inserted into the empty table will get an identity value of 265.
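The identity behavior described above can be sketched like this (the Log table and Msg column are hypothetical; assume Log has an identity column seeded at 1 and currently holds 264 rows):

```sql
-- DELETE: rows are removed one by one; the identity counter is NOT reset.
DELETE FROM Log
INSERT INTO Log (Msg) VALUES ('first row after delete')    -- identity value 265

-- TRUNCATE: data pages are deallocated; the identity counter resets to its seed.
TRUNCATE TABLE Log
INSERT INTO Log (Msg) VALUES ('first row after truncate')  -- identity value 1
```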

Why is TRUNCATE faster than DELETE?

Reason: When you issue a DELETE, all the affected data is first copied into the rollback tablespace, and only then is the delete performed. That is why you can issue a ROLLBACK after deleting from a table and get the data back (the system restores it from the rollback tablespace), and it is also why the whole process takes time. TRUNCATE removes the data directly without copying it into the rollback tablespace, so it is faster, but once a TRUNCATE completes you cannot get the data back.

What are the COMMIT & ROLLBACK statements in SQL?

The COMMIT statement terminates the current transaction and makes all the changes performed within it permanent, committing them to the database. COMMIT can also be used inside a stored procedure.

ROLLBACK likewise terminates the current transaction, but instead of making the changes permanent it undoes every change the transaction made to the database.
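A minimal T-SQL sketch of both statements (the Account table is illustrative):

```sql
-- Both updates succeed or fail together: COMMIT makes them permanent as a unit.
BEGIN TRANSACTION
    UPDATE Account SET Balance = Balance - 100 WHERE AcctID = 1
    UPDATE Account SET Balance = Balance + 100 WHERE AcctID = 2
COMMIT TRANSACTION

-- ROLLBACK undoes everything since BEGIN TRANSACTION: the row comes back.
BEGIN TRANSACTION
    DELETE FROM Account WHERE AcctID = 1
ROLLBACK TRANSACTION
```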

What is Data Integrity and it's categories ?

Enforcing data integrity ensures the quality of the data in the database. For example, if an employee is entered with an employee_id value of 123, the database should not allow another employee to have an ID with the same value. If you have an employee_rating column intended to have values ranging from 1 to 5, the database should not accept a value of 6. If the table has a dept_id column that stores the department number for the employee, the database should allow only values that are valid for the department numbers in the company. Two important steps in planning tables are to identify valid values for a column and to decide how to enforce the integrity of the data in the column. Data integrity falls into these categories:
1) Entity integrity

2) Domain integrity

3) Referential integrity

4) User-defined integrity

Entity Integrity: Entity integrity defines a row as a unique entity for a particular table. Entity integrity enforces the integrity of the identifier column(s) or the primary key of a table (through indexes, UNIQUE constraints, PRIMARY KEY constraints, or IDENTITY properties).
Domain Integrity: Domain integrity is the validity of entries for a given column. You can enforce domain integrity by restricting the type (through data types), the format (through CHECK constraints and rules), or the range of possible values (through FOREIGN KEY constraints, CHECK constraints, DEFAULT definitions, NOT NULL definitions, and rules).
Referential Integrity: Referential integrity preserves the defined relationships between tables when records are entered or deleted. In Microsoft® SQL Server™ 2000, referential integrity is based on relationships between foreign keys and primary keys or between foreign keys and unique keys (through FOREIGN KEY and CHECK constraints). Referential integrity ensures that key values are consistent across tables. Such consistency requires that there be no references to nonexistent values and that if a key value changes, all references to it change consistently throughout the database. When you enforce referential integrity, SQL Server prevents users from:
· Adding records to a related table if there is no associated record in the primary table.
· Changing values in a primary table that result in orphaned records in a related table.
· Deleting records from a primary table if there are matching related records.
For example, with the sales and titles tables in the pubs database, referential integrity is based on the relationship between the foreign key (title_id) in the sales table and the primary key (title_id) in the titles table.
User-Defined Integrity: User-defined integrity allows you to define specific business rules that do not fall into one of the other integrity categories. All of the integrity categories support user-defined integrity (all column- and table-level constraints in CREATE TABLE, stored procedures, and triggers).

What's the maximum size of a row?

8060 bytes.

What is a deadlock and what is a live lock? How will you go about resolving deadlocks?

A deadlock is a situation in which two processes, each holding a lock on one piece of data, attempt to acquire a lock on the other's piece. Each process would wait indefinitely for the other to release its lock unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user's process.

A livelock occurs when a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.

Backup Scopes

The scope of a backup defines the portion of the database covered by the backup. It identifies the database, file(s), and/or filegroup(s) that SQL Server will back up.

Full Backups include all data within the backup scope. For example, a full database backup will include all data in the database, regardless of when it was last created or modified. Similarly, a full partial backup will include the entire contents of every file and filegroup within the scope of that partial backup.

Differential Backups include only that portion of the data that has changed since the last full backup. For example, if you perform a full database backup on Monday morning and then perform a differential database backup on Monday evening, the differential backup will be a much smaller file (that takes much less time to create) that includes only the data changed during the day on Monday.



TRIGGER

A trigger is an object contained within an SQL Server database that is used to execute a batch of SQL code whenever a specific event occurs. As the name suggests, a trigger is “fired” whenever an INSERT, UPDATE, or DELETE SQL command is executed against a specific table.

Triggers are associated with a single table, and are automatically executed internally by SQL Server.

Triggers are a powerful tool that can be used to enforce business rules automatically when data is modified, and they can also be used to maintain data integrity. However, triggers should not be your first choice for data integrity: use them for that purpose only if you are unable to enforce the integrity you need using CONSTRAINTS, RULES, and DEFAULTS. Triggers cannot be created on temporary tables.

(OR)

A trigger is a database object that is attached to a table. In many aspects it is similar to a stored procedure. As a matter of fact, triggers are often referred to as a "special kind of stored procedure." The main difference between a trigger and a stored procedure is that the former is attached to a table and is only fired when an INSERT, UPDATE or DELETE occurs. You specify the modification action(s) that fire the trigger when it is created.

When to Use Triggers

There are many reasons to use triggers. If you have a table that keeps a log of messages, you may want a copy of each urgent message mailed to you. Without triggers there are workarounds, but they are less elegant: you could modify every application that logs messages, which means redundantly coding the same logic in each of them.

Tables can have multiple triggers. The CREATE TRIGGER statement can be defined with the FOR UPDATE, FOR INSERT, or FOR DELETE clauses to target a trigger to a specific class of data modification actions. When FOR UPDATE is specified, the IF UPDATE (column_name) clause can be used to target a trigger to updates affecting a particular column.
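A sketch of a FOR UPDATE trigger using the IF UPDATE clause described above (the Titles and PriceAudit tables and their columns are assumed for illustration):

```sql
-- Audit price changes on a hypothetical Titles table.
CREATE TRIGGER trg_Titles_Update
ON Titles
FOR UPDATE
AS
    IF UPDATE (Price)   -- run the audit only when the Price column was touched
        INSERT INTO PriceAudit (TitleID, OldPrice, NewPrice, ChangedAt)
        SELECT d.TitleID, d.Price, i.Price, GETDATE()
        FROM deleted d                                   -- rows as they were
        INNER JOIN inserted i ON i.TitleID = d.TitleID   -- rows as they are now
GO
```

Inside the trigger, the special `inserted` and `deleted` tables hold the after and before images of the modified rows.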

SQL Server 2000 greatly enhances trigger functionality, extending the capabilities of the triggers you already know and love, and adding a whole new type of trigger, the "Instead Of" trigger.

SQL Server 2000 has many types of triggers:

  1. After Trigger

  2. Multiple After Triggers

  3. Instead Of Triggers

  4. Mixing Trigger Types



VIEW

A view is a pre-written query that is stored in the database. A view consists of a SELECT statement; when you run the view, that SELECT statement executes and its results are returned as if they came from a table.

A view can be useful when there are multiple users with different levels of access, who all need to see portions of the data in the database (but not necessarily all of the data). Views can do the following:

  • Restrict access to specific rows in a table

  • Restrict access to specific columns in a table

  • Join columns from multiple tables and present them as though they are part of a single table

  • Present aggregate information (such as the results of the COUNT function)

(OR)

A SQL view is a virtual table based on a SQL SELECT query. Essentially a view is very close to a real database table (it has columns and rows just like a regular table), except that real tables store data while views do not. The view's data is generated dynamically when the view is referenced. A view references one or more existing database tables or other views.

In effect every view is a filter of the table data referenced in it and this filter can restrict both the columns and the rows of the referenced tables.
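For example, a view can restrict both the columns and the rows a user sees (the table, column, and view names here are illustrative):

```sql
-- Hide salary data and inactive staff from general users.
CREATE VIEW ActiveEmployeeDirectory
AS
    SELECT EmpID, EmpName, DeptID   -- no Salary column exposed
    FROM Employee
    WHERE IsActive = 1              -- no inactive rows exposed
GO

-- Users query the view exactly as if it were a table.
SELECT * FROM ActiveEmployeeDirectory WHERE DeptID = 10
```

Users can then be granted SELECT permission on the view without any permission on the underlying Employee table.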

What is normalization?

Normalization is the process of designing a data model to efficiently store data in a database. The end result is that redundant data is eliminated, and only data related to the attribute is stored within the table.

For example, let's say we store City, State and ZipCode data for Customers in the same table as Other Customer data. With this approach, we keep repeating the City, State and ZipCode data for all Customers in the same area. Instead of storing the same data again and again, we could normalize the data and create a related table called City. The "City" table could then store City, State and ZipCode along with IDs that relate back to the Customer table, and we can eliminate those three columns from the Customer table and add the new ID column.

Normalization rules have been broken down into several forms. People often refer to the third normal form (3NF) when talking about database design. This is what most database designers try to achieve: In the conceptual stages, data is segmented and normalized as much as possible, but for practical purposes those segments are changed during the evolution of the data model. Various normal forms may be introduced for different parts of the data model to handle the unique situations you may face.

Whether you have heard about normalization or not, your database most likely follows some of the rules, unless all of your data is stored in one giant table. We will take a look at the first three normal forms and the rules for determining the different forms here.

Rules for First Normal Form (1NF)

Eliminate repeating groups. This table contains repeating groups of data in the Software column.

Computer | Software
-------- | -------------------
1        | Word
2        | Access, Word, Excel
3        | Word, Excel

To follow the First Normal Form, we store one type of software for each record.

Computer | Software
-------- | --------
1        | Word
2        | Access
2        | Word
3        | Word
3        | Excel


Rules for second Normal Form (2NF)

Eliminate redundant data plus 1NF. This table contains the name of the software which is redundant data.

Computer | Software
-------- | --------
1        | Word
2        | Access
2        | Word
3        | Word
3        | Excel

To eliminate the redundant storage of data, we create two tables. The first table stores a reference SoftwareID to our new table that has a unique list of software titles.

Computer | SoftwareID
-------- | ----------
1        | 1
2        | 2
2        | 1
3        | 1
3        | 3


SoftwareID | Software
---------- | --------
1          | Word
2          | Access
3          | Excel

Rules for Third Normal Form (3NF)

Eliminate columns not dependent on the key, plus 1NF and 2NF. This table contains data about both the computer and the user.

Computer | User Name | User Hire Date | Purchased
-------- | --------- | -------------- | ---------
1        | Joe       | 4/1/2000       | 5/1/2003
2        | Mike      | 9/5/2003       | 6/15/2004

To eliminate columns not dependent on the key, we would create the following tables. Now the data stored in the computer table is only related to the computer, and the data stored in the user table is only related to the user.

Computer | Purchased
-------- | ---------
1        | 5/1/2003
2        | 6/15/2004


User | User Name | User Hire Date
---- | --------- | --------------
1    | Joe       | 4/1/2000
2    | Mike      | 9/5/2003


Computer | User
-------- | ----
1        | 1
2        | 2
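The normalized tables above could be declared like this (a sketch; the names and data types are assumptions):

```sql
CREATE TABLE Computer (
    ComputerID INT      NOT NULL PRIMARY KEY,
    Purchased  DATETIME NOT NULL
);

-- USER is a reserved word, so the table name is bracket-quoted.
CREATE TABLE [User] (
    UserID   INT         NOT NULL PRIMARY KEY,
    UserName VARCHAR(50) NOT NULL,
    HireDate DATETIME    NOT NULL
);

-- The mapping table relates computers to users via foreign keys.
CREATE TABLE ComputerUser (
    ComputerID INT NOT NULL REFERENCES Computer (ComputerID),
    UserID     INT NOT NULL REFERENCES [User] (UserID),
    PRIMARY KEY (ComputerID, UserID)
);
```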

What does normalization have to do with SQL Server?

To be honest, the answer here is nothing. SQL Server, like any other RDBMS, couldn't care less whether your data model follows any of the normal forms. You could create one table and store all of your data in one table or you can create a lot of little, unrelated tables to store your data. SQL Server will support whatever you decide to do. The only limiting factor you might face is the maximum number of columns SQL Server supports for a table.

SQL Server does not force or enforce any rules that require you to create a database in any of the normal forms. You are able to mix and match any of the rules you need, but it is a good idea to try to normalize your database as much as possible when you are designing it. People tend to spend a lot of time up front creating a normalized data model, but as soon as new columns or tables need to be added, they forget about the initial effort that was devoted to creating a nice clean model.

To assist in the design of your data model, you can use the DaVinci tools that are part of SQL Server Enterprise Manager.

Advantages of normalization

    1. Smaller database: By eliminating duplicate data, you will be able to reduce the overall size of the database.
    2. Better performance:

    a. Narrow tables: Having more fine-tuned tables gives each table fewer columns and lets you fit more records per data page.
    b. Fewer indexes per table mean faster maintenance tasks such as index rebuilds.
    c. You join only the tables you actually need for a given query.

Disadvantages of normalization

    1. More tables to join: By spreading out your data into more tables, you increase the need to join tables.
    2. Tables contain codes instead of real data: Repeated data is stored as codes rather than meaningful data. Therefore, there is always a need to go to the lookup table for the value.
    3. Data model is difficult to query against: The data model is optimized for applications, not for ad hoc querying.

