Monday, July 27, 2009

SQL Server Fundas…………….by Vikash kumar

1. Database Management System: Store --> Manage --> Retrieve (to persist information)
2. Difference between DBMS and RDBMS: RDBMS = DBMS + Referential integrity.
3. Difference between SQl Server 2000 and SQL Server 2005:
a. User defined functions or triggers can be written using any .Net language. (TSQL in 2000)
b. Reporting Services.
c. New data types.
d. Stored Procedure invocation is enabled using the SOAP protocol.
e. Data partitioning feature
f. SQL Server 2000 can have maximum of 16 instances where as SQL 2005 can have up to 50 instances.
g. SQL query analyzer has been integrated.
4. Entity Diagrams shows the relationship between the tables.
5. SQL Server stores data in
a. .MDF file (physical database file where data is finally stored.)
b. .LDF file (Log files are actually data which is recorded from the last time data was committed in DB)
6. ON DELETE Restrict: Should not allow to drop tables if dependency is there. ON DELETE Cascade: should delete the dependency first and then delete the table.
7. Types of Joins:
a. Inner Join: only matches present in both tables. Most commonly used.
b. Left outer join: All records of Left table
c. Right outer Join: All records of right table
d. Cross Join : Cartesian product
e. Self join : Join between two instances of same table.
8. Difference between DELETE and TRUNCATE
a. Truncate is faster.
b. Truncate table cannot trigger.
c. DELETE table can have criteria while TRUNCATE cannot.
d. DELETE TABLE syntax logs the deletes thus make the delete operation slow. TRUNCATE table does not log any information but it logs information about deallocation of data page of the table so TRUNCATE table is faster as compared to delete table.
9. Difference between UNION and UNION ALL
a. Union gives distinct records from both the tables but UNION ALL gives all the records.
10. Significance of SET ROW COUNT is that it limits the number of rows returned.
11. Exception Handling in SQL:
a. Use TRY Catch block.
b. Can designate ERROR_NUMBER, ERROR_Severity, ERROR_State, ERROR_MESSAGE, SET ERROR.
c. The RAISERROR statement is used to produce an ad hoc error message or to retrieve a custom message that is stored in the sysmessages table. You can use this statement with the error handling code presented in the previous section to implement custom error messages in your applications.

12. Types of Triggers:
a. Instead of
b. After trigger
Differences between the two:
INSTEAD OF triggers gets executed automatically before the Primary Key and the Foreign Key constraints are checked, whereas the traditional AFTER triggers gets executed after these constraints are checked. Unlike AFTER triggers, INSTEAD OF triggers can be created on views.

13. To define the sequence of the triggers on a table if more than one trigger exist on that table then use the system stored procedure sp_settriggerorder.
14. Difference between Data Set and Data Reader:
a. Data reader provides forward only and ready only access to data while dataset can hold more than a table.
b. Dataset persists the content but Data reader not.
15. Methods provided by the dataset for XML:
a. Read XML
b. Get XML
c. Write XML
16. For XML clause in SQL server returns data in XML rather than simple rows and columns.
17. To generate schema of a table: Select * from mytable For XML AUTO, XMLSCHEMA
18. OPEN XML is vice versa of FOR XML AUTO. When XML is passed it returns rows and column.
19. XMLA: XML for analysis is fundamentally based on web services and SOAP. MS SQL 2005 analysis services use XMLA to handle all client application communication for analysis services. XMLA is a SOAP based XML protocol designed specifically for universal data access to any standard multi dimensional data source residing on the web.
20. Normalization: Suppose there are few tables having relationships between them. So while designing these tables we need to follow some set of rules to connect these tables. These set of rules are called normalization.
Advantages of Normalization:
a. Avoid redundancy i.e. repetitive entries.
b. Reducing required storage space.
c. Preventing the need to restructure existing tables to accommodate new data.
d. Increased speed and flexibility of queries, sorts, and summaries.
First Normal Form For a table to be in first normal form, data must be broken up into the smallest units possible. In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repetitions groups of fields. In a table if there are two columns city1,city2 and customer name. To make it in first normal form we should break the Customer name into two columns First Name and Last Name. Moreover the repetitive columns like city1, city2 should be merged as City.
Second Normal form The second normal form states that each field in a multiple field primary key table must be directly related to the entire primary key. So If we will shift City to a different master table then our database is said to be in Second Normal Form.
Third normal form A non-key field should not depend on other Non-key field. The field "Total" is dependent on "Unit price" and "qty". Suppose our table contains three fields Unit price, Qty and Total. Total can be calculated by multiplication of Unit price * Qty. So we can remove the Total field from the table. This will bring our table in Third normal form.
21. Denormalization is the process of putting one fact in numerous places (its vice-versa of normalization).Only one valid reason exists for denormalizing a relational design - to enhance performance. The sacrifice to performance is that you increase redundancy in database.
22. Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quickly. There are clustered and nonclustered indexes. A clustered index is a special type of index that reorders the way in which records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a nonclustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.
23. How can we increase SQL performance
Following are tips which will increase your SQl performance :-
a. Every index increases the time takes to perform INSERTS, UPDATES and DELETES, so the number of indexes should not be too much. Try to use maximum 4-5 indexes on one table, not more. If you have read-only table, then the number of indexes may be increased.
b. Keep your indexes as narrow as possible. This reduces the size of the index and reduces the number of reads required to read the index.
c. Try to create indexes on columns that have integer values rather than character values.
d. If you create a composite (multi-column) index, the order of the columns in the key are very important. Try to order the columns in the key as to enhance selectivity, with the most selective columns to the left most of the key.
e. If you want to join several tables, try to create surrogate integer keys for this purpose and create indexes on their columns.
f. Create surrogate integer primary key (identity for example) if your table will not have many insert operations.
g. Clustered indexes are more preferable than nonclustered, if you need to select by a range of values or you need to sort results set with GROUP BY or ORDER BY.
h. If your application will be performing the same query over and over on the same table, consider creating a covering index on the table.
i. You can use the SQL Server Profiler Create Trace Wizard with "Identify Scans of Large Tables" trace to determine which tables in your database may need indexes. This trace will show which tables are being scanned by queries instead of using an index.
24. Following are the problems that occur if we do not implement locking properly in SQL SERVER.
a. Lost Updates: This happens due to most likely to occur with the READ UNCOMMITTED isolation level. Due to this we lose the first transaction if two transaction happens over the same data simultaneously.
b. Non-Repeatable Read: Non-repeatable reads occur if a transaction is able to read the same row multiple times and gets a different value each time.
c. Dirty Reads: Dirty reads are a special case of non-repeatable read. This happens if we run a report while transactions are modifying the data that we're reporting on.
d. Phantom Reads: Phantom reads occur due to a transaction being able to read a row on the first read, but not being able to modify the same row due to another transaction deleting rows from the same table.
25. Transaction Isolation level decides how is one process isolated from other process. Using transaction levels we can implement locking in SQL SERVER. There are four transaction levels in SQL SERVER :
a. READ COMMITTED: The shared lock is held for the duration of the transaction, meaning that no other transactions can change the data at the same time.
b. READ UNCOMMITTED: No shared locks and no exclusive locks are honored. This is the least restrictive isolation level resulting in the best concurrency but the least data integrity.
c. REPEATABLE READ: This setting disallows dirty and non-repeatable reads. However, even though the locks are held on read data, new rows can still be inserted in the table, and will subsequently be interpreted by the transaction.
d. SERIALIZABLE: This is the most restrictive setting holding shared locks on the range of data. This setting does not allow the insertion of new rows in the range that is locked; therefore, no phantoms are allowed. Following is the syntax for setting transaction level in SQL SERVER. SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
26. Depending on the transaction level six types of lock can be acquired on data :
a. Intent: Intent locks come in three flavors: intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX). IS locks indicate that the transaction will read some (but not all) resources in the table or page by placing shared locks. IX locks indicate that the transaction will modify some (but not all) resources in the table or page by placing exclusive locks. SIX locks indicates that the transaction will read all resources, and modify some(but not all) of them. This will be accomplished by placing the shared locks on the resources read and exclusive locks on the rows modified. Only one SIX lock is allowed per resource at one time; therefore, SIX locks prevent other connections from modifying any data in the resource (page or table), although they do allow reading the data in the same resource.
b. Shared locks (S) allow transactions to read data with SELECT statements. Other connections are allowed to read the data at the same time; however, no transactions are allowed to modify data until the shared locks are released.
c. Update locks (U) are acquired just prior to modifying the data. If a transaction modifies a row, then the update lock is escalated to an exclusive lock; otherwise, it is converted to a shared lock. Only one transaction can acquire update locks to a resource at one time. Using update locks prevents multiple connections from having a shared lock that want to eventually modify a resource using an exclusive lock. Shared locks are compatible with other shared locks, but are not compatible with Update locks.
d. Exclusive locks (X) completely lock the resource from any type of access including reads. They are issued when data is being modified through INSERT, UPDATE and DELETE statements.
e. Schema modification locks (Sch-M) are acquired when data definition language statements, such as CREATE TABLE, CREATE INDEX, ALTER TABLE, and so on are being executed. Schema stability locks (Sch-S) are acquired when store procedures are being compiled.
f. Bulk update locks (BU) are used when performing a bulk-copy of data into a table with TABLOCK hint. These locks improve performance while bulk copying data into a table; however, they reduce concurrency by effectively disabling any other connections to read or modify data in the table.
27. Lock escalation is the process of converting of low level locks (like row locks, page locks) into higher level locks (like table locks). Every lock is a memory structure too many locks would mean, more memory being occupied by locks. To prevent this from happening, SQL Server escalates the many fine-grain locks to fewer coarse-grain locks.
28. Replication is way of keeping data synchronized in multiple databases. SQL server replication has two important aspects publisher and subscriber.
a. Publisher: Database server that makes data available for replication is called as Publisher.
b. Subscriber: Database Servers that get data from the publishers is called as Subscribers.
There are three types of replication supported by SQL SERVER.
a. Snapshot Replication. Snapshot Replication takes snapshot of one database and moves it to the other database. After initial load data can be refreshed periodically. The only disadvantage of this type of replication is that all data has to be copied each time the table is refreshed.
b. Transactional Replication In transactional replication data is copied first time as in snapshot replication, but later only the transactions are synchronized rather than replicating the whole database. We can either specify to run continuously or on periodic basis.
c. Merge Replication. Merge replication combines data from multiple sources into a single central database. Again as usual the initial load is like snapshot but later it allows change of data both on subscriber and publisher, later when they come on-line it detects and combines them and updates accordingly.
29. User defined Functions:
User defined functions are TSQL routines used to encapsulate the programming logic.
Types of UDF:
1. Scalar: Returns a single value of a scalar data type. Cannot return text, timestamp or image.
Things we cannot do:
a. Insert/Update/Delete
b. Execute SP’s
c. Error Handling
d. Print
2. Inline: Return tables equivalent to views with parameters.
Insert\update \Delete are possible.
3. Multi statement: Tables can be returned.
Following are some major differences between a stored procedure and user defined functions:
a. Stored procedures are called independently, using the EXEC command, while functions need to be called from within another SQL statement.
b. Stored procedure allow you to enhance application security by granting users and applications permission to use stored procedures, rather than permission to access the underlying tables. Stored procedures provide the ability to restrict user actions at a much more granular level than standard SQL Server permissions.
c. UDF cannot be used in XML FOR clause but SP’s can be used.
d. Functions must always return a value (either a scalar value or a table). Stored procedures may return a scalar value, a table value or nothing at all.
e. If there is an error in UDF its stops executing. But in SP’s it just ignores the error and moves to the next statement.
f. UDF cannot make permanent changes to server environments while SP’s can change some of the server environment.
g. UDF can be used in the SQL statements anywhere in the WHERE/HAVING/SELECT section where as Stored procedures cannot be.
30. Performance Tuning SQL Server Cursors:
If possible, we should avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of our applications. If we need to perform row-by-row operations here are some alternatives:
• Use WHILE LOOPS
• Use temp tables
• Use derived tables
• Use correlated sub-queries
• Use the CASE statement
• Perform multiple queries
1. If we have to use a cursor, we should try to reduce the number of records to process by using temp table. The lower the number of records to process, the faster the cursor will finish.
2. If the number of rows you need to return from a query is small, and we need to perform row-by-row operations on them, don't use a server-side cursor. Instead, consider returning the entire rowset to the client and have the client perform the necessary action on each row, then return any updated rows to the server
3. If we have no choice but to use a server-side cursor in our application, we should try to use a FORWARD-ONLY or FAST-FORWARD, READ-ONLY cursor. Also we should try to locate the SQL Server tempdb database on its own physical device for best performance. This is because cursors use the tempdb for temporary storage of cursor data. The faster your disk array running tempdb, the faster our cursor will be.
4. If we need to perform a JOIN as part of our cursor, keyset and static cursors are generally faster than dynamic cursors, and should be used when possible.

31. To get all the row values concatenated.
Declare @colors varchar(250)
Select @colors = COALESCE(@colors + '/','') + CompanyName from dbo.Company
print @colors

1 comment:

  1. This looks good as point by point description; Can you categorise like; Basic SQL Server 2005, DTS in Sql Server, XML in Sql Server 2005, Tips for DBA etc...

    -Saurabh
    Bangalore

    ReplyDelete