Start with Data Analysis Prior to Coding

When any data retrieval query is being run, it is compulsory to specify where the data is coming from. Specifying which tables are going to be accessed can be done with the FROM clause of the SELECT statement. FROM table lists do not necessarily have to be just standard tables. It is also valid to specify derived tables, joined tables, views, user-defined functions that return tables, and system functions that return tables.

A derived table is a subquery that retrieves rows from the database. A SELECT clause within a FROM clause of another SELECT, referred to by an alias, produces a derived table. The resulting rows take on a construct similar to that of a standard table. Derived tables produce a performance hit in most situations, and where possible, a join should be coded to replace the derived table.

The name of a view is often substituted for a table name. A view, stored as a query on the server, often provides necessary column restrictions and application considerations that makes it a beneficial data source.

User-defined functions can generate recordset output and in that form be used in place of a table name. A function can declare an internal table variable, insert rows into the variable, and then return the variable as its return value. A class of user-defined functions known as in-line functions return the resultset of a SELECT statement as a variable of type table.

System functions often generate a set of records that can be envisioned as a table. These rowset functions return an object that can be used in place of a table reference in a T-SQL statement.

Data sources can be located throughout the organization. They can be located at many sites, be on different computers and are created using many data management tools. SQL Server has the functionality available to connect to and use data anywhere it resides. Of course the data in its raw format won't be presentable and will require formatting. After a data source has been determined, the coding can begin.

Making Data Presentable

Often data returned from the database is not presented in a manner that the end user can identify with and access in a reasonable manner. Many SQL Server functions allow for the alteration of the data type or more appropriate presentation to the user. If data is not in a type presentable to the user, the CONVERT() and CAST() functions will be the first ones used.

Data needs to be converted from one form to another when you are using diverse functions or data types that don't match up with the type needed by a particular function. Data can be converted from one data type to another with the CONVERT() and CAST() functions. CAST() is based on the SQL-92 standard and is favored over CONVERT(), although both CONVERT() and CAST() have their particular strengths in handling individual types of data.

You will need to supply both the expression that needs to be converted and the data type to convert the given expression to perform conversion operation. SQL Server automatically converts certain data from one data type to another. For example, if a smallint is compared to an int, the smallint is automatically converted to an int before the comparison proceeds. These are called implicit conversions because you don't have to use the CAST() or CONVERT() functions. When data is converted to a new data type and the data is too large for the new data type to handle, SQL Server displays an asterisk or identifies the problem with an error message, depending on the data types involved.

The CONVERT() function transforms data from one data type to another.CONVERT() is also used extensively with date operations to format a date. The following example converts the Price column into a CHAR data type so that you can concatenate it with another string expression. Remember, concatenation can be done only using strings, so it must be converted:

 SELECT 'The book costs ' + CONVERT(CHAR(5), Price) FROM Titles

The CAST() function is similar to the CONVERT() function in that it converts data. It is preferred over the CONVERT() function because it's based on the SQL-92 standard.

CAST() is usually preferred over CONVERT() for operations with more advanced processing requirements.

The following example illustrates the use of CAST() to convert numeric data to a string to allow for its use in a concatenation:

 SELECT 'The book costs ' + CAST(Price AS CHAR(5)) FROM Titles

When working in character strings, whether converting, concatenating, or applying string data to another application, you may need to trim off space at the beginning or end. Often data is space filled. Within T-SQL there are two functions for trimming spaces: RTRIM() and LTRIM(). If you want to trim all spaces, these can be combined, and though it looks awkward, it does accomplish the task.

 LTRIM(String) RTRIM(String) LTRIM(RTRIM(String))

By formatting data you're presenting the information so that more value is produced in the finished product. Formatting information can be as simple as placing output in the correct sequence; however, formatting can also mean the use of these procedures for the analysis and display of data.

Functions Make Data Work

There are functions for making strings more usable. There are functions for performing mathematics. There are functions that work the calendar and functions that tell time, functions that move data around and functions that rhyme. Functions perform tasks against the data, they perform activities, and they produce results.

All functions are either deterministic or nondeterministic. The specification of whether a function is deterministic or nondeterministic is called the determinism of the function. Deterministic functions always return the same result anytime they are called with a specific set of input values. Nondeterministic functions may return different results each time they are called with a specific set of input values.

An example of a deterministic function would be DATEADD(), a built-in function that always returns the same result for any given set of argument values. GEtdATE() is not deterministic because it is always invoked with the same argument, yet the value it returns changes each time it is executed.

Be careful with the use of appropriate functions because nondeterministic functions cannot be used in a lot of processes. An index cannot be created on a computed column if the expression references any nondeterministic functions, and a clustered index cannot be created on a view if the view references any nondeterministic functions.

The most commonly used nondeterministic functions are the aggregate functions that provide answers based on the entire set of data being assimilated.

Aggregate Collective Functions

Aggregate functions are functions that provide summary data about sets. Questions like "How many rows are in that table?" "How many widgets did we sell last week?" and "What is the average price we charged for a widget?" are all answered with aggregate functions. Some are more commonly used than others, and anyone familiar with working in a spreadsheet environment has seen these before.

Aggregate functions take one of three types of arguments. Some take a simple wildcard (*). This means either that the operation doesn't apply to rows or that it should apply to all rows. In COUNT(*) the wildcard returns the number of rows in the table. The number of rows that are in the table is independent of any individual column. The function is applied across the entire group, regardless of the content of each row.

All the functions take a column name as an argument, and then the aggregate applies only to that column. COUNT(PersonID) returns the number of non-null PersonIDs in the table. Some functions enable you to apply the function to distinct values. COUNT (DISTINCT PersonID) provides a count of the number of distinct IDs in the column, ignoring duplications.

The AVG() Function

The AVG() function returns the average value for a given column. It requires a column name, and optionally you can use DISTINCT() to get an average for just the distinct values in the table. Here's an example that determines the average size of an order from the sample Sales table:

 SELECT AVG(QtyPurchased) FROM Sales

The COUNT() and COUNT_BIG() Functions

The COUNT() function returns an integer representing the number of rows in the table. The COUNT_BIG() function does the same thing, but it returns a number of type bigint, instead used for analyzing large datasets.

The MAX() and MIN() Functions

MAX() and MIN() are simple. They return the maximum or the minimum value in the set. The highest and lowest values within the set being evaluated will be returned in the data type of the column being used for the analysis.

The SUM() Function

The SUM() function is nearly as simpleit just returns the sum of all the values in the group. SELECT SUM(Quantity) FROM [Order Details] will provide a total for the number of products sold.

Functions Used for Deviation and Variance

Without delving too deep into statistical operations, we'll say that these sets of functions are useful in analyzing populations. The VAR() and VARP() functions calculate the variance of the group. The STDDEV() and STDDEVP() functions calculate the standard deviation of the group. Variance and standard deviation are used in statistics to determine the spread of a sample set. A sample set has a high variance and a high standard deviation if there is a wide range of values in the sample set. If there is a small range of values, the variance and standard deviation is smaller. The STDEVP() and STDEV() functions differ in that the STDEV() determines the sample standard deviation, whereas STDEVP() returns the population standard deviation. VAR() and VARP() are similar.

Checksum Operations

A checksum is a number generated from a data value that essentially summarizes the data value. If you store that value and then change the data, you can test the checksum to quickly notice this change. Checksums are used frequently in data storage to denote data corruption.

If you're working with large datasets, though, the capability to compare checksums saves an immense amount of time over comparing strings together.

There is an approximately two in four billion chance of a checksum being the same for two pieces of data being tested. The checksums return some number that's an integer, and the chances that two pseudorandom integers will be the same are very small. This is a useful function for detecting changes in rows as an alternative to timestamps.

The CHECKSUM() and CHECKSUM_BINARY() functions aren't true aggregate functions. They return a checksum for whatever you ask them to, typically for a row. The difference between CHECKSUM() and CHECKSUM_BINARY() is that a CHECKSUM() checks data that has been localized for the server, whereas CHECKSUM_BINARY() checks raw data. In this case, "localized" means that the string is translated to how it would compare before computing the checksum. So if you're on a server that's set up as not case sensitive, which is the default, the CHECKSUM() for "Widget" and "widget" would be the same, whereas the BINARY_CHECKSUM() would be different.

The CHECKSUM_AGG() function is the aggregate function used to compute checksums over sets. This function enables you to get the aggregated checksum for an entire table. Look at some examples. First, here's an example of using just the CHECKSUM() function on a table:

 SELECT CHECKSUM(*) FROM

This returns a checksum for each row based on the entire content of the row. Now, if you want to use a checksum for the entire table, you can do this:

 SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM [Order Details]

This gives you a checksum for the entire table. Now, look at a way to use this information. The key thing to remember about using a checksum is that it's a snapshot. To find differences, you've got to have a checksum before and a checksum after the changes occur.

Of Calendars and Clocks

In T-SQL, as with many other programming languages, date and time manipulation have been given special importance by the many functions available. Dates and times are implemented as datetime and smalldatetime data types and need to be treated with special functions. To first examine dates, you must realize that dates cannot be added or subtracted using the regular operators. The date functions available in SQL Server enable you to change and manipulate dates easily.

The DATEADD() Function

The DATEADD() function adds a number to the part of the date you specify, such as month or day, and then returns the new datetime value. The function adds an interval to a date specified. The following example adds three days to the ShipDate column of a BackOrders table:

 SELECT DATEADD(day, 3, ShipDate) FROM BackOrders

The portion of the date that you work with allows for several different entries, as presented in Table 6.1.

Table 6.1. Possible Values for Date Portion Entries
Date-Part	Abbreviations
Year	yy, yyyy
Quarter	qq, q
Month	mm, m
Dayofyear	dy, y
Day	dd, d
Week	wk, ww
Hour	hh
Minute	mi, n
Second	ss, s
Millisecond	ms

The DATEDIFF() Function

The DATEDIFF() function is used in a similar manner. This function, however, calculates the amount of time in date-parts between the second and first of two dates you specify. DATEDIFF() subtracts the start date from the end date to produce the result. Look at the following example:

 SELECT PubDate,   DATEDIFF(year, PubDate, getdate()) AS 'Difference'     FROM Titles

The method of counting across boundaries, such as minutes, seconds, and milliseconds, makes the result given by DATEDIFF() consistent across all data types. The result is a signed integer value equal to the number of date-part boundaries crossed between the first and second date. For example, the number of weeks between Monday, March 5, and Monday, March 12, is 1.

The GETDATE() Function

To get the current date, use the GETDATE() function. This function can be useful when you are producing reports that need to be dated. The GETDATE() function returns the current system date and can be used in a SELECT statement. You can use GEtdATE() in designing a report to have the current date and time printed every time the report is produced. GETDATE() is also useful for functions such as logging the time a transaction occurred on an account. You can use GETDATE() anywhere to return the current system date.

The DATENAME() Function

The DATENAME() function can be used in determining part of a date in the form of a string. You may, for instance, need to know specifically the day or month of a publication. The DATENAME() function returns a character string representing the specified date-part from the date. Here's an example of using DATENAME() to find the name of the current month:

 Select datename(mm,getdate()) as 'month'

The DATEPART() and DATENAME() functions produce the specified part of a datetime value (the year, quarter, day, hour, and so on) as either an integer or a string. Because smalldatetime is accurate only to the minute, when a smalldatetime value is used with either of these functions, the seconds and milliseconds returned are zero.

Using DATEPART()

DATEPART() is similar to DATENAME() except that it returns an integer value indicating the date rather than a character string. The following is an example of using DATEPART() to find the current month and display it numerically:

 SELECT datepart(mm,getdate()) AS 'month'

Many useful date functions give the developer flexibility in dealing with what is traditionally one of the more pesky data types: dates. With history reflecting unfavorably on the storage of date information, it is important to consider this data type to be infinite in that dates need to remain unique as time progresses.

Using Mathematical Functions in T-SQL

In most programming languages numeric data types have always had useful functionality supplied with the coding environment. SQL Server is no exception with its adoption of a standard set of mathematical functions.

A mathematical function performs a math operation on numeric expressions and returns the result of the operation. Math functions operate on any of the SQL Server numeric data types (decimal, integer, float, real, money, smallmoney, smallint, and tinyint). The initial precision of built-in operations on the float data type data is six decimal places.

By default, a number passed to a mathematical function is interpreted as a decimal data type. The CAST() or CONVERT() functions can be used to change the data type to something else, such as a float.

You can use the diverse set of math functions provided by T-SQL to fulfill many database needs. SQL Server provides numerous mathematical functions so that you can perform the most complex calculations possible. These numerous functions are provided in most programming language environments. Some of the more commonly used math functions are listed in Table 6.2.

Table 6.2. Mathematical Functions in T-SQL
Function	Parameters	Result
ABS	(Numeric expression)	The absolute value of a number
CEILING	(Numeric expression)	Smallest integer greater than or equal to the expression
EXP	(Float expression)	Exponential value
FLOOR	(Numeric expression)	Largest integer less than or equal to the expression
PI	( )	Returns 3.1415926535897931
POWER	(Numeric expression, y)	Value of expression to the power of y
RAND	(Seed)	Random value between zero and one
ROUND	(Numeric expression, length)	Numeric expression rounded off to the precision (length) of a number
SQRT	(Float expression)	Square root of the expression
SQUARE	(Float expression)	Square of the expression

The following example shows you a query that uses some of the many mathematical functions in SQL Server. Notice that when a query is non-data retrieval, as this one is, a FROM clause does not have to be specified.

 SELECT pi() AS 'PI' , abs(-44) AS 'Absolute',  power(2,3) AS 'Power'

Discussion of mathematical functions would be an involved endeavor for this medium. This book does not set out to make mathematical geniuses out of every SQL developer. To become fully versed in all mathematical functions would require a significant education in the science of mathematics.

Character functions interact against alphanumeric data in a similar fashion as numeric data interacts with mathematical functions. Some of these were discussed within the realm of formatting data, but there are a few more worth looking into.

String It Up

SQL Server also provides a full array of character (string) functions. Character functions enable you to easily manipulate attributes that are character specific. Most of the functions provided in Table 6.3 are normally used, and therefore you should regularly practice using them.

Table 6.3. Character Functions in T-SQL
Function	Parameters	Result
+	expr1 + expr2	Concatenates strings, joining them together.
ASCII	String expression	Gets the ASCII code value of the string.
CHAR	Integer expression	Changes the ASCII integer into a character.
LEN	String expression	Identifies the length of an expression.
PATINDEX	Pattern expression	Returns the position of the beginning of a pattern. If the pattern is not found, it returns zero.
REPLICATE	String expression, integer expression	Repeats a character expression the number of times specified.
REVERSE	String expression	Returns the reverse of a string expression.
SPACE	Integer expression	Returns a string of repeated spaces.
STUFF	String expr1, expr2 start, length, String	Deletes a specified length of characters and inserts another set of characters at a specified starting point.
SUBSTRING	String expr, start, length	Returns part of a string expression.

Concatenation of string expressions is a quite common process. Joining strings together can assist in presenting data in a more readable form. You must be somewhat cautious in using concatenation with expressions that can result in a NULL value. Concatenation of a string value to a NULL string results in NULL. A check should be performed for string content in situations in which concatenated information could be partially or completely NULL.

String functions are quite useful in specific situations. You may want to delve into each of the functions with pertinent examples in SQL Server Books Online. Most of these functions are quite straightforward and easily used. The one function that may warrant a short illustration is the SUBSTRING() function.

The SUBSTRING() Function

It may be that you want to return a part of a string rather than the whole string. The SUBSTRING() function, which returns a part of a character or binary string, can be used for this task. The SUBSTRING() function, in a WHERE clause, may perform a table scan where an index was supposed to function (if an index was implemented). When the SUBSTRING() function does not include the first letter of the column being searched, a table scan is performed. This is not efficient processing. The following example illustrates the use of SUBSTRING() in a concatenation process:

 SELECT AU_FName + ' ' + AU_LName AS 'Full Name',   SUBSTRING (AU_FName, 1,1) + SUBSTRING (au_lname, 1,1) AS 'Initials'    FROM Authors

All the string functions have their own place where they can be applied to assist with string manipulations. For the most part, you should find them very straightforward to use and not the focus of exam elements because of their ease of use. The final area of SQL Server functionality left to discuss is a series of functions that allow for various system-level interactions.

Working with System-Level Functions

As you have seen in the preceding two compilations, many functions are supplied to perform mathematical and character-manipulation operations. You can also use a third category of functions with the SELECT list known as system-specific functions. You can use system functions to retrieve special system or database information through T-SQL and the SELECT statement. Table 6.4 shows a compilation of the numerous system functions available in T-SQL.

Table 6.4. Database/System Functions
System Function	Parameters	Description
COL_LENGTH()	(table name, column name)	Length of a column
COL_NAME()	(table_id, Column_id)	Name of a column
DATALENGTH()	(Expression)	Byte length of an expression
DB_ID()	(database_name)	Database number
DB_NAME()	(database_ID)	Database name
GETANSINULL()	(database_name)	Nullability setting
HOST_ID()	( )	Number for workstation
HOST_NAME()	( )	Name of workstation
IDENT_INCR()	(table or view)	Increment value for identity
IDENT_SEED()	(table or view)	Starting number for identity
INDEX_COL()	(table name, index_id, key_id)	Indexed column's name
ISNULL()	(expression, value)	Changes null values
NULLIF()	(expression1, expression2)	Null if equivalent
OBJECT_ID()	(object_name)	Object identification number
OBJECT_NAME()	(Object_ID)	Database object name
STATS_DATE()	(Table_ID, Index_ID)	Date statistics were updated
SUSER_SID()	(login name)	User's login identification
SUSER_SNAME()	(server_user_id)	User's login name
USER_ID()	(user_name)	User's database identification
USER_NAME()	(user_ID)	User's database username

ISNULL() can be useful when you want to convert all null values to a particular value. For example, look at the following query. This query converts all null values into zeros:

 SELECT ISNULL (price, 0.0000), price FROM titles

System functions, information schema views, or the system stored procedures can be used to gain access to system information without querying the system tables directly. System tables can change significantly between versions of SQL Server.

SQL Server provides system stored procedures or information schema views for obtaining information about the properties of data, such as the type of data in a column (numeric, text, and so on) or the length of a column. This type of information is called metadata and is maintained by SQL Server for all server and database objects. Metadata can be used to find out information about the structure of data, the contents of a server, or information that specifies the design of objects.

Although it is possible to obtain data by querying any of the system tables directly, the system tables may not provide the information required in the future. It is recommended that system stored procedures, system functions, and information schema views be used because the contents of the system tables may change in future releases.

Information Schema Views

Information schema views provide a method independent of the system tables to view metadata. These views enable applications to work properly even though significant changes may have been made to the system tables and more changes may be made in the future. An application that uses the views rather than a direct query against the system tables should function in the same manner in the future as it does in the current SQL Server release.

The information schema views included in SQL Server conform to the SQL-92 Standard definition for the INFORMATION_SCHEMA. Names used in the SQL-92 standard for these views are different from those used by SQL Server, though the names from SQL Server can equivalently be mapped to those of the standard. The following list shows the SQL-92 names and the SQL Server equivalents:

A SQL-92 "Catalog" is a SQL Server "Database."
"Schema" in SQL-92 is an "Owner" in SQL Server.
"Object" is the same in both SQL-92 and SQL Server.
A "Domain" in SQL-92 is a user-defined data type in SQL Server.

When retrieving metadata from the information schema views, you must use a qualified name that includes the INFORMATION_SCHEMA in the position where you usually specify the username. For example:

 SELECT * FROM Northwind.INFORMATION_SCHEMA.TABLES

For more information on the variety of metadata that can be obtained through the use of information schema views, use the Index tab of SQL Server Books Online. When you type information schema, the index shows links to all the appropriate views. Many system stored procedures can also be used to find information about server and database objects. With many of the procedures, however, you can also perform actions against the server, whereas information schema views are used solely to obtain metadata.

System Stored Procedures

Many administrative and informational activities in SQL Server can be accomplished through the use of SQL Server's many system stored procedures. System stored procedures are available to perform various activities, from obtaining information about server settings and objects to managing processes on the server to performing maintenance activities and much more.

It is not possible to cover all the procedures in this book, and SQL Server Books Online has full definitions and examples for these procedures. At various points throughout the book, references will be made to those procedures you are likely to find on the exam and others that will serve useful purposes in the future.

The following are typical situations that are tested on the exam:

Date conversion using appropriate date functions
String concatenation, including building and parsing functions
System application design, in which system functions are used to programmatically control or manipulate the software functionality

Beyond the hundreds of functions available to SQL Server, there are other entire worlds we have yet to explore that we need to consider when developing and coding applications using T-SQL. We have already visited a few of the basic commands used to perform simple queries.