danthedataman.com

November 11, 2010

Query Optimization by David DeWitt at SQLPASS: Awesome!

Filed under: SQL Server — Dan @ 12:37 pm

Wow!  Amazing to listen to such an expert in the database technology field.  He presented way way more than I could possibly process this morning.  His keynote should be posted here later today.  David works at Microsoft Jim Gray Systems Lab in Wisconsin.  Here is the slide deck from his presentation:

“Query Optimization is not rocket science.  When you flunk out of  query optimization, we make you go build rockets.”  After getting a little visibility today into this field, I believe it.  Wow!

Takeaway for me was the importance to query optimization of updating statistics.  As I deal mostly with data/reporting/BI, I either forgot that, or never grasped the connection.  But David’s presentation today really opened my eyes to the need to make sure statistics are being updated.

Advertisement

August 18, 2009

IIS reqs for SQL 2005 on Windows 7 (or Vista)

Filed under: SQL Server — Dan @ 10:26 am
Tags:

Great link from Alastair Waddell listing these requirements.  Want to keep track of this in case I need it for future reference, so I’m going to post it in my blog.  Fuller instructions in Alastair’s blog, but here are the IIS features required, modified for Windows 7:

Component Folder
Static Content Common HTTP Features
Default Document Common HTTP Features
HTTP Redirection Common HTTP Features
Directory Browsing Common HTTP Features
ASP.Net Application Development
ISAPI Extension Application Development
ISAPI Filters Application Development
Windows Authentication Security
IIS Metabase IIS 6 Management Compatibility
IIS 6 WMI IIS 6 Management Compatibility

Source: http://support.microsoft.com/kb/920201

June 22, 2009

SQL Server release & update calendar

Filed under: SQL Server — Dan @ 1:37 pm
Tags: ,

Found a handy calendar on SQLServerPedia with info about releases of versions of SQL Server.  Here’s a sample:

SQL Server 2008 Release Date Calendar

Version Name Version Number Release Date Mainstream Support Ends Download
SQL Server 2008 SP1 Cumulative Update Package 2 10.00.2714 May 2009   Download SP1 CUP2
SQL Server 2008 SP1 Cumulative Update Package 1 10.00.2710 April 2009   Download SP1 CUP1
SQL Server 2008 SP1 10.00.2531 April 2009 Not Yet Scheduled Download SP1

I have not used SQLServerPedia before, but it appears to be a SQL Server wiki sponsored by Quest Software. 

Wanted to be sure to blog this when I found it, so that I can find it again…..there are a number of things documented here that, at one time or another, I have had to search to find an answer to.  Nice to have all this in one place.

June 9, 2009

Windows tip: retrieve off-screen window

Filed under: Windows — Dan @ 9:27 am
Tags: ,

This post is not specifically data related, but is more of a Windows tip that I want to remember.  I had my laptop hooked up to a second monitor, which I just recently removed, and now I’ve had a window pop up off screen where I can’t get to it.  How-To Geek pointed me to the solution here: once the window is in focus (although still off-screen), right click it on the Taskbar and select Move.  Then hit any of the arrow keys (Left, Right, Up, Down), and then move your mouse, and the window will be back! right where your mouse is.  Easy enough, but definitely frustrating if you don’t know how.

June 8, 2009

What is a Dashboard?

Filed under: definition — Dan @ 11:41 am
Tags: ,

As our institution is close to kicking off a Data Council, defining data terms is on my mind lately.  Found an excellent quote of how to tell if a Dashboard is doing its job, which I wanted to post here for myself for future reference.  This comes from Stephen Few’s most excellent book, which came to me with multiple recommendations, and I have finally started reading, Information Dashboard Design: The Effective Visual Communication of Data (O’Reilly, 2006).  Here is Stephen’s definition of a Dashboard, which I will give first, for context: “A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.”  This definition originally appeared in Stephen Few’s article “Dashboard Confusion”, in the March 20, 2004 edition of Intelligent Enterprise.

The statements that really caught my attention, from his book mentioned above, “The dashboard does its primary job if it tells you with no more than a glance that you should act.  It serves you superbly if it directly opens the door to any additional information that you need to take that action.”  Nice, really nice.  I like that because it tells you how to know if the dashboard is functional, providing its primary function, and then it tells you how to know if the dashboard is really excellent, taking service/functionality to the next level.  Thanks, Stephen!

June 5, 2009

find VMWare ESX/ESXi update/build (and found a new source of VMWare expertise)

Filed under: VMWare — Dan @ 4:26 pm
Tags: ,

As a follow up to my previous post, supported virtualization environments for SQL Server, here is a great blog post at Techhead on How to Determine the VMWare ESX or ESXi Build Version.  I was looking to find the Update# from the Build#, and Techhead’s handy table provided that:

VMware ESX Update: Build Number:
ESX 3.5.0 Update 1 64607
ESX 3.5.0 Update 2 110268
ESX 3.5.0 Update 3 123630
   
ESX 3i (3.5.0) Update 3 123629

He lists three ways via the VMWare Infrastructure Client to find the version & build, as well how to use the Service Console to find version, build, and updates applied/update level.  He also provides a link to the VMWare kb that covers this.

Nice to find what looks to be a great reference on VMWare technology, in addition to just getting an answer to my build# question.

April 22, 2009

supported virtualization environments for SQL Server

Filed under: SQL Server — Dan @ 4:58 pm
Tags: , ,

It’s useful to know which virtualization environments Microsoft considers “validated” for use by SQL Server, whether or not you have a Premier support contract with Microsoft or not.  And they do of course provide a level of SQL Server support even if you run it in a non-validated virtualization environment.

This blog entry from Microsoft Customer Service and Support SQL Server Engineers gives details on support for SQL Server in a virtualized environment.  Here’s the summary at the end of the article:

  • If you use Windows Server 2008 with Hyper-V or Microsoft Hyper-V Server 2008, you are supported for SQL Server 2005 and 2008.
  • If you choose to use a non-Microsoft virtualization solution, check to see that your configuration is SVVP certified. If it is, Microsoft CSS will support you and work with your vendor to help find a resolution to your issue.
  • If your configuration is not SVVP certified, be prepared for Microsoft CSS to ask you to reproduce your problem outside of the virtualization environment should you choose to seek assistance from CSS.
  • Here is a list of currently SVVP-certified products:

    This list comes from here, which you can get to from the SVVP page by clicking Products under Additional Information in the left pane.  Note that there are hardware specs for each of the SVVP-certified products (click the link for the product you want to see).  No doubt this list will change as more products get certified.

    Oh, it might also be worth mentioning that it appears that SVVP-certified products are certified to run Microsoft’s server software, as specified on the SVVP Products page, which currently includes Windows Server 2008, Windows Server 2003 SP2, and Windows 2000 Server SP4 and later.  So it sounds to me like the SVVP certification is for the Windows OS, beyond just SQL Server.

    Thanks to Michael Otey (again) for this tip, pointing me to the SQL Server Engineers Blog.

    SQL Server licensing with muliple CPUs or virtualization

    Filed under: SQL Server — Dan @ 3:41 pm
    Tags: , ,

    This article by Michael Otey deals with a couple questions about SQL Server 2005 licensing. 

    The per CPU license is applied per CPU socket, rather than the number of cores.  So a server with a single quad-core CPU would need just one license, not four.

    When virtualizing SQL Server, the number of CPUs on the host does not matter.  You need to license for the number of CPUs the virtual machine (VM) is configured to use.  So if you have a host with 4 CPU [sockets], running 3 separate VMs with SQL Server, and each one is configured to use one processor, you would only need 3 SQL Server per-CPU licenses, rather than 4.  Conversely, I suppose if you had 5 separate VMs with SQL Server each configured to use one processor, you would need 5 SQL Server per-CPU licenses.  And of course, if some of your VMs are using multiple CPUs, that increases the number of licenses needed.  For example, in the case with 5 SQL Server VMs, if 3 of those were using 2 CPUs each, then you would need 3×2+2×1=8 SQL Server per-CPU licenses.

    Caveats:

    • Michael’s article deals with SQL Server per-CPU licensing.  There are other licensing options by CAL (client access license).
    • This article is now a couple years old, so I would advise checking for yourself on current licensing for SQL Server 2005, as well as 2008. 

    With that in mind, I’ll add in here links to Microsoft’s SQL Server 2005 and 2008 licensing documentation, in case additional questions arise.

    Microsoft SQL Server 2005 Licensing 
    Microsoft SQL Server 2008 Licensing (Word Doc)

    March 23, 2009

    More: export XML from Unidata (and import into SQL Server)

    Filed under: ETL — Dan @ 8:46 pm
    Tags: , , , , ,

    Had a comment posted on my export XML from Unidata blog entry requesting more info on our ETL process, so this blog entry is an attempt to answer those questions.  These answers, of course, are based on the current state of our ETL process.

    How many files are you exporting to SQL Server?
    We are exporting 54 files to SQL Server.  Included in that are 22 files which are part of 5 file suites (5 fiscal years of BUD.WORK (BWK.ACYR) and GL Summary (GLS.ACYR) files; 3 financial aid years of TA, SA, SL, and LA files).  The remaining 32 files are not part of file suites.

    What are the performance specs of this extract (how long does it take you to export STUDENT.ACAD.CRED?).
    The XML extract for the 50+ files as mentioned above seems to take approximately 40 minutes, give or take about 5 minutes (actually last night looks like it took 34 1/2 minutes).
    So the STUDENT.ACAD.CRED XML extract, for example, looks like it’s taking about 10 minutes, that’s for just over 300,000 records, and makes about a 300 MB xml file.  In comparison, PERSON, which is almost 200,000 records, takes about 4 1/2 minutes to make a 100 MB file.  STUDENT.TERMS, about 65,000 records, takes almost 10 minutes to generate a 12 MB file.  Number of fields and number of records plays a small part in the time to dump the data to XML, but the big performance difference-maker is computed-column complexity.  We just added STTR.CUM.GPA to our STUDENT.TERMS extract, and that’s why there’s such a drastic difference in performance between that file’s extract, and some of the other files.

    How are you handling multi-valued fields and especially associations?  I would expect that the multi-value fields (LISTS) would break out as an “_LS” dependent (Child) table while the associations would also become a dependent table both using the ID of the Parent table and an Ordinal position.  Using your XML approach how did you make this happen?  Also, are you using SQL Server 2005 with DTS or SSIS?
    Since XML is hierarchical, each Unidata file exports to a single XML file, with the multi-values part of that XML file’s data structure.  I am using SQL Server Integration Services (SSIS) to import the XML data into SQL Server 2005.  One of the Data Flow Sources in SSIS is an XML Source, which is what I used.  So this XML Source automatically separates the XML file into separate tables for each multi-value.  Specifically, it creates one table for the single-value data, and then it creates 2 tables for each multi-value dataset.  Perhaps the SSIS package could be built to output the data differently, but for the most part I just took the defaults.  So for example, based on the fields we have included in our extract from the PERSON file, here are some of the tables that are generated in SQL Server: PERSON, PERSON_PARENTS_MS, PERSON_PARENTS_MV, PERSON_PEOPLE_EMAIL-MS, and PERSON_PEOPLE_EMAIL-MV.  PERSON is generated by default, and contains any single value data, and the SSIS-generated Id which links to the multi-value tables.  The PERSON_PEOPLE_EMAIL tables were generated from the PEOPLE.EMAIL association, and include all the fields from that association that were included in our XML extract.  The PERSON_PARENTS tables come from the field PARENTS, which is a Q-pointer.  SSIS breaks each multi-value dataset into 2 tables: the …MV table contains the SSIS-generated ID from the PERSON table (i.e., PERSON_Id), and the SSIS-generated ID corresponding to the multi-value data table (i.e., PEOPLE_EMAIL-MV_Id).  The …MS table contains the data for the multi-value dataset, along with the corresponding key (i.e., PEOPLE_EMAIL-MV_Id) to link it back to it’s originating PERSON record.

    Some more things I learned in the process of implementing this with SSIS in SQL Server 2005:

    • The XML mapping file on the Linux server that I referred to in my original blog entry on this is necessary to strip dots out of Unidata/Colleague field and file names, so that SSIS/SQL Server can process the data structures.  Also, in some cases, I renamed the XML record itself, so that the SSIS-generated ID name would not conflict with the pre-existing Unidata/Colleague key field name.  (The example in the original blog entry is a case of this: I changed the record name to COUNTIES_rec, because there was an existing field already called COUNTIES_ID.)
    • Something that I was not able to handle with the mapping file, but that needed to be taken care of before the XML could be processed by SSIS, was dots in multi-value field/association names.  I couldn’t get the mapping file to handle those, so ended up running the XML file, after it was generated, through the Linux sed command for each multi-value/association with a dot.  (Thanks to my supervisor Curt for pointing me in that direction.)
    • There were a few Unidata (Colleague) files which contained character(s) in the data which SSIS did not like.  I blogged about that, along with what I did about it, here.
    • One last thing I’ll mention is that XML Data Flow Source in SSIS gives you an opportunity to generate the .xsd file (XML schema definition) from the XML file that you provide (and I used that capability).  There were occasions, though, when I would later have to modify the data type in an .xsd from what SSIS chose.

    Well thank you Greg for the questions, and I’m sorry for the delay getting answers out.  Hope this helps you and perhaps some others, too.

    February 17, 2009

    types of reports

    Filed under: Uncategorized — Dan @ 3:36 pm

    Perhaps I had heard before about different types of reports, but I remember when first hearing/reading from John Van Weeren about types of reporting it felt a little bit like a revelation to me.  Here, for example, is John’s “The Information for Everyone Initiative: A Datatel Position Paper” from datatel.com.  I remember being introduced to four reporting types: Operational, Ad-hoc, Analytic, and Strategic.  Each has its place, and is important within the enterprise.

    This concept is also useful in the context of considering your institution/organization’s reporting tools, and which reporting types they could cover.   For example, Microsoft Access could easily handle operational and ad-hoc, Excel could handle those as well as analytic.  So those considerations could lead into the question, Are there holes in your organization’s coverage of Reporting?

    Next Page »

    Blog at WordPress.com.