Data Refresh Issues in the Power BI Service Due to Invalid Dates

Last week I got involved with a customer issue. A refresh of the data imported to a PBIX always works in Power BI Desktop, but the refresh operation intermittently fails in the Power BI Service. Their workaround had been to refresh the PBIX in Desktop and re-upload the file to the Service. This post is about finding and fixing the root cause of the issue - this is as of March 2018, so this behavior may very well change in the future. 

Turns out, the source of the problem was that the customer's Open Orders table can contain invalid dates - not all rows, just some rows. Since Open Orders data can fluctuate, that explains why it presented as an intermittent refresh issue. Here's a simple mockup that shows one row which contains an invalid date:

InvalidDates_SQLTable.jpg
 

At this point, we have two open questions:
(1) What is causing the refresh error? 
(2) Why is the refresh behavior different in the Service than the Desktop tool?

What is Causing the Data Refresh Error?

The thing that made it difficult to figure out is that the error message in the Service indicated a different table every time - there were about 15 tables in this data model. Since the error wasn't specific, uploaded tables one by one to the Service to determine which table actually was causing the error. When there is just one table present, the error message is helpful, indicating it's not a legal date:

InvalidDates_PowerBIErrorMessage.jpg
 

If Microsoft is able to make the above error message display even when there are numerous tables, that would be extremely helpful. Ok, at this point we know it's a date issue. Which brings us to the next question...

Why is the Refresh Behavior in the Power BI Service Different than Desktop?

The customer had opened a ticket with Microsoft, and the support person provided this information:

"The code path is slightly different between the Desktop and the Service:
-In the Desktop, the query runs through an OLEDB-based provider and the conversion from CLR date to OLE date is handled by our own code (and the faulty value is silently replaced by NULL). 
-In the Service, the query runs through an ADO.NET-based provider and the conversion from CLR date to OLE date is handled by another layer which does not swallow the error but bubbles it out."

So, at this point in time (March 2018), the date conversions are handled differently in the Power BI Service, and the Power BI Service is not tolerant of an invalid date. The invalid dates in my customer's Open Orders table looked mostly like this: 0018-03-01 or 0017-01-01. Year "18" and year "17" are less than the minimum year of "101" allowed by the Power BI Service.

Options for Handling Invalid Dates to Avoid Refresh Failure

Option 1: In the Source Database. It's always best to handle data quality issues as far upstream (i.e., close to the source) as possible. In my customer project, I was fortunate to have the ability to modify the underlying Open Orders database view where Power BI accesses the data:

CREATE VIEW DW.vwOpenOrders
AS
SELECT 
   Column1
  ,Column2
  ,Calendar_Date = CASE WHEN LEFT(CONVERT(CHAR(10), Calendar_Date), 2) IN('19','20') THEN Calendar_Date ELSE NULL END
FROM DW.OpenOrders
WHERE...

The reason I strongly prefer to handle it in the source database is because it's resolved *once* then *all* downstream queries and tools (like Power BI) can take advantage of it.
  
Option 2: In the Power BI Query Editor. Alternatively, you could do something like this in your M script (this Table.TransformColumns snippet was provided by the Microsoft support team):

InvalidDates_FixingInPowerBI.jpg

The Custom1 step above finds any date or datetime column in the entire query and, if it's less than the year 101, changes the date to 01/01/1900 (you could change it to a null if you want, but null dates and attributes aren't ideal in reporting scenarios). Since the M script traverses all columns in the query, you'll want to check performance if you have a large set of data.

Now, just to reiterate...although handling it inside of Power BI Desktop is an option if you cannot change the underlying source, I would far prefer to handle it upstream in the database as shown in Option 1. Having said that, here is the text from the above image so you don't have to type it:

let
  Source = Sql.Database("servername.database.windows.net", "AdventureWorksLT"),
  dbo_DateTableTest = Source{[Schema="dbo",Item="DateTableTest"]}[Data],
  Custom1 = Table.TransformColumns(
    #"dbo_DateTableTest", 
    List.Transform(
        Table.ColumnsOfType(#"dbo_DateTableTest", {type date, type nullable date, type datetime, type nullable datetime, type datetimezone, type nullable datetimezone})
        , (name) => {name, (date) => if date = null or Date.Year(date) < 101 then #date(1900, 01, 01) else date, Type.TableColumn(Value.Type(#"dbo_DateTableTest"), name)}
        )
    )
in
#"Custom1"

Hope this post is helpful if you find yourself dealing with date-related data refresh errors in the Power BI Service.

You Might Also Like...

Reusing Datasets Imported to the Power BI Service

Why the Default Summarization Property in Power BI is So Important