Database links have been a key feature of the Oracle database for a long time, allowing one database instance to interact with another. These "remote" databases do not have to be Oracle databases; the heterogeneous gateway allows for ODBC connectivity to non-Oracle systems, providing seamless and transparent movement of SQL and data between environments.
Until it doesn't. Often, using a database link directly in your SQL in Oracle is enough to access the remote data but there are times when you might need a feature that is only available in the non-Oracle system or data does not translate well through the gateway.
An example might be a function native to Oracle that doesn't have an equivalent on the remote system. Or you've issued a query with a filter that should ideally be executed over on the remote system to quickly return a small number of rows, but it takes forever. Upon closer inspection of the execution plan, you see that it is pulling all the data over to Oracle before applying the filter, and no hint (e.g., DRIVING_SITE) is helping to force it to "do the right thing."
There are many ways to tackle this problem, such as creating a view on the remote system with the filter in place or possibly consuming the data via another route altogether, like a RESTful web service. However, if you are truly limited to using the database link, this is where the DBMS_HS_PASSTHROUGH package can come to the rescue.
Let's get the downsides of using this package out of the way:
You have to write code, sometimes quite a lot of code.
It processes the data row by row, column by column so doesn't scale all that well.
The database link may still be quicker depending on your environment as Oracle does an incredible job managing large sets of data in memory so pulling all the data over and filtering locally could still be the better option.
So now you know it isn't a perfect solution to use in every situation. For those special cases where you do need it, how do you use it?
Let's assume I have a database link created called hashnode_db that I am having a hard time working with. I am trying to get a count of records grouped by TRUNC(some_date, 'MONTH') but I see that using the link directly in SQL, Oracle is pulling over 200 million records and grouping it locally in Oracle taking several minutes.
When I execute the same query using the remote systems equivalent date function (for example DATE_TRUNC('month', some_date) in Snowflake) it comes back in seconds and returns only a handful of rows. I would much prefer this response time and data payload please!
I will create a pipelined function wrapper to return this data. First, I will create a nested table TYPE in the database to hold the result set.
CREATE OR REPLACE TYPE T_SRC_DATE_REC AS OBJECT
(
SRC_DATE DATE,
REC_COUNT NUMBER
);
CREATE OR REPLACE TYPE T_SRC_DATE_TBL AS TABLE OF T_SRC_DATE_REC;
My new function will then "pipe out" the results using this TYPE having retrieved the data via the DBMS_HS_PASSTHROUGH package.
FUNCTION GET_REMOTE_DATA(P_SQL_STRING IN VARCHAR2) RETURN T_SRC_DATE_TBL
PIPELINED IS
V_CUR INTEGER;
V_ROWCOUNT NUMBER;
V_OUTPUT T_SRC_DATE_TBL := T_SRC_DATE_TBL (1);
V_OUTPUT_ROW T_SRC_DATE_REC;
V_OUT_VAL VARCHAR2(4000);
BEGIN
-- Create the cursor
V_CUR := DBMS_HS_PASSTHROUGH.OPEN_CURSOR@hashnode_db;
-- parse the sql (it will raise an error if invalid)
DBMS_HS_PASSTHROUGH.PARSE@hashnode_db(V_CUR, P_SQL_STRING);
<<RECORD_SET>>
LOOP
-- Loop through all the rows returned
V_ROWCOUNT := DBMS_HS_PASSTHROUGH.FETCH_ROW@hashnode_db(V_CUR);
EXIT WHEN V_ROWCOUNT = 0; -- no more rows to fetch
V_OUTPUT_ROW := NULL;
-- Get the first column from the row, the date
DBMS_HS_PASSTHROUGH.GET_VALUE@hashnode_db(V_CUR, 1, V_OUTPUT_ROW.SRC_DATE);
-- Get the second column from the row, the record count
DBMS_HS_PASSTHROUGH.GET_VALUE@hashnode_db(V_CUR, 2, V_OUTPUT_ROW.REC_COUNT);
V_OUTPUT(1) := V_OUTPUT_ROW;
PIPE ROW(V_OUTPUT(1));
END LOOP RECORD_SET;
-- Close the cursor
DBMS_HS_PASSTHROUGH.CLOSE_CURSOR@hashnode_db(V_CUR);
RETURN;
EXCEPTION
WHEN NO_DATA_NEEDED THEN
NULL;
WHEN OTHERS THEN
RAISE;
END GET_REMOTE_DATA;
Hopefully the code is relatively self-explanatory as to what we are doing so we can now call this function as follows:
SELECT SRC_DATE,
REC_COUNT
FROM GET_REMOTE_DATA(P_SQL_STRING =>
q'[SELECT DATE_TRUNC('month', some_date) AS DT, COUNT(*) AS REC_COUNT
FROM some_remote_table
GROUP BY DATE_TRUNC('month', some_date)]');
and get the data to return in a matter of seconds. So what does the package do?
Accepts the SQL statement we pass in as a parameter P_SQL_STRING
Gets a pointer (V_CUR) to a cursor opened against our database link
Parses the SQL against the remote database, raising an error if it is invalid
Loops through each row fetching them one at a time, exiting when no more rows are found
Extracts the value from the first column and saves it to our record type in the SRC_DATE field
Extracts the value from the second column and saves it to our record type in the REC_COUNT field
Adds this record object to the nested table type
Immediately PIPES out the data (meaning we only ever need keep one row in the nested table)
When all rows have been fetched and processed through the loop, close the cursor.
There are a few more things you can do with this package such as working with bind variables via the DBMS_HS_PASSTHROUGH.BIND_VARIABLE.
If I want to filter my original query to only aggregate a specific subset of the main data, I can simply include that in the string as a literal, as shown in the WHERE clause here:
SELECT SRC_DATE,
REC_COUNT
FROM GET_REMOTE_DATA(P_SQL_STRING =>
q'[SELECT DATE_TRUNC('month', some_date) AS DT, COUNT(*) AS REC_COUNT
FROM some_remote_table
WHERE some_other_column = 'MYFILTER'
GROUP BY DATE_TRUNC('month', some_date)]');
Using bind variables is better for performance and security because it allows queries to be cached, assuming the remote non-Oracle database uses similar caching techniques. Here is an improved version:
SELECT SRC_DATE,
REC_COUNT
FROM GET_REMOTE_DATA(P_SQL_STRING =>
q'[SELECT DATE_TRUNC('month', some_date) AS DT, COUNT(*) AS REC_COUNT
FROM some_remote_table
WHERE some_other_column = ?
GROUP BY DATE_TRUNC('month', some_date)]',
P_FILTER => 'MYFILTER');
We now need to provide the bind variable and use it when calling DBMS_HS_PASSTHROUGH. In this version you will see a new parameter of P_FILTER has been added to the pipelined function.
FUNCTION GET_REMOTE_DATA(P_SQL_STRING IN VARCHAR2, P_FILTER IN VARCHAR2)
RETURN T_SRC_DATE_TBL
PIPELINED IS
V_CUR INTEGER;
V_ROWCOUNT NUMBER;
V_OUTPUT T_SRC_DATE_TBL := T_SRC_DATE_TBL (1);
V_OUTPUT_ROW T_SRC_DATE_REC;
V_OUT_VAL VARCHAR2(4000);
BEGIN
-- Create the cursor
V_CUR := DBMS_HS_PASSTHROUGH.OPEN_CURSOR@hashnode_db;
-- parse the sql (it will raise an error if invalid)
DBMS_HS_PASSTHROUGH.PARSE@hashnode_db(V_CUR, P_SQL_STRING);
-- Apply the filter via the bind variable
DBMS_HS_PASSTHROUGH.BIND_VARIABLE@hashnode_db(V_CUR, 1, P_FILTER);
<<RECORD_SET>>
LOOP
-- Loop through all the rows returned
V_ROWCOUNT := DBMS_HS_PASSTHROUGH.FETCH_ROW@hashnode_db(V_CUR);
EXIT WHEN V_ROWCOUNT = 0; -- no more rows to fetch
V_OUTPUT_ROW := NULL;
-- Get the first column from the row, the date
DBMS_HS_PASSTHROUGH.GET_VALUE@hashnode_db(V_CUR, 1, V_OUTPUT_ROW.SRC_DATE);
-- Get the second column from the row, the record count
DBMS_HS_PASSTHROUGH.GET_VALUE@hashnode_db(V_CUR, 2, V_OUTPUT_ROW.REC_COUNT);
V_OUTPUT(1) := V_OUTPUT_ROW;
PIPE ROW(V_OUTPUT(1));
END LOOP RECORD_SET;
-- Close the cursor
DBMS_HS_PASSTHROUGH.CLOSE_CURSOR@hashnode_db(V_CUR);
RETURN;
EXCEPTION
WHEN NO_DATA_NEEDED THEN
NULL;
WHEN OTHERS THEN
RAISE;
END GET_REMOTE_DATA;
The line:
DBMS_HS_PASSTHROUGH.BIND_VARIABLE@hashnode_db(V_CUR, 1, P_FILTER);
now applies the bind variable to the first ? placeholder.
If there was more than one bind variable in the query, the second parameter in the call to the BIND_VARIABLE procedure would be 2 and of course you would need another parameter passed into the pipelined function.
The third parameter in the BIND_VARIABLE procedure is the value to use and should match the datatype of the column being filtered. So, if your filter is against a DATE column, the P_FILTER parameter needs to be of type DATE instead of VARCHAR2, as shown in the previous example.
To summarize, while this package is a great addition to our toolkit as developers, it is worth emphasising that it should be evaluated on a case by case basis and only when using the database link directly or finding an alternative, potentially simpler solution has been exhausted.