Posts

Showing posts with the label MarkLogic

Fast queries using cts:value-tuples

MarkLogic has an extremely fast search engine, but its performance can suffer from inefficient queries and from missing element/attribute indexes. Here is an example that uses a very efficient API function, cts:value-tuples, which works on indexed elements and attributes.

Explaining the code:

- utility function to transform results to CSV format (lines 5-13)
- fetch journal titles for a given year (lines 15-24)
- iterate through the years and journal titles (lines 26-27)
- get all records for a given journal for a given year (lines 29-34)
- build references for fetching journal title, search term and publication year (lines 36-40)
- return frequency of a given search term per journal per year using cts:value-tuples (lines 42-46)

Note how results are output by frequency in descending order and transformed into CSV format.

xquery versi

Avoiding XDMP-EXPNTREECACHEFULL Errors

If you have spent any length of time processing large numbers of records in MarkLogic, then you have almost certainly run into the dreaded XDMP-EXPNTREECACHEFULL error message. This is simply, but mercilessly, telling you that there is not enough memory in your system (specifically, in the expanded tree cache) to process your query and generate the final result(s). Most of the time, it is the result of poor coding techniques and/or a lack of indexing on the relevant elements/attributes.

The simple method below has proven useful on many occasions and is rather straightforward to implement. Basically, instead of accumulating the processing of all records in memory, you hand the processing of each record to a separate task and let the Task Server run them all and return the result(s). In this example, I am assuming you want to update the dataset in MarkLogic.

Explaining the code:

- efficiently identify the records to process (lines 1-9)
- process each record separately (line 11)
- use xdmp:spawn-function() for e