As hit rates increase for the sportscar, the system duplicates the data about the sportscar to new memory locations on other computers or nodes.
In one scenario where a single node can service a search request in 0. If incoming searches cross the desired service threshold, the system can reserve new memory locations accordingly. Service response time is only one of many factors when selecting the number of new memory locations. Other factors include location, cost, time to duplicate, available machines, competing domain-specific searches, service level agreements (SLAs), green computing, energy consumption, energy efficiency, etc.
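As an illustration of sizing the expansion from service response time alone, the following minimal sketch estimates how many memory locations would keep each location within its capacity. The function name `locations_needed`, the one-search-at-a-time capacity model, and the example figures are assumptions made for illustration; they do not come from the disclosure, and the other factors listed above (cost, SLAs, energy, and so forth) are ignored.

```python
import math


def locations_needed(searches_per_second: float,
                     seconds_per_search: float,
                     max_locations: int) -> int:
    """Estimate the total number of memory locations for a given load.

    Assumption (not from the source): each location processes one search
    at a time, so it can absorb 1 / seconds_per_search searches per second.
    """
    if searches_per_second < 0 or seconds_per_search <= 0 or max_locations < 1:
        raise ValueError("load must be >= 0, service time > 0, max_locations >= 1")
    # Offered load expressed as a number of fully busy locations, rounded up.
    required = math.ceil(searches_per_second * seconds_per_search)
    # Always keep at least one copy; never exceed the machines available.
    return max(1, min(required, max_locations))


if __name__ == "__main__":
    # Hypothetical figures: 100 searches/s, 0.25 s per search, 64 machines.
    print(locations_needed(100, 0.25, 64))
```

A fuller policy would weigh this estimate against the remaining factors before reserving any locations.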
Other distribution scenarios exist for selecting the number of new memory locations.
The system distributes processing of more or additional domain-specific searches amongst the at least one new memory location. An example algorithm for performing this task is roughly analogous to the translation layer, shown in FIGS. Once the data is in multiple locations, the system applies an algorithm to distribute searches across the multiple locations. The system can randomly distribute processing of searches amongst all memory locations or distribute them serially or based on other criteria. The algorithm can vary based on time of day, quality of service constraints, competing domain-specific searches, cost, and so forth.
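The serial and random distribution options described above can be sketched as follows. The function names and the string labels for memory locations are hypothetical; this is one possible arrangement under those assumptions, not the translation layer of the disclosure.

```python
import itertools
import random
from typing import Callable, Optional, Sequence


def serial_chooser(locations: Sequence[str]) -> Callable[[], str]:
    """Return a function that hands out locations in round-robin order."""
    if not locations:
        raise ValueError("at least one memory location is required")
    cycle = itertools.cycle(locations)
    return lambda: next(cycle)


def random_chooser(locations: Sequence[str],
                   seed: Optional[int] = None) -> Callable[[], str]:
    """Return a function that picks a location uniformly at random."""
    if not locations:
        raise ValueError("at least one memory location is required")
    rng = random.Random(seed)
    return lambda: rng.choice(locations)


if __name__ == "__main__":
    # Hypothetical labels for the original location and two duplicates.
    pick = serial_chooser(["original", "copy-1", "copy-2"])
    for query in ["q1", "q2", "q3", "q4"]:
        print(query, "->", pick())
```

Criteria such as time of day or quality of service could be added by swapping in a different chooser without changing the callers.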
While the number of searches remains high, the system will maintain the expanded search space and continue to distribute searches amongst at least part of the expanded search space and the original search space. In time, the demand for searches in a particular domain may be reduced or some other event may affect the need for the expanded search space.
In order to shrink the number of memory locations or the search space, the system optionally identifies a reduction event related to the domain-specific searches and collapses the expanded search space such that searches for the key data are processed with data in a reduced search space smaller than the expanded search space but having at least one memory location. In the sportscar example, if the buzz dies down around the sportscar after the manufacturer reveals that it gets a maximum of 3 miles per gallon, assume that searches for the sportscar sharply taper off.
When searches drop below a certain threshold, the system collapses the expanded search space by deleting the data in all or some of the new memory locations or controlling the algorithm to no longer include those memory locations in the distribution. The system can do this in one step or gradually with a series of steps.
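A gradual collapse of this kind might look like the sketch below, which removes locations from the distribution one step at a time while always keeping at least one. The function name, the threshold units, and the choice to drop the most recently added locations first are assumptions for illustration only.

```python
from typing import List


def collapse_step(locations: List[str],
                  searches_per_second: float,
                  threshold: float,
                  step: int = 1) -> List[str]:
    """Return the locations that remain in the distribution after one step.

    Assumption (not from the source): locations are listed in the order they
    were added, and the most recently added ones are dropped first.
    """
    if not locations:
        raise ValueError("the search space must contain at least one location")
    if step < 1:
        raise ValueError("step must be >= 1")
    if searches_per_second >= threshold:
        # Demand is still high, so keep the expanded search space as it is.
        return list(locations)
    # Drop up to `step` locations but never the last remaining copy.
    keep = max(1, len(locations) - step)
    return list(locations[:keep])
```

Calling the function repeatedly with `step=1` collapses gradually, while a `step` at least as large as the number of locations collapses to a single copy in one call.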
The first memory location can act as a parent of all the new memory locations and can be uncollapsable, but in one aspect, collapsing the expanded search space is not limited to preserving the original memory location so long as at least one location still contains the key data. For example, the system duplicates the original memory location at address 0x to memory locations 0x0A29, 0xC3F0, and 0x82D2, each of which can be in a different computer.
The system can collapse 0x first, even though it is the original memory location. The workload manager can intelligently decide which memory location is best suited for collapsing first. This may be based on time i. The system can collapse the expanded search space by removing any of the memory locations in any order. In one variation, the system further organizes hit rates by network segment, identifies a network segment associated with searches for the key data, and duplicates the key data to at least one new memory location in close proximity to the identified network segment.
For example, if key data relates to the Green Bay Packers and most searches originate in Wisconsin, then the system identifies network connections between the compute environment and Wisconsin as key network segments. The system duplicates key data regarding the Green Bay Packers as close to the key network segment as possible. Or, for example, if the network path with the least latency to Wisconsin is from Des Moines, the system can duplicate the key data to memory locations in Des Moines and provision nodes and other network resources in a similar manner.
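The network-segment variation can be sketched in two steps: find the segment where most searches originate, then choose the candidate sites with the lowest latency to that segment. The function names and all latency figures below are hypothetical, and latency is used as the only proximity measure purely for brevity.

```python
from collections import Counter
from typing import Dict, Iterable, List


def busiest_segment(search_origins: Iterable[str]) -> str:
    """Return the network segment that produced the most searches."""
    counts = Counter(search_origins)
    if not counts:
        raise ValueError("no searches were recorded")
    return counts.most_common(1)[0][0]


def closest_sites(latency_ms: Dict[str, float], count: int) -> List[str]:
    """Return up to `count` candidate sites, lowest latency first."""
    if count < 1:
        raise ValueError("count must be >= 1")
    return sorted(latency_ms, key=latency_ms.get)[:count]


if __name__ == "__main__":
    origins = ["Wisconsin", "Wisconsin", "Texas", "Wisconsin", "Ohio"]
    print(busiest_segment(origins))
    # Hypothetical latencies (ms) from each candidate site to that segment.
    to_segment = {"Des Moines": 12.0, "Dallas": 30.0, "Seattle": 55.0}
    print(closest_sites(to_segment, 2))
```

Other proximity measures, such as geographic distance, hop count, or cost, could replace the latency table without changing the selection logic.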
A system can predictively expand a search space by receiving information associated with an expected increase in use of a search phrase or searches specific to a particular domain to be processed in a search space in a database, identifying data in the search space having at least one memory location in the database, the data being data identified in response to the domain-specific searches, expanding the search space by duplicating data in the search space into at least one new memory location, and distributing more or additional domain-specific searches amongst the expanded search space. In other words, searches that are received after the duplication of the data are then distributed amongst at least the new expanded search space and preferably including the original search space as well.
Information regarding an expected increase in use of a search phrase can be as simple as compiling a list of upcoming summer movies and responding to the expected increase in Internet searches for each upcoming movie. The number of duplicated memory locations can correlate with each movie's previews and expected box office performance, for example.
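One way to correlate the number of duplicated locations with expected interest is a simple proportional split, sketched below. The function name, the interest scores, and the rule that every topic keeps at least one location are assumptions for illustration; the disclosure does not prescribe a formula.

```python
import math
from typing import Dict


def allocate_locations(expected_interest: Dict[str, float],
                       budget: int) -> Dict[str, int]:
    """Split a budget of memory locations in proportion to expected interest.

    Assumption (not from the source): scores are non-negative, and every
    topic keeps at least one location even if its share rounds down to zero.
    """
    if budget < 1:
        raise ValueError("budget must be >= 1")
    total = sum(expected_interest.values())
    if total <= 0:
        raise ValueError("expected interest must sum to a positive value")
    return {
        topic: max(1, math.floor(budget * score / total))
        for topic, score in expected_interest.items()
    }
```

Because of the one-location minimum, the allocations can add up to slightly more than the budget when many topics have very small scores.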
In the case of an election where 10 individuals are competing for a nomination for candidacy, the system can predictively expand the search space for each of the 10 individuals in the weeks before the nomination. One example way of identifying domain-specific data is through a special administrative search field. An administrator may have a google search field such that the data that is retrieved is identified as hot data and immediately duplicated as disclosed herein.
This can be done manually as above or automated. Prediction based on new website data may also be used. For example, web crawling algorithms can analyze websites and data obtained therefrom. If new websites have recently been deployed or built that can be identified in a particular domain, then the system can act to prepare extra indexed data as set forth herein to manage predicted searches to those websites.
In another example, the system may analyze blogs to identify topics or terms that are gaining traction and predictively prepare for additional searches on those topics or terms. Often, searches for key data are cyclical.
Cyclical searches can occur on a daily, weekly, monthly, yearly, or other period. For example, searches for Santa Claus spike annually during December, searches for news spike daily in the morning hours on weekdays, and searches for the magicicada spike every 13 or 17 years in different regions. In order to address these cyclical searches, the system can further store received information in a log organized by date and time, predict a schedule of expected increases based on the log, and expand the search space based on the predicted schedule.
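For yearly cycles, predicting a schedule from such a log could be as simple as the sketch below, which flags calendar months in which spikes were logged in several distinct years. The function name, the `min_years` parameter, and the sample timestamps are hypothetical; daily, weekly, or multi-year cycles would need a different grouping key.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Set


def predict_spike_months(spike_log: Iterable[datetime],
                         min_years: int = 2) -> List[int]:
    """Return months (1-12) in which spikes recurred in at least `min_years` years."""
    years_by_month: Dict[int, Set[int]] = defaultdict(set)
    for timestamp in spike_log:
        years_by_month[timestamp.month].add(timestamp.year)
    return sorted(month for month, years in years_by_month.items()
                  if len(years) >= min_years)


if __name__ == "__main__":
    # Hypothetical log of times at which a search spike was recorded.
    log = [
        datetime(2020, 12, 20, 9, 0),
        datetime(2021, 6, 1, 14, 30),
        datetime(2021, 12, 18, 10, 15),
        datetime(2022, 12, 22, 8, 45),
    ]
    print(predict_spike_months(log))
```

The system could then expand the search space shortly before each predicted month and collapse it afterwards.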
As noted above, the data that may be searched is any data. In one aspect, the data is indexed data that is obtained from crawling and indexing websites. In another aspect, the data is the website itself. In that case, the system may identify an event such as response time or hit rates to a certain website and dynamically duplicate the website itself such that the requests for the particular URL are distributed to at least one new memory location. The collapse may occur, for example, by modifying i. This may be helpful if there are too many requests for a streamed video presentation at a website, for example, and additional servers are necessary to manage the requests.
For example, the manager can reserve a set of CPUs, perhaps re-provisioning an appropriate operating system on the nodes (say, from Microsoft Windows to Linux), copy the data into cache, and initiate triggered jobs to begin streaming video data from the new nodes to additional search requests from users. The compute environment mentioned herein may be any utility computing, cluster, grid, cloud computing environment, on-demand center, server farm, and so forth. The workload manager of the present disclosure can be configured to manage searches for data in any type of computing environment.
Therefore, whether the data is web crawled index data, website data per se, or any other type of database, the principles disclosed herein can improve the search times and service response for inquiries or searches into such databases. Embodiments within the scope of the present invention may also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer, including the functional design of any special purpose processor as discussed above. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions, data structures, or processor chip design.
When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media. Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, data structures, and the functions inherent in the design of special-purpose processors, etc. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein.
The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps. Those of skill in the art will appreciate that other embodiments of the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.
Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked either by hardwired links, wireless links, or by a combination thereof through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. The various embodiments described above are provided by way of illustration only and should not be construed to limit the invention.
For example, the principles herein may be applied to text, image, audio, or movie searches on the Internet or an internal network. One of skill in the art will recognize that the principles described herein may be modified to apply to searches for images, speech, audio, video, multi-modal searches, etc. As the number of processor cores increases and their associated caches increase in personal computers, the same principles may even be applied to a single desktop computer.
Those skilled in the art will readily recognize various modifications and changes that may be made to the present invention without following the example embodiments and applications illustrated and described herein, and without departing from the true spirit and scope of the present invention.

Disclosed herein are systems, methods, and computer-readable media for dynamically managing data-centric searches. The method includes identifying an event related to domain-specific searches to a database, identifying data at a first memory location in the database, duplicating the data to a new memory location, and distributing processing more domain-specific searches amongst an expanded search space including the new memory location and the first memory location.
The expanded search space can be reduced or collapsed in response to a reduction event by removing duplicate data from the first memory location or the new memory location and adjusting the distribution of processing amongst the remaining memory location. The method can optionally include organizing multiple events by network segment, identifying a network segment associated with domain-specific searches, and duplicating the data to a new memory location in close proximity to the identified network segment.
Field of the Invention
The present invention relates to searches and more specifically to managing resources to process data centric searches.
Introduction
Data centric searches have grown in popularity in the last 15 years.
SUMMARY
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Understanding that these drawings depict only exemplary embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which: FIG.
I claim: 1. A method comprising: identifying, at a first time, data at a memory location in response to topic-specific searches;. The method of claim 1, wherein responding to the first topic-specific search further comprises responding using the data at the memory location. The method of claim 1, wherein the new memory location is selected at random.
The method of claim 1, wherein the new memory location is selected serially. The method of claim 1, the method further comprising: organizing multiple events by network segment; and. The method of claim 5, wherein a distance between the new memory location and the network segment is measured by at least one of geographic distance, network latency, number of intermediate network hops, temperature, and cost. The method of claim 1, wherein a workload manager manages at least part of identifying data, duplicating the data, and responding.
A method comprising: receiving information, at a first time, associated with an expected event related to topic-specific searches to a database;. The method of claim 8, the method further comprising: storing received information in a log organized by date and time;. The method of claim 8, the method further comprising: identifying a network segment associated with the expected event related to topic-specific searches; and.
The method of claim 10, wherein the proximity is measured by at least one of geographic distance, network latency, number of intermediate network hops, temperature, and cost. The method of claim 8, wherein the method is processed by a workload manager.
A system comprising: a processor; and.