Wednesday, April 21, 2010

Solr: some useful links

http://wiki.apache.org/solr/LukeRequestHandler#numTerms
http://wiki.apache.org/solr/SolrRequestHandler
http://wiki.apache.org/solr/MoreLikeThisHandler
http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt
http://wiki.apache.org/solr/SearchHandler#q.op
http://wiki.apache.org/solr/SolrForrest
http://wiki.apache.org/solr/SolrQuery
http://wiki.apache.org/solr/SolrConfigXml
http://www.ibm.com/developerworks/java/library/j-solr-update/
http://www.ibm.com/developerworks/java/library/j-solr1/

Who uses SOLR?

Public Websites using Solr



  • http://www.whitehouse.gov/ - Uses Solr via Drupal for site search with highlighting and faceting

  • AT&T Interactive uses Solr to run local search at yp.com, the new yellowpages.com

  • AOL is using Solr to power its channels

  • FindBestOpenSource lists the best open source across all categories.

  • GameSpot uses Solr for its search features.

  • Netflix uses Solr for their site search feature

  • eReplacementParts uses Solr for its intense search of over 300,000 tool parts.

  • CNET Reviews and Shopper.com use Solr for product search and faceted browsing

  • Zappos.com uses Solr & SolrJ for their search and navigation across 150,000 styles of shoes and other products.

  • SourceForge uses Solr to provide faceted search across all its projects.

  • http://news.com uses Solr for search.

  • digg uses Solr for search

  • Buy.com's international sites are powered by Solr (LucidWorks), including http://ca.buy.com(Canada); http://fr.buy.com (France); http://de.buy.com (Germany) and http://uk.buy.com (United Kingdom).

  • Internet Archive is an opensource digital archive that uses Solr for all of its collections (audio/video/books) search, along with PHP and mysql.

  • StubHub!, an eBay company, uses Solr for Browsing and Searching tickets

  • FanSnap uses Solr to help fans search for tickets to live sports, concert, and theater events.

  • Smithsonian Institution cross catalog faceted search

  • Homeland Security Digital Library (HSDL) uses Solr for search.

  • http://www.lucidimagination.com/search Indexes the Lucene ecosystem: Solr, Lucene Java, Mahout, Nutch, Tika, Droids and the ports of Lucene. Includes mail archives, websites, JIRA issues and wikis.

  • http://www.plaxo.com - uses Solr for sitewide activity and people search.

  • http://factbook.bodukai.com/ - CIA World Factbook data imported into Solr.

  • http://www.alltreatment.com/ Uses Solr to power search and a directory for drug rehab centers nationwide (US).

  • eBharatJobs.com A vertical job search engine targeted for India. Using Solr to index & perform search

  • SimplyFreshers.com A website for needs of Freshers (in India). Using Solr to index & perform search on jobs, placement papers, interviews. Using filters to further refine searches

  • Sarkari Tenders - Indian Government Tenders A vertical tenders search engine targeted for India. Using Solr to index & perform search, and filters results.

  • TheBigJobs.com A job portal built using Drupal CMS. Uses Solr to index and search jobs posted on the website.

  • Pubget indexes the world's medical-related scientific journals (Medline) together with full-text PDF locations and university library holdings rules. It uses release 1.3 with shards.

  • Citysearch uses Solr for all searches on the site. Includes facets by location and category. Index contains 19 million documents and is distributed over multiple servers. Serving over 3 million visitors a day.

  • manta.com uses Solr for company profiles search and browsing. Facets by location and industry and filtering by sales volume, the number of employee and public/private. Has 14 Million US companies. About 10 requests per second. Running under tomcat 6.x with 6 G JVM Memory (3 Linux servers). The optimized index size is 7.3 G.

  • Game Change change your games for free (Germany)

  • Yankee Group uses Solr to create a search and browse experience which includes its published subscription content, press releases, and public blog postings. Registered users (including guests) can save searches, turn them into email alerts, and consume them as RSS feeds.

  • Central Desktop Multi-tenant SaaS business collaboration tool. >3M documents >1TB raw data >35GB index size.

  • 6pm.com uses Solr for their entire site search and navigation

  • Enormo - uses Solr to search more than 3,000,000 property listings in 49 European and North African countries, in 30 languages, features range and minimum and maximum percentile faceting

  • search.com queries various Solr search collections

  • Cleaning Service Directory: Uses Solr to power search over thousands of contractors

  • Volunteer Solutions uses Solr for basic keyword search and faceted browsing of 100s of categories of volunteer opportunities

  • http://krugle.com uses Solr to store information about projects and user-created code notes.

  • CHOWHOUND uses Solr for its search facility.

  • Local Clinical Trials is a clinical trials search engine that uses Solr for trial and facility spatial search.

  • Discogs uses Solr for keyword search and faceted browsing

  • Uptake Vacations: Uses Solr to search over 10,000,000 travel listings of different types including hotels, things to do, reviews, and opinions behind a ruby on rails front end.

  • Collex @ NINES. Collex is a faceted browsing system which uses Solr behind a Ruby on Rails front-end, with some custom request handlers and caching.

  • ICT Vacatures a Belgian job board for IT and telecommunication jobs powered by Solr.

  • Intellog search, content contribution and sharing site for industrial communities (initially, oil & gas). Utilizes Solr implemented on Amazon Web Services.

  • Tokenizer: Shopping Price Engine - web crawler and search engine.

  • The tsr.ch vidéo player page uses a Solr index as a back-end for rich navigation features. There's some additional info on Bertrand's weblog

  • http://www.railshostinginfo.com/ uses Solr for its faceted search to find Ruby on Rails hosting.

  • University of Alberta Libraries Peel's Prairie Provinces uses Solr for full text search and faceted search.

  • University Laval library catalogue (1.8 M docs) http://ariane2.bibl.ulaval.ca

  • Hitflip uses Solr to allow visitors to search the content database. Currently (Feb. 2007) the content database contains over 400,000 items and handles an estimated 30k-50k searches per day. Solr runs on an Intel Xeon 5130 server, which has 2 GB of RAM dedicated to Solr. Solr system load is minimal.

  • Spiele Tauschen Change your Xbox 360 games, PlayStation games, GameCube games and more

  • Slando Russian Classifieds uses Solr for searching through classified ads in various Eastern European languages including Russian.

  • La Repubblica Newspaper - One of the main Italian Newspapers uses Solr for faceted browsing/filtering through classifieds (Annunci)

  • Instructables uses solr for faceted browsing and search. Logged in users get a nice solr-powered library interface.

  • Photosite uses Solr for site search.

  • Zvents uses Solr for search and faceted browsing of events & activities listings.

  • Versteigerung is an Free

  • PriceJunkie is a price comparison engine that uses solr to handle search and faceted browsing of merchant products.

  • reddit uses Solr for search

  • Russian Community Business Vote & Dating Music Community

  • UNT Libraries' Digital Collections uses Solr for search features

  • The Portal to Texas History uses Solr for search features

  • The Congressional Research Service Archive uses Solr for search features

  • Scintilla: search and MoreLikeThis (Drupal Solr module)

  • MAME Reviews: faceted search (Drupal Solr module)

  • Peel Sessions: search (Drupal Solr module)

  • Golfballs.com uses Solr for all site browsing functions

  • Howtoons.com uses a solr powered blog.

  • Nsyght uses Solr for search

  • ReviewGist is a product reviews search and comparison tool that uses Solr for faceted browsing and review search.

  • The Digital Commonwealth uses a solr based OAI-PMH engine to search digital collections within Massachusetts.

  • Cosmotourist uses Solr for search and faceted browsing.

  • Flix55 uses Solr to search user-uploaded videos

  • Hitflip uses Solr for search.

  • Hitmeister uses Solr for search.

  • homestars.com uses Solr to search 1.6 million companies.

  • Snooth uses Solr for search.

  • DollarDays uses Solr for product and CRM searches handling tens of thousands of requests per day against over 1.5 million records using Apache-Tomcat.

  • TVtrip use Solr for hotel search and faceted browsing

  • IS Inc. uses Solr for course search and browsing

  • Computer Shop German Top Computer shop

  • Well.ca uses Solr for product searches

  • Autoo.ro Romanian car sales aggregator. Uses Solr for search and faceted browsing.

  • Rez.ro Romanian real estate portal. Uses Solr for search and faceted browsing.

  • www.sladsonline.com uses Solr to index and search Sri Lankan advertisements published online.

  • The Dewey Browser was developed by the OCLC office of Research and uses Solr to provide resource discovery based on the Dewey Decimal Classification.

  • StyleFeeder: uses Solr for search

  • Avvo: uses Solr for search and faceted browsing.

  • Polyvore: uses Solr for faceted search and browsing of items, sets, and groups.

  • Sapo-Videos: uses Solr for content search and to suggest related videos.

  • Sapo-Blogs: uses Solr to search through a collection of 6.5 million documents.

  • Shopreflex uses Solr for content search and faceted search

  • Staragora uses Solr for content search and faceted search

  • http://dlib.nyu.edu/findingaids/ uses Solr to search among and within archival finding aids for NYU and N-YHS

  • bulDinle.com use Solr for music content search

  • hoppala German online shopping portal uses Solr for product search

  • Imoo Romanian real estate listings aggregator. Uses Solr for search and faceted browsing.

  • Wego Travel Search Engine, use Solr to index data for fast retrieval

  • Oshieru Manaberu Matchmaking site uses Solr to help teachers and students find each other.

  • Onmeda German health portal uses Solr for content search, including faceted searches. (~750.000 articles, ~9000 searches per day, JBOSS)

  • http://www.pagenstecher.de uses Solr to search user-generated content: more than 2 million documents and 5,000 searches a day, using faceted search.

  • http://www.rallyformusic.com uses Solr for navigation and full site search.

  • Fitnessfootwear.com uses Solr for faceted browsing of Hunter, Merrell, Caterpillar. Dual CPU/2GB RAM.

  • http://www.machdudas.de uses Solr for location based classified search using geodata.

  • http://uaprom.net and http://ruprom.net Ukrainian and Russian B2B sites. Use Solr for search and faceted browsing of products, companies and buying leads.

  • http://www.kerevizli.com uses Solr to search music albums.

  • Cheaptickets.com Uses solr for faceted search. Searches all the hotel reviews for each location and provides the results to the user.

  • WebCity uses Solr for faceted browsing of restaurants, hotels, events...

  • shoe-shop.com uses Solr and SolrSharp for faceted category browsing on microsites such as http://www.adidastrainers.co.uk and http://www.fashiontrainers.co.uk to provide search by colour & size.

  • flug.idealo.de uses Solr for faceted browsing of airports, airlines and flight offers and for some searches (Ajax Suggest). Solr (with Jetty) runs on an 8-CPU (2 x E5450 @ 3.00GHz) machine with 32 GB of RAM, of which 16 GB are dedicated to Jetty/Solr. Currently (Oct 2008) we have between 200,000 and 500,000 requests from faceted browsing (up to 15/sec) and 100,000 searches per day. We have 15,000,000 docs and faceted browsing is very fast.

  • Firm-UA.com uses Solr for site search

  • CiteSeerX uses Solr as the Index for full text, citation similarity search.

  • YAY Micro: uses Solr to search in over 500 000 images and vector files. Solr has given YAY one of the fastest searches in the stock image industry.

  • podcast.de: German podcasting portal uses Solr for channel, episode, user search and faceted browsing

  • http://www.airliners.net: Aviation photography website.

  • findthatfile.com: File based search engine for locating media on the Internet encompassing Web, FTP, Usenet and P2P resources including 43 file types and 200+ file extensions. Last count 300m records.

  • buddyfetch: Social network, IM, and dating people search engine (approx 70 million people)

  • Shopr.com uses Solr for keyword and faceted search of its users and user recommended products.

  • Poptent

  • Restaurant.nl is using Solr 1.3 for keyword and faceted search in 15.000 restaurants (and very happy with it).

  • ProvincieGroningen.nl is using Solr 1.4 for indexing documents, files and the website itself and it works very well.

  • Apontador.com is an LBS-based website using Solr to index data. Approximately 12 million documents and 5.5 million requests per day, distributed over 4 servers.

  • ilocal.nl Company search engine for the Netherlands and Belgium. Includes location based search using Local Solr and result grouping using Field collapsing. Includes approximately 1.1 million companies and over 800k locations.

  • welke.nl Kitchen, Bathroom and other interior product search engine. Employs extensive facetation and illustrates how to combine Hippo CMS with Solr.

  • MacMall Uses Solr for search and faceted browsing.

  • MySecondhome Uses Solr for searching/faceting real estate in Europe

  • Bound to Stay Bound Books is using Solr to power search.

  • Kölner Weinkeller uses Solr for the product search, including facetted search

  • Gala.fr, Voici.fr and Cuisine-actuelle.fr, websites of the Prisma Presse group (Gruner + Jahr), are using Solr to power search.

  • Bedriftsøket.no is a Norwegian company catalogue using Solr and SolrNet for all search/faceting functionality. The main index, of three in total, contains ~800k documents

  • DiVA - Academic Archive On-line: DiVA is an institutional repository and archive used by 24 Scandinavian universities. Solr is used for searching and faceting. There are three different indexes which contain ~260k documents each.

  • HuriSearch uses Solr to provide a search site dedicated to Human Rights by indexing 5000 web sites

  • Saveur.com uses Solr to provide facetted search through recipes, ingredients, courses and occasions

  • Planetary Data System (PDS) uses Solr to search for dataset, mission, instrument, target, and host information

  • JobVille uses Solr for searching/faceting job offers crawled from various job posting sites in Italy.

  • VoucherCD uses Solr to search voucher codes and provide price comparison for hundreds of thousands of products from retailers in the UK.

  • IPC Media uses Solr for searching, faceting and 'super page' creation across all of its properties.

  • http://store.patriziapepe.com - uses Solr for Product Search - Powered by Svapna Technologies Srl, Italy

  • kaufDA is the leading German website for local shopping opportunities and presents its over 5 million users brochures of retailers of various sectors like supermarkets, discounters, furniture stores, health food stores and many more. It uses Solr for its various indexes for a search with faceted browsing, auto-completions, suggestions, a "did you mean" functionality, its localization and other features.

  • Observatory of Presidential Elections – Colombia 2010 uses Solr for searching/faceting (Additional info on the Kudos Ltda. blog)

  • Trove service of the National Library of Australia

  • pointoo is a local search platform in Germany and serves millions of page requests using Solr for faceted browsing and search.



Other Solr users


Commonly used SearchComponents

Name: Description and example query

QueryComponent: Responsible for submitting the query to Lucene and returning the list of Documents.
http://localhost:8983/solr/select?&q=iPod&start=0&rows=10

FacetComponent: Determines the facets for the set of results.
http://localhost:8983/solr/select?&q=iPod&start=0&rows=10&facet=true&facet.field=inStock

MoreLikeThisComponent: For each search result, finds documents that are similar (i.e. "More Like This") to that result and return those results as well.
http://localhost:8983/solr/select?&q=iPod&start=0&rows=10&mlt=true&mlt.fl=features&mlt.count=1

HighlightComponent: Highlights the location of query terms in the text of the search results.
http://localhost:8983/solr/select?&q=iPod&start=0&rows=10&hl=true&hl.fl=name

DebugComponent: Returns information about how the query was parsed, as well as details on why each document scored the way it did.
http://localhost:8983/solr/select?&q=iPod&start=0&rows=10&debugQuery=true

SpellCheckComponent: Spell checks the input query and provides possible alternatives, based on the contents of the index.
http://localhost:8983/solr/spellCheckCompRH?&q=iPood&start=0&rows=10&spellcheck=true&spellcheck.build=true
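The example queries above all share the same base parameters (q, start, rows) and differ only in the component-specific ones. A minimal Python sketch of building such URLs follows. The helper name build_query_url is made up for illustration, and the base URL simply assumes the local example instance used above.

```python
from urllib.parse import urlencode

# Assumption: the local example Solr instance from the queries above.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(query, start=0, rows=10, **extra):
    """Return a select URL for `query` plus any component parameters.

    build_query_url is a hypothetical helper, not part of Solr.
    """
    params = {"q": query, "start": start, "rows": rows}
    # Dotted names such as facet.field are not valid Python keyword
    # names, so callers pass them as **{"facet.field": "inStock"}.
    params.update(extra)
    return SOLR_SELECT + "?" + urlencode(params)


facet_url = build_query_url("iPod", facet="true", **{"facet.field": "inStock"})
highlight_url = build_query_url("iPod", hl="true", **{"hl.fl": "name"})
print(facet_url)
print(highlight_url)
```

Building the query string with urlencode also takes care of escaping, which matters once the search term contains spaces or special characters.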



How to get "More Like This" results in SOLR? (with an example)

For each search result, finds documents that are similar (i.e. "More Like This") to that result and return those results as well.

http://localhost:8983/solr/select?&q=iPod&start=0&rows=10&mlt=true&mlt.fl=features&mlt.count=1

What is a facet example in SOLR?

FacetComponent Determines the facets for the set of results.

http://localhost:8080/solr/select?&q=iPod&start=0&rows=10&facet=true&facet.field=inStock
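To use the facet counts in a program, it can help to request JSON (adding wt=json to the query is an assumption here, it is not in the URL above). As far as I know, the JSON writer returns each facet field as a flat list of alternating values and counts by default. The sketch below pairs them up; the sample response body is made up for illustration, not real output.

```python
import json

# Hypothetical response body, shaped like Solr's JSON output (wt=json).
sample_body = """
{"response": {"numFound": 3, "start": 0, "docs": []},
 "facet_counts": {"facet_fields": {"inStock": ["true", 2, "false", 1]}}}
"""


def facet_counts(body, field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = json.loads(body)["facet_counts"]["facet_fields"][field]
    # Even positions hold the facet values, odd positions their counts.
    return dict(zip(flat[0::2], flat[1::2]))


counts = facet_counts(sample_body, "inStock")
print(counts)
```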

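What is QueryComponent in SOLR?

QueryComponent is responsible for submitting the query to Lucene and returning the list of Documents.

http://localhost:8080/solr/select?&q=iPod&start=0&rows=10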

Tuesday, April 20, 2010

SOLR Questions

How to enable highlighting in SOLR?

How to add a field to highlighting in SOLR?
How to highlight exact words in SOLR?
How to increase the number of highlighted characters in SOLR?
What is the maximum number of characters analyzed in SOLR?
How to stop a particular word in SOLR?
How to search case-sensitively in SOLR?
How to remove HTML entities in SOLR?
How to add synonyms in SOLR?
What is a facet in SOLR?
What is the facet syntax in SOLR?


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
http://wiki.apache.org/solr/SimpleFacetParameters#Facet_Fields
http://wiki.apache.org/solr/HighlightingParameters#hl
http://wiki.apache.org/solr/SolrConfigXml
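As a starting point for the highlighting questions above, here is a small Python sketch of a query that turns highlighting on and sets the parameters described on the HighlightingParameters page: hl.fl for the fields to highlight, hl.fragsize for the snippet size in characters, and hl.maxAnalyzedChars for how much of a field is analyzed. The host, the field names and the numeric values are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Assumed values; adjust the field names and sizes to your own schema.
params = {
    "q": "iPod",
    "hl": "true",                   # enable highlighting
    "hl.fl": "name,features",       # fields to highlight
    "hl.fragsize": 200,             # characters per highlighted snippet
    "hl.maxAnalyzedChars": 100000,  # how far into a field to look
}

# Assumption: the local example Solr instance.
highlight_url = "http://localhost:8983/solr/select?" + urlencode(params)
print(highlight_url)
```

The comma in hl.fl is percent-encoded by urlencode, which Solr decodes back into the comma-separated field list.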

Monday, April 19, 2010

Directory Structure for SOLR

A Solr home directory contains Solr's configuration and data for a running Solr instance. A directory counts as a Solr home directory when it contains either a solr.xml file or both a conf and a data directory, though strictly speaking these might not be the actual requirements.

The SOLR Folder and file Structure


bin: Suggested directory to place Solr replication scripts, if you have a more advanced setup.


conf: Configuration files. The two I mention below are very important, but it will also contain some other .txt and .xml files, which are referenced by these two files for different things such as special text analysis steps.

conf/schema.xml: This is the schema for the index including field type definitions with associated analyzer chains.

conf/solrconfig.xml: This is the primary Solr configuration file.

conf/xslt: This directory contains various XSLT files that can be used to transform Solr's XML query responses into formats such as Atom/RSS.

data: Contains the actual Lucene index data. It's binary data, so you won't be doing anything with it except perhaps deleting it occasionally.

lib: Optional placement of extra Java JAR files that Solr will load on startup, allowing you to externalize plugins from the Solr distribution (the WAR file) for convenience. If you extend Solr without modifying Solr itself, then those modifications can be deployed in a JAR file here.
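The rule of thumb above (a solr.xml file, or both a conf and a data directory) can be sketched as a small Python check. The function name looks_like_solr_home is made up for illustration, and, as noted above, this is only a heuristic and not what Solr itself verifies.

```python
import tempfile
from pathlib import Path


def looks_like_solr_home(path):
    """Heuristic only: solr.xml, or both conf/ and data/ directories."""
    root = Path(path)
    has_solr_xml = (root / "solr.xml").is_file()
    has_conf_and_data = (root / "conf").is_dir() and (root / "data").is_dir()
    return has_solr_xml or has_conf_and_data


# Try it on a throwaway directory rather than a real installation.
with tempfile.TemporaryDirectory() as tmp:
    before = looks_like_solr_home(tmp)   # empty directory
    (Path(tmp) / "conf").mkdir()
    (Path(tmp) / "data").mkdir()
    after = looks_like_solr_home(tmp)    # now has conf/ and data/

print(before, after)
```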

Difference between a database and SOLR?

Solr is like a single-table database without any support for relational queries (JOINs). The database may be in "third normal form", but the index will be completely de-normalized and contain mostly just the data needed to be searched.

* Substring Search versus Text Search

* Scored Results and Boosting

* Slow commits
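To make the de-normalization point concrete, here is a small Python sketch that flattens two related tables into self-contained documents of the kind one would index. The tables, field names and the helper to_solr_docs are all made up for illustration.

```python
# Hypothetical relational rows: products reference categories by id.
categories = {1: "Shoes", 2: "Electronics"}
products = [
    {"id": 10, "name": "Running shoe", "category_id": 1},
    {"id": 11, "name": "iPod", "category_id": 2},
]


def to_solr_docs(products, categories):
    """Flatten each product row into one self-contained document.

    The join happens here, at indexing time, because the index
    itself cannot do it at query time.
    """
    return [
        {
            "id": str(p["id"]),
            "name": p["name"],
            "category": categories[p["category_id"]],
        }
        for p in products
    ]


docs = to_solr_docs(products, categories)
print(docs)
```

Each resulting document carries the category name directly, so a search or facet on category needs no lookup in a second table.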

What are the special features in SOLR?

*HTTP request processing for indexing and querying documents.
*Several caches for faster query responses.

*A web-based administrative interface including:

  *Runtime performance statistics including cache

  *A query form to search the index.

  *A schema browser with histograms of popular terms along with some statistics.

  *Detailed breakdown of scoring mathematics and text analysis phases.

*Configuration files for the schema and the server itself (in XML).

*Solr adds to Lucene's text analysis library and makes it configurable through XML.

*Introduces the notion of a field type (this is important yet surprisingly not in Lucene). Types are present for dates and special sorting concerns.

*The disjunction-max query handler is more usable by end user queries and applications than Lucene's underlying raw queries.

*Faceting of query results.

*A spell check plugin used for making alternative query suggestions (that is, "did you mean ___")

*A more like this plugin to list documents that are similar to a chosen document.

*A distributed Solr server model with supporting scripts to support larger scale deployments.

What is special about SOLR?

In addition to the standard ability to return a list of search results for some query, Solr has numerous other features such as result highlighting, faceted navigation (for example, the ones found on most e-commerce sites), query spell correction, auto-suggest queries, and "more like this" for finding similar documents.

How does it work?

It is written in Java, and that language is used to further extend/modify Solr. However, being a server that communicates using standards such as HTTP and XML, knowledge of Java is very useful but not strictly a requirement.
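Because Solr talks HTTP and XML, a client in any language can feed it documents. As a rough sketch, the Python below builds the XML "add" message that Solr's update handler accepts. The field names and values are made up, and the helper build_add_xml is hypothetical. Sending it would mean an HTTP POST of this body to the /update URL of your Solr instance, followed by a commit; that step is left out so the example runs without a server.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict of fields.

    build_add_xml is a hypothetical helper, not part of Solr.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(add, encoding="unicode")


# Made-up document; the field names must exist in your schema.xml.
xml_body = build_add_xml({"id": "11", "name": "iPod"})
print(xml_body)
```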

Who uses SOLR?

CNet
Zappos
Netflix
and many intranet sites

What Is SOLR?

Solr is an open source enterprise search server. It is a mature product powering search for public sites such as CNet, Zappos, and Netflix, as well as many intranet sites.