{"title": "Text-Based Information Retrieval Using Exponentiated Gradient Descent", "book": "Advances in Neural Information Processing Systems", "page_first": 3, "page_last": 9, "abstract": null, "full_text": "Text-Based Information Retrieval Using Exponentiated Gradient Descent\n\nRon Papka, James P. Callan, and Andrew G. Barto *\n\nDepartment of Computer Science\n\nUniversity of Massachusetts\n\nAmherst, MA 01003\n\npapka@cs.umass.edu, callan@cs.umass.edu, barto@cs.umass.edu\n\nAbstract\n\nThis paper investigates the use of single-neuron learning algorithms to improve the performance of text-retrieval systems that accept natural-language queries. A retrieval process is explained that transforms the natural-language query into the query syntax of a real retrieval system: the initial query is expanded using statistical and learning techniques and is then used for document ranking and binary classification. The results of experiments suggest that Kivinen and Warmuth's Exponentiated Gradient Descent learning algorithm works significantly better than previous approaches.\n\n1 Introduction\nThis work explores two learning algorithms - Least Mean Squared (LMS) [1] and Exponentiated Gradient Descent (EG) [2] - in the context of text-based Information Retrieval (IR) systems. The experiments presented in [3] use connectionist learning models to improve the retrieval of relevant documents from a large collection of text. Here, we present further analysis of those experiments. Previous work in the area employs various techniques for improving retrieval [6, 7, 14]. The experiments presented here show that EG works significantly better than widely used ad hoc methods for finding a good set of query term weights.
\n\nThe retrieval processes being considered operate on a collection of documents, a natural-language query, and a training set of documents judged relevant or non-relevant to the query. The query may be, for example, the information request submitted through a web-search engine, or through the interface of a system with domain-specific information such as legal, governmental, or news data maintained as a collection of text. The query, expressed as complete or incomplete sentences, is modified through a learning process that incorporates the terms in the test collection that are important for improving retrieval performance. The resulting query can then be used against collections similar in domain to the training collection.\n\n* This material is based on work supported by the National Science Foundation, Library of Congress, and Department of Commerce under cooperative agreement number EEC-9209623. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect those of the sponsor.\n\nNatural language query:\nAn insider-trading case.\n\nIR system query using default weights:\n#WSUM( 1.0 An 1.0 insider 1.0 trading 1.0 case );\n\nAfter stop word and stemming process:\n#WSUM( 1.0 insid 1.0 trade 1.0 case );\n\nAfter Expansion and learning new weights:\n#WSUM( 0.181284 insid 0.045721 trade 0.016127 case 0.088143 boesk 0.000001 ivan 0.026762 sec 0.052081 guilt 0.074493 drexel 0.000001 plead 0.003834 fraud 0.091436 takeov 0.018636 lawyer 0.000000 crimin 0.137799 alleg 0.057393 attorney 0.155781 charg 0.024237 scandal 0.000000 burnham 0.000000 lambert 0.026270 investig 0.000000 wall 0.000000 firm 0.000000 illeg 0.000000 indict 0.000000 prosecutor 0.000000 profit 0.000000 );\n\nFigure 1: Query Transformation Process.\n\nThe query transformation process is illustrated in Figure 1. First, the natural-language query is transformed into one which can be used by the query-parsing mechanism of the IR system. The weights associated with each term are assigned a default value of 1.0, implying that each term is equally important in discriminating relevant documents. The query then undergoes a stopping and stemming process, in which morphological stemming and the elimination of very common words, called stopwords, increase both the effectiveness and efficiency of a system [9]. The query is subsequently expanded using a statistical term-expansion process producing terms from the training set of documents. Finally, a learning algorithm is invoked to produce new weights for the expanded query.\n\n2 Retrieval Process\nText-based information retrieval systems allow the user to pose a query to a collection or a stream of documents. When a query q is presented to a collection c, each document $d \\in c$ is examined and assigned a value relative to how well d satisfies the semantics of the request posed by q. For any instance of the triple $\\langle q, d, c \\rangle$, the system determines an evaluation value attributed to d using the function eval(q, d, c).\n\nThe evaluation function $eval(q, d, c) = \\frac{\\sum_i q_i d_i}{\\sum_i q_i}$ was used for this work, and is based on an implementation of INQUERY [8]. 
It is assumed that q and d are vectors of real numbers, and that c contains precomputed collection statistics in addition to the current set of documents. Since the collection may change over time, it may be necessary to change the query representation over time; however, in what follows the training collection is assumed to be static, and successful learning implies that the resulting query generalizes to similar collections.\n\nAn IR system can perform several kinds of retrieval tasks. This work is specifically concerned with two retrieval processes: document ranking and document classification. A ranking of documents based on query q is achieved by sorting all documents in a collection by evaluation value. Binary classification is achieved by determining a threshold $\\theta$ such that for class R, $eval(q, d, c) \\geq \\theta \\rightarrow d \\in R$, and $eval(q, d, c) < \\theta \\rightarrow d \\in \\bar{R}$, so that R is the set of documents from the collection that are classified as relevant to the query, and $\\bar{R}$ is the set classified as non-relevant.\n\nCentral to any IR system is a parsing process used for documents and queries, which produces tokens called terms. The terms derived from a document are used to build an inverted list structure which serves as an index to the collection. The natural-language query is also parsed into a set of terms. Research-based IR systems such as INQUERY, OKAPI [11], and SMART [5] assume that the co-occurrence of a term in a query and a document indicates that the document is relevant to the query to some degree, and that a query with multiple terms requires a mechanism by which to combine the evidence each co-occurrence contributes to the document's degree of relevance to the query. 
The document representation for such systems is a vector, each element of which is associated with a unique term in the document. The values in the vector are produced by a term-evaluation function composed of a term frequency component, tf, and an inverse document frequency component, idf, which are described in [8, 11]. The tf component causes the term-evaluation value to increase as a query-term's occurrence in the document increases, and the idf component causes the term-evaluation value to decrease as the number of documents in the collection in which the term occurs increases.\n\n3 Query Expansion\nThough it is possible to learn weights for terms in the original query, better results are obtained by first expanding the query with additional terms that can contribute to identifying relevant documents, and then learning the weights for the expanded query. The optimal number of terms by which to expand a query is domain-dependent, and query expansion can be performed using several techniques, including thesaurus expansion and statistical methods [12]. The query expansion process performed in this work is a two-step process: term selection followed by weight assignment. The term selection process ranks all terms found in relevant documents by an information metric described in [8]. The top n terms are used in the expanded query. The experiments in this work used values of 50 and 1000 for n. The most common technique for weight assignment is derived from a closed-form function originally presented by Rocchio in [6], but our experiments show that a single-neuron learning approach is more effective.\n\n3.1 Rocchio Weights\nWe assume that the terms of the original query are stored in a vector t, and that their associated weights are stored in q. 
Assuming that the new terms in the expanded query are stored in t', the weights for q' can be determined using a method originally developed by Rocchio that has been improved upon in [7, 8]. Using the notation presented above, the weight assignment can be represented in the linear form: $q' = \\alpha \\cdot f(t) + \\beta \\cdot r(t', R_q, c) + \\gamma \\cdot nr(t', \\bar{R}_q, c)$, where f is a function operating on the terms in the original query, r is a function operating on the term statistics available from the training set of relevant documents ($R_q$), and nr is a function operating on the statistics from the non-relevant documents ($\\bar{R}_q$). The values for $\\alpha$, $\\beta$, and $\\gamma$ have been the focus of many IR experiments, and 1.0, 2.0, and 0.5 have been found to work well with various implementations of the functions f, r, and nr [7].\n\n3.2 LMS and EG\nIn the experiments that follow, LMS and EG were used to learn query term weights. Both algorithms were used in a training process attempting to learn the association between the set of training instances (documents) and their corresponding binary classifications (relevant or non-relevant). A set of weights w is updated given an input instance x and a target binary classification value y. The algorithms learn the association between x and y perfectly if $w \\cdot x = y$; otherwise the value $(y - w \\cdot x)$ is the error or loss incurred. The task of the learning algorithm is to learn the values of w for more than one instance of x.\n\nThe update rule for LMS is $w_{t+1} = w_t + r_t$, where $r_t = -2 \\eta_t (w_t \\cdot x_t - y_t) x_t$ and the step size is $\\eta_t = \\frac{1}{x_t \\cdot x_t}$. The update rule for EG is $w_{t+1,i} = \\frac{w_{t,i} e^{r_{t,i}}}{\\sum_{j=1}^{N} w_{t,j} e^{r_{t,j}}}$, where $r_{t,i} = -2 \\eta_t (w_t \\cdot x_t - y_t) x_{t,i}$ and $\\eta_t = \\frac{2}{3 (\\max_i(x_{t,i}) - \\min_i(x_{t,i}))^2}$.\n\nThere are several fundamental differences between LMS and EG; the most salient is that EG has a multiplicative exponential update rule, while LMS is additive. A less obvious difference is the derivation of these two update rules. Kivinen and Warmuth [2] show that both rules are approximately derivable from an optimization task that minimizes the linear combination of a distance and a loss function: $distance(w_{t+1}, w_t) + \\eta_t \\, loss(y_t, w_t \\cdot x_t)$. But the distance component for the derivation leading to the LMS update rule uses the squared Euclidean distance $\\|w_{t+1} - w_t\\|_2^2$, while the derivation leading to the EG update rule uses relative entropy, $\\sum_{i=1}^{N} w_{t+1,i} \\ln \\frac{w_{t+1,i}}{w_{t,i}}$. Entropy metrics had previously been used as the loss component [4].\n\nOne purpose of Kivinen and Warmuth's work was to describe loss bounds for these algorithms; however, they also observed that EG suffers significantly less from irrelevant attributes than does LMS. This hypothesis was tested in the experiments conducted for this work.\n\n4 Experiments\nExperiments were conducted on 100 natural-language queries. The queries were manually transformed into INQUERY syntax, expanded using a statistical technique described in [8], and then given a weight assignment as a result of a learning process. One set of experiments expanded each query by 50 terms and another set of experiments expanded each query by 1000 terms. The purpose of the latter was to test the ability of each algorithm to learn in the presence of many irrelevant attributes.\n\n4.1 Data\nThe queries used are the description fields of information requests developed for Text Retrieval Conferences (TREC) [10]. 
The first set of queries was taken from TREC topics 51-100 and the second set from topics 101-150, for a total of 100 queries. After stopping and stemming, the average number of terms remaining before expansion was 8.34.\n\nTraining and testing for all queries were conducted on subsets of the Tipster collection, which currently contains 3.4 gigabytes of text, including 206,201 documents whose relevance to the TREC topics has been evaluated. The collection is partitioned into 3 volumes. The judged documents from volumes 1 and 2 were used for training, while the documents from volume 3 were used for testing. Volumes 1 and 2 contain 741,856 documents from the Associated Press (1988-9), Department of Energy abstracts, Federal Register (1988-9), Wall Street Journal (1987-91), and Ziff-Davis Computer-select articles. Volume 3 contains 336,310 documents from Associated Press (1990), San Jose Mercury News (1991), and Ziff-Davis articles.\n\nOnly a subset of the data for the TREC-Tipster environment has been judged. Binary judgments are assessed by humans for the top few thousand documents that were retrieved for each query by participating systems from various commercial and research institutions. Based on the judged documents available for volumes 1 and 2, on average 280 relevant documents and 1236 non-relevant documents were used to train each query.\n\n4.2 Training Parameters\nRocchio weights were assigned based on coefficients described in Section 3.1. LMS and EG update rules were applied using 100,000 random presentations of training instances. 
It was empirically determined that this number of presentations was sufficient to allow both learning algorithms to produce better query weights than the Rocchio assignment based on performance metrics calculated using the training instances.\n\nIn reality, of course, the number of documents that will be relevant to a particular query is much smaller than the number of documents that are non-relevant. This property gives rise to the question of what is an appropriate sampling bias of training instances, considering that the ratio of relevant to non-relevant documents approaches 0 in the limit. In the following experiments, LMS benefitted from uniform random sampling from the set of training instances, while EG benefitted from a balanced sampling, that is, uniform random sampling from relevant training instances on even iterations and from non-relevant instances on odd iterations.\n\nA pocketing technique was applied to the learning algorithms [13]. The purpose of this technique is to find a set of weights that optimizes a specific user's utility function. In the following experiments, weights were tested every 1000 iterations using a recall and precision performance metric. If a set of weights produced a new performance-metric maximum, it was saved. The last set saved was assumed to be the result of the algorithm, and was used for testing.\n\nA binary classification value pair (A, B) is supplied as the target for training, where A is the classification value for relevant documents, and B is the classification value for non-relevant documents. 
Using the standard classification value pair (1, 0), INQUERY's document representation inhibits learning due to the large error caused by these unattainable values. Testing showed that .4 was the lowest attainable evaluation value for a document, and that .47 appeared to be a good classification value for relevant documents. The classification value pair used for both the LMS and EG algorithms was thus (.47, .40).\n\n4.3 Evaluation\nIn the experiments that follow, R-Precision (RP) was used to evaluate ranking performance, and a new metric, Lower Bound Accuracy (LBA), was used to evaluate classification. Both metrics make use of recall and precision, which are defined as follows: Assume there exists a set of documents sorted by evaluation value and a process that has performed classification, and that a = number of relevant documents classified as relevant, b = number of non-relevant documents classified as relevant, c = number of relevant documents classified as non-relevant, and d = number of non-relevant documents classified as non-relevant; then, $Recall = \\frac{a}{a+c}$ and $Precision = \\frac{a}{a+b}$ [3].\n\nPrecision and recall can be calculated at any cut-off point in the sorted list of documents. R-Precision is calculated using the top n documents, where n is the number of relevant training documents available for a query.\n\nLower Bound Accuracy (LBA) is a metric that takes the minimum of a classifier's accuracy with respect to relevant documents and its accuracy with respect to non-relevant documents. 
It is defined as $LBA = \\min\\left(\\frac{a}{a+c}, \\frac{d}{b+d}\\right)$. An LBA value can be interpreted as the lower bound of the percent of instances a classifier will correctly classify, regardless of an imbalance between the actual number of relevant and non-relevant documents. This metric requires a threshold $\\theta$. The threshold is taken to be the evaluation value of the document at a cut-off point in the sorted list of training documents where LBA is maximized. Hence, $\\theta = \\max_i(LBA(d_i, R_q, \\bar{R}_q))$, where $d_i$ is the ith document in the sorted list.\n\n4.4 Results\n\nQuery type   RP     LBA\nNL           22.0   88.6\nEXP          28.7   92.0\nROC          33.4   94.0\nLMS          32.5   89.8\nEG           40.3   95.1\n\nTable 1: Query expansion by 50 terms\n\nThe following results show the ability of a query weight assignment to generalize. The weights are derived from a subset of the training collection, and the values reported are based on performance on the test collection. The results of the 50-term-expansion experiments are listed in Table 1 (footnote 1). They indicate that the expanded query has an advantage over the original query, and that the EG-trained query generalized better than the other algorithms, while Rocchio appears to be the next best. In terms of ranking, EG gives rise to a 20% improvement over the Rocchio assignment, and realizes a 1.2% improvement in terms of classification. This apparently slight improvement in classification in fact implies that EG is correctly classifying at least 3000 documents more than the other approaches.\n\nTable 2 shows a cross-algorithm analysis in which any two algorithms can be compared. The analysis is calculated using both RP and LBA over all queries. 
An entry for row i, column j indicates the number of queries for which the performance of algorithm i was better than algorithm j. Based on sign tests with $\\alpha = .01$, the results confirm that EG generalized significantly better than the other algorithms (footnote 2).\n\nNL \n-\n\nQuery type \nNL \nEXP \nROC \nLMS \nEG \n\nEG \n\n12 - 13 \n11  - 19 \n17 - 37 \n13 - 15 \n\nROC \n\nQuery counts:  RP-LBA \nEXP \nLMS \n30 -37  18 - 13  24 - 53 \n35 - 66 \n53 - 73 \n\n72  -79 \n\n-\n\n9 - 17 \n\n60 - 62 \n71- 86 \n66 - 46  54 -34  38 - 26 \n70 - 62 \n79 - 85 \n\n80 -80 \n\n-\n\n-\n\n74 - 84 \n\n-\n\nTable 2: Cross Algorithm Analysis over 100 queries expanded by 50 terms.\n\nAs explained in Section 4.3, the thresholds used to calculate the LBA performance metric are determined by obtaining an evaluation value in the training data corresponding to the cut-off point where LBA was maximized. The threshold analysis in Table 3 shows the best attainable classification performance against performance actually achieved. The results indicate that there is still room for improvement; however, they also indicate that this methodology is acceptable.\n\nThe results for queries expanded by 1000 terms are listed in Table 4. Since the average document length in the Tipster collection is 806 terms (non-unique), at least 20% of the terms in the expanded query are generally irrelevant to a particular document. The results indicate that irrelevant attributes prevent all but EG from generalizing well. Comparing the performance of EG and LMS adds evidence to the Kivinen-Warmuth hypothesis that EG yields a smaller loss than LMS, given many irrelevant attributes. 
Juxtaposing the results of the 50-term and 1000-term-expansion experiments suggests that using a statistical filter for selecting the top few terms is better than expanding the query by many terms and having the learning algorithm perform term selection.\n\n5 Conclusion\nThe experiment results presented here provide evidence that single-neuron learning algorithms can be used to improve retrieval performance in IR systems. Based on performance metrics that test the quality of a classification process and a document ranking process, the weights produced by EG were consistently better than those produced by previously available methods.\n\n1 R-Precision (RP) and Lower Bound Accuracy (LBA) performance values are normalized to a 0-100 scale. Values are reported for: NL = original natural language query; EXP = expanded query with weights set to 1.0; ROC = expanded query with weights based on Rocchio assignment; LMS = expanded query with weights based on LMS learning; and EG = expanded query with weights based on EG learning.\n\n2 Recent experiments using the optimization algorithm DFO (presented in [7]) suggest that certain parameter settings make it competitive with EG.\n\nQuery type   Potential LBA   Actual LBA\nNL           91.9            88.6\nEXP          95.5            92.0\nROC          96.7            94.0\nLMS          92.6            89.8\nEG           97.1            95.1\n\nTable 3: Threshold Analysis: Query expansion by 50 terms.\n\nQuery type   RP     LBA\nNL           22.0   88.6\nEXP          14.4   76.5\nROC          19.7   82.5\nLMS          20.4   86.7\nEG           35.0   93.2\n\nTable 4: Query expansion by 1000 terms.\n\nReferences\n\n[1] B. Widrow and M. Hoff, \"Adaptive switching circuits\", In 1960 IRE WESCON Convention Record, pp. 96-104, New York, 1960.\n\n[2] J. 
Kivinen and M. Warmuth, \"Exponentiated Gradient Versus Gradient Descent for Linear Predictors\", UCSC Tech report: UCSC-CRL-94-16, June 21, 1994.\n\n[3] D. Lewis, R. Schapire, J. Callan, and R. Papka, \"Training Algorithms for Linear Text Classifiers\", Proceedings of SIGIR 1996.\n\n[4] B.S. Wittner and J.S. Denker, \"Strategies for Teaching Layered Networks Classification Tasks\", NIPS proceedings, 1987.\n\n[5] G. Salton, \"Relevance Feedback and Optimization of Retrieval Effectiveness\", in The Smart System - Experiments in Automatic Document Processing, pp. 324-336. Englewood Cliffs, NJ: Prentice Hall Inc., 1971.\n\n[6] J.J. Rocchio, \"Relevance Feedback in Information Retrieval\", in The Smart System - Experiments in Automatic Document Processing, pp. 313-323. Englewood Cliffs, NJ: Prentice Hall Inc., 1971.\n\n[7] C. Buckley and G. Salton, \"Optimization of Relevance Feedback Weights\", Proceedings of SIGIR 95, Seattle WA, 1995.\n\n[8] J. Allan, L. Ballesteros, J. Callan, W.B. Croft, and Z. Lu, \"Recent Experiments with Inquery\", TREC-4 Proceedings, 1995.\n\n[9] M. Porter, \"An Algorithm for Suffix Stripping\", Program, Vol 14(3), pp. 130-137, 1980.\n\n[10] D. Harman, Proceedings of Text REtrieval Conferences (TREC), 1993-5.\n\n[11] S.E. Robertson, W. Walker, S. Jones, M.M. Hancock-Beaulieu, and M. Gatford, \"Okapi at TREC-3\", TREC-3 Proceedings, 1994.\n\n[12] G. Salton, Automatic Text Processing, Addison-Wesley Publishing Co, Massachusetts, 1989.\n\n[13] S.I. Gallant, \"Optimal Linear Discriminants\", Proceedings of International Conference on Pattern Recognition, 1986.\n\n[14] B.T. Bartell, \"Optimizing Ranking Functions: A Connectionist Approach to Adaptive Information Retrieval\", Ph.D. Thesis, UCSD 1994.
", "award": [], "sourceid": 1186, "authors": [{"given_name": "Ron", "family_name": "Papka", "institution": null}, {"given_name": "James", "family_name": "Callan", "institution": null}, {"given_name": "Andrew", "family_name": "Barto", "institution": null}]}