{"title": "Modeling Conversational Dynamics as a Mixed-Memory Markov Process", "book": "Advances in Neural Information Processing Systems", "page_first": 281, "page_last": 288, "abstract": null, "full_text": "Modeling  Conversational Dynamics  as  a \n\nMixed-Memory  Markov Process \n\nTanzeem Choudhury \n\nIntel Research \n\ntanzeem.choudhury@intel.com \n\nSumit Basu \n\nMicrosoft Research \n\nsumitb@microsoft.com \n\nAbstract \n\ninfluences \n\nIn  this  work,  we  quantitatively  investigate  the  ways  in  which  a \ngiven  person \nthe  joint  turn-taking  behavior  in  a \nconversation.  After  collecting  an  auditory  database  of  social \ninteractions  among  a  group  of twenty-three  people  via  wearable \nsensors  (66  hours  of data  each  over  two  weeks),  we  apply  speech \nand conversation detection methods to  the auditory streams.  These \nmethods  automatically  locate  the  conversations,  determine  their \nparticipants,  and  mark  which  participant  was  speaking  when.  We \nthen  model  the  joint  turn-taking  behavior  as  a  Mixed-Memory \nMarkov  Model  [1]  that  combines  the  statistics  of the  individual \nsubjects'  self-transitions  and  the  partners '  cross-transitions.  The \nmixture parameters in  this  model describe how much each person's \nindividual  behavior contributes  to  the joint turn-taking behavior of \nthe  pair.  By  estimating  these  parameters,  we  thus  estimate  how \nmuch  influence  each participant  has  in  determining  the  joint turn(cid:173)\nthis  measure  correlates \ntaking  behavior.  We \nsignificantly  with  betweenness  centrality  [2],  an \nindependent \nmeasure  of an  individual's  importance  in  a  social  network.  This \nresult  suggests  that  our  estimate  of  conversational  influence  is \npredictive of social influence. 
\n\nshow  how \n\n1 \n\nIntroduction \n\nPeople's  relationships  are  largely  determined  by  their  social  interactions,  and  the \nnature  of their conversations plays a  large  part in defining those  interactions.  There \nis  a  long  history  of  work  in  the  social  sciences  aimed  at  understanding  the \ninteractions  between  individuals  and  the  influences  they  have  on  each  others' \nbehavior.  However,  existing studies of social  network interactions have either been \nrestricted  to  online  communities,  where  unambiguous  measurements  about  how \npeople  interact  can  be  obtained,  or  have  been  forced  to  rely  on  questionnaires  or \ndiaries  to  get  data  on  face-to-face  interactions.  Survey-based  methods  are  error \nprone  and  impractical  to  scale  up.  Studies  show  that  self-reports  correspond poorly \nto  communication behavior as  recorded by independent observers [3]. \n\n\fIn  contrast,  we  have  used  wearable  sensors  and  recent  advances  in  speech \nprocessing  techniques  to  automatically  gather  information  about  conversations: \nwhen they occurred, who  was  involved, and  who  was  speaking when.  Our goal  was \nthen to  see if we  could examine the  influence a given  speaker had on the  turn-taking \nbehavior of her conversational  partners.  Specifically, we  wanted to  see if we  could \nbetter explain  the  turn-taking  transitions  observed  in  a  given  conversation  between \nsubjects  i  and} by combining  the  transitions  typical  to  i  and  those  typical  toj.  We \ncould then interpret the  contribution from  i  as  her influence  on the joint turn-taking \nbehavior. \n\nIn this paper, we  first  describe how we  extract speech and conversation information \nfrom  the  raw  sensor data, and how  we  can use  this  to  estimate the  underlying  social \nnetwork.  We  then  detail  how  we  use  a  Mixed-Memory Markov Model  to  combine \nthe  individuals '  statistics.  
Finally, we show the performance of our method on our collected data and how well it correlates with other metrics of social influence. \n\n2 Sensing and Modeling Face-to-face Communication Networks \n\nAlthough people rely heavily on email, telephone, and other virtual means of communication, high-complexity information is primarily exchanged through face-to-face interaction [4]. Prior work on sensing face-to-face networks has been based on proximity measures [5],[6], a weak approximation of the actual communication network. Our focus is to model the network based on conversations that take place within a community. To do this, we need to gather data from real-world interactions. \n\nWe thus used an experiment conducted at MIT [7] in which 23 people agreed to wear the sociometer, a wearable data acquisition board [7],[8]. The device stored audio information from a single microphone at 8 kHz. During the experiment the users wore the device both indoors and outdoors for six hours a day for 11 days. The participants were a mix of students, faculty, and administrative support staff who were distributed across different floors of a laboratory building and across different research groups. \n\n3 Speech and Conversation Detection \n\nGiven the set of auditory streams of each subject, we now have the problem of detecting who is speaking when and to whom they are speaking. We break this problem into two parts: voicing/speech detection and conversation detection. \n\n3.1 Voicing and Speech Detection \n\nTo detect the speech, we use the linked-HMM model for voicing and speech detection presented in [9]. 
This structure models the speech as two layers (see Figure 1); the lower-level hidden state represents whether the current frame of audio is voiced or unvoiced (i.e., whether the audio in the frame has a harmonic structure, as in a vowel), while the second level represents whether we are in a speech or non-speech segment. The principle behind the model is that while there are many voiced sounds in our environment (car horns, tones, computer sounds, etc.), the dynamics of voiced/unvoiced transitions provide a unique signature for human speech; the higher level is able to capture these dynamics since the lower level's transitions are dependent on this variable. \n\nFigure 1: Graphical model for the voicing and speech detector, with a speech layer (S[t] = {0,1}), a voicing layer (V[t] = {0,1}), and an observation layer (3 features). \n\nTo apply this model to data, the 8 kHz audio is split into 256-sample frames (32 milliseconds) with a 128-sample overlap. Three features are then computed: the non-initial maximum of the noisy autocorrelation, the number of autocorrelation peaks, and the spectral entropy. The features were modeled as a Gaussian with diagonal covariance. The model was then trained on 8000 frames of fully labeled data. We chose this model because of its robustness to noise and distance from the microphone: even at 20 feet away more than 90% of voiced frames were detected with negligible false alarms (see [9]). \n\nThe results from this model are the binary sequences v[t] and s[t] signifying whether the frame is voiced and whether it is in a speech segment for all frames of the audio. 
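As a rough illustration of this front end, the framing and the three per-frame features can be sketched as follows (a minimal sketch in our own code, not the authors' implementation; the feature definitions are our reading of the text above):

```python
import numpy as np

def frame_features(frame):
    # Three features per frame, as described above: non-initial
    # autocorrelation maximum, number of autocorrelation peaks,
    # and spectral entropy.  (Illustrative sketch only.)
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)            # normalize so ac[0] = 1
    noninit_max = float(np.max(ac[1:]))  # skip the trivial lag-0 peak
    n_peaks = int(np.sum((ac[1:-1] > ac[:-2]) & (ac[1:-1] > ac[2:])))
    psd = np.abs(np.fft.rfft(frame)) ** 2
    psd = psd / (np.sum(psd) + 1e-12)    # normalized power spectrum
    entropy = float(-np.sum(psd * np.log2(psd + 1e-12)))
    return noninit_max, n_peaks, entropy

def frames(audio, size=256, hop=128):
    # Split the 8 kHz audio into 256-sample frames with 128-sample overlap.
    return [audio[i:i + size] for i in range(0, len(audio) - size + 1, hop)]
```

A harmonic (vowel-like) frame yields a high non-initial autocorrelation maximum and low spectral entropy relative to noise, which is the separation the Gaussian observation model exploits.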
\n\n3.2 Conversation Detection \n\nOnce the voicing and speech segments are identified, we are still left with the problem of determining who was talking with whom and when. To approach this, we use the method of conversation detection described in [10]. The basic idea is simple: since the speech detection method described above is robust to distance, the voicing segments v[t] of all the participants in the conversation will be picked up by the detector in all of the streams (this is referred to as a \"mixed stream\" in [10]). We can then examine the mutual information of the binary voicing estimates between each person as a matching measure. Since both voicing streams will be nearly identical, the mutual information should peak when the two participants are either involved in a conversation or are overhearing a conversation from a nearby group. However, we have the added complication that the streams are only roughly aligned in time. Thus, we also need to consider a range of time shifts between the streams. We can express the alignment measure a[k] for an offset of k between the two voicing streams as follows: \n\na[k] = I(v_1[t]; v_2[t-k]) = sum_{i,j} p(v_1[t]=i, v_2[t-k]=j) log [ p(v_1[t]=i, v_2[t-k]=j) / ( p(v_1[t]=i) p(v_2[t-k]=j) ) ] \n\nwhere i and j take on values {0,1} for unvoiced and voiced states respectively. The distributions for p(v_1, v_2) and their marginals are estimated over a window of one minute (T=3750 frames). To see how well this measure performs, we examine an example pair of subjects who had one five-minute conversation over the course of half an hour. 
The streams are correctly aligned at k=0, and by examining the value of a[k] over a large range we can investigate its utility for conversation detection and for aligning the auditory streams (see Figure 2). \n\nThe peaks are both strong and unique to the correct alignment (k=0), implying that this is indeed a good measure for detecting conversations and aligning the audio in our setup. By choosing the optimal threshold via the ROC curve, we can achieve 100% detection with no false alarms using time windows T of one minute. \n\nFigure 2: Values of a[k] over ranges: 1.6 seconds, 2.5 minutes, and 11 minutes. \n\nFor each minute of data in each speaker's stream, we computed a[k] for k ranging over +/- 30 seconds with T=3750 for each of the other 22 subjects in the study. While we can now be confident that this will detect most of the conversations between the subjects, since the speech segments from all the participants are being picked up by all of their microphones (and those of others within earshot), there is still the problem of determining who is speaking when. Fortunately, this is fairly straightforward. Since the microphones for each subject are pre-calibrated to have approximately equal energy response, we can classify each voicing segment among the speakers by integrating the audio energy over the segment and choosing the argmax over subjects. Since it is still possible that the resulting subject does not correspond to the actual speaker (she could simply be the one nearest to a non-subject who is speaking), we determine an overall threshold below which the assignment to the speaker is rejected. Both of these methods are further detailed in [10]. 
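The alignment measure a[k] can be computed directly from empirical counts, roughly as follows (our own minimal sketch; the function and variable names are ours, not the authors' code):

```python
import math

def alignment(v1, v2, k):
    # Mutual information a[k] = I(v1[t]; v2[t-k]) between two binary
    # voicing streams at offset k, from empirical counts over the window.
    pairs = [(v1[t], v2[t - k]) for t in range(len(v1))
             if 0 <= t - k < len(v2)]
    n = len(pairs)
    # Joint and marginal probabilities over {0,1} x {0,1}.
    p = {(i, j): pairs.count((i, j)) / n for i in (0, 1) for j in (0, 1)}
    p1 = {i: p[(i, 0)] + p[(i, 1)] for i in (0, 1)}  # marginal of v1
    p2 = {j: p[(0, j)] + p[(1, j)] for j in (0, 1)}  # marginal of v2
    return sum(p[(i, j)] * math.log(p[(i, j)] / (p1[i] * p2[j]))
               for i in (0, 1) for j in (0, 1) if p[(i, j)] > 0)
```

Scanning k over the +/- 30 second range and thresholding the peak value reproduces the detection-and-alignment scheme described above: for matching streams the score peaks sharply at the true offset.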
\n\nFor this work, we rejected all conversations with more than two participants or those that were simply overheard by the subjects. Finally, we tested the overall performance of our method by comparing with a hand-labeling of conversation occurrence and length from four subjects over 2 days (48 hours of data) and found an 87% agreement with the hand labeling. Note that the actual performance may have been better than this, as the labelers did miss some conversations. \n\n3.3 The Turn-Taking Signal S_i \n\nFinally, given the location of the conversations and who is speaking when, we can create a new signal for each subject i, S_i, defined over five-second blocks, which is 1 when the subject is holding the turn and 0 otherwise. We define the holder of the turn as whoever has produced more speech during the five-second block. Thus, within a given conversation between subjects i and j, the turn-taking signals are complements of each other, i.e., S^i_t = ¬S^j_t. \n\n4 Estimating the Social Network Structure \n\nOnce we have detected the pairwise conversations we can identify the communication that occurs within the community and map the links between individuals. The link structure is calculated from the total number of conversations each subject has with others: interactions with another person that account for less than 5% of the subject's total interactions are removed from the graph. To get an intuitive picture of the interaction pattern within the group, we visualize the network diagram by performing multi-dimensional scaling (MDS) on the geodesic distances (number of hops) between the people (Figure 3). The nodes are colored according to the physical closeness of the subjects' office locations. 
From this we see that people whose offices are in the same general space seem to be close in the communication space as well. \n\nFigure 3: Estimated network of subjects. \n\n5 Modeling the Influence of Turn-taking Behavior in Conversations \n\nWhen we talk to other people we are influenced by their style of interaction. Sometimes this influence is strong and sometimes insignificant; we are interested in finding a way to quantify this effect. We probably all know people who have a strong effect on our natural interaction style when we talk to them, causing us to change our style as a result. For example, consider someone who never seems to stop talking once it is her turn. She may end up imposing her style on us, and we may consequently end up not having enough of a chance to talk, whereas in most other circumstances we tend to be an active and equal participant. \n\nIn our case, we can model this effect via the signals we have already gathered. Let us consider the influence subject j has on subject i. We can compute i's average self-transition table, p(S^i_t | S^i_{t-1}), via simple counts over all conversations for subject i (excluding those with j). Similarly, we can compute j's average cross-transition table, p(S^k_t | S^j_{t-1}), over all subjects k (excluding i) with which j had conversations. The question now is, for a given conversation between i and j, how much does j's average cross-transition help explain p(S^i_t | S^i_{t-1}, S^j_{t-1})? \n\nWe can formalize this contribution via the Mixed-Memory Markov Model of Saul and Jordan [1]. The basic idea of this model was to approximate a high-dimensional conditional probability table of one variable conditioned on many others as a convex combination of the pairwise conditional tables. 
For a general set of N interacting Markov chains in the form of a Coupled Markov Model [11], we can write this approximation as: \n\np(S^i_t | S^1_{t-1}, ..., S^N_{t-1}) = sum_j α_ij p(S^i_t | S^j_{t-1}) \n\nFor our case of a two-chain (two-person) model the transition probabilities will be the following: \n\np(S^1_t | S^1_{t-1}, S^2_{t-1}) = α_11 p(S^1_t | S^1_{t-1}) + α_12 p(S^k_t | S^2_{t-1}) \n\np(S^2_t | S^1_{t-1}, S^2_{t-1}) = α_21 p(S^k_t | S^1_{t-1}) + α_22 p(S^2_t | S^2_{t-1}) \n\nThis is very similar to the original Mixed-Memory Model, though the transition tables are estimated over all other subjects k excluding the partner, as described above. Also, since the α_ij sum to one over j, in this case α_11 = 1 - α_12. We thus have a single parameter, α_12, which describes the contribution of p(S^k_t | S^2_{t-1}) to explaining p(S^1_t | S^1_{t-1}, S^2_{t-1}), i.e., the contribution of subject 2's average turn-taking behavior on her interactions with subject 1. \n\n5.1 Learning the influence parameters \n\nTo find the α_ij values, we would like to maximize the likelihood of the data. Since we have already estimated the relevant conditional probability tables, we can do this via constrained gradient ascent, where we ensure that α_ij > 0 [12]. Let us first examine how the likelihood function simplifies for the Mixed-Memory model: \n\nL = prod_t p(S^i_t | S^1_{t-1}, ..., S^N_{t-1}) = prod_t sum_j α_ij p(S^i_t | S^j_{t-1}) \n\nConverting this expression to log likelihood and removing terms that are not relevant to maximization over α_ij yields: \n\nl_i = sum_t log [ sum_j α_ij p(S^i_t | S^j_{t-1}) ] \n\nNow we reparametrize for the normality constraint with β_ij = α_ij (for j < N) and β_iN = 1 - sum_{j<N} β_ij, remove the terms not relevant to chain i, and take the derivatives: \n\ndl_i/dβ_ij = sum_t [ p(S^i_t | S^j_{t-1}) - p(S^i_t | S^N_{t-1}) ] / [ sum_{k<N} β_ik p(S^i_t | S^k_{t-1}) + (1 - sum_{k<N} β_ik) p(S^i_t | S^N_{t-1}) ] \n\nWe can show that the log likelihood is concave in the α_ij, so we are guaranteed to achieve the global maximum by climbing the gradient. 
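In the two-chain case this reduces to a one-parameter problem, which can be sketched as a simple clipped gradient ascent (our own sketch; p_self and p_cross stand for the per-step table lookups p(S^1_t | S^1_{t-1}) and p(S^k_t | S^2_{t-1}) evaluated on an observed conversation, and these names are ours):

```python
def learn_alpha(p_self, p_cross, steps=2000, lr=0.01):
    # Gradient ascent on sum_t log(alpha*p_cross[t] + (1-alpha)*p_self[t]),
    # with alpha clipped to [0, 1].  The objective is concave in alpha,
    # so climbing the gradient reaches the global maximum.
    alpha = 0.5
    for _ in range(steps):
        grad = sum((pc - ps) / (alpha * pc + (1 - alpha) * ps)
                   for ps, pc in zip(p_self, p_cross))
        alpha = min(1.0, max(0.0, alpha + lr * grad / len(p_self)))
    return alpha
```

When the partner's average cross-transition table explains every observed transition better than the subject's own self-transitions, alpha is driven toward 1; when it explains nothing extra, alpha is driven toward 0.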
More details of this formulation are given in [12],[7]. \n\n5.2 Aggregate Influence over Multiple Conversations \n\nIn order to evaluate whether this model provides additional benefit over using a given subject's self-transition statistics alone, we estimated the reduction in KL divergence from using the mixture of interactions vs. using the self-transition model. We found that by using the mixture model we were able to reduce the KL divergence between a subject's average self-transition statistics and the observed transitions by 32% on average. However, in the mixture model we have added extra degrees of freedom, and hence tested whether the better fit was statistically significant by using the F-test. The resulting p-value was less than 0.01, implying that the mixture model is a significantly better fit to the data. \n\nIn order to find a single influence parameter for each person, we took a subset of 80 conversations and aggregated all the pairwise influences each subject had on all her conversational partners. In order to compute this aggregate value, there is an additional aspect of α_ij we need to consider. If the subject's self-transition matrix and the complement of the partner's cross-transition matrix are very similar, the influence scores are indeterminate, since for a given interaction S^i_t = ¬S^k_t; i.e., we would essentially be trying to find the best way to linearly combine two identical transition matrices. 
We thus weight the contribution to the aggregate influence estimate for each individual A_i by the relevant J-divergence (symmetrized KL divergence) for each conversational partner: \n\nA_i = sum_{k in partners} J( p(S^k_t | ¬S^k_{t-1}) || p(S^k_t | S^k_{t-1}) ) α_ki \n\nThe upper panel of Figure 4 shows the aggregated influence values for the subset of subjects contained in the set of eighty conversations analyzed. \n\n6 Link between Conversational Dynamics and Social Role \n\nBetweenness centrality is a measure frequently used in social network analysis to characterize importance in the social network. For a given person i, it is defined as being proportional to the number of pairs of people (j,k) for which that person lies along the shortest path in the network between j and k. It is thus used to estimate how much control an individual has over the interaction of others, since it is a count of how often she is a \"gateway\" between others. People with high betweenness are often perceived as leaders [2]. \n\nWe computed the betweenness centrality for the subjects from the 80 conversations using the network structure we estimated in Section 4. We then discovered an interesting and statistically significant correlation between a person's aggregate influence score and her betweenness centrality: it appears that a person's interaction style is indicative of her role within the community based on the centrality measure. Figure 4 shows the weighted influence values along with the centrality scores. Note that ID 8 (the experiment coordinator) is somewhat of an outlier; a plausible explanation for this is that during the data collection ID 8 went and talked to many of the subjects, which is not her usual behavior. 
This resulted in her having artificially high centrality (based on link structure) but not high influence based on her interaction style. \n\nWe computed the statistical correlation between the influence values and the centrality scores, both including and excluding the outlier subject ID 8. The correlation excluding ID 8 was 0.90 (p-value < 0.0004, rank correlation 0.92) and including ID 8 it was 0.48 (p-value < 0.07, rank correlation 0.65). The two measures, namely influence and centrality, are highly correlated, and this correlation is statistically significant when we exclude ID 8, who was the coordinator of the project and whose centrality is likely to be artificially large. \n\n7 Conclusion \n\nWe have developed a model for quantitatively representing the influence of a given person j's turn-taking behavior on the joint turn-taking behavior with person i. On real-world data gathered from wearable sensors, we have estimated the relevant component statistics about turn-taking behavior via robust speech processing techniques, and have shown how we can use the Mixed-Memory Markov formalism to estimate the behavioral influence. Finally, we have shown a strong correlation between a person's aggregate influence value and her betweenness centrality score. This implies that our estimate of conversational influence may be indicative of importance within the social network. \n\nFigure 4: Aggregate influence values and corresponding centrality scores. \n\n8 References \n\n[1] Saul, L.K. and M. Jordan. \"Mixed Memory Markov Models.\" Machine Learning, 1999. 37: pp. 75-85. 
\n\n[2] Freeman, L.C., \"A Set of Measures of Centrality Based on Betweenness.\" Sociometry, 1977. 40: pp. 35-41. \n\n[3] Bernard, H.R., et al., \"The Problem of Informant Accuracy: the Validity of Retrospective Data.\" Annual Review of Anthropology, 1984. 13: pp. 495-517. \n\n[4] Allen, T., Architecture and Communication Among Product Development Engineers. 1997, Sloan School of Management, MIT: Cambridge. pp. 1-35. \n\n[5] Want, R., et al., \"The Active Badge Location System.\" ACM Transactions on Information Systems, 1992. 10: pp. 91-102. \n\n[6] Borovoy, R., Folk Computing: Designing Technology to Support Face-to-Face Community Building. Doctoral Thesis in Media Arts and Sciences. MIT, 2001. \n\n[7] Choudhury, T., Sensing and Modeling Human Networks. Doctoral Thesis in Media Arts and Sciences. MIT. Cambridge, MA, 2003. \n\n[8] Gerasimov, V., T. Selker, and W. Bender, Sensing and Effecting Environment with Extremity Computing Devices. Motorola Offspring, 2002. 1(1). \n\n[9] Basu, S. \"A Two-Layer Model for Voicing and Speech Detection.\" in Int'l Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2003. \n\n[10] Basu, S., Conversation Scene Analysis. Doctoral Thesis in Electrical Engineering and Computer Science. MIT. Cambridge, MA, 2002. \n\n[11] Brand, M., \"Coupled Hidden Markov Models for Modeling Interacting Processes.\" MIT Media Lab Vision & Modeling Tech Report, 1996. \n\n[12] Basu, S., T. Choudhury, and B. Clarkson. \"Learning Human Interactions with the Influence Model.\" MIT Media Lab Vision and Modeling Tech Report #539. June, 2001. \n", "award": [], "sourceid": 2624, "authors": [{"given_name": "Tanzeem", "family_name": "Choudhury", "institution": null}, {"given_name": "Sumit", "family_name": "Basu", "institution": null}]}