ASKCOS subtitles
Hi everyone, thanks for inviting me here. It's my pleasure to share my recent work. I'm Wenhao Gao, a [unclear] student in chemical engineering at MIT. Today the topic I want to talk about is molecular synthesizability and synthetic tree generation for synthesizable molecular design, and this topic covers two papers of mine that I have listed here.

Molecular design is the problem I'm interested in, which is a fundamental problem in chemical science and engineering. As we know, the structure of a molecule fully determines its properties, so the discovery of any functional molecule requires designing the molecular structure. Here is one example of how each component of the structure contributes to the properties of imatinib, a tyrosine kinase inhibitor. And of course, it is not only pharmaceutical applications: the discovery of novel functional materials is also an important application of molecular design methods, which could potentially address many grand challenges faced by human society, such as organic photocells for clean energy.

Formally, the problem of molecular design can be formulated as an optimization problem: we are looking for some molecule m that optimizes some desired property f(m), such as a redox potential for battery design or inhibition against a disease target for drug design. The search space is the set of available compounds, called chemical space.

In the history of computational chemistry, people focused more on the QSPR problem, the quantitative structure-property relationship, which is about inferring the property f(m) from a given molecule m. Molecular design is the inverse problem: assuming we already have access to f(m), search for an optimal m.

Currently, systematic approaches, by which I mean approaches that, unlike rational design, are not restricted to a particular case or subdomain, can generally be categorized into two classes: screening and de novo design. Screening is basically just an exhaustive search that enumerates and evaluates every candidate, and the defects are obvious:
very time- and resource-consuming, and we can still only screen a tiny fraction of the whole chemical space. De novo design is to construct the molecule from scratch, aiming to find the optimum more efficiently. Ideally, if de novo design works, it is preferable to screening.

Here is one recent successful application of de novo design methods in drug discovery. In this work, a team from Insilico Medicine developed a deep learning based algorithm called GENTRL that successfully identified an inhibitor exhibiting in vitro bioactivity against DDR1 kinase. Though this work is a little controversial, they did discover this drug candidate within a month and a half, where the process usually takes years.

However, that doesn't mean we are already there. Indeed, the team at Insilico Medicine manually selected only six molecules from 40, based on synthetic accessibility, and those 40 were already filtered from a list of 30,000 initially generated. And this is not a single case. Indeed, most de novo design algorithms exhibit poor synthesizability, which impedes further experimental validation. Before the deep learning era, only single digits of de novo designed molecules were experimentally validated each year, and things are probably even worse after the boom of deep learning: in recent years we have seen a lot of new models being proposed, but very few of them have proceeded to experimental validation. Here is a recent tweet from a researcher in the field, and we can see it highlights the problem.

It is natural to think about approaching the synthesizability problem as a multi-objective optimization. However, I will show it's not that easy. Synthesizability, by definition, is how easy it is for a chemist to synthesize the molecule, so to some extent it's an intuitive and subjective concept: different chemists may feel differently about the same molecule. And because chemical reactions have regioselectivity, even changing the position of a functional group, like here, from
this position to this position, or just an atom at a position in a ring, can largely affect the difficulty of the synthesis. So although the overall structures are very similar, so similar that traditional simple heuristics cannot distinguish them, their synthesizability differs a lot. This is what we call high non-linearity, and it makes it very difficult for simple heuristics or machine learning scores to capture the subtlety. Synthesizability is also sensitive to chemical availability: a very complicated structure may be easily obtained from a reaction with a natural product as a precursor.

Former attempts fall into three classes. The first is crowdsourced scoring, which is to have a group of experts score the molecule, since synthesizability is an intuitive and subjective concept. The second is based on structural complexity, such as SA score, which is widely used; SA score uses the frequency of substructures as a measure of synthetic accessibility. The third class is based on the synthetic pathway, like SCScore, which is a deep learning method trained to distinguish whether a molecule is more likely to be a reactant or a product. Reactants require fewer reaction steps, so we think they are easier to synthesize, which is a chemical intuition.

Among the three, the most recognized metric is still direct scoring by a group of experimental experts. Indeed, the expert scores are usually used as the ground truth for comparison, such as in this figure from the SA score paper, or to train other machine learning models. But of course, having a group of experts large enough to reach an unbiased value is labor intensive, and it is also hard to replicate or scale.

So let's recall how chemists figure out how to synthesize a molecule. The methodology is called retrosynthetic analysis, and when the process is done computationally, it's called computer-aided synthesis planning, CASP. Modern CASP tools, such as our group's implementation called
ASKCOS, typically rely on a Monte Carlo tree search process. Starting from a target compound, in each iteration we feed the target into a neural network to obtain potential precursors. For each precursor, we use another model to check whether the reaction is feasible, and filter out the infeasible ones. Then we check whether the obtained reactants are buyable. If not, we repeat this process and continue to grow the branch; if yes, we have a synthetic route to the target product. So that's a CASP tool.

So the question is: could a CASP tool serve as an artificial chemist to evaluate synthesizability, as an alternative to expert scoring? To be short, the answer is yes; the details are in our first paper. We found that the result of ASKCOS is compatible with our intuition. It does capture the non-linearity of synthetic accessibility: it can find a pathway for this molecule, but cannot find even a single path for this one. Besides, compared to a simple score, it actually provides an actionable synthetic pathway, so to some extent it provides interpretability for synthetic accessibility. And of course, unlike human chemists, a CASP tool can be accessed without limit.

But it is time consuming. Retrosynthetic analysis for one molecule typically takes about one minute. That's already faster than earlier CASP tools, but it's pretty expensive as an oracle or filter in molecular design.

So, based on ASKCOS, we evaluated the de novo design methods in GuacaMol. Here, "best from dataset" represents virtual screening from ChEMBL as a baseline, and the color represents the synthesizable fraction of 200 suggestions. We ran experiments for 10 different objective functions in GuacaMol to simulate the process of drug discovery. We can see that all of the methods have a pretty high risk of proposing unsynthesizable chemicals. The results vary across properties, but most of them, indeed all of them, are significantly worse than the virtual screening baseline. And in some cases we can see that
there are not even any synthesizable molecules found in the top 100 suggestions, which implies that post hoc filtering after a de novo design algorithm may not be a good strategy.

I have one question.

Sure.

Why do some properties give a higher number of chemically accessible molecules, while others give a really low number? Do you have a rationale behind that?

I think that's because the landscape of some of the objective functions probably matches, to some extent, the landscape of synthetic accessibility, if that makes sense. Some of them are machine learning models, and those especially extrapolate poorly. When it comes to molecules beyond the normal training set, the model may predict a very high score for them even though the structure is very weird, so in that case there is a misalignment between those two properties.

So your hypothesis, then, is that if we increase the training set on which some of those generative methods have been trained, we might be able to increase the accessibility of some of these, but only to some extent, because there will still be some part of the molecules that is outside of the synthetically accessible molecular space?

Sorry, can you say that again?

Yeah. Is your hypothesis that if we increase the size of the training set, then we will increase the percentage of molecules that are easily accessible?

Sorry, not really. The chemical space is effectively unlimited, and we cannot test the properties of unstable molecules. I will show some examples later, but there are a lot of unrealistic molecules that may not even exist in nature, so we will never have properties, or at least experimental properties, for them. So I think there will always be a problem of extrapolation if we use a machine learning model as an oracle.

Okay, thank you.

Yeah, but of course, if you enlarge the size of the training set, I
think in general it improves the quality of the oracle, definitely.

So is there a better approach? From the above, what we know is that the best way to ensure the synthesizability of a molecule is explicitly finding a synthetic pathway for it. So why not directly design synthetic paths instead of designing the molecular structure?

A valid synthetic path to a target molecule, like what I'm showing here, can be abstracted as a tree structure, where all of the leaf nodes should be buyable building blocks that are available on the chemical market, the root node is the product molecule, and each link between nodes should represent a valid chemical reaction.

Further, as we know, the task of CASP, synthesis planning, is also to find a synthetic path for a target molecule. So these two tasks could be solved within one shared task of synthetic tree generation, using algorithms like those we apply to molecular graphs. Within this framework, synthesis planning is to generate synthetic trees whose product molecule matches the target molecule, and synthesizable molecular design becomes optimizing the property of interest of the product molecule with respect to the structure of the synthetic tree.

We need to mention that this idea is of course not novel; synthesis-based molecular design in particular has a pretty long history. But the majority of such methods, like Molecule Chef or PGFS, are specialized and cannot cover the whole space of synthetic pathways. Among the recent deep learning based methods, only John Bradshaw's DoG models can generate convergent synthetic paths. By convergent I mean that, as I'm showing here, two sub-branches can join together. So we will just discuss the DoG paper.

So what is still lacking from DoG? We think there are two major flaws. First, the DoG models rely on a forward reaction predictor, but all of those machine learning reaction predictors suffer from a bias toward positive reaction data, which is caused by the custom that
only successful reactions get published, so people just throw away the failed reaction data, and this cannot be easily solved. Second, also because they are using the reaction predictor, they don't explicitly use the molecular structure of the intermediate products as input information. Within their generation, after choosing the reactants, they just call the reaction predictor as a black-box function and then proceed to select the next action step. So the RNN model kind of has to learn to approximate the reaction prediction outcome, which is inherently unnecessary and challenging.

For synthesis planning, very few have tried this. John Bradshaw's DoG is one of them, with their model called RetroDoG, but unfortunately in their paper they only showed three unrecovered cases. As far as I know, there has been no successful planning that doesn't rely on a tree search process.

To improve on that, we formulate the generation of synthetic trees as a Markov decision process. To explicitly use the intermediate product structure, we define the state as the root molecules of the intermediate trees. Because how we obtained an intermediate doesn't affect further planning, this exhibits the Markov property. We enforce a depth-first order for generation, so that at most two subtrees can occur simultaneously, which means at most two root molecules can occur, and expansion always takes place from the most recent one, which introduces an order between the two molecules.

We define each reaction step as an action step in this MDP, and define four types of actions. Using this tree as an example: from an empty set, we first need to conduct an add action, which samples a new buyable building block from a given list of buyable building blocks and applies a reaction, here a unimolecular reaction, to obtain a product. Then an expand action takes the most recent intermediate as a reactant and conducts a
reaction, here a bimolecular one. Then an add action again, with a bimolecular reaction, and an expand action with a unimolecular reaction. Then a merge action takes the two intermediates as reactants and conducts a bimolecular reaction; in this way it allows the generation of a convergent synthetic pathway. Finally, the end action means to finish the generation and output the tree.

To ensure the validity of the synthetic tree, we also need to make sure each step is an actionable chemical reaction. Currently we have two choices: the first is a machine learning reaction predictor, and the second is domain-specific reaction rules encoded as reaction templates, such as the two examples I'm showing here, one unimolecular reaction and one bimolecular reaction, both encoded in reaction SMARTS. Of course, based on the discussion before, we choose reaction templates, so the transition dynamics are simply that we reject all reactions that don't follow a known template.

Lastly, although we will not use it in our model training, the reward could be naturally defined by the purpose of the task. For synthesis planning, it could be defined as the similarity between the product and the target molecule, and for design, of course, it's just the property of interest of the product molecule.

Given that MDP formulation, rather than a traditional reinforcement learning approach that involves an explicit search process, we formulate synthesis planning as probabilistic generative modeling of synthetic trees conditioned on a target molecule, so that we can amortize the search cost into the training cost. We use offline data to train our policy network. Within each step, we sample an action, which is a set of four components that I will introduce in detail later, from the policy network, using the current state and the target molecule as the input, and apply this action to the environment to grow the synthetic tree. We take the synthetic tree and the product molecule as the
output. If the product molecule is identical to the target molecule, then it's a successful synthesis plan, and we call it a recovery.

Concretely, we use Morgan fingerprints, which are a vector representation of the molecular graph, to represent molecules. Because there will be at most two root nodes, and there is an order between them, we can just concatenate their embeddings, together with the embedding of the target molecule as a conditional code, and this forms the state embedding.

In each action step, we need to sample four components: the action type, the first reactant, the reaction template, and the second reactant if the reaction is a bimolecular one. We trained four separate networks to predict the four components. All of them take the state as input, and some of them take the previous outputs as input as well. The action and reaction networks are classification networks, but as there are too many buyable reactants, the reactant selection networks only predict 256-bit fingerprints and conduct a k-nearest neighbor search to select the reactant from the available building blocks.

We have one question in the chat. Cass is asking whether the reaction templates take into account the necessary reaction conditions: temperature, pH, and those kinds of things.

Currently no. A template just encodes the transformation from reactants to a product, or from a product to reactants.

Okay. And what happens if the reaction template gives you two products? How do you continue the synthesis? Do you pick one?

I just randomly pick one.

Okay, cool. There are a few other questions in the chat, but they are related to the earlier part, to the baselines and everything, so we might ask them at the end of the meeting.

So, given the previous networks, we in effect have a decoder that decodes a Morgan fingerprint into a synthesizable molecule. This translates the synthesizable molecular design problem into a more convenient numerical optimization over fingerprints, and we apply a genetic algorithm to do that. So in a
typical setting, we select a random batch of fingerprints from molecules in the same database as the initial pool, cross over and mutate to obtain the offspring pool, decode to obtain the synthesizable molecules, and select based on the desired properties. Crossover is defined as inheriting about half of the bits from one parent and the remainder from the other, and mutation is defined as flipping a number of bits with some probability. We iteratively apply this procedure until the stopping criterion is met.

For data, we manually selected 91 reaction templates from previous publications, comprising 13 unimolecular reactions and 78 bimolecular reactions. For purchasable compounds, we use the Enamine building blocks US stock as the list of purchasable compounds, about 150k. Synthetic trees are generated by applying a random policy to the aforementioned MDP, and we filter the synthetic trees by the QED, the quantitative estimate of drug-likeness, of the root molecule. In the end we obtained about 200k synthetic paths for training and about 70k each for validation and testing. Each network is trained as a separate supervised learning problem, using a subset of information from the known synthetic trees.

To validate our model, we first test amortized synthesis planning, which is the task that RetroDoG failed at. We use the test set constructed from our templates and building blocks as "reachable" data, which means all of the molecules in this set can be constructed within our framework, and a random sample from ChEMBL as "unreachable" data, which means the molecules in this set cannot necessarily be constructed with our template set and buyable compounds. We use k = 3 in the nearest neighbor search for the first reactant and pick the top one for the rest.

Because of the amortized approach, instead of a full tree search, it takes only about one second to plan a single pathway for a molecule, compared to the one minute mentioned previously. Even with that speed, our method could recover more than half of the
reachable target molecules, which shows this is a promising approach to rapid synthesis planning. For the unreachable ones, we could still reconstruct about five percent of them. We conclude the gap is mainly due to the incompleteness of the reaction templates and buyable reactants.

Based on those experiments, what would you say the rate of recovery is for the whole chemical space? Do you have any idea? In general, let's say you're not limited to those two sets: for any given molecule, would you say that your model is able to recover a lot of them?

I think the ChEMBL case could represent that, if we are talking about a drug-like chemical space. A random sample from ChEMBL may represent that, so I think it's about a five percent recovery rate.

Thank you.

Okay, and I suppose this five percent means that at optimization time your method will definitely do a bit worse than all the methods that are not restricted to synthesizable molecules, right?

Sorry?

I'm asking whether this recovery rate means that at optimization time your method will do a bit worse than all the other methods that are not restricted to finding a synthesis route.

I think so. But this is a planning task. The five percent comes from just picking the three nearest neighbors in the first step and then picking the top predicted one in all of the following steps, so it's like a one-shot generation of the synthetic tree instead of the tree search process that is typically applied. It is a somewhat low recovery percentage, but if we had a beam search that enlarges the width of the search, we would probably get a larger recovery rate, though it would also take more time. It would land somewhere between the current approach and the traditional tree search approach. I'm not sure if that answers your question.

Yeah, thank you. Thanks.

And further, in the
unrecovered cases, we find that the product molecules are usually structurally similar to the target molecules, and we verify that with the average similarity, the KL divergence, and the FCD distance between the input and output sets.

Here are two successfully recovered cases of synthesis planning with targets from ChEMBL. These are the targets, and these two are the most similar molecules we could find in our training set; both of them have a similarity of only about 0.3. And these are the pathways our model predicted. We can see that our model successfully constructed synthetic paths for them even though it didn't see any similar molecules during training, showing good generalization ability.

Here I'm showing the correlation between some common properties of the target molecule and the output molecule: SA score, logP, molecular weight, and QED. In each graph, the x-axis is the property of the input molecule and the y-axis is the property of the output molecule. We can see that structural similarity leads to similarity in properties, so even the unrecovered cases from our model could potentially serve as synthesizable analog recommendations, suggesting synthesizable analogs with properties close to those of the input molecule.

Here I'm showing two unrecovered cases. In the first case, although our model cannot reconstruct this target molecule, it gave us a pretty similar structure that has many substructures in common with it, while the most similar molecule in the training set differs a lot. In the second case, the target molecule is a symmetric molecule, and our model constructed about half of it. So although they are not very similar in structure, we think the synthetic route of this molecule could still inspire the synthesis of the original target, so we think the output is also valuable.

For synthesizable molecular design with a genetic algorithm, we first validate our model with some common heuristic oracle
functions relevant to drug discovery. We show the values of the top three molecules in this table. We find that, in terms of optimization ability, our model consistently outperforms GCPN and MolDQN, which are two reinforcement learning methods, and is in general comparable to GA+D and MARS across different tasks. So in general it is comparable to the state-of-the-art performance, but not very strong. But of course our focus is to design synthesizable molecules.

Here I'm showing the top molecules from the best-performing models in the GSK3β task. Even someone without a chemistry background can see that our design, although marginally worse in the GSK3β value compared to the other models, is much simpler in structure. Combined with the plausible synthetic path obtained from our model as a byproduct, with just one step of Suzuki reaction, I think our design is more valuable, even though the value is lower.

This is the result for JNK. We can see that although this design doesn't seem as good as the previous one, it is still much better than the designs from the other models, which nevertheless have very high scores. That's the problem I mentioned with some oracle functions, especially when the oracle is a machine learning model: you can never cover those kinds of molecules in your training data set.

To simulate a more realistic application case and have a more thorough evaluation of the performance, we also optimized docking scores against two important disease targets within the TDC generative benchmark: the dopamine D3 receptor and the main protease of SARS-CoV-2. To conduct a fair comparison, we limit the number of oracle calls to 5,000.

Here is the result for the dopamine D3 receptor. Again, the docking scores are not as optimized as those of the best state-of-the-art models, but our model achieved both a high passing rate on the structure quality filter and a low average SA score. Eighty percent of the
generated molecules passed the filter, while for the others only about 10 or 20 percent, or even single digits, can pass the filter. In terms of SA score, we also have a pretty low value. All of this implies that our generated molecules have good structural quality and can be easily synthesized, not to mention that we have an actionable synthetic pathway as a byproduct.

Here are some results of the dopamine D3 receptor docking task. This is a known inhibitor, and we successfully obtained several candidates with stronger binding affinity compared to it, all with reasonable structures. This is the synthetic path for our top design. And the same kind of result for the main protease of SARS-CoV-2: we also obtained potential inhibitors with stronger binding affinity compared to a reported one.

To conclude, there are still some limitations. First is the reaction templates...

Hi, excuse me, I just have a question about the results. I was wondering if you looked into adding some restrictions, to the size of the molecule for example, or something else that constrains the optimization in some way. Having worked on these kinds of tasks before, it can happen that the policy, especially for baselines like MolDQN, can exploit the oracle in some way by consistently adding some atoms that end up giving a higher score. And this, as you mentioned, is not necessarily desirable. So I was just wondering if you added some constraints like that.

You mean in my baseline comparison, or in my methods?

Well, in any task or method; if you have imposed a size limit on the results or something like that.

Let's see. For the baseline comparison, I'm just using the recommended settings from the original repos. And in my method, I think the constraint is just synthesizability, which itself I think is a constraint.

Okay, yeah.

So basically, yes, it happens that all of the generative algorithms tend
to exploit the optimum of the property landscape, especially with a surrogate oracle. So I think the best way to avoid that is just to have a constraint on synthesizability.

One question: does your method prefer molecules with a smaller number of synthesis steps, or does that...

Oh, no, that's not a concern at all currently. But I think it is a potential application of the model: we can take some properties of the synthetic paths, like the number of steps, what kinds of reactions we prefer, or the greenness or accessibility of the reactions, into consideration in the optimization objective.

So, currently there are some limitations. First of all, the reaction templates are not perfect. When we applied the reaction templates to filter the buyable building blocks, we found that a lot of pretty active and common reactants cannot match even a single reaction, which means our reaction templates do not cover enough of reaction space. So compared to machine learning reaction predictors, which are overoptimistic, our reaction templates are currently too restrictive.

Also, we introduced a depth-first order in the generation, which leads to a canonical order of the reactants, and that is unphysical. This is the problem of DAG-type (directed acyclic graph) MDPs versus tree-type MDPs discussed in a recent paper by Emmanuel Bengio.

We also use binary, presence-based fingerprints, and they cannot distinguish repeating units. My example is that when we feed this molecule into the model, it outputs this one. The similarity is very high, 0.85, but you can see this is just one unit of that, so the current fingerprint cannot distinguish repeating units if there is more than one of them.

Overall, the bottleneck is the first reactant selection, which is the task that is given the least information, and its top-1 accuracy is currently just about 30 percent.

So, in conclusion, we formulate the tasks of multi-step synthesis planning and synthesizable molecular design as a single
shared task of conditional synthetic tree generation, and we formulated a Markov decision process that can model multi-step, convergent synthetic pathways. The model we proposed is capable of rapid bottom-up synthesis planning and constrained molecular optimization that explores the chemical space defined by reaction templates and purchasable starting materials. And we demonstrated some initial results on the recovery rate and on drug-like molecular optimization.

So thanks for listening, thanks to the ONR and MLPDS for the funding, and thanks to my lab members for all the help.

Thank you so much for the talk. There have been a few questions in the chat that we didn't answer, so we'll go over them quickly. I think [unclear] at some point was asking whether the generative models that you compared against were developed in a synthesizability-aware fashion. I think that was about the first baseline.

No, I think the answer is no. Most of the previous methods implicitly define the chemical space to navigate, and the way they implicitly define the chemical space is just by allowing valid SMILES strings or valid chemical graphs, which are constrained by chemical valence. So all of those algorithms are just generating valid chemical structures in terms of chemical valence, as either SMILES or graphs, and most of the previous methods just do that.

Okay. Just to piggyback on that question: one of the tables at the end shows that the LSTM has pretty low accessibility as well. But if it didn't have any of those constraints, can we explain why the accessibility is so low for that kind of method?

Yeah, that was surprising to me as well. First of all, it does have a pretty good percentage of synthesizable molecules, even in the first benchmarking study, but it also has pretty poor performance in
some of the objective cases. And in this case, first of all, this is just one task, and it also has a pretty low percentage passing the quality filter. Those metrics do not fully determine synthesizability, I think. I still think that, in terms of synthesizability, explicitly finding a pathway is still the best method, and although I quantify both of them, it's just as additional evidence to support my conclusion.

As for the reason the long short-term memory model has a relatively low score, I think it's this: it is a long short-term memory model with hill climbing. At first it does distribution learning, learning the distribution of normal molecules in the training data, and then it moves beyond the training data step by step, iteration by iteration. So if it doesn't move too far, it can learn some normality of the molecules and preserve it during the optimization. I think that's the reason it behaves not badly.

Okay, cool. Next question, from Emmanuel: for the Graph GA, synthesizability obviously depends on the action space, mutation and crossover. Have you considered an action space informed by chemical reactions in your benchmark?

Currently no. We also thought about genetic algorithms working directly in the space of synthetic trees, because we saw that a genetic algorithm has pretty strong optimization ability. But the problem is, if we separate two branches and cross one over with another branch, we cannot guarantee that the molecules could react by following a reaction template. Also, if you change one earlier reactant, it can lead to some unphysical, infeasible reactions in the later steps. So applying a genetic algorithm while keeping the synthetic tree valid is, I think, a more challenging problem than simply operating on a molecular structure.

Okay. Julian was asking how
sensitive the recovery rate in the reachable case is to the setting of k for the reactant search. So I guess he was talking about the beam search, if you increase k.

Okay. So currently I'm just using k larger than one in the first reactant search, because, as I mentioned, the first reactant search is the bottleneck. Currently, if I just keep k equal to one in all of the selections, then the recovery is a bit over thirty percent. And as I just mentioned, the accuracy of searching for the first reactant is also about 30 percent. So basically the overall performance is bottlenecked by the first reactant selection. When I enlarge the first reactant search from one to three, the recovery increases to about 51 percent. And for the beam search, enlarging the k's in the following steps, Lucille tried to implement that version. It's an early attempt, so it's not a final configuration, I think, but she finds that the recovery rate didn't increase too much, while the rate of trees that cannot be completed decreased.

Okay. Young Tsu is asking: will there be a quantitative measure to compare your work to other methods that consider synthesizability, such as SA score and DoG?

Yeah, that's a tricky question, because, as we have mentioned, the SA score or the quality filters cannot fully describe the synthesizability we want, and so the best way to evaluate synthesizability is to explicitly find a pathway for the molecules. And the validity of the synthetic paths is defined by the available reactions and the available building blocks, and in the setting of DoG and in our models they are different, both the building blocks used and the reactions. But under our own definition, all of the molecules generated by the models are synthesizable. If you ask whether there is a general comparison between those two, I think it's a little bit hard unless we
can eventually synthesize them in the lab, because even a CASP tool also needs a definition of the available reactions and the available building blocks, so it also has its own, different definition of synthesizability.

Okay, I hope that answered the question. Next question: Denise is asking in which step of the generative model the genetic algorithm is applied.

So the generative model I constructed here just takes the target molecule as input and decodes it to produce a synthetic tree and the product, right? So I use it as a decoder. I use the genetic algorithm to optimize the fingerprints, and when we have offspring, we use the previous model to decode the fingerprints to synthetic trees, and we evaluate and select them based on the property of the product molecule. Then we select the top ones to proceed to the next round. That's how the genetic algorithm works. Is that clear?

I also have a question. I was wondering if you can say anything about how your method generalizes with respect to the synthesis path length. I can imagine that the longer the synthesis path, the more difficult it is to...

Yes.

Yeah, of course the more difficult it is, but do you see any other kind of interesting results in terms of generalizability?

Yeah, of course you're right. In synthetic planning, the longer the path is, the lower the probability of recovering it. And we didn't find an interesting result from that. If I remember right, the longest pathways we recovered in those two sets are about five steps. But in molecular design the model can generate longer paths, because it's just evaluated by the property at the end.

And how did you split the original data set that you generated? How did you split that into training and test sets?

Yeah, I just randomly split it by synthetic tree.

Okay. It would be interesting to see, if you split that by path length or some other kind of property of the
maybe target molecule, whether you see some different results in terms of how it generalizes.

Okay, thank you. Yeah, that's interesting. Thanks.
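
As a rough illustration of the loop described in the answer about the genetic algorithm (optimize fingerprints, decode each offspring with the trained model, score the product, keep the top ones), here is a minimal Python sketch. It is not the speaker's implementation: `decode` and `score` are hypothetical placeholders for the trained synthetic-tree decoder and the property oracle, the fingerprints are plain lists of 0/1 bits, and the population size, offspring count, and mutation probability are arbitrary assumptions.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Inherit about half of the bit positions from parent_a, the rest from parent_b."""
    n = len(parent_a)
    from_a = set(rng.sample(range(n), n // 2))
    return [parent_a[i] if i in from_a else parent_b[i] for i in range(n)]


def mutate(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [1 - b if rng.random() < flip_prob else b for b in bits]


def evolve(population, decode, score, generations=10, n_offspring=20,
           flip_prob=0.01, seed=0):
    """Optimize fingerprints; `decode` and `score` are hypothetical stand-ins.

    decode(fingerprint) -> product (in the talk: a synthetic tree and its product)
    score(product)      -> number to maximize (in the talk: the target property)
    """
    rng = random.Random(seed)
    pop_size = len(population)
    # Pair every fingerprint with the score of the product it decodes to.
    scored = [(score(decode(fp)), fp) for fp in population]
    for _ in range(generations):
        children = []
        for _ in range(n_offspring):
            a, b = rng.sample([fp for _, fp in scored], 2)
            children.append(mutate(crossover(a, b, rng), flip_prob, rng))
        scored += [(score(decode(fp)), fp) for fp in children]
        # Keep only the top-scoring fingerprints for the next round.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        scored = scored[:pop_size]
    return scored
```

Because selection happens on the decoded product rather than on the fingerprint itself, every candidate that survives a round comes with whatever pathway the decoder produced for it, which is the point made in the answer above.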