Databricks Partner

As a Databricks partner for data and LLM fine-tuning, Bitext combines open-source machine learning advances with the Lakehouse model for data warehousing. Our work with data integration tools such as dbt and Fivetran, together with Databricks SQL, strengthens our data management capabilities.

With the significant growth in the use of SaaS LLM APIs and ML applications, dbt has emerged as the fastest-growing tool, indicating a surge in efficient data manipulation within the Databricks Lakehouse. The shift towards Delta Lake from on-prem and cloud data warehouses signals an industry move towards more efficient data science (DS) methodologies aimed at accelerating growth, improving predictability, and enhancing customer experiences.

Our expertise spans a range of DS/ML applications, such as Speech Recognition, Simulations & Optimizations, Recommender Systems, Natural Language Processing, and Industry Data Modeling. We tackle complex DS/ML projects with specialized Python libraries including NLTK, Transformers, and FuzzyWuzzy, as well as LLM tools like LangChain. The MLflow Tracking Server and MLflow Model Registry are central to how we manage project tracking and the model lifecycle.

At Bitext, our collaboration with Databricks goes beyond data integration and business intelligence (BI). It’s about leveraging the latest in data science and machine learning to offer solutions that are not just successful but also sustainable, ensuring our clients can efficiently adapt to new technologies and methodologies in AI and ML.

Consulting

Bitext Builds Datasets for Fine-tuning LLMs

Unlock the full potential of Large Language Models (LLMs) with our advanced data preparation solution. One of the critical factors in achieving exceptional performance with LLMs is the quality and relevance of the training data. That’s why we offer a comprehensive suite of tools and services specifically designed to automate and streamline the preparation of datasets for fine-tuning LLMs.

Our Data Preparation Solution Has Two Components

1. We Leverage Your Internal Data Sources

Data Collection

We help you identify and collect high-quality datasets that align with your specific application and domain. Our team of experts assists in sourcing diverse and relevant data, ensuring that you have a robust foundation for training your LLM.

Data Cleaning and Preprocessing

Our advanced data cleaning and preprocessing techniques ensure that your training data is of the highest quality. We apply data cleaning algorithms, handle noisy or irrelevant samples, and perform any necessary data transformations to optimize the dataset for fine-tuning.
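As a rough illustration of the kind of cleaning step described above, the sketch below normalizes whitespace, drops empty or near-empty samples, and removes case-insensitive duplicates. The function name `clean_utterances` and the `min_chars` threshold are assumptions made for this example; they do not describe Bitext's actual tooling.

```python
import re
import unicodedata


def clean_utterances(samples, min_chars=3):
    """Return cleaned, de-duplicated utterances in their original order."""
    seen = set()
    cleaned = []
    for text in samples:
        # Normalize Unicode forms and collapse runs of whitespace.
        text = unicodedata.normalize("NFKC", text)
        text = re.sub(r"\s+", " ", text).strip()
        if len(text) < min_chars:
            continue  # drop empty or near-empty samples
        key = text.casefold()
        if key in seen:
            continue  # drop case-insensitive duplicates
        seen.add(key)
        cleaned.append(text)
    return cleaned


raw = [
    "  I want to cancel   my order ",
    "i want to cancel my order",
    "",
    "ok",
    "Where is my refund?",
]
print(clean_utterances(raw))
```

Real pipelines usually add domain-specific filters (language detection, relevance scoring) on top of generic steps like these.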

Annotation and Labeling

If your LLM requires annotated or labelled data, we offer efficient annotation services. Our experienced annotators precisely label the data based on your specific requirements, whether it’s sentiment analysis, named entity recognition, or any other custom annotation task.

2. We Expand Your Internal Data with Synthetic Text (NLG)

Data Augmentation

Enhance the diversity and richness of your training data through our data augmentation techniques. We generate synthetic samples, perform data synthesis, and apply data augmentation algorithms to expand the size and variety of your dataset.
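One simple way to picture synthetic sample generation is template expansion, where slot values are combined to produce many utterance variants. The sketch below is a generic stand-in, not a description of Bitext's NLG technology; `expand_templates`, the templates, and the slot values are all hypothetical.

```python
from itertools import product


def expand_templates(templates, slots):
    """Fill every template with every combination of its slot values."""
    generated = []
    for template in templates:
        # Only use the slots that actually appear in this template.
        names = [name for name in slots if "{" + name + "}" in template]
        for values in product(*(slots[name] for name in names)):
            generated.append(template.format(**dict(zip(names, values))))
    return generated


templates = ["{verb} my {item}", "I need to {verb} my {item} please"]
slots = {"verb": ["cancel", "change"], "item": ["order", "subscription"]}

synthetic = expand_templates(templates, slots)
for utterance in synthetic:
    print(utterance)
```

Two templates with two values per slot already yield eight utterances, which is why template-style generation scales quickly as slots are added.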

Privacy and Compliance

We understand the importance of data privacy and compliance. Rest assured that your data will be handled with the utmost confidentiality and in compliance with applicable data protection regulations.

Customization and Flexibility

We tailor our data preparation services to meet your unique needs. Whether you require domain-specific data, specific data formats, or custom preprocessing steps, we work closely with you to deliver a solution that aligns with your objectives.

Collaboration and Support

Our dedicated team of data scientists and engineers collaborate closely with you throughout the data preparation process. We provide guidance, support, and expertise to ensure that your data is prepared to maximize the performance of your LLM.

With our Data Preparation for Fine-tuning LLMs solution, you can accelerate the training process, enhance model performance, and achieve exceptional results in natural language understanding, text generation, sentiment analysis, and more.

Datasets for Fine-tuning LLMs

 

Training LLMs to respond accurately and efficiently across a variety of communication scenarios demands meticulous attention and linguistic capabilities. The multilingual datasets we provide are specifically tailored to enhance the performance of these advanced NLP and Generative AI models.

Our datasets are distinguished by:

  • Extensive Contextual Variety: We develop datasets that reflect wide-ranging interaction scenarios. This allows LLMs to adapt and be effective in countless environments, from customer support to business data management.
  • Linguistic Diversity and Register: We account for the various ways users communicate, whether in a formal tone or in everyday colloquial language, ensuring the models are prepared for any type of interaction.
  • Innovation in Realistic Noise Generation: We incorporate “noisy” elements, such as common spelling and punctuation errors found in human communication, to strengthen the robustness of the models when faced with imperfect data.
  • Adaptation to Constant Changes: Industries evolve, and so do the ways we communicate. Therefore, we continually update our datasets to keep LLMs abreast of current linguistic trends and needs.

The excellence of our datasets for LLMs is the direct result of decades of research and development in computational linguistics. Our expertise in creating hybrid data, which blends advanced synthetic techniques with meticulous expert supervision, has set new standards in the training and fine-tuning of linguistic models. Bitext allows AI systems to process and understand human language with unparalleled complexity and nuance.

 

Language Register Variations – Tailored Communication

 

Creating conversational agents that can smoothly interact with users requires a deep understanding of language registers. Our datasets are enriched with a spectrum of linguistic registers, ranging from formal business exchanges to casual everyday conversations. This enables the fine-tuning of Large Language Models (LLMs) to fit the tone and style appropriate for diverse communication contexts.

Recognizing the tone, employing the right language, and grasping the context are key for AI to resonate with users from various cultural backgrounds. Whether it’s an official inquiry or an informal chat, our datasets equip LLMs to respond suitably, enhancing the user experience and the accuracy of AI conversations.

For a comprehensive view of how these linguistic attributes are annotated and tailored within our datasets to meet the dynamic needs of language-based AI applications, please see our focused exposition:

Explore the Linguistic Features

With Bitext’s tools at your disposal, you can confidently fine-tune your AI to provide cohesive and contextually aware communication, mirroring the richness and diversity of human interaction.

 

Realism through Noise – Enhanced Robustness

 

To make the training data more robust and lifelike, we introduce noise, such as spelling mistakes, spacing errors, and missing punctuation. This prepares our Prebuilt Chatbots to handle the type of “noisy” input they might encounter in real-life interactions.
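A minimal sketch of this kind of noise injection, assuming a simple character-level approach: each character has a small chance of being replaced (spelling mistake), and spaces and punctuation have a small chance of being dropped (spacing error, missing punctuation). The function `add_noise` and the `rate` parameter are illustrative assumptions, not Bitext's generation method.

```python
import random
import string


def add_noise(text, rng, rate=0.1):
    """Apply character-level noise: typos, dropped spaces, dropped punctuation."""
    out = []
    for ch in text:
        roll = rng.random()
        if ch in string.punctuation and roll < rate:
            continue  # missing punctuation
        if ch == " " and roll < rate:
            continue  # spacing error: words run together
        if ch.isalpha() and roll < rate:
            out.append(rng.choice(string.ascii_lowercase))  # spelling mistake
            continue
        out.append(ch)
    return "".join(out)


rng = random.Random(0)  # fixed seed so the example is reproducible
clean = "Hello, I would like to cancel my appointment."
print(add_noise(clean, rng, rate=0.15))
```

Keeping the original utterance's intent label on the noisy copy is what lets a model learn that both forms mean the same thing.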

List of Fine-Tuning LLM Verticals

At Bitext, we understand that specialization and adaptability are essential for the seamless operation of automated customer support services. That’s why we are dedicated to fine-tuning large language models (LLMs) to deliver precise, industry-tailored results. Regardless of whether you’re in the automotive sector, academia, or even the intricate world of healthcare, Bitext has specialized datasets to meet the specific needs of any vertical.

We meticulously cater to each industry to facilitate understanding and improve responses to the most common inquiries. By integrating our vertical datasets, we ensure that your customer support systems are equipped to interact with and satisfy a wide array of linguistic demands. Simulating linguistic variations and common writing errors also contributes to the resilience of your system against the unpredictability of everyday language.

We encourage you to explore our range of verticals and download our datasets for evaluation. Learn more about us and discover how vertical-specific data optimization can strengthen the effectiveness of your customer support systems.

    Bitext’s LLM Evaluation Methodology for Conversational AI

    Bitext’s methodology evaluates your conversational AI without the need for historical data or for manual tagging of evaluation data. The process is based on the generation (NLG) of custom evaluation datasets, pre-tagged with intent information and linguistic features.

    Overview

    Bitext performs evaluation tasks for any NLU engine on the market, testing accuracy along different metrics according to the user profile. Bitext’s LLM evaluation methodology measures how a conversational AI performs during its lifespan, from deployment to retirement, throughout all changes and updates.

    Our main advantage is that Bitext automates most steps in the evaluation pipeline, including the generation of an evaluation dataset, which is a critical step in the absence of historical evaluation data.

    This semi-supervised process is based on standard accuracy metrics, such as the F1-score, which combines precision and recall. The analysis of these metrics is then compiled in a report highlighting strengths and weaknesses, both at the bot level and at the intent level.
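To make the metrics concrete, here is a small sketch that computes precision, recall and F1 per intent, plus a macro-averaged F1 as one possible bot-level score. F1 is the harmonic mean 2PR / (P + R). The function names and the choice of macro averaging are assumptions for this example; the source does not specify how the bot-level figure is aggregated.

```python
from collections import Counter


def per_intent_f1(gold, predicted):
    """Return {intent: (precision, recall, f1)} for parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted intent was wrong
            fn[g] += 1  # the gold intent was missed
    scores = {}
    for intent in set(gold) | set(predicted):
        found = tp[intent] + fp[intent]
        expected = tp[intent] + fn[intent]
        precision = tp[intent] / found if found else 0.0
        recall = tp[intent] / expected if expected else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[intent] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Bot-level score: unweighted mean of the per-intent F1 values."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


gold = ["PAYMENT_pay", "PAYMENT_pay", "CONTACT_human_agent", "CONTACT_human_agent"]
pred = ["PAYMENT_pay", "CONTACT_human_agent", "CONTACT_human_agent", "CONTACT_human_agent"]

scores = per_intent_f1(gold, pred)
for intent, (p, r, f1) in sorted(scores.items()):
    print(f"{intent}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1: {macro_f1(scores):.2f}")
```

Reporting the per-intent values alongside the aggregate is what surfaces weak intents that a single bot-level number would hide.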

    The process combines software tools, evaluation data and expert insights in one single methodology. This methodology is transparent and easy to explain to end users.

    The Evaluation Dataset for Fine-tuning LLMs

    Data & Flags

    The key to this process is a rich proprietary dataset designed for evaluation that contains thousands of utterances per intent. These utterances are tagged with intent information, so there is no need to manually tag them.

    These utterances are also categorized with flags according to their linguistic features:

    • Language register: colloquial, formal…
    • Regional variant: UK/US English; Spain/Mexico Spanish; Canada/France French …
    • And more: offensive language, spelling errors, punctuation errors…

    These flags are key to automatically evaluating the accuracy of the chatbot in different use environments; they permit the chatbot to perform seamlessly with users of virtually any demographic.
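One way such flags could drive an automatic breakdown is to compute accuracy separately for each flag, as sketched below. The record layout and the flag names (`formal`, `colloquial`, `spelling_errors`) are hypothetical; the actual flag encoding in the dataset may differ.

```python
from collections import defaultdict


def accuracy_by_flag(records):
    """records: iterable of (gold_intent, predicted_intent, flags)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, predicted, flags in records:
        for flag in flags:
            total[flag] += 1
            correct[flag] += int(gold == predicted)
    return {flag: correct[flag] / total[flag] for flag in total}


records = [
    ("PAYMENT_pay", "PAYMENT_pay", {"formal"}),
    ("PAYMENT_pay", "PAYMENT_pay", {"colloquial"}),
    ("PAYMENT_pay", "CONTACT_human_agent", {"colloquial", "spelling_errors"}),
    ("CONTACT_human_agent", "CONTACT_human_agent", {"formal", "spelling_errors"}),
]

breakdown = accuracy_by_flag(records)
for flag, accuracy in sorted(breakdown.items()):
    print(f"{flag}: {accuracy:.2f}")
```

A gap between, say, formal and colloquial accuracy points to the register where more training data is needed.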

     

    The Evaluation Methodology

    The evaluation methodology is built on an iterative process to train the conversational AI model, evaluate performance, retrain and remeasure performance. This iterative process provides systematic performance improvements.

    The evaluation system is designed as a continuous improvement process that is implemented in cycles:

     

    • Select training dataset
    • Train conversational AI model
    • Select evaluation dataset
    • Evaluate trained conversational AI model
    • Identify accuracy gaps
    • Identify problems and fixes
    • Re-train with new fixes
    • Re-evaluate to measure improvements
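The cycle above can be sketched as a loop over pluggable steps. Everything here is a toy under stated assumptions: the word-lookup "model", the `EXTRA_TRAINING` pool of fixes, and the `target` threshold are hypothetical placeholders for a real NLU engine and real remediation data.

```python
def train(data):
    # Toy "model": remember which intent each training word first appeared with.
    model = {}
    for text, intent in data:
        for word in text.lower().split():
            model.setdefault(word, intent)
    return model


def predict(model, text):
    for word in text.lower().split():
        if word in model:
            return model[word]
    return None


def evaluate(model, eval_data):
    errors = [(text, intent) for text, intent in eval_data
              if predict(model, text) != intent]
    return 1 - len(errors) / len(eval_data), errors


# Hypothetical pool of extra training utterances, keyed by intent.
EXTRA_TRAINING = {
    "CONTACT_human_agent": [("human agent", "CONTACT_human_agent")],
    "RATE_check_rates": [("check rates", "RATE_check_rates")],
}


def propose_fixes(errors):
    # Fix: add new training data for every intent that showed an accuracy gap.
    failing = sorted({intent for _, intent in errors})
    return [ex for intent in failing for ex in EXTRA_TRAINING.get(intent, [])]


def improvement_cycle(training_data, eval_data, max_rounds=5, target=0.95):
    history = []
    for _ in range(max_rounds):
        model = train(training_data)
        accuracy, errors = evaluate(model, eval_data)
        history.append(accuracy)
        if accuracy >= target:
            break
        training_data = training_data + propose_fixes(errors)
    return model, history


training = [("pay bill", "PAYMENT_pay")]
evaluation = [
    ("pay invoice now", "PAYMENT_pay"),
    ("human please", "CONTACT_human_agent"),
    ("rates today", "RATE_check_rates"),
]
model, history = improvement_cycle(training, evaluation)
print([round(a, 2) for a in history])
```

The evaluation set stays fixed across rounds, so the accuracy history is comparable from one cycle to the next.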

    Datasets

    Pre-Built Datasets to train your LLMs

    • Data Services for Enterprise Generative AI: Data Creation & Evaluation, Model Finetuning & Verticalization
    • Text Annotation Tools to tag your data with Linguistic Knowledge: POS, NER, Topic
    • Lexical and Semantic Data for NLP applications in 77 languages and 25 variants
    • Synthetic Text Generation Tools to produce custom data with NLG technology
    • Pre-Built Datasets to train and evaluate your assistant/chatbot

    Datasets Available

    Each Prebuilt Question Answering (QA) dataset contains 20 to 40 of the most frequent intents for the corresponding vertical, designed to give you the best out-of-the-box performance possible.

    Our Prebuilt QA datasets are designed to deal with language register variations including polite/formal, colloquial and offensive language. We have profiled the language register use in user queries from a wide range of business sectors, and we use this information to generate training data with a similar profile, ensuring maximum linguistic coverage.

    We also introduce noise into the training data, including spelling mistakes, run-on words and missing punctuation. This realistic data makes our Prebuilt Datasets more resilient in the face of “noisy” input that is common in real life.

    Automotive (List of Intents)
    English
    CATEGORY_intent
    APPOINTMENT_cancel_appointment
    APPOINTMENT_reschedule_appointment
    APPOINTMENT_schedule_appointment
    BILLING_change_billing_information
    BILLING_dispute_invoice
    BILLING_invoices
    BILLING_set_up_billing_information
    CONTACT_customer_service
    CONTACT_finance_department
    CONTACT_human_agent
    CONTACT_roadside_assistance
    CONTACT_service_center
    DEALERSHIP_availability_vehicle
    DEALERSHIP_find_dealer
    FINANCE_buy_vehicle
    FINANCE_information_payment_in_full
    FINANCE_leasing
    FINANCE_pay_in_installments
    INFORMATION_diesel_vehicles
    INFORMATION_electric_vehicles
    INFORMATION_hybrid_electric_vehicles
    INFORMATION_hybrid_vehicles
    INFORMATION_offers
    INFORMATION_petrol_vehicles
    INFORMATION_pre-owned_vehicles
    LEASE_change_due_date
    LEASE_change_leasing_information
    MAINTENANCE_cancel_maintenance_plan
    MAINTENANCE_get_manual
    MAINTENANCE_information_maintenance_plans
    MAINTENANCE_sign_up_maintenance_plan
    PARTS_ACCESSORIES_buy_accessories
    PARTS_ACCESSORIES_buy_parts
    PARTS_ACCESSORIES_information_accessories
    PARTS_ACCESSORIES_information_parts
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    REPAIRS_check_status_repairs
    REPAIRS_find_closest_garage
    REPAIRS_loaner_vehicle
    REPAIRS_request_loaner_vehicle
    SERVICE_cancel_service_appointment
    SERVICE_cancel_service_plan
    SERVICE_information_service_plans
    SERVICE_reschedule_service_appointment
    SERVICE_schedule_service_appointment
    SERVICE_sign_up_service_plan
    WARRANTY_buy_extended_coverage
    WARRANTY_check_coverage
    WARRANTY_check_start_date_warranty
    WARRANTY_download_warranty
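For anyone consuming these labels programmatically, the `CATEGORY_intent` convention can be split back into its two parts. The sketch below assumes the category is the leading run of all-uppercase tokens, which is inferred from the labels listed on this page (including multi-word categories such as `PARTS_ACCESSORIES`) rather than from a published specification.

```python
from collections import defaultdict


def split_label(label):
    """Split 'CATEGORY_intent' into (category, intent)."""
    tokens = label.split("_")
    # The category is the leading run of all-uppercase tokens;
    # at least one token is always left for the intent.
    cut = 0
    while cut < len(tokens) - 1 and tokens[cut].isupper():
        cut += 1
    return "_".join(tokens[:cut]), "_".join(tokens[cut:])


def group_by_category(labels):
    groups = defaultdict(list)
    for label in labels:
        category, intent = split_label(label)
        groups[category].append(intent)
    return dict(groups)


labels = [
    "APPOINTMENT_cancel_appointment",
    "APPOINTMENT_schedule_appointment",
    "PARTS_ACCESSORIES_buy_parts",
    "WARRANTY_check_coverage",
]
for category, intents in group_by_category(labels).items():
    print(category, intents)
```

Stopping at the first token that is not all uppercase also handles labels whose intent part contains an acronym, such as `ATM_dispute_ATM_withdrawal`.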

     

    Retail Banking (List of Intents)
    English
    CATEGORY_intent
    ACCOUNT_check_recent_transactions
    ACCOUNT_close_account
    ACCOUNT_create_account
    ATM_dispute_ATM_withdrawal
    ATM_recover_swallowed_card
    CARD_activate_card
    CARD_activate_card_international_usage
    CARD_block_card
    CARD_cancel_card
    CARD_check_card_annual_fee
    CARD_check_current_balance_on_card
    CONTACT_customer_service
    CONTACT_human_agent
    FEES_check_fees
    FIND_find_ATM
    FIND_find_branch
    LOAN_apply_for_loan
    LOAN_apply_for_mortgage
    LOAN_cancel_loan
    LOAN_cancel_mortgage
    LOAN_check_loan_payments
    LOAN_check_mortgage_payments
    PASSWORD_get_password
    PASSWORD_set_up_password
    TRANSFER_cancel_transfer
    TRANSFER_make_transfer

     

    Education (List of Intents)
    English
    CATEGORY_intent
    ACCOMMODATION_accommodation
    AFTER_ADMISSION_change_program
    AFTER_ADMISSION_decline_admission_offer
    APPLICATION_INFORMATION_REQUEST_admission_requirements
    APPLICATION_INFORMATION_REQUEST_admission_requirements_international_students
    APPLICATION_INFORMATION_REQUEST_application_deadlines
    APPLICATION_INFORMATION_REQUEST_contact_admission_counseling_service
    APPLICATION_INFORMATION_REQUEST_documents_required_apply
    APPLICATION_INFORMATION_REQUEST_language_requirements
    APPLICATION_INFORMATION_REQUEST_medical_requirements
    CONTACT_human_agent
    DEGREE_INFORMATION_REQUEST_career_opportunities
    DEGREE_INFORMATION_REQUEST_information_degree
    FINANCIAL_AID_accept_admission_offer
    FINANCIAL_AID_apply_loan
    FINANCIAL_AID_information_scholarships
    FINANCIAL_AID_requirements_loan
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    POLICIES_university_polices
    STUDENT_PORTAL_course_schedule
    STUDENT_PORTAL_find_student_ID
    STUDENT_PORTAL_grades_report
    STUDENT_PORTAL_recover_password
    STUDENT_PORTAL_report_student_portal_issue
    STUDENT_PORTAL_sign_up_student_portal
    STUDENT_SUPPORT_contact_student_support
    UNIVERSITY_APPLICATION_PROCESS_application_status
    UNIVERSITY_APPLICATION_PROCESS_change_application
    UNIVERSITY_APPLICATION_PROCESS_sign_up_course
    UNIVERSITY_APPLICATION_PROCESS_submit_application
    UNIVERSITY_APPLICATION_PROCESS_withdraw_application
    UNIVERSITY_INFORMATION_REQUEST_information_campus
    UNIVERSITY_INFORMATION_REQUEST_information_programs
    UNIVERSITY_INFORMATION_REQUEST_information_registration_fees
    UNIVERSITY_INFORMATION_REQUEST_information_university

     

     

    Events & Ticketing (List of Intents)
    English
    CATEGORY_intent
    CANCELLATIONS_cancel_ticket
    CANCELLATIONS_check_cancellation_fee
    CANCELLATIONS_check_cancellation_policy
    CANCELLATIONS_track_cancellation
    CONTACT_customer_service
    CONTACT_event_organizer
    CONTACT_human_agent
    DELIVERY_delivery_options
    DELIVERY_delivery_period
    EVENTS_find_upcoming_events
    EVENTS_information_about_type_events
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    POLICY_check_privacy_policy
    REFUNDS_check_refund_policy
    REFUNDS_get_refund
    REFUNDS_track_refund
    TICKETS_buy_ticket
    TICKETS_change_personal_details_on_ticket
    TICKETS_find_ticket
    TICKETS_information_about_tickets
    TICKETS_sell_ticket
    TICKETS_transfer_ticket
    TICKETS_upgrade_ticket

     

    Field Service (List of Intents)
    English
    CATEGORY_intent
    APPOINTMENT_cancel
    APPOINTMENT_place
    APPOINTMENT_quote
    APPOINTMENT_reschedule
    APPOINTMENT_schedule
    APPOINTMENT_technician
    APPOINTMENT_time_arrival
    BILLING_check_bill
    CONTACT_customer_service
    CONTACT_human_agent
    CONTACT_technical_support
    FEEDBACK_file_complaint
    FEEDBACK_leave_review
    GENERAL_INFORMATION_location
    GENERAL_INFORMATION_rates
    GENERAL_INFORMATION_service_hours
    PAYMENT_pay
    PAYMENT_payment_methods
    QUOTE_accept_quote
    QUOTE_change_quote
    QUOTE_decline_quote
    SERVICES_emergencies
    SERVICES_information
    SERVICES_inspection
    SERVICES_installation
    SERVICES_maintenance
    SERVICES_repairs

     

    Healthcare (List of Intents)
    English
    CATEGORY_intent
    ADMISSION_PROCESS_information_about_the_admission_process
    APPOINTMENT_cancel_appointment
    APPOINTMENT_request_a_referral
    APPOINTMENT_reschedule_appointment
    APPOINTMENT_schedule_appointment
    BILLING_change_billing_information
    BILLING_dispute_invoice
    BILLING_invoices
    BILLING_set_up_billing_information
    CONTACT_admissions_office
    CONTACT_billing_department
    CONTACT_contact_information
    CONTACT_health_professional
    CONTACT_patient
    CONTACT_technical_support
    EMERGENCY_emergencies
    EMERGENCY_get_directions
    EMERGENCY_information_about_emergency_rooms
    HEALTH_INFORMATION_clinical_trials
    HEALTH_INFORMATION_events
    HEALTH_INFORMATION_health_advice
    LAB_RESULTS_information_lab_results
    LAB_RESULTS_see_lab_results
    LEGAL_medical_records
    LEGAL_patient_rights
    LEGAL_privacy_policy
    LOCATION_AND_DIRECTION_check_location
    LOCATION_AND_DIRECTION_directions
    LOCATION_AND_DIRECTION_find_healthcare_center
    LOCATION_AND_DIRECTION_parking
    LOCATION_AND_DIRECTION_public_transportation
    PATIENT_PORTAL_access_patient_portal
    PATIENT_PORTAL_information_about_patient_portal
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    REVIEW_file_complaint
    REVIEW_leave_review
    VISITING_INFORMATION_patient_&_visitor_guide
    VISITING_INFORMATION_visiting_hours

     

    Hospitality (List of Intents)
    English
    CATEGORY_intent
    BILLING_invoices
    CANCELLATION_FEES_cancellation_fees
    CHECK_IN_check_in
    CHECK_OUT_check_out
    CONTACT_human_agent
    EVENT_host_event
    FEEDBACK_file_complaint
    FEEDBACK_leave_review
    HOTEL_book_hotel
    HOTEL_cancel_hotel_reservation
    HOTEL_change_hotel_reservation
    HOTEL_check_hotel_facilities
    HOTEL_check_hotel_offers
    HOTEL_check_hotel_prices
    HOTEL_check_hotel_reservation
    HOTEL_search_hotel
    LUGGAGE_store_luggage
    MENU_check_menu
    NIGHT_add_night
    PARKING_SPACE_book_parking_space
    PETS_bring_pets
    POINTS_redeem_points
    REFUND_get_refund
    SHUTTLE_SERVICE_shuttle_service

     

    Insurance (List of Intents)
    English
    CATEGORY_intent
    AUTO_INSURANCE_information_auto_insurance
    CLAIMS_accept_settlement
    CLAIMS_file_claim
    CLAIMS_negotiate_settlement
    CLAIMS_receive_payment
    CLAIMS_reject_settlement
    CLAIMS_track_claim
    COMPLAINTS_appeal_denied_insurance_claim
    COMPLAINTS_dispute_invoice
    COMPLAINTS_file_complaint
    CONTACT_customer_service
    CONTACT_human_agent
    CONTACT_insurance_representative
    COVERAGE_change_coverage
    COVERAGE_check_coverage
    COVERAGE_downgrade_coverage
    COVERAGE_upgrade_coverage
    ENROLLMENT_buy_insurance_policy
    ENROLLMENT_cancel_insurance_policy
    ENROLLMENT_cancellation_fees
    ENROLLMENT_compare_insurance_policies
    GENERAL_INFORMATION_general_information
    HEALTH_INSURANCE_information_health_insurance
    HOME_INSURANCE_information_home_insurance
    INCIDENTS_report_incident
    INCIDENTS_schedule_appointment
    LIFE_INSURANCE_information_life_insurance
    PAYMENT_check_payments
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    PAYMENT_schedule_payments
    PET_INSURANCE_information_pet_insurance
    POLICY_change_personal_details
    QUOTE_calculate_insurance_quote
    QUOTE_check_rates
    RENEW_renew_insurance_policy
    TRAVEL_INSURANCE_information_travel_insurance

     

    Legal Services (List of Intents)
    English
    CATEGORY_intent
    BILLING_change_billing_information
    BILLING_dispute_invoice
    BILLING_invoices
    BILLING_set_up_billing_information
    CLAIMS_file_claim
    CLAIMS_track_claim
    CONTACT_customer_service
    CONTACT_human_agent
    CONTACT_lawyer
    GENERAL_INFORMATION_law_firm
    GENERAL_INFORMATION_legal_services
    GENERAL_INFORMATION_litigation_process
    GENERAL_INFORMATION_work_with_lawyer
    LEGAL_PLANS_buy_legal_plan
    LEGAL_PLANS_cancel_legal_plan
    LEGAL_PLANS_change_coverage
    LEGAL_PLANS_change_personal_details
    LEGAL_PLANS_check_coverage
    LEGAL_PLANS_compare_legal_plans
    LEGAL_PLANS_downgrade_legal_plan
    LEGAL_PLANS_information_legal_plans
    LEGAL_PLANS_renew_legal_plan
    LEGAL_PLANS_upgrade_legal_plan
    PAYMENT_fees
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    PRIVACY_AND_SECURITY_privacy_policy
    PRIVACY_AND_SECURITY_security

     

    Manufacturing (List of Intents)
    English
    CATEGORY_intent
    BILLING_change_billing_information
    BILLING_dispute_invoice
    BILLING_invoices
    BILLING_set_up_billing_information
    COMPANY_brand
    COMPANY_company
    COMPANY_customers
    COMPANY_earnings
    COMPANY_events
    COMPANY_facilities
    COMPANY_latest_news
    COMPANY_management_team
    CONTACT_customer_service
    CONTACT_human_agent
    CONTACT_sales_representative
    DELIVERY_delivery_period
    LEGAL_certifications
    ORDER_cancel_order
    ORDER_place_order
    ORDER_product_configuration
    ORDER_track_order
    PRODUCT_description
    PRODUCT_download_documentation
    PRODUCT_warranty
    QUOTE_accept_quote
    QUOTE_change_quote
    QUOTE_decline_quote
    QUOTE_request_quote
    SERVICES_information_services
    SHIPPING_change_shipping_adress
    SHIPPING_set_up_shipping_adress
    SHIPPING_shipping_points
    SUPPLY_CHAIN_product_process
    SUPPLY_CHAIN_supply_chain

     

    Media Streaming (List of Intents)
    English
    CATEGORY_intent
    CONTACT_customer_service
    CONTACT_human_agent
    CONTENT_report_copyright_infringement
    CONTENT_report_inappropiate_content
    FUNCTIONING_devices
    FUNCTIONING_general_use
    FUNCTIONING_quickstart_guide
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    PROGRAM_SCHEDULE_program_schedule
    PROGRAM_SCHEDULE_releases
    SETTINGS_change_language
    SETTINGS_change_subtitle_language
    SETTINGS_parental_control
    SETTINGS_recover_password
    SUBSCRIPTION_cancel_subscription
    SUBSCRIPTION_change_subscription
    SUBSCRIPTION_free_trial
    SUBSCRIPTION_premium_subscription
    SUBSCRIPTION_renew_subscription
    SUBSCRIPTION_subscribe
    SUBSCRIPTION_subscription
    SUBSCRIPTION_subscription_prices

     

    Mortgages & Loans (List of Intents)
    English
    CATEGORY_intent
    CONTACT_contact_agent
    CONTACT_customer_service
    CONTACT_human_agent
    FEES_check_late_payment_fee
    FEES_lock_interest_rate
    INFORMATION_REQUEST_borrowing_limit
    INFORMATION_REQUEST_check_application_process
    INFORMATION_REQUEST_check_application_requirements
    INFORMATION_REQUEST_check_fees
    INFORMATION_REQUEST_check_loans
    INFORMATION_REQUEST_compare_loans
    INFORMATION_REQUEST_estimate_loan_payment
    LOAN_APPLICATION_PROCESS_change_application
    LOAN_APPLICATION_PROCESS_check_application_status
    LOAN_APPLICATION_PROCESS_closing
    LOAN_APPLICATION_PROCESS_submit_documentation
    LOAN_APPLICATION_PROCESS_withdraw_application
    LOAN_APPLICATION_apply_for_joint_loan
    LOAN_APPLICATION_apply_for_loan
    LOAN_APPLICATION_consolidate_debt
    LOAN_APPLICATION_reapply_for_loan
    LOAN_MODIFICATIONS_add_co-borrower
    LOAN_MODIFICATIONS_change_due_date
    LOAN_MODIFICATIONS_extend_loan
    PAYMENT_check_loan_terms
    PAYMENT_check_repayment_methods
    PAYMENT_make_additional_payments
    PAYMENT_pay_off_loan
    PAYMENT_refinance_loan
    PAYMENT_request_payment_arrangement
    PAYMENT_split_payment
    PAYMENT_turn_off_recurring_payments
    PAYMENT_turn_on_recurring_payments
    PERSONAL_INFORMATION_change_personal_data
    PERSONAL_INFORMATION_change_preferred_bank_account
    PERSONAL_INFORMATION_check_credit_report
    PERSONAL_INFORMATION_check_credit_score
    PERSONAL_INFORMATION_check_loan_details
    PERSONAL_INFORMATION_check_privacy_policy

     

    Moving & Storage (List of Intents)
    English
    CATEGORY_intent
    COMPANY_check_company
    COMPANY_check_moves
    COMPLAINT_file_complaint
    CONTACT_customer_service
    CONTACT_human_agent
    FEEDBACK_file_complaint
    MOVE_MANAGEMENT_cancel_move
    MOVE_MANAGEMENT_delay_move
    MOVE_PREPARATION_check_delivery_options
    MOVE_PREPARATION_information_quotes
    MOVE_PREPARATION_prepare_move
    MOVE_PREPARATION_request_quote
    MOVING_PROCESS_information_delivery
    MOVING_PROCESS_information_pick_up
    MOVING_PROCESS_search_for_tracking_number
    MOVING_PROCESS_track_shipment
    PACKING_AND_ITEMS_information_packing
    PACKING_AND_ITEMS_move_dangerous_items
    PACKING_AND_ITEMS_move_special_items
    PACKING_AND_ITEMS_transport_pets
    PAPERWORK_AND_DOCUMENTS_check_insurance
    PAPERWORK_AND_DOCUMENTS_information_bill_of_lading
    PAPERWORK_AND_DOCUMENTS_information_order_for_service
    PAPERWORK_AND_DOCUMENTS_report_contract_issue
    PAPERWORK_AND_DOCUMENTS_sign_order_for_service
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    STORAGE_rent_storage_unit

     

    Real Estate/Construction (List of Intents)
    English
    CATEGORY_intent
    ACCOUNT_change_account
    ACCOUNT_create_account
    ACCOUNT_delete_account
    ACCOUNT_edit_account
    APPOINTMENT_cancel_appointment
    APPOINTMENT_reschedule_appointment
    APPOINTMENT_schedule_appointment
    CHARACTERISTICS_check_accessibility
    CHARACTERISTICS_check_asking_price
    CHARACTERISTICS_check_availability
    CHARACTERISTICS_check_characteristics
    CHARACTERISTICS_check_equipment
    CHARACTERISTICS_check_location
    CHARACTERISTICS_check_number_of_rooms
    CHARACTERISTICS_check_size
    CONTACT_customer_service
    CONTACT_human_agent
    CONTACT_owner
    LIST_PROPERTY_add_pictures
    LIST_PROPERTY_change_asking_price
    LIST_PROPERTY_change_rent_to_sale
    LIST_PROPERTY_create_listing
    LIST_PROPERTY_delete_pictures
    LIST_PROPERTY_edit_listing
    LIST_PROPERTY_remove_listing
    LOOK_FOR_PROPERTY_look_for_property
    REPORT_report_listing
    VISITING_HOURS_visiting_hours

     

    Restaurant & Bar Chains (List of Intents)
    English
    CATEGORY_intent
    CATERING_cancel_catering
    CATERING_change_catering
    CATERING_information_about_catering
    CATERING_order_catering
    COMPANY_information_about_company
    COMPANY_locations
    CONTACT_customer_service
    CONTACT_human_agent
    EVENTS_events
    FEEDBACK_file_complaint
    FEEDBACK_leave_review
    FRANCHISE_apply_for_franchise
    FRANCHISE_find_franchise
    FRANCHISE_franchising
    LEGAL_privacy_policy
    MENU_check_menu
    MENU_check_offers
    MENU_information_about_allergens
    ONLINE_ORDER_cancel_order
    ONLINE_ORDER_change_order
    ONLINE_ORDER_delivery_time
    ONLINE_ORDER_order_food_online
    ONLINE_ORDER_order_issue
    ONLINE_ORDER_track_order
    PAYMENT_payment_methods
    PAYMENT_report_payment_issue
    RESERVATIONS_cancel_reservation
    RESERVATIONS_change_reservation
    RESERVATIONS_make_reservation
    RESTAURANT_find_restaurant

     

    Retail/E-commerce (List of Intents)
    English
    CATEGORY_intent
    ACCOUNT_change_account
    ACCOUNT_order_history
    ACCOUNT_recover_password
    APP_WEBSITE_technical_issue
    APP_WEBSITE_website_functionality
    CONTACT_human_agent
    DELIVERY_damaged_item
    DELIVERY_damaged_package
    DELIVERY_delivery_issue
    DELIVERY_missing_item
    DELIVERY_shipping_costs
    DELIVERY_wrong_item
    FEEDBACK_submit_consumer_feedback
    ORDER_cancel_order
    ORDER_change_order
    ORDER_order_status
    ORDER_request_invoice
    PAYMENT_pay
    PAYMENT_report_payment_issue
    PRODUCT_availability
    PRODUCT_exchange_product
    PRODUCT_exchange_status
    PRODUCT_product_information
    PRODUCT_product_issue
    PRODUCT_refund_policy
    PRODUCT_refund_status
    PRODUCT_request_refund
    PRODUCT_return_order
    PRODUCT_return_policy
    PRODUCT_submit_product_feedback
    PRODUCT_submit_product_idea
    STORE_store_location
    STORE_store_opening_hours
    USER_request_right_to_rectification

     

    Telecommunications (List of Intents)
    English
    CATEGORY_intent
    BILLING_check_bill
    BILLING_dispute_bill
    COMPLAINTS_get_compensation
    COMPLAINTS_report_poor_signal_coverage
    COMPLAINTS_report_problem
    CONSUMPTION_check_excess_data_charges
    CONSUMPTION_check_usage
    CONSUMPTION_set_usage_limits
    CONTACT_customer_service
    CONTACT_human_agent
    PAYMENT_pay
    PAYMENT_payment_methods
    PAYMENT_schedule_payments
    SERVICES_activate_call_management_services
    SERVICES_activate_phone
    SERVICES_activate_roaming
    SERVICES_check_internet_availability
    SERVICES_check_signal_coverage
    SERVICES_deactivate_call_management_services
    SERVICES_deactivate_phone
    SERVICES_install_internet
    SUBSCRIPTION_cancel_plan
    SUBSCRIPTION_change_plan
    SUBSCRIPTION_change_provider
    SUBSCRIPTION_check_cancellation_fee
    SUBSCRIPTION_sign_up_for_plan

     

    Travel (List of Intents)
    English
    CATEGORY_intent
    ARRIVAL_TIME_check_arrival_time
    BAGGAGE_checked_baggage_allowance
    BOARDING_PASS_get_boarding_pass
    BOARDING_PASS_print_boarding_pass
    BOOK_book_flight
    BOOK_book_trip
    CANCELLATION_FEES_cancellation_fees
    CANCEL_cancel_flight
    CANCEL_cancel_trip
    CHANGE_change_flight
    CHANGE_change_trip
    CHECK_IN_check_in
    CHECK_PRICES_check_flight_prices
    CONTACT_human_agent
    DEPARTURE_TIME_check_departure_time
    FLIGHT_STATUS_check_flight_status
    INSURANCE_check_flight_insurance_coverage
    INSURANCE_check_trip_insurance_coverage
    INSURANCE_purchase_flight_insurance
    INSURANCE_purchase_trip_insurance
    INSURANCE_search_flight_insurance
    INSURANCE_search_trip_insurance
    OFFERS_check_flight_offer
    OFFERS_check_trip_offers
    PRICES_check_trip_prices
    REFUND_get_refund
    RESERVATION_check_flight_reservation
    SEARCH_search_flight
    SEARCH_search_trip
    SEAT_change_seat
    SEAT_choose_seat
    TRIP_DETAILS_check_trip_details
    TRIP_PLAN_check_trip_plan

     

    Utilities (List of Intents)
    English
    CATEGORY_intent
    ACCOUNT_change_account_holder
    BILLING_invoices
    COMPLAINTS_complaints
    CONSUMPTION_consumption
    CONTACT_customer_service
    CONTACT_human_agent
    CONTRACT_cancel_contract
    HOUSE_moving_house
    INSPECTION_request_inspection
    MAINTENANCE_maintenance
    OUTAGES_check_outages
    PAYMENT_pay
    RATE_check_rates
    RATE_compare_rates
    REPAIR_available_repair_times
    REPAIR_cost_repair
    REPAIR_request_repair
    SERVICE_service
    SIGN_UP_sign_up_services
    SUBSCRIPTION_cancellation_fees
    SWITCH_switch_provider

     

    Wealth Management (List of Intents)
    English
    CATEGORY_intent
    ACCOUNT_create_account
    ACCOUNT_delete_account
    ACCOUNT_recover_password
    ACCOUNT_requirements_to_create_account
    BECOME_CLIENT_arrange_meeting
    BECOME_CLIENT_become_client
    BECOME_CLIENT_calculate_portfolio_risk
    BECOME_CLIENT_check_fees
    BECOME_CLIENT_check_services
    BECOME_CLIENT_get_manager
    BECOME_CLIENT_minimum_amount_to_invest
    BECOME_CLIENT_run_simulator
    CONTACT_contact_manager
    CONTACT_customer_service
    CONTACT_human_agent
    MANAGEMENT_implement_own_plan
    MANAGEMENT_transfer_money_to_account
    MANAGEMENT_withdraw_money_from_account
    PORTFOLIO_check_balances
    PORTFOLIO_check_portfolio
    PORTFOLIO_portfolio_performance
    PORTFOLIO_portfolio_value
    PORTFOLIO_search_for_stocks
    PORTFOLIO_set_price_alert

     

    How did we select these intents?

    To select the intents for each vertical listed above, we followed an automated five-step process:

    1. Select a representative set of texts about the domain

    2. Extract frequent actions by parsing those texts and collecting the most common SUBJECT + VERB + OBJECT triples


    3. Normalize frequent actions by analyzing vertical-specific synonyms: for example, “purchase + item” and “buy + product” can both be normalized to “purchase + product”

    4. Build a bottom-up knowledge graph that automatically structures intents through their SUBJECT + VERB + OBJECT triples


    5. Curate, on request, a custom ontology specific to each client or chatbot
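
    Steps 2 to 4 can be illustrated with a minimal sketch. This is not Bitext's implementation. It assumes the triples have already been extracted by a parser, uses a hypothetical two-entry synonym table, and stands in for the knowledge graph with a plain category-to-intents mapping. As a simplification, the frequency threshold is applied after normalization so that synonymous actions are counted together.

```python
from collections import Counter, defaultdict

# Hypothetical synonym tables (step 3); real ones would be vertical-specific.
VERB_SYNONYMS = {"buy": "purchase"}
OBJECT_SYNONYMS = {"item": "product"}


def normalize(verb, obj):
    """Map a VERB + OBJECT pair to its canonical form."""
    return VERB_SYNONYMS.get(verb, verb), OBJECT_SYNONYMS.get(obj, obj)


def frequent_actions(triples, min_count=2):
    """Count normalized VERB + OBJECT actions and keep the frequent ones."""
    counts = Counter(normalize(verb, obj) for _subject, verb, obj in triples)
    return {action: n for action, n in counts.items() if n >= min_count}


def build_intent_graph(actions):
    """Group actions bottom-up: each object becomes a category node."""
    graph = defaultdict(list)
    for verb, obj in sorted(actions):
        graph[obj.upper()].append(f"{verb}_{obj}")
    return dict(graph)


# Illustrative input only: SUBJECT + VERB + OBJECT triples from a parser.
triples = [
    ("customer", "buy", "product"),
    ("customer", "purchase", "item"),
    ("customer", "cancel", "order"),
    ("user", "cancel", "order"),
    ("user", "track", "parcel"),
]

graph = build_intent_graph(frequent_actions(triples))
labels = [f"{cat}_{intent}" for cat, intents in graph.items() for intent in intents]
print(labels)
```

    In this toy run, “buy + product” and “purchase + item” collapse into a single action, and the one-off “track + parcel” falls below the threshold, so only two CATEGORY_intent labels are produced.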

    Bitext NLP Data Overview

    Bitext’s Deep Linguistic Analysis

    Bitext develops comprehensive NLP datasets and multilingual tools (such as lexical, syntactic, and semantic annotation tools) in up to 77 languages.

    Bitext offers multilingual datasets, designed for enterprise use, to analyze and tag text at three levels:

     

    • Lexical
    • Syntactic
    • Semantic

    Lexical Level and Lemmatization

    At the lexical level, the main component is the lemmatizer, which has integrated tools to perform decompounding or word segmentation (something required by some languages to perform proper lemmatization).
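
    To show why decompounding matters for lemmatization, here is a toy sketch. It is not Bitext's lemmatizer. It assumes a tiny hypothetical German lexicon that maps surface forms to lemmas, and it ignores linking elements and ambiguity, which a real decompounder has to handle.

```python
# Hypothetical toy lexicon: lowercase surface form -> lemma.
LEXICON = {
    "haus": "Haus",
    "tür": "Tür",
    "türen": "Tür",
    "schlüssel": "Schlüssel",
}


def decompound(word, lexicon=LEXICON):
    """Split a word into known lexicon forms, trying the longest prefix first.

    Returns a list of parts, or None if the word cannot be fully covered.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in lexicon:
            rest = decompound(word[end:], lexicon)
            if rest is not None:
                return [prefix] + rest
    return None


def lemmatize(word, lexicon=LEXICON):
    """Return the lemmas of a word's parts, or the word itself if unknown."""
    parts = decompound(word, lexicon)
    if parts is None:
        return [word]
    return [lexicon[part] for part in parts]


if __name__ == "__main__":
    print(lemmatize("Haustüren"))
    print(lemmatize("Haustürschlüssel"))
```

    Without the splitting step, a compound such as “Haustüren” would simply be missing from the lexicon. With it, each part can be looked up and lemmatized on its own.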

    The lemmatizer can additionally be packaged to cover the full language-analysis pipeline, from sentence segmentation to full parsing, and includes tools such as spell-checking.

    The two components of the lemmatizer, data and software, can be distributed together or separately. All these tools are available in 77 languages and 25 language variants.

    Syntactic Level and Parsing

    At the syntactic level, the parser is the main component. The parser analyzes the structure of the sentences in the text and is used for tasks like POS Tagging and Phrase Extraction. It also serves as the base component for various semantic-level tasks, such as Named Entity Recognition (NER), Topic-Level Sentiment Analysis, and Generation of Synthetic Text. We have developed parsers for 21 languages and continue to add more.

    For a full list of services, at the lexical, syntactic and semantic levels, check our linguistic services.

    Example of dataset for Customer Service

    MADRID, SPAIN

    Camino de las Huertas, 20, 28223 Pozuelo
    Madrid, Spain

    SAN FRANCISCO, USA

    541 Jefferson Ave Ste 100, Redwood City
    CA 94063, USA