Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide functions for reference data #312

Merged
merged 23 commits into from
Jul 30, 2024
Merged

Provide functions for reference data #312

merged 23 commits into from
Jul 30, 2024

Conversation

jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Jul 29, 2024

Summary by CodeRabbit

  • New Features
    • Introduced functions to retrieve detailed chemical information based on element symbols from Mendeleev's database and Wolfram Alpha.
    • Added functionality to fetch experimental elastic properties of elements from Wikipedia.
    • Enhanced the environment with additional dependencies for improved capabilities and functionality.

These enhancements provide users with a structured and efficient way to access extensive chemical data, benefiting educational and research purposes.

Copy link

coderabbitai bot commented Jul 29, 2024

Walkthrough

This update significantly enhances the capability to retrieve detailed chemical information from sources like Mendeleev, Wikipedia, and Wolfram Alpha. New modules streamline access to elemental properties, promoting better data accessibility for research and education. Additionally, the environment configurations have been optimized, with various dependencies added and others removed, ensuring a more streamlined and effective functionality.

Changes

File Path Change Summary
.ci_support/environment-old.yml Major updates to dependencies: several packages added (e.g., dynaphopy, requests), and others removed (e.g., phonopy, seekpath).
.ci_support/environment.yml New dependencies added (e.g., lxml, mendeleev, pandas, phonopy, requests, seekpath), while previous outdated entries were removed.
atomistics/referencedata/__init__.py Establishes a public API for chemical information retrieval, managing imports to control function access.
atomistics/referencedata/mendeleev.py Introduces get_chemical_information, which retrieves extensive data about elements based on their symbols.
atomistics/referencedata/wikipedia.py Implements get_elastic_properties to fetch elastic properties of elements from Wikipedia using their symbols.
atomistics/referencedata/wolframalpha.py Adds get_chemical_information, for extracting various chemical properties from Wolfram Alpha, including physical and thermal characteristics.
tests/test_referencedata.py New test suite validating retrieval functions for chemical data, ensuring robustness and correctness under specified conditions.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Mendeleev
    participant Wikipedia
    participant Wolfram

    User->>Mendeleev: Request chemical information
    Mendeleev-->>User: Return detailed properties

    User->>Wikipedia: Request elastic properties
    Wikipedia-->>User: Return elastic property values

    User->>Wolfram: Request chemical data
    Wolfram-->>User: Return comprehensive chemical properties
Loading

🐇 In fields of data where elements reside,
New functions appear, oh what a ride!
Mendeleev, Wolfram, and Wikipedia's might,
Share secrets of atoms, in day and in night!
With knowledge so vast, for all to embrace,
Let's leap into science, a beautiful chase! ✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@jan-janssen jan-janssen marked this pull request as draft July 29, 2024 17:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d6cb030 and dc43179.

Files selected for processing (3)
  • atomistics/referencedata/mendeleevdb.py (1 hunks)
  • atomistics/referencedata/wiki.py (1 hunks)
  • atomistics/referencedata/wolfram.py (1 hunks)
Additional context used
Ruff
atomistics/referencedata/wolfram.py

83-83: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


106-109: Combine if branches using logical or operator

Combine if branches

(SIM114)

Additional comments not posted (1)
atomistics/referencedata/wiki.py (1)

21-23: Handle potential URL changes.

The Wikipedia URL might change, breaking the function. Consider adding a fallback mechanism or updating the URL dynamically.

Is the URL expected to remain stable? If not, consider adding a fallback mechanism or a way to update the URL dynamically.

Comment on lines 25 to 29
for i, p in enumerate(property_lst):
df_tmp = df_lst[i]
property_dict[p] = float(
df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for missing chemical symbols.

The function assumes that the chemical symbol will always be found. Add error handling for cases where the symbol is not found.

-        property_dict[p] = float(
-            df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]
-        )
+        try:
+            property_dict[p] = float(
+                df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]
+            )
+        except KeyError:
+            property_dict[p] = None
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for i, p in enumerate(property_lst):
df_tmp = df_lst[i]
property_dict[p] = float(
df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]
)
for i, p in enumerate(property_lst):
df_tmp = df_lst[i]
try:
property_dict[p] = float(
df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]
)
except KeyError:
property_dict[p] = None

Comment on lines 4 to 14
def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict:
"""
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.

Args:
chemical_symbol (str): Chemical symbol of the element.
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus

Returns:
str: Property value (various types): Value of the property for the given element, if available.
"""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve the function documentation.

The return type in the docstring should be dict instead of str to match the actual return type.

-        str: Property value (various types): Value of the property for the given element, if available.
+        dict: Dictionary containing the property values for the given element, if available.
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict:
"""
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.
Args:
chemical_symbol (str): Chemical symbol of the element.
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus
Returns:
str: Property value (various types): Value of the property for the given element, if available.
"""
def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict:
"""
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.
Args:
chemical_symbol (str): Chemical symbol of the element.
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus
Returns:
dict: Dictionary containing the property values for the given element, if available.
"""

Comment on lines 113 to 114
df = fetch_table("elements")
return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider caching the fetched data.

Fetching the table every time the function is called might be inefficient. Consider caching the data to improve performance.

from functools import lru_cache

@lru_cache(maxsize=1)
def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict:
    ...

Comment on lines 4 to 12
def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict:
"""
Get information of a given chemical element

Args:
chemical_symbol: Chemical Element like Au for Gold

Returns:
dict: Dictionary with the following keys
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for missing chemical symbols.

The function assumes that the chemical symbol will always be found. Add error handling for cases where the symbol is not found.

-    return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
+    try:
+        return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
+    except KeyError:
+        return {}
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict:
"""
Get information of a given chemical element
Args:
chemical_symbol: Chemical Element like Au for Gold
Returns:
dict: Dictionary with the following keys
def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict:
"""
Get information of a given chemical element
Args:
chemical_symbol: Chemical Element like Au for Gold
Returns:
dict: Dictionary with the following keys
"""
try:
return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
except KeyError:
return {}

Comment on lines 213 to 223
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in tqdm(data_dict.items())
],
axis=1,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add logging for download progress.

The function might benefit from logging the download progress to provide feedback to the user.

import logging

logging.basicConfig(level=logging.INFO)

def _wolframalpha_download():
    logging.info("Starting download from Wolfram Alpha...")
    result = pandas.concat(
        [
            _collect(
                url=v["url"],
                column=k,
                select_function=v["select_function"],
                current_filter=v["current_filter"],
            ).set_index("element")
            for k, v in tqdm(data_dict.items())
        ],
        axis=1,
        sort=False,
    )
    result["volume"] = result.apply(
        lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
    )
    data_path = os.path.join(os.path.dirname(__file__), "..", "data")
    os.makedirs(data_path, exist_ok=True)
    result.to_csv(os.path.join(data_path, "wolfram.csv"))
    logging.info("Download completed.")

Comment on lines 260 to 263
filename = os.path.join(os.path.dirname(__file__), "..", "data", "wolfram.csv")
if not os.path.exists(filename):
_wolframalpha_download()
df = pandas.read_csv(filename)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider caching the fetched data.

Fetching the data every time the function is called might be inefficient. Consider caching the data to improve performance.

from functools import lru_cache

@lru_cache(maxsize=1)
def get_chemical_information_from_wolframalpha(chemical_element):
    ...

Comment on lines 212 to 232
def _wolframalpha_download():
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in tqdm(data_dict.items())
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "..", "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for network-related errors.

The function does not handle network-related errors, which might cause it to fail silently.

-    result = pandas.concat(
+    try:
+        result = pandas.concat(
-    result.to_csv(os.path.join(data_path, "wolfram.csv"))
+        result.to_csv(os.path.join(data_path, "wolfram.csv"))
+    except requests.RequestException as e:
+        print(f"Network error occurred: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _wolframalpha_download():
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in tqdm(data_dict.items())
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "..", "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))
def _wolframalpha_download():
try:
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in tqdm(data_dict.items())
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "..", "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))
except requests.RequestException as e:
print(f"Network error occurred: {e}")

Comment on lines 234 to 264
def get_chemical_information_from_wolframalpha(chemical_element):
"""
Get information of a given chemical element
Args:
chemical_element: Chemical Element like Au for Gold
Returns:
dict: Dictionary with the following keys
element: chemical element
thermalcondictivity: thermal conductivity
atomicradius: calculated distance from nucleus of outermost electron
bulkmodulus: bulk modulus (incompressibility)
shearmodulus: shear modulus of solid
youngmodulus: Young's modulus of solid
poissonratio: Poisson ratio of solid
density: density at standard temperature and pressure
liquiddensity: liquid density at melting point
thermalexpansion: linear thermal expansion coefficient
meltingpoint: melting temperature in kelvin
vaporizationheat: latent heat for liquid-gas transition
specificheat: specific heat capacity
latticeconstant: crystal lattice constants
crystal: basic crystal lattice structure
volmolar: molar volume
mass: average atomic weight in atomic mass units
volume: Volume
"""
filename = os.path.join(os.path.dirname(__file__), "..", "data", "wolfram.csv")
if not os.path.exists(filename):
_wolframalpha_download()
df = pandas.read_csv(filename)
return df[df.element == chemical_element].squeeze(axis=0).to_dict()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for missing chemical elements.

The function assumes that the chemical element will always be found. Add error handling for cases where the element is not found.

-    return df[df.element == chemical_element].squeeze(axis=0).to_dict()
+    try:
+        return df[df.element == chemical_element].squeeze(axis=0).to_dict()
+    except KeyError:
+        return {}

Committable suggestion was skipped due to low confidence.

@jan-janssen jan-janssen requested a review from samwaseda July 29, 2024 20:32
@jan-janssen jan-janssen marked this pull request as ready for review July 29, 2024 20:32
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Outside diff range, codebase verification and nitpick comments (4)
tests/test_referencedata.py (3)

29-31: Consider adding more assertions.

The test currently checks only the atomic number. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.

self.assertEqual(al_data["atomic_number"], 13)
self.assertEqual(al_data["symbol"], "Al")
self.assertEqual(al_data["name"], "Aluminum")

37-39: Consider adding more assertions.

The test currently checks only the lattice constant. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.

self.assertEqual(al_data["latticeconstant"], "(404.95, 404.95, 404.95)")
self.assertEqual(al_data["atomic_number"], 13)
self.assertEqual(al_data["symbol"], "Al")
self.assertEqual(al_data["name"], "Aluminum")

44-53: Consider adding more assertions.

The test currently checks only the elastic properties. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.

self.assertEqual(al_data["bulk_modulus"], 76.0)
self.assertEqual(al_data["poissons_ratio"], 0.35)
self.assertEqual(al_data["shear_modulus"], 26.0)
self.assertEqual(al_data["youngs_modulus"], 70.0)
self.assertEqual(al_data["symbol"], "Al")
self.assertEqual(al_data["name"], "Aluminum")
pyproject.toml (1)

59-64: LGTM! Consider adding a comment explaining the purpose of the experimental section.

The addition of the experimental section is a good way to manage dependencies that are not yet fully integrated into the main project workflow. Adding a brief comment can help other developers understand its purpose.

[project.optional-dependencies]
# Dependencies that are still experimental and not fully integrated
experimental = [
    "lxml==5.2.2",
    "mendeleev==0.17.0",
    "pandas==2.2.2",
    "requests==2.32.3",
]
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between dc43179 and a013161.

Files selected for processing (6)
  • .ci_support/environment-old.yml (1 hunks)
  • .ci_support/environment.yml (1 hunks)
  • atomistics/referencedata/init.py (1 hunks)
  • atomistics/referencedata/wolfram.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_referencedata.py (1 hunks)
Additional context used
Ruff
tests/test_referencedata.py

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

atomistics/referencedata/wolfram.py

82-82: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


105-108: Combine if branches using logical or operator

Combine if branches

(SIM114)

Additional comments not posted (20)
.ci_support/environment.yml (6)

6-6: Dependency Addition: lxml

The addition of lxml =5.2.2 looks good. Ensure that this version is compatible with other dependencies in the environment.


7-7: Dependency Addition: mendeleev

The addition of mendeleev =0.17.0 looks good. Ensure that this version is compatible with other dependencies in the environment.


9-9: Dependency Addition: pandas

The addition of pandas =2.2.2 looks good. Ensure that this version is compatible with other dependencies in the environment.


10-10: Dependency Addition: phonopy

The addition of phonopy =2.26.6 looks good. Ensure that this version is compatible with other dependencies in the environment.


11-11: Dependency Addition: requests

The addition of requests =2.32.3 looks good. Ensure that this version is compatible with other dependencies in the environment.


13-13: Dependency Addition: seekpath

The addition of seekpath =2.1.0 looks good. Ensure that this version is compatible with other dependencies in the environment.

.ci_support/environment-old.yml (7)

5-5: Dependency Addition: dynaphopy

The addition of dynaphopy =1.17.5 looks good. Ensure that this version is compatible with other dependencies in the environment.


7-7: Dependency Addition: iprpy-data

The addition of iprpy-data =2023.07.25 looks good. Ensure that this version is compatible with other dependencies in the environment.


8-8: Dependency Addition: jinja2

The addition of jinja2 =2.11.3 looks good. Ensure that this version is compatible with other dependencies in the environment.


9-9: Dependency Addition: lammps

The addition of lammps =2022.06.23 looks good. Ensure that this version is compatible with other dependencies in the environment.


10-10: Dependency Addition: lxml

The addition of lxml =4.9.1 looks good. Ensure that this version is compatible with other dependencies in the environment.


11-11: Dependency Addition: mendeleev

The addition of mendeleev =0.10.0 looks good. Ensure that this version is compatible with other dependencies in the environment.


16-16: Dependency Addition: requests

The addition of requests =2.24.0 looks good. Ensure that this version is compatible with other dependencies in the environment.

atomistics/referencedata/__init__.py (3)

1-1: Import Statement: get_experimental_elastic_property_wikipedia

The import of get_experimental_elastic_property_wikipedia from atomistics.referencedata.wiki looks good. Ensure that this function exists in the specified module.


3-14: Import Statements and Error Handling: get_chemical_information_from_mendeleev and get_chemical_information_from_wolframalpha

The import of get_chemical_information_from_mendeleev and get_chemical_information_from_wolframalpha from atomistics.referencedata.mendeleevdb looks good. The try-except block effectively handles potential import errors. Ensure that these functions exist in the specified module.


9-17: all Declaration

The __all__ declaration correctly lists the imported functions. This ensures that only the specified functions are accessible when the module is imported.

atomistics/referencedata/wolfram.py (4)

17-21: LGTM!

The function correctly handles the conversion of density values from g/l to kg/l.


24-28: LGTM!

The function correctly handles the conversion of split string values to float.


31-36: LGTM!

The function correctly handles the conversion of lattice constant values from a split string.


39-43: LGTM!

The function correctly handles the conversion of scientific notation strings to float values.

Comment on lines 211 to 230
def _wolframalpha_download():
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in data_dict.items()
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for network-related errors.

The function does not handle network-related errors, which might cause it to fail silently.

-    result = pandas.concat(
+    try:
+        result = pandas.concat(
-    result.to_csv(os.path.join(data_path, "wolfram.csv"))
+        result.to_csv(os.path.join(data_path, "wolfram.csv"))
+    except requests.RequestException as e:
+        print(f"Network error occurred: {e}")

Committable suggestion was skipped due to low confidence.

Comment on lines 46 to 50
def _extract_entry(df, n):
for i in range(int(len(df.columns) / 2)):
if n in df[i].values:
v = df[i + 1].values[df[i].values.tolist().index(n)]
return v
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for missing values.

The function does not handle cases where the value is not found in the DataFrame, which might cause it to fail silently.

-            v = df[i + 1].values[df[i].values.tolist().index(n)]
+            try:
+                v = df[i + 1].values[df[i].values.tolist().index(n)]
+            except ValueError:
+                print(f"Value {n} not found in DataFrame.")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _extract_entry(df, n):
for i in range(int(len(df.columns) / 2)):
if n in df[i].values:
v = df[i + 1].values[df[i].values.tolist().index(n)]
return v
def _extract_entry(df, n):
for i in range(int(len(df.columns) / 2)):
if n in df[i].values:
try:
v = df[i + 1].values[df[i].values.tolist().index(n)]
except ValueError:
print(f"Value {n} not found in DataFrame.")
return v

Comment on lines 233 to 263
def get_chemical_information_from_wolframalpha(chemical_element):
"""
Get information of a given chemical element
Args:
chemical_element: Chemical Element like Au for Gold
Returns:
dict: Dictionary with the following keys
element: chemical element
thermalcondictivity: thermal conductivity
atomicradius: calculated distance from nucleus of outermost electron
bulkmodulus: bulk modulus (incompressibility)
shearmodulus: shear modulus of solid
youngmodulus: Young's modulus of solid
poissonratio: Poisson ratio of solid
density: density at standard temperature and pressure
liquiddensity: liquid density at melting point
thermalexpansion: linear thermal expansion coefficient
meltingpoint: melting temperature in kelvin
vaporizationheat: latent heat for liquid-gas transition
specificheat: specific heat capacity
latticeconstant: crystal lattice constants
crystal: basic crystal lattice structure
volmolar: molar volume
mass: average atomic weight in atomic mass units
volume: Volume
"""
filename = os.path.join(os.path.dirname(__file__), "data", "wolfram.csv")
if not os.path.exists(filename):
_wolframalpha_download()
df = pandas.read_csv(filename)
return df[df.element == chemical_element].squeeze(axis=0).to_dict()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for missing chemical elements.

The function assumes that the chemical element will always be found. Add error handling for cases where the element is not found.

-    return df[df.element == chemical_element].squeeze(axis=0).to_dict()
+    try:
+        return df[df.element == chemical_element].squeeze(axis=0).to_dict()
+    except KeyError:
+        return {}

Committable suggestion was skipped due to low confidence.

Comment on lines 9 to 14
def _get_content_from_url(url):
content = pandas.read_html(requests.get(url).text)
if len(content[8]) > 1:
return content[8]
else:
return content[9]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for network-related errors.

The function does not handle network-related errors, which might cause it to fail silently.

-    content = pandas.read_html(requests.get(url).text)
+    try:
+        response = requests.get(url)
+        response.raise_for_status()
+        content = pandas.read_html(response.text)
+    except requests.RequestException as e:
+        print(f"Network error occurred: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _get_content_from_url(url):
content = pandas.read_html(requests.get(url).text)
if len(content[8]) > 1:
return content[8]
else:
return content[9]
def _get_content_from_url(url):
try:
response = requests.get(url)
response.raise_for_status()
content = pandas.read_html(response.text)
except requests.RequestException as e:
print(f"Network error occurred: {e}")
return None
if len(content[8]) > 1:
return content[8]
else:
return content[9]

Comment on lines 72 to 82
def _extract_lst(df, column, select_function, current_filter):
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError:
raise ValueError(column, el, v)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve error handling by providing more context.

The function raises a ValueError without providing context about the error. Use raise ... from None to distinguish it from errors in exception handling.

-    except ValueError:
-        raise ValueError(column, el, v)
+    except ValueError as e:
+        raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _extract_lst(df, column, select_function, current_filter):
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError:
raise ValueError(column, el, v)
def _extract_lst(df, column, select_function, current_filter):
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError as e:
raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e
Tools
Ruff

82-82: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines 103 to 108
def _get_volume(lat_lst, crystal):
if not isinstance(lat_lst, float) and len(lat_lst) == 3:
if crystal == "Face-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
elif crystal == "Body-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine the if conditions.

The if conditions for Face-centered Cubic and Body-centered Cubic are redundant and can be combined.

-        if crystal == "Face-centered Cubic":
-            return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
-        elif crystal == "Body-centered Cubic":
+        if crystal in ["Face-centered Cubic", "Body-centered Cubic"]:
            return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _get_volume(lat_lst, crystal):
if not isinstance(lat_lst, float) and len(lat_lst) == 3:
if crystal == "Face-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
elif crystal == "Body-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
def _get_volume(lat_lst, crystal):
if not isinstance(lat_lst, float) and len(lat_lst) == 3:
if crystal in ["Face-centered Cubic", "Body-centered Cubic"]:
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
Tools
Ruff

105-108: Combine if branches using logical or operator

Combine if branches

(SIM114)

@samwaseda
Copy link
Member

I need an example to see how you imagined the interaction with the database.

@jan-janssen
Copy link
Member Author

I need an example to see how you imagined the interaction with the database.

That's what the unit tests show, right?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Outside diff range, codebase verification and nitpick comments (1)
tests/test_referencedata.py (1)

1-3: Remove unnecessary blank lines.

There are unnecessary blank lines at the beginning of the file.

import unittest
from atomistics.referencedata import get_experimental_elastic_property_wikipedia
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a013161 and 7cfd7c9.

Files selected for processing (2)
  • atomistics/referencedata/wolfram.py (1 hunks)
  • tests/test_referencedata.py (1 hunks)
Additional context used
Ruff
tests/test_referencedata.py

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

atomistics/referencedata/wolfram.py

201-201: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


250-253: Combine if branches using logical or operator

Combine if branches

(SIM114)

Additional comments not posted (13)
tests/test_referencedata.py (3)

29-32: LGTM!

The test method is well-written and conditionally skipped if mendeleev is not available.


37-39: LGTM!

The test method is well-written and conditionally skipped if mendeleev is not available.


44-54: LGTM!

The test method is well-written and conditionally skipped if lxml is not available.

atomistics/referencedata/wolfram.py (10)

27-40: LGTM!

The function is straightforward and correctly handles different density units.


43-56: LGTM!

The function is straightforward and correctly handles both string and float inputs.


59-73: LGTM!

The function is straightforward and correctly handles the conversion.


76-93: LGTM!

The function is straightforward and correctly handles the conversion.


114-124: LGTM!

The function is straightforward and correctly handles the validation.


127-137: LGTM!

The function is straightforward and correctly handles the validation.


140-151: LGTM!

The function is straightforward and correctly handles the selection.


154-168: LGTM!

The function is straightforward and correctly handles the selection.


205-233: LGTM!

The function is straightforward and correctly handles the data collection and extraction.


96-111: Add error handling for missing values.

The function does not handle cases where the value is not found in the DataFrame, which might cause it to fail silently.

            return v
            try:
                return v
            except ValueError:
                print(f"Value {n} not found in DataFrame.")

Likely invalid or redundant comment.

Comment on lines +5 to +14
try:
from atomistics.referencedata import (
get_chemical_information_from_mendeleev,
get_chemical_information_from_wolframalpha,
)

mendeleev_not_available = False
except ImportError:
mendeleev_not_available = True

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using importlib.util.find_spec for conditional imports.

Instead of using try-except blocks for conditional imports, consider using importlib.util.find_spec to check for module availability.

try:
    from atomistics.referencedata import (
        get_chemical_information_from_mendeleev,
        get_chemical_information_from_wolframalpha,
    )
    mendeleev_not_available = False
except ImportError:
    mendeleev_not_available = True
import importlib.util
mendeleev_not_available = importlib.util.find_spec("atomistics.referencedata") is None
if not mendeleev_not_available:
    from atomistics.referencedata import (
        get_chemical_information_from_mendeleev,
        get_chemical_information_from_wolframalpha,
    )

Committable suggestion was skipped due to low confidence.

Comment on lines +16 to +21
try:
import lxml

lxml_not_available = False
except ImportError:
lxml_not_available = True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using importlib.util.find_spec for conditional imports.

Instead of using try-except blocks for conditional imports, consider using importlib.util.find_spec to check for module availability.

try:
    import lxml
    lxml_not_available = False
except ImportError:
    lxml_not_available = True
lxml_not_available = importlib.util.find_spec("lxml") is None
if not lxml_not_available:
    import lxml

Committable suggestion was skipped due to low confidence.

Tools
Ruff

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

Comment on lines 272 to 382
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"atomicradius": {
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"bulkmodulus": {
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"shearmodulus": {
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"youngmodulus": {
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"poissonratio": {
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",
"select_function": _select_function_poisson,
"current_filter": _poisson_filter,
},
"density": {
"url": "https://periodictable.com/Properties/A/Density.an.html",
"select_function": _select_function_density,
"current_filter": _default_filter,
},
"liquiddensity": {
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"thermalexpansion": {
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"meltingpoint": {
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"vaporizationheat": {
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"specificheat": {
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"latticeconstant": {
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",
"select_function": _select_function_lattice,
"current_filter": _default_filter,
},
"crystal": {
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",
"select_function": _select_function_poisson,
"current_filter": _default_filter,
},
"volmolar": {
"url": "https://periodictable.com/Properties/A/MolarVolume.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"mass": {
"url": "https://periodictable.com/Properties/A/AtomicMass.an.html",
"select_function": _select_function_mass,
"current_filter": _default_filter,
},
}
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in data_dict.items()
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add logging for download progress and error handling for network-related errors.

The function might benefit from logging the download progress to provide feedback to the user and handling network-related errors.

import logging

logging.basicConfig(level=logging.INFO)

def _wolframalpha_download() -> None:
    """
    Downloads data from Wolfram Alpha and saves it to a CSV file.

    This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`.
    The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory.

    Note: This function requires internet connectivity to download the data.
    """
    data_dict = {
        "thermalcondictivity": {
            "url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",
            "select_function": _select_function_split,
            "current_filter": _default_filter,
        },
        "atomicradius": {
            "url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",
            "select_function": _select_function_split,
            "current_filter": _default_filter,
        },
        "bulkmodulus": {
            "url": "https://periodictable.com/Properties/A/BulkModulus.an.html",
            "select_function": _select_function_split,
            "current_filter": _default_filter,
        },
        "shearmodulus": {
            "url": "https://periodictable.com/Properties/A/ShearModulus.an.html",
            "select_function": _select_function_split,
            "current_filter": _default_filter,
        },
        "youngmodulus": {
            "url": "https://periodictable.com/Properties/A/YoungModulus.an.html",
            "select_function": _select_function_split,
            "current_filter": _default_filter,
        },
        "poissonratio": {
            "url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",
            "select_function": _select_function_poisson,
            "current_filter": _poisson_filter,
        },
        "density": {
            "url": "https://periodictable.com/Properties/A/Density.an.html",
            "select_function": _select_function_density,
            "current_filter": _default_filter,
        },
        "liquiddensity": {
            "url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",
            "select_function": _select_function_split,
            "current_filter": _default_filter,
        },
        "thermalexpansion": {
            "url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",
            "select_function": _select_function_scientific,
            "current_filter": _default_filter,
        },
        "meltingpoint": {
            "url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",
            "select_function": _select_function_scientific,
            "current_filter": _default_filter,
        },
        "vaporizationheat": {
            "url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",
            "select_function": _select_function_split,
            "current_filter": _default_filter,
        },
        "specificheat": {
            "url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",
            "select_function": _select_function_split,
            "current_filter": _default_filter,
        },
        "latticeconstant": {
            "url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",
            "select_function": _select_function_lattice,
            "current_filter": _default_filter,
        },
        "crystal": {
            "url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",
            "select_function": _select_function_poisson,
            "current_filter": _default_filter,
        },
        "volmolar": {

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines 236 to 269
def _get_volume(
lat_lst: Union[float, Tuple[float, float, float]], crystal: str
) -> Optional[float]:
"""
Calculate the volume of a crystal unit cell.

Args:
lat_lst: Lattice constants of the crystal unit cell.
crystal: Basic crystal lattice structure.

Returns:
The volume of the crystal unit cell in cubic meters, or None if the input is invalid.
"""
if not isinstance(lat_lst, float) and len(lat_lst) == 3:
if crystal == "Face-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
elif crystal == "Body-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
elif crystal == "Simple Hexagonal":
return (
lat_lst[0]
* lat_lst[1]
* lat_lst[2]
* 3
* np.sqrt(3)
/ 2
/ 100
/ 100
/ 100
)
else:
return None
else:
return None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine the if conditions.

The if conditions for "Face-centered Cubic" and "Body-centered Cubic" crystals are redundant and can be combined.

        if crystal == "Face-centered Cubic":
            return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
        elif crystal == "Body-centered Cubic":
        if crystal in ["Face-centered Cubic", "Body-centered Cubic"]:
            return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _get_volume(
lat_lst: Union[float, Tuple[float, float, float]], crystal: str
) -> Optional[float]:
"""
Calculate the volume of a crystal unit cell.
Args:
lat_lst: Lattice constants of the crystal unit cell.
crystal: Basic crystal lattice structure.
Returns:
The volume of the crystal unit cell in cubic meters, or None if the input is invalid.
"""
if not isinstance(lat_lst, float) and len(lat_lst) == 3:
if crystal == "Face-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
elif crystal == "Body-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
elif crystal == "Simple Hexagonal":
return (
lat_lst[0]
* lat_lst[1]
* lat_lst[2]
* 3
* np.sqrt(3)
/ 2
/ 100
/ 100
/ 100
)
else:
return None
else:
return None
if not isinstance(lat_lst, float) and len(lat_lst) == 3:
if crystal in ["Face-centered Cubic", "Body-centered Cubic"]:
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
elif crystal == "Simple Hexagonal":
return (
lat_lst[0]
* lat_lst[1]
* lat_lst[2]
* 3
* np.sqrt(3)
/ 2
/ 100
/ 100
/ 100
)
else:
return None
else:
return None
Tools
Ruff

250-253: Combine if branches using logical or operator

Combine if branches

(SIM114)

Comment on lines 10 to 25
def _get_content_from_url(url: str) -> pandas.DataFrame:
"""
Retrieves content from the given URL and returns a pandas DataFrame.

Args:
url (str): The URL to retrieve content from.

Returns:
pandas.DataFrame: The content retrieved from the URL as a pandas DataFrame.
"""
content = pandas.read_html(requests.get(url).text)
if len(content[8]) > 1:
return content[8]
else:
return content[9]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for network-related errors.

The function does not handle network-related errors, which might cause it to fail silently.

    content = pandas.read_html(requests.get(url).text)
    try:
        response = requests.get(url)
        response.raise_for_status()
        content = pandas.read_html(response.text)
    except requests.RequestException as e:
        print(f"Network error occurred: {e}")

Committable suggestion was skipped due to low confidence.

Comment on lines 171 to 202
def _extract_lst(
df: pandas.DataFrame,
column: str,
select_function: Callable,
current_filter: Callable,
) -> Tuple[List[str], List[str]]:
"""
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.

Args:
df (DataFrame): The DataFrame to extract data from.
column (str): The column name to extract data from.
select_function (Callable): A function that selects a property value from a DataFrame entry.
current_filter (Callable): A function that filters DataFrame entries based on a condition.

Returns:
Tuple[List[str], List[str]]: A tuple containing the element list and property list.

Raises:
ValueError: If an error occurs during the extraction process.
"""
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError:
raise ValueError(column, el, v)
return element_lst, property_lst
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve error handling by providing more context.

The function raises a ValueError without providing context about the error. Use raise ... from None to distinguish it from errors in exception handling.

        raise ValueError(column, el, v)
        raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _extract_lst(
df: pandas.DataFrame,
column: str,
select_function: Callable,
current_filter: Callable,
) -> Tuple[List[str], List[str]]:
"""
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.
Args:
df (DataFrame): The DataFrame to extract data from.
column (str): The column name to extract data from.
select_function (Callable): A function that selects a property value from a DataFrame entry.
current_filter (Callable): A function that filters DataFrame entries based on a condition.
Returns:
Tuple[List[str], List[str]]: A tuple containing the element list and property list.
Raises:
ValueError: If an error occurs during the extraction process.
"""
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError:
raise ValueError(column, el, v)
return element_lst, property_lst
def _extract_lst(
df: pandas.DataFrame,
column: str,
select_function: Callable,
current_filter: Callable,
) -> Tuple[List[str], List[str]]:
"""
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.
Args:
df (DataFrame): The DataFrame to extract data from.
column (str): The column name to extract data from.
select_function (Callable): A function that selects a property value from a DataFrame entry.
current_filter (Callable): A function that filters DataFrame entries based on a condition.
Returns:
Tuple[List[str], List[str]]: A tuple containing the element list and property list.
Raises:
ValueError: If an error occurs during the extraction process.
"""
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError as e:
raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e
return element_lst, property_lst
Tools
Ruff

201-201: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Copy link
Member

@samwaseda samwaseda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names must be consistent: currently get_{property}_from_{source} for Wolfram and Mendeleev but get_{property}_{source} for Wikipedia. And I would also avoid putting specifiers, in this case experimental, in the function names. There's anyway no way to get theoretical elastic constants, but in general I would put experimental yes/no etc. in the function arguments.

@samwaseda
Copy link
Member

Actually I would simply call these functions something like get_chemical_information or get_elastic_property, so that people can use it like

from referencedata import wikipedia


elast = wikipedia.get_elastic_property("Al")

because this way the hierarchy becomes clearer, it avoids having to import too many functions and the function names are shorter.

…y_wikipedia() to get_elastic_properties_from_wikipedia()
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (2)
tests/test_referencedata.py (2)

5-14: Consider using importlib.util.find_spec for conditional imports.

Instead of using try-except blocks for conditional imports, consider using importlib.util.find_spec to check for module availability.

- try:
-     from atomistics.referencedata import (
-         get_chemical_information_from_mendeleev,
-         get_chemical_information_from_wolframalpha,
-     )
-     mendeleev_not_available = False
- except ImportError:
-     mendeleev_not_available = True
+ import importlib.util
+ mendeleev_not_available = importlib.util.find_spec("atomistics.referencedata") is None
+ if not mendeleev_not_available:
+     from atomistics.referencedata import (
+         get_chemical_information_from_mendeleev,
+         get_chemical_information_from_wolframalpha,
+     )

16-21: Remove unused import lxml.

The lxml module is imported but not used in the code. Consider using importlib.util.find_spec to check for module availability instead.

- try:
-     import lxml
-     lxml_not_available = False
- except ImportError:
-     lxml_not_available = True
+ import importlib.util
+ lxml_not_available = importlib.util.find_spec("lxml") is None
Tools
Ruff

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7cfd7c9 and fa0929b.

Files selected for processing (4)
  • atomistics/referencedata/init.py (1 hunks)
  • atomistics/referencedata/wiki.py (1 hunks)
  • atomistics/referencedata/wolfram.py (1 hunks)
  • tests/test_referencedata.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • atomistics/referencedata/init.py
  • atomistics/referencedata/wiki.py
  • atomistics/referencedata/wolfram.py
Additional context used
Ruff
tests/test_referencedata.py

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

Additional comments not posted (6)
tests/test_referencedata.py (6)

24-28: LGTM!

The unittest.skipIf condition is correctly implemented to skip the test if mendeleev is not available.


29-32: LGTM!

The test method correctly checks if the atomic number of Aluminum is 13.


33-36: LGTM!

The unittest.skipIf condition is correctly implemented to skip the test if mendeleev is not available.


37-40: LGTM!

The test method correctly checks if the lattice constant of Aluminum is retrieved as expected.


41-43: LGTM!

The unittest.skipIf condition is correctly implemented to skip the test if lxml is not available.


44-54: LGTM!

The test method correctly checks if the elastic properties of Aluminum are retrieved as expected.

from atomistics.referencedata import get_elastic_properties_from_wikipedia

as well as

from atomistics.referencedata.wikipedia import get_elastic_properties
@jan-janssen
Copy link
Member Author

jan-janssen commented Jul 30, 2024

@samwaseda Support both:

from atomistics.referencedata import get_elastic_properties_from_wikipedia 

as well as

from atomistics.referencedata.wikipedia import get_elastic_properties

I like the first one, as it allows me to get a quick overview of which modules are available but I can also live with the second one in particular for properties which are available from multiple sources.

Copy link
Member

@samwaseda samwaseda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for the rest we have to see how things evolve. I don't think we'll suffer from backward incompatibility in the near future, so you can merge it

property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus

Returns:
str: Property value (various types): Value of the property for the given element, if available.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
str: Property value (various types): Value of the property for the given element, if available.
str: Property value (various types): Value of the property for the given element in GPa, if available.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Outside diff range, codebase verification and nitpick comments (2)
atomistics/referencedata/mendeleev.py (1)

4-112: Specify the return type in the docstring.

The docstring should specify that the function returns a dictionary.

-        dict: Dictionary with the following keys
+        dict: A dictionary containing the following keys
atomistics/referencedata/wolframalpha.py (1)

384-414: Specify the return type in the docstring.

The docstring should specify that the function returns a dictionary.

-        dict: Dictionary with the following keys
+        dict: A dictionary containing the following keys
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fa0929b and fe3c0d7.

Files selected for processing (4)
  • atomistics/referencedata/init.py (1 hunks)
  • atomistics/referencedata/mendeleev.py (1 hunks)
  • atomistics/referencedata/wikipedia.py (1 hunks)
  • atomistics/referencedata/wolframalpha.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • atomistics/referencedata/init.py
Additional comments not posted (2)
atomistics/referencedata/wikipedia.py (1)

25-29: Verify table order assumption and handle potential changes.

The function assumes that the order of tables in the Wikipedia page will not change, which might not be a safe assumption. Consider adding a verification step to ensure the correct table is being used.

+    if len(df_lst) < len(property_lst):
+        raise ValueError("The number of tables on the Wikipedia page is less than expected.")
atomistics/referencedata/wolframalpha.py (1)

21-24: Verify table order assumption and handle potential changes.

The function assumes that the content will always be in the 8th or 9th table, which might not be a safe assumption. Consider adding a verification step to ensure the correct table is being used.

+    if len(content) < 9:
+        raise ValueError("The number of tables on the page is less than expected.")

Comment on lines +21 to +23
df_lst = pandas.read_html(
"https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle potential errors during data retrieval.

The function does not handle potential errors that may occur during the data retrieval process. Add error handling to manage cases where the URL might be inaccessible or the data format changes.

+    try:
+        df_lst = pandas.read_html(
+            "https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)"
+        )
+    except Exception as e:
+        raise ValueError(f"Error retrieving data from Wikipedia: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
df_lst = pandas.read_html(
"https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)"
)
try:
df_lst = pandas.read_html(
"https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)"
)
except Exception as e:
raise ValueError(f"Error retrieving data from Wikipedia: {e}")

Comment on lines 4 to 14
def get_elastic_properties(chemical_symbol: str) -> dict:
"""
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.

Args:
chemical_symbol (str): Chemical symbol of the element.
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus

Returns:
str: Property value (various types): Value of the property for the given element, if available.
"""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the docstring to match the function signature.

The docstring mentions an argument property which is not present in the function signature. Remove the mention of the property argument from the docstring.

-        property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_elastic_properties(chemical_symbol: str) -> dict:
"""
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.
Args:
chemical_symbol (str): Chemical symbol of the element.
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus
Returns:
str: Property value (various types): Value of the property for the given element, if available.
"""
def get_elastic_properties(chemical_symbol: str) -> dict:
"""
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.
Args:
chemical_symbol (str): Chemical symbol of the element.
Returns:
str: Property value (various types): Value of the property for the given element, if available.
"""

Comment on lines +113 to +114
df = fetch_table("elements")
return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle cases where the chemical symbol is not found.

The function does not handle cases where the chemical symbol is not found in the table. Add error handling to manage this scenario.

+    element_data = df[df.symbol == chemical_symbol]
+    if element_data.empty:
+        raise ValueError(f"Chemical symbol '{chemical_symbol}' not found in the table.")
+    return element_data.squeeze(axis=0).to_dict()
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
df = fetch_table("elements")
return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
df = fetch_table("elements")
element_data = df[df.symbol == chemical_symbol]
if element_data.empty:
raise ValueError(f"Chemical symbol '{chemical_symbol}' not found in the table.")
return element_data.squeeze(axis=0).to_dict()

Comment on lines +76 to +93
def _select_function_scientific(v: Union[str, float]) -> float:
"""
Converts a scientific notation string to a float.

Args:
v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation.

Returns:
float: The converted value.

Example:
>>> _select_function_scientific('1.23E-4')
0.000123
"""
if isinstance(v, str):
return float(v.split()[0].replace("×10", "E"))
else:
return v
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle unexpected input formats.

The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.

+    try:
+        if isinstance(v, str):
+            return float(v.split()[0].replace("×10", "E"))
+        else:
+            return v
+    except (ValueError, IndexError) as e:
+        raise ValueError(f"Unexpected input format for scientific value '{v}': {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _select_function_scientific(v: Union[str, float]) -> float:
"""
Converts a scientific notation string to a float.
Args:
v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation.
Returns:
float: The converted value.
Example:
>>> _select_function_scientific('1.23E-4')
0.000123
"""
if isinstance(v, str):
return float(v.split()[0].replace("×10", "E"))
else:
return v
def _select_function_scientific(v: Union[str, float]) -> float:
"""
Converts a scientific notation string to a float.
Args:
v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation.
Returns:
float: The converted value.
Example:
>>> _select_function_scientific('1.23E-4')
0.000123
"""
try:
if isinstance(v, str):
return float(v.split()[0].replace("×10", "E"))
else:
return v
except (ValueError, IndexError) as e:
raise ValueError(f"Unexpected input format for scientific value '{v}': {e}")

Comment on lines +59 to +73
def _select_function_lattice(v: str) -> tuple[float, float, float]:
"""
Convert a string representation of a lattice vector to a tuple of floats.

Args:
v (str): The string representation of the lattice vector.

Returns:
tuple[float, float, float]: The lattice vector as a tuple of floats.
"""
return (
float(v.split(", ")[0]),
float(v.split(", ")[1]),
float(v.split(", ")[2].split()[0]),
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle unexpected input formats.

The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.

+    try:
+        return (
+            float(v.split(", ")[0]),
+            float(v.split(", ")[1]),
+            float(v.split(", ")[2].split()[0]),
+        )
+    except (ValueError, IndexError) as e:
+        raise ValueError(f"Unexpected input format for lattice value '{v}': {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _select_function_lattice(v: str) -> tuple[float, float, float]:
"""
Convert a string representation of a lattice vector to a tuple of floats.
Args:
v (str): The string representation of the lattice vector.
Returns:
tuple[float, float, float]: The lattice vector as a tuple of floats.
"""
return (
float(v.split(", ")[0]),
float(v.split(", ")[1]),
float(v.split(", ")[2].split()[0]),
)
def _select_function_lattice(v: str) -> tuple[float, float, float]:
"""
Convert a string representation of a lattice vector to a tuple of floats.
Args:
v (str): The string representation of the lattice vector.
Returns:
tuple[float, float, float]: The lattice vector as a tuple of floats.
"""
try:
return (
float(v.split(", ")[0]),
float(v.split(", ")[1]),
float(v.split(", ")[2].split()[0]),
)
except (ValueError, IndexError) as e:
raise ValueError(f"Unexpected input format for lattice value '{v}': {e}")

Comment on lines +43 to +56
def _select_function_split(v: Union[str, float]) -> float:
"""
Splits a string and returns the first element as a float.

Args:
v (Union[str, float]): The input value to be processed.

Returns:
float: The first element of the input string as a float, or the input value if it's already a float.
"""
if isinstance(v, str):
return float(v.split()[0])
else:
return v
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle unexpected input formats.

The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.

+    try:
+        if isinstance(v, str):
+            return float(v.split()[0])
+        else:
+            return v
+    except (ValueError, IndexError) as e:
+        raise ValueError(f"Unexpected input format for value '{v}': {e}")

Committable suggestion was skipped due to low confidence.

Comment on lines +96 to +111
def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]:
"""
Extracts the value corresponding to the given element name from a DataFrame.

Args:
df (pandas.DataFrame): The DataFrame to extract data from.
n (str): The element name to extract data for.

Returns:
Optional[Union[str, float]]: The extracted value if found, None otherwise.
"""
for i in range(int(len(df.columns) / 2)):
if n in df[i].values:
v = df[i + 1].values[df[i].values.tolist().index(n)]
return v
return None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle cases where the element name is not found.

The function does not handle cases where the element name is not found in the DataFrame. Add error handling to manage this scenario.

+    for i in range(int(len(df.columns) / 2)):
+        if n in df[i].values:
+            v = df[i + 1].values[df[i].values.tolist().index(n)]
+            return v
+    raise ValueError(f"Element name '{n}' not found in the DataFrame.")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]:
"""
Extracts the value corresponding to the given element name from a DataFrame.
Args:
df (pandas.DataFrame): The DataFrame to extract data from.
n (str): The element name to extract data for.
Returns:
Optional[Union[str, float]]: The extracted value if found, None otherwise.
"""
for i in range(int(len(df.columns) / 2)):
if n in df[i].values:
v = df[i + 1].values[df[i].values.tolist().index(n)]
return v
return None
def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]:
"""
Extracts the value corresponding to the given element name from a DataFrame.
Args:
df (pandas.DataFrame): The DataFrame to extract data from.
n (str): The element name to extract data for.
Returns:
Optional[Union[str, float]]: The extracted value if found, None otherwise.
"""
for i in range(int(len(df.columns) / 2)):
if n in df[i].values:
v = df[i + 1].values[df[i].values.tolist().index(n)]
return v
raise ValueError(f"Element name '{n}' not found in the DataFrame.")

Comment on lines +207 to +235
def _collect(
url: str, column: str, select_function: callable, current_filter: callable
) -> pandas.DataFrame:
"""
Collect data from a given URL and extract specific properties.

Args:
url: The URL to fetch the data from.
column: The name of the column to extract.
select_function: The function to apply to the extracted data.
current_filter: The filter function to apply to the extracted data.

Returns:
A pandas DataFrame containing the extracted data.
"""
return pandas.DataFrame(
{
n: d
for n, d in zip(
["element", column],
_extract_lst(
df=_get_content_from_url(url=url),
column=column,
select_function=select_function,
current_filter=current_filter,
),
)
}
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle potential errors during data retrieval.

The function does not handle potential errors that may occur during the data retrieval process. Add error handling to manage cases where the URL might be inaccessible or the data format changes.

+    try:
+        data = _extract_lst(
+            df=_get_content_from_url(url=url),
+            column=column,
+            select_function=select_function,
+            current_filter=current_filter,
+        )
+    except Exception as e:
+        raise ValueError(f"Error collecting data from URL '{url}': {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _collect(
url: str, column: str, select_function: callable, current_filter: callable
) -> pandas.DataFrame:
"""
Collect data from a given URL and extract specific properties.
Args:
url: The URL to fetch the data from.
column: The name of the column to extract.
select_function: The function to apply to the extracted data.
current_filter: The filter function to apply to the extracted data.
Returns:
A pandas DataFrame containing the extracted data.
"""
return pandas.DataFrame(
{
n: d
for n, d in zip(
["element", column],
_extract_lst(
df=_get_content_from_url(url=url),
column=column,
select_function=select_function,
current_filter=current_filter,
),
)
}
)
def _collect(
url: str, column: str, select_function: callable, current_filter: callable
) -> pandas.DataFrame:
"""
Collect data from a given URL and extract specific properties.
Args:
url: The URL to fetch the data from.
column: The name of the column to extract.
select_function: The function to apply to the extracted data.
current_filter: The filter function to apply to the extracted data.
Returns:
A pandas DataFrame containing the extracted data.
"""
try:
data = _extract_lst(
df=_get_content_from_url(url=url),
column=column,
select_function=select_function,
current_filter=current_filter,
)
except Exception as e:
raise ValueError(f"Error collecting data from URL '{url}': {e}")
return pandas.DataFrame(
{
n: d
for n, d in zip(
["element", column],
data,
)
}
)

Comment on lines +171 to +204
def _extract_lst(
df: pandas.DataFrame,
column: str,
select_function: Callable,
current_filter: Callable,
) -> Tuple[List[str], List[str]]:
"""
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.

Args:
df (DataFrame): The DataFrame to extract data from.
column (str): The column name to extract data from.
select_function (Callable): A function that selects a property value from a DataFrame entry.
current_filter (Callable): A function that filters DataFrame entries based on a condition.

Returns:
Tuple[List[str], List[str]]: A tuple containing the element list and property list.

Raises:
ValueError: If an error occurs during the extraction process.
"""
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError as e:
raise ValueError(
f"Error extracting {column} for element {el} with value {v}"
) from e
return element_lst, property_lst
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle cases where the DataFrame is empty or the column is not found.

The function does not handle cases where the DataFrame is empty or the column is not found. Add error handling to manage these scenarios.

+    if df.empty:
+        raise ValueError("The DataFrame is empty.")
+    if column not in df.columns:
+        raise ValueError(f"Column '{column}' not found in the DataFrame.")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _extract_lst(
df: pandas.DataFrame,
column: str,
select_function: Callable,
current_filter: Callable,
) -> Tuple[List[str], List[str]]:
"""
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.
Args:
df (DataFrame): The DataFrame to extract data from.
column (str): The column name to extract data from.
select_function (Callable): A function that selects a property value from a DataFrame entry.
current_filter (Callable): A function that filters DataFrame entries based on a condition.
Returns:
Tuple[List[str], List[str]]: A tuple containing the element list and property list.
Raises:
ValueError: If an error occurs during the extraction process.
"""
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError as e:
raise ValueError(
f"Error extracting {column} for element {el} with value {v}"
) from e
return element_lst, property_lst
def _extract_lst(
df: pandas.DataFrame,
column: str,
select_function: Callable,
current_filter: Callable,
) -> Tuple[List[str], List[str]]:
"""
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.
Args:
df (DataFrame): The DataFrame to extract data from.
column (str): The column name to extract data from.
select_function (Callable): A function that selects a property value from a DataFrame entry.
current_filter (Callable): A function that filters DataFrame entries based on a condition.
Returns:
Tuple[List[str], List[str]]: A tuple containing the element list and property list.
Raises:
ValueError: If an error occurs during the extraction process.
"""
if df.empty:
raise ValueError("The DataFrame is empty.")
if column not in df.columns:
raise ValueError(f"Column '{column}' not found in the DataFrame.")
element_lst, property_lst = [], []
try:
ptable = fetch_table("elements")
for n, el in zip(ptable.name.values, ptable.symbol.values):
v = _extract_entry(df=df, n=n)
if current_filter(v):
element_lst.append(el)
property_lst.append(select_function(v))
except ValueError as e:
raise ValueError(
f"Error extracting {column} for element {el} with value {v}"
) from e
return element_lst, property_lst

Comment on lines +272 to +381
"thermalcondictivity": {
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"atomicradius": {
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"bulkmodulus": {
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"shearmodulus": {
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"youngmodulus": {
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"poissonratio": {
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",
"select_function": _select_function_poisson,
"current_filter": _poisson_filter,
},
"density": {
"url": "https://periodictable.com/Properties/A/Density.an.html",
"select_function": _select_function_density,
"current_filter": _default_filter,
},
"liquiddensity": {
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"thermalexpansion": {
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"meltingpoint": {
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"vaporizationheat": {
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"specificheat": {
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"latticeconstant": {
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",
"select_function": _select_function_lattice,
"current_filter": _default_filter,
},
"crystal": {
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",
"select_function": _select_function_poisson,
"current_filter": _default_filter,
},
"volmolar": {
"url": "https://periodictable.com/Properties/A/MolarVolume.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"mass": {
"url": "https://periodictable.com/Properties/A/AtomicMass.an.html",
"select_function": _select_function_mass,
"current_filter": _default_filter,
},
}
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in data_dict.items()
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle potential errors during data retrieval and file writing.

The function does not handle potential errors that may occur during the data retrieval and file writing process. Add error handling to manage these scenarios.

+    try:
+        result = pandas.concat(
+            [
+                _collect(
+                    url=v["url"],
+                    column=k,
+                    select_function=v["select_function"],
+                    current_filter=v["current_filter"],
+                ).set_index("element")
+                for k, v in data_dict.items()
+            ],
+            axis=1,
+            sort=False,
+        )
+        result["volume"] = result.apply(
+            lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
+        )
+        data_path = os.path.join(os.path.dirname(__file__), "data")
+        os.makedirs(data_path, exist_ok=True)
+        result.to_csv(os.path.join(data_path, "wolfram.csv"))
+    except Exception as e:
+        raise ValueError(f"Error downloading data from Wolfram Alpha: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _wolframalpha_download() -> None:
"""
Downloads data from Wolfram Alpha and saves it to a CSV file.
This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`.
The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory.
Note: This function requires internet connectivity to download the data.
"""
data_dict = {
"thermalcondictivity": {
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"atomicradius": {
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"bulkmodulus": {
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"shearmodulus": {
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"youngmodulus": {
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"poissonratio": {
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",
"select_function": _select_function_poisson,
"current_filter": _poisson_filter,
},
"density": {
"url": "https://periodictable.com/Properties/A/Density.an.html",
"select_function": _select_function_density,
"current_filter": _default_filter,
},
"liquiddensity": {
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"thermalexpansion": {
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"meltingpoint": {
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"vaporizationheat": {
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"specificheat": {
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"latticeconstant": {
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",
"select_function": _select_function_lattice,
"current_filter": _default_filter,
},
"crystal": {
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",
"select_function": _select_function_poisson,
"current_filter": _default_filter,
},
"volmolar": {
"url": "https://periodictable.com/Properties/A/MolarVolume.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"mass": {
"url": "https://periodictable.com/Properties/A/AtomicMass.an.html",
"select_function": _select_function_mass,
"current_filter": _default_filter,
},
}
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in data_dict.items()
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))
def _wolframalpha_download() -> None:
"""
Downloads data from Wolfram Alpha and saves it to a CSV file.
This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`.
The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory.
Note: This function requires internet connectivity to download the data.
"""
data_dict = {
"thermalcondictivity": {
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"atomicradius": {
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"bulkmodulus": {
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"shearmodulus": {
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"youngmodulus": {
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"poissonratio": {
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",
"select_function": _select_function_poisson,
"current_filter": _poisson_filter,
},
"density": {
"url": "https://periodictable.com/Properties/A/Density.an.html",
"select_function": _select_function_density,
"current_filter": _default_filter,
},
"liquiddensity": {
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"thermalexpansion": {
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"meltingpoint": {
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"vaporizationheat": {
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"specificheat": {
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"latticeconstant": {
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",
"select_function": _select_function_lattice,
"current_filter": _default_filter,
},
"crystal": {
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",
"select_function": _select_function_poisson,
"current_filter": _default_filter,
},
"volmolar": {
"url": "https://periodictable.com/Properties/A/MolarVolume.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"mass": {
"url": "https://periodictable.com/Properties/A/AtomicMass.an.html",
"select_function": _select_function_mass,
"current_filter": _default_filter,
},
}
try:
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in data_dict.items()
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))
except Exception as e:
raise ValueError(f"Error downloading data from Wolfram Alpha: {e}")

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fe3c0d7 and 92cf77a.

Files selected for processing (1)
  • atomistics/referencedata/wikipedia.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • atomistics/referencedata/wikipedia.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 92cf77a and c744330.

Files selected for processing (1)
  • .ci_support/environment-old.yml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • .ci_support/environment-old.yml

@jan-janssen jan-janssen merged commit 3e1a594 into main Jul 30, 2024
22 checks passed
@jan-janssen jan-janssen deleted the reference_data branch July 30, 2024 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants